A Mixed-Methods Study Integrating Model Performance with Analyst Decision Workflows in Trustworthy AI for Financial Fraud Detection

Istiaq Ahmed; Md. Hasan Or Rashid

doi:10.63125/xdmkbj34

Authors

Istiaq Ahmed, M.S. in Information Technology, Southern New Hampshire University (SNHU), New Hampshire, USA
Md. Hasan Or Rashid, Master of Science in Business Analytics, East Texas A&M University, Texas, USA

Keywords:

Trustworthy AI, Financial Fraud Detection, Analyst Decision Workflows, Model Performance, Explainable Artificial Intelligence

Abstract

This study examined the relationship between artificial intelligence model performance and analyst decision workflows in trustworthy financial fraud detection environments. It was motivated by the need to move beyond purely model-centered evaluation and to assess whether fraud detection systems that perform well statistically also support effective, consistent, and operationally credible analyst decision-making. A quantitative, cross-sectional explanatory design was adopted to investigate how model precision, recall, false positive rate, explanation quality, and perceived reliability were associated with key workflow outcomes, including alert acceptance, review efficiency, escalation quality, and decision consistency. Data comprised 268 initial responses and matched workflow records from fraud analysts, senior investigators, fraud operations supervisors, and AI-supported risk review personnel working across commercial banks, digital payment providers, insurance companies, and fintech lending platforms. After data screening and exclusion of incomplete or invalid cases, 240 valid cases were retained for final analysis, so that 89.6% of the initial responses were usable. The findings showed that model performance and trustworthy AI characteristics were significantly associated with analyst workflow outcomes. Correlation analysis indicated that model precision had a strong positive relationship with alert acceptance rate (r = .68, p < .001), while model recall was strongly associated with escalation quality (r = .65, p < .001). False positive rate had a strong negative relationship with review efficiency (r = -.62, p < .001). Regression analysis further showed that model precision significantly predicted alert acceptance (β = .36, p < .001), model recall significantly predicted escalation quality (β = .43, p < .001), and perceived reliability significantly predicted decision consistency (β = .35, p < .001). The regression models explained between 48% and 61% of the variance in the major workflow outcomes. Subgroup analysis also showed that highly explainable systems were associated with stronger workflow stability, with a large effect size (d = 1.28), while high-false-positive environments were associated with weaker workflow stability (d = 1.34). The study concluded that trustworthy AI in financial fraud detection should be evaluated through the combined lens of predictive accuracy, interpretability, reliability, and workflow usability, because operational effectiveness depends on both model quality and human decision integration.
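
For readers less familiar with the quantities named in the abstract, the following minimal Python sketch shows how precision, recall, and false positive rate follow from confusion-matrix counts, how the 89.6% usable share follows from the reported 240 of 268 cases, and how a Cohen's d effect size is commonly computed with a pooled standard deviation. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the confusion-matrix counts, and the two sample groups are hypothetical, and the abstract does not state which d formula was used.

```python
from math import sqrt
from statistics import mean, variance


def confusion_metrics(tp, fp, tn, fn):
    """Precision, recall, and false positive rate from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),            # share of alerts that were true fraud
        "recall": tp / (tp + fn),               # share of fraud cases that were alerted
        "false_positive_rate": fp / (fp + tn),  # share of legitimate cases alerted
    }


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Figures reported in the abstract: 240 valid cases out of 268 initial responses.
usable_share = 240 / 268
print(f"usable share: {usable_share:.1%}")  # usable share: 89.6%

# Hypothetical counts, chosen only to illustrate the definitions.
print(confusion_metrics(tp=80, fp=20, tn=880, fn=20))

# Hypothetical workflow-stability scores for two subgroups.
print(cohens_d([4, 5, 6], [2, 3, 4]))
```

By the usual convention, values of d above 0.8 are read as large, which is why the reported 1.28 and 1.34 are described as large effects.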