WHITEPAPER
Unraveling Fraud Networks
Harnessing graph-based techniques for robust and real-time fraud detection
Fraud is nothing new, but the need for a dependable fraud detection system is more urgent than ever. This necessity springs from a paradox: technological advancement, while a boon for users, also empowers fraudsters. Fraud detection has traditionally been anchored on data mining and statistical analysis, tools sufficient for detecting comparatively simple fraud. But as fraudsters weave more complex webs of deceit, these traditional methods are being outpaced. As a result, new detection mechanisms and architectures have surfaced, bolstering companies' capabilities to spot fraud. This whitepaper examines the role of graph-based technologies, demonstrating their potential for real-time detection and accurate prediction of complex fraudulent activities, such as money laundering and other elaborate schemes.
Breaking away from tradition
As fraudsters become more sophisticated, traditional approaches become less effective. Current methods for fraud detection include:
1. Statistical analysis
A traditional approach where statistical models scrutinize data, looking for unusual patterns or anomalies that could hint at fraudulent activity.
2. Data mining
This method involves an exhaustive analysis of vast data sets to unearth patterns and connections that could signal fraud.
3. Rule-based systems
This approach operates by devising a set of predetermined rules or criteria, serving as a beacon to pinpoint potential fraudulent transactions.
4. Pattern recognition
By leveraging machine learning algorithms, this method identifies recurring patterns or data anomalies that might suggest fraud.
Innovative strategies like machine learning and graph-based technologies that combine traditional and advanced methods are needed to deter and prevent fraud effectively.
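To make the first and third methods concrete, the following is a minimal sketch, not a production design. The transaction amounts, field names, rules, and the z-score threshold of 2.0 are all illustrative assumptions rather than anything specified in this paper.

```python
from statistics import mean, pstdev

def zscore_outliers(amounts, threshold=2.0):
    """Statistical analysis: flag amounts far from the mean, in standard deviations."""
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Rule-based system: each rule is a (name, predicate) pair. All rules are hypothetical.
RULES = [
    ("large_amount", lambda tx: tx["amount"] > 1000),
    ("foreign_country", lambda tx: tx["country"] != tx["home_country"]),
]

def triggered_rules(tx):
    """Return the names of every rule the transaction violates."""
    return [name for name, predicate in RULES if predicate(tx)]

amounts = [20, 25, 22, 19, 24, 21, 500]
flagged = zscore_outliers(amounts)          # the 500 transaction stands out

tx = {"amount": 1500, "country": "FR", "home_country": "US"}
rules_hit = triggered_rules(tx)             # both rules fire for this record
```

Both checks treat each transaction in isolation; neither looks at how entities are connected to one another.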
© 2023 Fractal Analytics Inc. All rights reserved
A fresh perspective on fraud detection
Existing techniques struggle to discern the intricate relationships between entities, often the key to spotlighting suspicious behavioral patterns. Graph-based algorithms have emerged as a compelling answer to this challenge. In this approach, transactions and customers are transformed into nodes and edges, enabling fraud detection algorithms to tap into the strength of relationship mapping to identify fraudulent activities.
Graphs foreground the relationships between entities, making it easier for investigators to discover patterns that would remain hidden in conventional tables. By offering an encompassing view of the network of connections, they also help reduce the false positives that often plague traditional methods. This approach proves invaluable in unmasking fraud networks, where behaviors are interwoven rather than standalone.
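As a rough illustration of relationship mapping, the sketch below turns a handful of transfers into a graph and lists every account linked, directly or indirectly, to a given account. The account names and the choice of an undirected graph are assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical transfers between customer accounts: (sender, receiver).
transfers = [("acct_1", "acct_2"), ("acct_2", "acct_3"),
             ("acct_4", "acct_5"), ("acct_3", "acct_6")]

def build_graph(pairs):
    """Customers become nodes; each transaction becomes an undirected edge."""
    graph = defaultdict(set)
    for sender, receiver in pairs:
        graph[sender].add(receiver)
        graph[receiver].add(sender)
    return graph

def connected_accounts(graph, start):
    """All accounts reachable from `start` through any chain of transactions."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}

graph = build_graph(transfers)
ring = connected_accounts(graph, "acct_1")
```

Here `acct_6` never transacts with `acct_1` directly, yet it appears in `ring` because the two are joined through `acct_2` and `acct_3`. A row-by-row view of the same table would not surface that link.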
Key metrics: The facets of graph algorithms
Key metrics, why they are crucial, and how they are used to detect fraud

Community Detection
Community detection clusters nodes in a graph using modularity or spectral clustering methods based on attribute or connection similarities. This paves the way for detailed analysis of these clusters to spot potential fraudulent actors or activities.
Community detection plays a crucial role by pinpointing groups of nodes exhibiting similar properties or behaviors, potentially signaling fraudulent activity. Given that fraudsters often operate in cohorts or employ similar strategies, identifying these communities is instrumental in fraud prevention.

Centrality Analysis
Centrality analysis utilizes measures like PageRank or eigenvector centrality to pinpoint influential nodes within a graph. By harnessing these metrics, we can enhance our ability to identify potential perpetrators of fraudulent activity.
Centrality analysis highlights influential nodes in a graph that could potentially signal fraudulent activities. If each node represents a criminal act, this analysis highlights the act connected to the most others, offering a glimpse into how prevalent it is.
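The two metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the toy graph is hypothetical, modularity is only scored for candidate partitions (a full community detection algorithm would also search for the best partition), and eigenvector centrality is computed by simple power iteration.

```python
from math import sqrt

# Hypothetical graph: two tight triangles joined by a single bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

def adjacency(edge_list):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def modularity(edge_list, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edge_list)
    adj = adjacency(edge_list)
    q = 0.0
    for community in communities:
        internal = sum(1 for u, v in edge_list if u in community and v in community)
        degree_sum = sum(len(adj[n]) for n in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

def eigenvector_centrality(edge_list, iterations=100):
    """Power iteration on (A + I); the shift avoids oscillation and keeps the eigenvectors."""
    adj = adjacency(edge_list)
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        nxt = {n: score[n] + sum(score[nb] for nb in adj[n]) for n in adj}
        norm = sqrt(sum(v * v for v in nxt.values()))
        score = {n: v / norm for n, v in nxt.items()}
    return score

split = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
merged = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
centrality = eigenvector_centrality(edges)
```

Splitting the graph into its two triangles scores higher than treating it as one group, which is the signal a modularity-based method optimizes. The bridge nodes `c` and `d` receive the highest centrality because they connect the two groups.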
PageRank
PageRank gauges the significance of each node in a graph, assigning a score based on the quantity and quality of its interconnected links. It thoroughly evaluates incoming and outgoing connections to create a comprehensive link structure analysis.
PageRank scores nodes in a network, identifying anomalies based on their prominence. Nodes with high scores often have numerous inbound links from dubious sources, indicating potential fraudulent involvement. A thorough investigation of these nodes could significantly reduce fraud network risks.

Clustering Coefficient
Clustering coefficient analysis groups graph nodes based on attribute or connection similarities using hierarchical or k-means clustering techniques. The resulting clusters are cross-checked against a maintained list of fraudulent transactions for potential matches.
Graph-based clustering analysis groups similar or proximate nodes, which could signal fraudulent activity. As fraudsters typically operate in clusters or employ similar methods, detecting these groupings can prove beneficial in identifying fraud.

Shortest Path
The shortest path algorithm traces the quickest route between two graph nodes, highlighting the minimum number of connecting edges. This tool proves valuable in fraud detection, unveiling suspicious transactions across multiple nodes, and potentially exposing indirect connections or intermediary involvement.
The shortest path analysis uncovers hidden node relationships within the network. Fraudsters often employ indirect connections to elude detection. The shortest path algorithm can expose these hidden links, assisting investigators in identifying suspicious transactions.

Classification
Classification employs evidence from past cases to predict an entity's category, serving as a robust fraud prevention tool. The model harnesses graph-extracted features like node attributes, transaction specifics, and inter-node relationships. After training, it can classify new transactions or nodes as legitimate or suspect.
Classification aids in the real-time identification of potential fraudsters and their activities. Automated fraud detection allows swift identification and flagging of dubious transactions or customers, mitigating financial risks and preserving an institution's reputation.
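The first three metrics above can be illustrated with short standard-library implementations. This is a sketch under assumptions: the graphs are hypothetical, the damping factor of 0.85 is a conventional default rather than a value from this paper, and the clustering coefficient is taken in its standard graph-theoretic sense (the share of a node's neighbor pairs that are themselves connected).

```python
from collections import deque

def pagerank(out_links, damping=0.85, iterations=100):
    """PageRank by power iteration on a directed graph {node: [targets]}."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank sitting on nodes with no outgoing links is shared evenly.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        nxt = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in out_links.items():
            for target in targets:
                nxt[target] += damping * rank[node] / len(targets)
        rank = nxt
    return rank

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = sorted(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in adj[neighbors[i]])
    return 2 * links / (k * (k - 1))

def shortest_path(adj, source, target):
    """Breadth-first search: the path with the fewest edges, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adj.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical directed payment flows for PageRank.
flows = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(flows)

# Hypothetical undirected relationship graph for the other two metrics.
links = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
coefficient_a = clustering_coefficient(links, "a")
path = shortest_path(links, "a", "e")
```

In this toy data `hub` collects the most inbound flow and ranks highest, node `a` sits in a fully connected triangle, and the path from `a` to `e` exposes `c` and `d` as intermediaries.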
How to implement a graph-based algorithm
There are three key steps to implementing graph-based algorithms to detect fraudulent activities.
1. Data collection and pre-processing
• Gather data on the entities to be analyzed
• Construct a graph from the data, with nodes representing entities and edges representing relationships between them
• Extract graph features, such as node degree, clustering coefficient, and centrality measures
2. Graph analytics and machine learning
• Apply community detection algorithms to group nodes into clusters based on their connectivity patterns
• Extract graph features and use them as inputs for machine learning models
• Train the model on labeled data and evaluate its performance
3. Model evaluation
• Compare model performance with baseline and basic graph features models
• Evaluate the model's accuracy and variable importance to assess the impact of the graph features
• Monitor and update the model as needed to ensure ongoing effectiveness
Case Study: Graph-based Techniques in Action
The Client
A financial institution or credit card company.
The Challenge
The client is grappling with detecting and preventing fraudulent transactions within their credit card platform. Their goals are twofold: to curtail financial losses and to shield their customers from unauthorized charges.
The Proposed Solution
The deployment of a real-time fraud detection system capable of accurately identifying fraudulent transactions. This proactive approach enables the financial institution to initiate timely countermeasures, mitigating risks.
Fractal's Role
To illuminate complex transactional relationships, which are instrumental in detecting potential fraudulent behavior.
Our approach follows a logical progression from extracting the relevant data to evaluating the results.
1. Data Extraction: Credit card dataset
2. Data pre-processing: Handling null values, handling categorical values, dropping unnecessary features
3. Graph Network: Converting the tabular dataset to graph networks using the networkx library
4. Exploratory Data Analysis: Finding the edge weight distribution, node degree distribution, centralities, etc.
5. Final Data Preparation: Handling categorical values using one-hot encoding, standardizing the features
6. Model Building: Model training using classifiers and a train-test split using stratified k-fold
7. Evaluation: Evaluating the performance of the model
To accurately reflect the connections between customers and their transactions, Fractal creates graphs that are structured as follows:
Nodes: These represent the credit card number and merchant.
Edges: These denote transactions between the credit card number and the merchant.
Edge Weight: This signifies the transaction's magnitude or amount.
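A minimal sketch of this structure follows, using only the standard library so the logic is visible; the case study itself used networkx. The rows and column names are hypothetical, and summing repeated card-merchant transactions into a single edge weight is an assumption, since the text does not say how repeat transactions were handled.

```python
from collections import defaultdict

# Hypothetical rows; column names are illustrative, not the case-study schema.
rows = [
    {"card": "card_A", "merchant": "m_1", "amount": 120.0},
    {"card": "card_A", "merchant": "m_2", "amount": 30.0},
    {"card": "card_B", "merchant": "m_2", "amount": 400.0},
    {"card": "card_A", "merchant": "m_2", "amount": 20.0},
]

def build_weighted_graph(records):
    """Edges keyed by (card, merchant); weight is the summed transaction amount."""
    weights = defaultdict(float)
    for r in records:
        weights[(r["card"], r["merchant"])] += r["amount"]
    return dict(weights)

def node_features(weights):
    """Degree and weighted degree (strength) for every card and merchant node."""
    feats = defaultdict(lambda: {"degree": 0, "strength": 0.0})
    for (card, merchant), w in weights.items():
        for node in (card, merchant):
            feats[node]["degree"] += 1
            feats[node]["strength"] += w
    return dict(feats)

weights = build_weighted_graph(rows)
features = node_features(weights)

# Attach node-level graph features to each transaction row for modelling.
enriched = [{**r,
             "card_degree": features[r["card"]]["degree"],
             "merchant_strength": features[r["merchant"]]["strength"]}
            for r in rows]
```

Each row in `enriched` keeps its original columns and gains two graph-derived ones, which is one simple way graph features can be fed to a classifier alongside intrinsic features.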
When graph features are incorporated into the model, they emerge as the most influential factors. In our case study, model performance improved, with the Area Under the Curve (AUC) metric rising from 0.72 to 0.76, a relative increase of nearly 6%.
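For readers who want to check the arithmetic, the sketch below computes AUC from scores by pairwise comparison and then the relative change between the two reported values. The toy labels and scores are hypothetical; only 0.72 and 0.76 come from the case study.

```python
def auc(labels, scores):
    """AUC as the share of (fraud, legitimate) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def relative_increase(before, after):
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before

toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ranked correctly
gain = relative_increase(0.72, 0.76)                 # about 0.056, i.e. nearly 6%
```

The absolute gain is 0.04 AUC points; the "nearly 6%" figure is that gain measured relative to the 0.72 baseline.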
Using intrinsic features

              precision    recall  f1-score   support
0.0                0.66      0.76      0.71      1307
1.0                0.71      0.60      0.65      1258
accuracy                               0.68      2565
macro avg          0.69      0.68      0.68      2565
weighted avg       0.69      0.68      0.68      2565
Using graph features

              precision    recall  f1-score   support
0.0                0.72      0.72      0.72      1307
1.0                0.71      0.70      0.71      1258
accuracy                               0.71      2565
macro avg          0.71      0.71      0.71      2565
weighted avg       0.71      0.71      0.71      2565
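As a reminder of how the columns in these reports relate, the sketch below derives precision, recall, and F1 from predictions and recomputes one F1 value from its precision and recall. The toy labels are hypothetical, and reading 0.66 and 0.76 as the class 0.0 precision and recall of the intrinsic-feature report is an interpretation of the report layout.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class, computed from raw predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    return precision, recall, f1

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)

# Consistency check against reported values (rounded to two decimals).
f1_intrinsic_class0 = round(f1_from(0.66, 0.76), 2)
```

The recomputed value matches the 0.71 shown in the f1-score column, which supports reading the report in the usual precision, recall, f1-score, support order.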
[Figures: Top 20 Feature Importances, one chart for each model]
The key features of the graph-based model mirror those of the model that lacks graph features, now enriched with graph metrics. Additionally, the importance of each feature is considerably higher than that of its counterpart in the base model. This suggests that adding a small number of graph features can improve the model's accuracy and insight, offering a greater yield than a wide range of features used in a model without graph elements.
Three-Level Network Graph for Fraud Transactions
[Figure: network graphs for customers 4378993458389626 and 3575540972310993. Legend: Customer Credit Card Number, Merchant Name, Category.]
The tri-level network graphs mapping fraudulent transactions (illustrated above) visualize the relationships between customers, fraudulent merchants, and corresponding categories for two customers. This graphical representation is instrumental in pinpointing patterns, deciphering connections, and distinguishing clusters within fraudulent transactions. The inclusion of graph features also enables us to analyze the correlation between the target and the graphical attributes. Integrating these graph features has enhanced precision and accuracy, outperforming the model that lacks graph features. The model's performance could be further improved by applying community analysis, a promising avenue for future optimization.
The final hurdles
As with any emerging technology, several challenges must be addressed before graph-based fraud detection methods can be widely adopted. These include:
• High computation time: Computing graph features can be time-consuming, particularly if the data set is large.
• Data quality: Sparse data and missing information can introduce complexities in creating graphs or network features.
• Graph network visualization: Plotting a graph network can be extremely challenging when dealing with a dense network or a large data set.
• Domain expertise: A strong foundation in the subject matter is a crucial prerequisite for identifying network structure and determining relationships.
Conclusion
Harnessing the potential of graph techniques can highlight underlying data and its relationships, providing critical insights into seemingly unconnected events in a given use case.
In the banking industry, where fraud incurs high costs, financial services firms using graph database techniques have reported millions of dollars in savings due to the increased accuracy when using graph techniques. The strength of this network approach enables stakeholders to pinpoint and address critical areas in the network, broadening the possibilities for graph analytics and other computational applications. To build this capability, substantial investment in infrastructure is required, alongside the development of unique customer identifiers that can be used across various systems. Multiple tools are available today for creating graph databases and graph features, which can be subsequently integrated into machine learning models to increase prediction accuracy.
Authors
Supriya Panigrahi
Sray Agarwal
Ashna Taneja
Consultant, Fractal Dimension
Consultant, Fractal Dimension
Principal Consultant, Fractal Dimension
About Fractal
Fractal is one of the most prominent providers of Artificial Intelligence to Fortune 500® companies. Fractal's vision is to power every human decision in the enterprise, and bring AI, engineering, and design to help the world's most admired companies. Fractal's businesses include Crux Intelligence (AI driven business intelligence), Eugenie.ai (AI for sustainability), Asper.ai (AI for revenue growth management) and Senseforth.ai (conversational AI for sales and customer service). Fractal incubated Qure.ai, a leading player in healthcare AI for detecting Tuberculosis and Lung cancer. Fractal currently has 4000+ employees across 16 global locations, including the United States, UK, Ukraine, India, Singapore, and Australia. Fractal has been recognized as 'Great Workplace' and 'India's Best Workplaces for Women' in the top 100 (large) category by The Great Place to Work® Institute; featured as a leader in Customer Analytics Service Providers Wave™ 2021, Computer Vision Consultancies Wave™ 2020 & Specialized Insights Service Providers Wave™ 2020 by Forrester Research Inc., a leader in Analytics & AI Services Specialists Peak Matrix 2022 by Everest Group and recognized as an 'Honorable Vendor' in 2022 Magic Quadrant™ for data & analytics by Gartner Inc. For more information, visit fractal.ai
Corporate Headquarters Suite 76J, One World Trade Center, New York, NY 10007