Unraveling Fraud Networks

WHITEPAPER

Harnessing graph-based techniques for robust and real-time fraud detection

Fraud is nothing new, but the need for a fail-safe fraud detection system is more urgent than ever. This need springs from a paradox: technological advancement, while a boon for users, also empowers fraudsters. Fraud detection has traditionally been anchored in data mining and statistical analysis, tools sufficient for detecting comparatively simple fraud. But as fraudsters weave more complex webs of deceit, these traditional methods are being outpaced. As a result, newer detection mechanisms and architectures have surfaced, bolstering companies' capabilities to spot fraud. This whitepaper examines the role of graph-based technologies, demonstrating their potential in real-time detection and accurate prediction of complex fraudulent activities, such as money laundering and other elaborate schemes.

Breaking away from tradition

As fraudsters become more sophisticated, traditional approaches become less effective. Current methods for fraud detection include:

1. Statistical analysis

A traditional approach where statistical models scrutinize data, looking for unusual patterns or anomalies that could hint at fraudulent activity.

2. Data mining

This method involves an exhaustive analysis of vast data sets to unearth patterns and connections that could signal fraud.

3. Rule-based systems

This approach operates by devising a set of predetermined rules or criteria, serving as a beacon to pinpoint potential fraudulent transactions.

4. Pattern recognition

By leveraging machine learning algorithms, this method identifies recurring patterns or data anomalies that might suggest fraud.

Innovative strategies like machine learning and graph-based technologies that combine traditional and advanced methods are needed to deter and prevent fraud effectively.

© 2023 Fractal Analytics Inc. All rights reserved


A fresh perspective on fraud detection

Existing techniques struggle to discern the intricate relationships between entities, often the key to spotlighting suspicious behavioral patterns. Graph-based algorithms have emerged as a compelling answer to this challenge. In this approach, transactions and customers are transformed into nodes and edges, enabling fraud detection algorithms to tap into the strength of relationship mapping to identify fraudulent activities.

Graphs foreground the relationships between entities, making it easier for investigators to discover patterns that would remain camouflaged within conventional tables. By offering an encompassing view of the network of connections, they also help reduce the false positives that often plague traditional methods. This approach proves invaluable in unmasking fraud networks, where behaviors are interwoven rather than standalone.

Key metrics: The facets of graph algorithms

Community Detection

Why the metric is crucial: Community detection plays a crucial role by pinpointing groups of nodes exhibiting similar properties or behaviors, potentially signaling fraudulent activity. Given that fraudsters often operate in cohorts or employ similar strategies, identifying these communities is instrumental in fraud prevention.

Using the metric to detect fraud: Community detection clusters nodes in a graph using modularity or spectral clustering methods based on attribute or connection similarities. This paves the way for detailed analysis of these clusters to spot potential fraudulent actors or activities.

Centrality Analysis

Why the metric is crucial: Centrality analysis highlights influential nodes in a graph that could potentially signal fraudulent activities. If each node represents a criminal act, this analysis highlights the crime with the most involvement, offering a glimpse into its popularity.

Using the metric to detect fraud: Centrality analysis utilizes measures like PageRank or eigenvector centrality to pinpoint influential nodes within a graph. By harnessing these metrics, we can enhance our ability to identify potential perpetrators of fraudulent activity.
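As a minimal sketch of these two metrics, the Python example below uses the networkx library on a made-up six-node graph. The node names and edges are illustrative assumptions, not data from this whitepaper. It uses greedy modularity maximization as one possible modularity-based community detection method, and eigenvector centrality as one possible centrality measure.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph (hypothetical): two tightly knit groups joined by a single link.
G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),  # group A
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),  # group B
    ("a3", "b1"),                              # bridge between the groups
])

# Community detection by greedy modularity maximization.
communities = [set(c) for c in greedy_modularity_communities(G)]

# Eigenvector centrality: a node scores highly when its neighbors score highly.
centrality = nx.eigenvector_centrality(G, max_iter=1000)
most_central = sorted(centrality, key=centrality.get, reverse=True)[:2]

print(communities)
print(most_central)
```

In this toy graph the two groups come out as separate communities, and the two bridge nodes are the most central. In a fraud setting, such groups and bridge nodes would be candidates for closer review rather than proof of wrongdoing.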


PageRank

Why the metric is crucial: PageRank scores nodes in a network, identifying anomalies based on their prominence. Nodes with high scores often have numerous inbound links from dubious sources, indicating potential fraudulent involvement. A thorough investigation of these nodes could significantly reduce fraud network risks.

Using the metric to detect fraud: PageRank gauges the significance of each node in a graph, assigning a score based on the quantity and quality of its interconnected links. It thoroughly evaluates incoming and outgoing connections to create a comprehensive link structure analysis.

Clustering Coefficient

Why the metric is crucial: Graph-based clustering analysis groups similar or proximate nodes, which could signal fraudulent activity. As fraudsters typically operate in clusters or employ similar methods, detecting these groupings can prove beneficial in identifying fraud.

Using the metric to detect fraud: Clustering coefficient analysis groups graph nodes based on attribute or connection similarities using hierarchical or k-means clustering techniques. The resulting clusters are cross-checked against a maintained list of fraudulent transactions for potential matches.

Shortest Path

Why the metric is crucial: The shortest path analysis uncovers hidden node relationships within the network. Fraudsters often employ indirect connections to elude detection. The shortest path algorithm can expose these hidden links, assisting investigators in identifying suspicious transactions.

Using the metric to detect fraud: The shortest path algorithm traces the quickest route between two graph nodes, highlighting the minimum number of connecting edges. This tool proves valuable in fraud detection, unveiling suspicious transactions across multiple nodes, and potentially exposing indirect connections or intermediary involvement.

Classification

Why the metric is crucial: Classification aids in the real-time identification of potential fraudsters and their activities. Automated fraud detection allows swift identification and flagging of dubious transactions or customers, mitigating financial risks and preserving an institution's reputation.

Using the metric to detect fraud: Classification employs evidence from past cases to predict an entity's category, serving as a robust fraud prevention tool. The model harnesses graph-extracted features like node attributes, transaction specifics, and inter-node relationships. After training, it can classify new transactions or nodes as legitimate or suspect.
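To make PageRank, the clustering coefficient, and the shortest path more concrete, here is a small Python sketch on a hypothetical payment-flow graph built with networkx. The account names and edges are invented for illustration. `pagerank_sketch` is a simplified power-iteration version written for this example, not a library function. The clustering value is the local clustering coefficient as networkx computes it, that is, how densely a node's neighbors are connected to each other.

```python
import networkx as nx

def pagerank_sketch(G, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration on a directed graph."""
    nodes = list(G)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared among all nodes.
        dangling = sum(rank[v] for v in nodes if G.out_degree(v) == 0)
        new_rank = {}
        for v in nodes:
            # Each predecessor passes on an equal share of its own rank.
            incoming = sum(rank[u] / G.out_degree(u) for u in G.predecessors(v))
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

# Hypothetical money flow: three accounts pay into a hub, which pays one account.
G = nx.DiGraph()
G.add_edges_from([
    ("acct_1", "hub"), ("acct_2", "hub"), ("acct_3", "hub"),
    ("acct_1", "acct_2"),
    ("hub", "cashout"),
])

rank = pagerank_sketch(G)
top_node = max(rank, key=rank.get)

# Fewest-hop route between two accounts, following payment direction.
path = nx.shortest_path(G, "acct_3", "cashout")

# Local clustering coefficient on the undirected view of the graph.
clustering = nx.clustering(G.to_undirected())

print(top_node, path, clustering["hub"])
```

Here the account at the end of the flow receives the highest score, and the shortest path shows that the paying account reaches it through one intermediary.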


How to implement a graph-based algorithm

There are three key steps to implementing graph-based algorithms to detect fraudulent activities.

1. Data collection and pre-processing

• Gather data on the entities to be analyzed
• Construct a graph from the data, with nodes representing entities and edges representing relationships between them
• Extract graph features, such as node degree, clustering coefficient, and centrality measures

2. Graph analytics and machine learning

• Apply community detection algorithms to group nodes into clusters based on their connectivity patterns
• Extract graph features and use them as inputs for machine learning models
• Train the model on labeled data and evaluate its performance

3. Model evaluation

• Compare model performance with baseline and basic graph features models
• Evaluate the model's accuracy and variable importance to assess the impact of the graph features
• Monitor and update the model as needed to ensure ongoing effectiveness
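The graph construction and feature extraction steps can be sketched in a few lines of Python with networkx. The entity names and relationships below are hypothetical, and degree centrality stands in for whichever centrality measures a real project would choose.

```python
import networkx as nx

# Hypothetical relationship records: (entity, related entity).
relationships = [
    ("cust_1", "merchant_A"), ("cust_1", "merchant_B"),
    ("cust_2", "merchant_A"), ("cust_3", "merchant_B"),
    ("cust_3", "merchant_C"),
]

# Construct the graph: nodes are entities, edges are relationships.
G = nx.Graph()
G.add_edges_from(relationships)

# Extract per-node graph features.
degree = dict(G.degree())
clustering = nx.clustering(G)
centrality = nx.degree_centrality(G)

# One feature row per node, ready to be joined to a model's input table.
feature_rows = {
    node: {
        "degree": degree[node],
        "clustering": clustering[node],
        "degree_centrality": centrality[node],
    }
    for node in G
}

for node, features in feature_rows.items():
    print(node, features)
```

Each row can then be merged with the non-graph attributes of the same entity before model training.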


Case Study: Graph-based Techniques in Action

The Client

A financial institution or credit card company.

The Challenge

The client is grappling with detecting and preventing fraudulent transactions within their credit card platform. Their goals are twofold: to curtail financial losses and to shield their customers from unauthorized charges.

The Proposed Solution

The deployment of a real-time fraud detection system capable of accurately identifying fraudulent transactions. This proactive approach enables the financial institution to initiate timely countermeasures, mitigating risks.

Fractal's Role

To illuminate complex transactional relationships, which are instrumental in detecting potential fraudulent behavior.

Our approach follows a logical progression from extracting the relevant data to evaluating the results.

1. Data Extraction: Credit card dataset

2. Data pre-processing: Handling null values, handling categorical values, dropping unnecessary features

3. Graph Network: Tabular dataset to graph networks using the networkx library

4. Exploratory Data Analysis: Finding edge weight distribution, node degree distribution, centralities, etc.

5. Final Data Preparation: Handling categorical values using one-hot encoding, standardizing the features

6. Model Building: Model training using classifiers and train-test split using stratified k-fold

7. Evaluation: Evaluating the performance of the model
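The final data preparation, model building, and evaluation stages might look roughly like the following scikit-learn sketch. Everything concrete in it is an assumption made for illustration: the data is synthetic, the column names are hypothetical, and logistic regression stands in for the unspecified classifiers used in the case study.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "category": rng.choice(["grocery_pos", "shopping_net", "misc_pos"], size=n),
    "amount": rng.gamma(2.0, 50.0, size=n),
    "merchant_degree": rng.integers(1, 20, size=n),  # a graph feature
})
y = np.array([0, 1] * (n // 2))  # balanced placeholder labels

# One-hot encode the categorical column and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["category"]),
    ("scale", StandardScaler(), ["amount", "merchant_degree"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Stratified k-fold keeps the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(auc_scores.mean())
```

Because the labels here are unrelated to the features, the scores carry no meaning beyond showing that the pipeline runs end to end.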

To accurately reflect the connections between customers and their transactions, Fractal creates graphs that are structured as follows:

• Nodes: These represent the credit card number and the merchant.
• Edges: These denote transactions between the credit card number and the merchant.
• Edge Weight: This signifies the transaction's magnitude or amount.
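A graph with this structure could be built with networkx along the following lines. The transactions are made up, and summing the amounts of repeated purchases into a single edge weight is an assumption of this sketch. A multigraph with one edge per transaction would be another option.

```python
import networkx as nx

# Hypothetical transactions: (credit card number, merchant, amount).
transactions = [
    ("card_1", "merchant_A", 120.0),
    ("card_1", "merchant_B", 35.5),
    ("card_2", "merchant_A", 980.0),
    ("card_1", "merchant_A", 60.0),  # repeat purchase at the same merchant
]

G = nx.Graph()
for card, merchant, amount in transactions:
    if G.has_edge(card, merchant):
        # Assumption: repeated card-merchant pairs accumulate into one edge.
        G[card][merchant]["weight"] += amount
    else:
        G.add_edge(card, merchant, weight=amount)

# Weighted degree: total transaction amount attached to each node.
total_amount = dict(G.degree(weight="weight"))

print(G["card_1"]["merchant_A"]["weight"])
print(total_amount)
```

The weighted degree gives the total amount passing through each card or merchant, one of the simplest graph features to derive from this structure.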

When graph features are incorporated into the model, they emerge as the most influential factors. In our case study, model performance improved, with the Area Under the Curve (AUC) metric rising from 0.72 to 0.76, a relative increase of nearly 6%.
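For reference, the percentage quoted for the AUC is the change relative to the baseline value, which works out as follows (the subscripts are labels introduced here for the two models):

```latex
\frac{\mathrm{AUC}_{\text{graph}} - \mathrm{AUC}_{\text{base}}}{\mathrm{AUC}_{\text{base}}}
  = \frac{0.76 - 0.72}{0.72}
  = \frac{0.04}{0.72}
  \approx 0.056
```

That is about 5.6% in relative terms, or 4 percentage points in absolute terms.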


Using intrinsic features

              precision    recall  f1-score   support

         0.0       0.66      0.76      0.71      1307
         1.0       0.71      0.60      0.65      1258

    accuracy                           0.68      2565
   macro avg       0.69      0.68      0.68      2565
weighted avg       0.69      0.68      0.68      2565

Using graph features

              precision    recall  f1-score   support

         0.0       0.72      0.72      0.72      1307
         1.0       0.71      0.70      0.71      1258

    accuracy                           0.71      2565
   macro avg       0.71      0.71      0.71      2565
weighted avg       0.71      0.71      0.71      2565

[Figure: two bar charts, each titled "Top 20 Feature Importances"]

The key features of the graph-enriched model mirror those of the model that lacks graph features, with graph metrics added. The importance of each feature is also noticeably higher than that of its counterpart in the base model. This suggests that adding a small number of graph features can improve the model's accuracy and insight more than a wide range of features in a model without graph elements.


[Figure: Three-Level Network Graph for Fraud Transactions. Two panels, one for Customer 4378993458389626 and one for Customer 3575540972310993, each linking the customer's credit card number to fraudulent merchants and the merchants to categories such as shopping_net, shopping_pos, personal_care, misc_pos, misc_net, grocery_pos, and gas_transport. Legend: Customer Credit Card Number, Merchant Name, Category.]

The tri-level network graphs mapping fraudulent transactions (illustrated above) provide a striking visualization of relationships between customers, fraudulent merchants, and corresponding categories for two customers. This graphical representation is instrumental in pinpointing patterns, deciphering connections, and distinguishing clusters within fraudulent transactions. The inclusion of graph features also enables us to analyze the correlation between the target and the graphical attributes.

Integrating these graph features has enhanced precision and accuracy, outperforming the model that lacks graph features. The model's performance could be further amplified by leveraging the power of community analysis, suggesting promising avenues for future optimization.

The final hurdles

As with any emerging technology, several challenges must be addressed before graph-based fraud detection methods can be widely adopted. These include:

• High computation time: The process of computing graph features can be time-consuming, particularly when the data set is large.
• Data quality: Sparse data and missing information can introduce complexities in creating graphs or network features.
• Graph network visualization: Plotting a graph network can be extremely challenging when dealing with a dense network or a large data set.
• Domain expertise: A strong foundation in the subject matter is a crucial prerequisite for identifying network structure and determining relationships.


Conclusion

Harnessing the potential of graph techniques can highlight underlying data and its relationships, providing critical insights into seemingly unconnected events in a given use case.

In the banking industry, where fraud incurs high costs, financial services firms using graph database techniques have reported millions of dollars in savings due to the increased accuracy when using graph techniques. The strength of this network approach enables stakeholders to pinpoint and address critical areas in the network, broadening the possibilities for graph analytics and other computational applications. To build this capability, substantial investment in infrastructure is required, alongside the development of unique customer identifiers that can be used across various systems. Multiple tools are available today for creating graph databases and graph features, which can be subsequently integrated into machine learning models to increase prediction accuracy.

Authors

Supriya Panigrahi

Sray Agarwal

Ashna Taneja

Consultant, Fractal Dimension

Consultant, Fractal Dimension

Principal Consultant, Fractal Dimension


About Fractal

Fractal is one of the most prominent providers of Artificial Intelligence to Fortune 500® companies. Fractal's vision is to power every human decision in the enterprise, and bring AI, engineering, and design to help the world's most admired companies. Fractal's businesses include Crux Intelligence (AI driven business intelligence), Eugenie.ai (AI for sustainability), Asper.ai (AI for revenue growth management) and Senseforth.ai (conversational AI for sales and customer service). Fractal incubated Qure.ai, a leading player in healthcare AI for detecting Tuberculosis and Lung cancer. Fractal currently has 4000+ employees across 16 global locations, including the United States, UK, Ukraine, India, Singapore, and Australia. Fractal has been recognized as 'Great Workplace' and 'India's Best Workplaces for Women' in the top 100 (large) category by The Great Place to Work® Institute; featured as a leader in Customer Analytics Service Providers Wave™ 2021, Computer Vision Consultancies Wave™ 2020 & Specialized Insights Service Providers Wave™ 2020 by Forrester Research Inc., a leader in Analytics & AI Services Specialists Peak Matrix 2022 by Everest Group and recognized as an 'Honorable Vendor' in 2022 Magic Quadrant™ for data & analytics by Gartner Inc. For more information, visit fractal.ai

Corporate Headquarters Suite 76J, One World Trade Center, New York, NY 10007
