Our Experimental Results

Synthesized data must accurately represent the actual data to produce relevant insights. We found that as few as 10,000 real samples are enough to train our architecture to generate synthetic data sets of up to a million unique, accurate records.

There is currently no universally accepted single metric for measuring accuracy, so we created a custom metric based on mutual information matrices calculated on the real and synthetic data. We then compute the mean absolute error (MAE) between these matrices to arrive at a single score between 0 and 1; a score closer to 0 indicates that the synthetic data closely resembles the real data. Alongside this custom metric, we also use qualitative checks such as PCA plots, t-SNE plots, correlations, and comparisons of the univariate feature distributions in the real data with their counterparts in the synthetic data.

In both phase 1 and phase 2 iterations, our model outperformed traditional GANs.
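To make the custom metric concrete, the following is a minimal sketch of a mutual-information-matrix MAE score, not the exact implementation behind the reported results. It assumes numeric features, equal-width binning with edges taken from the real data, and mutual information normalized by the geometric mean of the two entropies so that every matrix entry lies between 0 and 1. The function names (`mi_matrix`, `mi_mae_score`) and the toy data are hypothetical.

```python
import numpy as np

def entropy(codes):
    """Shannon entropy (in nats) of rows of integer codes."""
    _, counts = np.unique(codes, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def normalized_mi(x, y):
    """Mutual information of two code vectors, scaled into [0, 1].

    Assumption: normalization by sqrt(H(x) * H(y)); the source does not
    say which normalization, if any, was used.
    """
    h_x, h_y = entropy(x[:, None]), entropy(y[:, None])
    if h_x == 0 or h_y == 0:
        return 0.0
    h_xy = entropy(np.column_stack([x, y]))
    return float(np.clip((h_x + h_y - h_xy) / np.sqrt(h_x * h_y), 0.0, 1.0))

def mi_matrix(data, edges):
    """Pairwise normalized-MI matrix for the columns of `data`."""
    n = data.shape[1]
    codes = np.column_stack(
        [np.digitize(data[:, j], edges[j]) for j in range(n)]
    )
    m = np.eye(n)  # diagonal fixed at 1 for both data sets
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = normalized_mi(codes[:, i], codes[:, j])
    return m

def mi_mae_score(real, synthetic, bins=10):
    """MAE between the MI matrices of real and synthetic data (0 = identical)."""
    # Interior bin edges come from the real data and are reused for both sets.
    edges = [
        np.histogram_bin_edges(real[:, j], bins=bins)[1:-1]
        for j in range(real.shape[1])
    ]
    diff = mi_matrix(real, edges) - mi_matrix(synthetic, edges)
    return float(np.mean(np.abs(diff)))

# Toy data (hypothetical): two dependent features and one independent one.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
real = np.column_stack(
    [x, x + 0.3 * rng.normal(size=2000), rng.normal(size=2000)]
)
close = real + 0.05 * rng.normal(size=real.shape)  # keeps the dependency
unrelated = rng.normal(size=real.shape)            # loses the dependency

score_close = mi_mae_score(real, close)
score_unrelated = mi_mae_score(real, unrelated)
print(score_close, score_unrelated)
```

In this sketch, a synthetic set that preserves the dependency structure between features scores closer to 0 than one whose columns are independent. Categorical features would need to be integer-encoded rather than binned.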
Data set: Churn data set (bank attrition data with customer demographics and transaction amounts at the customer level).
Samples: 10,000 samples
Features: 20
MAE values for the 20 features across the various experiments show performance improving from left to right:
Univariate comparison plots:
[Figure: Customer_Age Distribution Plot Comparison. Four panels, each overlaying the original data with one model's output: VGAN, WGAN, CTGAN, and Data Generator. x-axis: Customer_Age (ticks 20 to 60); y-axis: 0.0 to 1.0.]
[Figure: Months_on_book Distribution Plot Comparison. Four panels, each overlaying the original data with one model's output: VGAN, WGAN, CTGAN, and Data Generator. x-axis: Months_on_book (ticks 10 to 60); y-axis: 0.0 to 1.0.]
[Figure: Attrition_Flag Comparison across Models. Series: Original, VGAN, WGAN, CTGAN, Data Generator. x-axis: Attrited Customer, Existing Customer; y-axis: 0 to 10000.]
[Figure: Dependent_count Comparison across Models. Series: Original, VGAN, WGAN, CTGAN, Data Generator. x-axis: Dependent_count; y-axis: 0 to 10000.]
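Univariate comparisons of this kind can be approximated numerically as well as visually. The sketch below is illustrative only: it bins a real and a synthetic column on shared edges and reports the overlap of the two relative-frequency histograms, where 1.0 means the binned distributions are identical. The overlap measure, the function names, and the toy age data are assumptions and are not taken from the experiments above.

```python
import numpy as np

def histogram_pair(real_col, synth_col, bins=20):
    """Histogram both columns on shared bin edges as relative frequencies."""
    edges = np.histogram_bin_edges(
        np.concatenate([real_col, synth_col]), bins=bins
    )
    real_hist, _ = np.histogram(real_col, bins=edges)
    synth_hist, _ = np.histogram(synth_col, bins=edges)
    return edges, real_hist / real_hist.sum(), synth_hist / synth_hist.sum()

def histogram_overlap(real_col, synth_col, bins=20):
    """Sum of bin-wise minima of the two histograms (1.0 = identical)."""
    _, p, q = histogram_pair(real_col, synth_col, bins)
    return float(np.minimum(p, q).sum())

# Toy columns (hypothetical), loosely shaped like an age feature.
rng = np.random.default_rng(1)
real_age = rng.normal(46, 8, size=5000)
synth_close = rng.normal(46, 8, size=5000)    # same distribution
synth_shifted = rng.normal(60, 8, size=5000)  # shifted distribution

overlap_close = histogram_overlap(real_age, synth_close)
overlap_shifted = histogram_overlap(real_age, synth_shifted)
print(overlap_close, overlap_shifted)
```

The arrays returned by `histogram_pair` can be passed to any plotting library to draw overlaid histograms for one feature at a time.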
© 2023 Fractal Analytics Inc. All rights reserved