The Gen AI Frontier

Our Experimental Results

Synthetic data must accurately represent the real data to produce relevant insights. We found that real data samples as small as 10,000 records are enough to train our architecture to generate synthetic data sets of up to a million unique, accurate records. There is currently no universally accepted single metric for measuring accuracy, so we created a custom metric based on mutual information matrices calculated on the real and synthetic data. We then compute the mean absolute error (MAE) between these matrices to arrive at a single score from 0 to 1; the closer the score is to 0, the more closely the synthetic data resembles the real data. Along with this custom metric, we also use qualitative metrics such as PCA plots, t-SNE plots, correlations, and distribution comparisons to check that the univariate distribution of each feature in the real data matches its counterpart in the synthetic data. In both phase 1 and phase 2 iterations, our model outperformed traditional GANs.
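To make the custom metric concrete, the following is a minimal sketch of one way such a score could be computed. It is not the implementation used in these experiments. It assumes every feature is discrete (continuous features would need binning first), it normalizes each pairwise mutual information by the larger of the two column entropies so that every entry lies between 0 and 1, and it averages over distinct feature pairs only. The function names and the toy rows are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mi(x, y):
    """Mutual information of two discrete columns, scaled to [0, 1].

    Assumption: normalize by max(H(X), H(Y)); other normalizations exist.
    """
    hx, hy = entropy(x), entropy(y)
    mi = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    denom = max(hx, hy)
    if denom == 0:  # both columns are constant, so they share no information
        return 0.0
    return min(1.0, max(0.0, mi / denom))  # clamp away floating-point noise


def mi_matrix(table):
    """Pairwise normalized MI for every distinct pair of columns."""
    return {
        (a, b): normalized_mi(table[a], table[b])
        for a, b in combinations(list(table), 2)
    }


def mi_mae(real, synthetic):
    """Mean absolute error between the real and synthetic MI matrices."""
    real_mi, synth_mi = mi_matrix(real), mi_matrix(synthetic)
    return sum(abs(real_mi[k] - synth_mi[k]) for k in real_mi) / len(real_mi)


# Toy, made-up rows (not taken from the churn data set).
real = {
    "Attrition_Flag": ["Existing", "Existing", "Attrited",
                       "Existing", "Attrited", "Existing"],
    "Dependent_count": [1, 2, 3, 1, 3, 2],
    "Age_band": ["30s", "40s", "50s", "30s", "50s", "40s"],
}
identical = {name: list(col) for name, col in real.items()}
# Same marginal distribution for Dependent_count, but its links to the
# other columns are broken.
shuffled = dict(real, Dependent_count=[3, 1, 2, 2, 1, 3])

score_same = mi_mae(real, identical)
score_shuffled = mi_mae(real, shuffled)
print(score_same, score_shuffled)
```

A synthetic table that reproduces the pairwise dependencies of the real table scores 0, while one that matches each column's marginal distribution but scrambles the relationships between columns scores higher. This is why a dependency-based score is a useful complement to univariate distribution plots.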

Data set: Churn data set (bank attrition data with customer demographics and transaction amounts at the customer level).

Samples: 10,000 samples

Features: MAE values for 20 features across various experiments show improvement in performance from left to right:

Univariate comparison plots:

[Figure: Customer_Age Distribution Plot Comparison. Four panels, each overlaying the original data with one model's output: Original vs. VGAN, Original vs. WGAN, Original vs. CTGAN, and Original vs. Data Generator. x-axis: Customer_Age; y-axis: 0.0 to 1.0.]

[Figure: Months_on_book Distribution Plot Comparison. Four panels, each overlaying the original data with one model's output: Original vs. VGAN, Original vs. WGAN, Original vs. CTGAN, and Original vs. Data Generator. x-axis: Months_on_book; y-axis: 0.0 to 1.0.]

[Figure: Attrition_Flag Comparison across Models. Bar chart of record counts (0 to 10,000) for the categories Attrited Customer and Existing Customer. Series: Original, VGAN, WGAN, CTGAN, Data Generator.]

[Figure: Dependent_count Comparison across Models. Bar chart of record counts (0 to 10,000) by Dependent_count value. Series: Original, VGAN, WGAN, CTGAN, Data Generator.]


© 2023 Fractal Analytics Inc. All rights reserved
