Synthetic data generation approaches are evolving rapidly, incorporating recent ideas such as few-shot learning, simulations, generative adversarial networks, variational autoencoders and transformers. However, the availability of synthetic data is uneven across industries and data types. Generating different kinds of data (images, text, video, audio, tabular data) requires separate tools. Those tools have only recently appeared on the market and are still evolving. Work is underway to add synthetic data support to some ML platforms. By 2024, synthetic data generation capability will be expected in most ML and analytics platforms.
Mass: Medium
The mass is medium because synthetic data is well suited to the next generation of AI innovation, which is data-driven AI. Innovation in algorithms is widespread, but data is often patchy, incomplete or biased, and synthetic data can be used to balance datasets. The next leap in AI efficacy cannot be achieved without injecting domain knowledge into training datasets using synthetic data techniques. Tabular synthetic data is designed to address key shortcomings in real datasets, which are typically incomplete, imbalanced, privacy restricted and not fully representative of the business domain. Tabular synthetic data also offers an ideal way to provide the information needed for data monetization and sharing within legal and ethical parameters. A range of insurance use cases exists, including eliminating bias from existing trained models, identifying new fraud patterns and retraining existing models to improve performance. For insurers, synthetic data has the potential to replace outdated approaches such as data masking or randomization. There is considerable benefit in using it to train AI models for personal P&C lines or life insurance, where privacy concerns about PII may prohibit the use of personal data in areas such as pricing, which limits model training and accuracy both for insurers and for vendors targeting the insurance vertical. Because PII is central to insurance, production data is frequently subject to legislation, so the ability to synthesize data can greatly extend capabilities for software testing.
Recommended Actions:
■ Reduce sales cycles for tabular synthetic data services and software by building marketing materials that describe real-world use-case examples and customer references.
■ Demonstrate how your system overcomes privacy restrictions such as HIPAA and CCPA, which impact insurers as they seek to drive pricing and rating differentiation.
Gartner, Inc. | G00786204