Quantum Leap: Harnessing the power of AI at scale

Data drift: Identifying. Preventing. Automating.

Author: Snehotosh Banerjee | Principal Architect | AI @scale

Data drift is an unexpected change in the pattern, structure or semantics of data, such that the data fed into a model differs from the data the model was originally built on. The past 24 months of disruption have given a new dimension to the data drift challenge. Businesses have grappled with changing consumer behaviors, which lead to changing data patterns that can disrupt entire processes. The question is how to architect for change, manage data drift and even harness it to accelerate digital transformation for your business.

Observability lets teams interpret and explain unexpected behavior and manage data proactively. Although drift cannot always be prevented, it can be managed to a large extent.

At Fractal, we capture all types of drift, but covariate shift (a change in the distribution of the input features) is the most prevalent and the most widely tracked. This is mainly done at the feature store level, where drift is anticipated by comparing the distributions of representative data sets. The right observability strategy translates to higher reliability, improved consumer experience and scaled productivity.
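As an illustration of comparing feature distributions between two representative data sets, the sketch below computes a Population Stability Index (PSI) per feature. PSI is one common choice for this kind of comparison, not necessarily the measure used at Fractal. The function names, the dictionary layout of the features and the bin count are assumptions made for this example.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range fall into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def feature_drift(baseline_features, live_features, bins=10):
    """PSI for every feature; both arguments map feature name -> list of values."""
    return {
        name: psi(values, live_features[name], bins)
        for name, values in baseline_features.items()
    }


# Hypothetical feature snapshots: a baseline and a later, shifted sample.
baseline = {"basket_value": [float(v) for v in range(100)]}
live = {"basket_value": [float(v) + 50.0 for v in range(100)]}
print(feature_drift(baseline, live))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold should be tuned for each feature and use case.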

Detecting data drift

There are multiple ways to detect data drift. One approach is to use statistical tests that compare the distribution of the baseline data with that of the live or production data. If the two distributions differ significantly, drift has occurred. Data drift can arise from three broad sources: a gap in observability, data that is no longer fresh or relevant, and poor data quality.
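One statistical test that fits this description is the two-sample Kolmogorov-Smirnov test, sketched below using only the standard library. It is shown as an example of the approach, not as the specific test the author uses. The function names and the 5% significance level are assumptions, and the decision rule relies on the large-sample approximation of the critical value.

```python
import bisect
import math
import random


def ks_statistic(baseline, live):
    """Largest absolute gap between the two empirical distribution functions."""
    a, b = sorted(baseline), sorted(live)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_detected(baseline, live, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(baseline), len(live)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, live) > critical


# Synthetic data: the live sample has its mean shifted by one standard deviation.
random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(500)]
live = [random.gauss(1.0, 1.0) for _ in range(500)]
print(drift_detected(baseline, live))
```

The test is run per numeric feature. With many features, some false alarms are expected at any fixed significance level, so alerts are usually reviewed alongside model quality metrics.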

Freshness Check


With consumer behavior changing dynamically and rapidly, model performance degrades over time. Businesses must regularly check data freshness and data volume and monitor changes in the data schema, because any schema change can lead to data drift. Only a few kinds of models, such as some computer vision or language models, can last for a long time without an update. The ultimate measure is a model quality metric, which can be accuracy, mean error rate or even a downstream business KPI.
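The freshness, volume and schema checks described above can be sketched as a single function over one batch of records. The record layout, the `event_time` field, the thresholds and the function name are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_batch(rows, expected_schema, now, max_age=timedelta(hours=24), min_rows=3):
    """Return a list of human-readable issues found in one batch of records."""
    issues = []

    # Volume: an unusually small batch often signals an upstream failure.
    if len(rows) < min_rows:
        issues.append(f"volume: {len(rows)} rows, expected at least {min_rows}")

    # Freshness: the newest record must be recent enough.
    timestamps = [r["event_time"] for r in rows if "event_time" in r]
    if not timestamps or now - max(timestamps) > max_age:
        issues.append("freshness: newest record is missing or older than allowed")

    # Schema: compare field names and Python types with the expected schema.
    for row in rows:
        observed = {name: type(value).__name__ for name, value in row.items()}
        if observed != expected_schema:
            issues.append(f"schema: expected {expected_schema}, got {observed}")
            break
    return issues


# Hypothetical batch in which the amount field has silently become a string.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
schema = {"event_time": "datetime", "amount": "float"}
batch = [{"event_time": now - timedelta(hours=1), "amount": "19.99"} for _ in range(3)]
print(check_batch(batch, schema, now))
```

Running such a check before each scoring or retraining job surfaces schema and freshness problems before they appear as a drop in the model quality metric.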

Figure 1: Data drift detection (observability, freshness, data quality)

Observability

Data can be static or dynamic, but whatever its nature, it is subject to variation. What differs is the intensity of that variation. Businesses should adopt data observability early to understand and spot these changes.

Data Quality

At times, the data fed into the serving model may be skewed, or its distribution may differ from that of the training data. Wrong data is therefore a data quality issue: data that is incomplete, incorrect or full of duplicates can lead to data drift.
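The three quality problems named above (incomplete, incorrect and duplicated data) can be counted with a short report function. This is a minimal sketch: the field names, the valid ranges and the function name are assumptions, and "incorrect" is approximated here by a simple range check.

```python
def quality_report(rows, required_fields, valid_ranges):
    """Count incomplete, out-of-range and duplicated records in a batch."""
    report = {"incomplete": 0, "out_of_range": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        # Incomplete: a required field is absent or empty.
        if any(row.get(field) is None for field in required_fields):
            report["incomplete"] += 1

        # Incorrect: a numeric value falls outside its plausible range.
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                report["out_of_range"] += 1
                break

        # Duplicate: an identical record was already seen in this batch.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            report["duplicates"] += 1
        seen.add(fingerprint)
    return report


# Hypothetical customer records with one duplicate, one gap and one bad value.
records = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 210},
]
print(quality_report(records, required_fields=["id", "age"], valid_ranges={"age": (0, 120)}))
```

Tracking these counts over time, and not only per batch, helps separate a one-off bad load from a gradual decline in data quality.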
