The main concern of data quality is whether the data is ‘right’ or ‘wrong.’ A few common sources of data quality issues include:
- Incorrect data entry
- Quality control that failed to remove data quality issues
- Duplicate data records
- Data that is not used or interpreted correctly
- Available data about an object that was not fully integrated
- Data that is too old to be useful in the current context

It is critical to have quality assurance for features, both for ethical reasons and to anticipate the probability of drift.
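Several of the issues listed above (duplicate records, incorrect entries, data that is too old) lend themselves to automated checks. The following is a minimal sketch of such checks, not a description of any particular tool. The record layout, the field names (`id`, `age`, `updated`), the valid age range, and the one-year staleness limit are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical records; the field names are illustrative, not from the source.
records = [
    {"id": 1, "age": 34, "updated": date(2023, 5, 1)},
    {"id": 2, "age": -5, "updated": date(2023, 5, 3)},   # incorrect data entry
    {"id": 1, "age": 34, "updated": date(2023, 5, 1)},   # duplicate record
    {"id": 3, "age": 51, "updated": date(2019, 1, 15)},  # too old to be useful
]


def quality_report(rows, today, max_age_days=365):
    """Return the row indices that fail each basic quality check."""
    seen = set()
    report = {"duplicate": [], "invalid": [], "stale": []}
    for i, row in enumerate(rows):
        # A row is a duplicate if an identical row appeared earlier.
        key = (row["id"], row["age"], row["updated"])
        if key in seen:
            report["duplicate"].append(i)
        seen.add(key)
        # Assumed validity rule: age must fall in a plausible range.
        if not 0 <= row["age"] <= 120:
            report["invalid"].append(i)
        # Assumed freshness rule: rows older than max_age_days are stale.
        if today - row["updated"] > timedelta(days=max_age_days):
            report["stale"].append(i)
    return report


report = quality_report(records, today=date(2023, 6, 1))
print(report)
```

In practice the rules would come from the business requirement rather than from hard-coded limits; the point is only that each source of error can be turned into an explicit, repeatable check.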
How automation can help
The automation journey has already started. However, adoption is where businesses are facing challenges, on both the technical and operational sides. Cloud providers are moving towards automation, but 100% automation may not be possible, because the more flexibility we try to bring in through automation, the more complex the engineering becomes. It is important to understand how much flexibility is desirable relative to the system’s complexity, so understanding the requirement well is the first step: it helps us estimate how far to take automation. There must be a balance between flexibility and complexity. Even if businesses achieve only 70 to 80% automation, that would reduce repetitive work and help ensure data quality and data monitoring. To drive results with data drift automation, we need these five pillars of AI engineering, or machine learning operations, in place.

Pillars of AI and engineering
- Solid data curation layer (data lake, data warehouse)
- Feature store and feature QA
- Model training and management
- CI/CD
- Monitoring

If we overcomplicate the ecosystem, it becomes difficult to manage. The art is to keep things simple and still deliver value.

Challenges faced
The industry has numerous tools, and a business has different teams. These teams use different tools to handle each component, which makes it difficult to keep the data tightly integrated. Moreover, a hundred tools can do the same work, and there is no clear benchmarking yet. Organizations tend to choose tools without understanding the actual requirement. This makes things complicated.
How we can help
At Fractal, a lot of importance is given to monitoring and observability. This is not limited to the model; it covers the entire machine learning lifecycle. Currently, we are implementing a monitoring ecosystem in two projects, and we have created a complete end-to-end (E2E) ecosystem for the ML lifecycle and monitoring as part of the internal initiative of the CoE.
We used an entirely open-source tech stack, running entirely on Kubernetes. Our goal is to move towards CNCF and use state-of-the-art (SOTA) tools with good community support.
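To make the idea of monitoring for data drift more concrete, here is a minimal sketch of one widely used drift metric, the Population Stability Index (PSI). The source does not say which metrics the monitoring ecosystem uses, so the choice of PSI, the function name `psi`, the equal-width binning, and the 0.25 alert threshold (a commonly quoted rule of thumb) are all assumptions for illustration.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: a baseline and a shifted "production" sample.
baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]

DRIFT_THRESHOLD = 0.25  # assumed alert level, not taken from the source

no_drift_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)
print(no_drift_score, drift_score, drift_score > DRIFT_THRESHOLD)
```

A check like this could run on a schedule for each monitored feature, with scores above the chosen threshold raising an alert for review.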
© 2023 Fractal Analytics Inc. All rights reserved