Data-Centric AI: A Must For Today’s Marketing Leaders

As discussion of artificial intelligence intensifies, the machine-learning community is focusing on a topic that is highly relevant to marketing leaders: the shift toward “data-centric AI.”


Data scientists, programmers, and engineers used to be the only ones driving AI advancements. Now, marketing leaders must actively engage in data-centric AI. 

In the digital age, marketers can use data to make decisions, personalize customer experiences, and improve customer engagement. Businesses can also enhance predictive analytics, automate repetitive tasks, and enjoy other advantages.

Andrew Ng: Data-Centric AI 

Andrew Ng is known for co-founding Google Brain and Coursera. At a recent MIT conference, this AI pioneer discussed the importance of “data-centric AI.” He defined it as “the discipline of systematically engineering the data needed to build a successful AI system.”

Ng highlighted a pressing challenge. Many companies in different industries, like biotech and manufacturing, can’t fully use AI because of problems with their data. These datasets lack proper labeling, organization, and consistency—essential prerequisites for effective machine learning (ML) systems. 

Isn’t All AI Data-Centric? 

The simple answer to this common question is “no.” Many AI projects focus on creating the best model for a given dataset, which is a “model-centric” approach. In contrast, data-centric AI prioritizes the creation of the highest-quality dataset to train an ML model effectively.

Historically, AI has been mostly model-centric, emphasizing aspects like model architecture and hyperparameter tuning. Practitioners and researchers often considered the data fueling these models as static ground truth, beyond their control. 

Preparing Data For AI Success 

Ng and other experts suggest changing our mindset and focusing on the enhancement and preparation of datasets within companies. This will enable us to maximize their potential in AI. To make this shift, organizations need their data scientists to work closely with subject-matter experts.

Data Labeling, Augmentation And Distribution Drift 

In a data-centric AI approach, the focus shifts to data quality, data augmentation, and data deployment. This approach may require multiple attempts to find and correct labeling errors, fill in missing information, and select the best examples for training the model. 

Data Labeling 

Data labeling is the process of adding tags or labels to data so that it is understandable and useful for machine learning. Accurate labeling is crucial for model performance. Inconsistent or unclear labeling can confuse AI systems, resulting in higher costs and less effective AI deployment.
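One simple way to surface inconsistent labels is to look for identical inputs that were tagged differently. The Python sketch below is only an illustration under assumed conditions: the records, the label names, and the helper find_label_conflicts are hypothetical, and real projects usually need more than an exact-match check.

```python
from collections import defaultdict


def find_label_conflicts(records):
    """Return {normalized_text: sorted labels} for texts that carry more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in records:
        # Normalize case and whitespace so trivially different copies are compared.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    return {text: sorted(labels) for text, labels in labels_by_text.items() if len(labels) > 1}


# Hypothetical customer messages and labels, for illustration only.
records = [
    ("Please cancel my subscription", "churn_risk"),
    ("please cancel my  subscription", "billing"),
    ("Love the new dashboard", "positive"),
]

conflicts = find_label_conflicts(records)
print(conflicts)
```

Each conflict found this way is a candidate for review by someone who knows the business context, since either label could be the intended one.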

Data Augmentation 

Data augmentation is a technique used in machine learning and data science. It generates additional training data by making slight changes to existing data, which helps improve the performance and robustness of machine learning models.

However, augmentation can also introduce errors, which makes it costly and less valuable. So, it’s vital to use data augmentation strategically to improve data quality and make AI models more accurate.
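As a rough illustration of "slight changes to the existing data," the Python sketch below creates variants of a labeled customer message by swapping in synonyms. The synonym table, the label, and the augment function are all hypothetical, and synonym replacement is only one of many augmentation techniques.

```python
# Hypothetical synonym table; a real one would come from domain experts.
SYNONYMS = {
    "cancel": ["end", "stop"],
    "subscription": ["plan"],
}


def augment(text, label, synonyms=SYNONYMS):
    """Return (variant_text, label) pairs, each with exactly one word replaced."""
    words = text.split()
    variants = []
    for i, word in enumerate(words):
        for alternative in synonyms.get(word.lower(), []):
            new_words = words[:i] + [alternative] + words[i + 1:]
            # The variant keeps the original label, which is the risky assumption.
            variants.append((" ".join(new_words), label))
    return variants


for variant in augment("Please cancel my subscription", "churn_risk"):
    print(variant)
```

Because every variant inherits the original label, a swap that changes the meaning of the message quietly adds a labeling error. Reviewing a sample of generated examples before training is one way to keep that risk in check.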

Distribution Drift 

Another challenge for marketing leaders using data-centric AI is understanding distribution drift. Drift occurs when the data used to train a machine learning model no longer matches the patterns in new data, typically because of changes in consumer behavior or external factors. As historical datasets become outdated, models built on them yield inaccurate results.

For example, to predict customer churn, a marketing leader must consider biases and limitations in past churn data. 

  • Does the time period truly represent a typical business cycle, or were there one-time events skewing the data? 
  • How is “churn” defined and measured?  
  • Does churn depend on changes in product usage, cancellations, or declines in repeat purchases?

Understanding these factors is the initial step toward determining the data to include in a model. To address distribution drift, monitor how well the ML model performs on recent data and retrain or adjust it as conditions change. This avoids expensive mistakes and maintains trust in AI systems.
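One common way to check whether recent data still resembles the training data is the Population Stability Index (PSI), which compares the share of values falling into each bin. The Python sketch below is a minimal version under stated assumptions: the purchase counts and bin edges are made up, and the 0.25 alert threshold is a widely used rule of thumb rather than a fixed standard.

```python
import math


def psi(expected, actual, edges):
    """Population Stability Index of `actual` against `expected` over bins split at `edges`."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            # Bin index = number of edges the value meets or exceeds.
            counts[sum(value >= edge for edge in edges)] += 1
        # Floor each share at a tiny value to avoid log(0) for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    expected_shares = shares(expected)
    actual_shares = shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )


# Hypothetical monthly purchases per customer: training period vs. recent period.
training_purchases = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
recent_purchases = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
edges = [2, 4]  # assumed bins: under 2, 2 to 3, 4 or more

score = psi(training_purchases, recent_purchases, edges)
if score > 0.25:  # rule-of-thumb threshold, an assumption
    print("Significant drift: review the training data and consider retraining.")
```

A score near zero suggests the two periods look alike, while a large score signals that the historical data may no longer represent current customer behavior.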

Placing High-Quality Data at the Core of Marketing Leadership 

Using a data-centric approach to AI is the best strategy for overcoming these challenges. It also promotes cross-team collaboration within organizations. 

Marketing leaders can collaborate with data engineering, data science, and machine learning experts. This helps ensure that models are trained on appropriate data. Furthermore, collaboration encourages the development of comprehensive data governance strategies, which safeguard the quality of the data that is collected and used.

Marketing leaders can use AI to gain competitive advantages by effectively managing data. Data-centric AI improves how businesses utilize their data. It enables the creation of AI and data-driven solutions that are more accurate, reliable, and cost-effective. 

Marketing leaders can make AI solutions a reality by focusing on data quality and using data-centric AI methods. They can also take advantage of the many opportunities in the rapidly changing AI landscape. 


About The Author 

Lilith Bat-Leah, Vice President of Data Services at Mod Op, is responsible for strategic consulting on use cases for data analytics, data science, and machine learning. She has more than 11 years of experience managing, delivering, and consulting on the identification, preservation, collection, processing, review, analysis, and production of digital data. Lilith also has experience in research and development of machine learning software. She specializes in the application of statistics, analytics, machine learning, and data science to natural language and other unstructured data. She is passionate about making impossible things possible and is driven by curiosity.