AI-supported sales forecasting in retail: systematically reducing deviations from plan

Written by Michael Damatov | Apr 24, 2026 1:38:16 PM

Every retail chain is constantly faced with the same question: which items should be delivered to which store, and in what quantity? The problem is not new - it has been tackled in one form or another for decades, always in the knowledge that absolutely precise planning is impossible. The goal is therefore not perfect forecasting, but the systematic reduction of deviations from the plan.

A quick note in advance: when we talk about "AI" in this article, we are not talking about generative models such as chatbots or image generators, but about classic, predictive machine learning - specifically demand forecasting at store and item level.

The initial situation: why precise store planning is so difficult in retail

Sales volumes fluctuate depending on the season, location, weather, promotions, public holidays, events - and of course the product range itself. Manual planning quickly reaches its limits here because too many variables affect sales at the same time. Traditional planning rules ("last year plus 5%") do not capture this complexity. This is exactly where machine learning comes in.

The solution: supervised machine learning for sales forecasting

With modern methods, this becomes a typical problem that can be solved with supervised machine learning: a model is trained on historical data. We used the sales figures from the last three years - enough to capture seasonal fluctuations, while older data provided no significant additional benefit.

The sales volumes became our labels - they represent the answers to the question: "How many individual items were actually sold?"

The features: Which factors really explain sales

What factors lead to these answers? Our experiments showed which variables (known as features) had a positive influence on the quality of the forecast. An excerpt from the final feature set:

  • Item data such as price or packaging size
  • Location-based data such as zip code, city and regional affiliation
  • Demographic data such as population density and average age of the respective region
  • Time-related data such as time of day, day of the week, month, proximity to public holidays and proximity to major events (e.g. a soccer World Cup)
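To make the time-related features concrete, here is a minimal sketch of how calendar features like these can be derived from a sale date. The holiday list and function names are illustrative stand-ins, not the project's actual code:

```python
# Illustrative sketch: deriving calendar features from a sale date.
# The holiday list is a placeholder, not the project's real data.
from datetime import date

HOLIDAYS = [date(2025, 12, 25), date(2026, 1, 1)]

def time_features(d: date) -> dict:
    """Calendar features a forecasting model can learn from."""
    upcoming = [(h - d).days for h in HOLIDAYS if h >= d]
    return {
        "weekday": d.weekday(),                  # 0 = Monday
        "month": d.month,
        "days_to_holiday": min(upcoming) if upcoming else None,
    }

print(time_features(date(2025, 12, 20)))
# {'weekday': 5, 'month': 12, 'days_to_holiday': 5}
```

In practice, features like these are computed in bulk over millions of rows, but the principle is the same: turn a raw timestamp into signals the model can correlate with demand.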

We deliberately excluded some factors. One example: although the weather had a recognizably positive influence on sales volumes, we did not include it - given the state of the art at the time, weather forecasts could only be projected into the future very roughly.

Training and validation: how the model learns - and how we measure it

The next step is to train the model. The data is first divided into five groups. Four groups (approx. 80% of the data) are used for training; the fifth group is used to check how much the predicted labels deviate from the actual labels. This process is repeated until each of the five groups has served once as validation data (5-fold cross-validation). At the end, we obtain metrics that show how accurate the AI model actually is.
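The splitting scheme described above can be sketched in a few lines. This is a minimal, generic k-fold index generator, not the project's actual validation code:

```python
# Minimal sketch of the 5-fold split described above: each of the five
# groups serves once as the validation set while the other four train.
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        val_idx = indices[start:stop]                 # held-out group
        train_idx = indices[:start] + indices[stop:]  # remaining four groups
        yield train_idx, val_idx

# Example: 100 historical sales records -> five folds of 80/20.
folds = list(k_fold_indices(100))
print(len(folds))  # 5
```

Libraries such as scikit-learn provide this (including shuffled and time-aware variants) out of the box; for time-series data like sales history, a chronological split is often preferable to a random one.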

The data pipeline: From raw data to store prediction

From a technical perspective, we set up a fully automated end-to-end data pipeline:

  • Data collection - from multiple sources, complying with all security criteria, into an analytics platform on Microsoft Azure.
  • Feature generation - the raw data is prepared for training.
  • Label creation - also as part of the training preparation.
  • Inference - the predicted sales volumes are calculated using the AI model.
  • Delivery - the results are written to internal databases, where downstream systems pick them up and use them further.
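The stages above can be sketched as a simple chain of functions. All stage implementations and the toy model below are placeholders to illustrate the flow, not the production code:

```python
# Schematic of the end-to-end pipeline stages; every function body here
# is a placeholder standing in for a real ingestion/feature/inference step.
def collect(sources):
    """Ingest raw rows from multiple source systems."""
    return [row for src in sources for row in src]

def build_features(rows):
    """Prepare raw rows for training or inference."""
    return [{"price": r["price"], "qty": r["qty"]} for r in rows]

def infer(features, model):
    """Apply the trained model to each feature row."""
    return [model(f) for f in features]

def run_pipeline(sources, model):
    rows = collect(sources)
    feats = build_features(rows)
    return infer(feats, model)  # in production: written to a database

# Toy model: predict demand as 110% of the last observed quantity.
preds = run_pipeline(
    [[{"price": 1.0, "qty": 10}], [{"price": 2.0, "qty": 4}]],
    model=lambda f: round(f["qty"] * 1.1),
)
print(preds)  # [11, 4]
```

The value of structuring the pipeline this way is that each stage can be tested, monitored and re-run independently - which is exactly what the MLOps practices below rely on.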

Automated, but not unmonitored: MLOps in practice

Although some processes can be automated, they should not run unsupervised. Today, this falls under the term MLOps and includes model monitoring, drift detection and controlled retraining:

  • Training - every model version is unique. We use targeted metrics to ensure that the model does not deteriorate with each retraining. If it does, the cause must be documented and explained.
  • Hyperparameter tuning - previously laborious and manual, now largely automated using tools.
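The regression check during retraining can be thought of as a simple gate: a new model version is only promoted if its validation error has not measurably worsened. The metric (MAE), threshold and function name below are illustrative assumptions:

```python
# Sketch of a retraining regression gate: promote the new model only if
# its validation error does not worsen beyond a tolerance.
# Metric choice (MAE) and tolerance value are illustrative.
def promote_new_model(old_mae: float, new_mae: float,
                      tolerance: float = 0.02) -> bool:
    """Allow deployment only if the retrained model is not measurably worse."""
    return new_mae <= old_mae * (1 + tolerance)

print(promote_new_model(old_mae=4.0, new_mae=3.8))  # True: improved
print(promote_new_model(old_mae=4.0, new_mae=4.5))  # False: regressed
```

In a real MLOps setup, a gate like this runs automatically after every retraining, and a blocked promotion triggers the documentation step described above.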

Experimenting instead of guessing: What really makes the model better

Experiments are a central component of our approach. We use them to systematically test hypotheses such as:

  • whether an additional feature makes a measurable contribution
  • whether an existing feature is still "good enough"
  • whether training configurations can be improved
  • and many more

Not every experiment leads to an improvement in the model. On the contrary: we have often found that a supposedly "good" idea had no positive effect in the test. This is precisely the value of disciplined experimentation - instead of gut feelings, there are measurable results.
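A typical feature experiment boils down to scoring the same validation data with and without the candidate feature and comparing the error. The numbers below are invented purely to illustrate the comparison:

```python
# Sketch of evaluating a feature experiment: score the same validation
# set with and without the candidate feature. All numbers are invented.
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [12, 8, 15, 10]                # observed sales volumes
baseline_preds = [10, 9, 13, 12]        # model without the candidate feature
candidate_preds = [11, 8, 14, 11]       # model with the candidate feature

baseline_mae = mean_absolute_error(actual, baseline_preds)    # 1.75
candidate_mae = mean_absolute_error(actual, candidate_preds)  # 0.75
keep_feature = candidate_mae < baseline_mae
print(keep_feature)  # True
```

The discipline lies in the decision rule, not the arithmetic: a feature stays only if the measured improvement justifies it - regardless of how plausible the idea sounded beforehand.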

Challenges: Data quality, integration and domain knowledge

As is so often the case in IT, the existing customer data was not collected for the purpose for which we later used it. In our case, this meant that the sales data still had to be adapted and transformed. The data quality was not perfect - in some cases data could be corrected, in others we had to deliberately omit individual data points.

Added to this was the integration of external data sources: demographic data, for example, does not come from the customer's databases and has to be merged with the internal data.
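Conceptually, this merge is a left join of external attributes onto internal sales rows, keyed for example by zip code. The field names and values below are hypothetical:

```python
# Sketch of enriching internal sales rows with external demographic data,
# keyed by zip code. All field names and values are hypothetical.
demographics = {
    "45127": {"pop_density": 2900, "avg_age": 43.1},
    "44135": {"pop_density": 2100, "avg_age": 44.7},
}

sales = [
    {"store_zip": "45127", "qty": 120},
    {"store_zip": "99999", "qty": 35},   # no demographic match
]

def enrich(rows, demo):
    """Left-join demographic attributes onto each sales row."""
    return [{**r, **demo.get(r["store_zip"], {})} for r in rows]

enriched = enrich(sales, demographics)
print(enriched[0]["avg_age"])  # 43.1
```

Rows without a match (like the second one) keep only their internal fields - exactly the kind of incompleteness that, as described above, sometimes forces data points to be corrected or deliberately omitted.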

The client's domain expertise was invaluable. Customer-specific aspects were discussed on a regular basis, including:

  • Interpretation of data - not all data is created equal; sources differ in importance, reliability, completeness and accuracy.
  • Business processes - who collected the data, when, for what original purpose?
  • Thresholds - sensible limits and tolerances from the customer's perspective.
  • Assumptions - critically questioning our own hypotheses.

Conclusion: Smaller deviations from plan, greater impact

The AI model is not perfect - but it has significantly reduced deviations from the plan compared to previous planning. Just as importantly, it has created a basis on which further experiments can be built. At prodot, we also consider AI governance from the outset. This regulates traceability, documentation and responsibilities in such projects.

Experience has shown that three factors are crucial for retail chains considering their own AI-supported sales forecasting: clean data, close cooperation with the specialist departments and a willingness to experiment continuously. The rest is craftsmanship.

Would you like to know what an AI-supported sales forecast could look like for your retail chain? Get in touch with us.