AI-Driven Product Matching for Heureka Group

Handling tens of millions of offers daily

The solution matches offers in real time

Precision exceeding 98%

The system achieves a high success rate in accurately matching offers

Matching over 30% of offers

The solution automates the matching of offers that would otherwise require manual processing

"Thanks to the DataSentics AI strike team we were able to automate the matching of millions of offers coming from tens of thousands of e-shops and therefore save significant time spent previously on manual matching"

Ondrej Walter

Product Head of Content Tribe

About the Client

Heureka Group is the largest price comparator and shopping advisor in Europe. It operates in 9 countries in Central and Eastern Europe, with over 23 million visitors per month and a network of over 55,000 online stores.

The Challenge

Heureka Group, a product aggregator, aims to enhance the shopping experience by efficiently matching e-shop offers with products in its catalog. This matching ensures that shoppers find the right products within a vast network of online stores, which improves the platform's search and comparison capabilities. Previously, the matching was done manually, a time-consuming task that depended on a large workforce and had only limited automation. The existing automation could match offers to products using only a few rules, such as an identical name or ISBN code. The challenge was to develop a more efficient, AI-based method for this critical process.

The Solution

The AI strike team, composed of both the client's and our machine learning engineers, aimed to develop and implement a robust machine-learning solution. DataSentics' team collaborated closely with the internal team from the outset, focusing on understanding the business problem, data, and processes. This collaboration extended to regular interactions for in-depth data understanding, resolving technical challenges, and solution validation. The internal product owner played a key role in aligning business impacts and maintaining connections across departments. Together, they developed a near-real-time, multi-staged architecture for product matching.

Technical Details

The model is served using FastAPI and managed through the MLflow model registry.
All parts of the solution are running on on-premises infrastructure, orchestrated by Kubernetes.
Training and deployment are automated using GitLab CI/CD pipelines.
The results are monitored using Prometheus and Grafana.

The Decision-Making Process

Elasticsearch is used to select several candidate products based on name only.
An XGBoost model is used to compare the offer with each of the candidates. The model uses a set of features including several name similarity measures, comparison of price, and various attributes like weight, size or color.
The resulting decisions for each offer-candidate pair are gathered and additional business rules may be applied. A final match is made only if the corresponding product was identified unambiguously.
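
The three steps above can be illustrated with a minimal Python sketch. It is not the production implementation: the `Item` schema, the feature names, the `toy_score` function, and the 0.8 threshold are all hypothetical and chosen only for illustration. Candidates are passed in as a plain list where the real system queries Elasticsearch, a hand-written scoring rule stands in for the trained XGBoost model, and business rules are left out.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Optional


@dataclass
class Item:
    """An e-shop offer or a catalog product (hypothetical schema)."""
    id: str
    name: str
    price: float
    attributes: dict = field(default_factory=dict)


def pair_features(offer: Item, candidate: Item) -> dict:
    """Build illustrative features for one offer-candidate pair."""
    a, b = offer.name.lower(), candidate.name.lower()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    shared = set(offer.attributes) & set(candidate.attributes)
    return {
        # Two name similarity measures: character-level and token-level.
        "name_ratio": SequenceMatcher(None, a, b).ratio(),
        "token_jaccard": len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1),
        # Relative price difference between offer and candidate.
        "price_rel_diff": abs(offer.price - candidate.price)
        / max(offer.price, candidate.price, 1e-9),
        # Number of shared attributes (weight, size, color, ...) that disagree.
        "attr_conflicts": sum(
            offer.attributes[k] != candidate.attributes[k] for k in shared
        ),
    }


def toy_score(features: dict) -> float:
    """Stand-in for the trained classifier; NOT the real model."""
    if features["attr_conflicts"] > 0 or features["price_rel_diff"] > 0.5:
        return 0.0
    return 0.5 * features["name_ratio"] + 0.5 * features["token_jaccard"]


def match_offer(
    offer: Item,
    candidates: List[Item],
    score: Callable[[dict], float] = toy_score,
    threshold: float = 0.8,  # hypothetical decision threshold
) -> Optional[str]:
    """Return the matched product id, or None if no unambiguous match exists."""
    accepted = [
        c.id for c in candidates if score(pair_features(offer, c)) >= threshold
    ]
    # Accept only when exactly one candidate passes; otherwise leave the
    # offer unmatched (for example, for manual processing).
    return accepted[0] if len(accepted) == 1 else None


offer = Item("o1", "Acme Kettle 1.7 l red", 29.9, {"color": "red"})
candidates = [
    Item("p1", "Acme Kettle 1.7 l red", 31.0, {"color": "red"}),
    Item("p2", "Acme Kettle 1.7 l blue", 31.0, {"color": "blue"}),
]
print(match_offer(offer, candidates))  # prints p1
```

Returning `None` whenever zero or several candidates pass the threshold mirrors the requirement that a product be identified unambiguously. A stricter threshold would trade coverage for precision in this sketch.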

The Benefits

The solution operates in real time, processing tens of millions of offers daily. It matches offers in all categories except Fashion with a precision exceeding 98%. It handles over 30% of the offers that would otherwise require manual matching. Because the solution reduces the workload for content editors, they can focus on tasks with higher added value and on more creative activities.