Machine learning on a production platform
Multi-year ML work on a production service — inference, embeddings, NER, and platform refactoring (GCP, Kubernetes, TensorFlow).
Context and problem
At an international conversational AI product, the inference and training load grew with the customer base. The ML layer had to stay production-grade: measurable performance, maintainable code, and features that ship, rather than separate research prototypes. The team needed a machine learning engineer who would build, refactor, and scale the same platform over the long term.
What was done
I was responsible for the production ML layer from roughly 2018 to 2021. I optimised the inference architecture and code: the TensorFlow, NumPy, and model workflows became lighter, latency and throughput improved, and GCP and Kubernetes carried the load. Embeddings were split out into their own microservice so that memory use and scaling were decoupled from the core application. Autoscaling was implemented for the training infrastructure to use resources more efficiently. I led the removal of a legacy database lookup from the inference path and guided stakeholders through the change, which removed the bottleneck that had been blocking scaling.
I refactored the platform over the long term: I carried out the TensorFlow 1 → 2 migration, built a vectorisation-agnostic architecture (different vectorisation sources can be swapped in without tearing apart the core network), and introduced design patterns that kept the code maintainable as it grew.
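To make the vectorisation-agnostic idea concrete, here is a minimal sketch, not the production code. It assumes the core network only needs a fixed-width vector per text. All names (Vectoriser, HashingVectoriser, LookupVectoriser, LinearScorer) are hypothetical, and a plain weighted sum stands in for the real TensorFlow network.

```python
import zlib
from typing import Protocol, Sequence

Vector = list[float]


class Vectoriser(Protocol):
    """Hypothetical seam: anything that turns texts into fixed-width vectors."""

    dim: int

    def vectorise(self, texts: Sequence[str]) -> list[Vector]: ...


class HashingVectoriser:
    """Bag-of-words counts hashed into `dim` buckets (no vocabulary needed)."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def vectorise(self, texts: Sequence[str]) -> list[Vector]:
        rows = []
        for text in texts:
            row = [0.0] * self.dim
            for token in text.lower().split():
                # crc32 is stable across processes, unlike the built-in hash().
                row[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
            rows.append(row)
        return rows


class LookupVectoriser:
    """Averages per-token vectors from an in-memory table (assumed source)."""

    def __init__(self, table: dict[str, Vector], dim: int) -> None:
        self.table = table
        self.dim = dim

    def vectorise(self, texts: Sequence[str]) -> list[Vector]:
        rows = []
        for text in texts:
            hits = [self.table[t] for t in text.lower().split() if t in self.table]
            if hits:
                rows.append([sum(col) / len(hits) for col in zip(*hits)])
            else:
                rows.append([0.0] * self.dim)  # no known token: zero vector
        return rows


class LinearScorer:
    """Stand-in for the core network: it depends only on the Vectoriser seam."""

    def __init__(self, vectoriser: Vectoriser, weights: Sequence[float]) -> None:
        if len(weights) != vectoriser.dim:
            raise ValueError("weights must match the vectoriser dimension")
        self.vectoriser = vectoriser
        self.weights = list(weights)

    def score(self, texts: Sequence[str]) -> list[float]:
        return [
            sum(w * x for w, x in zip(self.weights, row))
            for row in self.vectoriser.vectorise(texts)
        ]
```

Swapping the vectorisation source then means passing a different object to LinearScorer; the scorer itself does not change.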
I brought ML features to production: a NER/PII component with a unified interface (regex, datasets, models; person-name detection included), statistical anomaly detection, language detection for multilingual use, and confusion matrix analytics for the customer-facing view. MongoDB and data-layer performance was improved alongside the inference path.
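The unified NER/PII interface can be illustrated with a small sketch in which regex-based and dataset-based detectors share one `detect` method. This shows the pattern only, not the shipped component. The names (Entity, RegexDetector, NameListDetector, CompositeDetector, redact), the email pattern, and the overlap rule (longest span wins) are all assumptions. A model-backed detector would implement the same method.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Protocol

# Deliberately simple email pattern, for illustration only.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+\.[\w.-]+"


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    label: str
    text: str


class Detector(Protocol):
    """Hypothetical unified interface shared by every backend."""

    def detect(self, text: str) -> list[Entity]: ...


class RegexDetector:
    """Pattern-based backend, e.g. for emails or phone numbers."""

    def __init__(self, label: str, pattern: str) -> None:
        self.label = label
        self.pattern = re.compile(pattern)

    def detect(self, text: str) -> list[Entity]:
        return [
            Entity(m.start(), m.end(), self.label, m.group())
            for m in self.pattern.finditer(text)
        ]


class NameListDetector:
    """Dataset-backed backend: whole-word match against a list of names."""

    def __init__(self, names: Iterable[str]) -> None:
        self.names = {name.lower() for name in names}

    def detect(self, text: str) -> list[Entity]:
        return [
            Entity(m.start(), m.end(), "PERSON", m.group())
            for m in re.finditer(r"\b\w+\b", text)
            if m.group().lower() in self.names
        ]


class CompositeDetector:
    """Runs several backends and drops overlapping spans (longest span wins)."""

    def __init__(self, detectors: Iterable[Detector]) -> None:
        self.detectors = list(detectors)

    def detect(self, text: str) -> list[Entity]:
        found = [e for d in self.detectors for e in d.detect(text)]
        found.sort(key=lambda e: (e.start, -(e.end - e.start)))
        kept: list[Entity] = []
        for entity in found:
            if not kept or entity.start >= kept[-1].end:
                kept.append(entity)
        return kept


def redact(text: str, detector: Detector) -> str:
    """Replace each detected span with its label, working right to left."""
    for e in sorted(detector.detect(text), key=lambda e: e.start, reverse=True):
        text = text[: e.start] + f"[{e.label}]" + text[e.end :]
    return text
```

Callers depend only on `detect`, so a new backend can be added to the CompositeDetector without changing code such as `redact`.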
Key technologies: Python, TensorFlow, NumPy, Pandas, GCP, Kubernetes, Docker, MongoDB, MLOps, NLP/NER.
Outcome
The platform's ML layer scaled better, costs and response times improved measurably, and new features reached production repeatably. The work showed that the machine learning engineer's role in this context combines models and inference with architecture and production pipelines: one engagement, several deliveries, one whole.
Image: Martin Thoma, CC0 1.0.