Databricks platform for data and ML work
A production-grade Databricks platform that brings data pipelines, ML workflows, and cloud infrastructure together as a single whole.
Context and problem
At an electric-charging equipment manufacturer, data and ML work was spread across several tools with no shared platform. The growing business called for scalable pipelines, repeatable infrastructure, and a clear way to bring together data engineering, analytics, and machine learning. The company needed a platform that could be adopted broadly: not a single project, but the company's way of doing data and ML work.
What was done
I led the platform's development as architect and hands-on developer. I built a Databricks-based environment that combined PySpark and pandas pipelines, scikit-learn and SciPy workflows, AWS services (Lambda, CloudFormation), Terraform, and DevOps practices. Elasticsearch and DynamoDB migrations and integrations also made up a significant part of the work.
I was responsible for the technical direction, the implementation, and making sure stakeholders could move onto the platform. Delivery ran from March to August 2023.
Key technologies: Databricks, Spark, Python, Terraform, AWS, DevOps, scikit-learn.
Outcome
Databricks became the central platform for data and ML work. The pipelines and model work were production-ready and repeatable, and teams no longer each built their own fragmented solutions. The project showed that a growing company can bring a data platform into use, both technically and at the organisational level, on a short timeline.
Image: Delta Lake, NY — Ducio1234, CC BY-SA 3.0.