Delta Lake in New York.

Databricks platform for data and ML work

A production-grade Databricks platform — data pipelines, ML workflows, and cloud infrastructure as a single whole.

Context and problem

At an electric-charging equipment manufacturer, data and ML work was done across several tools without a shared platform. A growing business called for scalable pipelines, repeatable infrastructure, and a clear way to bring together data engineering, analytics, and machine learning. They needed a platform that could be adopted broadly — not just a single project, but the company's way of doing data and ML work.

What was done

I led the platform's development as architect and hands-on developer. I built a Databricks-based environment that combined PySpark and Pandas pipelines, scikit-learn and SciPy workflows, AWS services (Lambda, CloudFormation), Terraform, and DevOps practices. A significant part of the work also went into Elasticsearch and DynamoDB migrations and integrations.

I was responsible for the technical direction, the implementation, and making sure stakeholders could move onto the platform. Delivery ran from March to August 2023.

Key technologies: Databricks, Spark, Python, Terraform, AWS, DevOps, scikit-learn.

Outcome

Databricks became the central platform for data and ML work. The pipelines and model work were production-ready and repeatable, and teams no longer had to build their own fragmented solutions. The project showed that a growth company can adopt a data platform on a short timeline, both technically and at the organisational level.

Image: Delta Lake, NY — Ducio1234, CC BY-SA 3.0.
