Data pipelines and topic modelling from feedback
A data platform overhaul and NLP-based topic modelling — repeatable insight from customer feedback.
Context and problem
At a large airline, data is mission-critical: classified, regulated, and part of running the core business. Changes had to be controlled, traceable, and compliant with security requirements. Customer feedback accumulated from several channels, but analysis was slow and the data pipelines were legacy systems. Technical debt slowed new use cases, and the feedback was not analysed systematically enough to surface trends and pain points. I helped the company develop a reliable data layer and built an analytics tool to support decision-making.
What was done
I modernised and repaired the data infrastructure: I rebuilt the Airflow pipelines, optimised the Snowflake and Elasticsearch integrations, moved the infrastructure as code (IaC) to AWS CDK, and introduced CI/CD and Docker for releases. Python refactoring, monitoring, and changes to network and security settings reduced operational friction.
Handling customer feedback was also mission-critical, and the volume was massive. I built a topic modelling solution for it: an NLP pipeline for unstructured text that uses LDA and BERTopic to identify trends and problem areas. I also delivered reporting and tools for the customer success team.
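As a rough illustration of what the LDA part of such a pipeline does, the sketch below fits a toy LDA model with a collapsed Gibbs sampler using only the Python standard library. It is not the project code: the feedback strings, the stopword list, the function names (`tokenize`, `fit_lda`, `top_words`) and the hyperparameters are all invented for illustration, and a production pipeline would more likely rely on a library implementation.

```python
import random

# Hypothetical stopword list and feedback texts, made up for this sketch.
STOPWORDS = {"the", "was", "and", "my", "a", "at", "for", "to", "on", "of", "in"}
FEEDBACK = [
    "My baggage was lost and the baggage claim desk was closed",
    "Flight delayed for hours and no information at the gate",
    "Lost baggage again, the claim form was confusing",
    "The gate changed twice and the flight was delayed",
]

def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in words if w and w not in stopwords]

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenised documents.

    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []                                       # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + beta * n_words)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        result.append([vocab[v] for v in ranked[:n]])
    return result

docs = [tokenize(text) for text in FEEDBACK]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {', '.join(words)}")
```

The per-document topic counts in `doc_topic` are what a reporting layer could aggregate over time to follow how often each theme comes up. With only four short texts the topics are not meaningful; the point is the shape of the computation.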
Key technologies: Python, Airflow, Snowflake, Elasticsearch, AWS CDK, Docker, LDA, BERTopic, Pandas, NumPy, AWS QuickSight.
Outcome
The data environment became more maintainable and cheaper to operate, and the feedback now yields a repeatable view of topics and sentiment without manual review. The work came in two stages: first getting the data engineering foundation in order, then NLP modelling for the benefit of the business. These were two distinct roles in the same client context.
Image: Viento cruzado I — Jumbero, CC BY-SA 2.0.