Data Engineering Internship at Booking.com
The Financial Planning and Analytics team at Booking.com mainly works on data engineering and data analysis tasks, starting with the processing of raw booking-ID-level data.
As an intern at Booking.com, you can expect to work on various tasks:
- Platform development: build and maintain the big data platform with infrastructure tooling such as Snowflake for data warehousing and Dagster for data orchestration and monitoring. We handle both event-based (e.g., Apache Kafka) and batch-based ingestion (> 1 TB daily data volume) of transaction-level data points.
- Self-service reporting: build the infrastructure and design the solution to enable self-service reporting. The team is currently exploring tooling such as Cube (https://cube.dev/) and is determining how to design and manage the semantic layer.
- Data product build and management: modernize the data products using data vault modelling to avoid repetitive code and deviations in business logic definitions, and to improve computing efficiency. In this work stream, we need to design the data model to handle real-world issues such as back-fill.
- Outlier identification in daily business performance: detect outliers in business metrics such as room nights and explain their root causes, such as fraud, seasonality, or holiday impact.
- GenAI-based chatbot: provide accurate metric information and insights (i.e., interpretation of the metric) to business users. We are using the Snowflake Cortex Analyst feature but are not limited to it. Note that for a chatbot to provide correct numeric insights, a good data model is needed as a foundation. This means the student will have the chance to understand the data model and suggest how the tables should be designed. We want to avoid a garbage-in, garbage-out chatbot caused by a dirty data warehouse. For our end users, the chatbot must first be trained to understand the question and return a correct, fact- and number-based answer. This is a different requirement from use cases that involve asking the bot open questions with no right or wrong answer.
- Machine learning: we also have a project focused on financial forecasting.
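To give a feel for the outlier identification task, here is a minimal sketch in Python. It is only an illustration, not the team's actual method: it assumes a simple modified z-score based on the median and the median absolute deviation (MAD), and the function name `flag_outliers`, the threshold of 3.5, and the daily room-night numbers are all hypothetical. It does not account for seasonality or holidays, which a real solution would need to handle.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration only. It uses the median and the
    median absolute deviation (MAD), which are less distorted by extreme
    values than the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or at least half of them) are identical: nothing to flag.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data are roughly normally distributed.
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Made-up daily room nights, one value per day (not real Booking.com data).
daily_room_nights = [1020, 998, 1005, 1011, 2400, 1003, 990, 1015, 1008, 310]

outlier_days = flag_outliers(daily_room_nights)
print(outlier_days)  # indices of the days flagged as outliers
```

In this made-up series, the spike and the drop are flagged while ordinary day-to-day variation is not. Explaining why a flagged day deviates (fraud, seasonality, holiday impact) is the harder part of the project and is not covered by this sketch.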
We have good autonomy in defining the project, and all the aforementioned topics are company/department priorities. This means there will be full exposure to business stakeholders to collect and refine requirements and to iterate on the solution based on feedback.
Contact: Xiaowen Lu and Tom Heskes