Data Engineering at Dolead

Explore the intricacies of the data engineering challenges Dolead tackles. Delve into the tools we use, such as Google Cloud, BigQuery and DBT, to solve them.

5 minutes
8/7/24
Nahid Oulmi

Hello, and welcome to the data engineering podcast.

My name is Nahid, and I am a data engineer at Dolead. In the podcast below, I will talk about the data environment at Dolead.

First, I will introduce you to the challenges we have at Dolead:

  • Data reporting
  • Data dashboarding
  • Data exploration
  • Data intelligence

Then, I will talk about the technical stack we chose to tackle these challenges:

  • Google Cloud
  • BigQuery
  • DBT
  • Airbyte
  • Airflow

And finally, I will explain the different challenges we are currently working on.

Data environment

I will start with our data environment and the motivation behind our data-driven approach.

Dolead is a digital advertising company and as such, we receive hundreds of gigabytes of data every day from different sources across all units of the company.

These sources include, but are not limited to:

  • Social network data that we synchronize from Meta, Google, TikTok or Bing
  • Ad Network data from Outbrain or Taboola
  • User behavior tracking on our Landing Pages
  • Online forms and surveys
  • Advertiser data such as the number of sales stemming from our ads

All of these sources create a complex data environment that we have to synthesize and organize so that our users and stakeholders can make the most of it in a self-service mode.

Data challenges and use cases

  • Data dashboarding: providing tailored data dashboards and visualizations for day-to-day monitoring. Data quality, data freshness and data transformations are important here. Freshness is usually expected at the day level, or even at the hour level for some data (a minimal freshness-check sketch follows this list)
  • Data reporting: periodic data exports and reports for meetings and milestones. Quality and transformation are crucial
  • Data exploration: periodic data exploration by the technical teams and data scientists. Data quality and data freshness are required
  • Data intelligence: automation and machine learning algorithms. Data quality is paramount for good training
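To make the freshness requirement concrete, here is a minimal sketch of the kind of automated check it implies. It assumes the google-cloud-bigquery Python client; the table name, column name and threshold are hypothetical, not our production setup.

```python
# Hypothetical freshness check: fails if the newest row in a table is too old.
# Assumes a TIMESTAMP column and the google-cloud-bigquery client.
import datetime as dt

from google.cloud import bigquery


def check_freshness(table: str, column: str, max_age_hours: int) -> bool:
    client = bigquery.Client()  # uses default GCP credentials
    query = f"SELECT MAX({column}) AS latest FROM `{table}`"
    latest = next(iter(client.query(query).result())).latest
    if latest is None:
        return False  # an empty table counts as stale
    age = dt.datetime.now(dt.timezone.utc) - latest
    return age <= dt.timedelta(hours=max_age_hours)


if __name__ == "__main__":
    # Hour-level freshness expected for this (illustrative) ad-spend table.
    fresh = check_freshness("analytics.ads_spend", "updated_at", max_age_hours=1)
    print("fresh" if fresh else "stale")
```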

Technical stack

  • Google Cloud
    • BigQuery
      • Serverless SQL data warehouse
      • Scale up and scale down resources as needed
      • Pay-as-you-go model based on the amount of data you process in your SQL queries and transformations
    • Looker
      • Reporting & dashboarding software that is connected to BigQuery for displaying and visualizing the data hosted on BigQuery
    • Cloud Storage
      • Raw data storage facility that we use to store files, objects and media, or as a staging step before loading into BigQuery
  • Airbyte
    • Extract and Load software that exists as a cloud option as well as open-source software
    • You can write your Python data pipelines as code and host them on an Airbyte server that runs your code
    • You can use off-the-shelf data pipelines written by the community for standard Extract and Load pipelines
    • Not suited for transformations as it runs on a single server; it is better to let the transformations happen in your data warehouse, such as BigQuery
  • DBT
    • SQL transformation tool that gives you the ability to write your data transformations as SQL code and run them in your data warehouse
    • Lets you write macros - reusable SQL functions - to package and optimize your code
    • It also comes with data testing facilities
  • Airflow
    • Orchestrator for our data pipelines (a minimal DAG sketch closes this article)
    • Airflow acts like a project manager for your data tasks, making sure everything runs smoothly, on schedule, and in the correct order, much like a recipe that lays out the steps to prepare a dish
    • Airflow allows you to program workflows as Python scripts. These workflows are designed as "Directed Acyclic Graphs" (DAGs). In simple terms, a DAG is just a way of organizing tasks where each task runs in a specific order and none of the steps loops back on itself (that is what acyclic means)

Challenges

Scalability of the data warehouse

As I mentioned earlier, BigQuery offers virtually endless scalability. But in reality, its pay-as-you-go model makes it more and more expensive to process larger amounts of data. Moreover, understanding and predicting costs can be complicated due to the variable pricing model.

This means that we still need to optimize our storage, our processes and our code to be able to scale up without hitting our budgets. We are constantly challenging, reviewing and refactoring our code to meet these expectations.

Data quality

The second challenge is data quality. Data quality refers to the condition of data based on factors like accuracy, completeness, reliability, and relevance. High-quality data should be correct, up-to-date, comprehensive, and applicable to the needs of the user or application. Poor data quality can lead to erroneous conclusions and inefficient processes, which can have severe implications in business settings, ranging from minor inefficiencies to major blunders in strategic decisions.

The three challenges when it comes to data quality are the following (a minimal automated-check sketch follows this list):
    • Volume of data: The sheer amount of data generated today makes manual checks impractical.
    • Variety of sources: Data coming from different sources may have different formats or standards.
    • Velocity: The speed at which data is generated requires automated tools to manage and maintain quality.
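To illustrate what an automated check looks like in practice, as promised above, here is a minimal sketch of a completeness check. It assumes the google-cloud-bigquery Python client; the table and column names are hypothetical, not our production setup.

```python
# Hypothetical completeness check: counts NULL ids and duplicate ids in a table.
# Table and column names are illustrative; assumes the google-cloud-bigquery client.
from google.cloud import bigquery


def check_ids(table: str, id_column: str) -> dict:
    client = bigquery.Client()
    query = f"""
        SELECT
          COUNTIF({id_column} IS NULL) AS null_ids,
          COUNT({id_column}) - COUNT(DISTINCT {id_column}) AS duplicate_ids
        FROM `{table}`
    """
    row = next(iter(client.query(query).result()))
    return {"null_ids": row.null_ids, "duplicate_ids": row.duplicate_ids}


if __name__ == "__main__":
    report = check_ids("analytics.leads", "lead_id")  # illustrative table
    if report["null_ids"] or report["duplicate_ids"]:
        raise SystemExit(f"Data quality check failed: {report}")
    print("Data quality check passed")
```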

The solution we have is the following:

  • Continuous Monitoring: Regularly checking data quality metrics to catch issues early.
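As mentioned in the Airflow entry above, here is a minimal DAG sketch showing how such transformations and quality checks can be scheduled every day. It assumes a recent Airflow 2.x installation; the dbt command, paths, schedule and check function are illustrative, not our actual pipeline.

```python
# Minimal Airflow DAG sketch: run transformations, then data quality checks, daily.
# Assumes Airflow 2.x; task commands and callables are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def run_quality_checks() -> None:
    # Placeholder for checks like the freshness and completeness sketches above.
    print("running data quality checks")


with DAG(
    dag_id="daily_reporting",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # one run per day, matching day-level freshness
    catchup=False,
) as dag:
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",  # illustrative path
    )
    quality = PythonOperator(
        task_id="quality_checks",
        python_callable=run_quality_checks,
    )
    # Transformations first, checks second; the ordering is what makes it a DAG.
    transform >> quality
```

Running the checks right after the transformations means a broken model can fail the run before stale or inconsistent data ever reaches a dashboard.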