The daily life of a Data Engineer

Mourad EL GHISSASSI
6 min read · Apr 21, 2023

Whenever I’m asked what I do for work, I find it hard to answer. So I will try to explain what a Data Engineer does, by giving you a glimpse of the work I did for my latest client.

At its core, Data Engineering aims to provide quality data for analytics and decision-making. It does this by collecting raw source data, processing it into usable information, storing it securely, and making it available to authorized users.

Let’s start with the technical part. Data engineering is basically a specialized form of software engineering, focused on designing and building data infrastructure. So the first requirement is programming. My main task was setting up data pipelines to collect all the necessary data from various sources, including SAP BW, CSV files, and Google Sheets, and to store it in a Data Lake. This was done using Talend, which is an ETL tool. ETL stands for Extract, Transform, Load. Talend’s no-code/low-code graphical interface lets users drag and drop components, which the software then converts into Java code. So if coding is not your thing, don’t worry, there are workarounds that can help you become a successful Data Engineer. However, it’s still essential to have some basic knowledge of at least one programming language.

Talend workflow
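
To make the extract, transform, load steps above more concrete, here is a minimal Python sketch of the idea. It is not what Talend generates (Talend produces Java) and it is not the project's code: the CSV content, the column names and the `data_lake` list standing in for real storage are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical raw source: a small CSV held in memory instead of a real file.
RAW_CSV = """plant,substance,quantity_kg
Lyon, solvent_a ,1200
Lyon,solvent_b,
Rabat,solvent_a,800
"""


def extract(source: str) -> list[dict]:
    """Extract: read every raw row from the CSV source."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: trim whitespace, drop rows with no quantity, cast to float."""
    cleaned = []
    for row in rows:
        quantity = (row["quantity_kg"] or "").strip()
        if not quantity:
            continue  # skip incomplete records
        cleaned.append({
            "plant": row["plant"].strip(),
            "substance": row["substance"].strip(),
            "quantity_kg": float(quantity),
        })
    return cleaned


def load(rows: list[dict], target: list) -> int:
    """Load: append the cleaned rows to the target store, return the row count."""
    target.extend(rows)
    return len(rows)


data_lake = []  # stand-in for real storage (assumption)
loaded = load(transform(extract(RAW_CSV)), data_lake)
print(f"{loaded} rows loaded")
```

Each step is a separate function so it can be tested or swapped on its own, which is roughly what the components in a graphical ETL tool give you.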

As the application was hosted on BigQuery, a significant portion of my day was spent writing and executing SQL (Structured Query Language), the language used to query and manipulate data in databases. I would highly recommend that anyone interested in becoming a Data Engineer prioritize learning SQL, as it is a fundamental skill in the field.
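
To give an idea of what that SQL looks like, here is a small sketch that runs a typical aggregation query. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for BigQuery, so it can run anywhere. The `shipments` table, its columns and its rows are hypothetical, and BigQuery's SQL dialect differs in places from SQLite's.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (plant TEXT, substance TEXT, quantity_kg REAL)"
)
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [
        ("Lyon", "solvent_a", 1200.0),
        ("Lyon", "solvent_b", 300.0),
        ("Rabat", "solvent_a", 800.0),
    ],
)

# Total quantity received per plant, largest first.
query = """
SELECT plant, SUM(quantity_kg) AS total_kg
FROM shipments
GROUP BY plant
ORDER BY total_kg DESC
"""
totals = conn.execute(query).fetchall()
for plant, total_kg in totals:
    print(plant, total_kg)
conn.close()
```

The `SELECT ... GROUP BY ... ORDER BY` pattern is standard SQL, so the same query shape carries over to most databases.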

After collecting and cleaning the data, the next critical step is to use it for different calculations and to create various key performance indicators (KPIs). In this project, one of the use cases was calculating the amount of CO2 emitted during the production of a chemical substance. This calculation accounted for the transportation of the substance to the factory, the energy required for its transformation, and the greenhouse gases produced during the process.

Tableau dashboard
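
The CO2 calculation described above adds up three contributions: transport, energy and process emissions. Here is a minimal sketch of that sum. The emission factors and the batch figures are invented placeholders, not the client's values, and the real calculation was more detailed than this.

```python
from dataclasses import dataclass

# Hypothetical emission factors, for illustration only.
TRANSPORT_FACTOR = 0.1  # kg CO2e per tonne-km
ENERGY_FACTOR = 0.4     # kg CO2e per kWh


@dataclass
class Batch:
    mass_t: float           # tonnes of substance transported
    distance_km: float      # distance travelled to the factory
    energy_kwh: float       # energy used for the transformation
    process_co2e_kg: float  # greenhouse gases released by the process itself


def co2e_kg(batch: Batch) -> float:
    """Return total emissions for one batch, in kg CO2e."""
    transport = batch.mass_t * batch.distance_km * TRANSPORT_FACTOR
    energy = batch.energy_kwh * ENERGY_FACTOR
    return transport + energy + batch.process_co2e_kg


batch = Batch(mass_t=10, distance_km=200, energy_kwh=5000, process_co2e_kg=300)
total = co2e_kg(batch)
print(f"{total:.0f} kg CO2e")
```

With these made-up numbers, transport contributes about 200 kg, energy about 2,000 kg and the process 300 kg, so the total is about 2,500 kg CO2e.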

Another crucial step while wearing the programmer hat is orchestrating the different tasks (for example, you can’t run the calculations before extracting and loading the data) and automating the jobs so they run at regular intervals (for example, once a week or once a month).
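
Orchestration boils down to running tasks in an order that respects their dependencies. Here is a minimal sketch using `graphlib.TopologicalSorter` from Python's standard library (Python 3.9+). The task names are hypothetical, and a real project would hand both the ordering and the weekly or monthly scheduling to a dedicated scheduler rather than a script like this.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "compute_kpis": {"load"},
    "refresh_dashboard": {"compute_kpis"},
}


def run_pipeline(deps: dict, actions: dict) -> list:
    """Run every task after its prerequisites and return the execution order."""
    order = list(TopologicalSorter(deps).static_order())
    for task in order:
        actions[task]()
    return order


# Placeholder actions: each one only records that the task ran.
log = []
actions = {name: (lambda n=name: log.append(n)) for name in dependencies}

order = run_pipeline(dependencies, actions)
print(" -> ".join(order))
```

If two tasks depended on each other, `TopologicalSorter` would raise a `CycleError` instead of silently running them in the wrong order.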

Another essential tool for almost any software project is a version control system. In this project, we used Git with GitLab. Think of an IT project as a giant puzzle: Git is like a safety deposit box where you can store your part of the puzzle safely. Git also makes it easy to manage multiple people working on the same project. You can send your changes to your colleagues, and they can send their changes to you. Git then helps you merge all the changes, so you end up with a complete puzzle that everyone worked on together.

My work as a data engineer wouldn’t be complete without proper documentation. Before moving to production, I create technical documentation and review it with the data operations team that will manage the project after the development phase. This documentation includes a data dictionary, data flow diagrams, technical specifications, and source code documentation. Clear and detailed documentation ensures that everyone involved has a comprehensive understanding of the project and its requirements, which reduces the risk of errors and improves overall efficiency.

As for deployment, the jobs are coded in the development environment, then pushed to the test environment for the stakeholders to test, and later to the production environment that users have access to. This is done hand in hand with the data operations team, in what is called DevOps: a fairly recent approach to software development that became prominent in the early 2010s.

DevOps is like two teams working together to build a really cool toy. Team A (developers) comes up with the idea and designs the toy, and team B (operations) makes sure the toy gets made correctly and is ready to play with. This means that developers and operators are always communicating and working together throughout the entire process, from start to finish, to build and maintain high-quality software.

Lastly, in the field of IT, it’s important to stay up to date with new releases of the software being used, and to keep an eye out for new tools that could be helpful. On this project, I had the opportunity to work on migrating the existing jobs from Talend 7 to Talend 8, which is cloud-based. This allowed us to take advantage of features and improvements that were not available in the previous version, and ultimately improved the performance and functionality of the project.

Now that we’ve covered the technical skills, let’s talk about the importance of soft skills in data engineering. Communication is a critical aspect of the job. As a data engineer, I collaborated closely with a diverse team consisting of another data engineer, a UI designer, a data visualization expert, a product owner, a scrum master, a data functional analyst, a data architect, and a project manager, as well as various stakeholders. It is essential to communicate information effectively to everyone involved in the project. I would say that 20% to 30% of a data engineer’s day is typically spent in meetings to ensure everyone is on the same page and progress is being made.

To better handle a big project, and to avoid spending more time than necessary in endless meetings, we use an Agile framework. If you’re not familiar with management frameworks, it’s like playing with building blocks: we take the big project and break it down into smaller pieces. We call these pieces “sprints”, and they usually last two to four weeks. Each sprint starts by defining its goals and choosing which tasks will be done. We then work on those tasks with constant feedback, at least once a day, while starting to prepare the next sprint. On the last day, we present the result to the client, then hold a retrospective to see what went well and what can be improved, which is critical for continuous improvement.

Those are the broad outlines of the Data Engineer’s job, which can be both challenging and rewarding. One of the main tasks is to create intricate data pipelines and to work with different teams to ensure that data is gathered, organized, and processed appropriately to achieve the organization’s objectives. Success in this role requires technical proficiency, problem-solving abilities, and effective communication and teamwork skills. In today’s world, data engineering is an increasingly important field that offers a range of career opportunities. It is a fast-paced and ever-changing work environment, so be prepared for that if you are considering a career in this field. However, know that your contributions can have a significant impact on the organization’s success.

Written by Mourad EL GHISSASSI

Human, traveler, perpetual curious, senior Data Engineer
