A typical data engineering problem, often referred to as extract, transform and load (ETL), consists of the following:
- take data from one place (extract)
- change its form (transform)
- move it to a new place, in this new form (load)
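
To make the three steps concrete, here is a minimal Python sketch. It is purely illustrative: the function names (`extract`, `transform`, `load`, `run_etl`), the CSV format, and the `name`/`amount` columns are assumptions made for this example, not part of any particular pipeline.

```python
import csv
import io


def extract(source):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: tidy up names and convert amounts to integer cents."""
    return [
        {
            "name": row["name"].strip().title(),
            "amount_cents": round(float(row["amount"]) * 100),
        }
        for row in rows
    ]


def load(rows, destination):
    """Load: write the transformed rows to the destination as CSV."""
    writer = csv.DictWriter(destination, fieldnames=["name", "amount_cents"])
    writer.writeheader()
    writer.writerows(rows)


def run_etl(source, destination):
    """Run the three steps in order."""
    load(transform(extract(source)), destination)


if __name__ == "__main__":
    # Hypothetical in-memory data stands in for real source and target systems.
    source = io.StringIO("name,amount\n alice ,1.50\nBOB,2\n")
    destination = io.StringIO()
    run_etl(source, destination)
    print(destination.getvalue())
```

Keeping each step in its own function means each one can be run and checked separately on a small sample.
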
This process gets interesting when data volumes are large and performance becomes a concern. Long turnaround times (e.g., a run that takes several hours or days) make the usual software engineering approach of iterating serially, one run after another, inefficient. In this article, we offer some tips on restructuring the software engineering process and leveraging the cloud to make iteration more efficient.