From the course: Data Pipeline Automation with GitHub Actions Using R and Python

Data refresh process

- [Instructor] In this video, we will focus on the data refresh process. Recall that the goal of the data refresh process is to keep our normalized table aligned with the most recent data available in the source. When the pipeline is triggered, the function's main logic is to check whether new data is available in the data source and, if so, to pull the incremental data, process it, and append it to the normalized table. Note that in some cases, you may want to pull data beyond the incremental data. For example, let's assume that you are working with sales data, and restatements may have occurred during the last seven days due to the company's product return policy. In this case, each time the pipeline refreshes the data, you may want to re-pull the last seven days in addition to the incremental data. This adds some complexity to the process, as you will have to drop the overlapping observations when appending the data back to the normalized table. You want to…
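The refresh logic described here (check for new data, re-pull a lookback window, drop the overlap, append) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the course's implementation: the `Record` type, the `latest_available` and `fetch` callables, and the in-memory list standing in for the normalized table are all hypothetical, and the table is assumed to be non-empty (an initial backfill would be handled separately).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical row of the normalized table: one value per day."""
    day: date
    value: float


def refresh(
    normalized: List[Record],
    latest_available: Callable[[], date],   # hypothetical: newest date in the source
    fetch: Callable[[date], List[Record]],  # hypothetical: source rows with day >= start
    lookback_days: int = 7,
) -> List[Record]:
    """Return the normalized table updated with incremental data plus a lookback window."""
    # Assumes a non-empty table; max() raises ValueError on an empty list.
    last_loaded = max(r.day for r in normalized)

    # Step 1: check whether the source has anything newer than what we already hold.
    if latest_available() <= last_loaded:
        return normalized  # no new data; leave the table unchanged

    # Step 2: re-pull the last `lookback_days` days we already hold, plus the new days.
    start = last_loaded - timedelta(days=lookback_days - 1)
    pulled = fetch(start)

    # Step 3: drop the overlapping observations so restated values replace the old ones.
    kept = [r for r in normalized if r.day < start]

    # Step 4: append the pulled rows and keep the table in date order.
    return sorted(kept + pulled, key=lambda r: r.day)


# Hypothetical source: daily values for Jan 1-12, with Jan 8 restated to 99.0.
source = [Record(date(2024, 1, d), 99.0 if d == 8 else float(d)) for d in range(1, 13)]
# The normalized table currently holds Jan 1-10 with the original (pre-restatement) values.
table = [Record(date(2024, 1, d), float(d)) for d in range(1, 11)]

updated = refresh(
    table,
    latest_available=lambda: max(r.day for r in source),
    fetch=lambda start: [r for r in source if r.day >= start],
)
print(len(updated))        # 12
print(updated[7].value)    # 99.0 (the restated Jan 8 value)
```

With a seven-day lookback and Jan 10 as the last loaded day, the sketch re-pulls Jan 4 through Jan 12, discards the table's own Jan 4 to Jan 10 rows, and appends the pulled rows, so the restated Jan 8 value replaces the stale one without creating duplicates. Filtering by date before appending is just one way to drop the overlap; a keyed upsert in a database would achieve the same effect.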