From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Data pipeline architecture
- [Host] In the previous video, we reviewed the data scope and pipeline requirements. In this video, we'll review the data pipeline architecture for automating the refresh of the California subregions' electricity demand data. We'll use the following deployment. Let's now break it down into its different components, starting with the EIA API, our source of raw data. In the previous chapter, we reviewed how to set up and send a GET request to pull metadata and data from the API using the EIA metadata and EIA backfill functions. The pipeline's supporting functions will leverage those functions to extract data from the API. The second component is the data pipeline, whose main functionality is to check whether new data is available in the API and refresh the data when applicable. In addition, it also collects metadata on each step, enabling us to monitor the health of the data pipeline. The process is deployed on GitHub Actions, and we'll dive into more details about the deployment in…
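The check-and-refresh behavior of the data pipeline component can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the course's actual code: `RunLog`, `refresh_if_needed`, `get_api_end`, and `backfill` are hypothetical names, and the two callables stand in for whatever metadata and backfill functions the pipeline really uses. No request is sent to the EIA API here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class RunLog:
    """Hypothetical container for the metadata collected on each step."""
    steps: list = field(default_factory=list)

    def record(self, step: str, success: bool, comment: str = "") -> None:
        self.steps.append({"step": step, "success": success, "comment": comment})


def refresh_if_needed(
    local_end: datetime,
    get_api_end: Callable[[], datetime],
    backfill: Callable[[datetime, datetime], list],
    log: RunLog,
) -> list:
    """Return new rows if the API has data past local_end, else an empty list."""
    # Step 1: ask the API (via an assumed metadata function) for its latest timestamp.
    try:
        api_end = get_api_end()
        log.record("check_metadata", True, f"api_end={api_end.isoformat()}")
    except Exception as exc:
        log.record("check_metadata", False, str(exc))
        return []

    # Step 2: skip the refresh when the local data is already up to date.
    if api_end <= local_end:
        log.record("refresh", True, "no new data available")
        return []

    # Step 3: pull only the missing window (via an assumed backfill function).
    try:
        rows = backfill(local_end, api_end)
        log.record("refresh", True, f"pulled {len(rows)} rows")
        return rows
    except Exception as exc:
        log.record("refresh", False, str(exc))
        return []
```

A scheduled job could call `refresh_if_needed` on every run and append `log.steps` to a log table, so that failed or empty runs stay visible when monitoring the pipeline's health.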