Microsoft Business Intelligence (Data Tools)|Data Engineering

Tuesday, October 3, 2023

Data Engineering: How to Scale Data Pipelines

If you work as a data engineer, data analyst, or data scientist on a project in an organization that uses a fairly standard ELT architecture to extract data from several sources into on-premises or cloud-based systems, this post is a good fit for you.

Data Curiosity: Before you begin building your data pipeline, recognize that data curiosity is essential for any successful company that values its data. It is a constantly evolving part of data culture that pushes you to seek out new or existing data, challenge it, and use it to make more accurate decisions about the data patterns within your source systems, such as:

How much data is in the database?
How much data is available through the API?
Are queries to the API deterministic?
Do they suffer from combinatorial explosion, or are they fairly straightforward?
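Some of these questions can be answered with a quick script before any pipeline is built. The Python sketch below is illustrative only: `query_space_size`, `result_fingerprint`, and `is_deterministic` are hypothetical helpers, the dimension cardinalities in the usage section are made-up numbers, and the `fetch` callable stands in for whatever client you use to call your API. It estimates the combinatorial query space as the product of dimension cardinalities and tests determinism by issuing the same query several times and comparing hashes of the results.

```python
import hashlib
import json
from math import prod
from typing import Callable, Mapping, Sequence

Row = Mapping[str, object]


def query_space_size(dimension_cardinalities: Mapping[str, int]) -> int:
    """Number of distinct filter combinations when a query fixes one value per dimension."""
    return prod(dimension_cardinalities.values())


def result_fingerprint(rows: Sequence[Row], ignore_order: bool = True) -> str:
    """SHA-256 hash of a result set; each row is serialized as key-sorted JSON."""
    serialized = [json.dumps(row, sort_keys=True, default=str) for row in rows]
    if ignore_order:
        # Treat the result as a multiset so a different row order is not flagged.
        serialized.sort()
    return hashlib.sha256("\n".join(serialized).encode("utf-8")).hexdigest()


def is_deterministic(
    fetch: Callable[[], Sequence[Row]], attempts: int = 3, ignore_order: bool = True
) -> bool:
    """Run the same query `attempts` times and report whether every result matches."""
    fingerprints = {result_fingerprint(fetch(), ignore_order) for _ in range(attempts)}
    return len(fingerprints) == 1


if __name__ == "__main__":
    # Hypothetical cardinalities for a customer-level aggregate table.
    dims = {"region": 12, "product": 500, "channel": 4, "day": 365}
    print(query_space_size(dims))  # 8760000

    # Stand-in for a real API call that always returns the same rows.
    print(is_deterministic(lambda: [{"id": 1, "total": 10}, {"id": 2, "total": 20}]))  # True
```

Setting `ignore_order=False` is the stricter check to use when you rely on the API's ordering, for example when paginating range queries by offset, because an unstable sort order can cause rows to be skipped or duplicated across pages.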

To make this concrete, suppose the data in the database consists of customer-level aggregates across multiple dimensions. These aggregates are already quite large, whether they live in Snowflake, an on-premises database, or another cloud-based database, and they will grow linearly with the customer base. API access consists of both point queries and range queries, and range queries require paginated responses. Moving this data into an RDBMS at regular intervals is an option, but it adds complexity: you must choose a load frequency, manage the extra pressure on the database, and reconcile yet another layer.
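Consuming paginated range queries usually comes down to a loop that follows a continuation token until the API reports that there are no more pages. The Python sketch below assumes a hypothetical API contract: `fetch_page(start, end, token)` returns a `Page` holding a batch of rows and an opaque `next_token`, which is `None` on the last page. `fake_fetch_page` and `DATA` are in-memory stand-ins for illustration only; a real API may use cursors, offsets, or `Link` headers instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    rows: List[dict]
    next_token: Optional[str]  # None means this is the last page


# Hypothetical contract: (range start, range end, continuation token) -> Page
FetchPage = Callable[[str, str, Optional[str]], Page]


def iter_range(
    fetch_page: FetchPage, start: str, end: str, max_pages: int = 10_000
) -> Iterator[dict]:
    """Yield every row for [start, end], following continuation tokens page by page."""
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(start, end, token)
        yield from page.rows
        if page.next_token is None:
            return
        token = page.next_token
    # Guard against an API that never signals the last page.
    raise RuntimeError(f"more than {max_pages} pages for range {start}..{end}")


# --- In-memory stand-in for a real API (illustration only) ---
DATA = [
    {"day": f"2023-10-{d:02d}", "customer_id": d % 3, "total": d * 10}
    for d in range(1, 31)
]


def fake_fetch_page(start: str, end: str, token: Optional[str], page_size: int = 7) -> Page:
    """Offset-based pagination over DATA; ISO dates compare correctly as strings."""
    matching = [row for row in DATA if start <= row["day"] <= end]
    offset = int(token) if token else 0
    next_offset = offset + page_size
    return Page(
        rows=matching[offset:next_offset],
        next_token=str(next_offset) if next_offset < len(matching) else None,
    )


if __name__ == "__main__":
    rows = list(iter_range(fake_fetch_page, "2023-10-01", "2023-10-15"))
    print(len(rows))  # 15
```

Because `iter_range` is a generator, a loader can stream rows into the target system page by page instead of holding an entire range in memory, and the `max_pages` guard stops a misbehaving API (for example, one that keeps returning the same token) from looping forever.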

To read the full story, please see my Medium article here.

To learn more, please follow us at:

http://www.sql-datatools.com

To learn more, please visit our YouTube channel at:

http://www.youtube.com/c/Sql-datatools

To learn more, please visit our Instagram account at:

https://www.instagram.com/asp.mukesh/

To learn more, please visit our Twitter account at:

https://twitter.com/macxima

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders.

Key Contributions

Data Modelling and Integration
• Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams.
• Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses.
• Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.
