Tuesday, October 31, 2023

GCP — Cloud Run, a fully managed compute platform

Cloud Run is the next generation of Google Cloud’s “serverless” offerings, sitting one step down in cloud abstraction: you can tweak a bit more without having to worry (too much) about scaling. Google Cloud Run is a fully managed compute platform that lets you deploy containerized applications in a serverless environment. It is part of Google Cloud Platform (GCP) and is designed to simplify the deployment and scaling of containerized applications without the need to manage infrastructure.

 

Cloud Run is Google’s next generation of serverless, with App Engine remaining available for teams that have already committed to it.



Cloud Run has several advantages:

— Cloud Run lets you deploy a service to any region within a single project, making it straightforward to serve your API globally.

— Cloud Run also lets you configure a static outbound IP address (via Serverless VPC Access and Cloud NAT), while App Engine does not. This is useful when you need to relay mail or connect to a service that restricts access by IP address.

— Cloud Run’s Docker image support is also more configurable than App Engine standard, and Cloud Run offers more generous resource options (additional RAM, etc.).

— Cloud Run scales a single container image out to many instances, which makes it a good fit for microservices, APIs, and frontend UIs that are packaged as containers.

 

Here are key features and concepts of Google Cloud Run:

Containerized Applications: Google Cloud Run supports containerized applications, allowing you to use Docker containers to package and deploy your applications.
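As a minimal sketch of what such a container might run, the only contract Cloud Run imposes is that the process listens for HTTP requests on the port passed in the PORT environment variable (8080 by default). Using only the Python standard library:

```python
# main.py - minimal HTTP service suitable for packaging in a container for Cloud Run.
# Cloud Run injects the port to listen on via the PORT environment variable (default 8080).
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from Cloud Run!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8080"))
    # Listen on 0.0.0.0 so the container's port is reachable from outside.
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```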

 

Serverless Platform: Cloud Run follows a serverless computing model, where developers focus on writing code and deploying containers without dealing with the underlying infrastructure. It automatically scales based on the incoming request traffic.

 

Scaling: Cloud Run can scale from zero up to the maximum number of instances you configure. It automatically provisions and scales container instances based on incoming demand.

HTTP(S) Request Handling: Cloud Run is primarily designed for handling HTTP(S) requests, making it suitable for web services, APIs, and microservices.

 

Event-Driven Architecture: In addition to handling HTTP requests, Cloud Run can be triggered by events from various sources, such as Cloud Storage changes, Pub/Sub messages, and more.
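For example, a Pub/Sub push subscription delivers each event to your service as an HTTP POST whose JSON body carries the base64-encoded message. A hypothetical handler (assuming Flask is installed and the route below is configured as the push endpoint) might look like this:

```python
# app.py - hypothetical Pub/Sub push endpoint for a Cloud Run service (assumes Flask is installed).
import base64
import os

from flask import Flask, request

app = Flask(__name__)


@app.route("/pubsub", methods=["POST"])
def receive_pubsub():
    envelope = request.get_json(silent=True) or {}
    message = envelope.get("message", {})
    # Pub/Sub push delivers the payload base64-encoded in message.data.
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    print(f"Received Pub/Sub message: {data}")
    # Returning a 2xx acknowledges the message; anything else triggers a retry.
    return ("", 204)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
```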

 

Integration with GCP Services: Cloud Run seamlessly integrates with other Google Cloud services, enabling you to build end-to-end solutions. It can be used in conjunction with Cloud Storage, Cloud Pub/Sub, Cloud SQL, and other GCP services.
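As a small sketch of that integration, a Cloud Run service can call Cloud Storage through the google-cloud-storage client library using its runtime service account; the bucket and object names below are made up for illustration:

```python
# Hypothetical sketch: writing a file to Cloud Storage from a Cloud Run service.
# Assumes the google-cloud-storage client library is installed and the service's
# runtime service account has write access to the (made-up) bucket below.
from google.cloud import storage


def save_report(contents: str) -> None:
    client = storage.Client()  # uses the service's default credentials on Cloud Run
    bucket = client.bucket("example-reports-bucket")  # hypothetical bucket name
    blob = bucket.blob("reports/latest.txt")
    blob.upload_from_string(contents, content_type="text/plain")
```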

 

Build and Deploy from a Container Registry: You can build your container image and deploy it directly from Container Registry (or its successor, Artifact Registry), Google Cloud’s container image registries.

 

Environment Variables and Secrets: Cloud Run allows you to configure environment variables and manage secrets securely. This is useful for configuring your application and handling sensitive information.
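A brief sketch, assuming a DB_HOST environment variable is set on the service and a Secret Manager secret is exposed either as a DB_PASSWORD environment variable or as a file mounted at /secrets/db-password (all names are hypothetical):

```python
# Sketch: reading configuration in a Cloud Run service.
# DB_HOST is assumed to be a plain environment variable on the service; the secret is
# assumed to be exposed either as the DB_PASSWORD environment variable or as a file
# mounted at /secrets/db-password (both names are made up for illustration).
import os
import pathlib

DB_HOST = os.environ.get("DB_HOST", "localhost")


def read_db_password() -> str:
    # Prefer the env-var form; fall back to the mounted-file form.
    if "DB_PASSWORD" in os.environ:
        return os.environ["DB_PASSWORD"]
    return pathlib.Path("/secrets/db-password").read_text().strip()
```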

 

Managed TLS Certificates: Cloud Run provides managed TLS certificates, enabling secure communication over HTTPS without the need for manual certificate management.

 

Multi-Region Deployment: You can deploy Cloud Run services to multiple regions, allowing you to serve content closer to your users.

 

Cost Model: Google Cloud Run follows a pay-as-you-go model, billing the vCPU-seconds and memory (GiB-seconds) consumed while processing requests, plus a small per-request fee.
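To make the billing model concrete, here is a rough, illustrative estimate that multiplies vCPU-seconds and memory GiB-seconds by unit rates; the rates below are placeholders, so always check the current Cloud Run pricing page for your region and tier:

```python
# Back-of-the-envelope Cloud Run cost estimate (illustrative only).
# The unit rates below are placeholders, not official prices.
VCPU_SECOND_RATE = 0.000024        # placeholder $ per vCPU-second
GIB_SECOND_RATE = 0.0000025        # placeholder $ per GiB-second of memory
REQUEST_RATE = 0.40 / 1_000_000    # placeholder $ per request


def estimate_cost(requests: int, avg_seconds: float, vcpus: float, memory_gib: float) -> float:
    billable_seconds = requests * avg_seconds
    return (
        billable_seconds * vcpus * VCPU_SECOND_RATE
        + billable_seconds * memory_gib * GIB_SECOND_RATE
        + requests * REQUEST_RATE
    )


# e.g. 1M requests, 200 ms each, 1 vCPU, 512 MiB of memory
print(f"${estimate_cost(1_000_000, 0.2, 1, 0.5):.2f}")
```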

 

Google Cloud Run provides a flexible and cost-effective solution for deploying and managing containerized applications. It allows developers to focus on building features and applications while abstracting away the complexities of infrastructure management.

 

Cloud Run is significantly more user-friendly, especially when it comes to deployment and development iterations. You can specify a minimum number of instances to mitigate cold starts, and you can vertically scale the instances you do have. Running your own instance is undoubtedly the cheapest option in terms of raw CPU-time cost, but I would advise you to weigh man hours as well when deciding between serverless and self-managed.

 

Ease of Deployment and Development Iterations: Google Cloud Run is praised for its simplicity and ease of use. The deployment process is streamlined, and developers can iterate on their applications quickly. This ease of deployment and iteration is a significant advantage for projects with dynamic development requirements.

 

Minimum Number of Instances and Cold Boot: Cloud Run allows you to set a minimum number of instances, which can be beneficial for addressing cold start latency. This ensures that there are pre-warmed instances ready to handle incoming requests, reducing the impact of cold starts on application responsiveness.

Vertical Scaling: Cloud Run also lets you scale instances vertically by allocating more CPU and memory per instance, so resources can be matched to the workload while horizontal autoscaling handles varying traffic.

Cost Considerations: While running your own instances may be the cheapest option in terms of raw CPU time costs, it’s crucial to factor in other costs, including development time and operational overhead. Serverless solutions often abstract away infrastructure management complexities, saving time and effort.

Man Hours and Development Efficiency: Considering man hours is a wise approach. The ease of use and reduced operational burden with serverless platforms can contribute to higher development efficiency and faster time-to-market. This becomes particularly relevant when balancing the costs associated with managing your own infrastructure.

 

Recommendation: We often recommend serverless (Cloud Run, App Engine) over Compute Engine, especially if you are not sure how much compute capacity you need and don’t want to work it out up front.

 

Cloud Run is newer and offers some additional capabilities, such as concurrency (a single instance can handle several simultaneous requests), but both Cloud Run and App Engine should serve you well. Cloud Run goes the distance and provides an exceptional service. It may take you an hour to construct a decent, optimized Dockerfile, but everything else will be a breeze after that.
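One practical consequence of that concurrency is that a single instance may run several requests at once, so any shared in-process state needs to be guarded. A small sketch (assuming Flask is installed; the counter is just a stand-in for real shared state):

```python
# Sketch: protecting shared in-process state when one Cloud Run instance
# serves multiple concurrent requests. Assumes Flask is installed.
import os
import threading

from flask import Flask

app = Flask(__name__)
_counter = 0
_lock = threading.Lock()


@app.route("/hit")
def hit():
    global _counter
    with _lock:  # concurrent requests run in separate threads on the same instance
        _counter += 1
        current = _counter
    return f"request number {current}\n"


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")), threaded=True)
```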

 

To learn more, please follow us -
http://www.sql-datatools.com
To Learn more, please visit our YouTube channel at —
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -
https://twitter.com/macxima

Monday, October 30, 2023

AWS Lambdas are a Bad Idea for Running Memory-Intensive Computations

In today's data-driven world, businesses of all sizes must be able to properly acquire, analyse, and interpret data in order to make intelligent decisions and stay ahead of the curve. 

Tuesday, October 3, 2023

Data Engineering — How to Scale a Data Pipeline

If you work as a data engineer, data analyst, or data scientist in an organization that has put you on a project using a fairly standard ELT architecture to extract data from several sources into on-premises or cloud-based systems, this is a good fit.

Data Curiosity: Before you begin creating your data pipeline, data curiosity is essential for a successful company that values data. It is a constantly evolving part of data culture that pushes you to seek out new or existing data, challenge it, and use it to make more accurate decisions about data patterns within source systems, such as —

  1. How much data is in the DB?
  2. How much is available through the API?
  3. Are queries to the API deterministic?
  4. Do they have cases of combinatorial explosion, or are they fairly straightforward?

To make the data curiosity concrete, assume that the data in the database consists of customer-level aggregates across multiple dimensions, which are already quite large in Snowflake, on-premise, or cloud-based databases and will grow linearly with customer growth. The API access consists of both point and range queries, and paginated responses are required for range queries. Moving this data into an RDBMS at regular intervals is an option, but it adds complexity in terms of load frequency, database pressure, another layer for us to reconcile, and so on.
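As an illustrative sketch of what a paginated range query against such an API might look like (the endpoint, parameter names, and page-token field are invented, and the requests library is assumed):

```python
# Hypothetical sketch of extracting a range query from a paginated source API.
# The endpoint, parameter names, and page-token field are made up for illustration;
# assumes the `requests` library is installed.
import requests


def fetch_range(base_url: str, start: str, end: str, page_size: int = 1000):
    """Yield records for [start, end), following pagination tokens until exhausted."""
    params = {"from": start, "to": end, "page_size": page_size}
    while True:
        resp = requests.get(f"{base_url}/records", params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload.get("records", [])
        next_token = payload.get("next_page_token")
        if not next_token:  # no more pages for this range
            break
        params["page_token"] = next_token
```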

To read the full story, please see my Medium article here
