Part 1 - Interview Questions and Answers
❓How can you move data between access tiers automatically in Azure? What are the business use cases?
✅Answer - In Azure, you can move data between access tiers automatically using Azure Blob Storage lifecycle management policies. This feature lets you define rules that transition blobs between access tiers (Hot, Cool, Cold, and Archive) based on criteria such as last access time or blob age.
Here's how you can set it up:
- Define Lifecycle Management Rules: In the Azure portal, or through Azure CLI/PowerShell, define rules that specify when to transition blobs between access tiers. Conditions include the number of days since the blob was last modified or since it was last accessed. Rules based on last access time only work if last access time tracking is enabled on the storage account.
- Select Access Tiers: Specify which access tiers you want to transition the blobs between. For example, you might want to move less frequently accessed data to a cooler tier like Cool or Archive to reduce storage costs.
- Automatic Execution: Once the rules are defined, Azure Blob Storage automatically evaluates the criteria and moves the blobs between access tiers accordingly. This process is handled automatically by Azure, relieving you of manual intervention.
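The setup steps above can be sketched as a policy document. This is a minimal sketch, not a definitive policy: the rule name, the `logs/` prefix, and the 30/90/365-day thresholds are illustrative assumptions, while the JSON field names follow the lifecycle management policy schema.

```python
import json


def build_lifecycle_policy(prefix, cool_after_days, archive_after_days, delete_after_days):
    """Build a lifecycle policy that tiers block blobs down as they age.

    All thresholds count days since the blob was last modified.
    """
    if not (cool_after_days < archive_after_days < delete_after_days):
        raise ValueError("thresholds must increase: cool < archive < delete")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tierDownAgingBlobs",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Only block blobs under the given prefix are affected.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


# Example values are assumptions chosen for illustration.
policy = build_lifecycle_policy("logs/", 30, 90, 365)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Saved as policy.json, a document like this can be applied with `az storage account management-policy create --account-name <account> --resource-group <resource-group> --policy @policy.json`, where the account and resource group names are yours to fill in.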
Business Use Cases:
- Cost Optimization: By automatically moving data to cooler access tiers when it becomes less frequently accessed, businesses can optimize storage costs. They can ensure that frequently accessed data remains in the Hot tier for fast access while reducing costs by storing less accessed data in cheaper tiers.
- Compliance and Data Management: Some data management policies require data to be retained for a certain period but accessed infrequently. Lifecycle management policies can help automatically move such data to cooler tiers without manual intervention, ensuring compliance and efficient data management.
- Performance Optimization: Keeping frequently accessed data in the Hot tier and moving the rest to cooler tiers matches each dataset to the tier that suits its access pattern. Hot is optimized for frequent access. Cool offers similar read latency at a lower storage price but higher per-access charges. Archive is an offline tier: a blob must be rehydrated to an online tier before it can be read, which can take hours.
- Backup and Archiving: Automatic tiering can be used for managing backups and archives. For example, backups may be frequently accessed immediately after creation but become less accessed over time. Automatically transitioning them to cooler tiers helps manage storage costs; just keep in mind that anything moved to Archive needs rehydration before it can be restored.
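To make the cost optimization use case concrete, here is a rough sketch that compares keeping everything in Hot against a tiered layout. The per-GB prices and the data volumes are placeholder assumptions, not real Azure prices; check the current pricing page for your region. The sketch also ignores access charges and early deletion fees, which can matter for cooler tiers.

```python
# Placeholder monthly prices per GB (assumed values, NOT real Azure pricing).
PRICE_PER_GB_MONTH = {"hot": 0.020, "cool": 0.010, "archive": 0.002}


def monthly_storage_cost(gb_by_tier, prices=PRICE_PER_GB_MONTH):
    """Return the monthly storage cost for a mapping of tier name to stored GB."""
    return sum(gb * prices[tier] for tier, gb in gb_by_tier.items())


# Hypothetical 10 TB dataset: all in Hot versus spread across tiers.
all_hot = {"hot": 10_000}
tiered = {"hot": 1_000, "cool": 3_000, "archive": 6_000}

cost_all_hot = monthly_storage_cost(all_hot)
cost_tiered = monthly_storage_cost(tiered)
saving_pct = 100 * (cost_all_hot - cost_tiered) / cost_all_hot

print(f"All Hot: {cost_all_hot:.2f}  Tiered: {cost_tiered:.2f}  Saving: {saving_pct:.0f}%")
```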
❓How can you check how much a storage account is costing you in Azure?
✅Answer - To check how much a storage account is costing you in Azure, you can follow these steps:
- Azure Portal: Sign in to the Azure Portal (https://portal.azure.com).
- Navigate to Storage Accounts: In the Azure Portal, navigate to the "Storage accounts" section. You can find it by searching for "Storage accounts" in the search bar at the top.
- Select the Storage Account: Click on the storage account you want to check the cost for from the list displayed.
- View Cost Analysis: Inside the storage account, look for the cost-related entry in the resource menu, typically "Cost analysis". If you don't see one there, open Cost Management + Billing and scope or filter its cost analysis view to this storage account.
- Cost Breakdown: In the cost analysis section, you should be able to see a breakdown of costs associated with the storage account. This breakdown may include costs related to storage, data transfer, operations, etc.
- Filter by Time Range (Optional): You can filter the cost analysis by a specific time range to see costs incurred during that period.
- Explore Cost Details: Depending on your Azure subscription and configuration, you may have access to detailed cost breakdowns, including costs associated with different access tiers, data redundancy options, data egress, and other related services.
- Additional Tools: Azure also provides additional tools like Azure Cost Management + Billing, which offer more comprehensive insights into your Azure spending, including storage account costs.
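The same breakdown can also be requested programmatically through the Cost Management Query REST API. The sketch below only builds the request URL and body; it does not authenticate or send anything. The API version, the `MeterSubCategory` grouping, and the placeholder IDs are assumptions, so verify them against the current API reference before relying on them.

```python
import json

API_VERSION = "2023-03-01"  # assumed API version; check the current reference


def build_query_url(subscription_id):
    """Return the Cost Management query endpoint scoped to one subscription."""
    return (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/providers/Microsoft.CostManagement/query?api-version={API_VERSION}"
    )


def build_storage_cost_query(storage_account_resource_id):
    """Build a month-to-date cost query filtered to one storage account."""
    return {
        "type": "ActualCost",
        "timeframe": "MonthToDate",
        "dataset": {
            "granularity": "Daily",
            "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
            # Group by meter sub-category to separate the kinds of storage charges.
            "grouping": [{"type": "Dimension", "name": "MeterSubCategory"}],
            "filter": {
                "dimensions": {
                    "name": "ResourceId",
                    "operator": "In",
                    "values": [storage_account_resource_id],
                }
            },
        },
    }


# Placeholder identifiers; substitute your own.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<account>"
)
url = build_query_url("<subscription-id>")
body = build_storage_cost_query(resource_id)
print(url)
print(json.dumps(body, indent=2))
```

The body would be sent as an authenticated POST to the URL, for example with a bearer token obtained from your usual Azure credential flow.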
❓CSV vs Parquet file format?
✅Answer - You can compare the two formats on speed, storage, and cost across cloud-based technologies:
- Speed - Apache Parquet is a column-based file format, whereas CSV is row-based. Because Parquet stores data by column, aggregation queries can read only the columns they need and skip irrelevant data, which takes far less time than scanning every row of a CSV file. Parquet is approx. 34X faster than CSV.
- Storage - Parquet supports flexible compression options and efficient encoding schemes, so it takes much less storage than CSV. Parquet takes almost 87% less storage than CSV.
- Cost - Services like Amazon Athena charge based on the amount of data scanned per query, Amazon S3 charges according to the amount of data stored, and Google Dataproc charges are time-based. Service providers therefore bill you based on file size, data scanned per query, and query time. Because Parquet reduces all three, its cost is much lower than CSV. With Parquet, approx. 99% less data is scanned.
⛳ Overall saving: With Parquet compared to CSV, the overall cost saving is approx. 99.7%. The exact figures depend on the dataset and the queries you run.
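To see why a column-based layout scans less data, here is a toy illustration using only the Python standard library. It is not Parquet: the "columnar" format below (one zlib-compressed JSON chunk per column) is a simplified stand-in invented for this sketch, and the order records are synthetic. It shows the principle that an aggregation over one column only needs to read that column's chunk.

```python
import csv
import io
import json
import zlib

COUNTRIES = ["IN", "US", "DE"]


def make_rows(n):
    """Generate n synthetic order records (made-up data for the demo)."""
    return [
        {"order_id": i, "country": COUNTRIES[i % 3], "amount": (i * 7) % 100}
        for i in range(n)
    ]


def to_csv_bytes(rows):
    """Serialize rows in a row-based layout (CSV)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_columnar_chunks(rows):
    """Serialize rows as one independently compressed chunk per column."""
    return {
        name: zlib.compress(json.dumps([row[name] for row in rows]).encode("utf-8"))
        for name in rows[0]
    }


def sum_amount_from_csv(csv_bytes):
    """Sum 'amount' from CSV; every byte of every row has to be read."""
    reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
    total = sum(int(row["amount"]) for row in reader)
    return total, len(csv_bytes)


def sum_amount_from_columnar(chunks):
    """Sum 'amount' from the columnar layout; only one chunk is read."""
    chunk = chunks["amount"]
    total = sum(json.loads(zlib.decompress(chunk)))
    return total, len(chunk)


rows = make_rows(10_000)
csv_bytes = to_csv_bytes(rows)
chunks = to_columnar_chunks(rows)

csv_total, csv_scanned = sum_amount_from_csv(csv_bytes)
col_total, col_scanned = sum_amount_from_columnar(chunks)

print(f"CSV:      total={csv_total}, bytes scanned={csv_scanned}")
print(f"Columnar: total={col_total}, bytes scanned={col_scanned}")
```

Both paths return the same total, but the columnar path touches only the `amount` chunk. Real Parquet adds typed encodings, row groups, and statistics on top of this idea, so actual savings will differ from this toy.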

 
 
 
 
 
 