Showing posts with label What are OLTP and OLAP?. Show all posts
Showing posts with label What are OLTP and OLAP?. Show all posts

Tuesday, July 26, 2016

DW - Hybrid OLAP Servers

OLAP is a powerful analysis tool for forecasting, statistical computations, aggregations and involves more than just the multidimensional display of information. OLAP tools also must be able to extract and summarise requested data according to the needs of an end user, and there are two approaches for this data extraction that need to be discussed.
OLAP's multidimensional data model and data aggregation techniques organize and summarize large amounts of data so it can be evaluated quickly using online analysis and graphical tools. The answer to a query into historical data often leads to subsequent queries as the analyst searches for answers or explores possibilities. OLAP systems provide the speed and flexibility to support the analyst in real time.
Types of OLAP Servers
Cubes in a data warehouse are stored in three different modes and we can have four types of OLAP servers which are given below:
Hybrid OLAP Servers: They are combination of ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) which are other possible implementations of OLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store, allowing a tradeoff of the advantages of each. 
The degree of control that the cube designer has over this partitioning varies from product to product.


HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data. 
For example, a HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store. The Microsoft SQL Server 7.0 OLAP Services supports a hybrid OLAP server.

Friday, September 11, 2015

DW - OLAP Servers

OLAP is a powerful analysis tool for forecasting, statistical computations, aggregations and involves more than just the multidimensional display of information. OLAP tools also must be able to extract and summarise requested data according to the needs of an end user, and there are two approaches for this data extraction that need to be discussed.
As we know about that data in a data warehouse is organized to support analysis rather than to process real-time transactions as in online transaction processing systems (OLTP).  OLAP technology enables data warehouses to be used effectively for online analysis, providing rapid responses to iterative complex analytical queries.

OLAP is stand for Online Analytical Processing and OLTP Server is the chief component which stays between a client and a database management systems (DBMS).

In very simple words, OLAP servers present business users with multidimensional data from data warehouse or data marts, without concerns regarding how or where the data are stored. The OLAP servers are key points to understand that how data is organized in the database and has special functions for analyzing the data.

OLAP's multidimensional data model and data aggregation techniques organise and summarise large amounts of data so it can be evaluated quickly using online analysis and graphical tools. The answer to a query into historical data often leads to subsequent queries as the analyst searches for answers or explores possibilities. OLAP systems provide the speed and flexibility to support the analyst in real time.
Types of OLAP Servers
Cubes in a data warehouse are stored in three different modes and we can have four types of OLAP servers which are given below:
Relational OLAP (ROLAP) Servers: These are the intermediate servers that stand in between a relational back-end server and client front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middle-ware to support missing pieces. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology. 
The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source.
Advantages of ROLAP
  1. ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality (i.e., millions of members).
  2. With a variety of data loading tools available, and the ability to fine-tune the ETL code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.
  3. The data are stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
  4. ROLAP tools are better at handling non-aggregable facts (e.g., textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.
  5. By decoupling the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.
  6. The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users (SQL WHERE clause).
Disadvantages of ROLAP
  1. There is a consensus in the industry that ROLAP tools have slower performance than MOLAP tools. However, see the discussion below about ROLAP performance.
  2. The loading of aggregate tables must be managed by custom ETL code. The ROLAP tools do not help with this task. This means additional development time and more code to support.
  3. When the step of creating aggregate tables is skipped, the query performance then suffers because the larger detailed tables must be queried. This can be partially remedied by adding additional aggregate tables, however it is still not practical to create aggregate tables for all combinations of dimensions/attributes.
  4. ROLAP relies on the general purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of latest improvements in SQL language such as CUBE and ROLLUP operators, DB2 Cube Views, as well as other SQL OLAP extensions. These SQL improvements can mitigate the benefits of the MOLAP tools.
  5. Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don't translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.
Multidimensional OLAP (MOLAP) servers: These servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to pre-computed summarized data.
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Most MOLAP solutions store these data in optimized multidimensional array storage, rather than in a relational database.

Advantages of MOLAP
  1. Fast query performance due to optimized storage, multidimensional indexing and caching.
  2. Smaller on-disk size of data compared to data stored in relational database due to compression techniques.
  3. Automated computation of higher level aggregates of the data.
  4. It is very compact for low dimension data sets.
  5. Array models provide natural indexing.
  6. Effective data extraction achieved through the pre-structuring of aggregated data.
  7. Cube technology are often proprietary and do not already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional investments in human and capital resources are needed.
Disadvantages of MOLAP
  1. Within some MOLAP Solutions the processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data which have changed (usually new data) instead of reprocessing the entire data set.
  2. Some MOLAP methodologies introduce data redundancy.
Hybrid OLAP Servers: They are combination of ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) which are other possible implementations of OLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store, allowing a tradeoff of the advantages of each. The degree of control that the cube designer has over this partitioning varies from product to product.

HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data. For example, a HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store. The Microsoft SQL Server 7.0 OLAP Services supports a hybrid OLAP server.


Specialized SQL Servers: Specialized SQL servers provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.

References:
http://social.technet.microsoft.com/wiki/contents/articles/19898.aspx

Monday, August 3, 2015

DW - OLTP vs OLAP

OLTP stands for On-line Transaction Processing. OLTP based systems (account, ticket booking, banking systems, money transfer system) are used to perform a large number of short transactions. Almost all of the database queries in OLTP system consist of commands insert, update, delete. Select queries are mainly designed to enable users to select data from different entities. The majority of query is known in advance at the design stage of the system. Thus, critical for OLTP system is the speed and reliability of the performance of short operations updating data.

We know that OLTP has also been used to refer to processing in which the system responds immediately to user requests. In OLTP databases, there are detailed and current data, and schema used to store transnational databases is the entity model.

OLAP stands for On-line Analytical Processing which was described by Edgar Codd in 1993. OLAP based systems are used for management decisions and these system known as Decision Support System (DSS). OLAP-applications operate with large amounts of data accumulated in the operational databases OLTP. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas.

As we know that OLAP is the most important component of business intelligence, which also encompasses relational database, report writing and data mining. 

Collaboration of OLTP and OLAP systems
Following work flow structure is capable to show the importance of the each system based application and how are they link with each -

Major differences between OLTP and  OLAP

Contents
OLTP System 
OLAP System 
Source of data
Operational data; OLTP databases are meant to be used to do many small transactions, and usually serve as a "Single Source Of Truth"
Consolidation data; OLAP data comes from the various OLTP Databases which is used for analytics, data mining, less queries but they are usually bigger (they operate on more data).
Purpose of data
To control and run fundamental business tasks, facilitate and manage transaction-oriented applications
To help with planning, problem solving, and decision support, report writing and data mining.
What the data
Reveals a snapshot of ongoing business processes, used to refer to processing in which the system responds immediately to user requests.
Multi-dimensional views of various kinds of business activities to help out the better decision making.
Inserts and Updates
Short and fast inserts and updates initiated by end users.
Periodic long-running batch jobs refresh the data by the help of ETLs processes.
Queries
Relatively standardized and simple queries Returning relatively few records which are not more than 2 years
Often complex queries involving aggregations which are based on the large volume of the data more than 2 years.
Processing Speed
Typically very fast and always hit the main databases. Provides primitive and highly detailed data.
Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes. Provides summarized and consolidated data.
Space Requirements
Can be relatively small if historical data is archived. Mostly Database size is from 100 MB to 100 GB.
Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP. Database size is from 100 GB to 100 TB
Database Design
Highly normalized with many tables and provides high performance.
Highly flexible. Typically de-normalized with fewer tables; use of star and/or snowflake schemas.
Backup and Recovery
Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability
Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method