Reports. Mg. Samuel Oporto Díaz.



Course Map. Business Intelligence, Kimball Methodology: Project Planning, Business Model, Dimensional Modeling, Physical Modeling, ETL, Reports, Data Mining.

Table of Contents. Summaries. Summary selection. Summaries in Oracle.

Objectives. Explain why summaries are used in the warehouse and list the benefits of summary tables. Discuss the configuration of summary tables. Describe recommendations for selecting dimensions and summary levels. Identify the constraints on managing summary tables.

SUMMARIES

What Are Summaries?
Summaries store pre-summarized data. They are based on users' query requirements.
[Figure labels: Sales, Region, Time, Product; Sales Summary, State, City]

A summary (or aggregate) is a fact table record representing a summarization of base-level fact table records. Aggregation calculations are performed on base-level detail data and the results are stored in "summary" tables, providing precalculated answers to known queries. Because most decision support queries call for aggregation of data elements, using summaries in your design reduces the need to repeat resource-intensive calculations and avoids slow and costly GROUP BY operations by individual queries.

Based on User Query Requirements
The summary data that needs to be maintained in the warehouse is an early design consideration, and is based on the users' query requirements. The initial summary requirements can be identified by a summary analysis, detailed later in this lesson. Summary requirements can also be identified by monitoring code (using the warehouse or query manager processes) and identifying the GROUP BY clauses used commonly in SQL statements. The success of your summary strategy in the initial database design depends directly on how well end users can communicate their data analysis and reporting needs.

Why Summarize Data?
Improves query response time. Optimizes the use of resources. Enhances the analysis process.

Summary tables are important in the design of the data warehouse because they:
Improve service to end users by providing better response time to analytical queries. The use of prestored summaries is the single most effective tool the data warehouse designer has to improve performance.
Provide improved and optimized use of resources, storage, and CPU.
Enhance the analysis process by allowing drill-down from summarized levels to more detailed levels, and drill-up from detailed levels to summarized levels.

Summary Candidates
Summaries should be considered for commonly used queries that reflect one or more of the following characteristics:
Retrieval time is high, because of sheer dimension size or a complex dimensional structure.
Aggregation time is high. With a simple fact table, dynamic aggregation of data to create summary information may be sufficient. However, a complex fact table with complicated calculations would suggest preaggregation to avoid complex SQL statements at query time.
Compression ratio is low. The compression ratio for a dimension is determined by dividing the number of rows returned in a query by the number of rows actually scanned.

Why Model Summaries Now?
Design the summaries before implementation. Create the summaries. Evaluate summary usage and potentially revise the approach.

As a designer, you want to create predictable response times for user queries. A certain maximum response time should be established as the generally accepted performance criterion. Summaries should then be designed to meet this criterion.

The Hidden Demand Syndrome
If you do not design for predictable response time with summaries before implementing the warehouse, and you rely on query statistics as the sole summary design input, you may never get an accurate picture of the true summary needs of the user community, because of the hidden demand syndrome. This common syndrome occurs when users run aggregation queries that do not respond in a timely fashion. They either kill the query and do not run it again, or they let the query finish but never rerun it because of the slow response. Either way, the demand never shows up in the statistics.

Iterative Process
At the same time, the design and implementation of summaries is an ongoing, iterative process. Summary implementations should be evaluated and possibly revised as the warehouse matures. Additional summaries needed to support user demand may be identified, and existing summary tables may be dropped.

A Simple Example: Without Summaries
Scan of the sales fact table: 109,500,000 rows. Store: 100 stores. Time: 1,095 days. Product: 10,000 products. Query: total sales by year.

In this example, data in the fact table is stored at the atomic level (the lowest level of detail along three dimensions), providing the raw data to be summarized. Data is stored for each day, for every store, and for every product. The fact table has approximately 109,500,000 rows (assuming that 10 percent of the products sell every day at every store). A query requesting yearly summaries would need to scan the entire fact table, adding up daily sales for each year.

A Simple Example: With Summaries
Sales summary table: 3,000,000 rows. Store: 100 stores. Year: 3 years. Product: 10,000 products.

In this example, the detailed data shown in the previous slide is summarized along one of the three dimensions, from the day level to the year level in the time dimension, to create a new Sales summary table. This simple summarization significantly reduces the number of rows in the new summary fact table. For a query requesting sales of product by store by year, accessing the summarized data results in roughly a 36-fold reduction in the number of rows to scan (109,500,000 / 3,000,000 = 36.5).
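The row counts in the two examples above can be checked with a few lines of arithmetic. This is a minimal sketch: the constants come from the slides, while the function names are illustrative and not part of the original material.

```python
# Figures taken from the slides; the 10% sell-through is the slide's stated assumption.
STORES = 100
DAYS = 1095                # three years of daily data
YEARS = 3
PRODUCTS = 10_000
SELL_THROUGH_PERCENT = 10  # share of products that sell per store per day


def base_fact_rows() -> int:
    """Rows in the atomic fact table (day x store x product)."""
    return STORES * DAYS * PRODUCTS * SELL_THROUGH_PERCENT // 100


def yearly_summary_rows() -> int:
    """Upper bound on rows in the year x store x product summary table."""
    return STORES * YEARS * PRODUCTS


base = base_fact_rows()
summary = yearly_summary_rows()
reduction = base / summary
print(f"base={base:,} summary={summary:,} reduction={reduction:.1f}x")
```

The summary count is an upper bound because it assumes every product sells in every store at least once per year.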

Hierarchical Attributes Within Dimension Tables
Product dimension: 1 hierarchy (levels shown: Total, Group, Class, Product).
Store dimension: 2 hierarchies. Market hierarchy: Total, Region, District, Store. Geographic hierarchy: Total, State, City, Store.

Evaluating Hierarchies
As you evaluate your dimensional design to identify opportunities for summaries, look at the inherent hierarchies within each dimension. The hierarchies or relationships between the attributes within the dimension provide a road map for determining the summaries. Within dimensions, there may be multiple business-oriented or enterprise-specific hierarchies. For example, in the slide the store dimension contains market and geographic hierarchies.

N-Way Summaries
[Figure: dimension levels. Time: T1 Day, T2 Month, T3 Year. Product: P1 Item, P2 Category. Store: S1 Store, S2 District, S3 Region. Panels: one-way summary (total sales by year, by item, by store), two-way summary (total sales by day, by category, by region), three-way summary (total sales by month, ...).]

There are as many combinations of summaries as there are combinations of levels within the dimensions in the database. N refers to the number of dimensions along which the data is summarized above the leaf level.

One-Way Summary
A one-way summary means that one dimension value for the fact information is at a summary level. In the example shown in the slide, the time (T) dimension is summarized to the year level, while the product (P) and store (S) dimensions are not summarized. The possible one-way summary combinations are: T2-P1-S1, T3-P1-S1, T1-P2-S1, T1-P1-S2, T1-P1-S3.

Two-Way Summary
A two-way summary means data is aggregated along two dimensions. In the example, the product and store dimensions are summarized.

Three-Way Summary
A three-way summary has data summarized along three dimensions.
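To make the counting concrete, the level combinations can be enumerated directly. The sketch below assumes the level labels from the slide (T1 to T3, P1 to P2, S1 to S3, with level 1 as the leaf); the helper name `n_way` and the grouping dictionary are illustrative.

```python
from itertools import product

# Level labels from the slide; level 1 is the leaf (unsummarized) level.
LEVELS = {
    "T": ["T1", "T2", "T3"],  # day, month, year
    "P": ["P1", "P2"],        # item, category
    "S": ["S1", "S2", "S3"],  # store, district, region
}


def n_way(combo) -> int:
    """Number of dimensions in this combination held above the leaf level."""
    return sum(1 for level in combo if not level.endswith("1"))


# Group every level combination by how many dimensions it summarizes.
by_n = {}
for combo in product(*LEVELS.values()):
    by_n.setdefault(n_way(combo), []).append("-".join(combo))

for n in sorted(by_n):
    print(n, len(by_n[n]), by_n[n])
```

Group 0 holds only the atomic grain T1-P1-S1, and group 1 reproduces the five one-way combinations listed in the notes.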

Design Alternatives
Two fundamental design approaches for summaries: multiple summary fact tables (constellation configuration), or one large fact table with detailed fact data and summarized data stored in the same table.

Summaries can be stored in either:
Multiple fact tables. Multiple fact tables use the constellation configuration. This design alternative is the most commonly used approach, because it provides for intuitive design, backup, and recovery strategies, and for ease of management (different summaries can be kept in different tablespaces).
One large fact table. This design alternative was employed in some early adopter data warehouse designs but is rarely used anymore, for two reasons: recasting of history is impossible, and the table is difficult to manage and maintain.

Constellation Configuration
[Figure labels: Dimension Tables, Summary Fact, Summary Fact, Atomic Fact]

The constellation configuration consists of a central star, including an atomic fact table where the base grain data is stored. The base grain fact table is linked to the primary dimension tables, each of which should contain keys for all hierarchical levels. Conceptually, surrounding this central star are other stars with summary fact tables and summarized dimension tables. A summary fact table is always associated with one or more summary dimension table records. A summary fact table can also be joined to one or more of the primary dimension tables. In this way, summary fact tables can "share" dimension tables with the atomic fact table.

One-Way Summary: District
[Figure: atomic fact table joined to dimensions P, C, S, T; summary fact table (by district) joined to the District (store) summary dimension table, d.]

A central atomic fact table is joined to four dimension tables: Product (P), Customer (C), Time (T), and Store (S). The product, time, and store dimensions include hierarchies. Store dimension summary table candidates include: District, Region, and Total Store. In this example, a one-way summary table stores summarized data for the district level (d). The other dimensions are not summarized (they join to the primary dimension tables).
Note: With this design, region summaries could be aggregated dynamically from this summary table rather than the base fact table.

Base Fact Table: Date Key, Store Key, Product, Customer, Total Sales.
Store Dimension Table: Store Key (Blue, Green, Red, White), District Key (MA, FL), Region Key (North, South), Other Attributes.
District Summary Table (total by day, by district, by product, by customer): Date Key, District Key, Product, Customer, Total Sales.
District Dimension Table: District Key (MA, FL), Region Key (North, South), Other Attributes.

One-Way Summary: Region
[Figure: atomic fact table joined to dimensions P, C, S, T; summary fact table (by region) joined to the Region (store) summary dimension table, r.]

In this example, a one-way summary table stores summarized data for the region level (r) in the store dimension. The other dimensions are not summarized (they join to the primary dimension tables).

Base Fact Table: Date Key, Store Key, Product, Customer, Total Sales.
Store Dimension Table: Store Key (Blue, Green, Red, White), District Key (MA, FL), Region Key (North, South), Other Attributes.
Region Summary Table (total by day, by region, by product, by customer): Date Key, Region Key, Product, Customer, Total Sales.
Region Dimension Table: Region Key (North, South), Other Attributes.

Two-Way Summary: Category and District
[Figure: atomic fact table joined to dimensions P, C, S, T; summary fact table (by category and district) joined to the Category (product) summary dimension table, c, and the District (store) summary dimension table, d.]

In this example, a two-way summary table stores summarized data for the district level (d) in the store dimension and the category level (c) in the product dimension. With this design, region summaries could be aggregated dynamically from this summary table rather than the base fact table. The other dimensions are not summarized (they join to the primary dimension tables).

Base Fact Table: Date Key, Store Key, Product Key, Customer, Total Sales.
Store Dimension Table: Store Key, District Key, Region Key, Other Attributes.
Product Dimension Table: Product Key, Category Key, Other Attributes.
District/Category Summary Table (total by day, by district, by category, by customer): Date Key, District Key, Category, Customer, Total Sales.
District Dimension Table: District Key, Region Key, Other Attributes.
Category Dimension Table: Category Key, Other Attributes.

Understanding Summary Constraints
Size constraints. Batch load window constraints.

Before summary selection techniques are discussed, a brief examination of the general constraints associated with summary creation is helpful. Summarizing on every possible combination would achieve the best performance, but the direct trade-off is the cost in database size and maintenance.

Size Constraints
The primary constraint associated with summaries is the increase in database size due to the addition of summary data. You should design summaries with the goal of a 50 to 100% increase in size over the base fact table.

Batch Load Window Constraints
Summaries must be restated when the base data is refreshed. The batch load window may limit the resources available to restate the summary tables to keep them consistent with the raw data. Strategies to manage the summary table update outside of the load window include using mirrored data and using a dedicated sort/load package, such as Syncsort, outside of the DBMS.

Estimating the Summary Fact Table Size
Creating summaries on every combination of levels within the following schema:

Dimension: Product, Market, Time
Base: 2500, 450, 60
Level 1: 50, 30, 20
Level 2: 5
Level 3: 1

The following matrix shows the calculation of the increase in rows for a fully summarized fact table using the compression ratios for each level in each dimension. In this example, the original fact table contains 67,500 rows. Calculate the total estimated summary fact table size by multiplying this value by the first dimension compression ratio (product/level 1). The resulting value is recorded and then multiplied by the next compression ratio, and so on.

Choosing Summaries
6 dimensional attributes with modest hierarchical depths.

Ultimately, the most important task in summary modeling is determining which data should be aggregated, created, and stored. In the slide example, a straightforward dimensional model of six dimensions with modest hierarchical depths results in a maximum of 1,800 summaries (5 × 2 × 4 × 3 × 3 × 5).

What Dimensions?
All possible summary combinations are rarely manageable and are never needed. Not all combinations address the information and analysis needs of the end user.

What Levels?
When dimensions are selected, there are two approaches for levels of summarization:
Summarize the whole dimension so no dynamic aggregation is required.
Summarize part of the dimension to realize part of the performance gain while still allowing some dynamic aggregation (during business requirements analysis or from ongoing query statistics monitoring, you may recognize that users perform the bulk of their analysis only at certain hierarchical levels).

SUMMARY SELECTION

Guidelines for Summary Selection
Sort order/aggregate analysis. Usage pattern analysis.

Given the constraints associated with summaries, summary tables must be built judiciously. You should apply rigor to the analysis and selection of summary tables at the outset to optimize available resources while providing acceptable performance levels to the users. The following techniques can aid in identifying summary candidates: sort order/aggregate analysis and usage pattern analysis.

Sort Order/Aggregate Analysis
The sort order/aggregate analysis technique provides you with measurable data upon which to base your initial summary strategies. It provides a solid starting point for more sophisticated summary modeling. First, you should attempt to understand the primary user access paths (generally while performing business requirements analysis during the business modeling phase). The primary access paths are those dimensions on which the majority of aggregate analysis is performed. Once you have identified the primary access path dimensions, a sort order/aggregate analysis can help identify which dimensions are the best candidates for summaries. This technique is examined in the following pages.

Usage Pattern Analysis
After the initial warehouse implementation has been rolled out, you should use information gathered from user query statistics to refine your summary strategies.

Sort Order/Aggregation Analysis
A sort order analysis is performed to determine the relative benefit of presorting the fact table. An aggregate analysis is performed to determine the impact of adding summaries to a presorted fact table. The best combination of sorting and summaries is selected.

Sort order/aggregation analysis can be used to evaluate the impact of aggregates on data block I/O performance. The outcome of this analysis suggests which dimensions are the best candidates for summaries and which are the best candidates for a fact table sort. The three-step process includes:
1. A fact table sort order analysis is performed to determine the relative benefits of sorting the fact table when loading the data. This analysis should be performed for each dimension identified as a primary access path.
2. An aggregate analysis is performed to determine the impact of adding summaries, both on database size and performance.
3. The best combination of sorted data and summary creation is selected.

Step 1: Fact Table Sort Order
Objective: store the data in a sort order that matches a primary access path.
Benefits: provides data locality for queries along a primary access path; reduces the need for summaries.

The objective of fact table sort order is to store the data in a sort order that matches a primary access path in one of the dimensions. This fact table sort ordering technique includes the following benefits:
Provides data locality so that queries along that access path find more candidate records with every block I/O.
Reduces the number of summaries needed to support good performance.

An Unordered Fact Table
Hollywood Rentals workload:
Store managers need total rentals for their stores every day: 3,000 I/Os for 3,000 product rows.
Product managers need total rentals for their products every day: 150 I/Os for 150 store rows.
Total workload: 3,150 I/Os.

Hollywood Rentals is on schedule to roll out its data warehouse to track its primary business process of video rentals. The following information has been determined:
Company-wide, the organization records rentals of approximately 3,000 different videos through its POS (point of sale) system per day per store.
The company operates 150 stores.
The database has the following physical characteristics: 40 bytes per record, 200 records per block, 8 K per block.
The product and store dimensions are the primary user access paths for aggregate analysis, and the business analysis requirements include daily access to total rentals by store and daily access to total rentals by product.
The total workload between the two primary dimensions in the currently unordered fact table is 3,150 I/Os (3,000 I/Os for 3,000 product rows + 150 I/Os for 150 store rows).

Ordered by Store: Day / Store / Product
Block 1: Day 1, Store 1, Prod 1, units, dollars ... row 200: Day 1, Store 1, Prod 200, units, dollars
Block 2: Day 1, Store 1, Prod 201, units, dollars ... row 400: Day 1, Store 1, Prod 400, units, dollars
Block 15: Day 1, Store 1, Prod 2801, units, dollars ... row 3000: Day 1, Store 1, Prod 3000, units, dollars
Block 2250: Day 1, Store 150, Prod 2801, units, dollars ... row 450000: Day 1, Store 150, Prod 3000, units, dollars

Ordered Fact Table Record Layout
150 stores for each day; 3,000 product records per store.

I/O Data Block Access Results
Total rentals for a store: 3,000 records, 15 block I/Os (contiguous).
Total rentals for a product: 150 records, 150 block I/Os (non-contiguous).
Total I/Os for store and product = 165.

Ordered by Product: Day / Product / Store
Block 1: Day 1, Prod 1, Store 1, units, dollars ... row 150: Day 1, Prod 1, Store 150; row 151: Day 1, Prod 2, Store 1 ... row 200: Day 1, Prod 2, Store 50
Block 2: Day 1, Prod 2, Store 51, units, dollars ... row 400: Day 1, Prod 3, Store 100
Block 2250: Day 1, Prod 2999, Store 101 ... Day 1, Prod 2999, Store 150; Day 1, Prod 3000, Store 1 ... row 450000: Day 1, Prod 3000, Store 150

Ordered Fact Table Record Layout
3,000 products for each day; 150 store records per product.

I/O Data Block Access Results
Total rentals for a store: 3,000 records, 2,250 block I/Os.
Total rentals for a product: 150 records, 1 block I/O.
Total I/Os for store and product = 2,251.

Ordered Versus Unordered
Total workload for store and product:
Unordered = 3,150 I/Os
Ordered by store = 165 I/Os
Ordered by product = 2,251 I/Os
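The block I/O figures for the two sort orders can be reproduced by laying out one day of rows in each order and counting the distinct blocks a query touches. This is a minimal sketch under the slide's assumptions (200 rows per block, 150 stores, 3,000 products, every product rented in every store); the function names are illustrative, and stores and products are numbered from 0.

```python
ROWS_PER_BLOCK = 200
STORES = 150
PRODUCTS = 3000


def blocks_touched(row_positions) -> int:
    """Count the distinct data blocks that hold the given row positions."""
    return len({pos // ROWS_PER_BLOCK for pos in row_positions})


def io_sorted_by_store(store: int = 0, product: int = 0):
    """(store-total I/Os, product-total I/Os) when rows are ordered day/store/product."""
    store_rows = [store * PRODUCTS + p for p in range(PRODUCTS)]
    product_rows = [s * PRODUCTS + product for s in range(STORES)]
    return blocks_touched(store_rows), blocks_touched(product_rows)


def io_sorted_by_product(store: int = 0, product: int = 0):
    """(store-total I/Os, product-total I/Os) when rows are ordered day/product/store."""
    store_rows = [p * STORES + store for p in range(PRODUCTS)]
    product_rows = [product * STORES + s for s in range(STORES)]
    return blocks_touched(store_rows), blocks_touched(product_rows)


by_store = io_sorted_by_store()
by_product = io_sorted_by_product()
print("ordered by store:  ", by_store, "total", sum(by_store))
print("ordered by product:", by_product, "total", sum(by_product))
```

In this layout a product whose 150 rows straddle a block boundary reads two blocks rather than one (for example `io_sorted_by_product(product=1)`), so the slide's figure of 1 I/O per product is the best case.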

Step 2: Consider Including Summaries
Data locality can eliminate the need for some summaries. Take into account: primary access paths, response time requirements, load requirements, rebuild requirements. Performance rule of thumb: 10/20.

Summary selection is a balancing act. When modeling summaries, you must consider the following:
Data locality can eliminate the need for some summaries along the targeted access path.
Take into account: what access paths are used for aggregation analysis; response time requirements; the time it will take to load the aggregates; how long it will take to rebuild aggregates if a dimension structure changes.
Performance rule of thumb: 10/20.
If an aggregate takes less than 10 block I/Os to retrieve, it is not a summary candidate.
If an aggregate takes between 10 and 20 block I/Os to retrieve, it may be a summary candidate.
If an aggregate takes more than 20 block I/Os to retrieve, a summary is required.

Summary Analysis: Performance Criteria
If ordered by store:
15 I/Os for store total (contiguous): store summary not necessarily required.
150 I/Os for product total: product summary required.
If ordered by product:
1 I/O for product total: product summary not required.
2,250 I/Os for store total: store summary required.
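The 10/20 rule of thumb applied above can be written as a small classifier. This is a sketch only: the function name and the returned labels are illustrative, and the thresholds are the ones stated in the notes.

```python
def summary_need(block_ios: int) -> str:
    """Apply the 10/20 rule of thumb to the block I/Os needed for one aggregate."""
    if block_ios < 10:
        return "not a candidate"
    if block_ios <= 20:
        return "possible candidate"
    return "required"


# Block I/O counts from the Hollywood Rentals analysis.
cases = {
    "ordered by store, store total": 15,
    "ordered by store, product total": 150,
    "ordered by product, product total": 1,
    "ordered by product, store total": 2250,
}
verdicts = {label: summary_need(ios) for label, ios in cases.items()}
for label, verdict in verdicts.items():
    print(f"{label}: {verdict}")
```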

Summary Analysis: Impact of Summaries on Fact Table Size and Performance
Ordered by store with a product summary: 3,000 total summary rows per day. 2 I/Os required (1 I/O for total sales by product from the summary fact table; 1 I/O for total sales by store from the base fact table).
Ordered by product with a store summary: 150 total summary rows per day. 2 I/Os required (1 I/O for total sales by product from the base fact table; 1 I/O for total sales by store from the summary fact table).

Step 3: Which Combination Is Best?
Ordered by product with a store summary: 2 I/Os and 150 aggregated rows per day.
Rule of thumb: the primary access path dimension with the highest cardinality is the best candidate for the sort.

Order by Product and Create a Store Summary
In the example analysis, the best selection is to order the base fact table at load time by sorting on product, then create a summary for the store dimension. This strategy results in a big payoff in performance with a modest cost in additional size:
Substantial increase in block I/O performance over the ordered fact table without summaries, down to two block I/Os to retrieve the store and product totals.
Modest cost of only 150 aggregated rows added to the summary fact table per day.

Store Summary Strategy
The summary fact table will contain data for total product, by day, by store. On-the-fly aggregation analysis along the store dimension will perform well for all total product queries, given that only 150 rows per day need to be aggregated to the district, region, or total store levels in the store hierarchy.

Product Summary Strategy
Because the company records rentals of 3,000 products a day, you may also build one or more summaries within the product hierarchies. If so, these summaries should focus on the product hierarchy levels that are most commonly used for aggregate analysis (such as product category, product status, and so on).

Summary Navigation
Effective use of summary tables requires summary table awareness. Methods for summary navigation: warehouse database engine; proprietary summary-aware products; open summary-aware middleware; 3GL and metadata solutions.
[Figure: a query "select total_sales..." asks "Which summaries?"]

Having developed summary tables, you are now challenged with using them appropriately. The tool or the query mechanism must be summary table-aware. In other words, the existence of the summary tables must be known to the query.

Summary (Aggregate) Navigators
Summary navigators are software components that intercept the end user's SQL and transform it so as to use the best available summary (usually the smallest available table that can answer the user's request). The summary navigator maintains special metadata describing the current profile of summary tables stored in the data warehouse. It also should maintain statistics on queries, showing which aggregates are being used and which should be built to help slow-running queries.

Methods for Summary Navigation
Summary navigators can be located in:
The warehouse database engine. This is the best-case scenario, because this approach makes summary redirection (or query rewrite) accessible to all applications. For example, Oracle8i contains its own built-in summary navigator.
End-user query tools. In recent years, this approach has been the most common, owing primarily to the fact that few database engines incorporated their own navigator. However, this approach requires all query tools to maintain their own summary navigation facility and metadata layer.
Middleware tools that facilitate the navigation to the summary tables (older systems).
Your own summary navigation technique using 3GL code and metadata (older systems).
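The core decision a summary navigator makes, picking the smallest table whose grain is fine enough to answer the request, can be sketched as follows. Everything here is hypothetical: the table names, row counts, hierarchy metadata, and function names are invented for illustration and do not describe any particular product.

```python
from dataclasses import dataclass

# Hypothetical hierarchy metadata: levels ordered from finest to coarsest.
HIERARCHY = {
    "time": ["day", "month", "year"],
    "store": ["store", "district", "region"],
    "product": ["item", "category"],
}


@dataclass
class FactTable:
    name: str
    grain: dict  # dimension -> level at which this table stores data
    rows: int


def can_answer(table: FactTable, requested: dict) -> bool:
    """A table can answer a request if it is at least as fine on every requested dimension."""
    for dim, level in requested.items():
        levels = HIERARCHY[dim]
        if levels.index(table.grain[dim]) > levels.index(level):
            return False
    return True


def pick_table(tables, requested: dict) -> FactTable:
    """Return the smallest table able to answer the request."""
    candidates = [t for t in tables if can_answer(t, requested)]
    return min(candidates, key=lambda t: t.rows)


# Hypothetical metadata profile kept by the navigator.
TABLES = [
    FactTable("sales_fact", {"time": "day", "store": "store", "product": "item"}, 109_500_000),
    FactTable("sales_by_year", {"time": "year", "store": "store", "product": "item"}, 3_000_000),
    FactTable("sales_by_district_category",
              {"time": "day", "store": "district", "product": "category"}, 500_000),
]

print(pick_table(TABLES, {"time": "year", "store": "store", "product": "item"}).name)
print(pick_table(TABLES, {"time": "year", "store": "region"}).name)
print(pick_table(TABLES, {"time": "day", "store": "store"}).name)
```

A dimension left out of the request is treated as fully aggregated, so any grain can satisfy it. A real navigator would also rewrite the SQL and track query statistics, which this sketch leaves out.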

Managing Historical Summary Data in the Warehouse
Daily detail for the last 12 months. Data summarized monthly, quarterly, yearly.
[Figure: timeline 1993/1994, 1995, 1996, 1997]

Developing a strategy to manage summary tables in the warehouse is another (and major) design consideration. Because use of the data is the most important factor in determining which summaries should be created, you may not be able to determine this strategy immediately. Summaries do not have to be consistently applied across the warehouse. For example, you may want to examine more recent data in greater detail than older data. Therefore, you might keep daily data for the last 12 months, along with summarized monthly data. Older data might be summarized to the month, quarter, or year.

SUMMARIES IN ORACLE

Summary Management in Oracle8i
Summaries are created using materialized views and dimensions. Summary Advisor provides advice on creating, retaining, and dropping materialized views.
[Figure labels: Sales, Quantity, Products; Regional Sales, Quarterly Sales]

In Oracle8i, materialized views and dimensions can be used to implement summaries.

Materialized Views
A materialized view is an instantiation of a SQL statement that stores both the definition of a view and the rows resulting from the execution of the view. Like a view, it uses a query as the basis, but the query is executed at the time the view is created and the results are stored in a table. When a query can be satisfied with data in a materialized view, the server transforms the query to reference the view rather than the base tables. The process of modifying a query to use the view is called a query rewrite.

Dimensions
Oracle8i provides a new schema object type, called a dimension. Dimensions provide for the creation of hierarchical relationships between columns in one or more tables for rollup purposes. If summaries are created with Oracle8i dimensions, query rewrite options are greatly enhanced. Dimensions will be examined in greater detail later in this lesson.

Summary Advisor
Summary Advisor is a facility that can be used by a database administrator to study the usefulness of summaries based on utilization. It compares the performance benefit to storage costs, and advises on creating, retaining, or deleting summaries.

Creating a Materialized View Summary
    CREATE MATERIALIZED VIEW sales_sumry
      TABLESPACE sum_data
      STORAGE (INITIAL 200K NEXT 200K PCTINCREASE 0)
      PARALLEL(...)
      BUILD IMMEDIATE
      REFRESH FAST
      ENABLE QUERY REWRITE
    AS
    SELECT p.brand, c.city_name, t.month, SUM(s.amt) AS tot_sales
    . . .
    GROUP BY p.brand, c.city_name, t.month;
Materialized views can be defined with the same storage parameters as any other table and placed in the tablespace of your choice. You can also index and partition a materialized view to improve the performance of queries executed against it.
Example: The example creates a summary and populates it with the result of the query. The performance and storage costs of maintaining the materialized view have to be compared to the costs of reexecuting the original query whenever it is needed.

Query Rewrite in Oracle
Original query:
    SELECT p.brand, c.city_name, t.month, SUM(s.amt)
    FROM sales s, city c, timetab t, product p
    WHERE s.city_code = c.city_code
      AND s.state_code = c.state_code
      AND s.sdate = t.sdate
      AND s.prod_code = p.prod_code
    GROUP BY p.brand, c.city_name, t.month
    HAVING SUM(s.amt) > 5000000;
Rewritten query:
    SELECT brand, city_name, month, tot_sales
    FROM sales_sumry
    WHERE tot_sales > 5000000;
Query Rewrite in Oracle8i: Accessing a materialized view may be significantly faster than accessing the underlying base tables, so the cost-based optimizer rewrites a query to access the view when the query allows it. Query rewrite is the key benefit enabled by materialized views. The rewrite is transparent to applications; in this respect, its use is similar to the use of an index. Users do not need explicit privileges on materialized views to use them. Queries executed by any user with privileges on the underlying tables may be rewritten to access the materialized view. A materialized view can be enabled or disabled, and only an enabled materialized view is available for query rewrites.
Example: In the slide example, the optimizer is able to perform a query rewrite and use the summary created earlier to satisfy the query instead of the base SALES table. If the SALES table consists of several million rows and the materialized view contains a few thousand rows, the query will execute much faster.
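To see why the two statements on this slide return the same rows, the following sketch runs both against a tiny in-memory SQLite database. It is an emulation, not Oracle behavior: SQLite has no materialized views or automatic query rewrite, so `sales_sumry` is built as an ordinary table, the rewrite is done by hand, and the sample rows, the `threshold` parameter, and the added `ORDER BY` are assumptions made for the demonstration.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create the slide's tables with a few hypothetical rows and a summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (prod_code INTEGER PRIMARY KEY, brand TEXT);
        CREATE TABLE city (city_code INTEGER, state_code TEXT, city_name TEXT);
        CREATE TABLE timetab (sdate TEXT PRIMARY KEY, month TEXT);
        CREATE TABLE sales (prod_code INTEGER, city_code INTEGER,
                            state_code TEXT, sdate TEXT, amt INTEGER);
        INSERT INTO product VALUES (1, 'Acme'), (2, 'Zen');
        INSERT INTO city VALUES (10, 'CA', 'Fresno'), (20, 'NY', 'Albany');
        INSERT INTO timetab VALUES ('1997-01-05', '1997-01'),
                                   ('1997-01-20', '1997-01'),
                                   ('1997-02-03', '1997-02');
        INSERT INTO sales VALUES (1, 10, 'CA', '1997-01-05', 60),
                                 (1, 10, 'CA', '1997-01-20', 70),
                                 (2, 20, 'NY', '1997-01-05', 40),
                                 (1, 20, 'NY', '1997-02-03', 90),
                                 (2, 10, 'CA', '1997-02-03', 120);
        -- Stand-in for the materialized view: a plain table filled once.
        CREATE TABLE sales_sumry AS
            SELECT p.brand, c.city_name, t.month, SUM(s.amt) AS tot_sales
            FROM sales s, city c, timetab t, product p
            WHERE s.city_code = c.city_code AND s.state_code = c.state_code
              AND s.sdate = t.sdate AND s.prod_code = p.prod_code
            GROUP BY p.brand, c.city_name, t.month;
    """)
    return conn

def base_query(conn, threshold):
    """The original query against the detail tables."""
    return conn.execute("""
        SELECT p.brand, c.city_name, t.month, SUM(s.amt)
        FROM sales s, city c, timetab t, product p
        WHERE s.city_code = c.city_code AND s.state_code = c.state_code
          AND s.sdate = t.sdate AND s.prod_code = p.prod_code
        GROUP BY p.brand, c.city_name, t.month
        HAVING SUM(s.amt) > ?
        ORDER BY p.brand, c.city_name, t.month""", (threshold,)).fetchall()

def rewritten_query(conn, threshold):
    """The hand-rewritten query against the summary table."""
    return conn.execute("""
        SELECT brand, city_name, month, tot_sales
        FROM sales_sumry
        WHERE tot_sales > ?
        ORDER BY brand, city_name, month""", (threshold,)).fetchall()
```

Calling `base_query(conn, 100)` and `rewritten_query(conn, 100)` on the demo database yields identical result lists, which is the equivalence the optimizer relies on when it substitutes the summary.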

Refreshing Materialized Views
Slide: Materialized views need to be refreshed to reflect changes to the base table data, using one of the available refresh types: Complete, Fast, Force, Never.
Materialized View Refresh: Refresh is the operation used to synchronize the contents of a materialized view with the data in the base tables. Depending on the activity on the base tables and the accuracy of the information required, the materialized view may be refreshed more or less often. Refresh options include:
- Complete: A complete refresh truncates the existing data and reinserts all the data from the detail tables by reexecuting the query definition from the CREATE command.
- Fast: A fast refresh applies only the changes made since the last refresh. Two types are available. Fast refresh using materialized view logs: all changes to the base tables are captured in a log and then applied to the materialized view. Fast refresh using ROWID range: a materialized view can be fast refreshed after direct path loads, based on the ROWIDs of the new rows (direct loader logs are required).
- Force: A view defined with a refresh type of Force refreshes with the fast mechanism if that is possible, and otherwise uses a complete refresh. Force is the default refresh type.
- Never: The Never option suppresses all refreshes of the materialized view.
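The difference between the Complete, Fast, and Force options can be illustrated with a small Python model. This is a conceptual sketch only: the function names are hypothetical, the change log records inserts as (key, amount) pairs, and updates and deletes, which a real materialized view log also has to handle, are left out.

```python
from collections import defaultdict

def complete_refresh(base_rows):
    """Rebuild the summary from scratch by re-aggregating every detail row."""
    summary = defaultdict(int)
    for key, amt in base_rows:
        summary[key] += amt
    return dict(summary)

def fast_refresh(summary, change_log):
    """Apply only the inserts logged since the last refresh, then empty the log."""
    for key, delta in change_log:
        summary[key] = summary.get(key, 0) + delta
    change_log.clear()
    return summary

def force_refresh(summary, base_rows, change_log):
    """Use the fast path when a change log exists; otherwise fall back to complete."""
    if change_log is not None:
        return fast_refresh(summary, change_log)
    return complete_refresh(base_rows)
```

Both paths end in the same summary; the fast path simply touches far fewer rows when the log is short compared with the base table.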

Oracle Dimensions
Slide: A dimension is a data dictionary structure that defines hierarchies based on existing columns. Dimensions are optional but highly recommended, because they enable additional query rewrites without the use of constraints, help document hierarchies, and can be used by online analytical processing (OLAP) tools.
Oracle8i Dimensions: Dimensions in Oracle8i are purely metadata. They are data dictionary structures that define hierarchies based on columns in existing database tables. Although they are optional, they are highly recommended, for the following reasons:
- They enable additional rewrite possibilities without the use of constraints. Implementing constraints may not be desirable in a data warehouse for performance reasons.
- They help document dimensions and hierarchies explicitly.
- They can be used by online analytical processing (OLAP) tools.

Dimensions and Hierarchies in Oracle
Slide diagram (Calendar hierarchy, top to bottom): All, Year_Key, Quarter_Key, Month_Key (with attribute Month_Desc), Sales_Date. Callouts: Level keys, Hierarchy, Attribute.
Dimensions: Oracle8i dimensions describe analytic business entities such as products, departments, and time in a hierarchical, categorized manner. An Oracle8i dimension can consist of one or more hierarchies. In the example shown, the time dimension consists of a calendar hierarchy.
Hierarchies: A hierarchy consists of multiple levels. Each value at a lower level in the hierarchy is the child of one and only one parent at the next higher level. A hierarchy is therefore a 1:n relationship between levels, with the parent level representing a level of aggregation of the child level. In the example, the calendar hierarchy consists of sales date, month, quarter, and year. The arrows indicate the direction in which the hierarchy is traversed to roll up data at one level into aggregate information at the next level. For example, rolling up daily data yields monthly data, rolling up monthly data yields quarterly data, and so on.
Level Keys and Attributes: A level key is used to identify one level in a hierarchy. Using surrogate keys to identify hierarchical elements during the dimensional design phase further leverages the performance advantage provided by level keys. A level may have additional attributes, which can be determined given the level key. Attributes can be used as aliases for a level. In the example, MONTH_KEY (defined as two digits) is the level key that identifies a month, and MONTH_DESC is an attribute that can be used as an alias for a month.
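The rule that each child value has one and only one parent can be checked mechanically. The sketch below is a hypothetical helper, not an Oracle facility: it takes rows of a time table as dictionaries and reports, for each adjacent pair of levels, any child value that maps to more than one parent. The sample key values embed the year (for example `1997-01`) so that each level value is unique, which is an assumption of this demonstration rather than the two-digit MONTH_KEY described above.

```python
# Levels of the calendar hierarchy, lowest first (column names from the TIME table).
CALENDAR = ["sales_date", "month_key", "quarter_key", "year_key"]

def find_violations(rows, child, parent):
    """Return {child_value: [parents...]} for children with more than one parent."""
    parents = {}
    for row in rows:
        parents.setdefault(row[child], set()).add(row[parent])
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}

def check_hierarchy(rows, levels):
    """Check every adjacent (child, parent) pair; keep only pairs with violations."""
    report = {}
    for child, parent in zip(levels, levels[1:]):
        bad = find_violations(rows, child, parent)
        if bad:
            report[(child, parent)] = bad
    return report

# Hypothetical sample rows that satisfy the 1:n rule at every level.
SAMPLE_ROWS = [
    {"sales_date": "1997-01-05", "month_key": "1997-01",
     "quarter_key": "1997-Q1", "year_key": "1997"},
    {"sales_date": "1997-02-03", "month_key": "1997-02",
     "quarter_key": "1997-Q1", "year_key": "1997"},
    {"sales_date": "1997-04-10", "month_key": "1997-04",
     "quarter_key": "1997-Q2", "year_key": "1997"},
]
```

An empty report means the data is safe to roll up level by level; a non-empty report points at the values that would be double counted or misassigned.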

Dimension Example
Slide: mapping from columns of table TIME to levels of dimension TIME_DIM:
    YEAR_KEY      ->  YR
    QUARTER_KEY   ->  QTR
    MONTH_KEY     ->  MON (attribute: MONTH_DESC)
    SALES_DATE    ->  SDATE
Dimensions, and the hierarchical relationships established within them, can be based on columns in a single table (or on columns from several tables in the case of normalized or "snowflake" schemas). In the example, the dimension TIME_DIM is based on the table TIME and has four levels: The highest level in the hierarchy consists of the column YEAR_KEY. The next level is derived from the column QUARTER_KEY. The third level has the MONTH_KEY column as the key and MONTH_DESC as an attribute. The lowest level is based on the column SALES_DATE.

Defining Dimensions and Hierarchies
Slide diagram (top to bottom): Year, Quarter, Month, Sales Date.
    CREATE DIMENSION time_dim
      LEVEL sdate IS time.sales_date
      LEVEL mon   IS time.month_key
      LEVEL qtr   IS time.quarter_key
      LEVEL yr    IS time.year_key
      HIERARCHY calendar_rollup (
        sdate CHILD OF
        mon   CHILD OF
        qtr   CHILD OF
        yr )
      ATTRIBUTE mon DETERMINES month_desc;
A new system privilege, CREATE DIMENSION, is required to create a dimension in one's own schema based on tables that are within the same schema. Another new privilege, CREATE ANY DIMENSION, allows a user to create dimensions in any schema. In the example shown, the TIME_DIM dimension is based on the table TIME.

Dimensions with Multiple Hierarchies
Slide diagram: the WEEK hierarchy (DT, WK, YR) and the CALENDAR hierarchy (DT, MON, QTR, YR) share the DT and YR levels.
The previous example showed a single hierarchy within the time dimension, but it is possible to have multiple hierarchies. For example, the pair of hierarchies shown in the diagram can be created within a single dimension. The statement to do this is as follows:
    CREATE DIMENSION time_dim
      LEVEL dt  IS time.sales_date
      LEVEL wk  IS time.week_key
      LEVEL mon IS time.month_key
      LEVEL qtr IS time.quarter_key
      LEVEL yr  IS time.year_key
      HIERARCHY cal  ( dt CHILD OF mon CHILD OF qtr CHILD OF yr )
      HIERARCHY week ( dt CHILD OF wk CHILD OF yr );

Rewrites Using Dimensions in Oracle
Slide: The following rewrite uses a rollup along the TIME_DIM dimension.
Original query:
    SELECT t.year, p.brand, c.city_name, SUM(s.amt)
    FROM sales s, city c, time t, product p
    WHERE s.sales_date = t.sdate
      AND s.city_name = c.city_name
      AND s.state_code = c.state_code
      AND s.prod_code = p.prod_code
    GROUP BY t.year, p.brand, c.city_name;
Rewritten query:
    SELECT v.year, s.brand, s.city_name, SUM(s.tot_sales)
    FROM sales_sumry s,
         (SELECT DISTINCT t.month, t.year FROM time t) v
    WHERE s.month = v.month
    GROUP BY v.year, s.brand, s.city_name;
Rewrites Using Dimensions in Oracle8i: The example shows a rewrite that is enabled by the TIME_DIM dimension. The relationship between month and year is inferred from the definition of the dimension and is used to roll up the monthly sales summary data to obtain yearly sales.
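The rollup from monthly summary rows to yearly totals can be reproduced by hand in SQLite. As before, this is an emulation under stated assumptions: the schema is simplified so that `sales` already carries `brand` and `city_name`, the time table is called `timetab`, `sales_sumry` is an ordinary table, the rows are invented, and an `ORDER BY` is added so the two result lists can be compared directly.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create simplified detail tables plus a monthly summary (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE timetab (sdate TEXT PRIMARY KEY, month TEXT, year TEXT);
        CREATE TABLE sales (brand TEXT, city_name TEXT, sdate TEXT, amt INTEGER);
        INSERT INTO timetab VALUES ('1996-12-30', '1996-12', '1996'),
                                   ('1997-01-05', '1997-01', '1997'),
                                   ('1997-01-20', '1997-01', '1997'),
                                   ('1997-02-03', '1997-02', '1997');
        INSERT INTO sales VALUES ('Acme', 'Fresno', '1996-12-30', 25),
                                 ('Acme', 'Fresno', '1997-01-05', 60),
                                 ('Acme', 'Fresno', '1997-01-20', 70),
                                 ('Acme', 'Fresno', '1997-02-03', 90),
                                 ('Zen',  'Albany', '1997-02-03', 40);
        -- Monthly summary standing in for the materialized view.
        CREATE TABLE sales_sumry AS
            SELECT s.brand, s.city_name, t.month, SUM(s.amt) AS tot_sales
            FROM sales s JOIN timetab t ON s.sdate = t.sdate
            GROUP BY s.brand, s.city_name, t.month;
    """)
    return conn

# Yearly totals computed from the detail rows.
YEARLY_FROM_DETAIL = """
    SELECT t.year, s.brand, s.city_name, SUM(s.amt)
    FROM sales s JOIN timetab t ON s.sdate = t.sdate
    GROUP BY t.year, s.brand, s.city_name
    ORDER BY t.year, s.brand, s.city_name
"""

# Yearly totals rolled up from the monthly summary via the month-to-year mapping.
YEARLY_FROM_SUMMARY = """
    SELECT v.year, s.brand, s.city_name, SUM(s.tot_sales)
    FROM sales_sumry s,
         (SELECT DISTINCT t.month, t.year FROM timetab t) v
    WHERE s.month = v.month
    GROUP BY v.year, s.brand, s.city_name
    ORDER BY v.year, s.brand, s.city_name
"""
```

The `DISTINCT` in the inline view matters: the time table has one row per date, so joining the monthly summary to it without collapsing to one row per month would count each monthly total once per day.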

Summary Advisor in Oracle
Slide diagram: inputs are Oracle Trace (workload, optional) and the data dictionary; both feed the Summary Advisor (DBMS_OLAP package); outputs are summary usage, space requirements, and summary recommendations.
Summary Advisor in Oracle8i: The DBMS_OLAP package contains several procedures and functions to manage summaries in a data warehouse. The summary advisory functions within the package use two major sources of information:
- Workload statistics: These can be collected either using Oracle Trace in the Enterprise Manager tuning pack or through third-party tools. A new event, MATERIALIZED VIEW USAGE, is available with Oracle Trace to collect this information.
- Data dictionary: The data dictionary information used by the advisory functions includes summary and dimension data.
Information that can be obtained from the Summary Advisor includes:
- Summary usage: the number of times a rewrite was made to use a summary, the space used by a summary, a cost-benefit ratio for each summary, and so on.
- Summary recommendations: creation, retention, and dropping of summaries.
- Space requirements based on queries for possible summaries.

Summary
In this lesson, you should have learned how to:
- Explain why summaries are used in the warehouse and list the benefits of summary tables.
- Discuss summary table configurations.
- Describe guidelines for selecting dimensions and summary levels.
- Identify the constraints on managing summary tables.
- Discuss summary management techniques in Oracle.

Practice
This practice covers the following topics:
- Estimating the size of the summary fact table if summaries are created by product and time.
- Developing a summary table strategy to support the user's business requirements.
Practice 6-1 Overview: See Appendix A, "Practice Solutions," for solutions to this practice.

QUESTIONS