Data Cubes
DEFINITION: A data cube is a type of multidimensional matrix that lets users explore and analyze a collection of data from many different perspectives, usually considering three factors (dimensions) at a time.
When we try to extract information from a stack of data, we need tools to help us find what's relevant and what's important and to explore different scenarios. A report, whether printed on paper or viewed on-screen, is at best a two-dimensional representation of data, a table using columns and rows. That's sufficient when we have only two factors to consider, but in the real world we need more powerful tools.
Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of one another.
But data cubes aren't restricted to just three dimensions. Most online analytical processing (OLAP) systems can build data cubes with many more dimensions—Microsoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions. We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing such higher-dimensional entities in spatial or geometric terms can be a problem.
In practice, therefore, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
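To make the stacking and indexing concrete, here is a minimal Python sketch. The dimension names (product, region, quarter), the sales figures and the helper functions are all invented for illustration; a real OLAP product exposes the same ideas through its own interfaces.

```python
# Hypothetical dimension members; none of these names come from a real system.
PRODUCTS = ["widget", "gadget"]
REGIONS = ["east", "west", "north"]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# cube[q][p][r] holds unit sales: one 2-D product-by-region table per quarter,
# with the four tables stacked on top of one another.
cube = [
    [[10, 12, 7], [4, 9, 3]],   # Q1
    [[11, 14, 6], [5, 8, 2]],   # Q2
    [[13, 15, 9], [6, 7, 4]],   # Q3
    [[16, 18, 8], [7, 9, 5]],   # Q4
]

def cell(product, region, quarter):
    """Index into the cube on all three dimensions at once."""
    q = QUARTERS.index(quarter)
    p = PRODUCTS.index(product)
    r = REGIONS.index(region)
    return cube[q][p][r]

def slice_by_quarter(quarter):
    """Fix one dimension and get back an ordinary 2-D table."""
    return cube[QUARTERS.index(quarter)]

def series(product, region):
    """Fix two dimensions and read along the third (time)."""
    p, r = PRODUCTS.index(product), REGIONS.index(region)
    return [table[p][r] for table in cube]
```

Calling `slice_by_quarter("Q1")` returns one of the stacked tables, while `series("widget", "east")` walks through the stack for a single product and region.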
Relational or Multidimensional?
Since data cubes are such a useful interpretation tool, most OLAP products are built around a structure in which the cube is modeled as a multidimensional array. These multidimensional OLAP, or MOLAP, products typically run faster than other approaches, primarily because it's possible to index directly into the data cube's structure to collect subsets of data.
However, for very large data sets with many dimensions, MOLAP solutions aren't always so effective. As the number of dimensions increases, the cube becomes sparser; that is, many cells representing specific attribute combinations are empty, containing no aggregated data. As with other types of sparse databases, this tends to increase storage requirements, sometimes to unacceptable levels. Compression techniques can help, but using them tends to destroy MOLAP's natural indexing.
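A little arithmetic shows how quickly sparsity sets in. The sketch below assumes a hypothetical workload of 1,000 recorded attribute combinations and 10 members per dimension; both numbers are made up purely to illustrate the trend.

```python
from math import prod

def dense_cell_count(dimension_sizes):
    """Number of cells a fully materialized multidimensional array needs."""
    return prod(dimension_sizes)

def density(non_empty_cells, dimension_sizes):
    """Fraction of the array's cells that actually hold data."""
    return non_empty_cells / dense_cell_count(dimension_sizes)

# Assumed workload: the same 1,000 non-empty cells spread over cubes
# with 3, 4 and 6 dimensions of 10 members each.
facts = 1_000
for n_dims in (3, 4, 6):
    sizes = [10] * n_dims
    print(n_dims, dense_cell_count(sizes), density(facts, sizes))
```

With the amount of real data held constant, each added dimension multiplies the number of cells the array must reserve, so the share of cells that contain anything shrinks accordingly.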
Data cubes can be built in other ways. Relational OLAP uses the relational database model. The ROLAP data cube is implemented as a collection of relational tables (up to twice as many as the number of dimensions) instead of as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.
Because the cuboids are conventional database tables, we can process and query them using traditional RDBMS techniques, such as indexes and joins. This format is likely to be efficient for large data collections, since the tables must include only data cube cells that actually contain data.
However, ROLAP cubes lack the built-in indexing of a MOLAP implementation. Instead, each record in a given table must contain all attribute values in addition to any aggregated or summary values. This extra overhead may offset some of the space savings, and the absence of an implicit index means that we must provide one explicitly.
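The relational approach can be sketched with Python's built-in sqlite3 module. The table name `sales_cuboid`, its columns and the sample rows are hypothetical; the point is only that each row carries its attribute values alongside the measure, and that the index has to be created explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One cuboid stored as an ordinary table: three attribute columns plus a measure.
conn.execute(
    "CREATE TABLE sales_cuboid ("
    " product TEXT, region TEXT, quarter TEXT, units INTEGER)"
)

# Only combinations that actually have data are stored; empty cells take no space.
rows = [
    ("widget", "east", "Q1", 10),
    ("widget", "west", "Q1", 12),
    ("gadget", "east", "Q2", 5),
]
conn.executemany("INSERT INTO sales_cuboid VALUES (?, ?, ?, ?)", rows)

# There is no implicit index, so we provide one ourselves.
conn.execute(
    "CREATE INDEX idx_sales_dims ON sales_cuboid (product, region, quarter)"
)

def units(product, region, quarter):
    """Look up one cell; returns None when the combination was never stored."""
    row = conn.execute(
        "SELECT units FROM sales_cuboid"
        " WHERE product = ? AND region = ? AND quarter = ?",
        (product, region, quarter),
    ).fetchone()
    return row[0] if row else None

def units_by_product():
    """A coarser view of the same data, computed with a plain GROUP BY."""
    return dict(conn.execute(
        "SELECT product, SUM(units) FROM sales_cuboid GROUP BY product"
    ).fetchall())
```

Here `units("gadget", "west", "Q1")` returns `None` because that cell was never stored, which is exactly where the space saving comes from.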
From a structural perspective, data cubes are made up of two elements: dimensions and measures. Dimensions have already been explained; measures are simply the actual data values stored in the cells.
It's important to keep in mind that the data in a data cube has already been processed and aggregated into cube form. Thus we normally don't perform calculations within a data cube. This also means that we're not looking at real-time, dynamic data in a data cube.
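As a rough illustration of what "already processed and aggregated" means, the following sketch rolls a handful of raw order lines up into cube cells. The record layout and the name `build_cube` are assumptions made for this example, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical raw order lines: (product, region, quarter, units).
orders = [
    ("widget", "east", "Q1", 4),
    ("widget", "east", "Q1", 6),
    ("widget", "west", "Q1", 12),
    ("gadget", "east", "Q2", 5),
]

def build_cube(records):
    """Aggregate raw records into one summed measure per attribute combination."""
    cells = defaultdict(int)
    for product, region, quarter, n in records:
        cells[(product, region, quarter)] += n
    return dict(cells)

# The cube is a snapshot taken at build time.
sales_cube = build_cube(orders)

# An order that arrives later does not show up until the cube is rebuilt.
orders.append(("widget", "east", "Q1", 3))
```

After the last line, `sales_cube` still reports the total computed at build time for widget/east/Q1, which is why a cube is not a view of real-time data.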
The data contained within a cube has already been summarized to show figures such as unit sales, store sales, regional sales, net sale profits and average time for order fulfillment. With this data, an analyst can efficiently analyze any or all of those figures for any or all products, customers, sales agents and more. Thus data cubes can be extremely helpful in establishing trends and analyzing performance. In contrast, tables are best suited to reporting standardized operational scenarios.
Selected Reading
Data Cubes
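Definition: A data cube is a kind of multidimensional matrix that lets users explore and analyze a data set from several angles, usually taking three factors (dimensions) into account at a time.
When we try to extract information from a pile of data, we need tools that help us find what is relevant and important and that let us examine different scenarios. A report, whether printed on paper or shown on a screen, is a two-dimensional representation of data: a table made of rows and columns. That is enough when there are only two factors to consider, but the real world calls for more powerful tools.
A data cube is a multidimensional extension of a two-dimensional table, just as a cube in geometry is the three-dimensional extension of a square. The word "cube" suggests a three-dimensional object, and a three-dimensional data cube can be pictured as a set of similarly structured two-dimensional tables stacked on one another.
Data cubes are not limited to three dimensions, however. Most online analytical processing (OLAP) systems can build data cubes with many dimensions; Microsoft SQL Server 2000 Analysis Services, for example, allows up to 64 (although picturing higher-dimensional entities in spatial or geometric terms remains difficult).
In practice we often build data cubes with many dimensions but tend to look at only three at a time. Data cubes are valuable because we can index the cube on one or more of its dimensions.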
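Relational or Multidimensional?
Because the data cube is such a useful interpretive tool, most OLAP products are built around a structure that models the cube as a multidimensional array. These multidimensional OLAP (MOLAP) products usually run faster than other approaches, because we can index directly into the cube's structure to collect subsets of the data.
For very large data sets with many dimensions, however, MOLAP is not always effective. As the number of dimensions grows, the cube becomes sparser; that is, many of the cells representing particular attribute combinations are empty and hold no aggregated data. As with other kinds of sparse databases, this tends to increase storage requirements, sometimes to an unacceptable degree. Compression helps somewhat, but applying it tends to destroy MOLAP's natural indexing.
Data cubes can also be built in other ways. Relational OLAP (ROLAP) uses the relational database model. A ROLAP data cube is implemented as a collection of relational tables (up to twice as many as the number of dimensions) rather than as a multidimensional array. Each of these tables, called a cuboid, represents a particular view.
Because cuboids are ordinary database tables, we can process and query them with traditional RDBMS techniques such as indexes and joins. This form is likely to be efficient for large data collections, since the tables need to hold only the cube cells that actually contain data.
ROLAP, however, lacks the built-in indexing of a MOLAP implementation. Instead, every record in a given table must carry all of the attribute values as well as any aggregated or summary values. This extra overhead may cancel out some of the space saved, and the lack of an implicit index means that we must supply an explicit one.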
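Structurally, a data cube consists of two elements: dimensions and measures. Dimensions have been explained above; measures are the actual data values.
It is important to remember that the data in a data cube has already been processed and aggregated into cube form, so we do not normally perform calculations inside the cube. It also means that the data we see in a cube is not real-time, dynamic data.
The data in the cube has already been summarized into figures such as unit sales, store sales, regional sales, net sales profit and average order fulfillment time. With this data an analyst can analyze any or all of those figures for any or all products, customers, sales agents and so on. Data cubes are therefore very useful for identifying trends and analyzing performance, whereas tables are best suited to reporting standardized operational situations.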