Department of Informatics

Application of Fuzzy Classification to a Data Warehouse in E-Health

Details of Authors
Michael Kaufmann
Thesis Type: 
Submission Date: 1 April 2006

The ability to analyze large amounts of data and extract valuable information from them is a competitive advantage for any organization. The technologies of data warehousing, OLAP, and data classification support that ability. A data warehouse is a central data pool that integrates heterogeneous data sources and provides strategic information for analysis and decision support. Online Analytical Processing (OLAP) is an approach to data analysis in which data is consolidated and aggregated with respect to multiple dimensions of interest. The idea is to condense large amounts of data by summarizing and aggregating the data elements that fall into each cell of a data cube. Classification reduces an arbitrarily large number of data elements to an arbitrarily small set of classes, which greatly coarsens the granularity of the data. In OLAP, classification is used to consolidate dimensional attributes. Fuzzy classification is achieved by modeling the classes of data elements as fuzzy sets. Thus, a data element is assigned to one or more classes, each to a certain degree.
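To make the idea of graded class membership concrete, here is a minimal Python sketch. The attribute (patient age), the class names, and the breakpoints 30 and 50 are hypothetical choices made for illustration only; they are not taken from the thesis, and the piecewise-linear membership functions are just one common way to model a fuzzy set.

```python
def falling(x, lo, hi):
    """Membership that is 1 up to lo, 0 from hi on, and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Complement of falling: 0 up to lo, 1 from hi on."""
    return 1.0 - falling(x, lo, hi)


def classify_age(age):
    """Return the degree to which an age belongs to each (hypothetical) class."""
    return {
        "young": falling(age, 30, 50),  # assumed breakpoints
        "old": rising(age, 30, 50),
    }


print(classify_age(35))  # {'young': 0.75, 'old': 0.25}
```

A crisp classification would place a 35-year-old entirely in one class; here the same element belongs to both classes, with degrees that sum to 1.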
This thesis investigates the application of fuzzy classification to OLAP data analysis. Specifically, the approach is to fuzzify the classification of dimensions in OLAP cubes. A framework has been developed that allows data cubes to be described with fuzzy dimension hierarchies. This framework has been implemented in a prototype computer program capable of calculating and manipulating OLAP cubes with fuzzy dimension classification. The prototype allows the definition of fuzzy contexts on dimensional attributes, as well as the definition of OLAP cubes on base relations. For every dimension, the level of consolidation can be drilled down or rolled up, and the cube can be sliced and diced. Data consolidation can be achieved by crisp or fuzzy classification.
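The difference between crisp and fuzzy consolidation can be sketched in a few lines of Python. This is an illustrative assumption, not the prototype's actual method: the abstract does not specify the aggregation operators, so the sketch simply weights each measure by its membership degree. The fact data, class names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical base relation: (membership degrees per age class, treatment cost).
facts = [
    ({"young": 1.0}, 100.0),
    ({"young": 0.75, "old": 0.25}, 200.0),
    ({"old": 1.0}, 400.0),
]


def fuzzy_rollup(facts):
    """Sum each measure into every class, weighted by its membership degree."""
    totals = defaultdict(float)
    for memberships, measure in facts:
        for cls, degree in memberships.items():
            totals[cls] += degree * measure
    return dict(totals)


def crisp_rollup(facts):
    """Sum each measure into the single class with the highest degree."""
    totals = defaultdict(float)
    for memberships, measure in facts:
        best = max(memberships, key=memberships.get)
        totals[best] += measure
    return dict(totals)


print(fuzzy_rollup(facts))  # {'young': 250.0, 'old': 450.0}
print(crisp_rollup(facts))  # {'young': 300.0, 'old': 400.0}
```

Because the membership degrees of each fact sum to 1 in this example, both roll-ups preserve the grand total of 700; only its distribution over the classes differs.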

Keywords: Data warehouse, fuzzy classification, e-Health, OLAP, multidimensional data analysis, fuzzy data consolidation, fuzzy contexts, fuzzy data aggregation.