The hierarchical agglomerative clustering with Gower index: A methodology for automatic design of OLAP cube in ecological data processing context

Publication Type: Journal Article
Year of Publication: 2015
Authors: Sautot L, Faivre B, Journaux L, Molin P
Journal: Ecological Informatics
Volume: 26
Pages: 217-230
Date Published: March
Type of Article: Article
ISSN: 1574-9541
Keywords: Automatic design, Bird population, Hierarchical agglomerative clustering, OLAP
Abstract

OLAP systems can benefit ecological studies. Ecology follows and analyzes phenomena across space and time and according to several parameters, and OLAP systems let ecologists browse large datasets. One focus of current research on OLAP systems is the automatic design of OLAP cubes and data warehouse schemas; such work makes OLAP technology accessible to users who are not information technology experts. To be effective, however, automatic OLAP design must handle a variety of cases. Moreover, OLAP technology is built on the concept of hierarchy, so OLAP system designers often use hierarchical clustering methods. In this article, we propose using hierarchical agglomerative clustering with a metric that comes from ecological studies, the Gower similarity index, to build hierarchical dimensions in an OLAP cube automatically. With this similarity index, we can perform hierarchical clustering on heterogeneous datasets that contain both qualitative and quantitative variables. We present a prototype system that automatically builds dimensions for an OLAP cube, and we measure its performance as a function of the number of clustered individuals and the number of variables used for clustering. From these measurements we can estimate performance on a large dataset. Thus, using the Gower index in hierarchical agglomerative clustering makes it possible to handle heterogeneous datasets with missing values when building OLAP cubes automatically. With this methodology, we can build new dimensions based on hierarchies in the data that are not evident. Data mining methods can complement expert knowledge during the design of an OLAP cube, because these methods can reveal the inherent structure of the data. (C) 2014 Elsevier B.V. All rights reserved.
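The core idea in the abstract is that the Gower index gives a single similarity for records that mix qualitative and quantitative variables and may have missing values, and that agglomerative clustering on it yields a hierarchy whose cut levels can serve as levels of an OLAP dimension. The following is a minimal, stdlib-only Python sketch of that idea under stated assumptions, not the authors' prototype. The abstract does not give the linkage criterion, the variable weights, or how hierarchy levels are chosen. Average linkage, equal weights, and cutting at a fixed number of clusters are illustrative assumptions. The function names (`gower_similarity`, `variable_ranges`, `average_linkage`, `cut`) and the sample site data are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def variable_ranges(rows: Sequence[Sequence[Value]], is_numeric: Sequence[bool]) -> List[float]:
    """Range R_k of each numeric variable over non-missing values (0.0 for qualitative ones)."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        vals = [r[k] for r in rows if r[k] is not None]
        ranges.append(max(vals) - min(vals) if numeric and vals else 0.0)
    return ranges


def gower_similarity(a, b, is_numeric, ranges) -> float:
    """Gower similarity with equal weights (assumption).

    Numeric variable k contributes 1 - |a_k - b_k| / R_k; a qualitative
    variable contributes 1 if the categories match, else 0. A variable
    missing in either record gets weight 0 and is left out of the average.
    """
    total, weight = 0.0, 0
    for k, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # missing value: exclude this variable for this pair
        if is_numeric[k]:
            s = 1.0 if ranges[k] == 0 else 1.0 - abs(x - y) / ranges[k]
        else:
            s = 1.0 if x == y else 0.0
        total += s
        weight += 1
    # No comparable variable: treated as maximally dissimilar (assumption).
    return total / weight if weight else 0.0


def average_linkage(rows, is_numeric):
    """Naive agglomerative clustering on Gower distance d = 1 - S.

    Uses average linkage (an assumption, not taken from the paper) and
    returns the merge history as a list of (cluster_a, cluster_b, distance).
    """
    ranges = variable_ranges(rows, is_numeric)
    n = len(rows)
    dist = [[1.0 - gower_similarity(rows[i], rows[j], is_numeric, ranges)
             for j in range(n)] for i in range(n)]
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            a, b = clusters[p], clusters[q]
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, p, q)
        d, p, q = best
        a, b = clusters[p], clusters[q]
        merges.append((a, b, d))
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [a | b]
    return merges


def cut(n: int, merges, k: int) -> List[List[int]]:
    """Partition into k clusters: one candidate level of an OLAP dimension."""
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in merges[: n - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical survey sites: temperature, habitat type, bird count (one missing).
    sites = [[10.0, "forest", 5.0], [11.0, "forest", None],
             [30.0, "wetland", 40.0], [29.0, "wetland", 38.0]]
    types = [True, False, True]
    history = average_linkage(sites, types)
    print(cut(len(sites), history, 2))  # -> [[0, 1], [2, 3]]
```

Each cut of the merge history gives one grouping of the individuals. Nesting several cuts (for example k = 4, 2, 1) produces a chain of levels that could be exposed as a dimension hierarchy. The O(n^3) loop is chosen here for clarity. A real system on large datasets would use an optimized linkage routine.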

DOI: 10.1016/j.ecoinf.2014.07.011