8 research outputs found

    Análise exploratória de hierarquias em base de dados multidimensionais

    Get PDF
    Cada vez mais, as empresas e organizações utilizam bases de dados multidimensionais e ferramentas OLAP como forma de organizar informação proveniente de sistemas transacionais com o objetivo de aceder e analisar dados com elevada flexibilidade e desempenho. No modelo de dados OLAP, a informação é concetualmente organizada em cubos. Cada dimensão do cubo tem uma hierarquia associada, o que possibilita analisar os dados em diferentes níveis de agregação. Apresenta-se uma metodologia que explora os diferentes níveis de agregação das hierarquias para uma análise exploratória assim como previsões para diferentes horizontes temporais. Esta metodologia mostrou-se muito eficiente, apresentando melhores resultados em comparação com as técnicas usuais de previsão. Os resultados das previsões realizadas são promissores e coerentes com a respetiva análise exploratória.Increasingly, companies and organizations use multidimensional databases and OLAP tools to structure and organize information from transactional systems, with the objective of accessing and analyzing data with high level of flexibility and performance. In the OLAP models, data is conceptually organized into cubes. Each cube’s dimension has an associated hierarchy, which allows for data analysis at different levels of aggregation. A methodology is presented, which explores the different levels of aggregation of the hierarchies for an exploratory analysis as well as the forecasts for different time horizons. This methodology proved to be very efficient, with better results than those obtained from the usual techniques of forecasting. The forecasting results are promising and in line with the respective exploratory analysis

    Multidimensional Prediction Models When the Resolution Context Changes

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-23525-7_31Multidimensional data is systematically analysed at multiple granularities by applying aggregate and disaggregate operators (e.g., by the use of OLAP tools). For instance, in a supermarket we may want to predict sales of tomatoes for next week, but we may also be interested in predicting sales for all vegetables (higher up in the product hierarchy) for next Friday (lower down in the time dimension). While the domain and data are the same, the operating context is different. We explore several approaches for multidimensional data when predictions have to be made at different levels (or contexts) of aggregation. One method relies on the same resolution, another approach aggregates predictions bottom-up, a third approach disaggregates predictions top-down and a final technique corrects predictions using the relation between levels. We show how these strategies behave when the resolution context changes, using several machine learning techniques in four application domains.This work was supported by the Spanish MINECO under grants TIN 2010-21062-C02-02 and TIN 2013-45732-C4-1-P, and the REFRAME project, granted by the European Coordinated Research on Longterm Challenges in Information and Communication Sciences Technologies ERA-Net (CHIST-ERA), and funded by MINECO in Spain (PCIN-2013-037) and by Generalitat Valenciana PROMETEOII2015/013.Martínez Usó, A.; Hernández Orallo, J. (2015). Multidimensional Prediction Models When the Resolution Context Changes. En Machine Learning and Knowledge Discovery in Databases. Springer. 509-524. https://doi.org/10.1007/978-3-319-23525-7_31S509524Agrawal, R., Gupta, A., Sarawagi, S.: Modeling multidimensional databases. In: Proceedings of the Thirteenth International Conference on Data Engineering, ICDE 1997, pp. 232–243. IEEE Computer Society (1997)Bella, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.: Quantification via probability estimators. In: IEEE ICDM, pp. 737–742 (2010)Bella, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J.: Aggregative quantification for regression. DMKD 28(2), 475–518 (2014)Bickel, R.: Multilevel analysis for applied research: It’s just regression! Guilford Press (2012)Cabibbo, L., Torlone, R.: A logical approach to multidimensional databases. In: Schek, H.-J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT 1998. LNCS, vol. 1377, p. 183. Springer, Heidelberg (1998)Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology. ACM Sigmod Record 26(1), 65–74 (1997)Chen, B.C.: Cube-Space Data Mining. ProQuest (2008)Chen, B.C., Chen, L., Lin, Y., Ramakrishnan, R.: Prediction cubes. In: Proc. of the 31st Intl. Conf. on Very Large Data Bases, pp. 982–993 (2005)Datahub: Car fuel consumptions and emissions 2000–2013 (2013). http://datahub.io/dataset/car-fuel-consumptions-and-emissionsDhurandhar, A.: Using coarse information for real valued prediction. Data Mining and Knowledge Discovery 27(2), 167–192 (2013)Forman, G.: Quantifying counts and costs via classification. Data Min. Knowl. Discov. 17(2), 164–206 (2008)Goldstein, H.: Multilevel Statistical Models, vol. 922. John Wiley & Sons (2011)Golfarelli, M., Maio, D., Rizzi, S.: The dimensional fact model: a conceptual model for data warehouses. Intl. J. of Coop. Information Systems 7, 215–247 (1998)Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explor. 11(1), 10–18 (2009)Hernández-Orallo, J.: Probabilistic reframing for cost-sensitive regression. ACM Transactions on Knowledge Discovery from Data 8(3) (2014)IBM Corporation: Introduction to Aroma and SQL (2006). http://www.ibm.com/developerworks/data/tutorials/dm0607cao/dm0607cao.htmlKamber, M., Jenny, J.H., Chiang, Y., Han, J., Chiang, J.Y.: Metarule-guided mining of multi-dimensional association rules using data cubes. In: KDD, pp. 207–210 (1997)Lin, T., Yao, Y., Zadeh, L.: Data Mining, Rough Sets and Granular Computing. Studies in Fuzziness and Soft Computing. Physica-Verlag HD (2002)Páircéir, R., McClean, S., Scotney, B.: Discovery of multi-level rules and exceptions from a distributed database. In: Proc. of the 6th ACM SIGKDD Intl. Conf. on Knowledge discovery and data mining, pp. 523–532. ACM (2000)Pastor, O., Casamayor, J.C., Celma, M., Mota, L., Pastor, M.A., Levin, A.M.: Conceptual Modeling of Human Genome: Integration Challenges. In: Düsterhöft, A., Klettke, M., Schewe, K.-D. (eds.) Conceptual Modelling and Its Theoretical Foundations. LNCS, vol. 7260, pp. 231–250. Springer, Heidelberg (2012)Perlich, C., Provost, F.: Distribution-based aggregation for relational learning with identifier attributes. Machine Learning 62(1–2), 65–105 (2006)Team, R., et al.: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2012)Ramakrishnan, R., Chen, B.C.: Exploratory mining in cube space. Data Mining and Knowledge Discovery 15(1), 29–54 (2007)Raudenbush, S.W., Bryk, A.S.: Hierarchical linear models: applications and data analysis methods, vol. 1. Sage (2002)UCI Repository: UJIIndoorLoc data set (2014). http://archive.ics.uci.edu/ml/datasets/UJIIndoorLocVassiliadis, P.: Modeling multidimensional databases, cubes and cube operations. In: Proc. of the 10th SSDBM Conference, pp. 53–62 (1998

    Data science for buildings, a multi-scale approach bridging occupants to smart-city energy planning

    Get PDF

    Data science for buildings, a multi-scale approach bridging occupants to smart-city energy planning

    Get PDF
    In a context of global carbon emission reduction goals, buildings have been identified to detain valuable energy-saving abilities. With the exponential increase of smart, connected building automation systems, massive amounts of data are now accessible for analysis. These coupled with powerful data science methods and machine learning algorithms present a unique opportunity to identify untapped energy-saving potentials from field information, and effectively turn buildings into active assets of the built energy infrastructure.However, the diversity of building occupants, infrastructures, and the disparities in collected information has produced disjointed scales of analytics that make it tedious for approaches to scale and generalize over the building stock.This coupled with the lack of standards in the sector has hindered the broader adoption of data science practices in the field, and engendered the following questioning:How can data science facilitate the scaling of approaches and bridge disconnected spatiotemporal scales of the built environment to deliver enhanced energy-saving strategies?This thesis focuses on addressing this interrogation by investigating data-driven, scalable, interpretable, and multi-scale approaches across varying types of analytical classes. The work particularly explores descriptive, predictive, and prescriptive analytics to connect occupants, buildings, and urban energy planning together for improved energy performances.First, a novel multi-dimensional data-mining framework is developed, producing distinct dimensional outlines supporting systematic methodological approaches and refined knowledge discovery. Second, an automated building heat dynamics identification method is put forward, supporting large-scale thermal performance examination of buildings in a non-intrusive manner. The method produced 64\% of good quality model fits, against 14\% close, and 22\% poor ones out of 225 Dutch residential buildings. %, which were open-sourced in the interest of developing benchmarks. Third, a pioneering hierarchical forecasting method was designed, bridging individual and aggregated building load predictions in a coherent, data-efficient fashion. The approach was evaluated over hierarchies of 37, 140, and 383 nodal elements and showcased improved accuracy and coherency performances against disjointed prediction systems.Finally, building occupants and urban energy planning strategies are investigated under the prism of uncertainty. In a neighborhood of 41 Dutch residential buildings, occupants were determined to significantly impact optimal energy community designs in the context of weather and economic uncertainties.Overall, the thesis demonstrated the added value of multi-scale approaches in all analytical classes while fostering best data-science practices in the sector from benchmarks and open-source implementations

    Fouille de données : vers une nouvelle approche intégrant de façon cohérente et transparente la composante spatiale

    Get PDF
    Depuis quelques décennies, on assiste à une présence de plus en plus accrue de l’information géo-spatiale au sein des organisations. Cela a eu pour conséquence un stockage massif d’informations de ce type. Ce phénomène, combiné au potentiel d’informations que renferment ces données, on fait naître le besoin d’en apprendre davantage sur elles, de les utiliser à des fins d’extraction de connaissances qui puissent servir de support au processus de décision de l’entreprise. Pour cela, plusieurs approches ont été envisagées dont premièrement la mise à contribution des outils de fouille de données « traditionnelle ». Mais face à la particularité de l’information géo-spatiale, cette approche s’est soldée par un échec. De cela, est apparue la nécessité d’ériger le processus d’extraction de connaissances à partir de données géographiques en un domaine à part entière : le Geographic Knowlegde Discovery (GKD). La réponse à cette problématique, par le GKD, s’est traduite par la mise en œuvre d’approches qu’on peut catégoriser en deux grandes catégories: les approches dites de prétraitement et celles de traitement dynamique de l’information spatiale. Pour faire face aux limites de ces méthodes et outils nous proposons une nouvelle approche intégrée qui exploite l’existant en matière de fouille de données « traditionnelle ». Cette approche, à cheval entre les deux précédentes vise comme objectif principal, le support du type géo-spatial à toutes les étapes du processus de fouille de données. Pour cela, cette approche s’attachera à exploiter les relations usuelles que les entités géo-spatiales entretiennent entre elles. Un cadre viendra par la suite décrire comment cette approche supporte la composante spatiale en mettant à contribution des bibliothèques de traitement de la donnée géo-spatiale et les outils de fouille « traditionnelle »In recent decades, geospatial data has been more and more present within our organization. This has resulted in massive storage of such information and this, combined with the learning potential of such information, gives birth to the need to learn from these data, to extract knowledge that can be useful in supporting decision-making process. For this purpose, several approaches have been proposed. Among this, the first has been to deal with existing data mining tools in order to extract any knowledge of such data. But due to a specificity of geospatial information, this approach failed. From this arose the need to erect the process of extracting knowledge from geospatial data in its own right; this lead to Geographic Knowledge Discovery. The answer to this problem, by GKD, is reflected in the implementation of approaches that can be categorized into two: the so-called pre-processing approaches and the dynamic treatment of spatial relationships. Given the limitations of these approaches we propose a new approach that exploits the existing data mining tools. This approach can be seen as a compromise of the two previous. It main objective is to support geospatial data type during all steps of data mining process. To do this, the proposed approach will exploit the usual relationships that geo-spatial entities share each other. A framework will then describe how this approach supports the spatial component involving geo-spatial libraries and "traditional" data mining tool
    corecore