
    SOLAM: A Novel Approach of Spatial Aggregation in SOLAP Systems

    In the context of a data-driven approach aimed at detecting the factors actually responsible for disease transmission and at explaining disease emergence or re-emergence, we propose SOLAM (Spatial On-Line Analytical Mining), a system that extends Spatial On-Line Analytical Processing (SOLAP) with Spatial Data Mining (SDM) techniques. Our approach integrates the EPISOLAP system, tailored for epidemiological surveillance, with a spatial generalization method that allows the predictive evaluation of health risk in the presence of hazards and raises awareness of the vulnerability of the exposed population. The proposed architecture is a single integrated decision-making platform for knowledge discovery from spatial databases. Spatial generalization methods make it possible to explore the data at different semantic and spatial scales while eliminating unnecessary dimensions. The principle of the method is to select and delete attributes of low importance in the data characterization, producing zones of homogeneous characteristics that are then merged.
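    The generalization step described above can be illustrated with a small sketch. The code below is only an assumption-laden illustration (the attribute names, importance scores and pandas-based merging are invented, not the EPISOLAP/SOLAM implementation): it drops attributes whose importance falls below a threshold and merges zones that become indistinguishable on the remaining attributes.

```python
# Minimal sketch (not the authors' code): spatial generalization by dropping
# low-importance attributes and merging zones that become identical.
import pandas as pd

def generalize_zones(zones: pd.DataFrame, attributes: list[str],
                     importance: dict[str, float], threshold: float = 0.1) -> pd.DataFrame:
    """Drop attributes whose (hypothetical) importance score falls below
    `threshold`, then merge zones with identical remaining characteristics."""
    kept = [a for a in attributes if importance.get(a, 0.0) >= threshold]
    # Zones sharing the same values on the kept attributes are aggregated.
    return (zones.groupby(kept, as_index=False)
                 .agg(zone_ids=("zone_id", list), population=("population", "sum")))

# Example with made-up epidemiological zone data.
zones = pd.DataFrame({
    "zone_id": [1, 2, 3, 4],
    "humidity": ["high", "high", "low", "low"],
    "vector_density": ["dense", "dense", "sparse", "sparse"],
    "soil_type": ["clay", "sand", "clay", "sand"],   # low-importance attribute
    "population": [1200, 800, 500, 700],
})
importance = {"humidity": 0.6, "vector_density": 0.8, "soil_type": 0.05}
print(generalize_zones(zones, ["humidity", "vector_density", "soil_type"], importance))
```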

    Materialização à medida de vistas multidimensionais de dados

    Master's dissertation in Informatics Engineering. With the emergence of the information era, many companies have resorted to data warehouses to store the growing amount of data they hold about their businesses. With this growth in data volumes comes the need to explore the data better, so that it can actually support business evaluations and decisions. On-Line Analytical Processing (OLAP) systems answer this need by helping the business analyst explore and evaluate the data, giving the analyst autonomy of exploration and a multi-perspective, fast-response structure. For access to this information to be fast, however, multidimensional structures with the data already pre-computed have to be materialized, reducing query time to the time needed to read the answer and avoiding the processing time of each query. Complete materialization of the required data is impracticable in practice, given the volumes of data the systems are subject to and the processing time needed to compute all possible combinations. Since the business analyst is the differentiating element in the effective use of these structures, or at least the one who selects the data consulted in them, this work proposes a set of techniques that study user behaviour in order to understand its seasonal patterns and the target views of the user's explorations, so that new structures containing the views most appropriate for materialization can be defined, better satisfying the users' exploration needs. The dissertation defines structures that collect users' query records and applies to these data techniques for identifying usage profiles and usage patterns, namely the definition of OLAP sessions, the application of Markov chains and the determination of equivalence classes of queried attributes. Finally, we propose an OLAP signature capable of describing a user's OLAP behaviour from the elements identified by these techniques, thereby letting the system administrator restructure the multidimensional structures "tailored" to the analysts' actual usage.
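    One of the techniques named above, the Markov-chain analysis of OLAP sessions, can be sketched as follows. This is a hedged illustration with an assumed log format (each session is an ordered list of visited views), not the dissertation's actual code: it estimates first-order transition probabilities between views, the kind of signal that could feed an OLAP signature and the choice of views to materialize.

```python
# Illustrative sketch only: a first-order Markov chain over the views (cuboids)
# visited in logged OLAP sessions.
from collections import Counter, defaultdict

def transition_probabilities(sessions: list[list[str]]) -> dict[str, dict[str, float]]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current_view, next_view in zip(session, session[1:]):
            counts[current_view][next_view] += 1
    # Normalize transition counts into probabilities per source view.
    return {view: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for view, c in counts.items()}

# Made-up sessions: the ordered list of views each user queried.
sessions = [
    ["sales_by_region", "sales_by_region_month", "sales_by_store"],
    ["sales_by_region", "sales_by_region_month", "sales_by_product"],
    ["sales_by_store", "sales_by_region_month"],
]
probs = transition_probabilities(sessions)
print(probs["sales_by_region"])   # {'sales_by_region_month': 1.0}
```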

    Multidimensional process discovery


    A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data

    The rapid growth in data volume and complexity has necessitated the adoption of advanced technologies to extract valuable insights for decision-making. This project aims to address this need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data generated by the World Bank. Therefore, a Big Data architecture is implemented on a real use case for the International Bank for Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analysed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution for processing and analysing World Bank data: the implemented framework shows improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor's thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analysing World Bank data. The findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modelling, and data visualization for deriving valuable insights, and demonstrate the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis.
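    As a hedged illustration of the Extract, Transform and Load step described above, the sketch below uses the public PySpark API with Hive support; the file path, column names and table name are assumptions, not the project's actual ones. The resulting Hive table is the kind of source a subsequent Apache Kylin cube build could read from.

```python
# Hedged ETL sketch with Spark and Hive (assumed paths, columns and table names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("worldbank-etl")
         .enableHiveSupport()          # lets Spark write managed Hive tables
         .getOrCreate())

# Extract: raw World Bank indicator extract, assumed here to be CSV.
raw = spark.read.csv("/data/raw/world_bank_indicators.csv", header=True, inferSchema=True)

# Transform: keep the columns the cube needs and normalize their types.
clean = (raw.select("country_code", "indicator_code", "year", "value")
            .withColumn("year", F.col("year").cast("int"))
            .withColumn("value", F.col("value").cast("double"))
            .dropna(subset=["value"]))

# Load: persist as a Hive table that an Apache Kylin cube can later be built from.
clean.write.mode("overwrite").saveAsTable("dw.fact_indicator_values")
```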

    Mineração de dados em sistemas OLAP

    Master's dissertation in Informatics Engineering. The many benefits that data warehouses provide for storing and processing information have led to substantial growth of the data warehousing market and in the number of organizations adopting these systems. The data model of such structures lets the user carry out a wide range of operations: run complex queries, select the data sets of greatest interest, aggregate and compare values, and obtain different visualizations of the data. However, this complexity brings computation and materialization costs. On the one hand, pre-computing a cube from a data warehouse yields fast and precise responses to analytical queries; on the other hand, it requires an enormous amount of storage space for all the materialized views. Data mining techniques, in particular algorithms for mining association rules, make it possible to find frequent itemsets in the data and, consequently, to define a set of exploration or usage preferences. The study of OLAP preferences presented in this dissertation aims to identify the data most accessed by users, in order to decide which parts of a cube need not be materialized because they are not used in analysis processes, keeping query response times acceptable while significantly reducing the amount of memory used.
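    The frequent-pattern step described above can be illustrated with a small, self-contained sketch. It is not the dissertation's implementation and uses a brute-force enumeration rather than a full association-rule miner such as Apriori: it counts how often sets of dimensions are queried together in a made-up query log and keeps those above a minimum support, as candidates for partial cube materialization.

```python
# Didactic stand-in for frequent itemset mining over an OLAP query log.
from itertools import combinations
from collections import Counter

def frequent_itemsets(queries: list[set[str]], min_support: float, max_size: int = 3):
    n = len(queries)
    frequent = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for q in queries:
            for combo in combinations(sorted(q), size):
                counts[combo] += 1
        level = {c: k / n for c, k in counts.items() if k / n >= min_support}
        if not level:
            break                      # no larger frequent sets can exist
        frequent.update(level)
    return frequent

# Each entry lists the dimensions referenced by one logged OLAP query.
log = [{"time", "store"}, {"time", "store", "product"}, {"time", "product"}, {"time", "store"}]
for itemset, support in frequent_itemsets(log, min_support=0.5).items():
    print(itemset, support)            # candidate cuboids worth materializing
```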

    Playing With the Image: In Conversation with Margaret Roberts


    Implantación de los básicos de inventario en Tecnoconfort S.A.

    This work deals with the implementation of inventory basics: a set of rules and ways of working within a company whose main objective is to identify areas with discrepancies between the physical inventory and the system inventory and to define new processes to eliminate those differences. To this end, five basics are put in place, five elementary rules that every inventory must follow in order to improve its control: receiving materials in real time, correctly identifying the location of parts in the warehouse and of shipments to customers, running inventory cycle counts, and raising minimum-stock alerts, with the final goal of eliminating the discrepancies between the physical and the recorded inventory. Bachelor's Degree in Industrial Technology Engineering, Public University of Navarre.
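    As a purely illustrative sketch of two of the basics described above (cycle counting and minimum-stock alerts), the code below compares system and counted quantities and raises alerts; all names and thresholds are invented and do not describe Tecnoconfort's actual system.

```python
# Hypothetical cycle-count review: flag discrepancies and minimum-stock breaches.
from dataclasses import dataclass

@dataclass
class Item:
    sku: str
    system_qty: int     # quantity according to the system inventory
    counted_qty: int    # quantity found during the physical cycle count
    min_stock: int      # level below which a replenishment alert is raised

def review_inventory(items: list[Item]) -> None:
    for it in items:
        discrepancy = it.counted_qty - it.system_qty
        if discrepancy != 0:
            print(f"{it.sku}: discrepancy of {discrepancy:+d} units -> investigate process")
        if it.counted_qty <= it.min_stock:
            print(f"{it.sku}: below minimum stock ({it.counted_qty} <= {it.min_stock}) -> reorder")

review_inventory([
    Item("SEAT-FRAME-01", system_qty=120, counted_qty=118, min_stock=50),
    Item("FOAM-PAD-07", system_qty=30, counted_qty=30, min_stock=40),
])
```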

    Intégration de données temps-réel issues de capteurs dans un entrepôt de données géo-décisionnel

    In the last decade, the use of sensors for measuring an ever wider range of phenomena has greatly increased. Sensors can now be used to measure a water level, a position (GPS), a temperature, and even a person's heart rate; this diversity makes them excellent tools for data acquisition. Alongside this growth, analysis tools have also evolved beyond transactional databases, leading to a new family of tools, analysis (decision-support) systems, which respond to the need for global analysis of data. Data warehouses and OLAP (On-Line Analytical Processing) tools, which belong to this family, let decision-makers analyse the huge volumes of data at their disposal, make comparisons over time and build statistical charts with a few mouse clicks. Although the many types of sensors can certainly enrich an analysis, sensor data requires lengthy integration work before it reaches the geo-decisional data warehouse, which sits at the centre of the decision-making process. The variety of sensor models, data types and data transfer mechanisms remains a significant obstacle to integrating sensor data streams into a geo-decisional data warehouse. In addition, current geo-decisional data warehouses are not designed to receive new data at a high frequency: since use of the warehouse is restricted during an update, new data is generally added on a weekly or monthly basis. There are, however, data warehouses that can be updated several times a day without their performance degrading during exploitation, the real-time data warehouses (RTDW), but this technology is still uncommon, very costly and not yet mature. This research therefore aims to develop an approach for publishing and standardizing real-time sensor data streams and integrating them into a conventional geo-decisional data warehouse. An optimized warehouse update strategy has also been developed so that new data can be added to the analyses without compromising the quality of the users' exploitation of the warehouse.
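    The publication and standardization idea in the abstract can be sketched roughly as follows. This is a minimal illustration under assumed field names and a simplified common observation record, not the thesis's actual pipeline or update strategy: heterogeneous sensor payloads are normalized into one record format and buffered, so the warehouse is updated in small batches rather than row by row.

```python
# Hypothetical normalization of sensor payloads plus batched warehouse loading.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Observation:
    sensor_id: str
    phenomenon: str          # e.g. "water_level", "temperature"
    value: float
    unit: str
    timestamp: datetime
    lon: float
    lat: float

def normalize(raw: dict) -> Observation:
    """Map one heterogeneous sensor payload to the common record (assumed fields)."""
    return Observation(
        sensor_id=str(raw["id"]),
        phenomenon=raw["type"],
        value=float(raw["reading"]),
        unit=raw.get("unit", ""),
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        lon=float(raw["lon"]),
        lat=float(raw["lat"]),
    )

def load_into_fact_table(observations: list[Observation]) -> None:
    # Placeholder for a bulk insert into the warehouse's measure fact table.
    print(f"loading {len(observations)} observations into the fact table")

buffer: list[Observation] = []

def ingest(raw: dict, batch_size: int = 100) -> None:
    """Buffer normalized observations and flush them in small batches, so
    analytical queries are not blocked by row-at-a-time updates."""
    buffer.append(normalize(raw))
    if len(buffer) >= batch_size:
        load_into_fact_table(buffer)
        buffer.clear()

ingest({"id": "S-42", "type": "water_level", "reading": 3.7, "unit": "m",
        "ts": 1700000000, "lon": -71.2, "lat": 46.8})
```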