
    Materializing baseline views for deviation detection exploratory OLAP

    The final publication is available at link.springer.com. Alert-raising and deviation detection in OLAP and exploratory search concern calling the user’s attention to variations and non-uniform data distributions, or directing the user to the most interesting exploration of the data. In this paper, we are interested in the ability of a data warehouse to continuously monitor new data and to update accordingly a particular type of materialized view recording statistics, called a baseline. It should be possible to detect deviations at various levels of aggregation, and baselines should be fully integrated into the database. We propose Multi-level Baseline Materialized Views (BMV), including the mechanisms to build and refresh them and to detect deviations. We also propose an incremental approach and formula for refreshing baselines efficiently. An experimental setup proves the concept and shows its efficiency. Peer reviewed. Postprint (author's final draft).
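
    A minimal sketch of what such a baseline might look like in code, assuming a baseline stores per-cell running statistics (here via Welford's online algorithm) and flags deviations with a z-score test; the class and method names are illustrative and not taken from the paper:

    from collections import defaultdict
    from math import sqrt

    class BaselineView:
        """Per-cell statistics kept up to date incrementally as new facts arrive."""
        def __init__(self, z_threshold=3.0):
            self.stats = defaultdict(lambda: [0, 0.0, 0.0])  # cell -> [n, mean, M2]
            self.z_threshold = z_threshold

        def refresh(self, cell, value):
            # Fold one new measure value into the cell's baseline (Welford update).
            s = self.stats[cell]
            s[0] += 1
            delta = value - s[1]
            s[1] += delta / s[0]
            s[2] += delta * (value - s[1])

        def is_deviation(self, cell, value):
            # Flag a value whose z-score against the stored baseline is extreme.
            n, mean, m2 = self.stats[cell]
            if n < 2:
                return False
            std = sqrt(m2 / (n - 1))
            return std > 0 and abs(value - mean) / std > self.z_threshold

    view = BaselineView()
    for v in [100, 102, 98, 101, 99, 103]:
        view.refresh(("store_1", "2024-01"), v)
    print(view.is_deviation(("store_1", "2024-01"), 180))  # True: far above baseline

    Because an update touches only the affected cell's running statistics, each refresh costs O(1) per incoming fact, which is the property an incremental baseline refresh needs.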

    Doctor of Philosophy

    Dissertation. Recent advancements in mobile devices - such as Global Positioning System (GPS) receivers, cellular phones, car navigation systems, and radio-frequency identification (RFID) - have greatly influenced the nature and volume of data about individual-based movement in space and time. Due to the prevalence of mobile devices, vast amounts of mobile object data are being produced and stored in databases, overwhelming the capacity of traditional spatial analytical methods. There is a growing need for discovering unexpected patterns, trends, and relationships that are hidden in this massive mobile object data. Geographic visualization (GVis) and knowledge discovery in databases (KDD) are two major research fields associated with knowledge discovery and construction. Their major research challenges are the integration of GVis and KDD, enhancing the ability to handle large volumes of mobile object data, and high interactivity between the computer and the users of GVis and KDD tools. This dissertation proposes a visualization toolkit to enable highly interactive visual data exploration for mobile object datasets. Vector algebraic representation and online analytical processing (OLAP) are utilized for managing and querying the mobile object data to accomplish the high interactivity of the visualization tool. In addition, reconstructing trajectories at user-defined levels of temporal granularity with time aggregation methods allows exploration of individual objects at different levels of movement generality. At a given level of generality, individual paths can be combined into synthetic summary paths based on three similarity measures, namely locational, directional, and geometric similarity functions. A visualization toolkit based on the space-time cube concept exploits these functionalities to create a user-interactive environment for exploring mobile object data. Furthermore, the characteristics of visualized trajectories are exported to be utilized for data mining, which leads to the integration of GVis and KDD. Case studies using three movement datasets (a personal travel survey in Lexington, Kentucky; wild chicken movement data in Thailand; and self-tracking data in Utah) demonstrate the potential of the system to extract meaningful patterns from otherwise difficult-to-comprehend collections of space-time trajectories.
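
    An illustrative sketch of two of the three similarity measures named above (locational and directional), assuming trajectories are equal-length sequences of (x, y) points sampled at a shared temporal granularity; the function names and aggregation scheme are assumptions, not the dissertation's actual API:

    from math import hypot

    def locational_similarity(traj_a, traj_b):
        # Mean pointwise distance; smaller means more locationally similar.
        dists = [hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(traj_a, traj_b)]
        return sum(dists) / len(dists)

    def directional_similarity(traj_a, traj_b):
        # Mean cosine between successive displacement vectors, in [-1, 1].
        def steps(t):
            return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(t, t[1:])]
        cosines = []
        for (ux, uy), (vx, vy) in zip(steps(traj_a), steps(traj_b)):
            nu, nv = hypot(ux, uy), hypot(vx, vy)
            if nu > 0 and nv > 0:
                cosines.append((ux * vx + uy * vy) / (nu * nv))
        return sum(cosines) / len(cosines) if cosines else 0.0

    a = [(0, 0), (1, 0), (2, 0), (3, 1)]
    b = [(0, 1), (1, 1), (2, 1), (3, 2)]
    print(locational_similarity(a, b))   # 1.0: parallel paths one unit apart
    print(directional_similarity(a, b))  # 1.0: identical headings throughout

    Measures of this kind are what allow individual paths at a given level of generality to be grouped into the synthetic summary paths described above.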

    Explanation of Exceptional Values in Multi-dimensional Business Databases

    “How can the functionality of multi-dimensional business databases be extended with diagnostic capabilities to support managerial decision-making?” This question states the main research problem addressed in this thesis. Before it can be answered, the question first requires clarification and delineation. In this chapter, the research question is briefly placed into context, regarding both academic and business relevance. This leads to the formulation of three specific research questions. Subsequently, a section is dedicated to each specific research question. An outline of this thesis concludes the chapter.

    A Correlation Framework for Continuous User Authentication Using Data Mining

    The increasing security breaches revealed in recent surveys and the security threats reported in the media reaffirm the inadequacy of current security measures in IT systems. While most reported work in this area has focussed on enhancing the initial login stage in order to counteract unauthorised access, there is still a problem in detecting when an intruder has compromised the front-line controls. This could pose a serious threat, since any subsequent indicator of an intrusion in progress could be quite subtle and may remain hidden to the casual observer. Having passed the front-line controls and holding the appropriate access privileges, the intruder may be in a position to do virtually anything without further challenge. This has prompted interest in the concept of continuous authentication, which inevitably involves the analysis of vast amounts of data. The primary objective of the research is to develop and evaluate a suitable correlation engine in order to automate the processes involved in authenticating and monitoring users in a networked system environment. The aim is to further develop the Anomaly Detection module previously illustrated in a PhD thesis [1] as part of the conceptual architecture of an Intrusion Monitoring System (IMS) framework.
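
    A minimal sketch of the kind of per-user behaviour profiling a continuous-authentication correlation engine might perform, comparing live activity features against a user's historical profile; all feature names and thresholds here are assumptions for illustration, not the IMS framework's actual design:

    import statistics

    HISTORY = {  # hypothetical per-user activity history
        "alice": {"keystrokes_per_min": [180, 190, 175, 185],
                  "logins_per_hour": [1, 2, 1, 1]},
    }

    def anomaly_score(user, observation):
        # Mean absolute z-score of the observation against the user's history.
        scores = []
        for feature, value in observation.items():
            hist = HISTORY[user][feature]
            mu, sigma = statistics.mean(hist), statistics.stdev(hist)
            if sigma > 0:
                scores.append(abs(value - mu) / sigma)
        return sum(scores) / len(scores) if scores else 0.0

    obs = {"keystrokes_per_min": 60, "logins_per_hour": 9}  # unusual for alice
    print(anomaly_score("alice", obs) > 3.0)  # True -> raise a continuous-auth alert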

    Data mining industry: emerging trends and new opportunities

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, June 2000. "May 2000." Includes bibliographical references (leaves 170-179). By Walter Alberto Aldana, M.Eng.

    Analysis of building performance data

    In recent years, the global trend towards digitalisation has also reached buildings and facility management. Due to the roll-out of smart meters and the retrofitting of buildings with meters and sensors, the amount of data available for a single building has increased significantly. In addition to data sets collected by measurement devices, Building Information Modelling has recently seen strong uptake. A building model maintained throughout the whole building life-cycle becomes rich in information describing all major aspects of the building. This work aims to combine these data sources to gain further valuable information through data analysis. Better knowledge of a building’s behaviour, based on the high-quality data available, leads to more efficient building operation. Eventually, this may result in a reduction of energy use and therefore lower operational costs. In this thesis, a concept for holistic data acquisition from smart meters and a methodology for the integration of further meters into the measurement concept are introduced and validated. Secondly, this thesis presents a novel algorithm designed for the cleansing and interpolation of faulty data. Descriptive data is extracted from an open metadata model for buildings and utilised to further enrich the metered data. Additionally, this thesis presents a methodology for designing and managing all information in a unified Data Warehouse schema. The Data Warehouse developed in this work maintains compatibility with the open metadata model by adopting the model’s specification into its data schema. It features the application of building-specific Key Performance Indicators (KPIs) to measure building performance. In addition, a clustering algorithm based on machine learning technology is developed to identify behavioural patterns of buildings and their frequency of occurrence. All methodologies introduced in this work are evaluated through installations and data from three pilot buildings. The pilot buildings were selected to be of diverse types to prove the generic applicability of the above concepts. The outcome of this work successfully demonstrates that combining the data sources available for buildings enables advanced data analysis. This largely increases the understanding of buildings and their behavioural patterns. More efficient building operation and a reduction in energy usage can be achieved with this knowledge.
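
    An illustrative sketch of the behavioural-pattern clustering step, using scikit-learn's KMeans on 24-value daily load curves; the feature layout, cluster count and synthetic data are assumptions for the example, not the thesis's actual pipeline:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Synthetic stand-in for cleansed smart-meter data: 60 days x 24 hourly readings.
    weekday = 10 + 5 * np.sin(np.linspace(0, np.pi, 24))  # daytime peak profile
    weekend = np.full(24, 8.0)                            # flat low-usage profile
    days = np.vstack([weekday + rng.normal(0, 0.5, 24) for _ in range(40)] +
                     [weekend + rng.normal(0, 0.5, 24) for _ in range(20)])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(days)
    labels, counts = np.unique(km.labels_, return_counts=True)
    # Report each behavioural pattern and its frequency of occurrence.
    for label, count in zip(labels, counts):
        peak = km.cluster_centers_[label].max()
        print(f"pattern {label}: {count} days, peak load {peak:.1f}")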

    Query-Time Data Integration

    Today, data is collected at ever increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated for by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions form the foundation of a Query-Time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
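
    A toy sketch of the top-k entity augmentation idea, greedily building k alternative covers of the entities that need a missing attribute, preferring few sources per cover and diversity across covers; the data and the diversification-by-banning heuristic are simplifications for illustration, not the DrillBeyond implementation:

    entities = {"Germany", "France", "Italy", "Spain"}
    sources = {  # source id -> entities for which it supplies the missing attribute
        "s1": {"Germany", "France", "Italy", "Spain"},
        "s2": {"Germany", "France"},
        "s3": {"Italy", "Spain"},
        "s4": {"Germany", "Italy", "Spain"},
        "s5": {"France"},
    }

    def greedy_cover(banned=frozenset()):
        # Greedy set cover over the non-banned sources; None if no full cover exists.
        uncovered, chosen = set(entities), []
        while uncovered:
            best = max((s for s in sources if s not in banned),
                       key=lambda s: len(sources[s] & uncovered), default=None)
            if best is None or not sources[best] & uncovered:
                return None
            chosen.append(best)
            uncovered -= sources[best]
        return chosen

    def top_k_augmentations(k):
        # Diversify later alternatives by banning sources already used.
        results, used = [], set()
        for _ in range(k):
            cover = greedy_cover(banned=frozenset(used))
            if cover is None:
                break
            results.append(cover)
            used.update(cover)
        return results

    print(top_k_augmentations(3))  # [['s1'], ['s4', 's2']]: only two source-disjoint covers exist

    Each returned cover is one internally consistent alternative answer built from different sources, mirroring the ranked alternative results described above.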

    Medical Informatics

    Information technology has been revolutionizing the everyday life of the common man, while medical science has been making rapid strides in understanding disease mechanisms, developing diagnostic techniques and effecting successful treatment regimens, even for those cases which would have been classified as having a poor prognosis a decade earlier. The confluence of information technology and biomedicine has brought into its ambit additional dimensions of computerized databases for patient conditions, revolutionizing the way health care and patient information are recorded, processed, interpreted and utilized for improving the quality of life. This book consists of seven chapters dealing with the three primary issues of medical information acquisition from a patient's and a health care professional's perspective, translational approaches from a researcher's point of view, and finally the application potential as required by clinicians and physicians. The book covers modern issues in information technology, bioinformatics methods and clinical applications. The chapters describe the basic process of acquisition of information in a health system, recent technological developments in biomedicine and the realistic evaluation of medical informatics.

    Mining subjectively interesting patterns in rich data
