
    ACRMiner: An Incremental Approach for Finding Dense and Sparse Rectangular Regions from a 2D Interval Dataset

    In many applications, transactions are associated with intervals related to time, temperature, humidity, or similar measures. The term "2D interval data" or "rectangle data" is used when two connected intervals are associated with each transaction; two connected intervals give rise to a rectangle. The rectangles may overlap, producing regions with different density values. The density value, or support, of a region is the number of rectangles that contain it. A region is closed if its density is strictly greater than that of any region properly containing it. For a rectangle dataset, these regions are rectangular in shape. In this paper, an algorithm named ACRMiner is proposed that takes as input a sequence of rectangles and computes all closed overlapping rectangles together with their density values. The algorithm is incremental and is therefore suitable for dynamic environments. Depending on an input threshold, the regions can be classified as dense or sparse. A tree-based data structure named the ACR-Tree is used. The method has been implemented and tested on synthetic and real-life datasets, and results are reported. A few applications of the algorithm are discussed. The worst-case time complexity of the algorithm is O(n^5), where n is the number of input rectangles.
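
    The abstract's notions of support, closure, and dense/sparse classification can be made concrete with a small brute-force sketch. This is not the incremental ACR-Tree algorithm proposed in the paper; it simply enumerates the elementary cells induced by the rectangle edges and counts, for each cell, how many input rectangles contain it. The function names (contains, density, classify) are illustrative assumptions.

```python
from itertools import combinations

# A rectangle is represented as ((x1, x2), (y1, y2)) with x1 < x2 and y1 < y2.

def contains(outer, inner):
    """True if `outer` fully contains `inner`."""
    (ox1, ox2), (oy1, oy2) = outer
    (ix1, ix2), (iy1, iy2) = inner
    return ox1 <= ix1 and ix2 <= ox2 and oy1 <= iy1 and iy2 <= oy2

def density(region, rects):
    """Support of a region: number of input rectangles that contain it."""
    return sum(contains(r, region) for r in rects)

def elementary_cells(rects):
    """Candidate regions formed by the grid of all rectangle edges."""
    xs = sorted({x for (x1, x2), _ in rects for x in (x1, x2)})
    ys = sorted({y for _, (y1, y2) in rects for y in (y1, y2)})
    for x1, x2 in zip(xs, xs[1:]):
        for y1, y2 in zip(ys, ys[1:]):
            yield ((x1, x2), (y1, y2))

def classify(rects, threshold):
    """Label each covered elementary cell as dense or sparse by its support."""
    return [(cell, d, "dense" if d >= threshold else "sparse")
            for cell in elementary_cells(rects)
            if (d := density(cell, rects)) > 0]

if __name__ == "__main__":
    rects = [((0, 4), (0, 4)), ((2, 6), (2, 6)), ((3, 5), (3, 5))]
    for cell, d, label in classify(rects, threshold=2):
        print(cell, d, label)
```

    A cell whose support meets the threshold is labelled dense; the paper's contribution lies in maintaining the closed regions incrementally with the ACR-Tree rather than re-scanning all rectangles as this sketch does.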

    SuRF: Identification of Interesting Data Regions with Surrogate Models

    Several data mining tasks focus on repeatedly inspecting multidimensional data regions summarized by a statistic. The value of this statistic (e.g., region-population sizes, order moments) is used to classify a region's interestingness. These regions can be naively extracted from the entire dataspace; however, this is extremely time-consuming and compute-resource demanding. This paper studies the reverse problem: analysts provide a cut-off value for a statistic of interest, and in turn our proposed framework efficiently identifies multidimensional regions whose statistic exceeds (or falls below) the given cut-off value, according to the user's needs. However, as data dimensionality and size increase, this task inevitably becomes laborious and costly. To alleviate this cost, our solution, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest. It then makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality. The accuracy, efficiency, and scalability of our approach are demonstrated in experiments with synthetic and real-world datasets and compared with other methods.
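
    The surrogate idea can be sketched in a few lines: train a regressor on historical (region encoding, statistic) pairs and score new candidate regions against the cut-off. This is only a toy illustration under assumed names and a synthetic statistic; the actual SuRF framework proposes candidates via evolutionary multi-modal optimization, which is replaced here by random sampling purely for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Historical evaluations: each 2D region is encoded as (cx, cy, wx, wy),
# and y is the statistic of interest (here a toy, area-proportional count).
X_hist = rng.uniform(0, 1, size=(500, 4))
y_hist = 100 * X_hist[:, 2] * X_hist[:, 3]

# Surrogate model learns to approximate the statistic from region encodings.
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(X_hist, y_hist)

# Candidate regions (random sampling stands in for the evolutionary search).
candidates = rng.uniform(0, 1, size=(10_000, 4))
predicted = surrogate.predict(candidates)

cutoff = 50.0
interesting = candidates[predicted >= cutoff]
print(f"{len(interesting)} candidate regions predicted to exceed the cut-off")
```

    Only the regions that the surrogate predicts to exceed the cut-off would need to be evaluated exactly, which is where the claimed savings come from.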

    Applications of Computational Geometry and Computer Vision

    Recent advances in machine learning research promise to bring us closer to the original goals of artificial intelligence. Spurred by recent innovations in low-cost, specialized hardware and incremental refinements in machine learning algorithms, machine learning is revolutionizing entire industries. Perhaps the biggest beneficiary of this progress has been the field of computer vision. Within the domains of computational geometry and computer vision, this work addresses two problems: finding large, interesting holes in high-dimensional data, and locating and automatically classifying facial features in images. State-of-the-art methods for facial feature classification are compared, and new methods for finding empty hyper-rectangles are introduced. The problem of finding holes is then linked to the problem of extracting features from images and to deep learning methods such as convolutional neural networks. The performance of the hole-finding algorithm is measured on multiple standard machine learning benchmarks as well as a 39-dimensional dataset, demonstrating the utility of the method for a wide range of data.
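
    The notion of an empty hyper-rectangle ("hole") can be illustrated with a brute-force 2D sketch that searches candidate axis-aligned rectangles whose boundaries come from the point coordinates and the domain bounds. This is not the dissertation's method and scales poorly with dimension and data size; it only shows what is being searched for, and the function names are hypothetical.

```python
from itertools import combinations

def is_empty(rect, points):
    """True if no point lies strictly inside the axis-aligned rectangle."""
    x1, x2, y1, y2 = rect
    return not any(x1 < px < x2 and y1 < py < y2 for px, py in points)

def largest_empty_rectangle(points, bounds):
    """Brute-force search over candidate boundaries drawn from the
    point coordinates and the domain bounds (only feasible for small n)."""
    xmin, xmax, ymin, ymax = bounds
    xs = sorted({xmin, xmax, *(p[0] for p in points)})
    ys = sorted({ymin, ymax, *(p[1] for p in points)})
    best, best_area = None, 0.0
    for x1, x2 in combinations(xs, 2):
        for y1, y2 in combinations(ys, 2):
            area = (x2 - x1) * (y2 - y1)
            if area > best_area and is_empty((x1, x2, y1, y2), points):
                best, best_area = (x1, x2, y1, y2), area
    return best, best_area

if __name__ == "__main__":
    pts = [(0.2, 0.3), (0.7, 0.8), (0.5, 0.1)]
    print(largest_empty_rectangle(pts, bounds=(0.0, 1.0, 0.0, 1.0)))
```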

    Beiträge zum 17. Interuniversitären Doktorandenseminar Wirtschaftsinformatik

    This proceedings volume contains the contributions to the 17th Interuniversitäres Doktorandenseminar Wirtschaftsinformatik (Interuniversity Doctoral Seminar on Business Informatics). The papers were written by doctoral students from the central German universities of Halle-Wittenberg, Leipzig, Dresden, Freiberg, Chemnitz, and Jena. The topics range from IT support for small and medium-sized enterprises, through intelligent methods in supply chain and business process management, to questions concerning the design and implementation of analytical information systems. These topics impressively demonstrate the research orientation of business informatics, which need not shy away from the demand for practical relevance.

    Metody analizy i oceny bezpieczeństwa oraz jakości informacji

    Praca recenzowana / Peer-reviewed paper. The monograph focuses on methods for ensuring information security, procedures that support improving data quality, and tools that increase the ability to derive valuable and reliable analytical conclusions from the available information. The publication is divided into three parts. The first part contains three chapters addressing issues of information security. The second part, covered in the next three chapters, presents selected tools and models for organizing digital resources aimed at ensuring the high quality, usability, and credibility of the datasets held in repositories. The last part of the work addresses topics related to Benford's law, which allows the reliability of data to be assessed by analyzing the distributions of digits in the numbers of the dataset under verification.
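
    As a concrete illustration of the Benford's law analysis mentioned above, the sketch below compares the observed leading-digit frequencies of a dataset against the Benford distribution P(d) = log10(1 + 1/d) using a chi-square statistic. The function names and the toy dataset are assumptions made for illustration, not taken from the monograph.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_chi_square(values):
    """Chi-square distance between observed leading-digit frequencies
    and Benford's law, P(d) = log10(1 + 1/d)."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    observed = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2

if __name__ == "__main__":
    data = [math.exp(i / 7) for i in range(1, 300)]  # roughly Benford-like growth data
    print(f"chi-square = {benford_chi_square(data):.2f}")
```

    A large chi-square value signals that the digit distribution deviates from Benford's law, which the monograph treats as a prompt for closer scrutiny of the data rather than as proof of manipulation.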

    INFORMATION VISUALIZATION DESIGN FOR MULTIDIMENSIONAL DATA: INTEGRATING THE RANK-BY-FEATURE FRAMEWORK WITH HIERARCHICAL CLUSTERING

    Interactive exploration of multidimensional data sets is challenging because: (1) it is difficult to comprehend patterns in more than three dimensions, and (2) current systems are often a patchwork of graphical and statistical methods, leaving many researchers uncertain about how to explore their data in an orderly manner. This dissertation offers a set of principles and a novel rank-by-feature framework that enable users to better understand multidimensional and multivariate data by systematically studying distributions in one (1D) or two dimensions (2D), and then discovering relationships, clusters, gaps, outliers, and other features. Users of the rank-by-feature framework can view graphical presentations (histograms, boxplots, and scatterplots) and then choose a feature detection criterion to rank 1D or 2D axis-parallel projections. By combining information visualization techniques (overview, coordination, and dynamic query) with summaries and statistical methods, users can systematically examine the most important 1D and 2D axis-parallel projections. This research provides a number of valuable contributions.

    Graphics, Ranking, and Interaction for Discovery (GRID) principles: a set of principles for exploratory analysis of multidimensional data, summarized as (1) study 1D, study 2D, then find features; (2) ranking guides insight, statistics confirm. The GRID principles help users organize their discovery process in an orderly manner so as to produce more thorough analyses and extract deeper insights in any multidimensional data application.

    Rank-by-feature framework: a user interface framework based on the GRID principles. Interactive information visualization techniques are combined with statistical methods and data mining algorithms to enable users to examine multidimensional data sets in an orderly manner using 1D and 2D projections.

    The design and implementation of the Hierarchical Clustering Explorer (HCE): an information visualization tool available at www.cs.umd.edu/hcil/hce. HCE implements the rank-by-feature framework and supports interactive exploration of hierarchical clustering results to reveal one of the important features: clusters.

    Validation through case studies and user surveys: case studies with motivated experts in three research fields and a user survey conducted by email with a wide range of HCE users demonstrated the efficacy of HCE and the rank-by-feature framework. These studies also revealed potential improvement opportunities in terms of design and implementation.
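
    The core ranking step of the rank-by-feature framework can be sketched as follows: score every 2D axis-parallel projection of a dataset with a scalar feature-detection criterion and sort the projections by that score. The sketch below uses absolute Pearson correlation as the criterion; it is a minimal, assumed illustration of the idea, not HCE's implementation, and the function name rank_2d_projections is hypothetical.

```python
import numpy as np
from itertools import combinations

def rank_2d_projections(data, criterion=None):
    """Rank all 2D axis-parallel projections of a (samples x dims) array
    by a scalar feature-detection criterion (default: |Pearson correlation|)."""
    if criterion is None:
        criterion = lambda x, y: abs(np.corrcoef(x, y)[0, 1])
    scores = [((i, j), criterion(data[:, i], data[:, j]))
              for i, j in combinations(range(data.shape[1]), 2)]
    return sorted(scores, key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 5))
    x[:, 3] = 0.9 * x[:, 0] + 0.1 * rng.normal(size=200)  # planted relationship
    for (i, j), score in rank_2d_projections(x)[:3]:
        print(f"dims ({i}, {j}): score {score:.2f}")
```

    Swapping in other criteria (e.g., an outlier score or a gap measure) yields the other rankings the framework describes, while the surrounding visualization and coordination are what HCE adds on top.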