8 research outputs found

    Data mining using the crossing minimization paradigm

    Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The recorded data is not perfect, as noise is introduced into it from different sources. Some of the basic forms of noise are incorrectly recorded values and missing values. The formal study of discovering useful hidden information in data is called Data Mining. Because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data Mining can be categorized into two types: supervised learning (classification) and unsupervised learning (clustering). Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. For a detailed analysis or a local view, biclustering (also called co-clustering or two-way clustering) is required, which involves the simultaneous clustering of the records and the attributes. In this dissertation, a novel, fast and white-noise-tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering and discovers overlapping biclusters. For decades the CM paradigm has been used in graph drawing and VLSI (Very Large Scale Integration) circuit design to reduce wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy data as well as real data from Agriculture, Biology and other domains. Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem for sparse matrices. The proposed CM technique is shown to provide very convincing results on these problems using real public-domain data.
    Pakistan is the fourth largest supplier of cotton in the world. An apparent anomaly was observed during 1989-97 between cotton yield and pesticide consumption in Pakistan, showing unexpected periods of negative correlation. This thesis presents a possible explanation of the anomaly, obtained by applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002).
    EThOS - Electronic Theses Online Service, United Kingdom
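
    The crossing-minimization view of a data matrix can be illustrated with a small sketch. It assumes, for illustration only, that a 0/1 data matrix is read as a bipartite graph with records on one side and attributes on the other, each drawn in index order, and it counts the resulting edge crossings. The function name `count_crossings` and the two toy matrices are hypothetical and are not taken from the dissertation.

```python
from itertools import combinations


def count_crossings(matrix):
    """Count edge crossings of the bipartite graph encoded by a 0/1 matrix.

    Assumption: row i is the i-th vertex on one side, column j the j-th
    vertex on the other side, and every nonzero cell (i, j) is an edge.
    Two edges cross when their row order and column order disagree.
    """
    edges = [(i, j)
             for i, row in enumerate(matrix)
             for j, value in enumerate(row) if value]
    return sum(1 for (i1, j1), (i2, j2) in combinations(edges, 2)
               if (i1 - i2) * (j1 - j2) < 0)


# Two toy matrices holding the same two biclusters: once with rows and
# columns interleaved, once permuted so each bicluster is a contiguous block.
scattered = [[1, 0, 1, 0],
             [0, 1, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 1]]
blocked = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]

print(count_crossings(scattered), count_crossings(blocked))
```

    In this toy case the blocked ordering has fewer crossings than the interleaved one, which is the intuition behind using crossing minimization to expose clusters. The pairwise count is quadratic in the number of edges and is meant only as an illustration.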

    Matrix Reordering Methods for Table and Network Visualization

    This survey provides a description of algorithms to reorder visual matrices of tabular data and adjacency matrices of networks. The goal of this survey is to provide a comprehensive list of reordering algorithms published in different fields such as statistics, bioinformatics, or graph theory. While several of these algorithms are described in publications and others are available in software libraries and programs, there is little awareness of what is done across all fields. Our survey aims at describing these reordering algorithms in a unified manner to enable a wide audience to understand their differences and subtleties. We organize this corpus in a consistent manner, independently of the application or research field. We also provide practical guidance on how to select appropriate algorithms depending on the structure and size of the matrix to reorder, and point to implementations when available.

    Design study of MovementSlicer: an interactive visualization of patterns and group meetings in 2D movement data

    Movement data collected through GPS or other technologies is increasingly common, but is difficult to visualize due to overplotting and occlusion of movements when displayed on 2D maps. An additional challenge is the extraction of useful higher-level information (such as meetings) derived from the raw movement data. We present a design study of MovementSlicer, a tool for visualizing the places visited by, and behaviors of, individual actors, and also the meetings between multiple actors. We first present a taxonomy of visualizations of movement data, and then consider tasks to support when analyzing movement data and especially meetings of multiple actors. We argue that Gantt charts have many advantages for understanding the movements and meetings of small groups of moving entities, and present the design of a Gantt chart that can nest people within locations or locations within people along the vertical axis, and show time along the horizontal axis. The rows of our Gantt chart are sorted by activity level and can be filtered using a weighted adjacency matrix showing meetings between people. Empty time intervals in the Gantt chart can be automatically folded, with smoothly animated transitions, yielding a multi-focal view. Case studies demonstrate the utility of our prototype.

    Comparison of the barycentric heuristic and an algorithm based on conformity analysis in matrix reordering

    This thesis examines the properties and history of algorithms used for reordering matrices, as well as the application and problem areas shared by matrices and graphs. Among matrix reordering algorithms, it looks more closely at the operation of a reordering algorithm based on conformity analysis and the minus technique. It also covers the barycentric heuristic traditionally used for crossing minimization in bipartite graphs, and treats the crossing minimization problem as a matrix reordering problem. Reorderings based on both techniques were implemented in software and tested on matrices of various types and sizes. A new reordering based on barycentric ordering and the minus technique was also implemented and applied, alongside the other techniques, to the crossing minimization problem. The new reordering technique performed particularly promisingly on synthetically generated band-like data. Based on the test runs, the algorithm based on conformity analysis and the minus technique also performed promisingly in forming patterns related to crossing minimization and in the crossing minimization task itself. The results suggest that the minus technique and conformity analysis can be viable alternatives, especially for graphs for which barycentric ordering or other traditional heuristics are not well suited.
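
    As a rough illustration of the barycentric heuristic applied to matrix reordering, the following minimal sketch alternately sorts the rows and columns of a 0/1 matrix by the mean position of their nonzero entries. It is a textbook-style version written under stated assumptions (a non-empty rectangular matrix, ties broken by current position, a fixed sweep limit); it is not the implementation evaluated in the thesis, and it does not include the minus technique or conformity analysis. All function names are hypothetical.

```python
def sort_rows_by_barycenter(matrix):
    """Return (reordered matrix, permutation) with rows sorted by barycenter.

    The barycenter of a row is the mean column index of its nonzero cells.
    Assumption: an empty row keeps its current index as its key.
    """
    keys = []
    for i, row in enumerate(matrix):
        cols = [j for j, value in enumerate(row) if value]
        keys.append(sum(cols) / len(cols) if cols else float(i))
    # sorted() is stable, so rows with equal barycenters keep their order.
    order = sorted(range(len(matrix)), key=lambda i: keys[i])
    return [matrix[i] for i in order], order


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def barycenter_reorder(matrix, max_sweeps=10):
    """Alternate row and column barycenter sorts until nothing moves."""
    current = [list(row) for row in matrix]
    row_ids = list(range(len(current)))
    col_ids = list(range(len(current[0])))
    for _ in range(max_sweeps):
        current, row_perm = sort_rows_by_barycenter(current)
        row_ids = [row_ids[i] for i in row_perm]
        flipped, col_perm = sort_rows_by_barycenter(transpose(current))
        current = transpose(flipped)
        col_ids = [col_ids[j] for j in col_perm]
        if row_perm == sorted(row_perm) and col_perm == sorted(col_perm):
            break  # both orders are stable
    return current, row_ids, col_ids


# Toy input: two interleaved blocks (an illustrative matrix, not thesis data).
toy = [[1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1]]
reordered, row_order, col_order = barycenter_reorder(toy)
for row in reordered:
    print(row)
```

    On this toy input the two interleaved groups end up as contiguous blocks. In general the heuristic offers no optimality guarantee, and results depend on the initial ordering and the tie-breaking rule.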