13 research outputs found

    Data Mining Using the Crossing Minimization Paradigm

    Get PDF
    Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The data recorded is not perfect, as noise gets introduced in it from different sources. Some of the basic forms of noise are incorrect recording of values and missing values. The formal study of discovering useful hidden information in the data is called Data Mining. Because of the size, and complexity of the problem, practical data mining problems are best attempted using automatic means. Data Mining can be categorized into two types i.e. supervised learning or classification and unsupervised learning or clustering. Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. For a detailed analysis or a local view, biclustering or co-clustering or two-way clustering is required involving the simultaneous clustering of the records and the attributes. In this dissertation, a novel fast and white noise tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering for discovering overlapping biclusters. For decades the CM paradigm has traditionally been used for graph drawing and VLSI (Very Large Scale Integration) circuit design for reducing wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy, as well as real data from Agriculture, Biology and other domains. Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) Bandwidth Minimization (BWM) problem of sparse matrices. The proposed CM technique is demonstrated to provide very convincing results while attempting to solve the said problems using real public domain data. Pakistan is the fourth largest supplier of cotton in the world. An apparent anomaly has been observed during 1989-97 between cotton yield and pesticide consumption in Pakistan showing unexpected periods of negative correlation. By applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002), a possible explanation of the anomaly has been presented in this thesis

    Tietojenkäsittelytieteellisiä tutkielmia. Kevät 2015

    Get PDF

    Data mining using the crossing minimization paradigm

    Get PDF
    Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The data recorded is not perfect, as noise gets introduced in it from different sources. Some of the basic forms of noise are incorrect recording of values and missing values. The formal study of discovering useful hidden information in the data is called Data Mining. Because of the size, and complexity of the problem, practical data mining problems are best attempted using automatic means. Data Mining can be categorized into two types i.e. supervised learning or classification and unsupervised learning or clustering. Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. For a detailed analysis or a local view, biclustering or co-clustering or two-way clustering is required involving the simultaneous clustering of the records and the attributes. In this dissertation, a novel fast and white noise tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering for discovering overlapping biclusters. For decades the CM paradigm has traditionally been used for graph drawing and VLSI (Very Large Scale Integration) circuit design for reducing wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy, as well as real data from Agriculture, Biology and other domains. Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) Bandwidth Minimization (BWM) problem of sparse matrices. The proposed CM technique is demonstrated to provide very convincing results while attempting to solve the said problems using real public domain data. Pakistan is the fourth largest supplier of cotton in the world. An apparent anomaly has been observed during 1989-97 between cotton yield and pesticide consumption in Pakistan showing unexpected periods of negative correlation. By applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002), a possible explanation of the anomaly has been presented in this thesis.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Barysentrisen heuristiikan ja mukautuvuusanalyysiin perustuvan algoritmin vertailu matriisien järjestämisessä

    Get PDF
    Tutkielmassa käsitellään matriisien järjestelyyn käytettävien algoritmien ominaisuuksia ja historiaa sekä matriisien ja graafien yhteisiä sovellus- ja ongelmaalueita. Matriisien järjestelyalgoritmeista tutustutaan tarkemmin mukautuvuusanalyysiin ja miinustekniikkaan perustuvan järjestelyalgoritmin toimintaan, minkä lisäksi tutustutaan kaksijakoisen graafin risteymien minimointiin perinteisesti käytettyyn barysentriseen heuristiikkaan ja käsitellään risteymien minimointiongelma matriisin järjestelyongelmana. Kumpaankin tekniikkaan perustuvat järjestelyt toteuttiin ohjelmallisesti ja testattiin erityyppisillä ja -kokoisilla matriiseilla. Työssä toteutettiin myös uusi barysentriseen järjestelyyn ja miinustekniikkaan perustuva järjestely, jota sovellettiin muiden tekniikoiden ohella risteymien minimointiongelmaan. Uusi järjestelytekniikka toimi erityisen lupaavasti synteettisesti generoidulla kaistamaisella aineistolla. Testiajojen perusteella myös mukautuvuusanalyysiin ja miinustekniikkaan perustuva algoritmi toimi lupaavasti risteymien minimointiin liittyvien hahmojen muodostamisessa ja risteymien minimointitehtävässä. Tuloksien perusteella miinustekniikka ja mukautuvuusanalyysi voivat olla varteenotettavia vaihtoehtoja erityisesti sellaisilla graafeilla, joiden järjestelyyn barysentrinen järjestely tai muut perinteiset heuristiikat eivät sovi

    Order-Related Problems Parameterized by Width

    Get PDF
    In the main body of this thesis, we study two different order theoretic problems. The first problem, called Completion of an Ordering, asks to extend a given finite partial order to a complete linear order while respecting some weight constraints. The second problem is an order reconfiguration problem under width constraints. While the Completion of an Ordering problem is NP-complete, we show that it lies in FPT when parameterized by the interval width of ρ. This ordering problem can be used to model several ordering problems stemming from diverse application areas, such as graph drawing, computational social choice, and computer memory management. Each application yields a special partial order ρ. We also relate the interval width of ρ to parameterizations for these problems that have been studied earlier in the context of these applications, sometimes improving on parameterized algorithms that have been developed for these parameterizations before. This approach also gives some practical sub-exponential time algorithms for ordering problems. In our second main result, we combine our parameterized approach with the paradigm of solution diversity. The idea of solution diversity is that instead of aiming at the development of algorithms that output a single optimal solution, the goal is to investigate algorithms that output a small set of sufficiently good solutions that are sufficiently diverse from one another. In this way, the user has the opportunity to choose the solution that is most appropriate to the context at hand. It also displays the richness of the solution space. There, we show that the considered diversity version of the Completion of an Ordering problem is fixed-parameter tractable with respect to natural paramaters that capture the notion of diversity and the notion of sufficiently good solutions. We apply this algorithm in the study of the Kemeny Rank Aggregation class of problems, a well-studied class of problems lying in the intersection of order theory and social choice theory. Up to this point, we have been looking at problems where the goal is to find an optimal solution or a diverse set of good solutions. In the last part, we shift our focus from finding solutions to studying the solution space of a problem. There we consider the following order reconfiguration problem: Given a graph G together with linear orders τ and τ ′ of the vertices of G, can one transform τ into τ ′ by a sequence of swaps of adjacent elements in such a way that at each time step the resulting linear order has cutwidth (pathwidth) at most w? We show that this problem always has an affirmative answer when the input linear orders τ and τ ′ have cutwidth (pathwidth) at most w/2. Using this result, we establish a connection between two apparently unrelated problems: the reachability problem for two-letter string rewriting systems and the graph isomorphism problem for graphs of bounded cutwidth. This opens an avenue for the study of the famous graph isomorphism problem using techniques from term rewriting theory. In addition to the main part of this work, we present results on two unrelated problems, namely on the Steiner Tree problem and on the Intersection Non-emptiness problem from automata theory.Doktorgradsavhandlin
    corecore