35,065 research outputs found

    Genetic based clustering algorithms and applications.

    by Lee Wing Kin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 81-90). Abstracts in English and Chinese.

    Abstract --- p.i
    Acknowledgments --- p.iii
    List of Figures --- p.vii
    List of Tables --- p.viii
    Chapter 1 --- Introduction --- p.1
    Chapter 1.1 --- Clustering --- p.1
    Chapter 1.1.1 --- Hierarchical Classification --- p.2
    Chapter 1.1.2 --- Partitional Classification --- p.3
    Chapter 1.1.3 --- Comparative Analysis --- p.4
    Chapter 1.2 --- Cluster Analysis and Traveling Salesman Problem --- p.5
    Chapter 1.3 --- Solving Clustering Problem --- p.7
    Chapter 1.4 --- Genetic Algorithms --- p.9
    Chapter 1.5 --- Outline of Work --- p.11
    Chapter 2 --- The Clustering Algorithms and Applications --- p.13
    Chapter 2.1 --- Introduction --- p.13
    Chapter 2.2 --- Traveling Salesman Problem --- p.14
    Chapter 2.2.1 --- Related Work on TSP --- p.14
    Chapter 2.2.2 --- Solving TSP using Genetic Algorithm --- p.15
    Chapter 2.3 --- Applications --- p.22
    Chapter 2.3.1 --- Clustering for Vertical Partitioning Design --- p.22
    Chapter 2.3.2 --- Horizontal Partitioning a Relational Database --- p.36
    Chapter 2.3.3 --- Object-Oriented Database Design --- p.42
    Chapter 2.3.4 --- Document Database Design --- p.49
    Chapter 2.4 --- Conclusions --- p.53
    Chapter 3 --- The Experiments for Vertical Partitioning Problem --- p.55
    Chapter 3.1 --- Introduction --- p.55
    Chapter 3.2 --- Comparative Study --- p.56
    Chapter 3.3 --- Experimental Results --- p.59
    Chapter 3.4 --- Conclusions --- p.61
    Chapter 4 --- Three New Operators for TSP --- p.62
    Chapter 4.1 --- Introduction --- p.62
    Chapter 4.2 --- Enhanced Cost Edge Recombination Operator --- p.63
    Chapter 4.3 --- Shortest Path Operator --- p.66
    Chapter 4.4 --- Shortest Edge Operator --- p.69
    Chapter 4.5 --- The Experiments --- p.71
    Chapter 4.5.1 --- Experimental Results for a 48-city TSP --- p.71
    Chapter 4.5.2 --- Experimental Results for Problems in TSPLIB --- p.73
    Chapter 4.6 --- Conclusions --- p.77
    Chapter 5 --- Conclusions --- p.78
    Chapter 5.1 --- Summary of Achievements --- p.78
    Chapter 5.2 --- Future Development --- p.80
    Bibliography --- p.8
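    The thesis above solves clustering by reducing it to the traveling salesman problem and applying a genetic algorithm. As a hedged illustration of that general approach only (the thesis's own operators, such as the enhanced cost edge recombination operator, are not reproduced here), a minimal GA for a small TSP using order crossover and swap mutation might look like:

```python
import random

# Toy sketch: a generic GA for TSP. Order crossover (OX) and swap
# mutation are standard textbook operators, chosen here for brevity;
# they are NOT the operators proposed in the thesis.

def tour_length(tour, dist):
    """Total length of a closed tour under distance matrix `dist`."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def order_crossover(p1, p2):
    """OX: copy a slice from parent 1, fill the rest in parent 2's order."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [c for c in p2 if c not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def ga_tsp(dist, pop_size=50, generations=200, mut_rate=0.2):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            child = order_crossover(p1, p2)
            if random.random() < mut_rate:        # swap mutation
                i, j = random.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, dist))

if __name__ == "__main__":
    # Four cities on a unit square; the optimal tour has length 4.0.
    pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
    dist = [[((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
             for (x2, y2) in pts] for (x1, y1) in pts]
    best = ga_tsp(dist)
    print(round(tour_length(best, dist), 2))
```

    Keeping the best half of each generation (elitism) guarantees the best tour found so far is never lost, which is why the search reliably converges on small instances.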

    Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

    Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computing researchers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.

    Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles of Database Systems (PODS 2012).
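    A small toy illustration of the abstract's central claim (not taken from the article, and using an example of my choosing): truncating an iterative computation early is itself a form of regularization. Running gradient descent on an ill-conditioned least-squares problem and stopping after a few steps yields a smaller-norm, more stable solution than running it (nearly) to convergence:

```python
# Sketch under stated assumptions: early stopping of gradient descent
# on least squares behaves like ridge regularization. The design matrix
# below has nearly collinear columns, so the exact least-squares
# solution is ill-conditioned; the "approximate" (truncated) run stays
# closer to the origin.

def gd_least_squares(X, y, steps, lr):
    """Plain gradient descent on (1/n)||Xw - y||^2 from w = 0."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2.0 * r * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def l2(v):
    return sum(x * x for x in v) ** 0.5

# Nearly collinear columns; deterministic pseudo-noise in y.
X = [[1.0, 1.0 + 0.01 * ((i % 3) - 1)] for i in range(21)]
y = [3.0 + 0.1 * ((i * 37) % 7 - 3) for i in range(21)]

w_early = gd_least_squares(X, y, steps=20, lr=0.01)     # approximate
w_full = gd_least_squares(X, y, steps=20000, lr=0.01)   # near-converged

print(round(l2(w_early), 3), round(l2(w_full), 3))
```

    Starting from zero, each eigencomponent of the iterate grows monotonically toward the least-squares solution, so the early-stopped weight vector always has the smaller norm; the implicit ridge-like penalty comes purely from computing less, which is the regularization-by-approximation phenomenon the abstract describes.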

    Open by design: the role of design in open innovation


    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, there is a pressing need for tools that promote understanding and explanations of data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.
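    The core idea of the abstract can be made concrete with a toy sketch (this is not the Explain3D system; the schemas, data, and helper names below are invented for illustration). Two sources answer the same question, revenue per region, under different schemas; aligning the per-row provenance of each answer by a shared key surfaces exactly the rows behind any disagreement:

```python
from collections import defaultdict

# Hypothetical sources with different schemas for the same question.
src_a = [
    {"region": "east", "amount": 100.0},
    {"region": "east", "amount": 50.0},
    {"region": "west", "amount": 70.0},
]
src_b = [
    {"area": "east", "net": 100.0, "fees": 0.0},
    {"area": "east", "net": 40.0, "fees": 10.0},
    {"area": "west", "net": 60.0, "fees": 0.0},  # disagrees with src_a
]

def answer_a(rows):
    """Query A: total `amount` per `region`."""
    out = defaultdict(float)
    for r in rows:
        out[r["region"]] += r["amount"]
    return dict(out)

def answer_b(rows):
    """Query B: total `net` + `fees` per `area` (semantically similar)."""
    out = defaultdict(float)
    for r in rows:
        out[r["area"]] += r["net"] + r["fees"]
    return dict(out)

def explain_disagreement(rows_a, rows_b):
    """For each key where the answers differ, return both answers and
    the provenance rows that produced them on each side."""
    qa, qb = answer_a(rows_a), answer_b(rows_b)
    report = {}
    for key in sorted(set(qa) | set(qb)):
        va, vb = qa.get(key, 0.0), qb.get(key, 0.0)
        if abs(va - vb) > 1e-9:
            report[key] = {
                "answers": (va, vb),
                "provenance_a": [r for r in rows_a if r["region"] == key],
                "provenance_b": [r for r in rows_b if r["area"] == key],
            }
    return report

report = explain_disagreement(src_a, src_b)
print(list(report))  # only the disagreeing region is reported
```

    The sketch shows why exploiting the queries helps: the aggregation keys and provenance come for free from the query structure, whereas a query-agnostic record-linkage approach would have to rediscover which rows correspond across the two schemas.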