35,065 research outputs found

    Genetic based clustering algorithms and applications.

    by Lee Wing Kin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 81-90). Abstracts in English and Chinese.

    Abstract --- p.i
    Acknowledgments --- p.iii
    List of Figures --- p.vii
    List of Tables --- p.viii
    Chapter 1 --- Introduction --- p.1
    Chapter 1.1 --- Clustering --- p.1
    Chapter 1.1.1 --- Hierarchical Classification --- p.2
    Chapter 1.1.2 --- Partitional Classification --- p.3
    Chapter 1.1.3 --- Comparative Analysis --- p.4
    Chapter 1.2 --- Cluster Analysis and Traveling Salesman Problem --- p.5
    Chapter 1.3 --- Solving Clustering Problem --- p.7
    Chapter 1.4 --- Genetic Algorithms --- p.9
    Chapter 1.5 --- Outline of Work --- p.11
    Chapter 2 --- The Clustering Algorithms and Applications --- p.13
    Chapter 2.1 --- Introduction --- p.13
    Chapter 2.2 --- Traveling Salesman Problem --- p.14
    Chapter 2.2.1 --- Related Work on TSP --- p.14
    Chapter 2.2.2 --- Solving TSP using Genetic Algorithm --- p.15
    Chapter 2.3 --- Applications --- p.22
    Chapter 2.3.1 --- Clustering for Vertical Partitioning Design --- p.22
    Chapter 2.3.2 --- Horizontal Partitioning a Relational Database --- p.36
    Chapter 2.3.3 --- Object-Oriented Database Design --- p.42
    Chapter 2.3.4 --- Document Database Design --- p.49
    Chapter 2.4 --- Conclusions --- p.53
    Chapter 3 --- The Experiments for Vertical Partitioning Problem --- p.55
    Chapter 3.1 --- Introduction --- p.55
    Chapter 3.2 --- Comparative Study --- p.56
    Chapter 3.3 --- Experimental Results --- p.59
    Chapter 3.4 --- Conclusions --- p.61
    Chapter 4 --- Three New Operators for TSP --- p.62
    Chapter 4.1 --- Introduction --- p.62
    Chapter 4.2 --- Enhanced Cost Edge Recombination Operator --- p.63
    Chapter 4.3 --- Shortest Path Operator --- p.66
    Chapter 4.4 --- Shortest Edge Operator --- p.69
    Chapter 4.5 --- The Experiments --- p.71
    Chapter 4.5.1 --- Experimental Results for a 48-city TSP --- p.71
    Chapter 4.5.2 --- Experimental Results for Problems in TSPLIB --- p.73
    Chapter 4.6 --- Conclusions --- p.77
    Chapter 5 --- Conclusions --- p.78
    Chapter 5.1 --- Summary of Achievements --- p.78
    Chapter 5.2 --- Future Development --- p.80
    Bibliography --- p.8
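    The thesis above solves clustering by reducing it to the traveling salesman problem and applying a genetic algorithm. As a hedged illustration of that general approach only (the thesis's own operators, such as the enhanced cost edge recombination operator, are not reproduced here), a minimal GA for a small TSP using order crossover and swap mutation might look like:

```python
import random

# Toy sketch: a generic GA for TSP. Order crossover (OX) and swap
# mutation are standard textbook operators, chosen here for brevity;
# they are NOT the operators proposed in the thesis.

def tour_length(tour, dist):
    """Total length of a closed tour under distance matrix `dist`."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def order_crossover(p1, p2):
    """OX: copy a slice from parent 1, fill the rest in parent 2's order."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [c for c in p2 if c not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def ga_tsp(dist, pop_size=50, generations=200, mut_rate=0.2):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            child = order_crossover(p1, p2)
            if random.random() < mut_rate:        # swap mutation
                i, j = random.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, dist))

if __name__ == "__main__":
    # Four cities on a unit square; the optimal tour has length 4.0.
    pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
    dist = [[((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
             for (x2, y2) in pts] for (x1, y1) in pts]
    best = ga_tsp(dist)
    print(round(tour_length(best, dist), 2))
```

    Keeping the best half of each generation (elitism) guarantees the best tour found so far is never lost, which is why the search reliably converges on small instances.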

    Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

    Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computing researchers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.

    Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles of Database Systems (PODS 2012).
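    A small toy illustration of the abstract's central claim (not taken from the article, and using an example of my choosing): truncating an iterative computation early is itself a form of regularization. Running gradient descent on an ill-conditioned least-squares problem and stopping after a few steps yields a smaller-norm, more stable solution than running it (nearly) to convergence:

```python
# Sketch under stated assumptions: early stopping of gradient descent
# on least squares behaves like ridge regularization. The design matrix
# below has nearly collinear columns, so the exact least-squares
# solution is ill-conditioned; the "approximate" (truncated) run stays
# closer to the origin.

def gd_least_squares(X, y, steps, lr):
    """Plain gradient descent on (1/n)||Xw - y||^2 from w = 0."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2.0 * r * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def l2(v):
    return sum(x * x for x in v) ** 0.5

# Nearly collinear columns; deterministic pseudo-noise in y.
X = [[1.0, 1.0 + 0.01 * ((i % 3) - 1)] for i in range(21)]
y = [3.0 + 0.1 * ((i * 37) % 7 - 3) for i in range(21)]

w_early = gd_least_squares(X, y, steps=20, lr=0.01)     # approximate
w_full = gd_least_squares(X, y, steps=20000, lr=0.01)   # near-converged

print(round(l2(w_early), 3), round(l2(w_full), 3))
```

    Starting from zero, each eigencomponent of the iterate grows monotonically toward the least-squares solution, so the early-stopped weight vector always has the smaller norm; the implicit ridge-like penalty comes purely from computing less, which is the regularization-by-approximation phenomenon the abstract describes.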

    Open by design: the role of design in open innovation


    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, there is a pressing need for tools that promote understanding and explanations of data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.
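    The core idea of the abstract can be made concrete with a toy sketch (this is not the Explain3D system; the schemas, data, and helper names below are invented for illustration). Two sources answer the same question, revenue per region, under different schemas; aligning the per-row provenance of each answer by a shared key surfaces exactly the rows behind any disagreement:

```python
from collections import defaultdict

# Hypothetical sources with different schemas for the same question.
src_a = [
    {"region": "east", "amount": 100.0},
    {"region": "east", "amount": 50.0},
    {"region": "west", "amount": 70.0},
]
src_b = [
    {"area": "east", "net": 100.0, "fees": 0.0},
    {"area": "east", "net": 40.0, "fees": 10.0},
    {"area": "west", "net": 60.0, "fees": 0.0},  # disagrees with src_a
]

def answer_a(rows):
    """Query A: total `amount` per `region`."""
    out = defaultdict(float)
    for r in rows:
        out[r["region"]] += r["amount"]
    return dict(out)

def answer_b(rows):
    """Query B: total `net` + `fees` per `area` (semantically similar)."""
    out = defaultdict(float)
    for r in rows:
        out[r["area"]] += r["net"] + r["fees"]
    return dict(out)

def explain_disagreement(rows_a, rows_b):
    """For each key where the answers differ, return both answers and
    the provenance rows that produced them on each side."""
    qa, qb = answer_a(rows_a), answer_b(rows_b)
    report = {}
    for key in sorted(set(qa) | set(qb)):
        va, vb = qa.get(key, 0.0), qb.get(key, 0.0)
        if abs(va - vb) > 1e-9:
            report[key] = {
                "answers": (va, vb),
                "provenance_a": [r for r in rows_a if r["region"] == key],
                "provenance_b": [r for r in rows_b if r["area"] == key],
            }
    return report

report = explain_disagreement(src_a, src_b)
print(list(report))  # only the disagreeing region is reported
```

    The sketch shows why exploiting the queries helps: the aggregation keys and provenance come for free from the query structure, whereas a query-agnostic record-linkage approach would have to rediscover which rows correspond across the two schemas.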