Genetic based clustering algorithms and applications.
by Lee Wing Kin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 81-90).
Abstracts in English and Chinese.
Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Clustering
1.1.1 Hierarchical Classification
1.1.2 Partitional Classification
1.1.3 Comparative Analysis
1.2 Cluster Analysis and Traveling Salesman Problem
1.3 Solving Clustering Problem
1.4 Genetic Algorithms
1.5 Outline of Work
Chapter 2 The Clustering Algorithms and Applications
2.1 Introduction
2.2 Traveling Salesman Problem
2.2.1 Related Work on TSP
2.2.2 Solving TSP using Genetic Algorithm
2.3 Applications
2.3.1 Clustering for Vertical Partitioning Design
2.3.2 Horizontal Partitioning a Relational Database
2.3.3 Object-Oriented Database Design
2.3.4 Document Database Design
2.4 Conclusions
Chapter 3 The Experiments for Vertical Partitioning Problem
3.1 Introduction
3.2 Comparative Study
3.3 Experimental Results
3.4 Conclusions
Chapter 4 Three New Operators for TSP
4.1 Introduction
4.2 Enhanced Cost Edge Recombination Operator
4.3 Shortest Path Operator
4.4 Shortest Edge Operator
4.5 The Experiments
4.5.1 Experimental Results for a 48-city TSP
4.5.2 Experimental Results for Problems in TSPLIB
4.6 Conclusions
Chapter 5 Conclusions
5.1 Summary of Achievements
5.2 Future Development
Bibliography
Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of computer
scientists who adopt what may be termed an algorithmic perspective on their
data. This perspective is very different from the more statistical perspective
adopted by statisticians, scientific computers, machine learners, and others who
work on what may be broadly termed statistical data analysis. In this article,
I will address fundamental aspects of this algorithmic-statistical disconnect,
with an eye to bridging the gap between these two very different approaches. A
concept that lies at the heart of this disconnect is that of statistical
regularization, a notion that has to do with how robust the output of an
algorithm is to the noise properties of the input data. Although it is nearly
completely absent from computer science, which historically has taken the input
data as given and modeled algorithms discretely, regularization in one form or
another is central to nearly every application domain that applies algorithms
to noisy data. By using several case studies, I will illustrate, both
theoretically and empirically, the nonobvious fact that approximate
computation, in and of itself, can implicitly lead to statistical
regularization. This and other recent work suggests that, by exploiting in a
more principled way the statistical properties implicit in worst-case
algorithms, one can in many cases satisfy the bicriteria of having algorithms
that are scalable to very large-scale databases and that also have good
inferential or predictive properties.
Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles
of Database Systems (PODS 2012)
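The claim that approximate computation can itself regularize may be easier to see in a small example. The sketch below is an illustration chosen here, not one of the article's case studies: it assumes a toy diagonal least-squares objective and uses early-stopped gradient descent as the "approximate computation". The function names, eigenvalues, target vector, and step size are all hypothetical. Stopping after t steps scales each coordinate of the exact solution by 1 - (1 - step * eigenvalue)^t, so poorly determined directions (small eigenvalues) are shrunk toward zero, much as an explicit penalty would do.

```python
def early_stopped_gd(eigenvalues, target, step, num_steps):
    """Gradient descent from zero on 0.5 * sum(l_i * (w_i - target_i) ** 2).

    Stopping early is the 'approximate computation': the exact minimizer
    is `target`, but after a finite number of steps each coordinate has
    only been partially recovered.
    """
    w = [0.0] * len(target)
    for _ in range(num_steps):
        # The gradient along coordinate i is l_i * (w_i - target_i).
        w = [wi - step * li * (wi - ti)
             for wi, li, ti in zip(w, eigenvalues, target)]
    return w


def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(x * x for x in v) ** 0.5


# Hypothetical problem: three directions with very different curvature.
eigenvalues = [1.0, 0.1, 0.01]
target = [1.0, 1.0, 1.0]  # exact, unregularized solution (assumed)

if __name__ == "__main__":
    for steps in (1, 10, 100, 1000):
        w = early_stopped_gd(eigenvalues, target, step=0.5, num_steps=steps)
        print(steps, [round(x, 3) for x in w], round(norm(w), 3))
```

With fewer steps the iterate has a smaller norm and relies mostly on the well-conditioned direction; running longer approaches the exact solution. The number of steps therefore plays the role of a regularization parameter, even though no penalty term appears in the objective.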
Survey of partitioning techniques in silicon compilation
In the silicon compilation design process, partitioning is usually the first problem to be investigated because partitioning algorithms form the backbone of many other algorithms, including system synthesis, processor synthesis, floorplanning, and placement. This survey examines several partitioning techniques and reviews the partitioning algorithms used by synthesis systems at different design levels.
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4) We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently.