
    Reducing UK-means to k-means

    This paper proposes an optimisation to the UK-means algorithm, which generalises the k-means algorithm to handle objects whose locations are uncertain. The location of each object is described by a probability density function (pdf). The UK-means algorithm needs to compute expected distances (EDs) between each object and the cluster representatives. Evaluating an ED from first principles is a very costly operation, because the pdfs are different and arbitrary, yet UK-means needs to evaluate a large number of EDs. This is a major performance burden of the algorithm. In this paper, we derive a formula for evaluating EDs efficiently. This tremendously reduces the execution time of UK-means, as demonstrated by our preliminary experiments. We also illustrate that this optimised formula effectively reduces the UK-means problem to the traditional clustering problem addressed by the k-means algorithm. © 2007 IEEE. The 7th IEEE International Conference on Data Mining (ICDM) Workshops 2007, Omaha, NE, 28-31 October 2007. In Proceedings of the 7th ICDM, 2007, p. 483-48
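The abstract above does not state the derived formula. As a hedged illustration only, the sketch below assumes the distance is the squared Euclidean distance, for which the standard identity E[||X - c||^2] = ||E[X] - c||^2 + E[||X - E[X]||^2] holds. The function names and the sampled pdf are hypothetical, not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(samples):
    """Component-wise mean of a list of sample points."""
    n = len(samples)
    dim = len(samples[0])
    return tuple(sum(s[d] for s in samples) / n for d in range(dim))


def expected_sq_dist_brute(samples, c):
    """Expected squared distance by averaging over every sample (costly)."""
    return sum(sq_dist(s, c) for s in samples) / len(samples)


def expected_sq_dist_fast(centroid, spread, c):
    """Same quantity from two per-object values computed once up front."""
    return sq_dist(centroid, c) + spread


# Hypothetical uncertain object: its pdf is represented by 1000 samples.
random.seed(0)
samples = [(random.gauss(2.0, 1.0), random.uniform(-1.0, 3.0)) for _ in range(1000)]

# Precompute once per object, independent of any cluster representative.
centroid = centroid_of(samples)
spread = expected_sq_dist_brute(samples, centroid)

c = (5.0, -2.0)  # a cluster representative
brute = expected_sq_dist_brute(samples, c)
fast = expected_sq_dist_fast(centroid, spread, c)
print(abs(brute - fast) < 1e-9)
```

Under this assumption the spread term is the same for every cluster representative, so assigning an object to its nearest representative depends only on the object's centroid, which is how ordinary k-means operates on exact points.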

    Clustering uncertain data using Voronoi diagrams and R-tree index

    We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalizes the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We then introduce an R-tree index to organize the uncertain objects so as to reduce pruning overheads. We conduct experiments to evaluate the effectiveness of our novel techniques. We show that our techniques are additive and, when used in combination, significantly outperform previously known methods. © 2006 IEEE.
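For context, the baseline that the abstract above compares against can be sketched as follows. This is a minimal illustration of bounding-box (min-max) pruning as commonly described, not the Voronoi-based technique the paper proposes. It assumes each uncertain object lies entirely inside an axis-aligned box, so its expected distance to a representative is bounded by the minimum and maximum distances from the box to that representative. All names are hypothetical.

```python
import math


def min_dist(box, c):
    """Smallest Euclidean distance from point c to an axis-aligned box."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for l, h, x in zip(lo, hi, c)))


def max_dist(box, c):
    """Largest Euclidean distance from point c to an axis-aligned box."""
    lo, hi = box
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2 for l, h, x in zip(lo, hi, c)))


def surviving_candidates(box, representatives):
    """Indices of representatives that cannot be ruled out for this object.

    A representative is pruned when even its best case (min_dist) is worse
    than the worst case (max_dist) of some other representative, because the
    expected distance always lies between those two bounds.
    """
    best_upper = min(max_dist(box, c) for c in representatives)
    return [i for i, c in enumerate(representatives) if min_dist(box, c) <= best_upper]


box = ((0.0, 0.0), (1.0, 1.0))  # bounding box of one uncertain object
reps = [(0.5, 0.5), (10.0, 10.0), (1.2, 0.5)]
print(surviving_candidates(box, reps))
```

Expected distances then need to be computed only for the surviving candidates; if a single candidate survives, the object can be assigned without any expected distance calculation.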

    Efficient mining of frequent item sets on large uncertain databases

    The data handled in emerging applications like location-based services, sensor monitoring systems, and data integration are often inexact in nature. In this paper, we study the important problem of extracting frequent item sets from a large uncertain database, interpreted under the Possible World Semantics (PWS). This issue is technically challenging, since an uncertain database contains an exponential number of possible worlds. By observing that the mining process can be modeled as a Poisson binomial distribution, we develop an approximate algorithm, which can efficiently and accurately discover frequent item sets in a large uncertain database. We also study the important issue of maintaining the mining result for a database that is evolving (e.g., by inserting a tuple). Specifically, we propose incremental mining algorithms, which enable Probabilistic Frequent Item set (PFI) results to be refreshed. This reduces the need to re-execute the whole mining algorithm on the new database, which is often more expensive and unnecessary. We examine how an existing algorithm that extracts exact item sets, as well as our approximate algorithm, can support incremental mining. All our approaches support both tuple and attribute uncertainty, which are two common uncertain database models. We also perform extensive evaluation on real and synthetic data sets to validate our approaches. © 1989-2012 IEEE.
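To make the Poisson binomial observation concrete, here is a minimal sketch. It assumes tuple uncertainty with independent tuples, where tuple i contains a given item set with probability p_i, so that the item set's support count follows a Poisson binomial distribution. The code computes that distribution exactly by dynamic programming; it is not the paper's approximate algorithm, and the function names are hypothetical.

```python
def support_pmf(probs):
    """Exact distribution of the support count.

    probs[i] is the assumed probability that tuple i contains the item set.
    Returns a list pmf where pmf[k] is the probability that exactly k tuples
    contain it, summed implicitly over all possible worlds.
    """
    pmf = [1.0]  # with zero tuples, the support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # tuple absent in this world
            nxt[k + 1] += mass * p      # tuple present in this world
        pmf = nxt
    return pmf


def frequentness_probability(probs, minsup):
    """Probability that the support count is at least minsup."""
    return sum(support_pmf(probs)[minsup:])


probs = [0.9, 0.5, 0.2]
pmf = support_pmf(probs)
print(pmf, frequentness_probability(probs, 2))
```

The dynamic program takes time quadratic in the number of tuples, which avoids enumerating the exponentially many possible worlds but can still be slow on a large database; that cost is one reason an approximation is attractive.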

    Naive Bayes classification of uncertain data

    Traditional machine learning algorithms assume that data are exact or precise. However, this assumption may not hold in some situations because of data uncertainty arising from measurement errors, data staleness, repeated measurements, and similar sources. With uncertainty, the value of each data item is represented by a probability distribution function (pdf). In this paper, we propose a novel naive Bayes classification algorithm for uncertain data with a pdf. Our key solution is to extend the class conditional probability estimation in the Bayes model to handle pdfs. Extensive experiments on UCI datasets show that the accuracy of the naive Bayes model can be improved by taking into account the uncertainty information. © 2009 IEEE. The 9th IEEE International Conference on Data Mining (ICDM), Miami, FL, 6-9 December 2009. In Proceedings of the 9th ICDM, 2009, p. 944-94
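The abstract above does not spell out how the class conditional estimate is extended. One simple way to picture the idea, offered purely as an assumption and not as the paper's method, is to model both the class conditional density and each uncertain attribute value as Gaussians. The expected likelihood of the uncertain value is then a Gaussian density whose variance is the sum of the two variances. All names below are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def expected_likelihood(obs_mean, obs_var, class_mean, class_var):
    """Class conditional density averaged over the observation's own pdf.

    Assumes both pdfs are Gaussian, so the average has a closed form:
    a Gaussian density with the two variances added together.
    """
    return gaussian_pdf(obs_mean, class_mean, class_var + obs_var)


def classify(observation, classes):
    """Pick the class with the highest log posterior score.

    observation: list of (mean, variance) pairs, one per attribute.
    classes: dict mapping label -> (prior, list of (mean, variance) pairs).
    """
    best_label, best_score = None, -math.inf
    for label, (prior, params) in classes.items():
        score = math.log(prior)
        for (m, v), (cm, cv) in zip(observation, params):
            score += math.log(expected_likelihood(m, v, cm, cv))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


classes = {"a": (0.5, [(0.0, 1.0)]), "b": (0.5, [(3.0, 1.0)])}
print(classify([(0.4, 0.25)], classes))
```

Setting an observation's variance to zero recovers the ordinary Gaussian naive Bayes likelihood for an exact value, so this sketch treats precise data as a special case.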

    Electron-Spin Excitation Coupling in an Electron Doped Copper Oxide Superconductor

    High-temperature (high-Tc) superconductivity in the copper oxides arises from electron or hole doping of their antiferromagnetic (AF) insulating parent compounds. The evolution of the AF phase with doping and its spatial coexistence with superconductivity are governed by the nature of charge and spin correlations and provide clues to the mechanism of high-Tc superconductivity. Here we use combined neutron scattering and scanning tunneling spectroscopy (STS) to study the Tc evolution of electron-doped superconducting Pr0.88LaCe0.12CuO4-delta obtained through the oxygen annealing process. We find that spin excitations detected by neutron scattering have two distinct modes that evolve with Tc in a remarkably similar fashion to the electron tunneling modes in STS. These results demonstrate that antiferromagnetism and superconductivity compete locally and coexist spatially on nanometer length scales, and the dominant electron-boson coupling at low energies originates from the electron-spin excitations. Comment: 30 pages, 12 figures, supplementary information include