205 research outputs found
GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins hold far fewer tuples than the sliding windows contain. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
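The abstract does not spell out how GDJ scores tuples, so the following is only a minimal sketch of the generic GreedyDual replacement idea that the name alludes to, applied to a bounded join buffer. The class name, the `credit` parameter, and the rule "refresh a tuple when it produces a join match" are assumptions made for illustration, not the authors' algorithm.

```python
import heapq
import itertools


class GreedyDualBuffer:
    """Bounded buffer with generic GreedyDual eviction (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inflation = 0.0           # running value L of classic GreedyDual
        self.entry = {}                # key -> (priority H, sequence number)
        self.heap = []                 # (H, seq, key); may hold stale entries
        self.seq = itertools.count()

    def _push(self, key, credit):
        record = (self.inflation + credit, next(self.seq))
        self.entry[key] = record
        heapq.heappush(self.heap, (record[0], record[1], key))

    def touch(self, key, credit=1.0):
        """Refresh a buffered tuple, e.g. after it produced a join match."""
        if key in self.entry:
            self._push(key, credit)

    def insert(self, key, credit=1.0):
        """Add a tuple; return the evicted key, or None if nothing was evicted."""
        if key in self.entry:
            self._push(key, credit)
            return None
        evicted = self._evict() if len(self.entry) >= self.capacity else None
        self._push(key, credit)
        return evicted

    def _evict(self):
        while self.heap:
            priority, seq, key = heapq.heappop(self.heap)
            if self.entry.get(key) == (priority, seq):   # skip stale entries
                del self.entry[key]
                self.inflation = priority                # L rises to evicted H
                return key
        return None


buf = GreedyDualBuffer(capacity=2)
buf.insert("a")
buf.insert("b")
buf.touch("a")                 # "a" just matched, so it is refreshed
victim = buf.insert("c")       # the buffer is full, so one tuple must go
```

Tuples that keep producing matches are refreshed and survive, while tuples that stop matching age out as the inflation value rises. This is one way to picture how a replacement policy can exploit temporal locality.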
Discovering Clusters in Motion Time-Series Data
A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications. Office of Naval Research (N000140310108, N000140110444); National Science Foundation (IIS-0208876, CAREER Award 0133825)
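The contrast between soft (EM) and hard (k-means style) assignment can be sketched with the standard mixture-model E-step. The sketch assumes that the per-HMM log-likelihoods of one sequence have already been computed elsewhere (for instance by the forward algorithm). The function names and the numbers are hypothetical.

```python
import math


def responsibilities(log_likelihoods, weights):
    """Posterior probability that a sequence belongs to each mixture component."""
    scores = [math.log(w) + ll for w, ll in zip(weights, log_likelihoods)]
    top = max(scores)                        # log-sum-exp trick for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assignment(log_likelihoods, weights):
    """k-means style decision: index of the single most probable component."""
    post = responsibilities(log_likelihoods, weights)
    return max(range(len(post)), key=post.__getitem__)


# A sequence that two HMMs explain almost equally well (made-up values).
soft = responsibilities([-120.0, -120.5], [0.5, 0.5])
hard = hard_assignment([-120.0, -120.5], [0.5, 0.5])
```

In this example the soft memberships remain split between the two components, whereas the hard assignment commits fully to the first. That is the information a k-means formulation discards when the generating processes overlap.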
Efficient Correlation Clustering Methods for Large Consensus Clustering Instances
Consensus clustering (or clustering aggregation) inputs k partitions of a given ground set V, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately, these methods have not proved to be practical for consensus clustering instances where either k or V gets large.
In this paper we provide practical run time improvements for correlation clustering solvers when the ground set V is large. We reduce the time complexity of Pivot from O(|V|^2 k) to O(|V| k), and its space complexity from O(|V|^2) to O(|V| k), a significant savings since in practice the number of input partitions k is much less than |V|. We also analyze a sampling method for these algorithms when k is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions.
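As a rough illustration of the approach described above, here is a minimal sketch of the textbook Pivot procedure applied to consensus clustering. Agreement between two elements is counted on demand from the input partitions, so no pairwise matrix is stored. It is not the paper's optimized implementation. The function name, the label-list input format, and the strict-majority rule for ties are assumptions.

```python
import random


def pivot_consensus(partitions, n, rng=None):
    """Cluster elements 0..n-1 given a list of label lists, one per partition."""
    rng = rng or random.Random()
    k = len(partitions)
    order = list(range(n))
    rng.shuffle(order)                     # pivots are visited in random order
    cluster_of = [None] * n
    clusters = []
    for pivot in order:
        if cluster_of[pivot] is not None:
            continue                       # already absorbed by an earlier pivot
        cid = len(clusters)
        cluster_of[pivot] = cid
        members = [pivot]
        for v in range(n):
            if cluster_of[v] is not None:
                continue
            # Number of input partitions that place v together with the pivot.
            together = sum(1 for labels in partitions
                           if labels[v] == labels[pivot])
            if 2 * together > k:           # assumed rule: strict majority joins
                cluster_of[v] = cid
                members.append(v)
        clusters.append(members)
    return clusters


partitions = [[0, 0, 1], [0, 0, 1], [0, 1, 1]]   # three partitions of {0, 1, 2}
consensus = pivot_consensus(partitions, 3, random.Random(0))
```

Running the same function on a random subset of the inputs, for example `rng.sample(partitions, s)` for a hypothetical sample size `s`, gives one way to picture the sampling variant mentioned in the abstract.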
PA-Tree: A Parametric Indexing Scheme for Spatio-temporal Trajectories
Many new applications involving moving objects require the collection and querying of trajectory data, so efficient indexing methods are needed to support complex spatio-temporal queries on such data. Current work in this domain has used MBRs to approximate trajectories, which fail to capture some basic properties of trajectories, including smoothness and lack of internal area. This mismatch leads to poor pruning when such indices are used. In this work, we revisit the issue of using parametric space indexing for historical trajectory data. We approximate a sequence of movement functions with a single continuous polynomial. Since trajectories tend to be smooth, our approximations work well and yield much finer approximation quality than MBRs. We present the PA-tree, a parametric index that uses this new approximation method. Experiments show that PA-tree construction costs are orders of magnitude lower than those of competing methods. Further, for spatio-temporal range queries, MBR-based methods require 20%–60% more I/O than PA-trees with clustered indices, and 300%–400% more I/O than PA-trees with non-clustered indices.
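The idea of replacing a sequence of sampled positions with one continuous polynomial can be sketched as follows. The abstract does not say which fitting method the PA-tree uses, so ordinary least squares via the normal equations is an assumption here, as are the function names and the sample data. Only one coordinate of a trajectory is fitted.

```python
def fit_polynomial(ts, xs, degree):
    """Least-squares coefficients c[0] + c[1]*t + ... + c[degree]*t**degree."""
    m = degree + 1
    # Augmented normal equations [A^T A | A^T x] for the monomial basis.
    aug = [[sum(t ** (i + j) for t in ts) for j in range(m)]
           + [sum(x * t ** i for t, x in zip(ts, xs))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        best = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[best] = aug[best], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def evaluate(coeffs, t):
    """Value of the fitted polynomial at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


# Hypothetical x-coordinates of a smooth trajectory sampled at five timestamps.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
x_positions = [0.0, 1.0, 4.0, 9.0, 16.0]
coeffs = fit_polynomial(times, x_positions, degree=2)
max_error = max(abs(evaluate(coeffs, t) - x)
                for t, x in zip(times, x_positions))
```

A handful of coefficients per coordinate then stands in for the whole segment. A bounding rectangle over the same samples would cover the full range from 0 to 16 at every timestamp, which is the loss of precision the abstract attributes to MBRs.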
Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy syndrome (APECED) due to AIRE T16M mutation in a consanguineous Greek girl
Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy syndrome (APECED) or autoimmune polyendocrine syndrome type 1 (APS-1) is a rare autosomal recessive disease caused by mutations of the AutoImmune REgulator (AIRE) gene, an important mediator of tolerance to self-antigens. It is characterized by two out of three major components: chronic mucocutaneous candidiasis, hypoparathyroidism and Addison's disease. We present an 11-year-old girl suffering from recurrent episodes of mucocutaneous candidiasis and onychomycosis from 1 to 6 years of age, and transient alopecia at the age of 4 years. Hypoparathyroidism and dental enamel hypoplasia were diagnosed at 8 years. Autoantibodies to thyroid and adrenal glands were not detected and all other endocrine functions have remained normal. Genetic analysis revealed that the patient was homozygous for the mutation T16M in exon 1 of the AIRE gene (p.T16M, c.47C>T). This is the first reported APECED case carrying this mutation in homozygous form. The parents were third cousins and heterozygous carriers of this mutation.