Set Similarity Search for Skewed Data
Set similarity join and the corresponding indexing problem, set similarity search,
are fundamental primitives for managing noisy or uncertain
data. For example, these primitives can be used in data cleaning to identify
different representations of the same object. In many cases one can represent
an object as a sparse 0-1 vector, or equivalently as the set of nonzero entries
in such a vector. A set similarity join can then be used to identify those
pairs that have an exceptionally large dot product (or intersection, when
viewed as sets). We focus on identifying vectors with large Pearson
correlation, but the results extend to other similarity measures. In particular, we
consider the indexing problem of identifying correlated vectors in a set S of
vectors sampled from {0,1}^d. Given a query vector y and a parameter alpha in
(0,1), we need to search for an alpha-correlated vector x in a data structure
representing the vectors of S. This kind of similarity search has been
intensely studied in worst-case (non-random data) settings.
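As a point of reference, the search problem above can be made concrete with a linear-scan baseline. For x, y in {0,1}^d with |x| = a, |y| = b and |x ∩ y| = c, the Pearson correlation reduces to (dc - ab) / sqrt(a(d - a) b(d - b)), so it depends only on the two set sizes and the intersection size. The Python sketch below is an illustration only: the function names are hypothetical, and it is not the paper's data structure. It computes that quantity and returns an alpha-correlated x by checking every vector in S, which is the cost, linear in |S|, that an index is meant to avoid.

```python
import math
from typing import Iterable, Optional, Set


def pearson_binary(x: Set[int], y: Set[int], d: int) -> float:
    """Pearson correlation of two vectors in {0,1}^d, each given as its set of 1-coordinates."""
    a, b = len(x), len(y)
    c = len(x & y)  # dot product of the 0-1 vectors = size of the intersection
    denom = a * (d - a) * b * (d - b)
    if denom == 0:
        # Correlation is undefined for an all-0 or all-1 vector; this sketch treats it as 0.
        return 0.0
    return (d * c - a * b) / math.sqrt(denom)


def linear_scan_search(
    S: Iterable[Set[int]], y: Set[int], alpha: float, d: int
) -> Optional[Set[int]]:
    """Hypothetical baseline: return the first x in S with correlation >= alpha, else None."""
    for x in S:
        if pearson_binary(x, y, d) >= alpha:
            return x
    return None


if __name__ == "__main__":
    d = 10
    S = [{0, 1, 2}, {5, 6, 7}]
    query = {0, 1, 3}
    print(pearson_binary(S[0], query, d))        # equals 11/21
    print(linear_scan_search(S, query, 0.5, d))  # {0, 1, 2}
    print(linear_scan_search(S, query, 0.6, d))  # None
```

Representing each vector by its support keeps the cost of one comparison proportional to the number of nonzero entries rather than to d, which matters for the sparse vectors the abstract has in mind.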
Existing theoretically well-founded methods for set similarity search are
often inferior to heuristics that take advantage of skew in the data
distribution, i.e., widely differing frequencies of 1s across the d dimensions.
The main contribution of this paper is to analyze the set similarity problem
under a random data model that reflects the kind of skewed data distributions
seen in practice, allowing theoretical results much stronger than what is
possible in worst-case settings. Our indexing data structure is a recursive,
data-dependent partitioning of vectors inspired by recent advances in set
similarity search. Previous data-dependent methods do not seem to allow us to
exploit skew in item frequencies, so we believe that our work sheds further
light on the power of data dependence.
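The abstract refers to a random data model that reflects skew but does not spell it out here. One natural way to model such skew, assumed purely for illustration, is a product distribution: coordinate i is 1 independently with its own probability p_i, and the p_i decay like a power law, so a few dimensions are common and most are rare. The helper names and the parameters p_max and beta below are hypothetical; the paper's exact model and parameters may differ.

```python
import random
from typing import List, Set


def skewed_probabilities(d: int, p_max: float = 0.5, beta: float = 1.0) -> List[float]:
    """Hypothetical skew profile: P[x_i = 1] = p_max / (i + 1) ** beta (power-law decay)."""
    return [p_max / (i + 1) ** beta for i in range(d)]


def sample_vector(p: List[float], rng: random.Random) -> Set[int]:
    """Draw x in {0,1}^d with independent coordinates, x_i = 1 w.p. p[i]; return its support."""
    return {i for i, p_i in enumerate(p) if rng.random() < p_i}


def sample_dataset(n: int, p: List[float], seed: int = 0) -> List[Set[int]]:
    """Sample n vectors i.i.d. from the product distribution given by p (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [sample_vector(p, rng) for _ in range(n)]


if __name__ == "__main__":
    d, n = 1000, 2000
    p = skewed_probabilities(d)
    S = sample_dataset(n, p)
    freq = [0] * d
    for x in S:
        for i in x:
            freq[i] += 1
    # Expected counts: n * p[0] = 1000 for the most frequent dimension, n * p[-1] = 1 for the rarest.
    print(freq[0], freq[-1])
    # Expected number of 1s per vector is sum(p), about 3.74 for these parameters.
    print(sum(len(x) for x in S) / n)
```

Under these assumed defaults, dimension 0 is expected to occur in about half of the vectors and dimension 999 in about one vector out of 2000; widely differing item frequencies of this kind are what skew-aware heuristics, and the data structure described above, set out to exploit.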
Mitochondrial mosaics in the liver of 3 infants with mtDNA defects
Background: In muscle, cytochrome oxidase (COX)-negative fibers (mitochondrial mosaics) have often been visualized.
Methods: COX activity staining of liver for light and electron microscopy; muscle stains; blue native gel electrophoresis and activity assays of respiratory chain proteins, and their immunolocalisation; mitochondrial and nuclear DNA analysis.
Results: Three unrelated infants showed a mitochondrial mosaic in the liver after staining for COX activity, i.e., hepatocytes with strongly reactive mitochondria were found adjacent to cells with many negative or barely reactive mitochondria. Deficiency was most severe in the patient diagnosed with Pearson syndrome. Ragged-red fibers were absent in the muscle biopsies of all patients. Enzyme biochemistry was not diagnostic in muscle, fibroblasts, or lymphocytes. Blue native gel electrophoresis of liver tissue, but not of muscle, demonstrated decreased complex IV activity; in both muscle and liver, subcomplexes of complex V were seen. Immunocytochemistry of complex IV confirmed the mosaic pattern in two livers, but not in fibroblasts. MRI of the brain revealed severe white matter cavitation in the Pearson case, only slight cortical atrophy in the Alpers-Huttenlocher patient, and a normal image in the third patient. In leucocytes of the Pearson patient, a common deletion was present in 50% of the mtDNA molecules. In the patient diagnosed with Alpers-Huttenlocher syndrome, muscle mtDNA was depleted by 60%. In the third patient, muscular and hepatic mtDNA was depleted by more than 70%. Mutations in the nuclear-encoded POLG gene were subsequently found in both the second and third patients.
Conclusion: Histoenzymatic COX staining of a liver biopsy is fast and yields crucial data about the pathogenesis; it indicates whether mtDNA should be assayed. Whenever a mitochondrial disorder is suspected and muscle data are non-diagnostic, a liver biopsy should be recommended. Mosaics are probably more frequent than previously observed. A novel pathogenic mutation in POLG is reported.
Tentative explanations for the mitochondrial mosaics are unequal partitioning of mutated mitochondria during mitoses in one patient and, in the other two, an interaction between the products of several genes required for mtDNA maintenance.