
    Set Similarity Search for Skewed Data

    Set similarity join, as well as the corresponding indexing problem, set similarity search, are fundamental primitives for managing noisy or uncertain data. For example, these primitives can be used in data cleaning to identify different representations of the same object. In many cases one can represent an object as a sparse 0-1 vector, or equivalently as the set of nonzero entries in such a vector. A set similarity join can then be used to identify those pairs that have an exceptionally large dot product (or intersection, when viewed as sets). We choose to focus on identifying vectors with large Pearson correlation, but the results extend to other similarity measures. In particular, we consider the indexing problem of identifying correlated vectors in a set S of vectors sampled from {0,1}^d. Given a query vector y and a parameter alpha in (0,1), we need to search for an alpha-correlated vector x in a data structure representing the vectors of S. This kind of similarity search has been intensely studied in worst-case (non-random data) settings. Existing theoretically well-founded methods for set similarity search are often inferior to heuristics that take advantage of skew in the data distribution, i.e., widely differing frequencies of 1s across the d dimensions. The main contribution of this paper is to analyze the set similarity problem under a random data model that reflects the kind of skewed data distributions seen in practice, allowing theoretical results much stronger than those possible in worst-case settings. Our indexing data structure is a recursive, data-dependent partitioning of vectors inspired by recent advances in set similarity search. Previous data-dependent methods do not seem to allow us to exploit skew in item frequencies, so we believe that our work sheds further light on the power of data dependence.
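
    The search problem above can be made concrete with a minimal sketch, assuming each 0-1 vector is stored as a Python set of its nonzero coordinates. For vectors of length d, Pearson correlation depends only on |x|, |y| and the intersection size: rho = (d*|x ∩ y| - |x|*|y|) / sqrt(|x|(d - |x|) * |y|(d - |y|)). The function names are hypothetical. `linear_scan` is a brute-force baseline, not the paper's recursive partitioning index. `sample_skewed` is an assumed skewed model (coordinate i is 1 independently with its own probability), chosen only for illustration because the abstract does not spell out the exact random data model.

```python
import math
import random


def pearson_sets(x: set[int], y: set[int], d: int) -> float:
    """Pearson correlation of two 0-1 vectors of length d, each given as its set of nonzero indices.

    Returns NaN when either vector is all-0 or all-1 (zero variance), so such
    vectors never pass a `>= alpha` test.
    """
    a, b = len(x), len(y)
    if a in (0, d) or b in (0, d):
        return float("nan")
    inter = len(x & y)  # dot product of the two 0-1 vectors
    return (d * inter - a * b) / math.sqrt(a * (d - a) * b * (d - b))


def linear_scan(S: list[set[int]], y: set[int], d: int, alpha: float) -> list[set[int]]:
    """Brute-force baseline: every x in S that is alpha-correlated with the query y."""
    return [x for x in S if pearson_sets(x, y, d) >= alpha]


def sample_skewed(n: int, probs: list[float], rng: random.Random) -> list[set[int]]:
    """Assumed skewed model: coordinate i is 1 independently with probability probs[i]."""
    return [{i for i, p in enumerate(probs) if rng.random() < p} for _ in range(n)]


if __name__ == "__main__":
    d = 1000
    # Hypothetical Zipf-like frequencies: a few dimensions are common, most are rare.
    probs = [min(0.5, 5.0 / (i + 1)) for i in range(d)]
    rng = random.Random(42)
    S = sample_skewed(2000, probs, rng)
    query = S[0]
    hits = linear_scan(S, query, d, alpha=0.5)
    print(len(hits), "vectors are 0.5-correlated with the query")
```

    Each query in this sketch costs |S| correlation evaluations; avoiding that full scan is the purpose of an index such as the recursive, data-dependent partitioning described in the abstract.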

    Mitochondrial mosaics in the liver of 3 infants with mtDNA defects

    Background: In muscle, cytochrome oxidase (COX)-negative fibers (mitochondrial mosaics) have often been visualized.
    Methods: COX activity staining of liver for light and electron microscopy; muscle stains; blue native gel electrophoresis and activity assays of respiratory chain proteins and their immunolocalization; and mitochondrial and nuclear DNA analysis.
    Results: Three unrelated infants showed a mitochondrial mosaic in the liver after staining for COX activity, i.e., hepatocytes with strongly reactive mitochondria were found adjacent to cells with many negative or barely reactive mitochondria. Deficiency was most severe in the patient diagnosed with Pearson syndrome. Ragged-red fibers were absent in the muscle biopsies of all patients. Enzyme biochemistry was not diagnostic in muscle, fibroblasts or lymphocytes. Blue native gel electrophoresis of liver tissue, but not of muscle, demonstrated decreased activity of complex IV; in both muscle and liver, subcomplexes of complex V were seen. Immunocytochemistry of complex IV confirmed the mosaic pattern in two livers, but not in fibroblasts. MRI of the brain revealed severe white matter cavitation in the Pearson case, only slight cortical atrophy in the Alpers-Huttenlocher patient, and a normal image in the third patient. In the Pearson patient, leucocyte mtDNA showed a common deletion in 50% of the mtDNA molecules. In the patient diagnosed with Alpers-Huttenlocher syndrome, muscle mtDNA was depleted by 60%. In the third patient, muscular and hepatic mtDNA was depleted by more than 70%. Mutations in the nuclear-encoded gene POLG were subsequently found in both the second and third patients.
    Conclusion: Histoenzymatic COX staining of a liver biopsy is fast and yields crucial data about the pathogenesis; it indicates whether mtDNA should be assayed. Each time a mitochondrial disorder is suspected and muscle data are non-diagnostic, a liver biopsy should be recommended. Mosaics are probably more frequent than has been observed so far. A novel pathogenic mutation in POLG is reported. Tentative explanations for the mitochondrial mosaics are, in one patient, unequal partition of mutated mitochondria during mitoses, and in the other two, an interaction between products of several genes required for mtDNA maintenance.