187 research outputs found

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing and for the specification and enforcement of MDs. Comment: To appear in Proc. SUM, 201
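
    The blocking-then-compare idea in this abstract can be illustrated with a minimal sketch. It is not the ERBlox pipeline: the abstract describes ML classifiers and MDs written in LogiQL, whereas this toy substitutes a plain token-overlap threshold in Python. The records, the `jaccard` and `candidate_duplicates` helpers, the blocking key (city), and the 0.6 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (id, name, city). Not data from the paper.
records = [
    (1, "John Smith", "Ottawa"),
    (2, "John Smith Jr", "Ottawa"),
    (3, "Jane Doe", "Toronto"),
    (4, "John Smith", "Toronto"),
]


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def block(recs):
    """Group records by a blocking key (here: the lowercased city)."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[rec[2].lower()].append(rec)
    return blocks


def candidate_duplicates(recs, threshold=0.6):
    """Compare only records that share a block; keep pairs whose
    names are similar enough. A stand-in for a trained classifier."""
    pairs = []
    for group in block(recs).values():
        for r1, r2 in combinations(group, 2):
            if jaccard(r1[1], r2[1]) >= threshold:
                pairs.append((r1[0], r2[0]))
    return pairs


pairs = candidate_duplicates(records)
print(pairs)
```

    Records 1 and 4 have identical names but fall into different blocks, so they are never compared. This is the usual trade-off of blocking: fewer comparisons at the risk of missed matches.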

    Feasibility tests of transmission x-ray photoelectron emission microscopy of wet samples

    We performed feasibility tests of photoelectron emission spectromicroscopy of wet samples in the water window (285-532 eV) soft x-ray spectral region. Water was successfully confined in an ultrahigh-vacuum-compatible compartment with x-ray transparent sides. This water cell was placed in the MEPHISTO spectromicroscope in a transmission geometry, and complete x-ray absorption spectra of the water window region were acquired. We also show micrographs of test samples, mounted outside of the compartment and imaged through the water. This technique can be used to study liquid chemistry and, at least to the micron level, the microstructure of wet samples. Possibilities include cells in water or buffer, proteins in solution, oils of tribological interest, liquid crystals, and other samples not presently accessible to the powerful x-ray photoelectron emission spectromicroscopy technique.

    Application of Linear Discriminant Analysis in Dimensionality Reduction for Hand Motion Classification

    The classification of upper-limb movements based on surface electromyography (EMG) signals is an important issue in the control of assistive devices and rehabilitation systems. Increasing the number of EMG channels and features in order to increase the number of control commands can yield a high-dimensional feature vector. To cope with the accuracy and computation problems associated with high dimensionality, it is commonplace to apply a processing step that transforms the data to a space of significantly lower dimension with only a limited loss of useful information. Linear discriminant analysis (LDA) has been successfully applied as an EMG feature projection method. Recently, a number of extended LDA-based algorithms have been proposed, which are more competitive than classical LDA in terms of both classification accuracy and computational cost. This paper presents the findings of a comparative study of classical LDA and five extended LDA methods. In a quantitative comparison based on seven multi-feature sets, three extended LDA-based algorithms (uncorrelated LDA, orthogonal LDA, and orthogonal fuzzy neighborhood discriminant analysis) produce better class separability than a baseline system (without feature projection), principal component analysis (PCA), and classical LDA. Based on 7-dimensional time-domain and time-scale feature vectors, these methods achieved 95.2% and 93.2% classification accuracy, respectively, using a linear discriminant classifier.
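
    As a rough illustration of LDA used as a feature projection, the sketch below computes the classical two-class Fisher direction w = Sw^-1 (m1 - m0) for 2-D features and projects each sample onto it. It covers only the simplest case of classical LDA, not the extended methods compared in the paper. The data points and the helper names (`mean`, `scatter`, `fisher_direction`, `project`) are hypothetical and are not EMG features.

```python
# Two-class Fisher LDA on 2-D features, standard library only.
# All data and function names here are illustrative assumptions.

def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Return w = Sw^-1 (m1 - m0), the classical two-class LDA direction."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter is the sum of the per-class scatters.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(points, w):
    """Project 2-D points onto direction w, giving 1-D features."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
z0, z1 = project(class0, w), project(class1, w)
print(w, z0, z1)
```

    On this toy data the two classes do not overlap after projection to one dimension, so a simple threshold on the projected value separates them.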

    C14ORF39/SIX6OS1 is a constituent of the synaptonemal complex and is essential for mouse fertility

    Meiotic recombination generates crossovers between homologous chromosomes that are essential for genome haploidization. The synaptonemal complex is a ‘zipper’-like protein assembly that synapses homologue pairs together and provides the structural framework for processing recombination sites into crossovers. Humans show individual differences in the number of crossovers generated across the genome. Recently, an anonymous gene variant in C14ORF39/SIX6OS1 was identified that influences the recombination rate in humans. Here we show that C14ORF39/SIX6OS1 encodes a component of the central element of the synaptonemal complex. Yeast two-hybrid analysis reveals that SIX6OS1 interacts with the well-established protein synaptonemal complex central element 1 (SYCE1). Mice lacking SIX6OS1 are defective in chromosome synapsis at meiotic prophase I, which provokes an arrest at the pachytene-like stage and results in infertility. In accordance with its role as a modifier of the human recombination rate, SIX6OS1 is essential for the appropriate processing of intermediate recombination nodules before crossover formation. This work was supported by BFU_2014-59307-R, MEIONet and JCyLe (CSI052U16). LGH and NFM are supported by European Social Fund/JCyLe grants (EDU/1083/2013 and EDU/310/2015). ORD is a Sir Henry Dale Fellow jointly funded by the Wellcome Trust and Royal Society (Grant Number 104158/Z/14/Z). RB is funded by DFG (grant Be1168/8-1). AT and ID were supported by DFG grants TO421/8-2 and TO421/6-1, respectively. Peer reviewed.

    Analysis of human meiotic recombination events with a parent-sibling tracing approach

    Background: Meiotic recombination ensures that each child inherits distinct genetic material from each parent, but the distribution of crossovers along meiotic chromosomes remains difficult to identify. In this study, we developed a parent-sibling tracing (PST) approach from previously reported methods to identify meiotic crossover sites in the GEO GSE6754 data set. This approach requires only the single nucleotide polymorphism (SNP) data of pedigrees with both parents and at least two children. Results: Compared to other SNP-based algorithms (identity by descent or pediSNP), fewer uninformative SNPs were derived with the use of PST. Analysis of a GEO GSE6754 data set containing 2,145 maternal and paternal meiotic events revealed that the pattern and distribution of paternal and maternal recombination sites vary along the chromosomes. Lower crossover rates near the centromeres were more prominent in males than in females. Based on analysis of repetitive sequences, we also showed that recombination hotspots are positively correlated with SINE/MIR repetitive elements and negatively correlated with LINE/L1 elements. The number of meiotic recombination events was positively correlated with the number of shorter tandem repeat sequences. Conclusions: The advantages of the PST approach include the ability to use only two-generation pedigrees with two siblings and the ability to perform gender-specific analyses of repetitive elements and tandem repeat sequences while including fewer uninformative SNP regions in the results.

    Genome-wide fine-scale recombination rate variation in Drosophila melanogaster

    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots, each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than in the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.

    Prdm9, a Major Determinant of Meiotic Recombination Hotspots, Is Not Functional in Dogs and Their Wild Relatives, Wolves and Coyotes

    Meiotic recombination is a fundamental process needed for the correct segregation of chromosomes during meiosis in sexually reproducing organisms. In humans, 80% of crossovers are estimated to occur at specific areas of the genome called recombination hotspots. Recently, a protein called PRDM9 was identified as a major player in determining the location of genome-wide meiotic recombination hotspots in humans and mice. The origin of this protein seems to be ancient in evolutionary time, as reflected by its fairly conserved structure in lineages that diverged over 700 million years ago. Despite its important role, there are many animal groups in which Prdm9 is absent (e.g. birds, reptiles, amphibians, diptera), and it has been suggested to have disruptive mutations and thus to be a pseudogene in dogs. Because of the dog's history of domestication and artificial selection, we wanted to confirm the presence of a disrupted Prdm9 gene in dogs and determine whether this was exclusive to this species or whether it also occurred in its wild ancestor, the wolf, and in a close relative, the coyote. We sequenced the region in the dog genome that aligned to the last exon of the human Prdm9, containing the entire zinc finger domain, in 4 dogs, 17 wolves and 2 coyotes. Our results show that the three canid species possess mutations that likely make this gene nonfunctional. Because these mutations are shared across the three species, they must have appeared prior to the split of the wolf and the coyote, millions of years ago, and are not related to domestication. In addition, our results suggest that in these three canid species either recombination does not occur at hotspots or hotspot location is controlled through a mechanism yet to be determined.