28,794 research outputs found

    Detecting the Origin of Text Segments Efficiently

    In the origin detection problem an algorithm is given a set S of documents, ordered by creation time, and a query document D. It needs to output, for every consecutive sequence of k alphanumeric terms in D, the earliest document in S in which the sequence appeared (if such a document exists). Algorithms for the origin detection problem can, for example, be used to detect the "origin" of text segments in D and thus to detect novel content in D. They can also find the document from which the author of D has copied the most (or show that D is mostly original). We propose novel algorithms for this problem and evaluate them together with a large number of previously published algorithms. Our results show that (1) detecting the origin of text segments efficiently can be done with very high accuracy even when the space used is less than 1% of the size of the documents in S, (2) the precision degrades smoothly with the amount of available space, and (3) various estimation techniques can be used to increase the performance of the algorithms.
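    The problem statement above can be illustrated with a naive exact baseline. This is only a sketch of the problem definition, not one of the algorithms proposed in the paper: it stores every k-term sequence, so its space grows with the size of S rather than staying below 1% of it. The function names (`terms`, `build_index`, `detect_origins`) and the lowercase normalization are assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

Shingle = Tuple[str, ...]


def terms(text: str) -> List[str]:
    """Split text into alphanumeric terms (lowercasing is an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: List[str], k: int) -> Dict[Shingle, int]:
    """Map every k-term sequence to the earliest document containing it.

    `docs` must be ordered by creation time, earliest first.
    """
    index: Dict[Shingle, int] = {}
    for doc_id, text in enumerate(docs):
        t = terms(text)
        for i in range(len(t) - k + 1):
            # setdefault keeps the first (earliest) document seen.
            index.setdefault(tuple(t[i:i + k]), doc_id)
    return index


def detect_origins(index: Dict[Shingle, int], query: str, k: int) -> List[Optional[int]]:
    """For each k-term sequence in the query, return its origin or None."""
    t = terms(query)
    return [index.get(tuple(t[i:i + k])) for i in range(len(t) - k + 1)]


if __name__ == "__main__":
    S = [
        "the quick brown fox",
        "quick brown fox jumps over",
        "a lazy dog sleeps",
    ]
    D = "the quick brown fox jumps over a lazy dog"
    print(detect_origins(build_index(S, 3), D, 3))
    # → [0, 0, 1, 1, None, None, 2]
```

    Entries of `None` mark k-term sequences that appear in no document of S, that is, novel content in D.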

    Bayesian modeling of recombination events in bacterial populations

    Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at URL http://web.abo.fi/fak/mnf//mate/jc/software/brat.html

    Emulsion Chamber with Big Radiation Length for Detecting Neutrino Oscillations

    A conceptual scheme of a hybrid-emulsion spectrometer for investigating various channels of neutrino oscillations is proposed. The design emphasizes detection of $\tau$ leptons by detached vertices, reliable identification of electrons, and good spectrometry for all charged particles and photons. A distributed target is formed by layers of low-Z material, emulsion-plastic-emulsion sheets, and air gaps in which $\tau$ decays are detected. The tracks of charged secondaries, including electrons, are momentum-analyzed by curvature in a magnetic field using hits in successive thin layers of emulsion. The $\tau$ leptons are efficiently detected in all major decay channels, including \xedec. Performance of a model spectrometer that contains 3 tons of nuclear emulsion and 20 tons of passive material is estimated for different experimental environments. When irradiated by the $\nu_\mu$ beam of a proton accelerator over a medium baseline of $\sim 1$ km/GeV, the spectrometer will efficiently detect either the \omutau or the \omue transitions in the mass-difference region of $\Delta m^2 \sim 1$ eV$^2$, as suggested by the results of LSND. When exposed to the neutrino beam of a muon storage ring over a long baseline of $\sim$ 10-20 km/GeV, the model detector will efficiently probe the entire pattern of neutrino oscillations in the region $\Delta m^2 \sim 10^{-2}-10^{-3}$ eV$^2$, as suggested by the data on atmospheric neutrinos. Comment: 34 pages, 8 figures

    Text Localization in Video Using Multiscale Weber's Local Descriptor

    In this paper, we propose a novel approach for detecting the text present in videos and scene images based on the Multiscale Weber's Local Descriptor (MWLD). Given an input video, the shots are identified and the key frames are extracted based on their spatio-temporal relationship. From each key frame, we detect the local region information using WLD with different radii and neighborhood relationships of pixel values, and hence obtain intensity-enhanced key frames at multiple scales. These multiscale WLD key frames are merged together and then the horizontal gradients are computed using morphological operations. The obtained results are then binarized and the false positives are eliminated based on geometrical properties. Finally, we employ connected component analysis and a morphological dilation operation to determine the text regions that aid in text localization. The experimental results obtained on the publicly available standard Hua, Horizontal-1 and Horizontal-2 video datasets illustrate that the proposed method can accurately detect and localize texts of various sizes, fonts and colors in videos. Comment: IEEE SPICES, 201

    Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty

    This work proposes a robust visual odometry method for structured environments that combines point features with line and plane segments, extracted through an RGB-D camera. Noisy depth maps are processed by a probabilistic depth fusion framework based on Mixtures of Gaussians to denoise and derive the depth uncertainty, which is then propagated throughout the visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are used to model the uncertainties of the feature parameters, and pose is estimated by combining the three types of primitives based on their uncertainties. Performance evaluation on RGB-D sequences collected in this work and on two public RGB-D datasets, TUM and ICL-NUIM, shows the benefit of using the proposed depth fusion framework and combining the three feature types, particularly in scenes with low-textured surfaces, dynamic objects and missing depth measurements. Comment: Major update: more results, depth filter released as opensource, 34 pages