
    String Indexing for Patterns with Wildcards

    We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
    - A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
    - An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
    - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
    We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
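To make the query in the abstract above concrete, here is a minimal sketch of what "reporting the occurrences of a pattern with wildcards" means. It is a naive scan over the text, not the index described in the paper, and it does not achieve any of the stated bounds. The function name `wildcard_occurrences` and the choice of `*` as the wildcard symbol are assumptions made for illustration.

```python
def wildcard_occurrences(t: str, p: str, wildcard: str = "*") -> list[int]:
    """Return every start position in t where p matches.

    A wildcard position in p matches any single character of t.
    This is a brute-force baseline (hypothetical helper, not the paper's index):
    it checks every alignment, so it runs in O(len(t) * len(p)) time.
    """
    n, m = len(t), len(p)
    hits = []
    for i in range(n - m + 1):  # empty range when the pattern is longer than the text
        if all(pc == wildcard or pc == t[i + k] for k, pc in enumerate(p)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Pattern "a*a": two ordinary characters and one wildcard.
    print(wildcard_occurrences("abracadabra", "a*a"))
```

An index avoids this per-query scan of the whole text by preprocessing $t$ once, which is where the space and query-time trade-offs listed in the abstract come from.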

    Comparison of Spectra in Unsequenced Species

    We introduce a new algorithm for the mass spectrometric identification of proteins. Experimental spectra obtained by tandem MS/MS are directly compared to theoretical spectra generated from proteins of evolutionarily closely related organisms. This work is motivated by the need for a method that allows the identification of proteins of unsequenced species against a database containing proteins of related organisms. The idea is that matching spectra of unknown peptides to very similar MS/MS spectra generated from this database of annotated proteins can lead to the annotation of unknown proteins. This process is similar to ortholog annotation in protein sequence databases. The difficulty with such an approach is that two similar peptides, even with just one modification (i.e. insertion, deletion or substitution of one or several amino acid(s)) between them, usually generate very dissimilar spectra. In this paper, we present a new dynamic programming based algorithm: PacketSpectralAlignment. Our algorithm is tolerant to modifications and fully exploits two important properties that are usually not considered: the notion of inner symmetry, a relation linking pairs of spectrum peaks, and the notion of packet inside each spectrum to keep related peaks together. Our algorithm, PacketSpectralAlignment, is then compared to SpectralAlignment [1] on a dataset of simulated spectra. Our tests show that PacketSpectralAlignment behaves better, in terms of results and execution time.

    Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. 
    The method automates the acquisition of knowledge, thus reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.
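The comparison step described in the abstract above can be sketched roughly as follows. The abstract does not give the formula for the surprise score, so this example assumes a simple stand-in: the absolute gap between the association score mined from the database and the one mined from the knowledgebase. The function names, the score range, and the threshold are all hypothetical, and the p-value computation mentioned in the abstract is not reproduced here.

```python
def surprise_scores(db_assoc: dict[str, float], kb_assoc: dict[str, float]) -> dict[str, float]:
    """Assumed surprise score: absolute gap between the two association estimates.

    Only patterns present in both sources are scored. This is an illustrative
    stand-in, not the scoring function used by the authors.
    """
    shared = db_assoc.keys() & kb_assoc.keys()
    return {pat: abs(db_assoc[pat] - kb_assoc[pat]) for pat in shared}


def most_surprising(db_assoc: dict[str, float], kb_assoc: dict[str, float],
                    threshold: float = 0.5) -> list[str]:
    """Keep patterns whose assumed surprise score reaches the threshold, highest first."""
    scores = surprise_scores(db_assoc, kb_assoc)
    kept = [pat for pat, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pat: -scores[pat])


if __name__ == "__main__":
    # Made-up association scores, for illustration only.
    db = {"pattern_a": 0.9, "pattern_b": 0.8, "pattern_c": 0.1}
    kb = {"pattern_a": 0.85, "pattern_b": 0.1, "pattern_c": 0.9}
    print(most_surprising(db, kb))
```

In this toy setup, a pattern that is strong in both sources gets a low score and is pruned, which mirrors the abstract's point that strongly associated but already-known patterns are treated as uninteresting.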

    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.

    Search for Yukawa Production of a Light Neutral Higgs Boson at LEP

    Within a Two-Higgs-Doublet Model (2HDM) a search for a light Higgs boson in the mass range of 4-12 GeV has been performed in the Yukawa process e+e- -> b bbar A/h -> b bbar tau+tau-, using the data collected by the OPAL detector at LEP between 1992 and 1995 in e+e- collisions at about 91 GeV centre-of-mass energy. A likelihood selection is applied to separate background and signal. The number of observed events is in good agreement with the expected background. Within a CP-conserving 2HDM type II model the cross-section for Yukawa production depends on xiAd = |tan beta| and xihd = |sin alpha/cos beta| for the production of the CP-odd A and the CP-even h, respectively, where tan beta is the ratio of the vacuum expectation values of the Higgs doublets and alpha is the mixing angle between the neutral CP-even Higgs bosons. From our data 95% C.L. upper limits are derived for xiAd within the range of 8.5 to 13.6 and for xihd between 8.2 and 13.7, depending on the mass of the Higgs boson, assuming a branching fraction into tau+tau- of 100%. An interpretation of the limits within a 2HDM type II model with Standard Model particle content is given. These results impose constraints on several models that have been proposed to explain the recent BNL measurement of the muon anomalous magnetic moment. Comment: 24 pages, 9 figures, Submitted to Euro. Phys. J.

    Search for the Standard Model Higgs Boson with the OPAL Detector at LEP

    This paper summarises the search for the Standard Model Higgs boson in e+e- collisions at centre-of-mass energies up to 209 GeV performed by the OPAL Collaboration at LEP. The consistency of the data with the background hypothesis and various Higgs boson mass hypotheses is examined. No indication of a signal is found in the data and a lower bound of 112.7 GeV/c^2 is obtained on the mass of the Standard Model Higgs boson at the 95% CL. Comment: 51 pages, 21 figures

    Tests of models of color reconnection and a search for glueballs using gluon jets with a rapidity gap

    Gluon jets with a mean energy of 22 GeV and purity of 95% are selected from hadronic Z0 decay events produced in e+e- annihilations. A subsample of these jets is identified which exhibits a large gap in the rapidity distribution of particles within the jet. After imposing the requirement of a rapidity gap, the gluon jet purity is 86%. These jets are observed to demonstrate a high degree of sensitivity to the presence of color reconnection, i.e. higher order QCD processes affecting the underlying color structure. We use our data to test three QCD models which include a simulation of color reconnection: one in the Ariadne Monte Carlo, one in the Herwig Monte Carlo, and the other by Rathsman in the Pythia Monte Carlo. We find the Rathsman and Ariadne color reconnection models can describe our gluon jet measurements only if very large values are used for the cutoff parameters which serve to terminate the parton showers, and that the description of inclusive Z0 data is significantly degraded in this case. We conclude that color reconnection as implemented by these two models is disfavored. The signal from the Herwig color reconnection model is less clear and we do not obtain a definite conclusion concerning this model. In a separate study, we follow recent theoretical suggestions and search for glueball-like objects in the leading part of the gluon jets. No clear evidence is observed for these objects. Comment: 42 pages, 18 figures

    Measurement of the Hadronic Cross-Section for the Scattering of Two Virtual Photons at LEP

    The interaction of virtual photons is investigated using the reaction e+e- -> e+e- hadrons based on data taken by the OPAL experiment at e+e- centre-of-mass energies sqrt(s_ee)=189-209 GeV, for W>5 GeV and at an average Q^2 of 17.9 GeV^2. The measured cross-sections are compared to predictions of the Quark Parton Model (QPM), to the Leading Order QCD Monte Carlo model PHOJET, to the NLO prediction for the reaction e+e- -> e+e-qqbar, and to BFKL calculations. PHOJET, NLO e+e- -> e+e-qqbar, and QPM describe the data reasonably well, whereas the cross-section predicted by a Leading Order BFKL calculation is too large. Comment: 30 pages, 10 figures, Submitted to Eur.Phys.J.