String Indexing for Patterns with Wildcards
We consider the problem of indexing a string of length to report the
occurrences of a query pattern containing characters and wildcards.
Let be the number of occurrences of in , and the size of
the alphabet. We obtain the following results.
- A linear space index with query time .
This significantly improves the previously best known linear space index by Lam
et al. [ISAAC 2007], which requires query time in the worst case.
- An index with query time using space , where is the maximum number of wildcards allowed in the pattern.
This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length
gaps in the pattern. Our results are obtained using a novel combination of
well-known and new techniques, which could be of independent interest.
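To make the query problem in the abstract above concrete, here is a minimal sketch of the naive baseline: a direct scan that reports every position where a pattern containing wildcards matches the string. It is not the index described in the abstract, and it takes time proportional to the product of the two lengths. The function name `find_occurrences` and the choice of `*` as the wildcard symbol are assumptions made for illustration only.

```python
WILDCARD = "*"  # assumed wildcard symbol; the abstract does not fix one


def find_occurrences(text, pattern, wildcard=WILDCARD):
    """Return all start positions where pattern matches text.

    A wildcard position in the pattern matches any single character.
    This is a brute-force scan, not an index.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        # Every pattern position must be a wildcard or equal the text character.
        if all(pc == wildcard or pc == text[start + k]
               for k, pc in enumerate(pattern)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(find_occurrences("abracadabra", "a*ra"))  # prints [0, 7]
```

An index of the kind the abstract describes would preprocess the string once so that each query avoids rescanning it; the scan above only fixes what a correct answer looks like.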
Comparison of Spectra in Unsequenced Species
We introduce a new algorithm for the mass spectrometric identification of proteins. Experimental spectra obtained by tandem MS/MS are directly compared to theoretical spectra generated from proteins of evolutionarily closely related organisms. This work is motivated by the need for a method that allows the identification of proteins of unsequenced species against a database containing proteins of related organisms. The idea is that matching spectra of unknown peptides to very similar MS/MS spectra generated from this database of annotated proteins can lead to the annotation of unknown proteins. This process is similar to ortholog annotation in protein sequence databases. The difficulty with such an approach is that two similar peptides, even with just one modification (i.e. insertion, deletion or substitution of one or several amino acid(s)) between them, usually generate very dissimilar spectra. In this paper, we present a new dynamic programming based algorithm: PacketSpectralAlignment. Our algorithm is tolerant to modifications and fully exploits two important properties that are usually not considered: the notion of inner symmetry, a relation linking pairs of spectrum peaks, and the notion of packet inside each spectrum to keep related peaks together. Our algorithm, PacketSpectralAlignment, is then compared to SpectralAlignment [1] on a dataset of simulated spectra. Our tests show that PacketSpectralAlignment behaves better, in terms of results and execution time.
Identification of ubiquitin/ubiquitin-like protein modification from tandem mass spectra with various PTMs
Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method
BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. 
The method automates the acquisition of knowledge, thus reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.
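The comparison step of the dual-mining method can be sketched roughly as follows. The abstract does not state how the surprise score is computed, so the absolute difference between the two strength estimates used here is purely an illustrative assumption, as are the function names and the made-up pattern names and values. The p-value step mentioned in the abstract is omitted.

```python
def surprise_score(db_strength, kb_strength):
    """Assumed scoring rule: the larger the mismatch between the strength
    mined from the database and the strength mined from the knowledgebase,
    the higher the score. The real formula is not given in the abstract."""
    return abs(db_strength - kb_strength)


def rank_patterns(patterns):
    """Sort (name, db_strength, kb_strength) triples by descending surprise."""
    scored = [(name, surprise_score(db, kb)) for name, db, kb in patterns]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical association strengths, not data from the study.
    example = [
        ("pattern_a", 0.75, 0.75),  # database and knowledgebase agree
        ("pattern_b", 0.75, 0.25),  # strong in the data, weak in the literature
    ]
    for name, score in rank_patterns(example):
        print(name, score)
```

Under this assumed rule, a pattern that is strong in both sources scores low and would be pruned as already known, which matches the behaviour the abstract reports for strongly associated patterns.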
Current challenges in software solutions for mass spectrometry-based quantitative proteomics
This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.
Search for Yukawa Production of a Light Neutral Higgs Boson at LEP
Within a Two-Higgs-Doublet Model (2HDM) a search for a light Higgs boson in
the mass range of 4-12 GeV has been performed in the Yukawa process e+e- -> b
bbar A/h -> b bbar tau+tau-, using the data collected by the OPAL detector at
LEP between 1992 and 1995 in e+e- collisions at about 91 GeV centre-of-mass
energy. A likelihood selection is applied to separate background and signal.
The number of observed events is in good agreement with the expected
background. Within a CP-conserving 2HDM type II model the cross-section for
Yukawa production depends on xiAd = |tan beta| and xihd = |sin alpha/cos beta|
for the production of the CP-odd A and the CP-even h, respectively, where tan
beta is the ratio of the vacuum expectation values of the Higgs doublets and
alpha is the mixing angle between the neutral CP-even Higgs bosons. From our
data 95% C.L. upper limits are derived for xiAd within the range of 8.5 to 13.6
and for xihd between 8.2 and 13.7, depending on the mass of the Higgs boson,
assuming a branching fraction into tau+tau- of 100%. An interpretation of the
limits within a 2HDM type II model with Standard Model particle content is
given. These results impose constraints on several models that have been
proposed to explain the recent BNL measurement of the muon anomalous magnetic
moment. Comment: 24 pages, 9 figures, Submitted to Euro. Phys. J.
Search for the Standard Model Higgs Boson with the OPAL Detector at LEP
This paper summarises the search for the Standard Model Higgs boson in e+e-
collisions at centre-of-mass energies up to 209 GeV performed by the OPAL
Collaboration at LEP. The consistency of the data with the background
hypothesis and various Higgs boson mass hypotheses is examined. No indication
of a signal is found in the data and a lower bound of 112.7 GeV/c^2 is obtained
on the mass of the Standard Model Higgs boson at the 95% CL. Comment: 51 pages, 21 figures
Tests of models of color reconnection and a search for glueballs using gluon jets with a rapidity gap
Gluon jets with a mean energy of 22 GeV and purity of 95% are selected from
hadronic Z0 decay events produced in e+e- annihilations. A subsample of these
jets is identified which exhibits a large gap in the rapidity distribution of
particles within the jet. After imposing the requirement of a rapidity gap, the
gluon jet purity is 86%. These jets are observed to demonstrate a high degree
of sensitivity to the presence of color reconnection, i.e. higher order QCD
processes affecting the underlying color structure. We use our data to test
three QCD models which include a simulation of color reconnection: one in the
Ariadne Monte Carlo, one in the Herwig Monte Carlo, and the other by Rathsman
in the Pythia Monte Carlo. We find the Rathsman and Ariadne color reconnection
models can describe our gluon jet measurements only if very large values are
used for the cutoff parameters which serve to terminate the parton showers, and
that the description of inclusive Z0 data is significantly degraded in this
case. We conclude that color reconnection as implemented by these two models is
disfavored. The signal from the Herwig color reconnection model is less clear
and we do not obtain a definite conclusion concerning this model. In a separate
study, we follow recent theoretical suggestions and search for glueball-like
objects in the leading part of the gluon jets. No clear evidence is observed
for these objects. Comment: 42 pages, 18 figures
Measurement of the Hadronic Cross-Section for the Scattering of Two Virtual Photons at LEP
The interaction of virtual photons is investigated using the reaction e+e- ->
e+e- hadrons based on data taken by the OPAL experiment at e+e- centre-of-mass
energies sqrt(s_ee)=189-209 GeV, for W>5 GeV and at an average Q^2 of 17.9
GeV^2. The measured cross-sections are compared to predictions of the Quark
Parton Model (QPM), to the Leading Order QCD Monte Carlo model PHOJET, to the
NLO prediction for the reaction e+e- -> e+e-qqbar, and to BFKL calculations.
PHOJET, NLO e+e- -> e+e-qqbar, and QPM describe the data reasonably well,
whereas the cross-section predicted by a Leading Order BFKL calculation is too
large. Comment: 30 pages, 10 figures, Submitted to Eur.Phys.J.