    msmsEval: tandem mass spectral quality assignment for high-throughput proteomics

    BACKGROUND: In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required by these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable. RESULTS: We describe an application, msmsEval, that builds on previous work by statistically modeling the spectral quality discriminant function using a Gaussian mixture model. This allows a researcher to filter spectra based on the probability that a spectrum will ultimately be identified by database searching. We show that spectra that are predicted by msmsEval to be of high quality, yet remain unidentified in standard database searches, are candidates for more intensive search strategies. Using a well-studied public dataset, we also show that a high proportion (83.9%) of the spectra predicted by msmsEval to be of high quality but that elude standard search strategies are in fact interpretable. CONCLUSION: msmsEval will be useful for high-throughput proteomics projects and is freely available for download from . It supports the Windows, Mac OS X and Linux/Unix operating systems.
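The filtering statistic described above can be illustrated with a minimal sketch: given a two-component Gaussian mixture fitted to the spectral quality discriminant scores (msmsEval estimates such parameters from the data; the weight, means and standard deviations below are purely illustrative assumptions), Bayes' rule gives the posterior probability that a spectrum came from the "identifiable" component.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def identifiable_probability(score, mix):
    """Posterior probability that a spectrum with this discriminant score
    belongs to the 'identifiable' mixture component (Bayes' rule)."""
    p_id = mix["w_id"] * gaussian_pdf(score, mix["mu_id"], mix["sd_id"])
    p_un = (1 - mix["w_id"]) * gaussian_pdf(score, mix["mu_un"], mix["sd_un"])
    return p_id / (p_id + p_un)

# Hypothetical fitted parameters; in practice they come from an EM fit.
mix = {"w_id": 0.3, "mu_id": 2.0, "sd_id": 0.8, "mu_un": -1.0, "sd_un": 1.0}

# Keep only spectra with at least a 90% chance of being identified.
scores = [-1.5, 0.4, 2.3, 3.1]
kept = [s for s in scores if identifiable_probability(s, mix) >= 0.9]
```

A researcher could then search only the retained spectra, or route high-probability spectra that remain unidentified to more intensive search strategies.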

    EuroPhenome: a repository for high-throughput mouse phenotyping data.

    The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow a deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/), which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via either phenotype or genotype, and in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome are annotated by a pipeline which automatically identifies mutants that differ statistically from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and an ontology-term search for mutants. To assist with data definition and cross-database comparisons, phenotype data are annotated using combinations of terms from biological ontologies.
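The annotation idea (flag a mutant line whose measurements differ statistically from the baseline strain, then attach an ontology term) can be sketched as follows. This is not EuroPhenome's actual pipeline: the choice of Welch's t statistic, the threshold, and the MP term shown are assumptions for illustration only.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with unequal variances."""
    va, vb = variance(sample_a), variance(sample_b)
    na, nb = len(sample_a), len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(va / na + vb / nb)

def annotate(mutant, baseline, term, threshold=3.0):
    """Return the ontology term when the mutant line differs markedly
    from baseline, otherwise None (threshold is illustrative)."""
    return term if abs(welch_t(mutant, baseline)) >= threshold else None

baseline = [22.1, 23.0, 21.8, 22.5, 22.9, 22.3]  # e.g. body weight, grams
mutant   = [18.2, 17.9, 18.8, 18.1, 18.5, 18.4]
label = annotate(mutant, baseline, "MP:0001262 (decreased body weight)")
```

In a real pipeline the test statistic and significance threshold would be chosen per phenotyping test, as the abstract notes.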

    The White Mountain Polarimeter Telescope and an Upper Limit on CMB Polarization

    The White Mountain Polarimeter (WMPol) is a dedicated ground-based microwave telescope and receiver system for observing polarization of the Cosmic Microwave Background. WMPol is located at an altitude of 3880 meters on a plateau in the White Mountains of Eastern California, USA, at the Barcroft Facility of the University of California White Mountain Research Station. Presented here is a description of the instrument and the data collected during April through October 2004. We set an upper limit on EE-mode polarization of 14 μK (95% confidence limit) in the multipole range 170 < ℓ < 240. This result was obtained with 422 hours of observations of a 3 deg² sky area about the North Celestial Pole, using a 42 GHz polarimeter. This upper limit is consistent with the EE polarization predicted from a standard Λ-CDM concordance model. (35 pages, 12 figures; to appear in ApJ.)

    Problems with using mechanisms to solve the problem of extrapolation

    Pattern formation outside of equilibrium

    Chemiluminescent Delay: An Experiment in Stopped-Flow Kinetics

    Getting the Timing Right - The Use of Genetic Algorithms in Scheduling

    This paper discusses why the GA is a valuable tool for solving such problems; in it we also consider some difficulties that arise in implementation of the algorithm for scheduling problems and how these difficulties may be resolved, taking as an example a typical industrial scheduling problem: the chemical flowshop.

    2. The nature of the scheduling problem

    Scheduling problems are widespread in industry because of the economies to be gained by moving from single-station working, in which all processing of a manufactured item is carried out at one location in a plant, to production-line working, where items move from station to station to be processed. Because of the extensive adoption of continuous production lines, the number and variety of industrial scheduling problems is very great. In a typical industrial application (Figure 1) a set of operations must be performed on each item as it passes through a number of discrete stations. In a mechanical engineering shop the stations might comprise mills, lathes or drills; in the chemical industry they might be reactors, dryers, packing units or centrifuges. In either case, the scheduling problem is the same, requiring that a feed order for items entering the shop be found according to which:
    - each item is processed by a predefined deadline; and
    - the completion time (the time required to process the complete set of items) is minimised, to obtain the greatest efficiency.
    The scheduling problem sketched in Figure 1 may be static, in which case the constraints defining the problem, such as the numbers of items to be processed and the topology of the shop, are unchanging; the search for the best solution then needs to be performed once only. The construction of a school timetable is an example of a static scheduling problem ..
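A GA for this problem encodes each candidate solution as a feed order (a permutation of the items) and evolves a population of such orders. The sketch below is a minimal illustrative implementation, not the paper's own code: it uses order crossover and swap mutation so offspring remain valid permutations, with the flowshop completion time (makespan) as the fitness to minimise.

```python
import random

def makespan(order, proc):
    """Completion time of the last item through the last station, for a
    feed order and a processing-time table proc[item][station]."""
    n_stations = len(proc[0])
    finish = [0.0] * n_stations
    for item in order:
        for s in range(n_stations):
            start = max(finish[s], finish[s - 1] if s > 0 else 0.0)
            finish[s] = start + proc[item][s]
    return finish[-1]

def order_crossover(a, b):
    """Order crossover (OX): copy a slice from parent a, fill remaining
    positions with the missing items in the order they appear in b."""
    i, j = sorted(random.sample(range(len(a)), 2))
    child = [None] * len(a)
    child[i:j] = a[i:j]
    rest = [x for x in b if x not in child[i:j]]
    k = 0
    for pos in range(len(a)):
        if child[pos] is None:
            child[pos] = rest[k]
            k += 1
    return child

def evolve(proc, pop_size=30, generations=100):
    """Tiny GA: elitism, tournament selection, OX crossover, swap mutation."""
    items = list(range(len(proc)))
    pop = [random.sample(items, len(items)) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [min(pop, key=lambda o: makespan(o, proc))]  # keep the elite
        while len(nxt) < pop_size:
            a, b = (min(random.sample(pop, 3), key=lambda o: makespan(o, proc))
                    for _ in range(2))
            child = order_crossover(a, b)
            if random.random() < 0.2:                      # swap mutation
                x, y = random.sample(range(len(child)), 2)
                child[x], child[y] = child[y], child[x]
            nxt.append(child)
        pop = nxt
    return min(pop, key=lambda o: makespan(o, proc))

# Toy shop: 3 items through 2 stations, proc[item][station] times.
proc = [[3, 2], [1, 4], [2, 1]]
random.seed(1)
best = evolve(proc)
```

A real chemical flowshop would add the deadline constraint above, typically as a penalty term in the fitness.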

    Fourier transform ion cyclotron resonance mass spectrometry for petroleomics

    The past two decades have witnessed tremendous advances in the field of high accuracy, high mass resolution data acquisition of complex samples such as crude oils and the human proteome. With the development of Fourier transform ion cyclotron resonance mass spectrometry, the rapidly growing field of petroleomics has emerged, whose goal is to process and analyse the large volumes of complex and often poorly understood data on crude oils generated by mass spectrometry. As global oil resources deplete, oil companies are increasingly moving towards the extraction and refining of the still plentiful reserves of heavy, carbon rich and highly contaminated crude oil. It is essential that the oil industry gather the maximum possible amount of information about the crude oil prior to setting up the drilling infrastructure, in order to reduce processing costs. This project describes how machine learning can be used as a novel way to extract critical information from complex mass spectra which will aid in the processing of crude oils. The thesis discusses the experimental methods involved in acquiring high accuracy mass spectral data for a large and key industry-standard set of crude oil samples. These data are subsequently analysed to identify possible links between the raw mass spectra and certain physical properties of the oils, such as pour point and sulphur content. Methods including artificial neural networks and self organising maps are described and the use of spectral clustering and pattern recognition to classify crude oils is investigated. The main focus of the research, the creation of an original simulated annealing genetic algorithm hybrid technique (SAGA), is discussed in detail and the successes of modelling a number of different datasets using all described methods are outlined. 
Despite the complexity of the underlying mass spectrometry data, which reflects the considerable chemical diversity of the samples themselves, the results show that physical properties can be modelled with varying degrees of success. When modelling pour point temperatures, the artificial neural network achieved an average prediction error of less than 10%, while SAGA predicted the same values with an average accuracy of more than 85%. It did not prove possible to model any of the other properties with such statistical significance; however, improvements to feature extraction and pre-processing of the spectral data, as well as enhancement of the modelling techniques, should yield more consistent and statistically reliable results. These should in due course lead to a comprehensive model which the oil industry can use to process crude oil data using rapid and cost-effective analytical methods.
EThOS - Electronic Theses Online Service, United Kingdom
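The simulated-annealing half of a SAGA-style hybrid can be sketched independently of the genetic algorithm: accept worsening moves with probability exp(-Δ/T) and cool T geometrically, which lets the search escape local minima. The code below is an illustrative assumption, not the thesis's SAGA implementation; the toy one-parameter fit merely stands in for tuning a property-prediction model.

```python
import math
import random

def anneal(loss, state, neighbour, t0=1.0, cooling=0.95, steps=500):
    """Minimal simulated annealing: propose a neighbour, always accept
    improvements, accept worse candidates with probability exp(-delta/T),
    and cool the temperature geometrically."""
    best, best_loss = state, loss(state)
    cur, cur_loss, t = state, best_loss, t0
    for _ in range(steps):
        cand = neighbour(cur)
        delta = loss(cand) - cur_loss
        if delta <= 0 or random.random() < math.exp(-delta / t):
            cur, cur_loss = cand, cur_loss + delta
            if cur_loss < best_loss:
                best, best_loss = cur, cur_loss
        t *= cooling
    return best, best_loss

# Toy use: fit a single coefficient w so that w * x approximates y,
# standing in for tuning a property-prediction model's parameters.
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
sse = lambda w: sum((w * x - y) ** 2 for x, y in zip(xs, ys))
random.seed(0)
w_fit, err = anneal(sse, 0.0, lambda w: w + random.uniform(-0.5, 0.5))
```

In a hybrid such as SAGA, a loop like this would typically refine candidates produced by the genetic algorithm's crossover and mutation steps.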