752 research outputs found

    The age of data-driven proteomics : how machine learning enables novel workflows

    A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data-independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. In this viewpoint we therefore point out highly promising recent machine learning developments in proteomics, alongside some of the remaining challenges.

    Using data mining for wine quality assessment

    Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g., alcohol levels) and sensory (e.g., human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect the sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve the production.

    Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo

    Background: Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time lapse microscopy. In this protocol, a program called StarryNite performs the automatic recognition of fluorescently labeled cells and traces their lineage. However, due to the amount of noise present in the data and due to the challenges introduced by increasing number of cells in later stages of development, this program is not error free. In the current version, the error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which is specifically developed for this task. For a single experiment, this manual annotation task takes several hours. Results: In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct or not. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error significantly. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net. Conclusions: We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.

    Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies

    Background: Several models for mortality prediction have been constructed for critically ill patients with haematological malignancies in recent years. These models have proven to be equally or more accurate in predicting hospital mortality in patients with haematological malignancies than ICU severity of illness scores such as the APACHE II or SAPS II [1]. The objective of this study is to compare the accuracy of predicting hospital mortality in patients with haematological malignancies admitted to the ICU between models based on multiple logistic regression (MLR) and support vector machine (SVM) based models. Methods: 352 patients with haematological malignancies admitted to the ICU between 1997 and 2006 for a life-threatening complication were included. 252 patient records were used for training of the models and 100 were used for validation. In a first model 12 input variables were included for comparison between MLR and SVM. In a second more complex model 17 input variables were used. MLR and SVM analyses were performed independently from each other. Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curves (+/- SE). Results: The area under ROC curve for the MLR and SVM in the validation data set were 0.768 (+/- 0.04) vs. 0.802 (+/- 0.04) in the first model (p = 0.19) and 0.781 (+/- 0.05) vs. 0.808 (+/- 0.04) in the second more complex model (p = 0.44). SVM needed only 4 variables to make its prediction in both models, whereas MLR needed 7 and 8 variables in the first and second model respectively. Conclusion: The discriminative power of both the MLR and SVM models was good. No statistically significant differences were found in discriminative power between MLR and SVM for prediction of hospital mortality in critically ill patients with haematological malignancies.
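The discrimination measure named in this abstract, the area under the ROC curve, can be illustrated with a minimal sketch. It uses the rank-based (Mann-Whitney) formulation of the AUC; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study, which does not describe its software.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
toy_auc = roc_auc(toy_labels, toy_scores)  # 7 of 9 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of roughly 0.77 to 0.81 should be read.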

    Limits on the high-energy gamma and neutrino fluxes from the SGR 1806-20 giant flare of December 27th, 2004 with the AMANDA-II detector

    On December 27th 2004, a giant gamma flare from the Soft Gamma-ray Repeater 1806-20 saturated many satellite gamma-ray detectors. This event was by more than two orders of magnitude the brightest cosmic transient ever observed. If the gamma emission extends up to TeV energies with a hard power law energy spectrum, photo-produced muons could be observed in surface and underground arrays. Moreover, high-energy neutrinos could have been produced during the SGR giant flare if there were substantial baryonic outflow from the magnetar. These high-energy neutrinos would have also produced muons in an underground array. AMANDA-II was used to search for downgoing muons indicative of high-energy gammas and/or neutrinos. The data revealed no significant signal. The upper limit on the gamma flux at 90% CL is dN/dE < 0.05 (0.5) TeV^-1 m^-2 s^-1 for gamma=-1.47 (-2). Similarly, we set limits on the normalization constant of the high-energy neutrino emission of 0.4 (6.1) TeV^-1 m^-2 s^-1 for gamma=-1.47 (-2). Comment: 14 pages, 3 figures

    Standalone vertex finding in the ATLAS muon spectrometer

    A dedicated reconstruction algorithm to find decay vertices in the ATLAS muon spectrometer is presented. The algorithm searches the region just upstream of or inside the muon spectrometer volume for multi-particle vertices that originate from the decay of particles with long decay paths. The performance of the algorithm is evaluated using both a sample of simulated Higgs boson events, in which the Higgs boson decays to long-lived neutral particles that in turn decay to bb̄ final states, and pp collision data at √s = 7 TeV collected with the ATLAS detector at the LHC during 2011.

    Measurements of Higgs boson production and couplings in diboson final states with the ATLAS detector at the LHC

    Measurements are presented of production properties and couplings of the recently discovered Higgs boson using the decays into boson pairs, H → γγ, H → ZZ* → 4ℓ and H → WW* → ℓνℓν. The results are based on the complete pp collision data sample recorded by the ATLAS experiment at the CERN Large Hadron Collider at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV, corresponding to an integrated luminosity of about 25 fb^-1. Evidence for Higgs boson production through vector-boson fusion is reported. Results of combined fits probing Higgs boson couplings to fermions and bosons, as well as anomalous contributions to loop-induced production and decay modes, are presented. All measurements are consistent with expectations for the Standard Model Higgs boson.

    Measurement of the top quark pair cross section with ATLAS in pp collisions at √s=7 TeV using final states with an electron or a muon and a hadronically decaying τ lepton

    A measurement of the cross section of top quark pair production in proton-proton collisions recorded with the ATLAS detector at the Large Hadron Collider at a centre-of-mass energy of 7 TeV is reported. The data sample used corresponds to an integrated luminosity of 2.05 fb^-1. Events with an isolated electron or muon and a τ lepton decaying hadronically are used. In addition, a large missing transverse momentum and two or more energetic jets are required. At least one of the jets must be identified as originating from a b quark. The measured cross section, σ_tt̄ = 186 ± 13 (stat.) ± 20 (syst.) ± 7 (lumi.) pb, is in good agreement with the Standard Model prediction.
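As a rough reading aid for the quoted cross section, the three uncertainty components can be combined into a single figure. The sketch below assumes the statistical, systematic, and luminosity uncertainties are independent and adds them in quadrature; the abstract itself quotes them separately and does not state a combined value, so the total here is an illustration only.

```python
import math

# Values quoted in the abstract, in pb.
central = 186.0
stat, syst, lumi = 13.0, 20.0, 7.0

# Assumption: the three components are uncorrelated, so they add in quadrature.
total = math.sqrt(stat**2 + syst**2 + lumi**2)  # sqrt(618), about 24.9 pb
relative = total / central                      # about 13% of the central value

print(f"sigma = {central:.0f} +/- {total:.1f} pb ({100 * relative:.1f}% relative)")
```

Under this assumption the systematic term dominates: it contributes 400 of the 618 pb² in the sum of squares.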