    The age of data-driven proteomics : how machine learning enables novel workflows

    A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. In this viewpoint we therefore point out highly promising recent machine learning developments in proteomics, alongside some of the remaining challenges.

    Optimal length-constrained segmentation and subject-adaptive learning for real-time arrhythmia detection

    © 2018 Association for Computing Machinery. An algorithm for data segmentation with length constraints on each segment is presented and applied in the context of arrhythmia detection. The additivity property of the cost function for each segment yields an induction proof of the exact global optimal solution. The experiments were conducted on the MIT-BIH arrhythmia dataset with the heartbeat categories recommended by the ANSI/AAMI EC57:1998 standard. The heartbeat classification task is enhanced by an adaptive learning scheme. An incremental support vector machine is used to integrate a small number of expert-annotated, subject-specific samples into the existing classifier previously learned from the dataset. The proposed segmentation scheme achieves a sensitivity of 99.89% and a positive predictivity of 99.83%. The classification sensitivities of ventricular and supraventricular detection are significantly boosted from 85.9% and 83.5% (subject-unadaptive) to 97.7% and 93.2% (subject-adaptive), respectively. Similarly, the predictivities increase from 94.8% to 99.3% (ventricular) and from 67.7% to 88.0% (supraventricular) when the adaptive learning method is plugged in. The signal processing framework is run in a simulated real-time mode. Compared with previously reported studies, we achieve competitive performance on all assessment measures.
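
    The length-constrained segmentation idea in the abstract above can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' algorithm: the cost function (sum of squared deviations from the segment mean) and the names `segment`, `min_len`, and `max_len` are assumptions chosen for illustration. Because the total cost is a sum of per-segment costs, the best segmentation of a prefix can be built from the best segmentations of shorter prefixes.

```python
from math import inf


def segment_cost(prefix, prefix_sq, i, j):
    """Assumed cost: sum of squared deviations from the mean of x[i:j]."""
    n = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / n


def segment(x, min_len, max_len):
    """Return (total_cost, [(start, end), ...]) with min_len <= end - start <= max_len."""
    n = len(x)
    # Prefix sums let each segment cost be evaluated in constant time.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for k, v in enumerate(x):
        prefix[k + 1] = prefix[k] + v
        prefix_sq[k + 1] = prefix_sq[k] + v * v

    best = [inf] * (n + 1)   # best[j]: minimal cost of segmenting x[:j]
    back = [-1] * (n + 1)    # back[j]: start index of the last segment
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j - min_len + 1):
            if best[i] == inf:
                continue  # x[:i] cannot be segmented under the constraints
            c = best[i] + segment_cost(prefix, prefix_sq, i, j)
            if c < best[j]:
                best[j], back[j] = c, i

    if best[n] == inf:
        raise ValueError("no segmentation satisfies the length constraints")

    # Walk the back-pointers to recover the segment boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    bounds.reverse()
    return best[n], bounds
```

    Calling `segment([0, 0, 0, 5, 5, 5, 1, 1], min_len=2, max_len=4)` splits the toy signal at the two level changes. The running time is proportional to the signal length times `max_len - min_len + 1`.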

    Using data mining for wine quality assessment

    Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g. alcohol levels) and sensory (e.g. human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect the sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve the production.
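
    The sensitivity analysis described above, varying one input through its domain and measuring how the response changes, can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: the other inputs are held at their sample means, the response spread is summarized by its variance, and the names `sensitivity`, `model`, and `levels` are hypothetical.

```python
from statistics import mean, pvariance


def sensitivity(model, samples, levels=5):
    """Relative importance of each input of `model`, estimated one input at a time.

    `model` maps a list of input values to a number; `samples` is a list of
    input rows used only to find each input's mean and observed range.
    """
    n_inputs = len(samples[0])
    baseline = [mean(row[k] for row in samples) for k in range(n_inputs)]

    variances = []
    for k in range(n_inputs):
        lo = min(row[k] for row in samples)
        hi = max(row[k] for row in samples)
        responses = []
        for step in range(levels):
            point = list(baseline)  # all other inputs stay at their mean
            point[k] = lo + (hi - lo) * step / (levels - 1)
            responses.append(model(point))
        variances.append(pvariance(responses))

    total = sum(variances)
    # Normalize so the importances sum to 1 (all zeros if the model is constant).
    return [v / total if total else 0.0 for v in variances]
```

    Any fitted regressor can be passed as `model` by wrapping its prediction call in a function that takes a single input row.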

    Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies

    Background: Several models for mortality prediction have been constructed for critically ill patients with haematological malignancies in recent years. These models have proven to be equally or more accurate in predicting hospital mortality in patients with haematological malignancies than ICU severity of illness scores such as the APACHE II or SAPS II [1]. The objective of this study is to compare the accuracy of models based on multiple logistic regression (MLR) and models based on support vector machines (SVM) in predicting hospital mortality in patients with haematological malignancies admitted to the ICU. Methods: 352 patients with haematological malignancies admitted to the ICU between 1997 and 2006 for a life-threatening complication were included. 252 patient records were used for training of the models and 100 were used for validation. In a first model, 12 input variables were included for comparison between MLR and SVM. In a second, more complex model, 17 input variables were used. MLR and SVM analyses were performed independently of each other. Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve (+/- SE). Results: The areas under the ROC curve for MLR and SVM in the validation data set were 0.768 (+/- 0.04) vs. 0.802 (+/- 0.04) in the first model (p = 0.19) and 0.781 (+/- 0.05) vs. 0.808 (+/- 0.04) in the second, more complex model (p = 0.44). SVM needed only 4 variables to make its prediction in both models, whereas MLR needed 7 and 8 variables in the first and second model, respectively. Conclusion: The discriminative power of both the MLR and SVM models was good. No statistically significant differences were found in discriminative power between MLR and SVM for prediction of hospital mortality in critically ill patients with haematological malignancies.
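
    The discrimination measure reported above, area under the ROC curve with a standard error, can be computed from validation labels and predicted scores as sketched below. The abstract does not say how the standard error was obtained; the Hanley-McNeil approximation used here is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil approximation of the AUC standard error (assumed method)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return sqrt(var)
```

    The pairwise loop is quadratic in the number of records, which is unproblematic for a validation set of 100 patients; a rank-based formulation would be preferable for large data sets.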

    Detecting a stochastic gravitational wave background with the Laser Interferometer Space Antenna

    The random superposition of many weak sources will produce a stochastic background of gravitational waves that may dominate the response of the LISA (Laser Interferometer Space Antenna) gravitational wave observatory. Unless something can be done to distinguish between a stochastic background and detector noise, the two will combine to form an effective noise floor for the detector. Two methods have been proposed to solve this problem. The first is to cross-correlate the output of two independent interferometers. The second is an ingenious scheme for monitoring the instrument noise by operating LISA as a Sagnac interferometer. Here we derive the optimal orbital alignment for cross-correlating a pair of LISA detectors, and provide the first analytic derivation of the Sagnac sensitivity curve. Comment: 9 pages, 11 figures. Significant changes to the noise estimate.

    Classification of microarray data using gene networks

    BACKGROUND: Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks in order to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. RESULTS: We propose a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We illustrate the method with the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. CONCLUSION: Including a priori knowledge of a gene network for the analysis of gene expression data leads to good classification performance and improved interpretability of the results.
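
    The spectral decomposition step described above can be sketched with NumPy. This is a minimal illustration under assumptions rather than the authors' implementation: the graph is given as a symmetric adjacency matrix, the unnormalized Laplacian is used, and high-frequency components are attenuated with an exponential filter whose strength `beta` is a hypothetical parameter.

```python
import numpy as np


def laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    return np.diag(a.sum(axis=1)) - a


def smooth_on_graph(profile, adjacency, beta=1.0):
    """Attenuate the high-frequency graph components of an expression profile.

    The profile (one value per gene, ordered like the adjacency matrix) is
    expanded in the Laplacian eigenvectors; the component with eigenvalue
    lam is scaled by exp(-beta * lam), an assumed filter shape.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian(adjacency))
    coeffs = eigvecs.T @ np.asarray(profile, dtype=float)
    return eigvecs @ (np.exp(-beta * eigvals) * coeffs)
```

    A profile that is constant over the network passes through unchanged, because it corresponds to eigenvalue zero, while a profile that alternates between neighbouring genes is damped. The filtered profiles could then be fed to any standard classifier.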