806 research outputs found

    Advances in pre-processing and model generation for mass spectrometric data analysis

    The analysis of complex signals obtained by mass spectrometric measurements is complicated and needs an appropriate representation of the data. The kind of preprocessing, the feature extraction, and the similarity measure used are therefore of particular importance. Focusing on biomarker analysis and taking the functional nature of the data into account makes this task even more complicated. A new data preprocessing approach tailored to mass spectrometry is presented, discussed, and analyzed in a clinical proteomics study, and compared to a standard setting.

    Improving average ranking precision in user searches for biomedical research datasets

    Availability of research datasets is a keystone for the reproducibility of health and life science studies and for scientific progress. Due to the heterogeneity and complexity of these data, a main challenge for research data management systems is to provide users with the best answers to their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, +22.3% higher than the median infAP of the participants' best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method had a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorization did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data-driven query expansion methods could be an alternative to the complexity of biomedical terminologies.
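    The query expansion step described in this abstract can be illustrated with a minimal sketch. It assumes expansion by nearest neighbours under cosine similarity, which the abstract does not confirm; the function names, the `k` and `min_sim` parameters, and the toy embedding table are all hypothetical and are not taken from the authors' system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.8):
    """Append up to k nearest embedding neighbours of each query term.

    Assumption: neighbours are chosen by cosine similarity and must
    reach min_sim; the real system may select terms differently.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        scored = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word != term
            ),
            reverse=True,
        )
        for sim, word in scored[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


# Made-up 3-dimensional vectors, for illustration only.
TOY_EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "mouse": [0.0, 0.1, 0.9],
}

print(expand_query(["cancer"], TOY_EMBEDDINGS))
```

    With real embeddings, the threshold and the number of neighbours would need tuning, since overly generous expansion can reduce ranking precision.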

    Advanced metric adaptation in Generalized LVQ for classification of mass spectrometry data

    Metric adaptation constitutes a powerful approach to improve the performance of prototype-based classification schemes. We apply extensions of Generalized LVQ based on different adaptive distance measures in the domain of clinical proteomics. The Euclidean distance in GLVQ is extended by adaptive relevance vectors and matrices of global or local influence, where training follows a stochastic gradient descent on an appropriate error function. We compare the performance of the resulting learning algorithms for the classification of high-dimensional mass spectrometry data from cancer research. High prediction accuracies can be obtained by adapting full matrices of relevance factors in the distance measure in order to adjust the metric to the underlying data structure. The easy interpretability of the resulting models after training of relevance vectors makes it possible to identify discriminative features in the original spectra.
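    The adaptive distances mentioned in this abstract can be sketched as follows. This assumes the standard formulations: a relevance-vector distance sum_i lambda_i (x_i - w_i)^2, a matrix distance (x - w)^T Omega^T Omega (x - w), and the GLVQ relative distance difference mu = (d_same - d_other) / (d_same + d_other). The function names are hypothetical, and the gradient-descent training step is omitted.

```python
def relevance_distance(x, w, lam):
    """Squared distance weighted by one relevance factor per feature."""
    return sum(l * (xi - wi) ** 2 for l, xi, wi in zip(lam, x, w))


def matrix_distance(x, w, omega):
    """Squared adaptive distance (x - w)^T Omega^T Omega (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)


def glvq_mu(x, label, prototypes, proto_labels, dist):
    """Relative distance difference used in the GLVQ cost function.

    Negative values mean x lies closer to a prototype of its own class
    than to any prototype of another class (a correct classification).
    """
    d_same = min(
        dist(x, w) for w, c in zip(prototypes, proto_labels) if c == label
    )
    d_other = min(
        dist(x, w) for w, c in zip(prototypes, proto_labels) if c != label
    )
    return (d_same - d_other) / (d_same + d_other)


# Toy example: two prototypes, identity Omega (plain squared Euclidean).
identity = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[0.0, 0.0], [4.0, 0.0]]
proto_labels = [0, 1]
mu = glvq_mu(
    [1.0, 0.0], 0, prototypes, proto_labels,
    lambda x, w: matrix_distance(x, w, identity),
)
print(mu)
```

    In this sketch the diagonal of Omega^T Omega plays the role of the relevance vector, which is what makes the trained models interpretable in terms of the original spectral features.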

    Intelligent techniques using molecular data analysis in leukaemia: an opportunity for personalized medicine support system

    The use of intelligent techniques in medicine has brought a ray of hope in terms of treating leukaemia patients. Personalized treatment uses a patient's genetic profile to select a mode of treatment. This process makes use of molecular technology and machine learning to determine the most suitable approach to treating a leukaemia patient. Until now, no reviews have been published from a computational perspective concerning the development of personalized medicine intelligent techniques for leukaemia patients using molecular data analysis. This review studies the published empirical research on personalized medicine in leukaemia and synthesizes findings across studies related to intelligent techniques in leukaemia, with specific attention to particular categories of these studies, to help identify opportunities for further research into personalized medicine support systems in chronic myeloid leukaemia. A systematic search was carried out to identify studies using intelligent techniques in leukaemia and to categorize these studies based on leukaemia type and on the task, data source, and purpose of the studies. Most studies used molecular data analysis for personalized medicine, but future advancement for leukaemia patients requires molecular models that use advanced machine-learning methods to automate decision-making in treatment management and to deliver supportive medical information to the patient in clinical practice.
    Haneen Banjar, David Adelson, Fred Brown, and Naeem Chaudhr

    Fuzzy classification with distance-based depth prototypes: High-dimensional unsupervised and/or supervised problems

    Supervised and unsupervised classification is crucial in many areas where different types of data sets are common, such as biology, medicine, or industry, among others. A key consideration is that some units are more typical of the group they belong to than others. For this reason, fuzzy classification approaches are necessary. In this paper, a fuzzy supervised classification method, which is based on the construction of prototypes, is proposed. The method obtains the prototypes from an objective function that includes label information and a distance-based depth function. It works with any distance and it can deal with data sets of a wide variety of types. It can further be applied to data sets where the use of Euclidean distance is not suitable and to high-dimensional data (data sets in which the number of features p is larger than the number of observations n, often written as p > n). In addition, the model can also cope with unsupervised classification, thus becoming an interesting alternative to other fuzzy clustering methods. With synthetic data sets along with high-dimensional real biomedical and industrial data sets, we demonstrate the good performance of the proposed supervised and unsupervised fuzzy procedures.
    This research was partially supported: II by the Spanish ‘Ministerio de Economia y Competitividad’ (PID2019-106942RB-C31). CA by grant 2021SGR01421 (GRBIO) from the Departament de Economia i Coneixement de la Generalitat de Catalunya, Spain. II, CA and BS by the Spanish ‘Ministerio de Economia Competitividad’ (PID2021-122402OB-C21).