2,775 research outputs found

    Genetic Programming for Biomarker Detection in Classification of Mass Spectrometry Data

    Mass spectrometry (MS) is currently the most commonly used technology in biochemical research for proteomic analysis. The primary goal of proteomic profiling using mass spectrometry is the classification of samples from different experimental states. To classify the MS samples, it is necessary to identify the proteins or peptides that are expressed differently between the classes (biomarker detection). However, due to the high dimensionality of the data and the small number of samples, classification of MS data is extremely challenging. Another important aspect of biomarker detection is the verification of the detected biomarkers, which acts as an intermediate step before passing these biomarkers to the experimental validation stage. Biomarker detection aims at altering the input space of the learning algorithm to improve the classification of proteomic or metabolomic data. This task is performed through feature manipulation, which consists of three aspects: feature ranking, feature selection, and feature construction. Genetic programming (GP) is an evolutionary computation algorithm that has the intrinsic capability for all three aspects of feature manipulation. The ability of GP for feature manipulation in proteomic biomarker discovery has not been fully investigated. This thesis, therefore, proposes an embedded methodology for these three aspects of feature manipulation in high-dimensional MS data using GP. The thesis also presents a method for biomarker verification using GP. The thesis investigates the use of GP for both single-objective and multi-objective feature selection and construction. In feature ranking, the thesis proposes a GP-based method for ranking subsets of features by using GP as an ensemble approach. The proposed algorithm uses the capability of GP to combine the advantages of different feature ranking metrics and evolve a new ranking scheme for the subset of features selected from the top-ranked features. The capability of GP as a classifier is also investigated by this method. The results show that GP can select a smaller number of features and provide a better ranking of the selected features, which can improve the classification performance of five classifiers. In feature construction, this thesis proposes a novel multiple feature construction method, which uses a single GP tree to generate a new set of high-level features from the original set of selected features. The results show that the proposed algorithm outperforms two feature selection algorithms. In feature selection, the thesis introduces the first GP multi-objective method for biomarker detection, which simultaneously increases the classification accuracy and reduces the number of detected features. The proposed multi-objective method can obtain better subsets of features than the single-objective algorithm and two traditional multi-objective approaches for feature selection. This thesis also develops the first multi-objective multiple feature construction algorithm for MS data. The proposed method aims at both maximising the classification performance and minimising the cardinality of the constructed high-level features. The results show that GP can discover the complex relationships between the features and can significantly improve classification performance and reduce the cardinality. For biomarker verification, the thesis proposes the first GP biomarker verification method, which works by measuring peptide detectability. The method solves the imbalance problem in the data and shows improvement over the benchmark algorithms. The algorithm also outperforms a well-known peptide detection method. Finally, the thesis introduces a new GP method for alignment of MS data as a preprocessing stage, which will further help in improving the biomarker detection process.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platform (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

    Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of several metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). Until now, none of these algorithms except GA had been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features; many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.

    Metabolomics for mitochondrial and cancer studies

    Metabolomics, a high-throughput global metabolite analysis, is a burgeoning field, and in recent times has shown substantial evidence to support its emerging role in cancer diagnosis, cancer recurrence, and prognosis, as well as its impact in identifying novel cancer biomarkers and developing cancer therapeutics. Newly evolving advances in disease diagnostics and therapy will further facilitate future growth in the field of metabolomics, especially in cancer, where there is a dire need for sensitive and more affordable diagnostic tools and an urgency to develop effective therapies and identify reliable biomarkers to predict accurately the response to a therapy. Here, we review the application of metabolomics in cancer and mitochondrial studies and its role in enabling the understanding of altered metabolism and malignant transformation during cancer growth and metastasis. The recent developments in the area of metabolic flux analysis may help to close the gap between clinical metabolomics research and the development of cancer metabolome. In the era of personalized medicine, with more and more patient-specific targeted therapies being used, we need reliable, dynamic, faster, and yet sensitive biomarkers both to track the disease and to develop and evolve therapies during the course of treatment. Recent advances in metabolomics, along with novel strategies to analyze, understand, and construct the metabolic pathways, open this window of opportunity in a very cost-effective manner. This article is part of a Special Issue entitled: Bioenergetics of Cancer.

    Metabolomic Profiling in Children with Celiac Disease: Beyond the Gluten-Free Diet

    FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Projects FEDER No B-AGR-658 and Excelencia P21_00101; “Investigation grant program by the Association of Celiacs and Sensitive to Gluten of the Community of Madrid” and the Government of Catalonia: Agency for the Management of University and Research Grants (2021SGR00990); “Nutrición y Ciencias de los Alimentos” from the University of Granada

    Report from the 5th international symposium on mycotoxins and toxigenic moulds : challenges and perspectives (MYTOX) held in Ghent, Belgium, May 2016

    The association research platform MYTOX “Mycotoxins and Toxigenic Moulds” held the 5th meeting of its International Symposium in Ghent, Belgium on 11 May 2016.[...

    Evaluation of computer methods for biomarker discovery on computational grids

    Background: Discovering biomarkers is a fundamental step in understanding and dealing with genetic diseases. Methods using classic computer science algorithms have been adapted to support the processing of large biological data sets, with the aim of finding information that helps explain the conditions that cause diseases such as cancer. Results: This paper describes some promising biomarker discovery methods based on several grid architectures. Each technique has features that make it more suitable for a particular grid architecture. This matching depends on the parallelizing capabilities of the method and the resource availability in each processing/storage node. Conclusion: The study described in this paper analyzed the performance of biomarker discovery methods in different grid architectures. We found that some methods are better suited to certain grid architectures, resulting in significant performance improvement and more accurate results.