391 research outputs found

    Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

    Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.

    Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

    Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of several metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). With the exception of GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
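As a rough illustration of the wrapper scheme mentioned in this abstract, the sketch below scores candidate peakbin subsets by the leave-one-out accuracy of a classifier and grows the subset greedily. It is a minimal sketch under stated assumptions, not the paper's method: greedy forward selection stands in for the best-first search (it never backtracks), a nearest-centroid classifier stands in for Naive Bayes and SVM, and the function names and toy data are hypothetical.

```python
from statistics import mean


def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def greedy_forward_selection(X, y, max_features):
    """Add one feature at a time while the wrapper score strictly improves."""
    selected, best_score = [], 0.0
    candidates = list(range(len(X[0])))
    while candidates and len(selected) < max_features:
        scored = [
            (nearest_centroid_loo_accuracy(X, y, selected + [f]), f)
            for f in candidates
        ]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        candidates.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: column 1 separates the classes, columns 0 and 2 are noise.
X_demo = [
    [5.0, 1.0, 3.0],
    [2.0, 1.2, 7.0],
    [6.0, 0.9, 4.0],
    [5.5, 3.0, 6.5],
    [2.5, 3.1, 3.5],
    [6.5, 2.9, 4.5],
]
y_demo = [0, 0, 0, 1, 1, 1]

selected_demo, score_demo = greedy_forward_selection(X_demo, y_demo, max_features=2)
print(selected_demo, score_demo)
```

A filter scheme would differ only in the scoring function: it would rank subsets by a statistic computed from the data alone rather than by classifier accuracy.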

    Comparison of feature selection and classification for MALDI-MS data

    INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. RESULTS: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient-based Leave-one-out Gene Selection (GLGS), that effectively performs microarray data analysis. We also compared several learning classifiers, including the K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), the uncorrelated normal-based quadratic Bayes classifier (recorded as UDC), Support Vector Machines (SVMs), and the Large Margin Nearest Neighbor classifier (LMNN), a distance metric learning method based on the Mahalanobis distance. To compare them, we conducted a comprehensive experimental study using three types of MALDI-MS data. CONCLUSION: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing accuracy. However, the distance metric learning classifier LMNN outperformed SVMs and the other classifiers when the best testing accuracy was evaluated. In such cases, the optimum classification model based on LMNN is worth investigating in future studies.

    Genetic Algorithms for Feature Selection and Classification of Complex Chromatographic and Spectroscopic Data

    A basic methodology for analyzing large multivariate chemical data sets based on feature selection is proposed. Each chromatogram or spectrum is represented as a point in a high-dimensional measurement space. A genetic algorithm for feature selection and classification is applied to the data to identify features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. A good principal component plot can only be generated using features whose variance or information is primarily about differences between classes in the data. Hence, feature subsets that maximize the ratio of between-class to within-class variance are selected by the pattern recognition genetic algorithm. Furthermore, the structure of the data set can be explored; for example, new classes can be discovered simply by tuning various parameters of the fitness function of the pattern recognition genetic algorithm. The proposed method has been validated on a wide range of data. A two-step procedure for pattern recognition analysis of spectral data has been developed. First, wavelets are used to denoise and deconvolute spectral bands by decomposing each spectrum into wavelet coefficients, which represent the sample's constituent frequencies. Second, the pattern recognition genetic algorithm is used to identify wavelet coefficients characteristic of the class. This method was employed in several studies involving spectral library searching. In one study, a search pre-filter to detect the presence of carboxylic acids from vapor-phase infrared spectra, a problem that has previously eluded prominent researchers, has been successfully formulated and validated. In another study, this same approach has been used to develop a pattern recognition assisted infrared library searching technique to determine the model, manufacturer, and year of the vehicle from which a clear coat paint smear originated. 
The pattern recognition genetic algorithm has also been used to develop a potential method to identify molds in indoor environments using volatile organic compounds. A distinct profile indicative of microbial volatile organic compounds was developed from air sampling data that could be readily differentiated from the blank for both high mold count and moderate mold count exposure samples. The utility of the pattern recognition genetic algorithm for discovery of biomarker candidates from genomic and proteomic data sets has also been shown.
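The selection criterion named in this abstract, the ratio of between-class to within-class variance, can be illustrated for a single feature. The sketch below is a simplification and not the authors' fitness function: it scores each feature on its own with a univariate ratio, whereas the abstract describes a genetic algorithm scoring feature subsets through principal component plots. The function names and toy data are hypothetical.

```python
from statistics import mean


def fisher_ratio(values, labels):
    """Between-class over within-class sum of squares for one feature."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for label in sorted(set(labels)):
        group = [v for v, l in zip(values, labels) if l == label]
        centre = mean(group)
        # Spread of the class centre around the overall mean, weighted by class size.
        between += len(group) * (centre - overall) ** 2
        # Spread of the class members around their own centre.
        within += sum((v - centre) ** 2 for v in group)
    return float("inf") if within == 0 else between / within


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminating."""
    scores = [fisher_ratio([row[f] for row in X], y) for f in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)


# Hypothetical toy data: column 0 is noise, column 1 separates classes "a" and "b".
X_demo = [[1, 1], [9, 2], [5, 3], [2, 7], [8, 8], [5, 9]]
y_demo = ["a", "a", "a", "b", "b", "b"]

print(rank_features(X_demo, y_demo))
```

A feature whose class means coincide scores zero, while a feature with tight, well-separated classes scores high, which is why features of the second kind tend to produce clean class separation in a principal component plot.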

    Venom Yield, Regeneration, and Composition in the Centipede Scolopendra Polymorpha

    In this dissertation, I investigated yield, regeneration, and composition of centipede venom. In the first of three empirical studies, I investigated how size influenced venom volume yield and protein concentration in Scolopendra polymorpha and S. subspinipes. I also examined additional potential influences on yield in S. polymorpha, including relative forcipule size, relative mass, geographic origin, sex, time in captivity, and milking history. Volume yield was positively linearly related to body length in both species; however, body length and protein concentration were uncorrelated. In S. polymorpha, yield was most influenced by body length, but was also positively associated with relative forcipule length and relative body mass. In the second study, I investigated venom volume and total protein regeneration during the 14-day period subsequent to venom extraction in S. polymorpha. I further tested the hypothesis that venom protein components, separated by RP-FPLC, undergo asynchronous synthesis. During the first 48 hours, volume and protein mass increased linearly. However, protein regeneration lagged behind volume regeneration, with only 65–86% of venom volume and 29–47% of protein mass regenerated during the first 2 days. No significant additional regeneration occurred over the subsequent 12 days. Analysis of chromatograms of individual venom samples revealed that five of 10 chromatographic regions and 12 of 28 peaks demonstrated changes in percent of total peak area among milking intervals, indicating that venom proteins are regenerated asynchronously. In the third study, I characterized the venom composition of S. polymorpha using proteomic methods. I demonstrated that the venom of S. polymorpha is complex, generating 23 bands by SDS-PAGE and 56 peaks by RP-FPLC. MALDI TOF MS revealed hundreds of components with masses ranging from 1014.5 to 82863.9 Da. 
The distribution of molecular masses was skewed toward smaller peptides and proteins, with 72% of components found below 12 kDa. BLASTp sequence similarity searching of MS/MS-derived amino acid sequences demonstrated 20 different sequences with similarity to known venom components, including serine proteases, ion-channel activators/inhibitors, and neurotoxins. In Appendix A, I reviewed how animals strategically deploy various emissions, including venom, highlighting how the metabolic and ecological value of these emissions leads to their judicious use.

    Venom chemistry and ecology of Australian scorpions

    Edward Evans studied multiple Australian scorpion species, focusing on poorly understood aspects of scorpion venom chemistry and the ecological drivers of venom variation. Novel peptides were characterised, and small molecules were identified from scorpion venoms. Additionally, venom variation associated with ontogeny, sex, and defensive venom use was described.