16,501 research outputs found

    Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

    Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.

    Protein processing characterized by a gel-free proteomics approach

    We describe a method for the specific isolation of representative N-terminal peptides of proteins and their proteolytic fragments. Their isolation is based on a gel-free, peptide-centric proteomics approach using the principle of diagonal chromatography. We show that introducing an altered chemical property to internal peptides holding a free α-N-terminus changes the column retention of these peptides, thereby enabling the isolation of N-terminal peptides and their further characterization by mass spectrometry. Besides pointing to changes in protein expression levels when such proteome surveys are performed in a differential mode, the approach allows protease specificity and substrate repertoires to be assigned, since both are specified by the neo-N-termini generated after a protease cleavage event. As such, our gel-free proteomics technology is widely applicable and amenable to a variety of proteome-driven protease degradomics research.

    Machine Learning and Integrative Analysis of Biomedical Big Data

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction

    While the linear mixed model (LMM) has shown competitive performance in correcting spurious associations arising from population stratification, family structures, and cryptic relatedness, further challenges remain regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We propose the sparse graph-structured linear mixed model (sGLMM), which can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM on real-world genomic datasets from two different species, one plant and one human. In Arabidopsis thaliana data, sGLMM performs better than all other baseline models for 63.4% of traits. We also discuss the potential causal genetic variation of human Alzheimer's disease discovered by our model and justify some of the most important genetic loci. Comment: Code available at https://github.com/YeWenting/sGLM

    From aptamer-based biomarker discovery to diagnostic and clinical applications: an aptamer-based, streamlined multiplex proteomic assay

    Recently, we reported an aptamer-based, highly multiplexed assay for the purpose of biomarker identification. To enable seamless transition from highly multiplexed biomarker discovery assays to a format suitable and convenient for diagnostic and life-science applications, we developed a streamlined, plate-based version of the assay. The plate-based version is robust, sensitive (sub-picomolar), and rapid, and it can be highly multiplexed (upwards of 60 analytes) and fully automated. We demonstrate that quantification by microarray-based hybridization, Luminex bead-based methods, and qPCR are each compatible with our platform, further expanding the breadth of proteomic applications for a wide user community.