3,668 research outputs found

    Machine learning and mapping algorithms applied to proteomics problems

    Proteins provide evidence that a given gene is expressed, and machine learning algorithms can be applied to various proteomics problems in order to gain information about the underlying biology. This dissertation applies machine learning algorithms to proteomics data in order to predict whether a given peptide is observable by mass spectrometry and whether a given peptide can serve as a cell penetrating peptide, and then uses the peptides observed through mass spectrometry to aid in the structural annotation of the chicken genome. Peptides observed by mass spectrometry are used to identify proteins, and accurately predicting which peptides will be seen allows researchers to analyze to what extent a given protein is observable. Cell penetrating peptides could be used for targeted small-molecule delivery across cellular membranes and may serve as drug delivery peptides. Peptides and proteins identified through mass spectrometry can help refine computational gene models and improve structural genome annotations.

    Proteins associated with pancreatic cancer survival in patients with resectable pancreatic ductal adenocarcinoma.

    Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal disease with a dismal prognosis. However, while most patients die within the first year of diagnosis, very rarely, a few patients can survive for >10 years. A better understanding of the molecular characteristics of the pancreatic adenocarcinomas from these very-long-term survivors (VLTS) may provide clues for personalized medicine and improve current pancreatic cancer treatment. To extend our previous investigation, we examined the proteomes of individual pancreas tumor tissues from a group of VLTS patients (survival ≥10 years) and short-term survival patients (STS, survival <14 months). With a given analytical sensitivity, the protein profile of each pancreatic tumor tissue was compared to reveal the proteome alterations that may be associated with pancreatic cancer survival. Pathway analysis of the differential proteins identified suggested that MYC, IGF1R and p53 were the top three upstream regulators for the STS-associated proteins, and VEGFA, APOE and TGFβ-1 were the top three upstream regulators for the VLTS-associated proteins. Immunohistochemistry analysis using an independent cohort of 145 PDAC confirmed that higher abundances of ribosomal protein S8 (RPS8) and prolargin (PRELP) were correlated with STS and VLTS, respectively. Multivariate Cox analysis indicated that 'High-RPS8 and Low-PRELP' was significantly associated with shorter survival time (HR=2.69, 95% CI 1.46-4.92, P=0.001). In addition, galectin-1, a previously identified protein whose abundance is inversely associated with pancreatic cancer survival, was further evaluated for its significance in cancer-associated fibroblasts. Knockdown of galectin-1 in pancreatic cancer-associated fibroblasts dramatically reduced cell migration and invasion.
    The results from our study suggested that PRELP, LGALS1 and RPS8 might be significant prognostic factors, and RPS8 and LGALS1 could be potential therapeutic targets to improve pancreatic cancer survival if further validated.

    Machine learning applications in proteomics research: How the past can boost the future

    Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.

    De novo sequencing of MS/MS spectra

    Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them is detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field. © 2011 Expert Reviews Ltd. The Turkish Academy of Science (TÜBA)

    Quantitative analysis of mass spectrometry proteomics data : Software for improved life science

    The rapid advances in life science, including the sequencing of the human genome and numerous other techniques, have given an extraordinary ability to acquire data on biological systems and human disease. Even so, drug development costs are higher than ever, while the rate of newly approved treatments is historically low. A potential explanation for this discrepancy might be the difficulty of understanding the biology underlying the acquired data, that is, the difficulty of refining the data into useful knowledge through interpretation. In this thesis the refinement of the complex data from mass spectrometry proteomics is studied. A number of new algorithms and programs are presented and demonstrated to provide increased analytical ability over previously suggested alternatives. With the higher goal of increasing the scientific output of the mass spectrometry laboratory, pragmatic studies were also performed to create a new set of compression algorithms for reduced storage requirements of mass spectrometry data, and to characterize instrument stability. The final components of this thesis are a discussion of the technical and instrumental weaknesses associated with the currently employed mass spectrometry proteomics methodology, and a discussion of the currently lacking quality of academic software and the reasons for it. As a whole, the primary algorithms, the enabling technology, and the weakness discussions all aim to improve the current capability to perform mass spectrometry proteomics. As this technology is crucial to understanding proteins, the main functional components of biology, this quest should allow better and higher quality life science data, and ultimately increase the chances of developing new treatments or diagnostics.

    EPIC-DB: a proteomics database for studying Apicomplexan organisms

    Background: High throughput proteomics experiments are useful for analyzing the protein expression of an organism, identifying the correct gene structure of a genome, or locating possible post-translational modifications within proteins. High throughput methods necessitate publicly accessible and easily queried databases for efficiently and logically storing, displaying, and analyzing the large volume of data. Description: EPICDB is a publicly accessible, queryable, relational database that organizes and displays experimental, high throughput proteomics data for Toxoplasma gondii and Cryptosporidium parvum. Along with detailed information on mass spectrometry experiments, the database also provides antibody experimental results and analysis of functional annotations, comparative genomics, and aligned expressed sequence tag (EST) and genomic open reading frame (ORF) sequences. The database contains all available alternative gene datasets for each organism, which comprises a complete theoretical proteome for the respective organism, and all data is referenced to these sequences. The database is structured around clusters of protein sequences, which allows for the evaluation of redundancy, protein prediction discrepancies, and possible splice variants. The database can be expanded to include genomes of other organisms for which proteome-wide experimental data are available. Conclusion: EPICDB is a comprehensive database of genome-wide T. gondii and C. parvum proteomics data and incorporates many features that allow for the analysis of the entire proteomes and/or annotation of specific protein sequences. EPICDB is complementary to other genomics databases of these organisms by offering complete mass spectrometry analysis on a comprehensive set of all available protein sequences.

    Differences in genotype and virulence among four multidrug-resistant Streptococcus pneumoniae isolates belonging to the PMEN1 clone

    We report on the comparative genomics and characterization of the virulence phenotypes of four S. pneumoniae strains that belong to the multidrug resistant clone PMEN1 (Spain23F ST81). Strains SV35-T23 and SV36-T3 were recovered in 1996 from the nasopharynx of patients at an AIDS hospice in New York. Strain SV36-T3 expressed capsule type 3 which is unusual for this clone and represents the product of an in vivo capsular switch event. A third PMEN1 isolate - PN4595-T23 - was recovered in 1996 from the nasopharynx of a child attending day care in Portugal, and a fourth strain - ATCC700669 - was originally isolated from a patient with pneumococcal disease in Spain in 1984. We compared the genomes among four PMEN1 strains and 47 previously sequenced pneumococcal isolates for gene possession differences and allelic variations within core genes. In contrast to the 47 strains - representing a variety of clonal types - the four PMEN1 strains grouped closely together, demonstrating high genomic conservation within this lineage relative to the rest of the species. In the four PMEN1 strains allelic and gene possession differences were clustered into 18 genomic regions including the capsule, the blp bacteriocins, erythromycin resistance, the MM1-2008 prophage and multiple cell wall anchored proteins. In spite of their genomic similarity, the high resolution chinchilla model was able to detect variations in virulence properties of the PMEN1 strains highlighting how small genic or allelic variation can lead to significant changes in pathogenicity and making this set of strains ideal for the identification of novel virulence determinants.

    Identification and functional characterization of secreted effector proteins of the hemibiotrophic fungus Colletotrichum higginsianum

    The hemibiotrophic ascomycete fungus Colletotrichum higginsianum causes anthracnose on cruciferous crops and the model plant Arabidopsis thaliana. Successful infection of wild-type plants requires sequential development of specialized infection structures, including melanized appressoria for initial penetration, and bulbous biotrophic hyphae formed inside living epidermal cells. It was hypothesized that appressoria and biotrophic hyphae secrete effector proteins that permit the fungus to evade or disarm host defence responses and to reprogram host cells. This study aimed to define the repertoire of fungal effectors expressed during plant infection and to characterise their biological activity. Discovery of Colletotrichum higginsianum Effector Candidates (ChECs) was accomplished by computationally mining collections of infection stage-specific expressed sequence tags (ESTs) for genes encoding solubly secreted proteins with either no homology to known proteins or resembling presumed effectors from other pathogens. Fungal cell types and infection stages sampled for cDNA generation and pyrosequencing included developing and mature in vitro appressoria, early invasive growth in planta, biotrophy and late necrotrophy. After assembling contiguous sequences, analysis of their EST composition allowed the identification of putative plant-induced genes and the definition of a set of 69 ChEC genes that are preferentially expressed at biotrophy-relevant stages. In relation to other infection stages, the early host invasion transcriptome was enriched for genes encoding higginsianum-specific proteins and plant-induced secreted proteins, including ChECs. This suggests that the initial establishment of biotrophy requires the highest proportion of stage-specific effectors and diversified genes. One further ChEC was identified using a complementary proteomic analysis of secreted proteins produced by conidial germlings developing in vitro.
    Expression analysis showed that transcription of most of the ChECs chosen for further study was highly stage-specific, with ChEC3, ChEC3a, ChEC4 and ChEC6 all being plant-induced. Targeted gene replacement showed that neither ChEC1 nor ChEC2 contributes measurably to fungal virulence. Upon transient expression in tobacco, ChEC3, ChEC3a and ChEC5 each suppressed plant cell death evoked by a C. higginsianum homologue of Necrosis and Ethylene-inducing Peptide1-like proteins, but not by the Phytophthora infestans elicitin INF1, suggesting that there is functional redundancy between C. higginsianum effectors. ChEC4 was found to contain a functional nuclear localization signal and signal peptide, and was shown by fluorescent protein-tagging to be secreted by the fungus during plant infection. This raises the possibility that ChEC4 is translocated into the host nucleus for transcriptional re-programming.

    Exploring lncRNAs in cancer : tools for discovery and characterization of cancer associated lncRNAs
