5,180 research outputs found

    A comprehensive assessment of N-terminal signal peptides prediction methods

    Background: Amino-terminal signal peptides (SPs) are short regions that guide the targeting of secretory proteins to the correct subcellular compartments in the cell. They are cleaved off once the passenger protein reaches its destination. The explosive growth in sequencing technologies has led to the deposition of vast numbers of protein sequences, necessitating rapid functional annotation techniques, with subcellular localization being a key feature. Myriad software tools have been developed to automate the task of assigning the SP cleavage site of these new sequences; here we review the performance and reliability of the commonly used SP prediction tools. Results: The available signal peptide data have been manually curated and organized into three datasets representing eukaryotes, Gram-positive bacteria and Gram-negative bacteria. These datasets are used to evaluate thirteen publicly available prediction tools. SignalP (both the HMM and ANN versions) maintains consistency and achieves the best overall accuracy in all three benchmarking experiments, ranging from 0.872 to 0.914, although other prediction tools are narrowing the performance gap. Conclusion: The majority of the tools evaluated in this study encounter no difficulty in discriminating between secretory and non-secretory proteins. The challenge clearly remains in pinpointing the correct SP cleavage site. The composite scoring schemes employed by SignalP may help to explain its accuracy: the prediction task is divided into a number of separate steps, allowing each score to tackle a particular aspect of the prediction.

    The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity

    This paper reviews recent research relating to the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that the direct encoding of the physicochemical properties of the amino acid substrates is not required for optimal performance. A number of amino acid encoding approaches which incorporate potentially relevant physicochemical properties of the substrate are identified, and are evaluated using a nonlinear task decomposition based neuroevolution algorithm. The results are evaluated, and compared against a recent benchmark set on a nonlinear classifier using only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are demonstrated to consistently outperform the recently published state-of-the-art linear support vector machine based approach in out-of-sample evaluations

    Protein family classification using multiple-class neural networks.

    The objective of genomic sequence analysis is to retrieve important information from the vast amount of genomic sequence data, such as DNA, RNA and protein sequences. The main tasks include interpreting the function of DNA sequence on a genomic scale, comparing genomes to gain insight into the universality of biological mechanisms and into the details of gene structure and function, determining the structure of all proteins, and classifying protein families. With its many features and capabilities for recognition, generalization and classification, artificial neural network technology is well suited for sequence analysis. Many methods have been devised to determine whether a given protein sequence is a member of a given protein superfamily. This is a binary classification problem, and efficient neural network techniques for solving such problems are described in the literature. In this Master's thesis, we consider the problem of classifying given protein sequences into one of at least three protein families using neural networks, and propose two methods for this problem: the Pair-wise Multiple Classification Approach and the Single Network Approach. In the Pair-wise Multiple Classification Approach, several sub-networks are employed to perform the task, whereas a compact network system is used in the Single Network Approach. We performed experiments using the SNNS and UOWNNS neural network simulators on our NNs with different input/output representations, and report accuracies as high as 95%. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z54. Source: Masters Abstracts International, Volume: 43-01, page: 0248. Adviser: Alioune Ngom. Thesis (M.Sc.)--University of Windsor (Canada), 2004
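The abstract above does not say how the outputs of the pair-wise sub-networks are combined. As a minimal sketch only, assuming a simple majority vote over one binary classifier per pair of families (a common one-vs-one scheme, not necessarily the thesis's method), the combination step could look like this. The name `pairwise_vote` and the `binary_classifiers` mapping are hypothetical; each classifier stands in for a trained sub-network.

```python
from collections import Counter
from itertools import combinations


def pairwise_vote(sample, classes, binary_classifiers):
    """Pick a class by majority vote over all pair-wise binary classifiers.

    `binary_classifiers` is a hypothetical mapping from a pair (a, b) of class
    labels to a callable that returns either a or b for the given sample.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = binary_classifiers[(a, b)](sample)
        votes[winner] += 1
    # Ties go to the class that received its first vote earliest.
    return votes.most_common(1)[0][0]
```

With k families this scheme needs k(k-1)/2 binary classifiers, which is the cost that a single multi-output network avoids.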

    State of the art and challenges in sequence based T-cell epitope prediction

    Sequence-based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex (MHC) molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreover, so-called pan-specific methods have been developed, which allow prediction of peptide binding to MHC alleles characterized by limited or no peptide binding data. Most of the developed methods are publicly available and have proven to be very useful as a shortcut in epitope discovery. Here, we go through some of the history of sequence-based predictions of helper as well as cytotoxic T-cell epitopes, focusing on some of the most accurate methods and their basic background

    Genome analysis of Phytophthora cactorum strains associated with crown- and leather-rot in strawberry

    Phytophthora cactorum has two distinct pathotypes that cause crown rot and leather rot in strawberry (Fragaria × ananassa). Strains of the crown rot pathotype can infect both the rhizome (crown) and fruit tissues, while strains of the leather rot pathotype can only infect the fruits of strawberry. The genomes of a highly virulent crown rot strain, a low-virulence crown rot strain, and three leather rot strains were sequenced using PacBio high-fidelity (HiFi) long-read sequencing. The reads were de novo assembled into genomes of 66.4–67.6 megabases in 178–204 contigs, with N50 values ranging from 892 to 1,036 kilobases. The total number of predicted complete genes in the five P. cactorum genomes ranged from 17,286 to 17,398. Orthology analysis identified a core secretome of 8,238 genes. Comparative genomic analysis revealed differences in the composition of potential virulence effectors, such as putative RxLR and Crinklers, between the crown rot and the leather rot pathotypes. Insertions, deletions, and amino acid substitutions were detected in genes encoding putative elicitors such as beta elicitin and cellulose-binding domain proteins from the leather rot strains compared to the highly virulent crown rot strain, suggesting a potential mechanism for the crown rot strain to escape host recognition during compatible interaction with strawberry. The results presented here highlight several effectors that may facilitate the tissue-specific colonization of P. cactorum in strawberry.
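For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the standard definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly size. The function name `n50` is illustrative, and the code is not taken from the study's pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to half the assembly
        # size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")
```

A higher N50 at a similar total assembly size indicates that the assembly is made of fewer, longer contigs.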

    Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach

    Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PDIs, which hosts information pertaining to molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes

    The Role of MSA in the Global Regulation of Virulence in Staphylococcus aureus

    Staphylococcus aureus is an important pathogen causing life-threatening diseases in humans. Previously we showed that msa modulates the activity of sarA (Staphylococcal accessory regulator), which is one of the major global regulators of virulence in S. aureus. The objective of this study is to characterize the role of msa (Modulator of SarA) in the global regulation of virulence in S. aureus. Structure and function predictions were made using several computational tools and approaches to understand the nature of msa. A novel S. aureus microarray meta-database (SAMMD) was designed and developed to compare and contrast other transcriptomes with the msa transcriptome. The msa and sarA transcriptomes were generated using microarray technology. Phenotypic and molecular assays were performed to support the microarray results. The results show that msa is a putative transmembrane protein with three transmembrane segments, a distinct N-terminal cleavable signal peptide, four phosphorylation sites (two outside and two inside the membrane) and a binding site in the cytoplasmic region. Microarray results and comparative transcriptome analysis using SAMMD showed that several genes regulated by msa are also regulated by sarA. Based on these results I hypothesize that msa is a novel signal transducer, which modulates the activity of genes involved in virulence in a sarA-dependent manner, while modulating the activity of genes involved in metabolism in a sarA-independent manner

    Characterization of Gamma-Secretase-Mediated Cleavage of Receptor Tyrosine Kinases

    ABSTRACT: Receptor tyrosine kinases (RTKs) are a family of cell surface receptors consisting of 55 members. RTKs regulate intracellular signaling pathways that control fundamental cellular processes including differentiation, proliferation, and survival. The functionality of RTKs is necessary for the development and homeostasis of many tissues. In human pathologies, such as cancer, aberrant RTK signaling is a common feature. Gamma-secretase-mediated regulated intramembrane proteolysis is a proteolytic cleavage of RTKs in two sequential proteolytic events: a sheddase-mediated ectodomain shedding followed by the release of a soluble intracellular domain by a gamma-secretase cleavage. The aims of my thesis were to characterize the gamma-secretase-mediated cleavage of RTKs, with a focus on identifying the prevalence of cleavage among RTKs and developing novel methods to identify signaling pathways associated with the process. The results of this thesis indicate that at least half of the RTKs are subject to gamma-secretase cleavage. In total, 12 new gamma-secretase targets were identified. Many of the newly identified gamma-secretase target RTKs, for example AXL and TYRO3, showed a cleavage-dependent effect on cell growth. My research also demonstrated, using our novel systems biology methods, that the signaling of the full-length TYRO3 receptor differs from that of the soluble intracellular domain of TYRO3. Together, these findings represent, for the first time, an approach to determine the prevalence of gamma-secretase cleavage among RTKs. Moreover, this study presents novel methods and tools for identifying the still largely unknown signaling pathways associated with RTK cleavage. RTK processing via proteolytic cleavage has implications for the functionality of RTKs in both normal tissues and cancer. The results of this thesis can provide new insights into the regulation of the functions of RTKs and can be used to develop new strategies to treat cancers.
    KEYWORDS: receptor tyrosine kinase, RTK, gamma-secretase, regulated intramembrane proteolysis, intracellular kinase domain, shedding, proteomics
    SUMMARY (translated from the Finnish TIIVISTELMÄ): The human genome contains 55 receptor tyrosine kinases (RTKs). RTKs are signaling proteins located at the cell membrane. They signal through intracellular signaling pathways and regulate vital cellular events such as cell proliferation, differentiation, and survival. RTKs are important in the development of many tissues, and their abnormal function has been observed in many diseases, such as cancers. Gamma-secretase-mediated regulated intramembrane proteolysis is a mechanism by which RTKs are proteolytically cleaved. It is a two-step event: the extracellular domain of the RTK is first cleaved by proteins called ADAMs, after which gamma-secretase releases the intramembrane portion from the cell membrane. The aim of this thesis was to characterize the gamma-secretase cleavage of RTKs. The specific focus was on determining how common RTK cleavage is and on developing methods that better identify the signaling associated with RTK cleavage. We found that half of the human RTKs are targets of gamma-secretase-mediated cleavage and identified a total of 12 new targets. For the TYRO3 and AXL RTKs, increased cell growth was associated with their cleavage. In addition, my thesis research showed that the signaling produced by the soluble fragment formed when TYRO3 is cleaved differs significantly from the signaling produced by full-length TYRO3. The observations made in this study show that RTK cleavage is common and that the new analysis methods help to identify new signaling pathways for cleaved RTKs better than before. The results on RTK cleavage and its signaling broaden our understanding of RTK signaling, and the knowledge they provide can be used in developing new cancer therapies.
    KEYWORDS (translated from the Finnish AVAINSANAT): receptor tyrosine kinase, RTK, gamma-secretase, regulated intramembrane proteolysis, intracellular kinase domain, proteomics

    Genome-Wide Identification, Characterization and Phylogenetic Analysis of the Rice LRR-Kinases

    LRR-kinases constitute the largest subfamily of receptor-like kinases in plants and regulate a wide variety of processes related to development and defense. Through a reiterative process of sequence analysis and re-annotation, we identified 309 LRR-kinase genes in the rice genome (Nipponbare). Among them, 127 genes in the Rice Annotation Project Database and 85 in RefSeq of NCBI were amended (in addition, 62 LRR-kinase genes were not annotated in RefSeq). The complete set of LRR-kinases was characterized. These LRR-kinases were classified into five groups according to phylogenetic analysis, and the genes in groups 1, 2, 3 and 4 usually have fewer introns than those in group 5. The introns in the LRR domain, which are highly conserved with regard to their positions and configurations, split the first Leu (or another amino acid residue at this position) of the ‘xxLxLxx’ motif with phase 2 and usually separate one or more LRR repeats exactly. Tandemly repeated LRR motifs have evolved through exon duplication, mutation and exon shuffling. The extensive distribution and diversity of the LRR-kinase genes have been generated mainly by tandem duplication and mutation after whole-genome duplication. Positive selection has made a limited contribution to sequence diversity after duplication, but positively selected sites located in the LRR domain are thought to be involved in protein-protein interaction