16 research outputs found

    Putative antimicrobial peptides of the posterior salivary glands from the cephalopod Octopus vulgaris revealed by exploring a composite protein database

    Cephalopods, successful predators, can use a mixture of substances to subdue their prey, becoming interesting sources of bioactive compounds. In addition to neurotoxins and enzymes, the presence of antimicrobial compounds has been reported. Recently, the transcriptome and the whole proteome of the Octopus vulgaris salivary apparatus were released, but the role of some compounds, such as histones, antimicrobial peptides (AMPs), and toxins, remains unclear. Herein, we profiled the proteome of the posterior salivary glands (PSGs) of O. vulgaris using two sample preparation protocols combined with a shotgun-proteomics approach. Protein identification was performed against a composite database comprising data from the UniProtKB, all transcriptomes available from the cephalopods’ PSGs, and a comprehensive non-redundant AMPs database. Out of the 10,075 proteins clustered in 1868 protein groups, 90 clusters corresponded to venom protein toxin families. Additionally, we detected putative AMPs clustered with histones previously found as abundant proteins in the saliva of O. vulgaris. Some of these histones, such as H2A and H2B, are involved in systemic inflammatory responses and their antimicrobial effects have been demonstrated. These results not only confirm the production of enzymes and toxins by the O. vulgaris PSGs but also suggest their involvement in the first line of defense against microbes.
    AA was partially supported by the Strategic Funding UIDB/04423/2020 and UIDP/04423/2020 through national funds provided by FCT and the European Regional Development Fund (ERDF) in the framework of the program PT2020, by the European Structural and Investment Funds (ESIF) through the Competitiveness and Internationalization Operational Program – COMPETE 2020 and by National Funds through the FCT under the project PTDC/CTA-AMB/31774/2017 (POCI-01-0145-FEDER/031774/2017).

    Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone

    Background: Computational prediction of protein function constitutes one of the more complex problems in Bioinformatics, because of the diversity of functions and mechanisms that proteins exert in nature. This issue is reinforced especially for proteins that share very low primary or tertiary structure similarity to existing annotated proteomes. In this sense, new alignment-free (AF) tools are needed to overcome the inherent limitations of classic alignment-based approaches to this issue. We have recently introduced AF protein-numerical-encoding programs (TI2BioP and ProtDCal), whose sequence-based features have been successfully applied to detect remote protein homologs, post-translational modifications and antibacterial peptides. Here we aim to demonstrate the applicability of 4 AF protein descriptor families, implemented in our programs, for the identification of enzyme-like proteins. At the same time, the use of our novel family of 3D-structure-based descriptors is introduced for the first time. The Dobson & Doig (D&D) benchmark dataset is used for the evaluation of our AF protein descriptors, because of its proven structural diversity that permits one to emulate an experiment within the twilight zone of alignment-based methods (pair-wise identity <30%). The performance of our sequence-based predictor was further assessed using a subset of formerly uncharacterized proteins which currently represent a benchmark annotation dataset. Results: Four protein descriptor families (sequence-composition-based (0D), linear-topology-based (1D), pseudo-fold-topology-based (2D) and 3D-structure features (3D)) were assessed using the D&D benchmark dataset. We show that only the families of ProtDCal's descriptors (0D, 1D and 3D) encode significant information for enzymes and non-enzymes discrimination. The obtained 3D-structure-based classifier ranked first among several other SVM-based methods assessed in this dataset. Furthermore, the model leveraging 1D descriptors showed a higher success rate than EzyPred on a benchmark annotation dataset from the Shewanella oneidensis proteome. Conclusions: The applicability of ProtDCal as a general-purpose AF protein modelling method is illustrated through the discrimination between two comprehensive protein functional classes. The observed performances using the highly diverse D&D dataset and the set of formerly uncharacterized (hard-to-annotate) proteins of Shewanella oneidensis place our methodology in the top range of methods to model and predict protein function using alignment-free approaches. © 2017 The Author(s).
    Acknowledgements: The authors thank Dr. Reinaldo Molina-Ruiz for his assistance in obtaining the latest version of the TI2BioP program. GACh acknowledges the support of Dr. Federico Pallardo, Dean of the Medicine and Dentistry Faculty, University of Valencia (UV), regarding access to the UV’s facilities during part of this work.
    Funding: YBRB is financed by a Postdoc Fellowship in the Chemistry Institute of the UNAM (DGAPA-UNAM [PAPIIT-IN200115]). GACh was funded by a Postdoc fellowship (SFRH/BPD/92978/2013) granted by the Portuguese Fundação para a Ciência e a Tecnologia (FCT). AA was partially supported by the Strategic Funding UID/Multi/04423/2013 through national funds provided by FCT and the European Regional Development Fund (ERDF) in the framework of the program PT2020, by the European Structural and Investment Funds (ESIF) through the Competitiveness and Internationalization Operational Program – COMPETE 2020 and by National Funds through the FCT under the project PTDC/AAG-GLO/6887/2014 (POCI-01-0124-FEDER-016845), and by the Structured Programs of R&D&I INNOVMAR (NORTE-01-0145-FEDER-000035 – NOVELMAR) and CORAL NORTE (NORTE-01-0145-FEDER-000036), and funded by the Northern Regional Operational Program (NORTE2020) through the ERDF. The funding sources were not involved with the design of the study, analysis and interpretation of data or in the writing of the manuscript.

    Graph theory-based sequence descriptors as remote homology predictors

    Indexing: Scopus.
    Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency of integrating AF features/measures with the AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, for improving the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less known ones, we provide our own using the graphical–numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
    https://www.mdpi.com/2218-273X/10/1/2

    The Airway Microbiota in Cystic Fibrosis: A Complex Fungal and Bacterial Community—Implications for Therapeutic Management

    Background: Given the polymicrobial nature of pulmonary infections in patients with cystic fibrosis (CF), it is essential to enhance our knowledge on the composition of the microbial community to improve patient management. In this study, we developed a pyrosequencing approach to extensively explore the diversity and dynamics of fungal and prokaryotic populations in CF lower airways. Methodology and Principal Findings: Fungal and bacterial diversity in eight sputum samples collected from four adult CF patients was investigated using conventional microbiological culturing and a high-throughput pyrosequencing approach targeting the ITS2 locus and the 16S rDNA gene. The unveiled microbial community structure was compared to the clinical profile of the CF patients. Pyrosequencing confirmed recently reported bacterial diversity and observed complex fungal communities, in which more than 60% of the species or genera were not detected by cultures. Strikingly, the diversity and species richness of fungal and bacterial communities were significantly lower in patients with decreased lung function and poor clinical status. Values of the Chao1 richness estimator were statistically correlated with values of the Shwachman-Kulczycki score, body mass index, forced vital capacity, and forced expiratory volume in 1 s (p = 0.046, 0.047, 0.004, and 0.001, respectively, for fungal Chao1 indices, and p = 0.010, 0.047, 0.002, and 0.0003, respectively, for bacterial Chao1 values). Phylogenetic analysis showed high molecular diversities at the sub-species level for the main fungal and bacterial taxa identified in the present study. Anaerobes were isolated with Pseudomonas aeruginosa, which was more likely to be observed in association with Candida albicans than with Aspergillus fumigatus.
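
For readers unfamiliar with the Chao1 richness estimator named in this abstract, the following is a minimal sketch of how it is commonly computed from per-taxon read counts. It is not the authors' pipeline: the function name `chao1` and the example counts are illustrative assumptions. The sketch uses the classic form S_obs + F1^2 / (2 F2) when doubletons are present, and falls back to the bias-corrected form S_obs + F1 (F1 - 1) / 2 when there are none.

```python
def chao1(abundances):
    """Estimate total richness from a list of per-taxon counts (hypothetical helper).

    S_obs = number of observed taxa, F1 = singletons, F2 = doubletons.
    """
    counts = [c for c in abundances if c > 0]   # ignore taxa with zero reads
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)       # taxa seen exactly once
    f2 = sum(1 for c in counts if c == 2)       # taxa seen exactly twice
    if f2 > 0:
        # Classic Chao1 form.
        return s_obs + (f1 * f1) / (2 * f2)
    # Bias-corrected form, which stays defined when F2 == 0.
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    sample = [10, 5, 1, 1, 1, 2, 2]
    print(chao1(sample))
```

Many singletons relative to doubletons push the estimate above the observed richness, which is why the index is sensitive to sequencing depth.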

    Composite Protein Database (nr) from Cephalopod Salivary Apparatus for In silico Enzymatic Digestion and Peptide Library Generation

    The database includes proteins and translated transcriptomes from the posterior salivary glands (PSG) of Octopus vulgaris and 16 other cephalopods. It will be utilised to generate peptide libraries using various in silico enzymatic digestion protocols to explore peptide diversity.
    THIS DATASET IS ARCHIVED AT DANS/EASY, BUT NOT ACCESSIBLE HERE. TO VIEW A LIST OF FILES AND ACCESS THE FILES IN THIS DATASET CLICK ON THE DOI-LINK ABOVE.

    Peptide Libraries from Cephalopods' Posterior Salivary Glands for Potential Antimicrobial Peptides.

    Filtered libraries of peptides 11-40 amino acids (AA) long, containing only conventional AAs, were generated for use as a source of antimicrobial peptides (AMPs). These peptide libraries were constructed through various digestion protocols, including single enzyme applications (Trypsin, Chymotrypsin, Proteinase-K, AspN, and GluC), sequential mode with two enzymes (Tryp-Chym, Tryp-ProtK, Tryp-AspN, and Tryp-GluC), and concurrent mode with two enzymes (Tryp-Chym, Tryp-ProtK, Tryp-AspN, and Tryp-GluC). The peptide libraries were derived from the Composite Protein Database originating from Cephalopods' Posterior Salivary Glands.
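
As an illustration of the kind of workflow this record describes, here is a minimal sketch of a single-enzyme in silico digestion followed by the length and alphabet filter. It is not the tool or protocol used to build the dataset: the function names, the example sequence, and the assumption of complete cleavage (no missed cleavages) are all hypothetical. The only rule encoded is the standard trypsin rule, cleavage after K or R unless the next residue is P.

```python
CONVENTIONAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def trypsin_digest(sequence):
    """Cleave after K or R unless followed by P (complete digestion assumed)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])   # trailing peptide without K/R
    return peptides


def filter_library(peptides, min_len=11, max_len=40):
    """Keep unique peptides of 11-40 residues made only of conventional AAs."""
    kept = {
        p for p in peptides
        if min_len <= len(p) <= max_len and set(p) <= CONVENTIONAL_AA
    }
    return sorted(kept)


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    protein = "MSTAGLLVNDEQFWYKAAKRPGGRCCHHILMNPQSTVWYAAGR"
    library = filter_library(trypsin_digest(protein))
    print(library)
```

Other enzymes in the list would need their own cleavage rules, and how the sequential and concurrent two-enzyme modes differ depends on the digestion tool, so neither is modelled here.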

    Generation of Peptide Libraries: Applying Various In silico Enzymatic Digestion Protocols on the Composite Protein Database (nr).

    In silico enzymatic digestion of the Composite Protein Database (nr) from cephalopods' posterior salivary glands yielded 13 distinct peptide libraries. These libraries were created using various digestion protocols, including just one enzyme (Trypsin, Chymotrypsin, Proteinase-K, AspN, and GluC), sequential mode with two enzymes (Tryp-Chym, Tryp-ProtK, Tryp-AspN, and Tryp-GluC), and concurrent mode with two enzymes (Tryp-Chym, Tryp-ProtK, Tryp-AspN, and Tryp-GluC).

    Omics Datasets to Create a Composite Protein Database from Cephalopod Salivary Apparatus for In silico Enzymatic Digestion and Peptide Library Generation

    Database A: protein database from proteogenomic analyses of the Octopus vulgaris salivary apparatus, built by Fingerhut et al. (2018), DOI: 10.1021/acs.jproteome.8b00525, but retrieved from DOI: doi.org/10.3390/data5040110.
    Database C: proteins identified with Proteome Discoverer using 12 raw files against the UniProt database for the Metazoan taxonomic selection, built by Almeida et al. (2020), DOI: 10.3390/antibiotics9110757, but retrieved from DOI: doi.org/10.3390/data5040110.
    Database D: proteins identified from de novo transcriptome assemblies of 16 cephalopods’ Posterior Salivary Glands (PSGs) by TransDecoder, built by Almeida et al. (2020), DOI: 10.3390/antibiotics9110757, but retrieved from DOI: doi.org/10.3390/data5040110.
    Database E: proteins identified from de novo transcriptome assemblies of 16 cephalopods’ PSGs using a six-frame translation tool, which are not included in Database D, built by Almeida et al. (2020), DOI: 10.3390/antibiotics9110757, but retrieved from DOI: doi.org/10.3390/data5040110.
    Database F: proteins obtained with a six-frame translation tool from the transcripts profiled in the transcriptome of O. vulgaris [10.1021/acs.jproteome.8b00525], but not included in Database A, built by Almeida et al. (2020), DOI: 10.3390/antibiotics9110757, but retrieved from DOI: doi.org/10.3390/data5040110.

    The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba

    The Harderian gland is a cephalic structure, widely distributed among vertebrates. In snakes, the Harderian gland is anatomically connected to the vomeronasal organ via the nasolacrimal duct, and in some species can be larger than the eyes. The function of the Harderian gland remains elusive, but it has been proposed to play a role in the production of saliva, pheromones, thermoregulatory lipids and growth factors, among others. Here, we have profiled the transcriptomes of the Harderian glands of three non-front-fanged colubroid snakes from Cuba: Caraiba andreae (Cuban Lesser Racer); Cubophis cantherigerus (Cuban Racer); and Tretanorhinus variabilis (Caribbean Water Snake), using Illumina HiSeq2000 100 bp paired-end sequencing. In addition to ribosomal and non-characterized proteins, the most abundant transcripts encode putative transport/binding, lipocalin/lipocalin-like, and bactericidal/permeability-increasing-like proteins. Transcripts coding for putative canonical toxins described in venomous snakes were also identified. This transcriptional profile suggests a more complex function than previously recognized for this enigmatic organ. © 2018
    DDP was supported by a PhD grant (SFRH/BD/80592/2011) from the Portuguese Foundation for Science and Technology (FCT, Fundação para a Ciência e a Tecnologia, Portugal). GACh was also supported by a Postdoctoral grant (SFRH/BPD/92978/2013) from the FCT. This study was funded in part by the projects PTDC/AAG-GLO/6887/2014 and by the Strategic Funding UID/Multi/04423/2013 through national funds provided by FCT and the European Regional Development Fund (ERDF) in the framework of the program PT2020, by the European Structural and Investment Funds (ESIF) through the Competitiveness and Internationalization Operational Program – COMPETE 2020 and by National Funds through the FCT under the project PTDC/AAG-GLO/6887/2014 (POCI-01-0124-FEDER-016845), and by the Structured Programs of R&D&I INNOVMAR, Innovation and Sustainability in the Management and Exploitation of Marine Resources (NORTE-01-0145-FEDER-000035, Research Line NOVELMAR), and funded by the Northern Regional Operational Program (NORTE2020) through the ERDF. Work performed at the Evolutionary and Translational Venomics Laboratory, Instituto de Biomedicina de Valencia (CSIC) was funded by grant BFU2013-42833-P from the Ministerio de Economía y Competitividad, Madrid, Spain (to JJC).
    We are grateful to: Tomás Michel Rodríguez Cabrera (Sociedad Cubana de Zoología) and Lázaro Cuellar Yanes (undergraduate Biology student from La Universidad de La Habana, Cuba) for collecting specimens; Bruno Reis (CIIMAR, FCUP, University of Porto) for his help supervising the total RNA extraction; Prof. Alan H. Savitzky (Utah State University, USA) for his useful comments and insights on the Harderian gland; Yudermys Moya Chaviano for helping with the preparation of Fig. 1; and Filipe Silva and Emanuel Maldonado (CIIMAR, FCUP, University of Porto) for helping with the analysis of contig expression performed with the CLC Genomics Workbench 8.5.1 and with the preparation of Supplementary Table S1, respectively.