1,733 research outputs found

    Novel CaLB-like Lipase Found Using ProspectBIO, a Software for Genome-Based Bioprospection

    Enzymes are in high demand in diverse applications, such as the food, pharmaceutical, and industrial fuel sectors. Thus, in silico bioprospecting emerges as an efficient strategy for discovering new enzyme candidates. A new program called ProspectBIO was developed for this purpose, as it can find non-annotated sequences by searching for homologs of a model enzyme directly in genomes. Here we describe the ProspectBIO software methodology and its experimental validation by prospecting for novel lipases using sequence homology to Candida antarctica lipase B (CaLB) and conserved motifs. As expected, we observed that the new bioprospecting software could find more sequences (1672) than a conventional similarity-based search in a protein database (733). Additionally, the absence of patent protection was introduced as a criterion, resulting in the final selection of a putative lipase-encoding gene from Ustilago hordei (UhL). Expression of UhL in Pichia pastoris resulted in the production of an enzyme with activity towards a tributyrin substrate. The recombinant enzyme activity levels improved 4-fold when the temperature was lowered and the methanol concentration was increased during the induction phase in shake-flask cultures. Protein sequence alignment and structural modeling showed that the recombinant enzyme has high similarity to CaLB and can be fitted to its structure. However, amino acid substitutions identified at the entrance of the active pocket may be responsible for the differences in the substrate specificities of the two enzymes. Thus, the ProspectBIO software enabled the discovery of a promising new lipase for biotechnological application without the need for laborious and expensive conventional experimental bioprospecting steps.

    Interspecific and intraspecific gene variability in a 1-Mb region containing the highest density of NBS-LRR genes found in the melon genome

    Background: Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group in nine clusters, and a 1-Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher-than-average errors. - Results: A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70%, while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs in common transcription units. A phylogenetic analysis of the R-genes and their comparison with syntenic sequences in other cucurbits point to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent in the unrefined sequence. - Conclusions: A sequence refinement strategy allowed substantial improvement of a 1-Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon. 
Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families, and through the process of speciation within the family. A candidate Vat gene was also identified using sequence that was previously unavailable, which demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.

    SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents

    The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes. SCRIPDB is available at http://dcv.uhnres.utoronto.ca/SCRIPDB
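The abstract above mentions searching by molecular similarity. As a rough illustration only, the sketch below ranks a toy library by Tanimoto similarity over fingerprint bit sets. The source does not say which similarity measure or fingerprint SCRIPDB uses, so the Tanimoto coefficient, the function names, and the toy fingerprints are all assumptions made for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections of set-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


def similarity_search(query_fp, library, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of bits that are on.
toy_library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 7, 12},
    "mol_c": {2, 3},
}
toy_hits = similarity_search({1, 4, 7, 9}, toy_library, threshold=0.5)
print(toy_hits)
```

A real system would compute fingerprints from chemical structures with a cheminformatics toolkit; the sets here only stand in for that step.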

    Protein structure prediction and structure-based protein function annotation

    Nature tends to modify rather than invent function of protein molecules, and the log of the modifications is encrypted in the gene sequence. Analysis of these modification events in evolutionarily related genes is important for assigning function to hypothetical genes and their products surging in databases, and to improve our understanding of the bioverse. However, random mutations occurring during evolution chisel the sequence to an extent that both decrypting these codes and identifying evolutionary relatives from sequence alone becomes difficult. Thankfully, even after many changes at the sequence level, the protein three-dimensional structures are often conserved and hence protein structural similarity usually provides more clues on evolution of functionally related proteins. In this dissertation, I study the design of three bioinformatics modules that form a new hierarchical approach for structure prediction and function annotation of proteins based on sequence-to-structure-to-function paradigm. First, we design an online platform for structure prediction of protein molecules using multiple threading alignments and iterative structural assembly simulations (I-TASSER). I review the components of this module and have added features that provide function annotation to the protein sequences and help to combine experimental and biological data for improving the structure modeling accuracy. The online service of the system has been supporting more than 20,000 biologists from over 100 countries. Next, we design a new comparative approach (COFACTOR) to identify the location of ligand binding sites on these modeled protein structures and spot the functional residue constellations using an innovative global-to-local structural alignment procedure and functional sites in known protein structures. 
Based on both large-scale benchmarking and blind tests (CASP), the method demonstrates significant advantages over the state-of-the-art methods of the field in recognizing ligand-binding residues for both metal and non-metal ligands. The major advantage of the method is the optimal combination of the local and global protein structural alignments, which helps to recognize functionally conserved structural motifs among proteins that have taken different evolutionary paths. We further extend the COFACTOR global-to-local approach to annotate the gene ontology and enzyme classifications of protein molecules. Here, we added two new components to COFACTOR. First, we developed a new global structural match algorithm that allows a better structural search. Second, a sensitive technique was proposed for constructing local 3D-signature motifs of template proteins that lack known functional sites, which allows us to perform query-template local structural similarity comparisons with all template proteins. A scoring scheme that combines the confidence score of structure prediction with the global-local similarity score is used for assigning a confidence score to each predicted function. Large-scale benchmarking shows that the predicted functions have remarkably improved precision and recall rates and also higher prediction coverage than the state-of-the-art sequence-based methods. To explore the applicability of the method for real-world cases, we applied the method to a subset of ORFs from Chlamydia trachomatis, and the functional annotations provided new testable hypotheses for improving the understanding of this phylogenetically distinct bacterium.

    Predicting controlled vocabulary based on text and citations: Case studies in medical subject headings in MEDLINE and patents

    This dissertation makes three contributions in the area of controlled vocabulary prediction of Medical Subject Headings. The first contribution is a new partial matching measure based on distributional semantics. The second contribution is a probabilistic model based on text similarity and citations. The third contribution is a case study of cross-domain vocabulary prediction in US Patents. Medical subject headings (MeSH) are an important life sciences controlled vocabulary. They are an ideal ground to study controlled vocabulary prediction due to their complexity, hierarchical nature, and practical significance. The dissertation begins with an updated analysis of human indexing consistency in MEDLINE. This study demonstrates the need for partial matching measures to account for indexing variability. Here, I develop four measures combining the MeSH hierarchy and contextual similarity. These measures provide several new tools for evaluating and diagnosing controlled vocabulary models. Next, a generalized predictive model is introduced. This model uses citations and abstract similarity as inputs to a hybrid KNN classifier. Citations and abstracts are found to be complementary in that they reliably produce unique and relevant candidate terms. Finally, the predictive model is applied to a corpus of approximately 65,000 biomedical US patents. This case study explores differences in the vocabulary of MEDLINE and patents, as well as the prospect for MeSH prediction to open new scholarly opportunities in economics and health policy research.
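The abstract above describes a hybrid KNN classifier fed by citations and abstract similarity. The sketch below is one loose way such a combination could look, not the dissertation's model: it assumes Jaccard overlap of tokens as the text similarity, a fixed vote weight for cited documents, and a toy corpus with made-up identifiers and terms. Every name and parameter here is hypothetical.

```python
from collections import Counter


def jaccard(tokens_a, tokens_b):
    """Jaccard overlap between two token collections."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_terms(query_tokens, cited_ids, corpus, k=1, citation_weight=1.0, top_n=3):
    """Rank candidate vocabulary terms for a query document.

    corpus maps a document id to (tokens, assigned_terms). Terms receive votes
    from the k most text-similar documents (weighted by similarity) and from
    every cited document found in the corpus (weighted by citation_weight).
    """
    ranked = sorted(
        ((jaccard(query_tokens, tokens), doc_id) for doc_id, (tokens, _) in corpus.items()),
        reverse=True,
    )
    votes = Counter()
    for similarity, doc_id in ranked[:k]:
        for term in corpus[doc_id][1]:
            votes[term] += similarity
    for doc_id in cited_ids:
        if doc_id in corpus:
            for term in corpus[doc_id][1]:
                votes[term] += citation_weight
    return [term for term, _ in votes.most_common(top_n)]


# Hypothetical toy corpus: id -> (abstract tokens, indexed terms).
toy_corpus = {
    "d1": ({"lipase", "enzyme", "yeast"}, ["Lipase", "Fungal Proteins"]),
    "d2": ({"melon", "genome", "resistance"}, ["Genome, Plant"]),
    "d3": ({"enzyme", "kinetics"}, ["Enzymes"]),
}
toy_prediction = predict_terms({"lipase", "enzyme", "expression"}, ["d3"], toy_corpus)
print(toy_prediction)
```

In this toy run the cited document contributes a term that the nearest text neighbor does not, which mirrors the abstract's point that the two evidence sources yield different candidates.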

    Transcriptome analysis of Thapsia laciniata Rouy provides insights into terpenoid biosynthesis and diversity in Apiaceae

    Thapsia laciniata Rouy (Apiaceae) produces irregular and regular sesquiterpenoids with thapsane and guaiene carbon skeletons, as found in other Apiaceae species. A transcriptomic analysis utilizing Illumina next-generation sequencing enabled the identification of novel genes involved in the biosynthesis of terpenoids in Thapsia. From 66.78 million HQ paired-end reads obtained from T. laciniata roots, 64.58 million were assembled into 76,565 contigs (N50: 1261 bp). Seventeen contigs were annotated as terpene synthases and five of these were predicted to be sesquiterpene synthases. Of the 67 contigs annotated as cytochromes P450, 18 are part of the CYP71 clade that primarily performs hydroxylations of specialized metabolites. Three contigs annotated as aldehyde dehydrogenases grouped phylogenetically with the characterized ALDH1 from Artemisia annua, and three contigs annotated as alcohol dehydrogenases grouped with the recently described ADH1 from A. annua. ALDH1 and ADH1 were characterized as part of artemisinin biosynthesis. We have produced a comprehensive EST dataset for T. laciniata roots, which contains a large sample of the T. laciniata transcriptome. These transcriptome data provide the foundation for future research into the molecular basis for terpenoid biosynthesis in Thapsia and on the evolution of terpenoids in Apiaceae. Damian Paul Drew, Bjørn Dueholm, Corinna Weitzel, Ye Zhang, Christoph W. Sensen and Henrik Toft Simonse
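The abstract above reports an N50 for its assembly. For readers unfamiliar with the statistic, here is a minimal sketch of the usual definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly. The contig lengths below are invented toy values, not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths chosen for illustration only.
toy_contigs = [2000, 1500, 1261, 900, 400, 300]
print(n50(toy_contigs))  # prints 1500
```

Comparing `2 * running` with `total` keeps the arithmetic in integers, which avoids rounding issues when the total length is odd.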

    The exposed proteomes of Brachyspira hyodysenteriae and B. pilosicoli

    Brachyspira hyodysenteriae and Brachyspira pilosicoli are well-known intestinal pathogens in pigs. B. hyodysenteriae is the causative agent of swine dysentery, a disease with an important impact on pig production, while B. pilosicoli is responsible for a milder diarrheal disease in these animals, porcine intestinal spirochetosis. Recent sequencing projects have provided information on the genomes of these species, facilitating the search for vaccine candidates using reverse vaccinology approaches. However, practically no experimental evidence exists of the actual gene products being expressed and of those proteins exposed on the cell surface or released to the cell media. Using a cell-shaving strategy and a shotgun proteomic approach, we carried out a large-scale characterization of the exposed proteins on the bacterial surface in these species as well as of peptides and proteins in the extracellular medium. The study included three strains of B. hyodysenteriae and two strains of B. pilosicoli and involved 148 LC-MS/MS runs on a high-resolution Orbitrap instrument. Overall, we provided evidence for more than 29,000 different peptides pointing to 1625 and 1338 different proteins in B. hyodysenteriae and B. pilosicoli, respectively. Many of the most abundant proteins detected corresponded to described virulence factors and vaccine candidates. The level of expression of these proteins, however, was different among species and strains, stressing the value of determining actual gene product levels as a complement to genome-based approaches for vaccine design. This work was funded by the Spanish MINECO (IPT-2011-0735-010000). The CSIC/UAB Proteomics Laboratory of IIBB-CSIC is a member of Proteored, PRB2-ISCIII and is supported by grant PT13/0001, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. Peer Reviewed