86 research outputs found

    HPC-REDItools: A novel HPC-aware tool for improved large scale RNA-editing analysis

    Background: RNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides, and it can increase both transcriptome and proteome diversity. The automatic detection of RNA editing from RNA-seq data is computationally intensive and limited to small data sets, thus preventing a reliable genome-wide characterisation of this process. Results: In this work we introduce HPC-REDItools, an upgraded tool for the accurate discovery of RNA-editing events from large dataset repositories. Availability: https://github.com/BioinfoUNIBA/REDItools2. Conclusions: HPC-REDItools is dramatically faster than the previous version, REDItools, enabling big-data analysis by means of an MPI-based implementation and scaling almost linearly with the number of available cores.

    A geostatistical fusion approach using UAV data for probabilistic estimation of Xylella fastidiosa subsp. pauca infection in olive trees

    Xylella fastidiosa is one of the most destructive plant pathogenic bacteria worldwide, affecting more than 500 plant species. In the Apulia region (southeastern Italy), X. fastidiosa subsp. pauca (Xfp) is responsible for a severe disease, the olive quick decline syndrome (OQDS), which is spreading epidemically with a dramatic impact on the agriculture, landscape, tourism, and cultural heritage of the region. Early detection of infected plants would hinder the rapid spread of the disease. The main objective of this paper was to define a geostatistical data-fusion approach that combines remote (radiometric) and proximal (geophysical) sensor data and visual inspections with plant diagnostic tests to provide probabilistic maps of Xfp infection risk. The study site was an olive grove located at Oria (province of Brindisi, Italy), where at the time of monitoring (September 2017) only a few plants showed initial symptoms of the disease. The measurements included: 1) acquisitions of reflected electromagnetic radiation with a UAV (Unmanned Aerial Vehicle) equipped with a multi-spectral camera; 2) geophysical surveys on the trunks of 49 plants with Ground Penetrating Radar (GPR); 3) disease severity rating, by visual inspection of the proportion of canopy with symptoms; 4) qPCR (real-time quantitative Polymerase Chain Reaction) data from tests on 61 plants. The data were submitted to a set of processing techniques to define a “data fusion” procedure based on non-parametric multivariate geostatistics. The approach made it possible to mark the areas where the risk of infection was higher and to identify the possible infection entry routes into the field. The probability map of infection risk could be used as an effective tool for preventive action and for better organization of monitoring plans.

    ASPicDB: a database of annotated transcript and protein variants generated by alternative splicing

    Alternative splicing is emerging as a major mechanism for the expansion of transcriptome and proteome diversity, particularly in humans and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB, which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state-of-the-art machine learning tools, providing information on the protein type (globular and transmembrane), localization, and presence of PFAM domains, signal peptides, GPI-anchor propeptides, and transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE-tags and polyA signals and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level, allowing the selection of data sets fulfilling one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/

    Assessment of orthologous splicing isoforms in human and mouse orthologous genes

    Background: Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products should now progress beyond primary sequence, and "splicing orthology", consisting of ancestrally shared exon-intron structures, is required to define orthologous isoforms at the transcript level. Results: As a starting step in this direction, in this work we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions: We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, thus implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript showing sufficient similarity to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete, with several splicing variants still unknown. The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species-specific, suggests that, while still maintaining the conventional definition of gene orthology, a new concept of "splicing orthology" can be defined at the transcript level.

    EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

    Background: ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, in specific conditions, to inconsistent and erroneous clustering. In order to improve the cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping. Methods: EasyCluster uses the well-known GMAP program to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts by building genomic and EST local databases and running GMAP. Subsequently, it parses the results, creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by running GMAP again on each pseudo-cluster and grouping together ESTs that share at least one splice site. Results: The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. Additional datasets, including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family, have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web service tools such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant, for which no Unigene clusters are yet available, as well as an evaluation of alternative splicing in this plant species.
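
The two grouping steps described in the EasyCluster Methods (pseudo-clusters from overlapping same-strand coordinates, then refinement by shared splice sites) can be illustrated with a minimal Python sketch. This is not EasyCluster's code and it does not run GMAP: the record layout (`id`, `chrom`, `strand`, `start`, `end`, `splice_sites`) and both function names are hypothetical, and treating splice-site sharing as transitive (and unspliced ESTs as singletons) is an assumption made here for illustration.

```python
from collections import defaultdict


def pseudo_clusters(alignments):
    """Group alignments whose genomic spans overlap on the same chromosome and strand.

    Each alignment is a dict with the hypothetical keys
    id, chrom, strand, start, end, splice_sites (a set of coordinates).
    """
    by_locus = defaultdict(list)
    for aln in alignments:
        by_locus[(aln["chrom"], aln["strand"])].append(aln)

    clusters = []
    for group in by_locus.values():
        group.sort(key=lambda a: a["start"])
        current, current_end = [group[0]], group[0]["end"]
        for aln in group[1:]:
            if aln["start"] <= current_end:
                # Overlaps the running span: extend the current pseudo-cluster.
                current.append(aln)
                current_end = max(current_end, aln["end"])
            else:
                clusters.append(current)
                current, current_end = [aln], aln["end"]
        clusters.append(current)
    return clusters


def refine_by_splice_sites(cluster):
    """Split one pseudo-cluster into sets of EST ids linked by a shared splice site.

    Assumption: sharing is applied transitively, via a small union-find.
    """
    parent = {aln["id"]: aln["id"] for aln in cluster}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}  # splice site -> id of the first EST carrying it
    for aln in cluster:
        for site in aln["splice_sites"]:
            if site in first_seen:
                parent[find(aln["id"])] = find(first_seen[site])
            else:
                first_seen[site] = aln["id"]

    groups = defaultdict(set)
    for aln in cluster:
        groups[find(aln["id"])].add(aln["id"])
    return sorted(groups.values(), key=lambda g: sorted(g))
```

In this sketch an EST that overlaps a locus but shares no splice site with the other members ends up in its own refined group, which reflects the idea that coordinate overlap alone is not taken as evidence of belonging to the same gene.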

    Common and Distant Structural Characteristics of Feruloyl Esterase Families from Aspergillus oryzae

    Background: Feruloyl esterases (FAEs) are important biomass-degrading accessory enzymes because they can cleave the ester links that connect hemicellulose and pectin to the aromatic compounds of lignin, thus enhancing the accessibility of plant tissues to cellulolytic and hemicellulolytic enzymes. FAEs have gained increased attention in the area of biocatalytic transformations for the synthesis of value-added compounds with medicinal and nutritional applications. Following the increasing attention on these enzymes, a novel descriptor-based classification system has been proposed for FAEs, resulting in 12 distinct families, and pharmacophore models for three FAE sub-families have been developed. Methodology/Principal Findings: The feruloylome of Aspergillus oryzae contains 13 predicted FAEs belonging to six sub-families based on our recently developed descriptor-based classification system. The three-dimensional structures of the 13 FAEs were modeled for structural analysis of the feruloylome. The genes coding for three enzymes, viz., A.O.2, A.O.8 and A.O.10 from the feruloylome of A. oryzae, representing sub-families with unknown functional features, were heterologously expressed in Pichia pastoris, and the enzymes were characterized for substrate specificity and, structurally, through CD spectroscopy. Common feature-based pharmacophore models were developed according to the substrate specificity characteristics of the three enzymes. The active site residues were identified for the three expressed FAEs by determining the titration curves of amino acid residues as a function of pH by applying molecular simulations. Conclusions/Significance: Our findings on the structure-function relationships and substrate specificity of the FAEs of A. oryzae will be instrumental for further understanding of the FAE families in the novel classification system. The developed pharmacophore models could be applied for virtual screening of compound databases to shortlist putative substrates prior to docking studies, or for post-processing docking results to remove false positives. Our study exemplifies how computational predictions can complement the information obtained through experimental methods. © 2012 Udatha et al.

    Model of the complex of Parathyroid hormone-2 receptor and Tuberoinfundibular peptide of 39 residues

    Background: We aim to propose interactions between the parathyroid hormone-2 receptor (PTH2R) and its ligand, the tuberoinfundibular peptide of 39 residues (TIP39), by constructing a homology model of their complex. The two related peptides parathyroid hormone (PTH) and parathyroid hormone related protein (PTHrP) are compared with the complex to examine their interactions. Findings: In the model, the hydrophobic N-terminus of TIP39 is buried in a hydrophobic part of the central cavity between helices 3 and 7. Comparison of the peptide sequences indicates that the main discriminator between the agonistic peptides TIP39 and PTH and the inactive PTHrP is a tryptophan-phenylalanine replacement. The model indicates that the smaller phenylalanine in PTHrP does not completely occupy the binding site of the larger tryptophan residue in the other peptides. Only TIP39 causes internalisation of the receptor, and the primary difference is an aspartic acid in position 7 of TIP39 that interacts with histidine 396 in the receptor, versus isoleucine/histidine residues in the related hormones; this interaction might therefore be a trigger for the events that cause internalisation. Conclusions: A model is constructed for the complex, and a trigger interaction for full agonistic activation between aspartic acid 7 of TIP39 and histidine 396 in the receptor is proposed.