2,073 research outputs found

    SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data

    BACKGROUND: Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping the vast amount of information obtained presents a bioinformatics challenge. METHODS: To bypass the need for the command line and basic bioinformatics knowledge, we developed mapping software with a graphical interface for the assembly of viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java version 7 using the NetBeans IDE 7.1. The program also allows analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. RESULTS: The program compares each sequenced read in a library against a chosen reference genome. Reads whose Hamming distance is less than or equal to the allowed number of mismatches are selected as positives and used to assemble a long nucleotide genome sequence. To validate the software, separate analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. CONCLUSIONS: SearchSmallRNA was able to reconstruct viral genomes from small RNA NGS datasets with a high degree of reliability, so it should be a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. AVAILABILITY AND IMPLEMENTATION: SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/
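
The read-selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the SearchSmallRNA implementation (which is written in Java): the function names, the toy reference and reads, and the majority-vote consensus used to stitch the selected reads together are all assumptions made for illustration. The abstract only states that reads within the allowed Hamming distance are used for assembly. The sketch also ignores reverse-complement reads.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_reads(reads, reference, max_mismatches=1):
    """Return (position, read) pairs for every reference window that the
    read matches with at most max_mismatches substitutions."""
    hits = []
    for read in reads:
        k = len(read)
        for pos in range(len(reference) - k + 1):
            if hamming(read, reference[pos:pos + k]) <= max_mismatches:
                hits.append((pos, read))
    return hits


def consensus(hits, ref_length):
    """Majority-vote base at each covered position; 'N' where no read maps.
    (Assumed assembly rule, chosen only to keep the example short.)"""
    columns = [Counter() for _ in range(ref_length)]
    for pos, read in hits:
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


# Hypothetical toy data: one read carries a single mismatch, one is unrelated.
reference = "ACGTACGTTAGC"
reads = ["ACGTAC", "ACGAAC", "TACGTT", "GTTAGC", "GGGGGG"]

hits = map_reads(reads, reference, max_mismatches=1)
assembled = consensus(hits, len(reference))
```

In this toy run the unrelated read `GGGGGG` is rejected, the read with one substitution is still accepted, and the outvoted mismatch does not appear in the consensus.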

    Using Bayesian Networks and Machine Learning to Predict Computer Science Success

    Bayesian Networks and Machine Learning techniques were evaluated and compared for predicting academic performance of Computer Science students at the University of Cape Town. Bayesian Networks performed similarly to other classification models. The causal links inherent in Bayesian Networks allow for understanding of the contributing factors for academic success in this field. The most effective indicators of success in first-year ‘core’ courses in Computer Science included the student’s scores for Mathematics and Physics as well as their aptitude for learning and their work ethos. It was found that unsuccessful students could be identified with ≈91% accuracy. This could help to increase throughput as well as student wellbeing at university.

    Profile of small interfering RNAs from cotton plants infected with the polerovirus Cotton leafroll dwarf virus

    BACKGROUND: In response to infection, viral genomes are processed by Dicer-like (DCL) ribonuclease proteins into viral small RNAs (vsRNAs) of discrete sizes. vsRNAs are then used as guides for silencing the viral genome. The profile of vsRNAs produced during the infection process has been extensively studied for some groups of viruses. However, nothing is known about the vsRNAs produced during infections of members of the economically important family Luteoviridae, a group of phloem-restricted viruses. Here, we report the characterization of a population of vsRNAs from cotton plants infected with Cotton leafroll dwarf virus (CLRDV), a member of the genus Polerovirus, family Luteoviridae. RESULTS: Deep sequencing of small RNAs (sRNAs) from leaves of CLRDV-infected cotton plants revealed that the vsRNAs were 21 to 24 nucleotides (nt) long and that their sequences matched the viral genome, with higher frequencies of matches in the 3′ region. There were equivalent amounts of sense and antisense vsRNAs, and the 22-nt class of small RNAs was predominant. During infection, cotton Dcl transcripts appeared to be up-regulated, while Dcl2 appeared to be down-regulated. CONCLUSIONS: This is the first report on the profile of sRNAs in a plant infected with a virus from the family Luteoviridae. Our sequence data strongly suggest that virus-derived double-stranded RNA functions as one of the main precursors of vsRNAs. Judging by the profiled size classes, all cotton DCLs might be working to silence the virus. The possible causes for the unexpectedly high accumulation of 22-nt vsRNAs are discussed. CLRDV is the causal agent of Cotton blue disease, which occurs worldwide. Our results are an important contribution for understanding the molecular mechanisms involved in this and related diseases.
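
A size-and-strand profile of the kind summarized in this abstract can be tallied in a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, reads are assumed to be already mapped to the viral genome and labelled with a strand, and the toy reads are invented rather than taken from the study.

```python
from collections import Counter


def size_profile(mapped_reads, min_len=21, max_len=24):
    """Count mapped reads per (length, strand) inside the size window."""
    counts = Counter()
    for sequence, strand in mapped_reads:
        if min_len <= len(sequence) <= max_len:
            counts[(len(sequence), strand)] += 1
    return counts


def predominant_length(counts):
    """Read length with the highest total count across both strands."""
    per_length = Counter()
    for (length, _strand), n in counts.items():
        per_length[length] += n
    return per_length.most_common(1)[0][0]


# Invented toy input: (sequence, strand) pairs; "+" is sense, "-" is antisense.
toy_reads = [
    ("A" * 21, "+"),
    ("C" * 22, "+"),
    ("G" * 22, "-"),
    ("T" * 22, "-"),
    ("A" * 24, "+"),
    ("C" * 30, "-"),  # outside the 21 to 24 nt window, so it is ignored
]

profile = size_profile(toy_reads)
top_class = predominant_length(profile)
```

Keeping the strand in the key makes it straightforward to compare sense and antisense totals per size class from the same tally.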

    Genomic-Bioinformatic Analysis of Transcripts Enriched in the Third-Stage Larva of the Parasitic Nematode Ascaris suum

    Differential transcription in Ascaris suum was investigated using a genomic-bioinformatic approach. A cDNA archive enriched for molecules in the infective third-stage larva (L3) of A. suum was constructed by suppressive-subtractive hybridization (SSH), and a subset of cDNAs from 3075 clones subjected to microarray analysis using cDNA probes derived from RNA from different developmental stages of A. suum. The cDNAs (n = 498) shown by microarray analysis to be enriched in the L3 were sequenced and subjected to bioinformatic analyses using a semi-automated pipeline (ESTExplorer). Using gene ontology (GO), 235 of these molecules were assigned to ‘biological process’ (n = 68), ‘cellular component’ (n = 50), or ‘molecular function’ (n = 117). Of the 91 clusters assembled, 56 molecules (61.5%) had homologues/orthologues in the free-living nematodes Caenorhabditis elegans and C. briggsae and/or other organisms, whereas 35 (38.5%) had no significant similarity to any sequences available in current gene databases. Transcripts encoding protein kinases, protein phosphatases (and their precursors), and enolases were abundantly represented in the L3 of A. suum, as were molecules involved in cellular processes, such as ubiquitination and proteasome function, gene transcription, protein–protein interactions, and function. In silico analyses inferred the C. elegans orthologues/homologues (n = 50) to be involved in apoptosis and insulin signaling (2%), ATP synthesis (2%), carbon metabolism (6%), fatty acid biosynthesis (2%), gap junction (2%), glucose metabolism (6%), or porphyrin metabolism (2%), although 34 (68%) of them could not be mapped to a specific metabolic pathway. Small numbers of these 50 molecules were predicted to be secreted (10%), anchored (2%), and/or transmembrane (12%) proteins. Functionally, 17 (34%) of them were predicted to be associated with (non-wild-type) RNAi phenotypes in C. elegans, the majority being embryonic lethality (Emb) (13 types; 58.8%), larval arrest (Lva) (23.5%) and larval lethality (Lvl) (47%). A genetic interaction network was predicted for these 17 C. elegans orthologues, revealing highly significant interactions for nine molecules associated with embryonic and larval development (66.9%), information storage and processing (5.1%), cellular processing and signaling (15.2%), metabolism (6.1%), and unknown function (6.7%). The potential roles of these molecules in development are discussed in relation to the known roles of their homologues/orthologues in C. elegans and some other nematodes. The results of the present study provide a basis for future functional genomic studies to elucidate molecular aspects governing larval developmental processes in A. suum and/or the transition to parasitism.

    Detecting Network Communities: An Application to Phylogenetic Analysis

    This paper proposes a new method to identify communities in generally weighted complex networks and applies it to phylogenetic analysis. In this case, weights correspond to the similarity indexes among protein sequences, which can be used for network construction so that the network structure can be analyzed to recover phylogenetically useful information from its properties. The analyses discussed here are mainly based on the modular character of protein similarity networks, explored through the Newman-Girvan algorithm, with the help of the neighborhood matrix. The most relevant networks are found when the network topology changes abruptly, revealing distinct modules related to the sets of organisms to which the proteins belong. Sound biological information can be retrieved by the computational routines used in the network approach, without using biological assumptions other than those incorporated by BLAST. Usually, all the main bacterial phyla and, in some cases, also some bacterial classes corresponded totally (100%) or to a great extent (>70%) to the modules. We checked for internal consistency in the obtained results, and we scored close to 84% of matches for community pertinence when comparisons between the results were performed. To illustrate how to use the network-based method, we employed data for enzymes involved in the chitin metabolic pathway that are present in more than 100 organisms from an original data set containing 1,695 organisms, downloaded from GenBank on May 19, 2007. A preliminary comparison between the outcomes of the network-based method and the results of methods based on Bayesian, distance, likelihood, and parsimony criteria suggests that the former is as reliable as these commonly used methods. We conclude that the network-based method can be used as a powerful tool for retrieving modularity information from weighted networks, which is useful for phylogenetic analysis.
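
The general idea of building a network from pairwise similarities and splitting it with the Newman-Girvan procedure can be sketched in plain Python. This is a simplified illustration, not the authors' method. It thresholds the similarity weights into an unweighted graph, uses shortest-path edge betweenness, stops at the first split, and does not use the neighborhood matrix mentioned in the abstract. The function names, the threshold and the similarity values are all hypothetical.

```python
from collections import defaultdict, deque


def build_graph(similarity, threshold):
    """Adjacency sets linking pairs whose similarity is >= threshold."""
    graph = {}
    for (u, v), score in similarity.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if score >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def edge_betweenness(graph):
    """Shortest-path edge betweenness (Brandes-style) for an unweighted graph."""
    score = defaultdict(float)
    for source in graph:
        sigma = {node: 0 for node in graph}   # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    sigma[nbr] += sigma[node]
                    preds[nbr].append(node)
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):          # farthest nodes first
            for p in preds[node]:
                share = sigma[p] / sigma[node] * (1 + delta[node])
                score[frozenset((p, node))] += share
                delta[p] += share
    return score


def components(graph):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(graph[node] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman_split(graph):
    """Remove highest-betweenness edges until the component count increases."""
    graph = {node: set(nbrs) for node, nbrs in graph.items()}
    start = len(components(graph))
    while len(components(graph)) == start:
        scores = edge_betweenness(graph)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)


# Hypothetical similarity indexes between six sequences labelled a to f.
similarity = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.85,
    ("d", "e"): 0.9, ("d", "f"): 0.8, ("e", "f"): 0.95,
    ("c", "d"): 0.6,   # single bridge between the two groups
    ("a", "f"): 0.2,   # below threshold, so no edge is created
}
communities = girvan_newman_split(build_graph(similarity, threshold=0.5))
```

In the toy network the bridge between `c` and `d` carries every shortest path between the two triangles, so it is removed first and the two groups separate.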