154 research outputs found

    Gene prediction in metagenomic fragments: A large scale machine learning approach

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. Short fragment length (Sanger sequencing, for example, yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that differ from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).

    Machine Learning Methods for the Analysis of Metagenomes

    As of October 2020, there are 18.6 × 10^15 DNA base pairs publicly available in the Sequence Read Archive and this number is growing at an exponential rate. As DNA sequencing prices continue to drop, many research groups around the world have incorporated high throughput sequencing in their research, giving us access to sequences from many distinct ecosystems. This has revolutionized the field of metagenomics, which aims to fully characterize all organisms and their interactions in a particular system. Nevertheless, the plethora of available data has made its analysis difficult, as traditional techniques such as genome assembly or sequence alignment are bound to fail due to the high noise of metagenomes, or take an impractically long time due to their size. Through this thesis, we explore those challenges and develop techniques to meet them. Chapter 1 serves as an introduction to the fields of metagenomics and machine learning and the applications where the two meet. Chapter 2 examines the different kinds of noise in sequencing datasets and presents PRINSEQ++, a C++ multi-threaded software for quality control of sequencing datasets. Chapter 3 describes the analysis of 63 metagenomic samples from children with “nodding syndrome” using Random Forest to give insights into the etiology of the disease. Chapter 4 explores the use of artificial neural networks to classify phage structural proteins derived from metagenomes.

    Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection

    Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (~702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods. The complete list of authors of this document can be found in the file. This document has a correction (see related document). Centro Regional de Estudios Genómicos; Instituto de Investigaciones Bioquímicas de La Plat

    Interactions of host miRNAs in the flavivirus 3’UTR genome: From bioinformatics predictions to practical approaches

    The genus Flavivirus of the Flaviviridae family includes important viruses, such as Dengue, Zika, West Nile, Japanese encephalitis, Murray Valley encephalitis, tick-borne encephalitis, Yellow fever, Saint Louis encephalitis, and Usutu viruses. They are transmitted by mosquitoes or ticks, and they can infect humans, causing fever, encephalitis, or haemorrhagic fever. The treatment resources for these diseases and the number of vaccines available are limited. It has been discovered that eukaryotic cells synthesize small RNA molecules that can bind specifically to sequences present in messenger RNAs to inhibit the translation process, thus regulating gene expression. These small RNAs have been named microRNAs, and they have an important impact on viral infections. In this review, we compiled the available information on miRNAs that can interact with the 3’ untranslated region (3’UTR) of the flavivirus genome, a conserved region that is important for viral replication and translation.

    Genome-wide association study of sleep in Drosophila melanogaster

    BACKGROUND: Sleep is a highly conserved behavior, yet its duration and pattern vary extensively among species and between individuals within species. The genetic basis of natural variation in sleep remains unknown. RESULTS: We used the Drosophila Genetic Reference Panel (DGRP) to perform a genome-wide association (GWA) study of sleep in D. melanogaster. We identified candidate single nucleotide polymorphisms (SNPs) associated with differences in the mean as well as the environmental sensitivity of sleep traits; these SNPs typically had sex-specific or sex-biased effects, and were generally located in non-coding regions. The majority of SNPs (80.3%) affecting sleep were at low frequency and had moderately large effects. Additive models incorporating multiple SNPs explained as much as 55% of the genetic variance for sleep in males and females. Many of these loci are known to interact physically and/or genetically, enabling us to place them in candidate genetic networks. We confirmed the role of seven novel loci on sleep using insertional mutagenesis and RNA interference. CONCLUSIONS: We identified many SNPs in novel loci that are potentially associated with natural variation in sleep, as well as SNPs within genes previously known to affect Drosophila sleep. Several of the candidate genes have human homologues that were identified in studies of human sleep, suggesting that genes affecting variation in sleep are conserved across species. Our discovery of genetic variants that influence environmental sensitivity to sleep may have a wider application to all GWA studies, because individuals with highly plastic genotypes will not have consistent phenotypes.

    A Systems Genetics Approach to Drosophila melanogaster Models of Rare and Common Neurodevelopmental Disorders

    Fetal Alcohol Spectrum Disorders are a group of disorders resulting from prenatal alcohol exposure, presenting with neurodevelopmental and facial abnormalities of varying severity. SSRIDDs and CdLS are rare disorders of chromatin modification, resulting in patients with a wide range of craniofacial, digit and/or neurodevelopmental abnormalities. All of these disorders have a wide range of clinical phenotypes and disease severity, yet the role of potential genetic modifiers and gene-gene or gene-environment interactions in disease pathogenesis is largely unknown and cannot be studied in humans. Insufficient numbers of patients with a single rare disorder prevent investigation of genetic factors beyond the focal disease-associated variant, while experimental study of the more common FASD using human subjects is prohibited due to ethical constraints. Drosophila melanogaster is an excellent model system for neurodevelopmental disorders, as Drosophila neurobiology is largely conserved in humans and experiments performed in Drosophila are low-cost, easily controlled, and exempt from regulation. Here, we take advantage of the Drosophila model system and identify genetic factors contributing to these neurodevelopmental disorders. Specifically, we used the Drosophila Genetic Reference Panel (DGRP) of inbred lines with full genome sequences and single cell RNA sequencing to identify genetic networks in adult Drosophila after developmental ethanol exposure and demonstrate that changes in sleep, activity, and time to sedation as a result of the developmental ethanol exposure are dependent on genetic background. We also developed a novel assay measuring time to ethanol-induced sedation of individual flies to better assess this phenotype in our research and characterized a previously unstudied long noncoding RNA critical for Drosophila fitness and stress-response. 
We then established Drosophila models for multiple SSRIDD and CdLS subtypes and determined the extent to which behavioral and transcriptomic phenotypes vary within and across these rare disorders. Finally, we used SSRIDD Drosophila models to present evidence for the role of genetic modifiers in ARID1B-associated SSRIDD and identify candidate genetic modifiers for multiple SSRIDD subtypes. Taken together, these results show that the Drosophila model system is a powerful tool for investigating the genetic underpinnings of both rare and common neurodevelopmental disorders that cannot currently be identified using human populations.

    Plant-parasitic nematodes: from genomics to functional analysis of parasitism genes

    Nematodes (roundworms) belong to the largest phylum on earth. The numerous species inhabit practically all ecological niches, including plants. Plant-parasitic species live on plant roots, causing substantial damage to the plant and hampering its development. As such, they cause enormous economic losses in crop production. We used a molecular approach to analyze the plant-parasitic nematode Radopholus similis by generating expressed sequence tags (ESTs). The most striking discovery was tags corresponding to a Wolbachia-like endosymbiont, which was subsequently located in the ovaries of R. similis. Numerous tags corresponding to parasitism genes with potential roles in, amongst other things, host localisation, detoxification, cell wall modification, and even putative host transcriptional reprogramming were identified. In addition, a tool to explore all available nematode EST data is presented in this study. The ‘nematode EST exploration tool’ (NEXT) (http://zion.ugent.be/joachim/next) extends the usefulness of these data by extracting and storing temporal and spatial information of all publicly available nematode EST libraries. Some members of the transthyretin-like gene family of R. similis were characterized. All stages except developing embryos express the analyzed genes, and expression is localized to the ventral nerve cord and tissues surrounding the vulva. The predicted secondary structure is suggestive of a binding capacity with an as yet unknown ligand. Further, the annotation of the complete mitochondrial (mt) genome of R. similis is reported. The mt genome has the expected gene content, but shows many aberrant features, such as a considerably smaller 16S rRNA with reduced structures, two large repeat regions, the lack of stop codons on many genes, and a unique codon reassignment from UAA:Stop to UAA:Tyrosine. The aberrant features in the mt genome could be related to this codon reassignment, but results are ambiguous and require further research. The last part of the study reports on the response of the plant to nematode infection. Signaling of two plant hormones involved in plant defense is measured during early phases of parasitism. In addition, the role of flavonoid compounds produced by the plant is analyzed by infection tests on several mutants.

    The Eternal Network: The Ends and Becomings of Network Culture
