
    LTR-retrotransposons in R. exoculata and other crustaceans

    Transposable elements (TEs) are major constituents of eukaryote genomes and have a great impact on genome structure and stability. They can contribute to the genetic diversity and evolution of organisms. Knowledge of their distribution among several genomes is essential to study their dynamics and to better understand their role in species evolution. LTR-retrotransposons have been reported in many diverse eukaryote species, indicating a ubiquitous distribution. Given their abundance, diversity and extended ranges in C-values, environments and lifestyles, crustaceans are an excellent taxon in which to investigate the genomic component of adaptation and its possible relationships with TEs. However, crustaceans have been greatly underrepresented in transposable element studies. Using both degenerate PCR and in silico approaches, we have identified 35 Copia and 46 Gypsy families in 15 and 18 crustacean species, respectively. In particular, we characterized several full-length elements from the shrimp Rimicaris exoculata, which is listed as a model organism from hydrothermal vents. Phylogenetic analyses show that Copia and Gypsy retrotransposons likely present two opposite dynamics within crustaceans. The Gypsy elements appear relatively frequent and diverse, whereas Copia elements are much more homogeneous (29 of them belong to the single GalEa clade) and species- or lineage-dependent. Our results also support the hypothesis of Copia retrotransposon scarcity in metazoans compared to Gypsy elements. In this context, the GalEa-like elements present an outstandingly wide distribution among eukaryotes, from fishes to red algae, and can even be highly predominant within a large taxon such as Malacostraca. Their distribution among crustaceans suggests dynamics that follow a "domino days spreading" branching process in which successive amplifications may interact positively

    PyEvolve: a toolkit for statistical modelling of molecular evolution

    BACKGROUND: Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences – ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. RESULTS: Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from ~10 days to ~6 hours. CONCLUSION: PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field.
The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter rich likelihood functions solvable within hours on multi-cpu hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software

    The Origins of Novel Protein Interactions during Animal Opsin Evolution

    Background. Biologists are gaining an increased understanding of the genetic bases of phenotypic change during evolution. Nevertheless, the origins of phenotypes mediated by novel protein-protein interactions remain largely undocumented. Methodology/Principal Findings. Here we analyze the evolution of opsin visual pigment proteins from the genomes of early branching animals, including a new class of opsins from Cnidaria. We combine these data with existing knowledge of the molecular basis of opsin function in a rigorous phylogenetic framework. We identify adaptive amino acid substitutions in duplicated opsin genes that correlate with a diversification of physiological pathways mediated by different protein-protein interactions. Conclusions/Significance. This study documents how gene duplication events early in the history of animals followed by adaptive structural mutations increased organismal complexity by adding novel protein-protein interactions that underlie different physiological pathways. These pathways are central to vision and other photo-reactive phenotypes in most extant animals. Similar evolutionary processes may have been at work in generating other metazoan sensory systems and other physiological processes mediated by signal transduction

    Lateral Gene Transfer Drives Metabolic Flexibility in the Anaerobic Methane-Oxidizing Archaeal Family Methanoperedenaceae

    Anaerobic oxidation of methane (AOM) is an important biological process responsible for controlling the flux of methane into the atmosphere. Members of the archaeal family Methanoperedenaceae (formerly ANME-2d) have been demonstrated to couple AOM to the reduction of nitrate, iron, and manganese. Here, comparative genomic analysis of 16 Methanoperedenaceae metagenome-assembled genomes (MAGs), recovered from diverse environments, revealed novel respiratory strategies acquired through lateral gene transfer (LGT) events from diverse archaea and bacteria. Comprehensive phylogenetic analyses suggest that LGT has allowed members of the Methanoperedenaceae to acquire genes for the oxidation of hydrogen and formate, and the reduction of arsenate, selenate and elemental sulfur. Numerous membrane-bound multi-heme c-type cytochrome complexes also appear to have been laterally acquired, which may be involved in the direct transfer of electrons to metal oxides, humics and syntrophic partners

    Measuring reproducibility of high-throughput experiments

    Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the "irreproducible discovery rate" (IDR), analogous to the FDR. This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations to be explored. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment. Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Nuclear export of single native mRNA molecules observed via light sheet fluorescence microscopy and transcriptional regulation of BR2.1 during heat-shock

    Eucaryotes store most of their genetic information in the nucleus. Parts of this information encode the amino acid sequence of proteins. To synthesize a protein according to the nucleotide sequence, the corresponding DNA sequence is first transcribed by RNA polymerase II to mRNA. Subsequently, ribosomes translate the mRNA into the correct amino acid sequence. In eucaryotes the ribosomes are localized in the cytoplasm and are separated from the nucleus by the nuclear envelope. On the one hand, separation of transcription and translation enables eucaryotes to process the transcript post-transcriptionally; on the other, it requires transport of the mRNA from the nucleoplasm into the cytoplasm. The nucleoplasm is interconnected with the cytoplasm by nuclear pore complexes. Most of the nucleo-cytoplasmic trafficking is facilitated through the nuclear pore complexes. Messenger RNA is exported into the cytoplasm through the nuclear pore complexes, too. During transcription the nascent mRNA is bound by several proteins which are essential, e.g., for mRNA processing and export. The complex of the mRNA and its associated proteins is called an mRNP-particle. Fully processed mRNP-particles are able to cross the permeability barrier of nuclear pore complexes. In this thesis the kinetics of mRNA export were measured in salivary gland cells of C. tentans at the single-molecule level. To this end, mRNA was labeled by Hrp36, which was bacterially expressed and subsequently covalently linked to a fluorescent dye. Hrp36 associates cotranscriptionally with the nascent mRNA and is part of the mRNP-particle. After microinjection, labeled Hrp36 is transported into the nucleus via its endogenous M9-shuttle domain. Like all mRNP-particles, the labeled ones diffuse through the nucleus after transcription is finished and can be imaged by advanced fluorescence microscopy.
In this thesis it is shown that the kinetics of mRNA export across the nuclear pore complexes follow a broad distribution in the range of 20ms to seconds. Furthermore, only 30% of all mRNP-particles are exported after they have engaged an NPC. Fitting the mRNA-export kinetics with a bimodal gamma distribution revealed average export times of t1exp = 76ms, which is governed by multiple rate limiting steps, and t2exp = 158ms, which is governed by just a single rate limiting step. Therefore, the translocation of the mRNA across the nuclear pore complex is not rate limiting for protein biosynthesis, which takes on average several minutes. Trajectory analysis of export events =300ms showed that the mRNA was localized mainly in the nuclear basket during the export process. Proteins that are crucial for mRNP-particle quality control are localized here. These proteins bind mRNP-particles which are only partially processed, and thereby inhibit their translocation through the nuclear pore complex until their processing is completed. Assuming that the general reaction scheme is the same for all mRNP-particles, and considering that these slow export events show only a single rate limiting reaction step, these export events presumably correspond to mRNP-particles whose processing was not finished. In addition to the mRNP-particle export kinetics, the Dbp5 interaction kinetics with the nuclear pore complexes were measured. Dbp5 is an RNA helicase which is essential for mRNP-particle export. It is assumed that Dbp5 removes the transport receptors from the mRNA via its helicase activity and thereby inhibits the translocation of mRNA back into the nucleus. The interaction kinetics of Dbp5 showed two interaction times (t1Dbp5 200Hz = 26ms & t2Dbp5 200Hz = 240ms).
Due to the low number of observations, the interaction times obtained by fitting the data with a bimodal gamma distribution showed a high uncertainty. This makes a comparison of these results with the observed mRNA-export kinetics inadvisable. In the second part of the thesis a previously unknown mechanism of transcriptional regulation was studied. First hints of this mechanism were observed in a control experiment during the examination of the mRNA-export kinetics. Transcription can be subdivided into the four stages of initiation, early elongation, stable elongation and termination. It was previously believed that after transition into stable elongation the transcription process is either completed or terminated prematurely. The results of this thesis give evidence that the transcription process in salivary gland cells of C. tentans can be halted temporarily at the stage of stable elongation by applying a heat-shock to the larvae. The halted transcription processes can be resumed after the heat-shock is released. Since RNA polymerase II is highly conserved throughout eucaryotes, it seems very likely that this regulatory mechanism is not limited to C. tentans. The transcription halt during stable elongation described here shows that eucaryotes have a more direct and far-ranging access to transcription than previously believed. This direct control of transcription significantly increases the temporal dynamic of transcriptional regulation

    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding
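
For readers unfamiliar with the position weight matrix (PWM) baseline mentioned in the abstract above, the following is a minimal sketch of generic log-odds PWM scoring. It is not the authors' method, which combines PWM, structural and positional-dependency features in a random forest. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount, and the uniform background frequency are all illustrative assumptions.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites.

    Assumes a uniform background (0.25 per base) and an additive
    pseudocount; both are illustrative choices, not taken from the paper.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for a window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Slide the PWM along the sequence; return (best score, start index)."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)
```

Given a handful of aligned example sites, `best_hit(build_pwm(sites), sequence)` returns the highest-scoring window and its start position. Because each position is scored independently, a PWM cannot capture the dependencies between nucleotide positions that the abstract describes as complementary information.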