Search CORE

9,267 research outputs found

Organization and evolution of information within eukaryotic genomes.

Author: Links Matthew Graham
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2007

Back-translation for discovering distant protein homologies

Author: A. Pedersen
B. Oostra
C. Kosiol
J. Leluk
J. Raes
K. Okamura
L. Arvestad
L. Delaye
M. Clamp
M. Pellegrini
P. Harrison
P. Lio
R. Blake
S. Altschul
Y. Hahn
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations of two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.

Comment: The 9th International Workshop on Algorithms in Bioinformatics (WABI), Philadelphia, United States (2009)
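
To illustrate what "the complete set of putative DNA sequences of each protein" means, here is a minimal Python sketch that back-translates a protein using the standard genetic code. It is not the authors' method: the abstract describes a memory-efficient graph representation combined with dynamic programming, whereas this sketch simply counts and enumerates the candidates to show why naive enumeration does not scale. The names `CODONS`, `count_back_translations` and `back_translations` are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the list of codons encoding it.
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def count_back_translations(protein: str) -> int:
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(len(CODONS[aa]) for aa in protein)


def back_translations(protein: str):
    """Lazily yield every putative coding DNA sequence (illustration only)."""
    for combo in product(*(CODONS[aa] for aa in protein)):
        yield "".join(combo)


if __name__ == "__main__":
    peptide = "MLS"  # hypothetical toy peptide
    print(count_back_translations(peptide))
    for dna in list(back_translations(peptide))[:3]:
        print(dna)
```

Because the count is the product of the per-residue codon degeneracies, it grows exponentially with protein length, which is presumably why a compact graph representation is preferable to explicit enumeration.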

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Investigation of the length distributions of coding and noncoding sequences in relation to gene architecture, function, and expression

Author: Caldwell Rachel Amber
Publication venue: School of Biological Sciences and School of Mathematics and Applied Statistics
Publication date: 01/01/2015

The last 20 years have seen the birth of bioinformatics, defined as the combination of mathematics, biology, and computational approaches. This discipline has led to the era of ontology, extensive databases (including sequences, structures, expression profiles, and genomes), and database cross-referencing (Ouzounis, 2012). Before this discipline, scientists referenced atlas books, such as Margaret Dayhoff’s protein sequence collection (Strasser, 2010), which required long hours of letter counting. Through the development of sequencing technology over the past forty years, a tremendous amount of genomic sequencing data has already been collected. As the volume of such data surges, so do the challenges of data organisation, accessibility and interpretation, with interpretation being the most challenging (Ouzounis, 2012).

Research Online

Computational Analysis of High-Replicate RNA-seq Data in Saccharomyces cerevisiae: Searching for New Genomic Features

Author: Copeland Nancy Giang
Publication date: 01/01/2018

University of Dundee Online Publications

Alignment and analysis of noncoding DNA sequences in Drosophila

Author: Wang Jun, Ph.D
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

Edinburgh Research Archive

Testing the utility of DNA barcoding on dipteria of eThekwini.

Author: Duze Sanelisiwe Thinasonke.
Publication date: 01/01/2016

Master of Science in Genetics. University of KwaZulu-Natal. Durban, 2016. Abstract available in PDF file.

ResearchSpace@UKZN

Identification of putative nuclear receptors and steroidogenic enzymes in Murray-Darling rainbowfish (Melanotaenia fluviatilis) using RNA-Seq and de novo transcriptome assembly

Author: Bain Peter A.
Kumar Anupama
Papanicolaou Alexie (R18102)
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

Murray-Darling rainbowfish (Melanotaenia fluviatilis [Castelnau, 1878]; Atheriniformes: Melanotaeniidae) is a small-bodied teleost currently under development in Australasia as a test species for aquatic toxicological studies. To date, efforts towards the development of molecular biomarkers of contaminant exposure have been hindered by the lack of available sequence data. To address this, we sequenced messenger RNA from brain, liver and gonads of mature male and female fish and generated a high-quality draft transcriptome using a de novo assembly approach. 149,742 clusters of putative transcripts were obtained, encompassing 43,841 non-redundant protein-coding regions. Deduced amino acid sequences were annotated by functional inference based on similarity with sequences from manually curated protein sequence databases. The draft assembly contained protein-coding regions homologous to 95.7% of the complete cohort of predicted proteins from the taxonomically related species, Oryzias latipes (Japanese medaka). The mean length of rainbowfish protein-coding sequences relative to their medaka homologues was 92.1%, indicating that despite the limited number of tissues sampled a large proportion of the total expected number of protein-coding genes was captured in the study. Because of our interest in the effects of environmental contaminants on endocrine pathways, we manually curated subsets of coding regions for putative nuclear receptors and steroidogenic enzymes in the rainbowfish transcriptome, revealing 61 candidate nuclear receptors encompassing all known subfamilies, and 41 putative steroidogenic enzymes representing all major steroidogenic enzymes occurring in teleosts. The transcriptome presented here will be a valuable resource for researchers interested in biomarker development, protein structure and function, and contaminant-response genomics in Murray-Darling rainbowfish

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Western Sydney ResearchDirect

Model-based probe set optimization for high-performance microarrays

Author: Bernal
Bernhart
Blencowe
Bozdech
Brown
Carninci
Charbonnier
Chen
Chou
D. P. Kreil
Dudley
Fotin
G. G. Leparc
G. Striedner
Gao
Gordon
Griffith
Gunderson
Hofacker
Horak
Hu
I. L. Hofacker
K. Bayer
Kakuhata
Kane
Kreil
Lander
Lee
Li
Li
Li
Luebke
Marko
Mathews
Mrowka
Nadon
Nielsen
P. Sykacek
Pinkel
Rahmann
Ratushna
Relogio
Reymond
Rouillard
Saidi
SantaLucia
Santalucia
SantaLucia
T. Tuchler
Tolstrup
Wang
Wernersson
Xu
Yelin
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2009

A major challenge in microarray design is the selection of highly specific oligonucleotide probes for all targeted genes of interest, while maintaining thermodynamic uniformity at the hybridization temperature. We introduce a novel microarray design framework (Thermodynamic Model-based Oligo Design Optimizer, TherMODO) that for the first time incorporates a number of advanced modelling features: (i) A model of position-dependent labelling effects that is quantitatively derived from experiment. (ii) Multi-state thermodynamic hybridization models of probe binding behaviour, including potential cross-hybridization reactions. (iii) A fast calibrated sequence-similarity-based heuristic for cross-hybridization prediction supporting large-scale designs. (iv) A novel compound score formulation for the integrated assessment of multiple probe design objectives. In contrast to a greedy search for probes meeting parameter thresholds, this approach permits an optimization at the probe set level and facilitates the selection of highly specific probe candidates while maintaining probe set uniformity. (v) Lastly, a flexible target grouping structure allows easy adaptation of the pipeline to a variety of microarray application scenarios. The algorithm and features are discussed and demonstrated on actual design runs. Source code is available on request

Crossref

PubMed Central

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

Warwick Research Archives Portal Repository

Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon

Author: Carmichael Rory D
Dzurisin Jason DK
Emrich Scott J
Hellmann Jessica J
Lobo Neil F
O'Neil Shawn T
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task.

Results: Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6× on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best match silkworm (Bombyx mori) protein; this "ortholog hit ratio" gives a close estimate of the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5.

Conclusions: Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
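
The "ortholog hit ratio" described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the abstract only says that the number of bases in the hit region is compared to the length of the best-match protein, so the conversion of protein length from amino acids to bases (a factor of 3) is an assumption here, and the function names and sample values are hypothetical.

```python
def ortholog_hit_ratio(hit_bases: int, protein_length_aa: int) -> float:
    """Hit-region length in bases divided by the ortholog's coding length.

    Assumption: the protein length is given in amino acids and is
    converted to bases by multiplying by 3.
    """
    if protein_length_aa <= 0:
        raise ValueError("protein_length_aa must be positive")
    if hit_bases < 0:
        raise ValueError("hit_bases must be non-negative")
    return hit_bases / (3 * protein_length_aa)


def count_above(ratios, threshold: float) -> int:
    """Count unigenes whose ratio strictly exceeds the threshold."""
    return sum(1 for r in ratios if r > threshold)


if __name__ == "__main__":
    # Hypothetical (hit_bases, protein_length_aa) pairs, not data from the study.
    hits = [(900, 300), (450, 300), (270, 300)]
    ratios = [ortholog_hit_ratio(b, aa) for b, aa in hits]
    print(ratios)
    print(count_above(ratios, 0.8), count_above(ratios, 0.5))
```

A ratio near 1 suggests that the unigene covers nearly the full length of its putative ortholog; the thresholds 0.8 and 0.5 mirror the cut-offs reported in the abstract.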

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central