53,279 research outputs found

Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic Genomes

Author: Grant A
Orengo CA
Ranea JAG
Yeats C
Publication venue: PUBLIC LIBRARY SCIENCE
Publication date: 30/11/2007

"Phylogenetic profiling" is based on the hypothesis that, during evolution, functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step, and it largely explains why previous work in this area has mainly focused on presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses the genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune" with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by conventional presence-absence profile comparisons, and it does not require any fixed a priori criteria to define orthologous genes.

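As a rough illustration of the profile comparison described in the abstract above, the sketch below z-scores two domain copy-number profiles and takes the Euclidean distance between them. The abstract does not say how the distances are normalised, so the z-score step is an assumption, and the function names and copy numbers are hypothetical.

```python
import math


def zscore(profile):
    """Scale a copy-number profile to zero mean and unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    var = sum((x - mean) ** 2 for x in profile) / n
    if var == 0:
        # A flat profile carries no co-variation signal.
        return [0.0] * n
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in profile]


def profile_distance(a, b):
    """Euclidean distance between two z-scored profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    za, zb = zscore(a), zscore(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


# Hypothetical domain copy numbers across five genomes (illustrative only).
family_a = [1, 2, 4, 8, 3]
family_b = [2, 4, 8, 16, 6]  # same trend as family_a at twice the copy number
family_c = [8, 1, 3, 2, 9]   # unrelated trend

print(profile_distance(family_a, family_b))
print(profile_distance(family_a, family_c))
```

Under this assumed normalisation, two families whose copy numbers rise and fall together score a distance near zero even when their absolute copy numbers differ.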
UCL Discovery

Use of RNA secondary structure for evolutionary relationships : investigating RNase P and RNase MRP : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Genetics at Massey University, New Zealand

Author: Collins Lesley Joan
Publication venue: 'Massey University'
Publication date: 01/01/1998

Bioinformatics is applied here to examine whether RNA secondary structure data can reflect distant evolutionary relationships. This is important when there is little confidence in sequence data, such as when looking at the evolution of RNase MRP (MRP). RNase P (P) and RNase MRP (MRP) are ribonucleoproteins (RNPs) that are involved in RNA processing and, due to functional and secondary structure similarities, are thought to be evolutionarily related. P activity is found in all cells, and fits the criteria for inclusion in the RNA world (Jeffares et al. 1998). MRP is found only in eukaryotes, with essential functions in both the nucleus and mitochondria. The RNA components of P and MRP (pRNA and mrpRNA) cannot be aligned with any certainty, which leads to a lack of confidence in any phylogenetic trees constructed from them. If MRP evolved from P only in eukaryotes, then it is an exception to the general process of the transfer of catalytic activity from RNA, to ribonucleoproteins, to proteins (Jeffares et al. 1998). An alternative possibility, that MRP evolved with P in the RNA world (and has since been lost from all but the eukaryotes), is raised and examined. Quantitative comparisons of the pRNA and mrpRNA biological secondary structures have found that the third possibility of an organellar origin of MRP is unlikely. Results show that biological secondary structure can be used in the evaluation of an evolutionary relatedness between MRP and P and may be extended to other catalytic RNA molecules. Although there are many protein families, this may be the first evidence of the existence of a family of RNA molecules, although it would be a very small family. Secondary structures derived with folding programs from pRNA and mrpRNA sequences are examined for use in the characterisation of catalytic RNA sequences. The high AT content in organellar genomes may hinder the identification of their catalytic RNA sequences.
A search strategy is developed here to address this problem and is used to identify putative pRNA sequences in the chloroplast genomes of four green plants. A maize chloroplast pRNA-like sequence is examined in more detail and shows many characteristics seen in known pRNA sequences. Folding programs show some potential for the characterisation of possible catalytic RNA sequences, with only a small bias in the results due to sequence length and AT content.

Massey Research Online

Are we there yet? : reliably estimating the completeness of plant genome sequences

Author: Ruttink Tom
Vandepoele Klaas
Veeckman Elisabeth
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/2016

Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.

Ghent University Academic Bibliography

PubMed Central

Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae

Author: Fitt Bruce
Hossein Borhan
Kukol Andreas
Larkan Nicholas
Mashanova Alla
Parham Haddadi
Pascoe Harvey
Stotz Henrik
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

© 2018 Stotz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with eLRR domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 with LepR3 and Rlm2 alleles conferring resistance against L. maculans races with corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemi-biotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens.
These findings demonstrate involvement of (i) NLR genes in resistance against the intracellular pathogen P. brassicae and a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans. Peer reviewed.

Directory of Open Access Journals

University of Hertfordshire Research Archive

FigShare

Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

Author: A Bairoch
A Christoffels
A Gurevich
A Kozomara
A McKenna
A Mitchell
A Morgulis
A Pradhan
A Reiner
A Rodriguez-Mari
A Stamatakis
A Yates
AI Makunin
AJ Enright
AL Price
Alan Christoffels
Aleksey Komissarov
Alexey Tupikin
Amy Hin Yan Tong
Andrey A. Yurchenko
AR Quinlan
B Langmead
B Star
C Berthelot
C Camacho
C Holt
C Wang
Chen-Shan Chin
CS Chin
D Brawand
D Ellinghaus
DA Benson
Darrell Green
DC Hardie
Dean R. Jerry
DH Alexander
Doreen Lau
DR Kelley
DRS-K C. Jerry
E Casacuberta
E. TG Staristina
EW Myers
F Abascal
F Chen
F Yang
FC Jones
FJ Krsticevic
Fritz J. Sedlazeck
G Abrusan
G Benson
G Lin
G Marcais
G Parra
G Tamazian
GH Yue
Gopikrishna Gopalapillai
Gregory W. Vurture
GS Slater
GT Valente
H Li
H Saiga
Heiner Kuhl
HH Kazazian Jr.
I Braasch
Inna S. Kuznetsova
IS Kuznetsova
J Castresana
J Eid
J Huerta-Cepas
J Jurka
J Lin
James P. Drake
JG Ruby
JN Volff
Jolly M. Saju
Jonas Korlach
JS Chew
Junhui Jiang
K Howe
K Katoh
K Prufer
Kathiresan Purushothaman
KD Pruitt
KJ Hoff
KP Koepfli
KW Tzung
Lawrence S. Hon
László Orbán
M Blanchette
M Kanehisa
M Kasahara
M Kolmogorov
M Krzywinski
M Martin
M Schartl
M Tarailo-Graovac
M Tine
MA Larkin
Mario Jonas
Marsel Kabilov
Matthew Boitano
MB Stocks
MG Grabherr
Michael C. Schatz
MJ Chaisson
MR Friedlander
N Siegel
Natascha M. Thevasagayam
NM Thevasagayam
O Jaillon
O Otero
P Cingolani
P Ravi
P Schattner
P Shannon
P Xu
Paul M. Richardson
PE Warburton
Peter Van Heusden
R Kajitani
R Lorenz
R Luo
R Moore
R Pethiyagoda
R Poulter
R She
R Sreenivasan
Ramkumar Lachumanan
RD Ward
Richard Hall
RJ Roberts
S Chen
S Guindon
S Hoegg
S Koren
S Vij
S Zhou
Sai Rama Sridatta Prakki
Sarah Mwangi
SF Altschul
Shubha Vij
Si Lok
Si Yan Ngoh
Siddharth Singh
Simon Moxon
SM Kielbasa
Sridhar Sivasubbu
Stanley Kimbung Mbandi
Stephen J. O'Brien
Stephen W. Turner
T Anantharaman
Tamás Dalmay
Tansyn H. Noble
TD Wu
TF DeLuca
TH O'Hare
TLO Davis
TS Anantharaman
Tyler Garvin
U Consortium
U Grimholt
V Douard
V Ravi
Vinaya Kumar Katneni
Vinod Scaria
Vladimir Trifonov
W Xue
WC Liew
Woei Chang Liew
WS Davidson
X Huang
X Zheng
XG Wang
Xueyan Shen
Y Guiguen
Y Han
Y Hashiguchi
Y Moriya
Y Sato
Z Lai
Ø Hammer
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, along with shared synteny from closely related fish species, to derive a chromosome-level assembly with a contig N50 size over 1 Mb and a scaffold N50 size over 25 Mb that spans ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
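
The contig and scaffold N50 figures quoted above follow the usual definition: the length L such that sequences of length L or longer cover at least half of the assembled bases. A minimal sketch of that calculation follows; the function name and the example lengths are hypothetical, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (illustrative only).
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
```

N50 rewards contiguity only; it says nothing about whether the joined sequence is correct, which is why the abstract also reports the fraction of the genome spanned.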

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Research Repository

Repository of the Academy's Library

University of East Anglia digital repository

NSU Works

MPG.PuRe

Characteristics of oligonucleotide frequencies across genomes: Conservation versus variation, strand symmetry, and evolutionary implications

Author: Shang-Hong Zhang
Ya-Zhi Huang
Publication venue
Publication date: 01/08/2008

One of the objectives of evolutionary genomics is to reveal the genetic information contained in the primordial genome (called the primary genetic information in this paper, with the primordial genome defined here as the most primitive nucleic acid genome for Earth's life) by searching for primitive traits or relics remaining in modern genomes. The shorter a sequence is, the less likely it is to be modified during genome evolution. For that reason, some characteristics of very short nucleotide sequences would have a considerable chance of persisting through billions of years of evolution. Consequently, conservation of certain genomic features of mononucleotides, dinucleotides, and higher-order oligonucleotides across various genomes may exist; some, if not all, of these features would be relics of the primary genetic information. Based on this assumption, we analyzed the pattern of frequencies of mononucleotides, dinucleotides, and higher-order oligonucleotides in the whole-genome sequences of 458 species (including archaea, bacteria, and eukaryotes). We also studied the phenomenon of strand symmetry in these genomes. The results show that conservation of the frequencies of some dinucleotides and higher-order oligonucleotides across genomes does exist, and that strand symmetry is a ubiquitous and explicit phenomenon that may contribute to frequency conservation. We propose a new hypothesis for the origin of strand symmetry and frequency conservation as well as for the constitution of early genomes. We conclude that the phenomena of strand symmetry and the pattern of frequency conservation would be original features of the primary genetic information.
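
Strand symmetry, as discussed above, is commonly understood to mean that on a single strand an oligonucleotide and its reverse complement occur at nearly equal frequencies. The sketch below counts k-mers on one strand and reports how far a sequence departs from that symmetry. The deviation score, the function names and the test sequences are illustrative assumptions and not the authors' measure.

```python
from collections import Counter

ALPHABET = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand, skipping ambiguous bases."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= ALPHABET)


def symmetry_deviation(seq, k):
    """0.0 means every k-mer is exactly as frequent as its reverse
    complement; 1.0 means no k-mer is balanced by its reverse complement."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    words = set(counts) | {reverse_complement(w) for w in counts}
    # Each (word, reverse complement) pair is visited twice, hence the 2.
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    return diff / (2 * total)


# Hypothetical sequences (illustrative only).
half = "ACCGTTAGGA"
symmetric = half + reverse_complement(half)  # equals its own reverse complement
print(symmetry_deviation(symmetric, 2))
print(symmetry_deviation("AAAA", 1))
```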

Nature Precedings

Spectral Analysis of Guanine and Cytosine Fluctuations of Mouse Genomic DNA

Author: Bernardi G.
Calladine C. R.
Clay O.
Daniell P. J.
Dirk Holste
Gerton J. L.
Kong A.
Li W.
Mansilla R.
Meyer A.
Press W.
Priestley M. B.
Waterston R. H.
Wentian Li
West B. J.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 03/11/2004

We study global fluctuations of the guanine and cytosine base content (GC%) in mouse genomic DNA using spectral analyses. Power spectra $S(f)$ of GC% fluctuations in all nineteen autosomal and two sex chromosomes are observed to have the universal functional form $S(f) \sim 1/f^{\alpha}$ ($\alpha \approx 1$) over several orders of magnitude in the frequency range $10^{-7} < f < 10^{-5}$ cycle/base, corresponding to long-ranging GC% correlations at distances between 100 kb and 10 Mb. $S(f)$ for higher frequencies ($f > 10^{-5}$ cycle/base) shows a flattened power-law function with $\alpha < 1$ across all twenty-one chromosomes. The substitution of about 38% interspersed repeats does not affect the functional form of $S(f)$, indicating that these are not predominantly responsible for the long-ranged multi-scale GC% fluctuations in mammalian genomes. Several biological implications of the large-scale GC% fluctuation are discussed, including neutral evolutionary history by DNA duplication, chromosomal bands, spatial distribution of transcription units (genes), replication timing, and recombination hot spots. Comment: 15 pages (figures included), 2 figures
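
A minimal sketch of this kind of spectral analysis follows: GC fraction is computed in consecutive windows, a periodogram is taken of the resulting series, and the exponent alpha is estimated from a log-log least-squares fit. The windowing scheme, the direct DFT and all function names are assumptions chosen for illustration; the abstract does not describe the authors' estimator. Frequencies here are in cycles per window, so dividing by the window size converts them to cycle/base.

```python
import cmath
import math


def gc_windows(seq, window):
    """GC fraction in consecutive non-overlapping windows
    (a trailing partial window is dropped)."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def periodogram(series):
    """Return (frequencies, power) for k = 1 .. n//2 using a direct DFT
    of the mean-centred series. O(n^2), so suitable for short series only."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def fit_alpha(freqs, power):
    """Least-squares slope of log S against log f, negated so that the
    result is alpha in S(f) ~ 1/f^alpha. All power values must be > 0."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in power]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope /= sum((x - mx) ** 2 for x in xs)
    return -slope


# Sanity check on an exact 1/f spectrum (synthetic, illustrative only).
demo_freqs = [0.01, 0.02, 0.04, 0.08]
print(fit_alpha(demo_freqs, [1 / f for f in demo_freqs]))
```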

arXiv.org e-Print Archive

Crossref