Search CORE

10,895 research outputs found

MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

Author: Babakir Mina Muhammed
Bertolazzi Paola
Cella Eleonora
Ciccozzi Massimo
Ciotti Marco
Felici Giovanni
Fiscon Giulia
Giovanetti Marta
Lo Presti Alessandra
Pierangeli Alessandra
Weitschek Emanuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Continuous improvements in next generation sequencing technologies have led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignment-, phylogenetic tree-, statistical- and character-based methods.
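As a rough illustration of what "species-specific genomic subsequences" could mean in practice, the sketch below collects the k-mers that occur in every sequence of one species and in no sequence of any other species. This is not the MISSEL algorithm, which the abstract does not describe; the function names, the toy sequences, and the choice of k are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def species_specific_kmers(
    seqs_by_species: Dict[str, List[str]], k: int
) -> Dict[str, Set[str]]:
    """For each species, keep the k-mers present in all of its sequences
    and absent from every sequence of every other species.
    Assumes each species has at least one sequence."""
    shared: Dict[str, Set[str]] = {}
    seen: Dict[str, Set[str]] = {}
    for species, seqs in seqs_by_species.items():
        sets = [kmers(s, k) for s in seqs]
        shared[species] = set.intersection(*sets)  # in every sequence
        seen[species] = set.union(*sets)           # in any sequence

    result: Dict[str, Set[str]] = {}
    for species in seqs_by_species:
        others: Set[str] = set()
        for other, found in seen.items():
            if other != species:
                others |= found
        result[species] = shared[species] - others
    return result


# Hypothetical toy data, not taken from the paper.
toy = {
    "virus_A": ["ACGTAC", "TACGTA"],
    "virus_B": ["GGGTTT", "GGTTTG"],
}
specific = species_specific_kmers(toy, k=3)
```

Exact set operations like these only scale to small inputs; they are meant to show the idea of a discriminating subsequence, not an efficient way to find one.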

PubMed Central

Archivio della ricerca - Università di Roma La Sapienza

FigShare

The Echinococcus canadensis (G7) genome: A key knowledge of parasitic platyhelminth human diseases

Author: Lucas L. Maldonado
Juliana Assis
Flávio M. Gomes Araújo
Anna C. M. Salim
Natalia Macchiaroli
Marcela Cucher
Federico Camicia
Adolfo Fox
Mara Rosenzvit
Guilherme Oliveira
Laura Kamenetzky
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2017

Background: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. Results: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated high nucleotide sequence divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, the DNA methylation system and the small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. Conclusions: This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery.
The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNP analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.

Affiliation: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Assis, Juliana. Fundación Oswaldo Cruz; Brazil
Affiliation: Gomes Araújo, Flávio M. Fundación Oswaldo Cruz; Brazil
Affiliation: Salim, Anna C. M. Fundación Oswaldo Cruz; Brazil
Affiliation: Macchiaroli, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Cucher, Marcela Alejandra. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Camicia, Federico. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Fox, Adolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Rosenzvit, Mara Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Oliveira, Guilherme. Instituto Tecnológico Vale; Brazil. Fundación Oswaldo Cruz; Brazil
Affiliation: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina

Sequence information gain based motif analysis

Author: Marco Santiago
Maynou Fernández Joan
Pairó Erola
Perera Lluna Alexandre
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Background: The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. Results: This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Conclusions: Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of cis-regulatory sequence detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
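For readers unfamiliar with information theoretic motif measures, the sketch below computes the classic per-position information content (in bits) of a set of aligned DNA binding sites, assuming a uniform background over the four bases. It is not the SIGMA detector: this measure treats positions independently, whereas the abstract states that SIGMA models co-variability between positions. The function name and the example sites are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter
from typing import List


def information_content(sites: List[str]) -> List[float]:
    """Per-column information (bits) of equal-length aligned DNA sites.

    Uses IC_j = 2 - H_j, where H_j is the Shannon entropy of column j
    and 2 bits is the maximum entropy of a four-letter alphabet.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    n = len(sites)
    ic = []
    for j in range(length):
        counts = Counter(s[j] for s in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


# Hypothetical aligned sites, not taken from the paper.
example_sites = ["ACGT", "ACGA", "ACCA", "ATCA"]
profile = information_content(example_sites)
```

A fully conserved column scores 2 bits and a column split evenly between two bases scores 1 bit, so the profile highlights which positions carry the most signal.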

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Springer - Publisher Connector

PubMed Central

Identification of co-regulated candidate genes by promoter analysis.

Author: Hellen Elizabeth H.B.
Publication date: 01/01/2010

EThOS - Electronic Theses Online Service, United Kingdom

University of Brighton Research Portal

OpenGrey Repository

Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

Author: Gatherer D.
Publication venue: 'SAGE Publications'
Publication date: 01/01/2007

A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%–70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.

Directory of Open Access Journals

Enlighten

Lancaster E-Prints

Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

Author: Fierro Ana Carolina
Guns Tias
Marchal Kathleen
Nijssen Siegfried
Sun Hong
Thorrez Lieven
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that score as highly as the biologically true ones. Using ChIP information not only reduces the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also restricts the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel powerful CRM detection method 'CPModule'. By applying it to a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.
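The "query-based mode" described above can be pictured as a filter over candidate modules: only candidates that contain the assayed TF remain valid. The sketch below shows that restriction alone and is not CPModule, whose search procedure the abstract does not detail. The `CandidateCRM` type, the region labels, the factor names (`TF_A`, `TF_B`, `TF_C`) and the scores are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class CandidateCRM(NamedTuple):
    """A candidate module: a genomic region, the TFs with predicted
    binding sites in it, and a screening score (higher is better)."""
    region: str
    factors: FrozenSet[str]
    score: float


def query_based_filter(
    candidates: List[CandidateCRM], query_tf: str
) -> List[CandidateCRM]:
    """Keep only candidates containing the assayed TF, best score first."""
    kept = [c for c in candidates if query_tf in c.factors]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Hypothetical candidates, not taken from the paper.
candidates = [
    CandidateCRM("region_1", frozenset({"TF_A", "TF_B"}), 0.91),
    CandidateCRM("region_2", frozenset({"TF_B", "TF_C"}), 0.95),
    CandidateCRM("region_3", frozenset({"TF_A", "TF_C"}), 0.97),
]
valid = query_based_filter(candidates, query_tf="TF_A")
```

In this toy case the high-scoring `region_2` is discarded because it lacks the assayed factor, which is how the restriction removes spurious candidates that would otherwise compete on score alone.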

Ghent University Academic Bibliography

PubMed Central