
106 research outputs found

A graph-based approach for designing extensible pipelines

Author: Machado Moara
Magalhães Wagner
Rodrigues Maíra
Tarazona-Santos Eduardo
Publication date: 30/06/2012

BACKGROUND: In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism in pipeline composition is necessary when each execution requires a different combination of steps. RESULTS: We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software tools is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. CONCLUSIONS: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats. Document type: Article
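The graph representation described in this abstract (file formats as nodes, tools as edges, paths as pipelines) can be illustrated with a minimal Python sketch. This is not the authors' Java implementation: the tool names, the format names and the use of breadth-first search to pick the shortest chain are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical converters: (tool name, input format, output format).
# Each tool is an edge; each format is a node.
TOOLS = [
    ("ped2vcf", "PED", "VCF"),
    ("vcf2phase", "VCF", "PHASE"),
    ("phase2fasta", "PHASE", "FASTA"),
    ("ped2structure", "PED", "STRUCTURE"),
]


def build_graph(tools):
    """Build an adjacency list mapping each format to its outgoing tools."""
    graph = {}
    for name, src, dst in tools:
        graph.setdefault(src, []).append((name, dst))
        graph.setdefault(dst, [])
    return graph


def compose_pipeline(tools, source, target):
    """Return the shortest list of tool names converting source to target.

    Returns None when no chain of tools connects the two formats.
    """
    graph = build_graph(tools)
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for name, nxt in graph[fmt]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [name]))
    return None


if __name__ == "__main__":
    print(compose_pipeline(TOOLS, "PED", "FASTA"))
```

Under these assumptions, registering a new converter means appending one tuple to `TOOLS`; every conversion that can be reached through it becomes available without listing each sequence of tools in advance.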

Crossref

Springer - Publisher Connector

PubMed Central

Scipedia

Limited diversity of Anopheles darlingi in the Peruvian Amazon region of Iquitos.

Author: Gilman Robert H
Jeri Cesar
Oswald William E
Patz Jonathan A
Pinedo-Cancino Viviana
Sheen Patricia
Tarazona-Santos Eduardo
Vittor Amy Yomiko
Publication venue: 'American Society of Tropical Medicine and Hygiene'
Publication date: 01/01/2006

Anopheles darlingi is the most important malaria vector in the Amazon basin of South America, and is capable of transmitting both Plasmodium falciparum and P. vivax. To understand the genetic structure of this vector in the Amazonian region of Peru, a simple polymerase chain reaction (PCR)-based test to identify this species of mosquito was used. A random amplified polymorphic DNA-PCR was used to study genetic variation at the micro-geographic level in nine geographically separate populations of An. darlingi collected in areas with different degrees of deforestation surrounding the city of Iquitos. Within-population genetic diversity in nine populations, as quantified by the expected heterozygosity (H(E)), ranged from 0.27 to 0.32. Average genetic distance (F(ST)) among these populations was 0.017. These results show that the nine studied populations are highly homogeneous, suggesting that strategies can be developed to combat this malaria vector as a single epidemiologic unit
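For readers unfamiliar with the diversity statistic quoted in this abstract, the textbook definition of expected heterozygosity at a locus is one minus the sum of squared allele frequencies. The Python sketch below shows only that definition; the allele frequencies are made up, and the estimator the authors applied to dominant RAPD markers is not given in the abstract and may differ.

```python
def expected_heterozygosity(allele_freqs):
    """Textbook H_E for one locus: 1 minus the sum of squared allele frequencies."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def mean_heterozygosity(loci):
    """Average H_E over a list of loci, each given as a list of allele frequencies."""
    if not loci:
        raise ValueError("at least one locus is required")
    return sum(expected_heterozygosity(freqs) for freqs in loci) / len(loci)


if __name__ == "__main__":
    # Illustrative frequencies only, not data from the study.
    print(mean_heterozygosity([[0.8, 0.2], [0.5, 0.5]]))
```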

LSHTM Research Online

PubMed Central

Diversity in the Glucose Transporter-4 Gene (SLC2A4) in Humans Reflects the Action of Natural Selection along the Old-World Primates Evolution

Author: Andrew Crenshaw
Anita Brandstaetter
Cristina Fabbri
Davide Pettener
Eduardo Tarazona-Santos
Laurie Burdett
Meredith Yeager
Stephen J. Chanock
Wagner C. Magalhaes
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Glucose is an important source of energy for living organisms. In vertebrates it is ingested with the diet and transported into the cells by conserved mechanisms and molecules, such as the trans-membrane Glucose Transporters (GLUTs). Members of this family have tissue-specific expression, biochemical properties and physiologic functions that together regulate glucose levels and distribution. GLUT4, encoded by SLC2A4 (17p13), is an insulin-sensitive transporter with a critical role in glucose homeostasis and diabetes pathogenesis, preferentially expressed in the adipose tissue, heart muscle and skeletal muscle. We tested the hypothesis that natural selection acted on SLC2A4. METHODOLOGY/PRINCIPAL FINDINGS: We re-sequenced SLC2A4 and genotyped 104 SNPs along an approximately 1 Mb region flanking this gene in 102 ethnically diverse individuals. Across the studied populations (African, European, Asian and Latin-American), all the eight common SNPs are concentrated in the N-terminal region upstream of exon 7 (approximately 3700 bp), while the C-terminal region downstream of intron 6 (approximately 2600 bp) harbors only 6 singletons, a pattern that is not compatible with neutrality for this part of the gene. Tests of neutrality based on comparative genomics suggest that: (1) episodes of natural selection (likely a selective sweep) predating the coalescent of human lineages, within the last 25 million years, account for the observed reduced diversity downstream of intron 6 and, (2) the target of natural selection may not be in the SLC2A4 coding sequence. CONCLUSIONS: We propose that the contrast in the pattern of genetic variation between the N-terminal and C-terminal regions is a signature of the action of natural selection and thus follow-up studies should investigate the functional importance of different regions of the SLC2A4 gene

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

Author: Araújo Bruno
Chanock Stephen J
Faria-Campos Alessandra C
Machado Moara
Magalhães Wagner CS
Oliveira Guilherme
Rodrigues Maira R
Scott Leandro
Sene Allan
Tarazona-Santos Eduardo
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data. RESULTS: In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. 
Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it allowed a substantial decrease in the time required for sequencing analyses, as well as being a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses
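Step (5) above, rebuilding full-length haplotypes from output that lists only polymorphic sites, can be pictured with a small Python sketch. It is not the authors' code: the function name and the inputs are hypothetical, and it assumes that variants are single-base substitutions, that positions are 0-based, and that every monomorphic site equals the reference sequence.

```python
def reconstruct_haplotype(reference, positions, alleles):
    """Place inferred alleles at polymorphic positions in a copy of the reference.

    reference: full reference sequence (string)
    positions: 0-based indices of the polymorphic sites
    alleles:   one inferred allele per polymorphic site, in the same order
    """
    if len(positions) != len(alleles):
        raise ValueError("positions and alleles must have the same length")
    seq = list(reference)
    for pos, allele in zip(positions, alleles):
        if not 0 <= pos < len(seq):
            raise IndexError(f"position {pos} is outside the reference")
        seq[pos] = allele
    return "".join(seq)


if __name__ == "__main__":
    # Illustrative input: two polymorphic sites in an 8-base reference.
    print(reconstruct_haplotype("ACGTACGT", [1, 6], "TA"))
```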

Crossref

Springer - Publisher Connector

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Biogeographical ancestry is associated with socioenvironmental conditions and infections in a Latin American urban population.

Author: Alcantara-Neves Neuza Maria
Barreto Maurício L
Costa Gustavo NO
da Silva Thiago Magalhães
Fiaccone Rosemeire L
Figueiredo Camila A
Kehdy Fernanda SG
Rodrigues Laura C
Tarazona-Santos Eduardo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Racial inequalities are observed for different diseases and are mainly caused by differences in socioeconomic status between ethnoracial groups. Genetic factors have also been implicated, and recently, several studies have investigated the association between biogeographical ancestry (BGA) and complex diseases. However, the role of BGA as a proxy for non-genetic health determinants has been little investigated. Similarly, studies comparing the association of BGA and self-reported skin colour with these determinants are scarce. Here, we report the association of BGA and self-reported skin colour with socioenvironmental conditions and infections. We studied 1246 children living in a Brazilian urban poor area. The BGA was estimated using 370,539 genome-wide autosomal markers. Standardised questionnaires were administered to the children's guardians to evaluate socioenvironmental conditions. Infection (or pathogen exposure) was defined by the presence of positive serologic test results for IgG to seven pathogens (Toxocara spp, Toxoplasma gondii, Helicobacter pylori, and hepatitis A, herpes simplex, herpes zoster and Epstein-Barr viruses) and the presence of intestinal helminth eggs in stool samples (Ascaris lumbricoides and Trichuris trichiura). African ancestry was negatively associated with maternal education and household income and positively associated with infections and with variables indicating poorer housing and living conditions. The self-reported skin colour was associated with infections only. In stratified analyses, the proportion of African ancestry was associated with most of the outcomes investigated, particularly among admixed individuals. In conclusion, BGA was associated with socioenvironmental conditions and infections even in a low-income and highly admixed population, capturing differences that self-reported skin colour misses. 
Importantly, our findings suggest caution in interpreting significant associations between BGA and diseases as indicative of the genetic factors involved

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

LSHTM Research Online

Directory of Open Access Journals

Unsuspected Associations of Variants within the Genes NOTCH4 and STEAP2-AS1 Uncovered by a GWAS in Endemic Pemphigus Foliaceus

Author: Augusto Danillo G.
Barreto Mauricio L.
Boldt Angelica B. W.
Busch Hauke
de Almeida Rodrigo C.
Farias Ticiana D. J.
Franke Andre
Horta Bernardo L.
Kumar Vinod
Lima-Costa Maria Fernanda
Magalhaes Wagner C. S.
Malheiros Danielle
Petzl-Erler Maria Luiza
Roselino Ana Maria
Schmidt Enno
Tarazona-Santos Eduardo
Wittig Michael
Publication venue: 'Elsevier BV'
Publication date: 01/11/2021

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil

Author: Araújo Nathalia M
Barreto Mauricio L
Borda Victor
Bozoklian Daniel
Buzzo Jose Leonel
Caceres Omar
Castelli Erick C
Castro Camila Ferreira Bannwart
Ceroni José Ricardo Magliocco
da Silva Simões Carlos Eduardo
da Silva Souza Andreia
de Carvalho Diego Lima
de Souza Andrade Heloísa
Dean Michael
Dos Santos Brito Silva Nayane
Duarte Yeda A O
Galante Pedro A F
Guio Heinner
Guryev Victor
Horta Bernardo L
Karp Tatiana
Lima-Costa Maria Fernanda
Magalhães Wagner C S
Mendes-Junior Celso T
Mercuri Rafael L V
Meyer Diogo
Miller Thiago L A
Mingroni-Netto Regina Célia
Naslavsky Michel S
Nonaka Ricardo
Nunes Kelly
Passos Marília Rodrigues Silva
Passos-Bueno Maria Rita
Rego Fernanda O
Rojas Carlos P
Sanchez Cesar
Scliar Marilia O
Tarazona-Santos Eduardo
Wang Jaqueline Yu Ting
Yamamoto Guilherme L
Zatz Mayana
Zverinova Stepanka
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/03/2022

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly-inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

Dissertations of the University of Groningen

Author Correction: Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil (vol 13, 1004, 2022)

Author: Araujo Nathalia M.
Barreto Mauricio L.
Borda Victor
Bozoklian Daniel
Buzzo Jose Leonel
Caceres Omar
Castelli Erick C.
Castro Camila Ferreira Bannwart
Ceroni Jose Ricardo Magliocco
da Silva Simoes Carlos Eduardo
da Silva Souza Andreia
de Carvalho Diego Lima
de Souza Andrade Heloisa
Dean Michael
dos Santos Brito Silva Nayane
Duarte Yeda A. O.
Galante Pedro A. F.
Guio Heinner
Guryev Victor
Horta Bernardo L.
Karp Tatiana
Lima-Costa Maria Fernanda
Magalhaes Wagner C. S.
Mendes-Junior Celso T.
Mercuri Rafael L. V.
Meyer Diogo
Miller Thiago L. A.
Mingroni-Netto Regina Celia
Naslavsky Michel S.
Nonaka Ricardo
Nunes Kelly
Passos Marilia Rodrigues Silva
Passos-Bueno Maria Rita
Rego Fernanda O.
Rojas Carlos P.
Sanchez Cesar
Scliar Marilia O.
Tarazona-Santos Eduardo
Wang Jaqueline Yu Ting
Yamamoto Guilherme L.
Zatz Mayana
Zverinova Stepanka
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/03/2022

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Genome-wide burden and association analyses implicate copy number variations in asthma risk among children and young adults from Latin America.

Author: Barbosa George CG
Barreto Maurício L
Costa Gustavo NO
Damasceno Andresa KA
Fiaccone Rosemeire L
Figueiredo Camila A
Hartwig Fernando P
Horta Bernardo L
Kehdy Fernanda S
Lima-Costa M Fernanda
Oliveira Pablo
Pereira Alexandre
Ribeiro-Silva Rita de C
Rodrigues Laura C
Tarazona-Santos Eduardo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The genetic architecture of asthma has been relatively well explored. However, some work remains in the field to improve our understanding of asthma genetics, especially in non-Caucasian populations and with regard to commonly neglected genetic variants, such as Copy Number Variations (CNVs). In the present study, we investigated the contribution of CNVs to asthma risk among Latin Americans. CNVs were inferred from SNP genotyping data. Genome-wide burden and association analyses were conducted to evaluate the impact of CNVs on asthma outcome. We found no significant difference in the numbers of CNVs between asthmatics and non-asthmatics. Nevertheless, we found that CNVs are larger in patients than in healthy controls and that CNVs from cases intersect significantly more genes and regulatory elements. We also found that a deletion at 6p22.1 is associated with asthma symptoms in children from Salvador (Brazil) and in young adults from Pelotas (Brazil). To support our results, we conducted an in silico functional analysis and found that this deletion spans several regulatory elements, including two promoter elements active in lung cells. In conclusion, we found robust evidence that CNVs could contribute to asthma susceptibility. These results uncover a new perspective on the influence of genetic factors modulating asthma risk

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

LSHTM Research Online

Directory of Open Access Journals

Explore Bristol Research