2b-RAD genotyping for population genomic studies of Chagas disease vectors: Rhodnius ecuadoriensis in Ecuador

Author: Andersson Björn
Costales Jaime A.
De Noia Michele
Grijalva Mario J.
Hernandez Castro Luis Enrique
Hernandez-Castro Luis E.
Llewellyn Martin S.
Ocaña-Mayorga Sofía
Paterno Marta
Villacís Anita G.
Yumiseva Cesar A.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/07/2017

Background: Rhodnius ecuadoriensis is the main triatomine vector of Chagas disease (American trypanosomiasis) in Southern Ecuador and Northern Peru. Genomic approaches and next generation sequencing technologies have become powerful tools for investigating population diversity and structure, which is a key consideration for vector control. Here we assess whether three different 2b restriction site-associated DNA (2b-RAD) genotyping strategies in R. ecuadoriensis provide sufficient genomic resolution to tease apart microevolutionary processes, and we undertake some pilot population genomic analyses. Methodology/Principal findings: The 2b-RAD protocol was carried out in-house at a non-specialized laboratory using 20 R. ecuadoriensis adults collected from the central coast and southern Andean region of Ecuador, from June 2006 to July 2013. 2b-RAD sequencing was performed on an Illumina MiSeq instrument, and the data were analyzed with the STACKS de novo pipeline for loci assembly and Single Nucleotide Polymorphism (SNP) discovery. Preliminary population genomic analyses (global AMOVA and Bayesian clustering) were implemented. Our results showed that the 2b-RAD genotyping protocol is effective for R. ecuadoriensis and likely for other triatomine species. However, only the BcgI and CspCI restriction enzymes provided a number of markers suitable for population genomic analysis at the read depth we generated. Our preliminary genomic analyses detected a signal of genetic structuring across the study area. Conclusions/Significance: Our findings suggest that 2b-RAD genotyping is both a cost-effective and methodologically simple approach for generating high resolution genomic data for Chagas disease vectors, with the power to distinguish between different vector populations at epidemiologically relevant scales. As such, 2b-RAD represents a powerful tool in the hands of medical entomologists with limited access to specialized molecular biological equipment.
Author summary: Understanding Chagas disease vector (triatomine) population dispersal is key for the design of control measures tailored to the epidemiological situation of a particular region. In Ecuador, Rhodnius ecuadoriensis is a cause of concern for Chagas disease transmission, since it is widely distributed from the central coast to southern Ecuador. Here, a genome-wide sequencing (2b-RAD) approach was applied to 20 specimens from four communities in the Manabí (central coast) and Loja (southern) provinces of Ecuador, and the effectiveness of three type IIB restriction enzymes was assessed. The findings of this study show that this genotyping methodology is cost-effective in R. ecuadoriensis and likely in other triatomine species. In addition, preliminary population genomic analysis results detected a signal of population structure among geographically distinct communities and genetic variability within communities. As such, 2b-RAD shows significant promise as a relatively low-tech solution for determination of vector population genomics, dynamics, and spread.

ZENODO

Directory of Open Access Journals

Electronic Archiving System

Enlighten

Cuckoo search epistasis: a new method for exploring significant genetic interactions

Author: Aflakparast M.
Publication date: 01/01/2014

The advent of high-throughput sequencing technology has resulted in the ability to measure millions of single-nucleotide polymorphisms (SNPs) from thousands of individuals. Although these high-dimensional data have paved the way for better understanding of the genetic architecture of common diseases, they have also given rise to challenges in developing computational methods for learning epistatic relationships among genetic markers. We propose a new method, named cuckoo search epistasis (CSE), for identifying significant epistatic interactions in population-based association studies with a case-control design. This method combines a computationally efficient Bayesian scoring function with an evolutionary-based heuristic search algorithm, and can be efficiently applied to high-dimensional genome-wide SNP data. The experimental results from synthetic data sets show that CSE outperforms existing methods including multifactorial dimensionality reduction and Bayesian epistasis association mapping. In addition, on a real genome-wide data set related to Alzheimer's disease, CSE identified SNPs that are consistent with previously reported results, showing the utility of CSE for application to genome-wide data.

VU Research Portal

Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

Author: Kelemen Arpad
Liang Yulan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 28/03/2008

Recent advances in information technology in the biomedical sciences and other applied areas have created numerous large, diverse data sets with a high dimensional feature space, which provide a tremendous amount of information and new opportunities for improving the quality of human life. At the same time, the continuous arrival of new data creates great challenges, since researchers must convert these raw data into scientific knowledge in order to benefit from them. Association studies of complex diseases using SNP data have become increasingly popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high dimensional correlated data and more specific approaches for SNP data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNP selection using statistical testing/scoring, statistical modeling and machine learning methods, with an emphasis on how to identify interacting loci. Comment: Published at http://dx.doi.org/10.1214/07-SS026 in Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org)
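
The tag SNP selection mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the review: it assumes phased haplotypes coded as 0/1 alleles, uses the standard linkage disequilibrium measure r² = D² / (pA(1 − pA) pB(1 − pB)), and applies a simple greedy rule. The function names, the 0.8 threshold, and the toy data are all hypothetical.

```python
def r_squared(a, b):
    """LD r^2 between two biallelic SNPs given phased 0/1 haplotype alleles."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Greedily pick tag SNPs until every SNP is a tag or has r^2 >= threshold with one."""
    untagged = set(haplotypes)
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs
        # (ties broken alphabetically so the result is deterministic).
        best = max(
            sorted(untagged),
            key=lambda s: sum(
                r_squared(haplotypes[s], haplotypes[t]) >= threshold
                for t in untagged
            ),
        )
        tags.append(best)
        covered = {
            t for t in untagged
            if r_squared(haplotypes[best], haplotypes[t]) >= threshold
        }
        untagged -= covered | {best}
    return tags


# Toy data (hypothetical): snp1 and snp2 are identical, snp3 is only weakly correlated.
haps = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_tags(haps))
```

In this toy case snp1 stands in for snp2, while snp3 must be kept as its own tag. Real studies typically work with unphased genotypes and far larger marker sets, which this sketch does not address.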

arXiv.org e-Print Archive

Crossref

Worldwide genetic variation of the IGHV and TRBV immune receptor gene families in humans.

Author: Li Heng
Luo Shishi
Song Yun
Yu Jane
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

The immunoglobulin heavy variable (IGHV) and T cell beta variable (TRBV) loci are among the most complex and variable regions in the human genome. Generated through a process of gene duplication/deletion and diversification, these loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Here, we present a comprehensive study of the functional gene segments in the IGHV and TRBV loci, quantifying their copy number and single-nucleotide variation in a globally diverse sample of 109 (IGHV) and 286 (TRBV) humans from over 100 populations. We find that the IGHV and TRBV gene families exhibit starkly different patterns of variation. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequencies of common alleles and copy number variants is hampering existing analytical pipelines.

eScholarship - University of California

Exploring Patterns of Epigenetic Information With Data Mining Techniques

Author: Aguiar-Pulido Vanessa
Dorado Julián
Gestal M.
Seoane José A.
Publication venue: 'Bentham Science Publishers Ltd.'
Publication date: 01/01/2013

Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, generating large amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information that are mitotically and/or meiotically heritable and that determine gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in loss of control over cell growth and thus in cancer development. Data mining techniques could then be used to extract these patterns. This work reviews some of the most important applications of data mining to epigenetics. Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366. Galicia. Consellería de Economía e Industria; 10SIN105004PR. Instituto de Salud Carlos III; RD07/0067/000

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Explore Bristol Research

Sequence‐based SNP genotyping in durum wheat

Author: Antoine Janssen
Edwin A. G. van der Vossen
Hoa T. Truong
Jifeng Tang
Marco Maccaferri
Maria Corinna Sanguineti
Nathalie J. van Orsouw
Remco M.P. van Poecke
Roberto Tuberosa
Silvio Salvi
Publication date: 03/05/2013

Summary: Marker development for marker-assisted selection in plant breeding is increasingly based on next-generation sequencing (NGS). However, marker development in crops with highly repetitive, complex genomes is still challenging. Here we applied sequence-based genotyping (SBG), which couples AFLP®-based complexity reduction to NGS, for de novo single nucleotide polymorphism (SNP) marker discovery in and genotyping of a biparental durum wheat population. We identified 9983 putative SNPs in 6372 contigs between the two parents and used these SNPs for genotyping 91 recombinant inbred lines (RILs). Excluding redundant information from multiple SNPs per contig, 2606 (41%) markers were used for integration in a pre-existing framework map, resulting in the integration of 2365 markers over 2607 cM. Of the 2606 markers available for mapping, 91% were integrated in the pre-existing map, which contained 708 SSRs, DArT markers, and SNPs from CRoPS technology, with a map-size increase of 492 cM (23%). These results demonstrate the high quality of the discovered SNP markers. With this methodology, it was possible to saturate the map at a final marker density of 0.8 cM/marker. Looking at the binned marker distribution (Figure 2), 63 of the 268 10-cM bins contained only SBG markers, showing that these markers are filling in gaps in the framework map. For the markers that could not be used for mapping, the main reason was the low sequencing coverage used for genotyping. We conclude that SBG is a valuable tool for efficient, high-throughput and high-quality marker discovery and genotyping for complex genomes such as that of durum wheat.
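
The percentages and the marker density quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers from the abstract. The choice of denominators is an assumption on our part, since the abstract does not state them, and is flagged in the comments.

```python
# Figures quoted in the abstract.
contigs = 6372
markers_for_mapping = 2606   # one marker per contig after removing redundancy
markers_integrated = 2365
final_map_cm = 2607
map_increase_cm = 492
framework_markers = 708      # SSRs, DArT markers and CRoPS SNPs already on the map

# Assumption: the 41% is relative to the number of contigs.
share_used = markers_for_mapping / contigs

share_integrated = markers_integrated / markers_for_mapping

# Assumption: the 23% increase is relative to the map size before integration.
relative_increase = map_increase_cm / (final_map_cm - map_increase_cm)

# Assumption: the final map holds the framework markers plus the integrated SBG markers.
density = final_map_cm / (framework_markers + markers_integrated)

print(f"{share_used:.0%} {share_integrated:.0%} {relative_increase:.0%} {density:.1f}")
# prints "41% 91% 23% 0.8"
```

Under these assumptions all four reported values are reproduced, which suggests the abstract's figures are internally consistent.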

Open Access Repository

SNP-PHAGE – High throughput SNP discovery pipeline

Author: Choi Ik-Young
Cregan Perry B
Grefenstette John J
Hyten David L
Matukumalli Lakshmi K
Van Tassell Curtis P
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs or microsatellites) for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied to sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on a UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided, with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated into this package as an optional feature. The SNP-PHAGE package is being made available open source at .
CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.
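
The core idea of SNP discovery, finding positions where aligned sequences from different individuals disagree, can be sketched in a few lines. This is an illustration only and not the SNP-PHAGE pipeline, which is written in Perl, works from chromatogram traces and stores results in MySQL. The function name `find_snps`, its `min_minor_count` parameter, and the toy reads are hypothetical, and the sketch covers single-base substitutions only, not the short insertion/deletions included in the abstract's definition.

```python
from collections import Counter


def find_snps(aligned, min_minor_count=1):
    """Return (position, allele counts) for alignment columns with more than one base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(length):
        # Count called bases only; gaps and ambiguity codes are ignored.
        counts = Counter(seq[pos] for seq in aligned if seq[pos] in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column
        minor = sorted(counts.values())[-2]  # count of the second most common allele
        if minor >= min_minor_count:
            snps.append((pos, dict(counts)))
    return snps


# Toy alignment (hypothetical): variation at 0-based positions 2 and 7.
reads = ["ACGTACGT", "ACGTACGT", "ACCTACGT", "ACGTACGA"]
for pos, counts in find_snps(reads):
    print(pos, counts)
```

Raising `min_minor_count` is a crude way to filter out variants supported by a single read, which in real trace data are often sequencing errors rather than true polymorphisms.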

DigitalCommons@University of Nebraska

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central