Bioinformatics tools for analysing viral genomic data

Author: Davison A.
Gu Q.
Hughes J.
Maabar M.
Modha S.
Orton R.J.
Vattipally Sreenu
Wilkie G.S.
Publication venue: O.I.E (World Organisation for Animal Health)
Publication date: 01/04/2016

The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing
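The first step named in the abstract above, quality control of raw reads, can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the Phred+33 encoding and the mean-quality cutoff of 20 are all assumptions chosen for illustration, and real pipelines normally rely on dedicated QC tools.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records, min_mean_q: float = 20.0):
    """Keep (name, sequence, quality) records whose mean quality passes the cutoff."""
    # Records with an empty quality string are dropped rather than scored.
    return [rec for rec in records if rec[2] and mean_phred(rec[2]) >= min_mean_q]


# Hypothetical toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "####"),
]
kept = filter_reads(reads)
print([name for name, _, _ in kept])
```

Filtering on the mean score is only one of several possible criteria; trimming low-quality read ends is a common alternative.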

Enlighten

Plasmodium knowlesi Genome Sequences from Clinical Isolates Reveal Extensive Genomic Dimorphism.

Author: A Conesa
A Ecker
A Pain
A Sinha
AR Quinlan
B Langmead
B Singh
C Aurrecoechea
C Daneshvar
C Hertz-Fowler
CJ Sutherland
EV Meyer
FA Fatih
H Li
I Kozarewa
J Cox-Singh
Janet Cox-Singh
JB Koenderink
JC Barrett
JC de Roode
JC Rayner
Julian C. Rayner
K Rutherford
KS Lee
L Mamanova
LH Miller
M Ashburner
M Rougemont
M Yuda
MA Ahmed
Md Atique Ahmed
Miguel M. Pinheiro
Osamu Kaneko
P Jones
P Librado
PC Divis
PM Jones
R Leinonen
R Tanizaki
R Tripathi
S Iwanaga
S Purcell
Sanjeev Krishna
Scott B. Millar
SE Lindner
SO Oyola
SW Roy
T Carver
T William
TC Yeo
Theo Sanderson
Thomas D. Otto
U Bronner
Woon Chan Lu
YL Lau
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

Plasmodium knowlesi is a newly described zoonosis that causes malaria in the human population, which can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high-throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed, leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses of even small numbers of difficult clinical malaria isolates can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Unimas Institutional Repository

Enlighten

St George's Online Research Archive

University of St. Andrews - Pure

St Andrews Research Repository

FigShare

From parasite genomes to one healthy world: Are we having fun yet?

Author: Gasbarre Louis C.
Zarlenga Dante
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2009

In 1990, the Human Genome Sequencing Project was established. This laid the groundwork for the explosion of sequence data that has since followed. As a result of this effort, the first complete genome of an animal, Caenorhabditis elegans, was published in 1998. The sequence of Drosophila melanogaster was made available in March 2000 and, in the following year, working drafts of the human genome were generated, with the completed sequence (92%) being released in 2003. Recent advancements and next-generation technologies have made sequencing commonplace and have infiltrated every aspect of biological research, including parasitology. To date, sequencing of 32 apicomplexa and 24 nematode genomes is either in progress or near completion, and over 600k nematode EST and 200k apicomplexa EST submissions fill the databases. However, the winds have shifted and efforts are now refocusing on how best to store, mine and apply these data to problem solving. Herein we tend not to summarize existing X-omics datasets or present new technological advances that promise future benefits. Rather, the information to follow condenses up-to-date applications of existing technologies to problem solving as it relates to parasite research. Advancements in non-parasite systems are also presented with the proviso that applications to parasite research are in the making

DigitalCommons@University of Nebraska

INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

Author: Hu Xin
Publication venue: DigitalCommons@TMC
Publication date: 01/05/2017

Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines that leverage massive genomic variations and integrate them with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, applying high-dimensional omics data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by an increased presence of tumor-associated microglia. Then I have identified the sole source of microglia as the intrinsic tumor bulk, but not the corresponding neurosphere cells, through both transcriptional and protein-level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, I have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in contributing to intra-tumor heterogeneity and to the accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant impact from a clinical perspective but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that are associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other glioma subsets. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of the TCGA cancer types through transcriptomic profiling. Then I have predicted fusion proteins with kinase activity and hub function in the pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts by integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, therefore facilitating our understanding of genomic alterations and moving us towards the development of precision medicine

DigitalCommons@The Texas Medical Center

A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Author: Aaron R. Jex
Altschul
Anja Joachim
Ashburner
Bentley
Bethony
Björnberg
Blaxter
Boag
Bronwyn E. Campbell
Caffrey
Campbell
Cantacessi
Chan
Chang
Cinzia Cantacessi
Clifton
Conesa
Cottee
Datu
DeRisi
Doyle
Flicek
Freigofas
Gasser
Golden
Greene
Gupta
Hawdon
Hopkins
Hotez
Hu
Huang
Hunter
Iseli
Jackson
Joachim
Keil
Krasky
Letunic
Li
Lipinski
Makedonka Mitreva
Margulies
Matthew J. Nolan
McKay
Metzker
Miller
Mizuarai
Moreno
Morozova
Moser
Mufson
Mulvenna
Nagaraj
Neil D. Young
Nikolaou
Nisbet
Olson
Parkinson
Paul W. Sternberg
Pong
Portman
Ranganathan
Ren
Robertson
Robin B. Gasser
Robinson
Ross S. Hall
Sahar Abubucker
Sanger
Santos
Shoba Ranganathan
Soderlund
Stathopoulos
Stockdale
Tanaka
Vibranovski
Wang
Williamson
Wilson
Wu
Young
Zhan
Zhong
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2010

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism

CiteSeerX

ResearchOnline@JCU

Crossref

ResearchOnline at James Cook University

PubMed Central

Digital Commons@Becker

Caltech Authors

UGD Academic Repository

Macquarie University ResearchOnline

University of Melbourne Institutional Repository

Comparison of TCGA and GENIE genomic datasets for the detection of clinically actionable alterations in breast cancer.

Author: Carpten John D
Kaur Pushpinder
Lang Julie E
Porras Tania B
Ring Alexander
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

Whole exome sequencing (WES), targeted gene panel sequencing and single nucleotide polymorphism (SNP) arrays are increasingly used for the identification of actionable alterations that are critical to cancer care. Here, we compared The Cancer Genome Atlas (TCGA) and the Genomics Evidence Neoplasia Information Exchange (GENIE) breast cancer genomic datasets (array and next generation sequencing (NGS) data) in detecting genomic alterations in clinically relevant genes. We performed an in silico analysis to determine the concordance in the frequencies of actionable mutations and copy number alterations/aberrations (CNAs) in the two most common breast cancer histologies, invasive lobular and invasive ductal carcinoma. We found that targeted sequencing identified a larger number of mutational hotspots and clinically significant amplifications that would have been missed by WES and SNP arrays in many actionable genes such as PIK3CA, EGFR, AKT3, FGFR1, ERBB2, ERBB3 and ESR1. The striking differences between the number of mutational hotspots and CNAs generated from these platforms highlight a number of factors that should be considered in the interpretation of array and NGS-based genomic data for precision medicine. Targeted panel sequencing was preferable to WES to define the full spectrum of somatic mutations present in a tumor

Directory of Open Access Journals

eScholarship - University of California

Rapid Sequence Identification of Potential Pathogens Using Techniques from Sparse Linear Algebra

Author: Chiu Nelson
Dodson Stephanie
Kepner Jeremy
Ricke Darrell O.
Shcherbina Anna
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/01/2015

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method leverages linear algebra and statistical properties to increase computational performance while retaining accuracy by subsampling the data. Two run modes, Fast and Wise, yield speed and precision tradeoffs, with applications in biodefense and medical diagnostics. The D4RAGenS analysis algorithm is tested over several datasets, including three utilized for the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest
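The general idea behind sequence identification with sparse linear algebra can be sketched as follows. This is not the authors' algorithm or the D4M API: plain Python dictionaries stand in for sparse matrices, and the names `build_index` and `score_sample`, the k-mer length and the `step` subsampling parameter are hypothetical choices made for illustration. Counting shared k-mers in this way corresponds to the nonzero entries of a product between a sparse read-by-k-mer matrix and a sparse k-mer-by-reference matrix.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references: dict, k: int) -> dict:
    """Sparse k-mer -> {reference names} incidence map."""
    index = {}
    for name, seq in references.items():
        for km in set(kmers(seq, k)):
            index.setdefault(km, set()).add(name)
    return index


def score_sample(reads, index: dict, k: int, step: int = 1) -> Counter:
    """Count k-mers shared between (optionally subsampled) reads and each reference."""
    scores = Counter()
    for read in reads[::step]:  # step > 1 keeps only every step-th read
        for km in set(kmers(read, k)):
            for name in index.get(km, ()):
                scores[name] += 1
    return scores


# Hypothetical toy data, not drawn from any real dataset.
references = {"refA": "ACGTACGTGA", "refB": "TTTTGGGGCC"}
reads = ["ACGTAC", "GGGGCC", "CGTGA"]
index = build_index(references, k=4)
scores = score_sample(reads, index, k=4)
print(scores.most_common())
```

Raising `step` trades accuracy for speed by scoring fewer reads, which mirrors in a very simplified way the subsampling tradeoff described in the abstract.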

arXiv.org e-Print Archive

Crossref

Whole-Genome Sequencing and Concordance Between Antimicrobial Susceptibility Genotypes and Phenotypes of Bacterial Isolates Associated with Bovine Respiratory Disease.

Author: Abdo Zaid
Aly Sharif S
Belk Keith
Blanchard Patricia C
Davis Jessica H
Lehenbauer Terry W
Miller Michael R
Morley Paul
Noyes Noelle
O'Rourke Sean M
Owen Joseph R
Prince Daniel J
Van Eenennaam Alison L
Young Amy E
Publication venue: eScholarship, University of California
Publication date: 01/09/2017

Extended laboratory culture and antimicrobial susceptibility testing timelines hinder rapid species identification and susceptibility profiling of bacterial pathogens associated with bovine respiratory disease, the most prevalent cause of cattle mortality in the United States. Whole-genome sequencing offers a culture-independent alternative to current bacterial identification methods, but requires a library of bacterial reference genomes for comparison. To contribute new bacterial genome assemblies and evaluate genetic diversity and variation in antimicrobial resistance genotypes, whole-genome sequencing was performed on bovine respiratory disease-associated bacterial isolates (Histophilus somni, Mycoplasma bovis, Mannheimia haemolytica, and Pasteurella multocida) from dairy and beef cattle. One hundred genomically distinct assemblies were added to the NCBI database, doubling the available genomic sequences for these four species. Computer-based methods identified 11 predicted antimicrobial resistance genes in three species, with none being detected in M. bovis. While computer-based analysis can identify antibiotic resistance genes within whole-genome sequences (genotype), it may not predict the actual antimicrobial resistance observed in a living organism (phenotype). Antimicrobial susceptibility testing on 64 H. somni, M. haemolytica, and P. multocida isolates had an overall concordance rate between genotype and phenotypic resistance to the associated class of antimicrobials of 72.7% (P < 0.001), showing substantial discordance. Concordance rates varied greatly among different antimicrobial, antibiotic resistance gene, and bacterial species combinations. This suggests that antimicrobial susceptibility phenotypes are needed to complement genomically predicted antibiotic resistance gene genotypes to better understand how the presence of antibiotic resistance genes within a given bacterial species could potentially impact optimal bovine respiratory disease treatment and morbidity/mortality outcomes
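A concordance rate of the kind reported above can be computed as the fraction of isolate and antimicrobial comparisons in which the genotype prediction agrees with the observed phenotype. The sketch below assumes that simple definition; the function name and the toy data are hypothetical and do not reproduce the study's data or its statistical test.

```python
def concordance_rate(pairs) -> float:
    """Fraction of (genotype_resistant, phenotype_resistant) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no genotype/phenotype pairs supplied")
    agree = sum(1 for genotype, phenotype in pairs if genotype == phenotype)
    return agree / len(pairs)


# Hypothetical toy comparisons: True means resistant, False means susceptible.
toy_pairs = [
    (True, True),    # resistance gene present, isolate resistant: concordant
    (False, False),  # no gene, isolate susceptible: concordant
    (True, False),   # gene present, isolate susceptible: discordant
    (False, False),  # concordant
]
rate = concordance_rate(toy_pairs)
print(f"{rate:.1%}")
```

In practice the rate would be broken down by antimicrobial class, resistance gene and species, since the abstract notes that concordance varied greatly across those combinations.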

Directory of Open Access Journals

eScholarship - University of California