Springer - Publisher Connector

Fraunhofer-ePrints

The University of Manchester - Institutional Repository

The South Asian genome

Author: Abbott J
Afaq S
Afzal U
Aitman TJ
Al-Hussaini A
Butcher S
Chambers JC
Elliott P
Gaulton KJ
Geoghegan F
Grewal J
Kooner IK
Kooner JS
Lavery A
Lehne B
Lewin AM
Li X
Li Y
Loh M
McCarthy MI
Miller K
Mills R
Northwood K
O'Reilly P
Oozageer L
Panoulas V
Pearson RD
Scott J
Scott WR
Sehmi J
Tan ST
Turro E
Vandrovcova J
Wander GS
Wang J
Wass MN
Zhang W
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014
Field of study

Genetics of disease; Microarrays; Variant genotypes; Population genetics; Sequence alignment; Alleles

The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole-genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type 2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.

Whole genome sequencing to discover genetic variants underlying type 2 diabetes, coronary heart disease and related phenotypes amongst Indian Asians. Imperial College Healthcare NHS Trust cBRC 2011-13 (JS Kooner [PI], JC Chambers)

Directory of Open Access Journals

Edinburgh Research Explorer

Spiral - Imperial College Digital Repository

Brunel University Research Archive

University of Queensland eSpace

FigShare

Public Library of Science (PLOS)

Teesside University's Research Repository

LSHTM Research Online

Oxford University Research Archive

Kent Academic Repository

Optimization of miRNA-seq data preprocessing.

Author: McPherson John D
Tam Shirley
Tsao Ming-Sound
Publication venue: eScholarship, University of California
Publication date: 17/04/2015
Field of study

The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and as promising biomarkers for disease. Concurrent developments in high-throughput profiling technology have further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before downstream analyses are conducted. Although often overlooked, data preprocessing is an essential step in data analysis: unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M, DESeq, linear regression, cyclic loess and quantile) on the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
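Two of the simpler normalization methods named above, counts-per-million and upper quartile scaling, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure evaluated in the study: counts for a single sample are held in a plain Python list, and the upper quartile is taken as the 75th percentile of the non-zero counts using `statistics.quantiles` with the inclusive method, which is only one of several conventions in use. The function names `cpm` and `upper_quartile_scale` are hypothetical.

```python
from statistics import quantiles


def cpm(counts):
    """Counts-per-million: scale each miRNA count by the sample's total count (library size)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1e6 / total for c in counts]


def upper_quartile_scale(counts):
    """Upper quartile scaling: divide each count by the 75th percentile of the non-zero counts.

    Assumption: 'inclusive' quantiles are used; other tools may choose a different
    percentile definition or rescale the result afterwards.
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero counts")
    uq = quantiles(nonzero, n=4, method="inclusive")[2]  # third cut point = 75th percentile
    return [c / uq for c in counts]


# Usage with a hypothetical five-miRNA sample
sample = [10, 20, 30, 40, 0]
print(cpm(sample))
print(upper_quartile_scale(sample))
```

CPM corrects only for library size, so a few very abundant miRNAs can dominate the total; scaling by an upper quartile is intended to make the normalization factor less sensitive to those few features.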

CiteSeerX

University of Toronto Research Repository

ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.

Author: Bartlett John MS
Kalatskaya Irina
McPherson John D
Spears Melanie
Stein Lincoln
Trinh Quang M
Publication venue: eScholarship, University of California
Publication date: 01/06/2017
Field of study

Background: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the tumor genome with a reference genome sequence derived from normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison.

Results: In this work, we describe an algorithm that uses a machine learning approach to distinguish somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples. Our algorithm was evaluated using a family of supervised learning classifiers across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested it with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with F1-measures ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues).

Conclusions: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as Open Source under Apache License 2.0 from https://github.com/ikalatskaya/ISOWN
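The results above quote two different quantities. If "correctly classified" is read as recall (the share of true somatic SNVs recovered), the F1-measure additionally penalizes germline variants wrongly called somatic, so it can fall well below recall. A minimal sketch with hypothetical counts illustrates this; the function name `precision_recall_f1` and the numbers are illustrative assumptions, not ISOWN output.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion counts.

    tp: somatic SNVs called somatic; fp: germline variants called somatic;
    fn: somatic SNVs missed (called germline).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, i.e. 2*TP / (2*TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Hypothetical classifier output: 95 of 100 somatic SNVs recovered, but 50 germline
# variants are also labelled somatic.
p, r, f1 = precision_recall_f1(tp=95, fp=50, fn=5)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```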

Directory of Open Access Journals

Bayesian models for syndrome- and gene-specific probabilities of novel variant pathogenicity

Author: Balding DJ
Cook SA
Ruklisa D
Walsh R
Ware JS
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015
Field of study

BACKGROUND: With the advent of affordable and comprehensive sequencing technologies, access to molecular genetics for clinical diagnostics and research applications is increasing. However, variant interpretation remains challenging, and tools that close the gap between data generation and data interpretation are urgently required. Here we present a transferable approach to help address the limitations in variant annotation. METHODS: We develop a network of Bayesian logistic regression models that integrate multiple lines of evidence to evaluate the probability that a rare variant is the cause of an individual's disease. We present models for genes causing inherited cardiac conditions, though the framework is transferable to other genes and syndromes. RESULTS: Our models report a probability of pathogenicity, rather than a categorisation into pathogenic or benign, which captures the inherent uncertainty of the prediction. We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Although the probability scale is continuous and innately interpretable, performance summaries based on thresholds are useful for comparisons. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and a sensitivity of 0.76 for the classification of variants known to cause long QT syndrome across the three most important genes, which represents sufficient accuracy to inform clinical decision-making. A web tool APPRAISE [http://www.cardiodb.org/APPRAISE] provides access to these models and predictions.
CONCLUSIONS: Our Bayesian approach provides a transparent, flexible and robust framework for the analysis and interpretation of rare genetic variants. Models tailored to specific genes outperform genome-wide approaches, and can be sufficiently accurate to inform clinical decision-making.
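Threshold-based summaries such as the positive predictive value and sensitivity quoted above can be computed from any set of predicted probabilities and known labels. The sketch below assumes a plain logistic combination of evidence with hypothetical, fixed weights; it does not reproduce the Bayesian fitting (a posterior over coefficients) or the APPRAISE models, and the function names and data are illustrative. It also assumes a variant is called pathogenic when its probability is at least the threshold.

```python
import math


def combine_evidence(features, weights, intercept):
    """Logistic combination of evidence into a probability of pathogenicity.

    Hypothetical point-estimate weights; a Bayesian model would instead average
    over a posterior distribution of weights.
    """
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def threshold_summary(probs, labels, threshold=0.9):
    """PPV and sensitivity when variants with probability >= threshold are called pathogenic.

    labels[i] is True if variant i is known to be disease-causing.
    """
    called = [p >= threshold for p in probs]
    tp = sum(c and y for c, y in zip(called, labels))
    fp = sum(c and not y for c, y in zip(called, labels))
    fn = sum((not c) and y for c, y in zip(called, labels))
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Hypothetical predictions for five variants
probs = [0.95, 0.99, 0.50, 0.92, 0.10]
labels = [True, True, True, False, False]
print(threshold_summary(probs, labels))
```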

Springer - Publisher Connector

Spiral - Imperial College Digital Repository

University of Melbourne Institutional Repository

Establishing the precise evolutionary history of a gene improves prediction of disease-causing missense mutations

Author: Alexander O. Reznik
CA Wassif
Daniel S. Ory
DE Sleat
DM Jordan
F Chang
FD Porter
GR Oliver
H Börnig
H Jahnova
HJ Kwon
HR Davis Jr
IA Adzhubei
Igor B. Zhulin
JA Tennessen
JD Retief
JE Dickerson
JM White
K Katoh
KA King
M Lynch
M Stampfer
MC Patterson
MT Vanier
NO Stitziel
O Adebali
Ogun Adebali
PC Ng
RD Finn
S Castellana
S Guindon
S Nusca
SB Ng
SF Altschul
SH Katsanis
SR Sunyaev
U Omasits
X Jiang
X Yan
Y Choi
Z Wang
Publication venue: Digital Commons@Becker
Publication date: 01/01/2016
Field of study

PURPOSE: Predicting the phenotypic effects of mutations has become an important application in clinical genetic diagnostics. Computational tools evaluate the behavior of a variant over evolutionary time and assume that variations seen during the course of evolution are probably benign in humans. However, current tools do not take orthologous/paralogous relationships into account. Paralogs have dramatically different roles in Mendelian diseases. For example, whereas inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease-causing and, moreover, are implicated in protection from coronary heart disease. METHODS: We identified major events in NPC1 evolution and, through phylogenetic and protein sequence analyses, identified and compared orthologs and paralogs of the human NPC1 gene. We predicted whether an amino acid substitution affects protein function by reducing the organism’s fitness. RESULTS: Removing the paralogs and distant homologs improved the overall performance of categorizing disease-causing and benign amino acid substitutions. CONCLUSION: The results show that a thorough evolutionary analysis followed by identification of orthologs improves the accuracy of predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well. Genet Med 18(10):1029–1036.
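The core idea, that a substitution observed in a paralog should not count as evidence that it is tolerated in the human protein, can be illustrated with a toy check. This is a minimal sketch under stated assumptions: the alignment, sequence names and relationship labels are hypothetical (not real NPC1 or NPC1L1 sequences), and the simple "is the residue seen in any included homolog" rule stands in for the fitness-based prediction used in the study.

```python
def tolerated_in_homologs(alignment, position, residue, allowed_relationships):
    """Return True if `residue` appears at `position` in any homolog whose
    relationship to the human gene is in `allowed_relationships`.

    alignment: list of (name, relationship, aligned_sequence) tuples, where all
    sequences share the same alignment coordinates.
    """
    return any(
        seq[position] == residue
        for _name, relationship, seq in alignment
        if relationship in allowed_relationships
    )


# Hypothetical four-column alignment (0-based positions)
alignment = [
    ("human", "query", "MKLV"),
    ("ortholog_1", "ortholog", "MKLV"),
    ("ortholog_2", "ortholog", "MRLV"),
    ("paralog_1", "paralog", "MKAV"),
]

# Candidate substitution L -> A at position 2
print(tolerated_in_homologs(alignment, 2, "A", {"ortholog", "paralog"}))  # True: seen in the paralog
print(tolerated_in_homologs(alignment, 2, "A", {"ortholog"}))             # False: orthologs only
```

With the paralog included, the substitution looks tolerated; restricting the comparison to orthologs leaves it unobserved, which is the direction of the improvement the authors report.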

Digital Commons@Becker

arXiv.org e-Print Archive

Complex sequencing rules of birdsong can be explained by simple hidden Markov processes

Author: A Leonardo
AC Yu
AJ Doupe
C Bishop
D MacKay
DZ Jin
E Honda
Gonzalo G. de Polavieja
H Attias
JA Kogan
JL Fleiss
JT Sakata
K Katahira
K Okanoya
Kazuo Okanoya
Kenta Suzuki
Kentaro Katahira
LR Rabiner
M Beal
MA Long
Masato Okada
MJ Wohlgemuth
O Tchernichovski
RHR Hahnloser
SJ Sober
T Hosino
W Wu
Y Yamashita
Publication venue: Public Library of Science (PLoS)
Publication date: 11/11/2010
Field of study

Complex sequencing rules observed in birdsongs provide an opportunity to investigate the neural mechanisms that generate complex sequential behaviors. To relate findings from birdsong studies to other sequential behaviors, it is crucial to characterize the statistical properties of the sequencing rules in birdsongs. However, these properties have not yet been fully addressed. In this study, we investigate the statistical properties of the complex birdsong of the Bengalese finch (Lonchura striata var. domestica). Based on manually annotated syllable sequences, we first show that there are significant higher-order context dependencies in Bengalese finch songs; that is, which syllable appears next depends on more than one previous syllable. This property is shared with other complex sequential behaviors. We then analyze acoustic features of the song and show that higher-order context dependencies can be explained using first-order hidden state transition dynamics with redundant hidden states. This model corresponds to hidden Markov models (HMMs), well-known statistical models with a wide range of applications in time series modeling. Song annotation by these models with first-order hidden state dynamics agreed well with manual annotation: the score was comparable to that of a second-order HMM and surpassed that of the zeroth-order model (the Gaussian mixture model (GMM)), which does not use context information. Our results imply that a hierarchical representation with hidden state dynamics may underlie the neural implementation for generating complex sequences with higher-order dependencies.
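The central claim, that first-order dynamics over redundant hidden states can produce higher-order dependencies between observed syllables, can be shown with a toy model. This sketch is an illustration, not the fitted Bengalese finch model: two hidden states (`A1`, `A2`) both emit syllable `a`, emissions are deterministic rather than acoustic, and all state names and probabilities are hypothetical. A forward recursion filters the hidden state from the observed history and then predicts the next syllable.

```python
from collections import defaultdict

# Hypothetical toy model: hidden states and the syllable each one emits.
# A1 and A2 are redundant hidden states that both emit syllable "a".
EMIT = {"X": "x", "Y": "y", "A1": "a", "A2": "a", "B": "b", "C": "c"}
START = {"X": 0.5, "Y": 0.5}
# First-order transitions between hidden states.
TRANS = {
    "X": {"A1": 1.0},
    "Y": {"A2": 1.0},
    "A1": {"B": 1.0},
    "A2": {"C": 1.0},
    "B": {"X": 0.5, "Y": 0.5},
    "C": {"X": 0.5, "Y": 0.5},
}


def forward(observed):
    """Forward recursion: normalized distribution over hidden states given the observed syllables."""
    alpha = {s: p for s, p in START.items() if EMIT[s] == observed[0]}
    for syllable in observed[1:]:
        nxt = defaultdict(float)
        for s, p in alpha.items():
            for t, q in TRANS[s].items():
                if EMIT[t] == syllable:  # deterministic emission: keep only consistent states
                    nxt[t] += p * q
        alpha = nxt
    total = sum(alpha.values())
    if total == 0:
        raise ValueError("sequence has zero probability under this model")
    return {s: p / total for s, p in alpha.items()}


def predict_next(observed):
    """Distribution of the next syllable given the observed history."""
    dist = defaultdict(float)
    for s, p in forward(observed).items():
        for t, q in TRANS[s].items():
            dist[EMIT[t]] += p * q
    return dict(dist)


print(predict_next(["x", "a"]))
print(predict_next(["y", "a"]))
```

Although every transition depends only on the current hidden state, the syllable after `a` is `b` when the preceding syllable was `x` and `c` when it was `y`; a first-order Markov chain over the syllables themselves could not express that difference.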

Public Library of Science (PLOS)

Directory of Open Access Journals

The non-coding landscape of head and neck squamous cell carcinoma.

Author: Grandis Jennifer R
Hinton Andrew
Honda Thomas K
King Charles C
Korrapati Avinaash
Ku Jonjei
Kuo Selena Z
Lippman Scott M
Ongkeko Weg M
Rahimy Mehran
Saad Maarouf A
Singh Pranav
Wang Xiao Qi
Wang-Rodriguez Jessica
Xuan Yinan
Yu Vicky
Zheng Hao
Zou Angela E
Publication venue: eScholarship, University of California
Publication date: 01/01/2016
Field of study

Head and neck squamous cell carcinoma (HNSCC) is an aggressive disease marked by frequent recurrence and metastasis and by stagnant survival rates. To enhance molecular knowledge of HNSCC and define a non-coding RNA (ncRNA) landscape of the disease, we profiled the transcriptome-wide dysregulation of long non-coding RNA (lncRNA), microRNA (miRNA), and PIWI-interacting RNA (piRNA) using RNA-sequencing data from 422 HNSCC patients in The Cancer Genome Atlas (TCGA). A total of 307 non-coding transcripts differentially expressed in HNSCC were significantly correlated with patient survival, and associated with mutations in TP53, CDKN2A, CASP8, PRDM9, and FBXW7 and with copy number variations in chromosomes 3, 5, 7, and 18. We also observed widespread ncRNA correlation with concurrent TP53 and chromosome 3p loss, a compelling predictor of poor prognosis in HNSCC. Three selected ncRNAs were additionally associated with tumor stage, HPV status, and other clinical characteristics, and modulating their expression in vitro revealed differential regulation of genes involved in epithelial-mesenchymal transition and apoptotic response. This comprehensive characterization of the HNSCC non-coding transcriptome introduces new layers of understanding of the disease, and nominates a novel panel of transcripts with potential utility as prognostic markers or therapeutic targets.