Haplotype Assembly and Small Variant Calling using Emerging Sequencing Technologies
Short-read DNA sequencing technologies from Illumina have made sequencing a human genome significantly more affordable, greatly accelerating studies of biological function and the association of genetic variants with disease. These technologies are frequently used to detect small genetic variants such as single nucleotide variants (SNVs) using a reference genome. However, short-read sequencing technologies have several limitations. First, the human genome is diploid, and short reads contain limited information for assembling haplotypes, that is, the sequences of alleles on homologous chromosomes. Moreover, a significant amount of input DNA is required, which poses challenges for analyzing single cells. Further, there is limited ability to detect genetic variants inside long duplicated sequences in the genome. As a result, there has been widespread development of novel methods that overcome these deficiencies while still using short reads. These include clone-based sequencing, linked-read sequencing and proximity ligation sequencing, as well as various single-cell sequencing methods. There are also entirely new sequencing technologies from Pacific Biosciences and Oxford Nanopore Technologies that produce significantly longer reads. While these emerging methods and technologies demonstrate improvements compared to short reads, they also have properties and error modalities that pose unique computational challenges. Moreover, compared to short reads, there is a shortage of bioinformatics methods for accurate small variant detection and haplotype assembly using these approaches. This dissertation aims to address this problem by introducing several new algorithms for highly accurate haplotype assembly and SNV calling. First, it introduces HapCUT2, an algorithm that can rapidly assemble haplotypes using a broad range of sequencing technologies.
Second, it introduces an algorithm for variant calling and haplotyping using SISSOR, a recently introduced microfluidics-based technology for sequencing single cells. Finally, it introduces Longshot, an algorithm for detecting and phasing SNVs using error-prone long-read technologies. In each case, the algorithms are benchmarked using multiple real whole-genome sequencing datasets and are found to be highly accurate. The methods introduced in this dissertation contribute to the goal of sequencing diploid genomes accurately and completely for a broad range of scientific and clinical purposes.
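To make the haplotype assembly problem more concrete, the sketch below scores a candidate haplotype with the minimum error correction (MEC) criterion, a common objective in the haplotype assembly literature. It is an illustration under simplifying assumptions (biallelic heterozygous sites coded 0/1, each read fragment given as a map from variant index to observed allele) and is not the HapCUT2 algorithm itself; the exhaustive search is only feasible for a handful of variants. All function names are hypothetical.

```python
from itertools import product


def fragment_cost(fragment, haplotype):
    """Corrections needed to make one fragment agree with either haplotype.

    fragment: dict mapping variant index -> observed allele (0 or 1)
    haplotype: sequence of 0/1 alleles; the other haplotype is its complement
    """
    mismatches_h1 = sum(1 for pos, allele in fragment.items() if haplotype[pos] != allele)
    # At a biallelic heterozygous site the complement disagrees exactly where h1 agrees.
    mismatches_h2 = len(fragment) - mismatches_h1
    return min(mismatches_h1, mismatches_h2)


def mec_score(fragments, haplotype):
    """Minimum error correction score: total corrections over all fragments."""
    return sum(fragment_cost(f, haplotype) for f in fragments)


def brute_force_phase(fragments, n_variants):
    """Exhaustively find a haplotype with minimal MEC (tiny inputs only)."""
    return min(product((0, 1), repeat=n_variants),
               key=lambda h: mec_score(fragments, h))


# Toy fragments: each dict lists the alleles one read observed at the variants it covers.
fragments = [
    {0: 0, 1: 1},
    {1: 0, 2: 0},
    {0: 0, 2: 0, 3: 0},
]
best = brute_force_phase(fragments, 4)
print(best, mec_score(fragments, best))
```

Because these three fragments cannot all be made consistent with one haplotype pair, the best achievable score is 1; practical tools replace the exhaustive search with heuristics or probabilistic models that scale to whole genomes.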
Acute Myeloid Leukemia
Acute myeloid leukemia (AML) is the most common type of leukemia. The Cancer Genome Atlas Research Network has demonstrated the increasing genomic complexity of AML and has facilitated our understanding of the molecular events leading to this deadly malignancy, for which the prognosis has not improved over past decades. AML is a highly heterogeneous disease, and cytogenetic and molecular analysis of its various chromosomal aberrations, including deletions, duplications, aneuploidy, balanced reciprocal translocations and fusions of transcription factor genes and tyrosine kinases, has led to a better understanding and identification of AML subgroups with different prognoses. Furthermore, molecular classification based on mRNA expression profiling has facilitated the identification of novel subclasses and defined high-risk, poor-prognosis AML based on specific molecular signatures. However, despite increased understanding of AML genetics, the outcome for AML patients, whose number is likely to rise as the population ages, has not changed significantly. Until it does, further investigation of the genomic complexity of the disease and advances in drug development are needed. In this review, leading AML clinicians and research investigators provide an up-to-date understanding of the molecular biology of the disease, addressing advances in diagnosis, classification, prognostication and therapeutic strategies that may have significant promise and impact on overall patient survival.
Bacteria homologous to Aeromonas capable of microcystin degradation
Water blooms dominated by cyanobacteria
are capable of producing hepatotoxins known as
microcystins. These toxins are dangerous to people and
to the environment. Therefore, for a better understanding
of the biological termination of this increasingly
common phenomenon, bacteria with the potential to
degrade cyanobacteria-derived hepatotoxins and the
degradative activity of culturable bacteria were studied.
Based on the presence of the mlrA gene, bacteria
homologous to the Sphingopyxis and Stenotrophomonas
genera were identified as having potential for
microcystin degradation directly in the water samples
from the Sulejów Reservoir (SU, Central Poland). However,
this biodegradation potential was not confirmed in
vitro. The degrading activity of the culturable
isolates from the water studied was determined in more
than 30 bacterial mixes. An analysis of the biodegradation
of microcystin-LR (MC-LR), together with an analysis of
the phylogenetic affiliation of the bacteria, demonstrated for
the first time that bacteria homologous to the Aeromonas
genus were able to degrade this hepatotoxin,
although the mlrA gene was not amplified. The maximal
removal efficiency of MC-LR was 48%. This study
demonstrates a new aspect of interactions between the
microcystin-containing cyanobacteria and bacteria from
the Aeromonas genus.
The authors would like to
acknowledge the European Cooperation in Science
and Technology, COST Action ES 1105 “CYANOCOST -
Cyanobacterial blooms and toxins in water resources:
Occurrence, impacts and management” for adding value
to this study through networking and knowledge sharing
with European experts and researchers in the field. The
Sulejów Reservoir is a part of the Polish National Long-
Term Ecosystem Research Network and the European
LTER site.
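The 48% figure above is a removal efficiency. The abstract does not state how it was calculated; assuming the usual definition (the percentage decrease in MC-LR concentration relative to the starting concentration), it can be computed as in this small sketch. The function name and the concentrations in the example are hypothetical, not data from the study.

```python
def removal_efficiency(c_initial, c_final):
    """Percentage of toxin removed, assuming RE = (C0 - Ct) / C0 * 100."""
    if c_initial <= 0:
        raise ValueError("initial concentration must be positive")
    return (c_initial - c_final) / c_initial * 100.0


# Hypothetical MC-LR concentrations (e.g. in µg/L) at the start and end of an assay
print(round(removal_efficiency(10.0, 5.2), 1))
```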
Computational Methods for Sequencing and Analysis of Heterogeneous RNA Populations
Next-generation sequencing (NGS) and mass spectrometry technologies bring unprecedented throughput, scalability and speed, facilitating the study of biological systems. These technologies make it possible to sequence and analyze heterogeneous RNA populations rather than single sequences. In particular, they provide the opportunity to implement massive viral surveillance and transcriptome quantification. However, to fully exploit the capabilities of NGS technology, we need computational methods able to analyze billions of reads for the assembly and characterization of sampled RNA populations.
In this work we present novel computational methods for cost- and time-effective analysis of sequencing data from viral and RNA samples. In particular, we describe: i) computational methods for transcriptome reconstruction and quantification; ii) a method for mass spectrometry data analysis; iii) a combinatorial pooling method; and iv) computational methods for the analysis of intra-host viral populations.
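Transcriptome quantification from reads that map ambiguously to several transcripts is commonly framed as maximum-likelihood estimation solved with expectation-maximization (EM). The sketch below shows that general idea under strong simplifying assumptions: each read is reduced to the set of transcripts it is compatible with, and transcript lengths and read-position effects are ignored. It is a generic illustration, not the specific method described in this work, and the function name is hypothetical.

```python
def em_abundances(compat_sets, n_transcripts, n_iter=200):
    """Estimate relative transcript abundances from read-compatibility sets.

    compat_sets: list of sets; each set holds the indices of transcripts a read maps to.
    Transcript lengths are ignored (a simplifying assumption).
    """
    reads = [s for s in compat_sets if s]  # drop reads that map nowhere
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting abundances
    if not reads:
        return theta
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each read among its compatible transcripts in proportion to theta
        for s in reads:
            total = sum(theta[t] for t in s)
            for t in s:
                expected[t] += theta[t] / total
        # M-step: new abundances are the expected fractions of reads per transcript
        theta = [e / len(reads) for e in expected]
    return theta


# 3 reads unique to transcript 0, 1 unique to transcript 1, 4 shared by both
reads = [{0}] * 3 + [{1}] + [{0, 1}] * 4
print([round(x, 3) for x in em_abundances(reads, 2)])
```

The shared reads are not simply split evenly: EM assigns them in proportion to the evidence from the unique reads, converging here to abundances of 0.75 and 0.25.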
A Characterization of the DNA Data Storage Channel
Owing to its longevity and enormous information density, DNA, the molecule
encoding biological information, has emerged as a promising archival storage
medium. However, due to technological constraints, data can only be written
onto many short DNA molecules that are stored in an unordered way, and can only
be read by sampling from this DNA pool. Moreover, imperfections in writing
(synthesis), reading (sequencing), storage, and handling of the DNA, in
particular amplification via PCR, lead to a loss of DNA molecules and induce
errors within the molecules. In order to design DNA storage systems, a
qualitative and quantitative understanding of the errors and the loss of
molecules is crucial. In this paper, we characterize those error probabilities
by analyzing data from our own experiments as well as from experiments by two
other groups. We find that errors within molecules are mainly due to
synthesis and sequencing, while imperfections in handling and storage lead to a
significant loss of sequences. The aim of our study is to help guide the design
of future DNA data storage systems by providing a quantitative and qualitative
understanding of the DNA data storage channel.
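The channel described above (a pool of short molecules, some of which are lost, read by random sampling with errors inside each molecule) can be sketched as a toy simulator. The independent per-base substitution, deletion and insertion model and all rates used here are illustrative assumptions, not the error statistics measured in the study; the function names are hypothetical.

```python
import random

BASES = "ACGT"


def corrupt(seq, p_sub, p_del, p_ins, rng):
    """Apply independent per-base deletions, substitutions and insertions."""
    out = []
    for base in seq:
        r = rng.random()
        if r < p_del:
            continue  # deletion: drop this base
        if r < p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))  # substitution
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion after this position
    return "".join(out)


def simulate_channel(pool, p_loss, n_reads, p_sub, p_del, p_ins, seed=0):
    """Lose molecules, then sample reads (with replacement) from the survivors."""
    rng = random.Random(seed)
    survivors = [m for m in pool if rng.random() >= p_loss]
    if not survivors:
        return []
    return [corrupt(rng.choice(survivors), p_sub, p_del, p_ins, rng)
            for _ in range(n_reads)]


pool = ["ACGTACGTAC", "TTGACCATGA", "GGCATTACGA"]
for read in simulate_channel(pool, p_loss=0.1, n_reads=5,
                             p_sub=0.01, p_del=0.005, p_ins=0.005):
    print(read)
```

Separating molecule loss from within-molecule errors mirrors the distinction drawn above between handling/storage effects and synthesis/sequencing effects, although a realistic model would fit position- and process-specific rates to data.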
On the design of clone-based haplotyping
Background: Haplotypes are important for assessing genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches. Experimental haplotype reconstruction based on assembling fragments of individual chromosomes is promising, but yields vary because the effects of parameter choices are incompletely understood. Results: We parameterize the clone-based haplotyping problem in order to provide theoretical and empirical assessments of the impact of different parameters on haplotype assembly. We confirm the intuition that long clones help link together heterozygous variants and thus improve haplotype length. Furthermore, given the length of the clones, we address how to choose the other parameters, including number of pools, clone coverage and sequencing coverage, so as to maximize haplotype length. We model the problem theoretically and show empirically the benefits of using larger clones with a moderate number of pools and moderate sequencing coverage. In particular, using 140 kb BAC clones, we construct haplotypes for a personal genome and assemble haplotypes with N50 values greater than 2.6 Mb. These assembled haplotypes are longer than, and at least as accurate as, haplotypes from existing clone-based strategies, whether in vivo or in vitro. Conclusions: Our results provide practical guidelines for the development and design of clone-based methods to achieve long-range, high-resolution and accurate haplotypes.
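Haplotype N50 summarizes contiguity: it is the length L such that haplotype blocks of length at least L together cover at least half of the total assembled length. A minimal sketch of that standard definition follows; the block lengths in the example are made up, and a related variant (NG50) uses half the genome size as the threshold instead of half the assembled length.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths (e.g. haplotype blocks)."""
    if not lengths:
        raise ValueError("no blocks")
    half = sum(lengths) / 2
    running = 0
    # Walk blocks from longest to shortest until half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Hypothetical haplotype block lengths in kb
print(n50([1200, 400, 3100, 800, 2000]))
```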
The application of genomic technologies to cancer and companion diagnostics.
This thesis describes work undertaken by the author between 1996 and 2014. Genomics is the
study of the genome, although it is also often used as a catchall phrase and applied to the
transcriptome (study of RNAs) and methylome (study of DNA methylation). As cancer is a
disease of the genome, the rapid advances in genomic technology, specifically microarrays
and next generation sequencing, are creating a wave of change in our understanding of its
molecular pathology. Molecular pathology and personalised medicine are being driven by
discoveries in genomics, and genomics is being driven by the development of faster, better
and cheaper genome sequencing. The next decade is likely to see significant changes in the
way cancer is managed for individual cancer patients as next generation sequencing enters the
clinic.
In chapter 3 I discuss how ERBB2 amplification testing for breast cancer is currently
dominated by immunohistochemistry (a single-gene test), and present the development, by
the author, of a semi-quantitative PCR test for ERBB2 amplification. I also show that
estimating ERBB2 amplification from microarray copy-number analysis of the genome is
possible. In chapter 4 I present a review of microarray comparison studies, and outline the
case for careful and considered comparison of technologies when selecting a platform for use
in a research study. Similar, indeed more stringent, care needs to be applied when selecting a
platform for use in a clinical test. In chapter 5 I present co-authored work on the development
of amplicon and exome methods for the detection and quantitation of somatic mutations in
circulating tumour DNA, and demonstrate the impact this can have in understanding tumour
heterogeneity and evolution during treatment. I also demonstrate how next-generation
sequencing technologies may allow multiple genetic abnormalities to be analysed in a single
test, and in low-cellularity tumours and/or heterogeneous cancers.
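The summary does not say how the semi-quantitative PCR test converts cycle thresholds (Ct) into an ERBB2 copy-number estimate. One widely used approach is the comparative Ct (2^-ΔΔCt) method, which normalises ERBB2 against a reference gene and against a calibrator sample with normal copy number. The sketch below assumes that method, roughly 100% amplification efficiency and a hypothetical amplification cut-off, none of which are taken from the thesis; function names are hypothetical.

```python
AMPLIFICATION_CUTOFF = 2.0  # hypothetical ratio threshold, not a validated clinical value


def relative_copy_number(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Comparative Ct estimate of target copy number relative to a calibrator.

    Assumes both assays amplify with ~100% efficiency (doubling every cycle).
    """
    delta_sample = ct_target - ct_reference              # ERBB2 vs reference gene in tumour DNA
    delta_calibrator = ct_target_cal - ct_reference_cal  # same difference in normal DNA
    delta_delta = delta_sample - delta_calibrator
    return 2 ** (-delta_delta)


def is_amplified(ratio, cutoff=AMPLIFICATION_CUTOFF):
    return ratio >= cutoff


# Hypothetical Ct values: ERBB2 crosses threshold 3 cycles earlier than in the calibrator
ratio = relative_copy_number(22.0, 25.0, 25.0, 25.0)
print(ratio, is_amplified(ratio))
```

Each cycle earlier corresponds to roughly a doubling of template, so a ΔΔCt of -3 implies about eight times the calibrator's ERBB2 copy number, subject to the efficiency assumption.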
Keywords: Genome, exome, transcriptome, amplicon, next-generation sequencing,
differential gene expression, RNA-seq, ChIP-seq, microarray, ERBB2, companion diagnostic
A novel genotyping approach to improve transfusion support for patients with HLA and/or HPA alloantibodies
Patients who require platelet transfusion support but have become sensitised to Human Leucocyte Antigens (HLA) or Human Platelet Antigens (HPA) require suitably matched or selected products to avoid adverse transfusion reactions caused by antibodies reacting with the transfused product. Providing compatible products for these patients is often challenging and requires significant resources from the blood service. This study set out to develop and implement next generation sequencing (NGS) technology to enhance the HLA and HPA definition of both platelet donors and recipients.
An NGS-based method was designed and developed for high-throughput, allele-level HLA class I genotyping and used to evaluate the impact of NGS technology on the selection of platelet donors using HLA epitope matching (HEM). In addition, an alternative NGS approach was designed to simultaneously sequence the six genes that code for glycoproteins expressing HPA, in order to define all known HPA systems in both donor and patient samples.
Allele-level HLA-A, -B and -C genotypes were generated for 519 platelet donors by NGS. A critical evaluation of algorithms used to predict alleles from low- to medium-resolution HLA types demonstrated that NGS was more accurate when determining HLA epitopes for the selection of platelets by HEM. The HLA genotyping data obtained were used to establish previously undefined HLA allele and haplotype frequencies at third-field resolution in the English platelet donor population. This thesis also includes the first reported NGS-based method for the simultaneous genotyping of HPA-1 to HPA-29, with the additional capability of novel HPA detection.
NGS has been shown to significantly improve the definition of both HLA and HPA genetic systems and will provide a number of future benefits for laboratories and the patients they support, including provision of well-matched transfusion products, the detection of rare or novel polymorphisms and increased knowledge of HLA and HPA frequencies.
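One way to frame HLA epitope matching is as ranking donors by how many of their HLA epitopes are foreign to the patient. The sketch below illustrates that idea under stated assumptions: each HLA allele maps to a set of epitopes (the allele-to-epitope table and the "ep" labels are invented placeholders, not real epitope assignments), mismatches are counted as donor epitopes absent from the patient's own alleles, and donors carrying any epitope the patient has antibodies against are excluded. It is not the selection procedure used in the thesis, and all names are hypothetical.

```python
# Placeholder allele -> epitope table; real assignments would come from a curated
# epitope registry. These epitope labels are invented for illustration.
EPITOPES = {
    "A*01:01": {"ep1", "ep2"},
    "A*02:01": {"ep2", "ep3"},
    "B*07:02": {"ep4"},
    "B*08:01": {"ep4", "ep5"},
}


def epitopes_of(alleles):
    """Union of the epitopes carried by a list of alleles."""
    result = set()
    for allele in alleles:
        result |= EPITOPES.get(allele, set())
    return result


def rank_donors(patient_alleles, patient_antibodies, donors):
    """Return (donor_id, mismatch_count) pairs, fewest mismatches first,
    excluding donors that carry an epitope targeted by the patient's antibodies."""
    own = epitopes_of(patient_alleles)
    ranked = []
    for donor_id, alleles in donors.items():
        donor_eps = epitopes_of(alleles)
        if donor_eps & patient_antibodies:
            continue  # incompatible: the patient has an antibody to a donor epitope
        ranked.append((donor_id, len(donor_eps - own)))
    return sorted(ranked, key=lambda pair: pair[1])


patient = ["A*01:01", "B*07:02"]
donors = {
    "donor1": ["A*01:01", "B*08:01"],
    "donor2": ["A*02:01", "B*07:02"],
    "donor3": ["A*01:01", "B*07:02"],
}
print(rank_donors(patient, {"ep3"}, donors))
```

The sketch also shows why allele-level typing matters for this approach: if a donor's alleles are predicted incorrectly from low-resolution types, the epitope sets, and therefore the ranking, can be wrong.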
- …