30 research outputs found

    Embedded CMOS Basecalling for Nanopore DNA Sequencing

    Get PDF
    DNA sequencing is undergoing a profound evolution into a mobile technology. Unfortunately the effort needed to process the data emerging from this new sequencing technology requires a compute power only available to traditional desktop or cloud-based machines. To empower the full potential of portable DNA solutions a means of efficiently carrying out their computing needs in an embedded format will certainly be required. This thesis presents the design of a custom fixed-point VLSI hardware implementation of an HMM-based multi-channel DNA sequence processor. A 4096 state (6-mer nanopore sensor) basecalling architecture is designed in a 32-nm CMOS technology with the ability to process 1 million DNA base pairs per second per channel. Over a 100 mm^2 silicon footprint the design could process the equivalent of one human genome every 30 seconds at a power consumption of around 5 W

    The sequence of sequencers: The history of sequencing DNA

    Get PDF
    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way

    A Robust scRNA-seq Data Analysis Pipeline for Measuring Gene Expression Noise

    Get PDF
    abstract: The past decade has seen a drastic increase in collaboration between Computer Science (CS) and Molecular Biology (MB). Current foci in CS such as deep learning require very large amounts of data, and MB research can often be rapidly advanced by analysis and models from CS. One of the places where CS could aid MB is during analysis of sequences to find binding sites, prediction of folding patterns of proteins. Maintenance and replication of stem-like cells is possible for long terms as well as differentiation of these cells into various tissue types. These behaviors are possible by controlling the expression of specific genes. These genes then cascade into a network effect by either promoting or repressing downstream gene expression. The expression level of all gene transcripts within a single cell can be analyzed using single cell RNA sequencing (scRNA-seq). A significant portion of noise in scRNA-seq data are results of extrinsic factors and could only be removed by customized scRNA-seq analysis pipeline. scRNA-seq experiments utilize next-gen sequencing to measure genome scale gene expression levels with single cell resolution. Almost every step during analysis and quantification requires the use of an often empirically determined threshold, which makes quantification of noise less accurate. In addition, each research group often develops their own data analysis pipeline making it impossible to compare data from different groups. To remedy this problem a streamlined and standardized scRNA-seq data analysis and normalization protocol was designed and developed. After analyzing multiple experiments we identified the possible pipeline stages, and tools needed. Our pipeline is capable of handling data with adapters and barcodes, which was not the case with pipelines from some experiments. Our pipeline can be used to analyze single experiment scRNA-seq data and also to compare scRNA-seq data across experiments. Various processes like data gathering, file conversion, and data merging were automated in the pipeline. The main focus was to standardize and normalize single-cell RNA-seq data to minimize technical noise introduced by disparate platforms.Dissertation/ThesisMasters Thesis Bioengineering 201

    Hardware Accelerated DNA Sequencing

    Get PDF
    DNA sequencing technology is quickly evolving. The latest developments ex- ploit nanopore sensing and microelectronics to realize real-time, hand-held devices. A critical limitation in these portable sequencing machines is the requirement of powerful data processing consoles, a need incompatible with portability and wide deployment. This thesis proposes a rst step towards addressing this problem, the construction of specialized computing modules { hardware accelerators { that can execute the required computations in real-time, within a small footprint, and at a fraction of the power needed by conventional computers. Such a hardware accel- erator, in FPGA form, is introduced and optimized specically for the basecalling function of the DNA sequencing pipeline. Key basecalling computations are identi- ed and ported to custom FPGA hardware. Remaining basecalling operations are maintained in a traditional CPU which maintains constant communications with its FPGA accelerator over the PCIe bus. Measured results demonstrated a 137X basecalling speed improvement over CPU-only methods while consuming 17X less power than a CPU-only method

    Hardware / Software System for Portable and Low-Cost Genome Assembly

    Full text link
    “The enjoyment of the highest attainable standard of health is one of the fundamental rights of every human being without distinction of race, religion, political belief, economic or social condition” [56]. Genomics (the study of the entire DNA) provides such a standard of health for people with rare diseases and helps control the spread of pandemics. Still, millions of human beings are unable to access genomics due to its cost, and portability. In genomics, DNA sequencers digitise DNA information, and computers analyse the digitised information. We have desktop and thumb-sized DNA sequencers, that digitise the DNA data rapidly. But computations necessary for the analysis of this data are inevitably performed on high-performance computers (HPCs) and cloud computers. These computations not only require powerful computers but also necessitate high-speed networks since the data generated are in the hundreds of gigabytes. Relying on HPCs and high-speed networks, deny the benefits that can be reaped by genomics for the masses who live in remote areas and in poorer nations. Developing a low-cost and portable genomics computation platform would provide personalised treatment based on an individual’s DNA and identify the source of the fast-spreading epidemics in remote areas and areas without HPC or network infrastructure. But developing a low-cost and portable genome analysing computing platform is a challenging task. This thesis develops novel computer architecture solutions to assemble the whole human DNA and COVID-19 virus RNA on a low-cost and portable platform. The first phase of the solution describes a ring-pipelined processor architecture for a key genome assembly algorithm. The human genome is partitioned to fit into the small memory footprint of embedded processors. These techniques allow an entire human genome to be assembled using highly portable and low-cost embedded processor cores. These processor cores can be housed within a single chip. Each processor was only 0.08 mm 2 and consumed just 37.5 mW. It has only 2 GB memory, 32-bit instruction width, and a clock with a 1 GHz frequency. The second phase of the solution describes how application-specific instruction-set processors can be sped up to execute a key genome assembly algorithm. A fully automated design system is presented, which improves the performance of large applications (such as genome assembly algorithm) and generates application-specific instructions for a commercial processor design tool (Xtensa). The tool enhances the base processor, which was used in the ring pipeline processor architecture. Thus, the alignment algorithms execute 2.1 times faster with only 11% additional hardware. The energy-delay product was reduced by 7.3× compared to the base processor. This tool is the only one of its type which can handle applications which are large. The third phase of the solution designs a portable low-cost genome assembly computer (PGA). PGA enhances the ring pipeline architecture with the customised processor found in phase two and with improved inter-processor communication. The results show that the COVID-19 virus RNA can be assembled in under 10 minutes and the whole human genome can be assembled in 11 days on a portable platform (HPC take around two days) for 30× coverage. PGA has an area footprint of just 5.68 mm 2 in a 28 nm technology node and is far smaller than a high-performance computer processor chip. The PGA consumes only 4W of power, which is lower than the power requirement of a high-performance processor chip. The manufacturing cost of the PGA also would be much cheaper than the high-performance system cost, when produced in volume. The developed solution can be powered by a USB port of a laptop. This thesis is the first of its type to show the design of a single-chip solution to be able to process a complex genomic problem. This thesis contributes to attaining one of the fundamental rights of every human being wherever they may live

    Next generation characterisation of cereal genomes for marker discovery

    Get PDF
    Cereal crops form the bulk of the world's food sources and thus their importance cannot be understated. Crop breeding programs increasingly rely on high-resolution molecular genetic markers to accelerate the breeding process. The development of these markers is hampered by the complexity of some of the major cereal crop genomes, as well as the time and cost required. In this review, we address current and future methods available for the characterisation of cereal genomes, with an emphasis on faster and more cost effective approaches for genome sequencing and the development of markers for trait association and marker assisted selection (MAS) in crop breeding programs

    Application of Next-Generation Sequencing in the Era of Precision Medicine

    Get PDF
    Next-generation sequencing (NGS) technologies represented the next step in the evolution of DNA sequencing, through the generation of thousands to millions of DNA sequences in a short time. The relatively fast emergence and success of NGS in research revolutionized the field of genomics and medical diagnosis. The traditional medicine model of diagnosis has changed to one precision medicine model, leading to a more accurate diagnosis of human diseases and allowing the selection of molecular target drugs for individual treatment. This chapter attempts to review the main features of NGS technique (concepts, data analysis, applications, advances and challenges), starting with a brief history of DNA sequencing followed by a comprehensive description of most used NGS platforms. Further topics will highlight the application of NGS towards routine practice, including variant detection, whole-exome sequencing (WES), whole-genome sequencing (WGS), custom panels (multi-gene), RNA-seq and epigenetic. The potential use of NGS in precision medicine is vast and a better knowledge of this technique is necessary for an efficacious implementation in the clinical workplace. A centralized chapter describing the main NGS aspects in the clinic could help beginners, scientists, researchers and health care professionals, as they will be responsible for translating genomic data into genomic medicine

    Evaluation of nanopore-based sequencing technology for gene marker based analysis of complex microbial communities. Method development for accurate 16S rRNA gene amplicon sequencing

    Get PDF
    Nucleic acid sequencing can provide a detailed overview of microbial communities in comparison with standard plate-culture methods. Expansion of high-throughput sequencing (HTS) technologies and reduction in analysis costs has allowed for detailed exploration of various habitats with use of amplicon, metagenomics, and metatranscriptomics approaches. However, due to a capital cost of HTS platforms and requirements for batch analysis, genomics-based studies are still not being used as a standard method for the comprehensive examination of environmental or clinical samples for microbial characterization. This research project investigated the potential of a novel nanopore-based sequencing platform from Oxford Nanopore Technologies (ONT) for rapid and accurate analysis of various environmentally complex samples. ONT is an emerging company that developed the first-ever portable nanopore-based sequencing platform called MinIONTM. Portability and miniaturised size of the device gives an immense opportunity for de-centralised, in-field, and real-time analysis of environmental and clinical samples. Nonetheless, benchmarking of this new technology against the current gold-standard platform (i.e., Illumina sequencers) is necessary to evaluate nanopore data and understand its benefits and limitations. The focus of this study is on the evaluation of nanopore sequencing data: read quality, sequencing errors, alignment quality but also bacterial community structure. For this reason, mock bacterial community samples were generated, sequenced and analysed with use of multiple bioinformatics approaches. Furthermore, this study developed sophisticated library preparation and data analyses methods to enable high-accuracy analysis of amplicon libraries from complex microbial communities for sequencing on the nanopore platform. Besides, the best performing library preparation and data analyses methods were used for analysis of environmental samples and compared to high-quality Illumina metagenomics data. This work opens a new possibility for accurate, in-field amplicon analysis of complex samples with the use of MinIONTM and for the development of autonomous biosensing technology for culture-free detection of pathogenic and non-pathogenic microorganisms in water, soil, food, drinks or blood

    Recent advances in the development and evaluation of molecular diagnostics for Ebola virus disease

    Get PDF
    The 2014-16 outbreak of ebola virus disease (EVD) in West Africa resulted in 11,308 deaths. During the outbreak only 60% of patients were laboratory confirmed and global health authorities have identified the need for accurate and readily deployable molecular diagnostics as an important component of the ideal response to future outbreaks, to quickly identify and isolate patients. Areas covered: Currently PCR-based techniques and rapid diagnostic tests (RDTs) that detect antigens specific to EVD infections dominate the diagnostic landscape, but recent advances in biosensor technologies have led to novel approaches for the development of EVD diagnostics. This review summarises the literature and available performance data of currently available molecular diagnostics for ebolavirus, identifies knowledge gaps and maps out future priorities for research in this field. Expert opinion: While there are now a plethora of diagnostic tests for EVD at various stages of development, there is an acute need for studies to compare their clinical performance, but the sporadic nature of EVD outbreaks makes this extremely challenging, demanding pragmatic new modalities of research funding and ethical/institutional approval, to enable responsive research in outbreak settings. Retrospective head-to-head diagnostic comparisons could also be implemented using biobanked specimens, providing this can be done safely
    corecore