Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer
Given the increasing volume and quality of genomics data, extracting new
insights requires interpretable machine-learning models. This work presents
Genomic Interpreter: a novel architecture for genomic assay prediction. This
model outperforms the state-of-the-art models for genomic assay prediction
tasks. Our model can identify hierarchical dependencies in genomic sites. This
is achieved through the integration of 1D-Swin, a novel Transformer-based block
designed by us for modelling long-range hierarchical data. Evaluated on a
dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter
demonstrates superior performance in chromatin accessibility and gene
expression prediction and unmasks the underlying 'syntax' of gene regulation.
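The abstract describes 1D-Swin only at a high level. As a hedged illustration, the sketch below shows the window partitioning, cyclic shift and neighbour merging that Swin-style Transformers use, adapted to a 1D sequence of genomic bins. It assumes the standard Swin Transformer scheme (non-overlapping windows, a shift by half a window, then pairwise merging for the next level of the hierarchy). The function names, the window size and the exact shifting and merging used in 1D-Swin are assumptions, not taken from the paper, and the attention computation itself is omitted.

```python
import numpy as np

def partition_windows(x: np.ndarray, window: int) -> np.ndarray:
    """Split a (length, channels) sequence into (num_windows, window, channels)."""
    length, channels = x.shape
    if length % window != 0:
        raise ValueError("sequence length must be a multiple of the window size")
    return x.reshape(length // window, window, channels)

def shifted_windows(x: np.ndarray, window: int) -> np.ndarray:
    """Cyclically shift the sequence by half a window, then partition it.

    Each new window straddles the border between two windows of the previous
    layer, so information can cross window borders. (Swin additionally masks
    attention between wrapped-around positions; that mask is omitted here.)
    """
    shift = window // 2
    return partition_windows(np.roll(x, -shift, axis=0), window)

def merge_pairs(x: np.ndarray) -> np.ndarray:
    """Halve the sequence length by concatenating neighbouring positions
    (assumed hierarchical down-sampling, analogous to Swin's patch merging)."""
    length, channels = x.shape
    if length % 2 != 0:
        raise ValueError("sequence length must be even to merge pairs")
    return x.reshape(length // 2, 2 * channels)

seq = np.arange(8, dtype=float).reshape(8, 1)  # 8 bins, 1 channel
print(partition_windows(seq, 4)[:, :, 0])
# [[0. 1. 2. 3.]
#  [4. 5. 6. 7.]]
print(shifted_windows(seq, 4)[:, :, 0])
# [[2. 3. 4. 5.]
#  [6. 7. 0. 1.]]
```

Alternating plain and shifted windows keeps attention cost linear in sequence length while still letting distant bins interact after a few layers, which is the usual motivation for this design on long inputs such as 17K-bp segments.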
Latent Diffusion Model for DNA Sequence Generation
The harnessing of machine learning, especially deep generative models, has
opened up promising avenues in the field of synthetic DNA sequence generation.
Whilst Generative Adversarial Networks (GANs) have gained traction for this
application, they often face issues such as limited sample diversity and mode
collapse. On the other hand, Diffusion Models are a promising new class of
generative models that are not burdened with these problems, enabling them to
reach the state-of-the-art in domains such as image generation. In light of
this, we propose a novel latent diffusion model, DiscDiff, tailored for
discrete DNA sequence generation. By simply embedding discrete DNA sequences
into a continuous latent space using an autoencoder, we are able to leverage
the powerful generative abilities of continuous diffusion models for the
generation of discrete data. Additionally, we introduce Fréchet
Reconstruction Distance (FReD) as a new metric to measure the sample quality of
DNA sequence generations. Our DiscDiff model demonstrates an ability to
generate synthetic DNA sequences that align closely with real DNA in terms of
Motif Distribution, Latent Embedding Distribution (FReD), and Chromatin
Profiles. Additionally, we contribute a comprehensive cross-species dataset of
150K unique promoter-gene sequences from 15 species, enriching resources for
future generative modelling in genomics. We will make our code public upon
publication.
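FReD is named by analogy with the Fréchet Inception Distance (FID). Assuming it follows the same recipe (fit a Gaussian to each set of embeddings and compute the squared Fréchet distance between the two Gaussians, d² = ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2})), the computation can be sketched as below. The encoder that produces the embeddings and the function name frechet_distance are hypothetical; see the paper for the exact definition of FReD.

```python
import numpy as np

def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Squared Fréchet distance between Gaussians fitted to two (n, d) embedding sets."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    # atleast_2d keeps the d == 1 case working (np.cov then returns a scalar).
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))
    # Tr((cov_a @ cov_b)^{1/2}) is the sum of square roots of the eigenvalues of
    # cov_a @ cov_b, which are real and non-negative for PSD covariances; take the
    # real part and clip to absorb floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))             # stand-in for embeddings of real sequences
close = rng.normal(size=(2000, 8))            # same distribution
far = rng.normal(loc=1.0, size=(2000, 8))     # shifted distribution
print(frechet_distance(real, close) < frechet_distance(real, far))  # True
```

Lower values mean the generated embeddings are statistically closer to the real ones; identical sets give a distance of (numerically) zero.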
Cas protein diagnostics for pathogen nucleic acids - Akashaditya Das
I formulated my research goals for my PhD in tandem with a Global Challenges Research Grant awarded to the Ajioka Lab in collaboration with other labs in the United Kingdom and South Africa. The grant aimed to develop tests for pathogens that do not rely on expensive equipment such as thermocyclers. We identified Cas proteins as the core technology to develop further and split ourselves into two teams: one focusing on a DNA target and one focusing on an RNA target. The Ajioka Lab was assigned to create a test for a DNA target, and we selected Hepatitis B Virus as our test case.
In my thesis, I investigate the process of producing tests for the Hepatitis B Virus using CRISPR Cas technology. I develop tests with a fluorescent readout and show how they can be adapted to work with lateral flow technology. I examine integrating isothermal amplification technologies into the assay workflow to boost test sensitivity, and creating a one-pot system that combines CRISPR Cas and isothermal amplification technologies. I also develop semi-automated workflows that enable higher throughput and improve the ability to screen multiple assay designs. I use the data from these workflows to develop a predictive tool, based on machine learning techniques, that allows researchers to assess the effectiveness of different CRISPR Cas assay designs in silico.
Sensing the DNA-mismatch tolerance of catalytically inactive Cas9 via barcoded DNA nanostructures in solid-state nanopores
Acknowledgements: S.E.S. acknowledges funding from Oxford Nanopore Technologies, Engineering and Physical Sciences Research Council (EPSRC) and Cambridge Trust. N.E.W. acknowledges funding from Oxford Nanopore Technologies, the Canada UK Foundation and the University of Cambridge Office of Postdoctoral Affairs. S.Y. acknowledges funding from the EPSRC (EP/S022953/1), and A.D. acknowledges funding from the EPSRC (EP/L015889/1). U.F.K. and K.C. acknowledge funding through the European Research Council (ERC-2019-POC PoreDetect 899538). We thank Z. Xuan and N. Ermann for assisting in the development of data analysis tools and C. Platnich for the helpful reading of the manuscript and useful suggestions.
Funder: EC | EC Seventh Framework Programme | FP7 Ideas: European Research Council (FP7-IDEAS-ERC - Specific Programme: ‘Ideas’ Implementing the Seventh Framework Programme of the European Community for Research, Technological Development and Demonstration Activities (2007 to 2013)); doi: https://doi.org/10.13039/100011199; Grant(s): ERC-2019-POC PoreDetect 899538
Funder: Oxford Nanopore Technologies (Oxford Nanopore); doi: https://doi.org/10.13039/100010890
Funder: Cambridge Commonwealth, European and International Trust (Cambridge Commonwealth, European & International Trust); doi: https://doi.org/10.13039/501100003343
Sequence-specific interactions between nucleic acids and proteins are fundamental to many critical biological processes. Despite the ubiquitous nature of protein-DNA binding, versatile methods to probe the specificity of these events remain elusive. In particular, single-molecule methods that enable the quantification of these processes are essential for understanding and manipulating protein binding.
To this end, we report a system that leverages solid-state nanopores with diameters of ~10 nm to identify binding events between DNA and CRISPR-associated (Cas) probes, specifically catalytically inactive or dead Cas9 (dCas9), which binds to DNA but does not cleave it. The rational design of DNA nanostructures allows for the incorporation of user-defined binding sequences, enabling a systematic study of how mismatch position and identity impact binding efficiency. These experiments reveal the relationship between sequence and binding at the single-nucleotide level, exemplifying the utility of both nanopore measurements and DNA nanotechnology for the next generation of biosensing assays.
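A systematic mismatch study of this kind covers every position paired with every alternative base. As a hedged illustration of how such a single-mismatch panel can be enumerated, the sketch below generates all single-nucleotide substitutions of a target sequence. The 20-nt target is hypothetical, the helper name single_mismatch_variants is invented for illustration, and whether the paper introduces mismatches in the target DNA or the guide RNA is not specified here.

```python
from itertools import product

BASES = "ACGT"

def single_mismatch_variants(target: str) -> list[tuple[int, str, str]]:
    """Return (position, substituted base, variant sequence) for every
    single-nucleotide substitution of `target` (positions are 0-based)."""
    target = target.upper()
    variants = []
    # Position is the outer loop, base the inner loop.
    for pos, base in product(range(len(target)), BASES):
        if base == target[pos]:
            continue  # skip the perfect match at this position
        variants.append((pos, base, target[:pos] + base + target[pos + 1:]))
    return variants

protospacer = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt target
panel = single_mismatch_variants(protospacer)
print(len(panel))  # 60 = 20 positions x 3 alternative bases
print(panel[0])    # (0, 'A', 'AACGCATAAAGATGAGACGC')
```

Each entry records both the mismatch position and its identity, the two variables whose effect on binding the study examines.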
Author Correction: Sensing the DNA-mismatch tolerance of catalytically inactive Cas9 via barcoded DNA nanostructures in solid-state nanopores.
Source Data for Nanopore sensing with DNA nanostructures reveals Guide-Intrinsic Mismatch Tolerance of CRISPR/dCas9
This is the source data for "Nanopore sensing with DNA nanostructures reveals Guide-Intrinsic Mismatch Tolerance of CRISPR/dCas9" in Nature Biomedical Engineering. The data were generated by mixing a DNA nanostructure with dCas9 and measuring the change in current caused by translocation through solid-state nanopores. The data were originally recorded in TDMS format (available upon request) and converted to HDF5 with translocationfinder, part of https://bitbucket.org/nikaer/nanopyre/src/master/, which writes TDMS (LabVIEW) files to HDF5. Processing and further analysis of the HDF5 files are done using https://github.com/sarahsandler/nanopro. The TDMS files are produced by LabVIEW; they can also be read into Python with the npTDMS package. Please see the manuscript for more details.
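As a generic illustration of what "measuring the change in current" involves, the sketch below detects translocation events as runs of samples where the current falls below the open-pore baseline by more than a threshold. This simple threshold detector, its parameter names and the synthetic trace are assumptions for illustration only; it is not the algorithm implemented in translocationfinder or nanopro.

```python
import numpy as np

def find_events(current: np.ndarray, threshold: float, min_samples: int = 3):
    """Return (start, end, mean_drop) for each run of samples whose current lies
    more than `threshold` below the baseline (taken as the trace median).
    `end` is exclusive; runs shorter than `min_samples` are ignored as noise."""
    baseline = np.median(current)
    blocked = current < baseline - threshold
    # Pad with False so every run has a rising (+1) and a falling (-1) edge.
    edges = np.diff(np.concatenate(([False], blocked, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    events = []
    for s, e in zip(starts, ends):
        if e - s >= min_samples:
            events.append((int(s), int(e), float(baseline - current[s:e].mean())))
    return events

trace = np.full(100, 10.0)   # synthetic open-pore current (arbitrary units)
trace[20:30] -= 2.0          # a translocation event
trace[60:62] -= 2.0          # a 2-sample blip, shorter than min_samples
print(find_events(trace, threshold=1.0))  # [(20, 30, 2.0)]
```

Real pipelines typically add filtering, a moving baseline and sub-event analysis (for example, secondary current drops from bound proteins on the DNA carrier), which this sketch omits.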