
7 research outputs found

Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

Author: Beardall William A V
Das Akashaditya
Li Zehui
Stan Guy-Bart
Zhao Yiren
Publication date: 08/06/2023

Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites. This is achieved through the integration of 1D-Swin, a novel Transformer-based block designed by us for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying 'syntax' of gene regulation.
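The abstract does not spell out how 1D-Swin partitions a sequence. As a rough illustration only, the sketch below applies the shifted-window idea known from the Swin Transformer to a 1D sequence: split the sequence into fixed-size windows, then cyclically shift it by half a window so that the next set of windows straddles the previous boundaries. The function names, the window size, and the use of a cyclic shift are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_windows(seq: Sequence[T], window: int) -> List[List[T]]:
    """Split seq into consecutive non-overlapping windows of length `window`."""
    if window <= 0 or len(seq) % window != 0:
        raise ValueError("sequence length must be a multiple of a positive window")
    return [list(seq[i:i + window]) for i in range(0, len(seq), window)]


def shifted_windows(seq: Sequence[T], window: int) -> List[List[T]]:
    """Cyclically shift seq left by window // 2, then partition it.

    Each shifted window covers the second half of one regular window and
    the first half of the next, so information can cross window borders.
    """
    shift = window // 2
    rolled = list(seq[shift:]) + list(seq[:shift])
    return partition_windows(rolled, window)


if __name__ == "__main__":
    dna = "ACGTACGTTTGA"  # toy sequence, far shorter than the 17K bp segments
    print(partition_windows(dna, 4))
    print(shifted_windows(dna, 4))
```

In a hierarchical model, attention would typically be computed within each window, alternating regular and shifted partitions between layers; that part is omitted here.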

arXiv.org e-Print Archive

Latent Diffusion Model for DNA Sequence Generation

Author: Das Akashaditya
Huygelen Tim August B.
Li Zehui
Ni Yuhao
Stan Guy-Bart
Xia Guoxuan
Zhao Yiren
Publication date: 09/10/2023

The harnessing of machine learning, especially deep generative models, has opened up promising avenues in the field of synthetic DNA sequence generation. Whilst Generative Adversarial Networks (GANs) have gained traction for this application, they often face issues such as limited sample diversity and mode collapse. On the other hand, Diffusion Models are a promising new class of generative models that are not burdened with these problems, enabling them to reach the state-of-the-art in domains such as image generation. In light of this, we propose a novel latent diffusion model, DiscDiff, tailored for discrete DNA sequence generation. By simply embedding discrete DNA sequences into a continuous latent space using an autoencoder, we are able to leverage the powerful generative abilities of continuous diffusion models for the generation of discrete data. Additionally, we introduce Fréchet Reconstruction Distance (FReD) as a new metric to measure the sample quality of DNA sequence generations. Our DiscDiff model demonstrates an ability to generate synthetic DNA sequences that align closely with real DNA in terms of Motif Distribution, Latent Embedding Distribution (FReD), and Chromatin Profiles. Additionally, we contribute a comprehensive cross-species dataset of 150K unique promoter-gene sequences from 15 species, enriching resources for future generative modelling in genomics. We will make our code public upon publication.
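The abstract does not define FReD. For orientation, Fréchet-style metrics such as FID compare two Gaussians fitted to embeddings of real and generated samples. Assuming FReD takes the same form over latent embeddings (an assumption, not stated in this listing), the distance would read as below, where the means and covariances of the real embeddings (subscript r) and generated embeddings (subscript g) are symbols introduced here for illustration:

```latex
d^2\big((\mu_r,\Sigma_r),(\mu_g,\Sigma_g)\big)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

A value of zero corresponds to identical fitted Gaussians; larger values indicate that the two embedding distributions differ more in mean or covariance.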

arXiv.org e-Print Archive

Recommended from our members

Cas protein diagnostics for pathogen nucleic acids

Author: Das Akashaditya
Publication venue: Darwin College
Publication date: 24/08/2023

I formulated my research goals for my PhD in tandem with a Global Challenges Research Grant awarded to the Ajioka Lab in collaboration with other labs in the United Kingdom and South Africa. The grant aimed to develop tests for pathogens that do not rely on expensive equipment such as thermocyclers. We identified Cas proteins as the core technology to develop further and split ourselves into two teams, one focusing on a DNA target and one focusing on an RNA target. The Ajioka Lab was assigned to create a test for a DNA target, and we selected Hepatitis B Virus as our test case. In my thesis, I investigate the process for producing tests that use CRISPR Cas technology to detect the Hepatitis B Virus. I develop the tests with a fluorescent readout and show how they can be adapted to work with lateral flow technology. I examine integrating isothermal amplification technologies into the assay workflow to boost test sensitivity, and creating a one-pot system that combines CRISPR Cas and isothermal amplification technologies. I also develop semi-automated workflows to enable higher throughput and improve the ability to screen multiple assay designs. I use the data from this workflow to develop a predictive tool, based on machine learning techniques, that will allow researchers to assess the effectiveness of different CRISPR Cas assay designs in silico. UKR

Apollo (Cambridge)

Sensing the DNA-mismatch tolerance of catalytically inactive Cas9 via barcoded DNA nanostructures in solid-state nanopores

Author: Chen Kaikai
Das Akashaditya
Gutierrez Richard
Keyser Ulrich
Sandler Sarah
Weckman Nicole
Yorke Sarah
Publication venue: Nature Biomedical Engineering
Publication date: 27/03/2024

Acknowledgements: S.E.S. acknowledges funding from Oxford Nanopore Technologies, Engineering and Physical Sciences Research Council (EPSRC) and Cambridge Trust. N.E.W. acknowledges funding from Oxford Nanopore Technologies, the Canada UK Foundation and the University of Cambridge Office of Postdoctoral Affairs. S.Y. acknowledges funding from the EPSRC (EP/S022953/1), and A.D. acknowledges funding from the EPSRC (EP/L015889/1). U.F.K. and K.C. acknowledge funding through the European Research Council (ERC-2019-POC PoreDetect 899538). We thank Z. Xuan and N. Ermann for assisting in the development of data analysis tools and C. Platnich for the helpful reading of the manuscript and useful suggestions.
Funder: EC | EC Seventh Framework Programm | FP7 Ideas: European Research Council (FP7-IDEAS-ERC - Specific Programme: 'Ideas' Implementing the Seventh Framework Programme of the European Community for Research, Technological Development and Demonstration Activities (2007 to 2013)); doi: https://doi.org/10.13039/100011199; Grant(s): ERC-2019-POC PoreDetect 899538
Funder: Oxford Nanopore Technologies (Oxford Nanopore); doi: https://doi.org/10.13039/100010890
Funder: Cambridge Commonwealth, European and International Trust (Cambridge Commonwealth, European & International Trust); doi: https://doi.org/10.13039/501100003343
Sequence-specific interactions between nucleic acids and proteins are fundamental to many critical biological processes. Despite the ubiquitous nature of protein-DNA binding, versatile methods to probe the specificity of these events remain elusive. In particular, single-molecule methods that enable the quantification of these processes are essential towards understanding and manipulating protein binding. To this end, we report a system which leverages solid-state nanopores with diameters of ~10 nm to identify binding events between DNA and CRISPR associated (Cas) probes, specifically catalytically inactive or dead Cas9 (dCas9), which binds to DNA but does not cleave it. The rational design of DNA nanostructures allows for the incorporation of user-defined binding sequences, enabling a systematic study of how mismatch position and identity impact the binding efficiency. These experiments reveal the relationship between sequence and binding at the single nucleotide level, exemplifying the utility of both nanopore measurements and DNA nanotechnology towards the next generation of biosensing assays.
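To make "mismatch position and identity" concrete, the sketch below enumerates every single-base substitution of a target sequence, tagged with the position and the substituted base. It is a minimal illustration under stated assumptions: the function name and the toy sequence are hypothetical, and the abstract does not say that the binding sequences in the study were generated this way.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_mismatch_variants(target: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, substituted_base, variant) for each single-base mismatch.

    A target of length n yields 3 * n variants: at each position the
    original base is replaced by each of the three other bases.
    """
    target = target.upper()
    if set(target) - set(BASES):
        raise ValueError("target must contain only A, C, G, T")
    for pos, original in enumerate(target):
        for base in BASES:
            if base != original:
                yield pos, base, target[:pos] + base + target[pos + 1:]


if __name__ == "__main__":
    # Toy sequence, not a sequence from the study.
    for pos, base, variant in single_mismatch_variants("GATTACA"):
        print(pos, base, variant)
```

Grouping measured binding efficiencies by the first element of each tuple would show the effect of position, and grouping by the second would show the effect of identity.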

Apollo (Cambridge)

Author Correction: Sensing the DNA-mismatch tolerance of catalytically inactive Cas9 via barcoded DNA nanostructures in solid-state nanopores.

Author: Chen Kaikai
Das Akashaditya
Gutierrez Richard
Keyser Ulrich F
Sandler Sarah E
Weckman Nicole E
Yorke Sarah
Publication venue: Nat Biomed Eng
Publication date: 26/03/2024

Apollo (Cambridge)

Source Data for Nanopore sensing with DNA nanostructures reveals Guide-Intrinsic Mismatch Tolerance of CRISPR/dCas9

Author: Chen Kaikai
Das Akashaditya
Gutierrez Richard
Keyser Ulrich
Sandler Sarah E
Weckman Nicole E
Yorke Sarah
Publication venue: Department of Physics Student
Publication date: 11/05/2023

This is the source data for "Nanopore sensing with DNA nanostructures reveals Guide-Intrinsic Mismatch Tolerance of CRISPR/dCas9" in Nature Biomedical Engineering. The data were generated by mixing a DNA nanostructure with dCas9 and measuring the change in current during translocation through solid-state nanopores. The raw data are in TDMS format, as written by LabVIEW (available upon request), and were converted to HDF5 with translocationfinder from https://bitbucket.org/nikaer/nanopyre/src/master/. Further processing and analysis of the HDF5 files is done with https://github.com/sarahsandler/nanopro. TDMS files can also be read into Python with the npTDMS package. Please see the manuscript for more details.

Apollo (Cambridge)