816 research outputs found

    Capacity of DNA Data Embedding Under Substitution Mutations

    A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic acid (DNA), giving rise to the emerging area of DNA data embedding. Since a DNA sequence is conceptually equivalent to a sequence of quaternary symbols (bases), DNA data embedding (diversely called DNA watermarking or DNA steganography) can be seen as a digital communications problem where channel errors are tantamount to mutations of DNA bases. Depending on the use of coding or noncoding DNA hosts, which, respectively, denote DNA segments that can or cannot be translated into proteins, DNA data embedding is essentially a problem of communications with or without side information at the encoder. In this paper the Shannon capacity of DNA data embedding is obtained for the case in which DNA sequences are subject to substitution mutations modelled using the Kimura model from molecular evolution studies. Inferences are also drawn with respect to the biological implications of some of the results presented.
    Comment: 22 pages, 13 figures; preliminary versions of this work were presented at the SPIE Media Forensics and Security XII conference (January 2010) and at the IEEE ICASSP conference (March 2010).
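    As a rough illustration of the channel view described in this abstract, the sketch below computes the capacity of a single application of a two-parameter Kimura substitution channel with no side information at the encoder. The parametrization is an assumption, not taken from the paper: `alpha` is the probability of a transition (A<->G, C<->T) and `beta` the probability of each of the two possible transversions. Under that assumption the channel is symmetric, so a uniform input achieves capacity and the value is 2 bits minus the entropy of one row of the substitution matrix. The function names are hypothetical.

```python
import math


def kimura_row(alpha: float, beta: float) -> list[float]:
    """One row of the assumed substitution matrix:
    [no change, transition, transversion 1, transversion 2]."""
    stay = 1.0 - alpha - 2.0 * beta
    if min(stay, alpha, beta) < 0.0:
        raise ValueError("alpha and beta must give a valid probability row")
    return [stay, alpha, beta, beta]


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def kimura_capacity(alpha: float, beta: float) -> float:
    """Capacity in bits per base of one use of the symmetric quaternary channel.

    For a symmetric channel the uniform input is optimal, so
    C = log2(4) - H(row).
    """
    return 2.0 - entropy_bits(kimura_row(alpha, beta))


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from the paper.
    print(kimura_capacity(0.01, 0.005))
```

    With no mutations the sketch returns the full 2 bits per base, and it falls to zero when all four outcomes are equally likely. The coding-DNA case with side information at the encoder is not covered by this sketch.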

    Energetic signatures of single base bulges: thermodynamic consequences and biological implications

    DNA bulges are biologically consequential defects that can arise from template-primer misalignments during replication and pose challenges to the cellular DNA repair machinery. Calorimetric and spectroscopic characterizations of defect-containing duplexes reveal systematic patterns of sequence-context dependent bulge-induced destabilizations. These distinguishing energetic signatures are manifest in three coupled characteristics, namely: the magnitude of the bulge-induced duplex destabilization (ΔΔG_Bulge); the thermodynamic origins of ΔΔG_Bulge (i.e. enthalpic versus entropic); and, the cooperativity of the duplex melting transition (i.e. two-state versus non-two-state). We find moderately destabilized duplexes undergo two-state dissociation and exhibit ΔΔG_Bulge values consistent with localized, nearest-neighbor perturbations arising from unfavorable entropic contributions. Conversely, strongly destabilized duplexes melt in a non-two-state manner and exhibit ΔΔG_Bulge values consistent with perturbations exceeding nearest-neighbor expectations that are enthalpic in origin. Significantly, our data reveal an intriguing correlation in which the energetic impact of a single bulge base centered in one strand portends the impact of the corresponding complementary bulge base embedded in the opposite strand. We discuss potential correlations between these bulge-specific differential energetic profiles and their overall biological implications in terms of DNA recognition, repair and replication.

    A Tutorial on Coding Methods for DNA-based Molecular Communications and Storage

    The exponential growth of data has motivated advances in data storage technologies. As a promising storage medium, deoxyribonucleic acid (DNA) provides much higher data density and superior durability compared with state-of-the-art media. In this paper, we provide a tutorial on DNA storage and its role in molecular communications. First, we introduce the fundamentals of DNA-based molecular communications and storage (MCS), discussing the basic process of performing DNA storage in MCS. We then provide tutorials on how conventional coding schemes used in wireless communications can be applied to DNA-based MCS, along with numerical results. Finally, promising research directions on DNA-based data storage in molecular communications are introduced and discussed.
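    To make the idea of applying a conventional coding scheme to DNA storage concrete, here is a minimal sketch, not taken from the tutorial itself. It assumes a fixed 2-bits-per-base mapping (00->A, 01->C, 10->G, 11->T) and uses a 3-fold repetition code with majority-vote decoding as a stand-in for the channel codes the abstract refers to. All function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_bases(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; the length must be a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def repeat_encode(seq: str, n: int = 3) -> str:
    """Repetition code: write every base n times in a row."""
    return "".join(base * n for base in seq)


def repeat_decode(seq: str, n: int = 3) -> str:
    """Majority vote over each block of n bases."""
    return "".join(
        Counter(seq[i:i + n]).most_common(1)[0][0] for i in range(0, len(seq), n)
    )


if __name__ == "__main__":
    strand = repeat_encode(bytes_to_bases(b"A"))
    corrupted = strand[:1] + "T" + strand[2:]  # one substitution error
    print(bases_to_bytes(repeat_decode(corrupted)))
```

    A repetition code is chosen here only because it is short to write down. It produces runs of identical bases, which are generally harder to synthesize and sequence reliably, so a practical scheme would likely use a different code.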

    Novel pathogenic variants in filamin C identified in pediatric restrictive cardiomyopathy

    Restrictive cardiomyopathy (RCM) is a rare and distinct form of cardiomyopathy characterized by normal ventricular chamber dimensions, normal myocardial wall thickness, and preserved systolic function. The abnormal myocardium, however, demonstrates impaired relaxation. To date, dominant variants causing RCM have been reported in a small number of sarcomeric or cytoskeletal genes, but the genetic causes in a majority of cases remain unexplained, especially in early childhood. Here, we describe two RCM families with childhood onset: one in a large family with a history of autosomal dominant RCM and the other a family with affected monozygotic, dichorionic/diamniotic twins. Exome sequencing found a pathogenic filamin C (FLNC) variant in each: p.Pro2298Leu, which segregates with disease in the large autosomal dominant RCM family, and p.Tyr2563Cys in both affected twins. In vitro expression of both mutant proteins yielded aggregates of FLNC containing actin in C2C12 myoblast cells. Recently, a number of variants in FLNC have been described that cause hypertrophic, dilated, and restrictive cardiomyopathies. Our data presented here provide further evidence for the role of FLNC in pediatric RCM, and suggest the need to include FLNC in genetic testing of cardiomyopathy patients including those with early ages of onset.

    Characterization of argF Specialized Transducing Derivatives of Bacteriophage P1

    Emerging Approaches to DNA Data Storage: Challenges and Prospects

    With the total amount of worldwide data skyrocketing, the global data storage demand is predicted to grow to 1.75 × 10^14 GB by 2025. Traditional storage methods have difficulties keeping pace given that current storage media have a maximum density of 10^3 GB/mm^3. As such, data production will far exceed the capacity of currently available storage methods. The costs of maintaining and transferring data, as well as the limited lifespans and significant data losses associated with current technologies also demand advanced solutions for information storage. Nature offers a powerful alternative through the storage of information that defines living organisms in unique orders of four bases (A, T, C, G) located in molecules called deoxyribonucleic acid (DNA). DNA molecules as information carriers have many advantages over traditional storage media. Their high storage density, potentially low maintenance cost, ease of synthesis, and chemical modification make them an ideal alternative for information storage. To this end, rapid progress has been made over the past decade by exploiting user-defined DNA materials to encode information. In this review, we discuss the most recent advances of DNA-based data storage with a major focus on the challenges that remain in this promising field, including the current intrinsic low speed in data writing and reading and the high cost per byte stored. Alternatively, data storage relying on DNA nanostructures (as opposed to DNA sequence) as well as on other combinations of nanomaterials and biomolecules are proposed with promising technological and economic advantages. In summarizing the advances that have been made and underlining the challenges that remain, we provide a roadmap for the ongoing research in this rapidly growing field, which will enable the development of technological solutions to the global demand for superior storage methodologies.

    Optimisation of T cell receptors using in vivo recombination and selection

    The αβT cell receptor (TCR) orchestrates immunity through the recognition of peptides, derived from degraded proteins, presented on major histocompatibility complex (MHC) molecules. The remarkable ability of the receptor to respond to a vast plethora of antigens is driven by V(D)J recombination, a process which generates a highly diverse TCR repertoire by somatic gene rearrangement of coding DNA. TCR diversity is confined to three short hairpin loops on each TCR chain, called the complementarity determining region (CDR), which form the antigen-binding site. The germline-encoded CDR1 and CDR2 loops predominantly contact MHC, whereas the hypervariable CDR3 are non-germline and primarily bind to the MHC-bound peptide. In this study, we developed a novel in vivo mutagenesis approach which redirects somatic gene rearrangement using V(D)J recombination machinery to diversify and optimise TCR binding. This approach involves embedding a gene recombination cassette into the peptide-binding CDR3β region of established TCRs. A retrogenic system was employed to facilitate the in vivo processes necessary for gene rearrangement and thymic selection. We demonstrate that the recombination cassette can successfully induce gene rearrangement and introduce variation to the targeted CDR3β site. Thymocytes expressing the diversified TCRs can be selected on MHC and develop into functional peripheral T cells. Subsequent exposure to cognate ligands also allowed us to identify optimised and ‘immunodominant’ TCRs. In addition, we produced a novel chimeric TCR chain which comprises Vα and Cβ domains. This TCR chain forms a heterodimer with endogenous TCRα chains to form a unique Vα-Vα antigen-binding surface. Thymocytes expressing this novel form of αβTCR were able to engage efficiently with both MHC classes and develop normally into functional T cells typical of a conventional repertoire. 
    Collectively, these findings suggest that the germline CDR loops are not essential for mediating MHC recognition during MHC-restricted T cell development and function.

    Using deep learning to detect digitally encoded DNA trigger for Trojan malware in Bio‑Cyber attacks

    This article uses Deep Learning technologies to safeguard DNA sequencing against Bio-Cyber attacks. We consider a hybrid attack scenario where the payload is encoded into a DNA sequence to activate a Trojan malware implanted in a software tool used in the sequencing pipeline, allowing the perpetrators to gain control over the resources used in that pipeline during sequence analysis. The scenario considered in the paper is based on perpetrators submitting synthetically engineered DNA samples that contain the digitally encoded IP address and port number of the perpetrator’s machine. Genetic analysis of the sample’s DNA will decode the address, which the Trojan malware then uses to activate and trigger a remote connection. This approach could allow multiple perpetrators to create connections and hijack the DNA sequencing pipeline. To hide the data, the perpetrators can avoid detection by encoding the address to maximise similarity with genuine DNA, as we showed previously. However, in this paper we show how Deep Learning can be used to successfully detect and identify the trigger-encoded data, in order to protect a DNA sequencing pipeline from Trojan attacks. The results show up to nearly 100% detection accuracy in this novel Trojan attack scenario, even after applying fragmentation, encryption and steganography to the encoded trigger data. In addition, the feasibility of designing and synthesizing encoded DNA for such Trojan payloads is validated by a wet lab experiment.