Capacity of DNA Data Embedding Under Substitution Mutations

Author: Balado Félix
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/01/2011

A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic acid (DNA), giving rise to the emerging area of DNA data embedding. Since a DNA sequence is conceptually equivalent to a sequence of quaternary symbols (bases), DNA data embedding (variously called DNA watermarking or DNA steganography) can be seen as a digital communications problem in which channel errors are tantamount to mutations of DNA bases. Depending on whether coding or noncoding DNA hosts are used (that is, DNA segments that can or cannot, respectively, be translated into proteins), DNA data embedding is essentially a problem of communications with or without side information at the encoder. In this paper the Shannon capacity of DNA data embedding is obtained for the case in which DNA sequences are subject to substitution mutations modelled using the Kimura model from molecular evolution studies. Inferences are also drawn with respect to the biological implications of some of the results presented. Comment: 22 pages, 13 figures; preliminary versions of this work were presented at the SPIE Media Forensics and Security XII conference (January 2010) and at the IEEE ICASSP conference (March 2010).
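
As a minimal sketch of the channel view above, assume Kimura's two-parameter substitution model (a transition such as A<->G or C<->T with probability `alpha`, each of the two transversions with probability `beta`) and ignore any constraint imposed by the host, which roughly corresponds to the noncoding case without side information. Under these assumptions the per-base channel is symmetric, so its capacity is log2(4) minus the entropy of one row of the transition matrix. The function names are hypothetical, and this is not the paper's derivation, which also covers the coding (side-information) case.

```python
import math


def kimura_channel_matrix(alpha: float, beta: float) -> list[list[float]]:
    """Per-base transition matrix P[x][y] under an assumed Kimura two-parameter model.

    alpha: probability of a transition (purine<->purine or pyrimidine<->pyrimidine).
    beta:  probability of EACH of the two transversions, so a base is kept
           with probability 1 - alpha - 2*beta.
    """
    bases = "ACGT"
    purines = {"A", "G"}
    matrix = []
    for x in bases:
        row = []
        for y in bases:
            if x == y:
                row.append(1 - alpha - 2 * beta)  # no mutation
            elif (x in purines) == (y in purines):
                row.append(alpha)  # transition
            else:
                row.append(beta)  # transversion
        matrix.append(row)
    return matrix


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def symmetric_channel_capacity(matrix: list[list[float]]) -> float:
    """Capacity of a symmetric channel: log2(#outputs) - H(any row).

    Valid only because every row (and column) of the Kimura matrix is a
    permutation of the others; the uniform input distribution achieves it.
    """
    return math.log2(len(matrix[0])) - entropy_bits(matrix[0])


if __name__ == "__main__":
    # Capacity in bits per base for small, illustrative mutation probabilities.
    print(symmetric_channel_capacity(kimura_channel_matrix(alpha=0.01, beta=0.005)))
```

With no mutations the capacity is the full 2 bits per base, and it falls to zero when every output base is equally likely (`alpha = beta = 0.25`).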

arXiv.org e-Print Archive

Crossref

Research Repository UCD

Irish Universities

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Author: Schwartz Moshe
Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019

DNA as a data storage medium has several advantages, including far greater data density than electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem duplication, and use the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publication
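
The reconstruction setting can be illustrated with a brute-force sketch for short strings (the helper names are hypothetical). A uniform tandem duplication of length `k` copies a length-`k` block and inserts it immediately after itself. If the decoder receives more distinct reads than the largest overlap between the descendant sets of any two codewords, only one codeword is consistent with all of them. This is Levenshtein's general reconstruction criterion applied by enumeration, not the analytic construction of the paper.

```python
def tandem_duplicate(x: str, i: int, k: int) -> str:
    """Copy the length-k block x[i:i+k] and insert it right after itself."""
    if not 0 <= i <= len(x) - k:
        raise ValueError("block must lie inside the string")
    return x[: i + k] + x[i : i + k] + x[i + k :]


def descendants(x: str, k: int, t: int) -> set[str]:
    """All strings reachable from x by exactly t tandem duplications of length k."""
    level = {x}
    for _ in range(t):
        level = {
            tandem_duplicate(s, i, k)
            for s in level
            for i in range(len(s) - k + 1)
        }
    return level


def max_intersection(code: list[str], k: int, t: int) -> int:
    """Largest overlap between the descendant sets of two distinct codewords.

    Receiving N distinct reads with N > this value identifies the codeword.
    """
    balls = [descendants(c, k, t) for c in code]
    best = 0
    for a in range(len(balls)):
        for b in range(a + 1, len(balls)):
            best = max(best, len(balls[a] & balls[b]))
    return best


if __name__ == "__main__":
    # Two codewords whose single-duplication descendants overlap in one string.
    print(max_intersection(["AAC", "ACC"], k=1, t=1))
```

Enumeration grows quickly with `t` and the string length, which is why bounds such as the paper's are needed for realistic parameters.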

arXiv.org e-Print Archive

Crossref

Extreme genetic fragility of the HIV-1 capsid

Author: A Borsetti
A Van Der Velden
A Wagner
AA Pakula
AD Kelleher
AJ Leslie
AJ Price
AS Perelson
AS Reicin
B Kim
BK Ganser
BK Ganser-Pornillos
BM Forshey
C Chen
C Strambio-de-Castillia
C Tang
C Zimmerman
CL Moyer
CT Lemke
D Rennell
David Bhella
DC Stenger
DD Axe
DD Loeb
DE Ott
E Sokolskaja
EO Freed
F Pereyra
F Zhang
FM Codoner
FM van den Ent
Frazer J. Rixon
H Crawford
H Kawashima
H Zhang
HG Krausslich
HG Morrison
HH Guo
J Ihssen
J Lanman
J Luban
J Martinez-Picado
J Sticht
JA Briggs
JA de Visser
JB Peris
Jeremy Luban
JM Carlson
K Alin
K Lee
K Nakajima
KA Matreyek
L Holm
L Krishnan
LM Mansky
M Masso
M Parera
M Qi
M Rolland
M Stremlau
M Yamashita
MA Brockman
ME Abram
MG Mateu
MR Auerbach
Mudathir Alim
MW McNatt
N Jouvenet
N Manel
N Srinivasakumar
Nick J. Loman
O Pornillos
P Borrow
P Carrasco
P Domingo-Calap
P Kiepiela
P Markiewicz
Paul D. Bieniasz
PD Bieniasz
PO Olins
PS Shenkin
R Montville
R Sanjuan
RA Smith
RC Edgar
RK Gitti
RM Troyer
Robert J. Gifford
S Li
SA Eifan
Sam J. Wilson
Saskia E. Bakker
SF Chao
SF Elena
SS Rhee
Suzannah J. Rihn
T Fitzon
T Hatziioannou
T Schaller
T Suzutani
T van Opijnen
T Yano
T Yasugi
TC Terwilliger
TM Allen
TR Gamble
UK von Schwedler
V Dahirel
V Varthakavi
W Huang
W Shao
WI Sundquist
WS Blair
YF Chang
Z Ambrose
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Genetic robustness, or fragility, is defined as the ability, or lack thereof, of a biological entity to maintain function in the face of mutations. Viruses that replicate via RNA intermediates exhibit high mutation rates, and robustness should be particularly advantageous to them. The capsid (CA) domain of the HIV-1 Gag protein is under strong pressure to conserve functional roles in viral assembly, maturation, uncoating, and nuclear import. However, CA is also under strong immunological pressure to diversify. Therefore, it would be particularly advantageous for CA to evolve genetic robustness. To measure the genetic robustness of HIV-1 CA, we generated a library of single amino acid substitution mutants, encompassing almost half the residues in CA. Strikingly, we found HIV-1 CA to be the most genetically fragile protein that has been analyzed using such an approach, with 70% of mutations yielding replication-defective viruses. Although CA participates in several steps in HIV-1 replication, analysis of conditionally (temperature sensitive) and constitutively non-viable mutants revealed that the biological basis for its genetic fragility was primarily the need to coordinate the accurate and efficient assembly of mature virions. All mutations that exist in naturally occurring HIV-1 subtype B populations at a frequency >3%, and were also present in the mutant library, had fitness levels that were >40% of WT. However, a substantial fraction of mutations with high fitness did not occur in natural populations, suggesting another form of selection pressure limiting variation in vivo. Additionally, known protective CTL epitopes occurred preferentially in domains of the HIV-1 CA that were even more genetically fragile than HIV-1 CA as a whole. The extreme genetic fragility of HIV-1 CA may be one reason why cell-mediated immune responses to Gag correlate with better prognosis in HIV-1 infection, and suggests that CA is a good target for therapy and vaccination strategies

Public Library of Science (PLOS)

Crossref

University of Birmingham Research Portal

Directory of Open Access Journals

PubMed Central

Enlighten

FigShare


Transmission of integrin β7 transmembrane domain topology enables gut lymphoid tissue development.

Author: Fan Zhichao
Gingras Alexandre R
Ginsberg Mark H
Lagarrigue Frederic
Ley Klaus
Sun Hao
Publication venue: eScholarship, University of California
Publication date: 01/04/2018

Integrin activation regulates adhesion, extracellular matrix assembly, and cell migration, thereby playing an indispensable role in development and in many pathological processes. A proline mutation in the central integrin β3 transmembrane domain (TMD) creates a flexible kink that uncouples the topology of the inner half of the TMD from the outer half. In this study, using leukocyte integrin α4β7, which enables development of gut-associated lymphoid tissue (GALT), we examined the biological effect of such a proline mutation and report that it impairs agonist-induced talin-mediated activation of integrin α4β7, thereby inhibiting rolling lymphocyte arrest, a key step in transmigration. Furthermore, the α4β7(L721P) mutation blocks lymphocyte homing to and development of the GALT. These studies show that impairing the ability of an integrin β TMD to transmit talin-induced TMD topology inhibits agonist-induced physiological integrin activation and biological function in development

eScholarship - University of California

Noise and Uncertainty in String-Duplication Systems

Author: Bruck Jehoshua
Farnoud (Hassanzadeh) Farzad
Jain Siddharth
Schwartz Moshe
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2017

Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output

Crossref

Caltech Authors

Data Hiding Based DNA Issues: A Review

Author: Kadhum Sahar Adill
Publication venue: 'University of Babylon - Physical Education and Sports Sciences'
Publication date: 01/09/2020

Information security is a key concern, particularly with the growth of internet usage. This growth has brought incidents of unauthorized access, which are countered with various secure communication techniques, namely cryptography and data hiding. More recent trends use DNA as a data carrier for cryptography and data hiding, exploiting its bio-molecular properties. This paper reviews published DNA-based data hiding techniques that use DNA to safeguard critical data transmitted over an insecure channel, and identifies their strengths and weaknesses in order to help future research design more efficient and secure DNA-based data hiding techniques.

Journals of University of Babylon

Generative Language Models on Nucleotide Sequences of Human Genes

Author: Ihtiyar Musa Nuri
Ozgur Arzucan
Publication date: 20/07/2023

Language models, primarily transformer-based ones, have achieved enormous success in NLP; notable examples are BERT for natural language understanding (NLU) and GPT-3 for natural language generation (NLG). DNA sequences are structurally close to natural language, and in DNA-related bioinformatics discriminative models such as DNABert already exist. To the best of our knowledge, however, the generative side remains largely unexplored. We therefore focused on developing an autoregressive generative language model, like GPT-3, for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we carried out our study on a smaller scale, focusing on the nucleotide sequences of human genes (unique parts of DNA with specific functionalities) instead of whole DNA. This decision does not change the structure of the problem much, since both DNA and genes can be viewed as 1D sequences over four nucleotides without losing much information or oversimplifying. First, we systematically examined this almost entirely unexplored problem and observed that RNNs performed best, while simple techniques such as N-grams were also promising. Another benefit was learning how to work with generative models on languages we do not understand, unlike natural language. We also observed how essential it is to evaluate on real-life tasks beyond classical metrics such as perplexity. Furthermore, we examined whether the data-hungry nature of these models can be changed by choosing a language with a minimal vocabulary size (four, owing to the four types of nucleotides), since such a language might make the problem easier. However, we observed that this did not substantially change the amount of data needed.
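
To make the N-gram baseline and the perplexity metric mentioned above concrete, here is a minimal sketch of a nucleotide N-gram model with add-alpha smoothing. The class name, the padding symbol `^`, and the smoothing choice are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class NGramModel:
    """Hypothetical order-n nucleotide model: P(next base | previous n-1 symbols)."""

    def __init__(self, n: int, alphabet: str = "ACGT", alpha: float = 1.0):
        self.n = n
        self.alphabet = alphabet
        self.alpha = alpha  # add-alpha (Laplace when 1.0) smoothing constant
        self.counts: dict[str, Counter] = defaultdict(Counter)

    def _pad(self, seq: str) -> str:
        # "^" marks positions before the sequence start so early bases get a context.
        return "^" * (self.n - 1) + seq

    def fit(self, sequences: list[str]) -> "NGramModel":
        for seq in sequences:
            padded = self._pad(seq)
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1 : i]
                self.counts[context][padded[i]] += 1
        return self

    def prob(self, context: str, symbol: str) -> float:
        c = self.counts.get(context, Counter())
        total = sum(c.values())
        return (c[symbol] + self.alpha) / (total + self.alpha * len(self.alphabet))

    def perplexity(self, seq: str) -> float:
        """2 ** (average negative log2-likelihood per base)."""
        padded = self._pad(seq)
        log_sum = 0.0
        for i in range(self.n - 1, len(padded)):
            log_sum += math.log2(self.prob(padded[i - self.n + 1 : i], padded[i]))
        return 2 ** (-log_sum / len(seq))


if __name__ == "__main__":
    model = NGramModel(n=3).fit(["ACGTACGTAC", "ACGAACGT"])
    print(model.perplexity("ACGTAC"))
```

An untrained model assigns 1/4 to every base, so its perplexity is exactly 4, the vocabulary size; a useful model must score below that on held-out genes.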

arXiv.org e-Print Archive

Targeted KRAS Mutation Assessment on Patient Tumor Histologic Material in Real Time Diagnostics

Author: Biesmans Bart
Charalambous Elpida
Fountzilas George
Karkavelas George
Kotoula Vassiliki
Malousi Andigoni
Vrettou Eleni
Publication venue: Public Library of Science
Publication date: 04/11/2009

BACKGROUND: Testing for tumor-specific mutations on routine formalin-fixed paraffin-embedded (FFPE) tissues may predict response to treatment in medical oncology and has already entered diagnostics, with KRAS mutation assessment as a paradigm. The highly sensitive real-time PCR (Q-PCR) methods developed for this purpose are usually standardized under optimal template conditions. In routine diagnostics, however, suboptimal templates pose the challenge. Herein, we addressed the applicability of sequencing and two Q-PCR methods on prospectively assessed diagnostic cases for KRAS mutations. METHODOLOGY/PRINCIPAL FINDINGS: Tumor FFPE-DNA from 135 diagnostic and 75 low-quality control samples was obtained upon macrodissection, tested for fragmentation, and assessed for KRAS mutations with dideoxy-sequencing and with two Q-PCR methods (Taqman-minor-groove-binder [TMGB] probes and DxS-KRAS-IVD). Samples with relatively well-preserved DNA could be accurately analyzed with sequencing, while the Q-PCR methods yielded informative results even in cases with very fragmented DNA (p<0.0001), with 100% sensitivity and specificity vs each other. However, Q-PCR efficiency (Ct values) also depended on DNA fragmentation (p<0.0001). The Q-PCR methods were sensitive enough to detect ≤1% mutant cells, provided that samples yielded cycle thresholds (Ct) < 29, but this condition was met in only 38.5% of diagnostic samples. In comparison, >99% of FFPE samples could be accurately analyzed at a sensitivity level of 10% (external validation of TMGB results). DNA quality and tumor cell content were the main reasons for discrepant sequencing/Q-PCR results (1.5%). CONCLUSIONS/SIGNIFICANCE: Diagnostic targeted mutation assessment on FFPE-DNA is very efficient with Q-PCR methods in comparison to dideoxy-sequencing. However, DNA fragmentation/amplification capacity and tumor DNA content must be considered when interpreting Q-PCR results in order to provide accurate information for clinical decision making.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

Author: Bruck Jehoshua
Farnoud Farzad
Jain Siddharth
Schwartz Moshe
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2016

The ability to store data in the DNA of a living organism has applications in a variety of areas, including synthetic biology and watermarking of patented genetically modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplications, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem duplications of a fixed length; the first family can correct any number of errors, while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, focusing primarily on the cases k = 2 and k = 3.
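
The following sketch illustrates why tandem duplications of a fixed length `k` are tractable, assuming bases are mapped to Z_4 (A, C, G, T to 0..3). Keeping the first `k` symbols and replacing the rest by k-th differences `y_i = x_i - x_(i-k) mod 4` turns every duplication into the insertion of `k` zeros, so reducing each zero run modulo `k` gives a "root" that a sequence shares with all its descendants. Sequences whose zero runs are all shorter than `k` are their own roots and therefore have disjoint descendant sets, which gives a code correcting any number of such duplications. This is offered in the spirit of the first code family described above; the helper names are hypothetical and it is not claimed to be the paper's exact construction.

```python
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}
INT_TO_BASE = "ACGT"
Q = 4  # alphabet size


def tandem_duplicate(x: str, i: int, k: int) -> str:
    """Insert a copy of the block x[i:i+k] right after itself."""
    return x[: i + k] + x[i : i + k] + x[i + k :]


def k_difference(x: str, k: int) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (first k symbols, k-th differences mod Q of the remaining symbols)."""
    v = [BASE_TO_INT[b] for b in x]
    head = tuple(v[:k])
    tail = tuple((v[i] - v[i - k]) % Q for i in range(k, len(v)))
    return head, tail


def duplication_root(x: str, k: int) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Reduce every zero run in the difference part to (run length mod k)."""
    head, tail = k_difference(x, k)
    reduced: list[int] = []
    run = 0
    for s in tail:
        if s == 0:
            run += 1
        else:
            reduced.extend([0] * (run % k))  # flush the pending zero run
            run = 0
            reduced.append(s)
    reduced.extend([0] * (run % k))
    return head, tuple(reduced)


def decode(head: tuple[int, ...], tail: tuple[int, ...], k: int) -> str:
    """Invert the k-th difference transform: v_i = v_(i-k) + y_i mod Q."""
    v = list(head)
    for d in tail:
        v.append((v[-k] + d) % Q)
    return "".join(INT_TO_BASE[s] for s in v)


if __name__ == "__main__":
    codeword = "ACGT"  # its difference part (2, 2) has no zeros, so it is a root
    noisy = tandem_duplicate(tandem_duplicate(codeword, 0, 2), 3, 2)
    print(noisy, "->", decode(*duplication_root(noisy, 2), 2))
```

Decoding recovers the original only when it was a root to begin with, which is exactly the restriction the code places on the stored sequences.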

arXiv.org e-Print Archive

Caltech Authors