3 research outputs found
Private Shotgun DNA Sequencing
Current genome sequencing techniques give a service provider (e.g., a
sequencing company) full access to the genome information, and thus violate the
privacy of individuals regarding their lifetime secret. In this
paper, we introduce the problem of private DNA sequencing, where the goal is to
keep the DNA sequence private from the sequencer. We propose an architecture
in which the task of reading fragments of DNA and the task of DNA assembly are
separated: the former is done at the sequencer(s), and the latter is completed
at a local trusted data collector. To satisfy the privacy constraint at the
sequencer and the reconstruction condition at the data collector, we create an
information gap between the two, relying on two techniques: (i) we use more
than one non-colluding sequencer, all reporting the read fragments to the
single data collector, and (ii) we add the fragments of some known DNA molecules,
which remain unknown to the sequencers, to the pool. We prove that these two
techniques provide enough freedom to satisfy both conditions at the same time.
Comment: 20 pages with 3 figure
Private Shotgun DNA Sequencing: A Structured Approach
DNA sequencing has faced huge demand since it was first introduced as a
service to the public. This service is often offloaded to sequencing
companies, which then have full knowledge of individuals' sequences, a
major violation of privacy. To address this challenge, we propose a solution
based on separating the process of reading the fragments of sequences,
which is done at a sequencing machine, from assembling the reads, which is done
at a trusted local data collector. To confuse the sequencer in a pooled
sequencing scenario, in which multiple sequences are sequenced
simultaneously, for each target individual we add fragments of one non-target
individual whose DNA sequence is known at the data collector. The coverage
depths of the individuals, defined as the number of DNA fragments per DNA site,
are then selected in proportion to powers of two. This layered, structured
solution allows us to ensure privacy using only one sequencing machine, in
contrast to our previous solution, which relied on the existence of multiple
non-colluding sequencing machines.
Comment: 10 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1811.1069
Private DNA Sequencing: Hiding Information in Discrete Noise
When an individual's DNA is sequenced, sensitive medical information becomes
available to the sequencing laboratory. A recently proposed way to hide an
individual's genetic information is to mix in DNA samples of other individuals.
We assume these samples are known to the individual but unknown to the
sequencing laboratory. Thus, these DNA samples act as "noise" to the sequencing
laboratory, but still allow the individual to recover their own DNA samples
afterward. Motivated by this idea, we study the problem of hiding a binary
random variable X (a genetic marker) with the additive noise provided by mixing
DNA samples, using mutual information as a privacy metric. This is equivalent
to the problem of finding a worst-case noise distribution for recovering X from
the noisy observation among a set of feasible discrete distributions. We
characterize upper and lower bounds on the solution of this problem, which are
empirically shown to be very close. The lower bound is obtained through a
convex relaxation of the original discrete optimization problem, and yields a
closed-form expression. The upper bound is computed via a greedy algorithm for
selecting the mixing proportions.
Comment: 10 pages, 5 figures, shorter version to appear in proceedings of ITW
202
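The privacy metric in the abstract above can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper itself: X is Bernoulli, the observation is Y = X + Z, and Z is independent integer-valued noise with a given probability mass function. The function name `mutual_information` and its parameters are hypothetical; the sketch only evaluates I(X; Y) for a chosen noise distribution and does not implement the paper's bounds or its greedy algorithm.

```python
import math

def mutual_information(p_x1, noise_pmf):
    """Return I(X; Y) in bits for Y = X + Z.

    Assumed model (illustrative only): X ~ Bernoulli(p_x1), and Z is
    independent of X with P(Z = z) = noise_pmf[z] for z = 0..len-1.
    """
    p_x = [1.0 - p_x1, p_x1]
    n = len(noise_pmf)

    # joint[x][y] = P(X = x) * P(Z = y - x); Y ranges over 0..n
    joint = [[0.0] * (n + 1) for _ in range(2)]
    for x in range(2):
        for z, p_z in enumerate(noise_pmf):
            joint[x][x + z] = p_x[x] * p_z

    # Marginal distribution of the noisy observation Y
    p_y = [joint[0][y] + joint[1][y] for y in range(n + 1)]

    # I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y)))
    mi = 0.0
    for x in range(2):
        for y in range(n + 1):
            p_xy = joint[x][y]
            if p_xy > 0.0:
                mi += p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))
    return mi
```

Under these assumptions, a fair binary X with no noise (`noise_pmf = [1.0]`) leaks its full entropy, while spreading the noise over more values lowers the leakage; comparing candidate noise distributions this way mirrors, in a very reduced form, the search for a worst-case noise distribution described in the abstract.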