DNA-Based Kinship Analysis
Relatedness between individuals and groups can be investigated using DNA markers. A child’s DNA profile is a combination of alleles passed down from the father and mother. This means that relationships can be investigated between alleged family members. DNA profiling is commonly used to test for potential paternity, parentage and sibship (whether people are related as brothers or sisters) relationships. In many forensic cases more complex relationships have to be considered
A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs
Analysis of DNA samples is an important step in forensics, and the speed of
analysis can impact investigations. Comparison of DNA sequences is based on the
analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5
base pairs. Current forensics approaches use 20 STR loci for analysis. The use
of single nucleotide polymorphisms (SNPs) has utility for analysis of complex
DNA mixtures. The use of tens of thousands of SNP loci for analysis poses
significant computational challenges because the forensic analysis scales by
the product of the loci count and number of DNA samples to be analyzed. In this
paper, we discuss the implementation of a DNA sequence comparison algorithm by
re-casting the algorithm in terms of linear algebra primitives. By developing
an overloaded matrix multiplication approach to DNA comparisons, we can
leverage advances in GPU hardware and algorithms for Dense Generalized
Matrix-Multiply (DGEMM) to speed up DNA sample comparisons. We show that it is
possible to compare 2048 unknown DNA samples with 20 million known samples in
under 6 seconds using an NVIDIA K80 GPU. Comment: Accepted for publication at the 2017 IEEE High Performance Extreme
Computing conferenc
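The idea of recasting profile comparison as a matrix product can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each profile is a 0/1 vector marking minor-allele presence per SNP locus, and the function name `mismatch_counts` and the sample data are hypothetical. A production version would hand the same product to a dense GEMM routine on the GPU rather than use Python loops.

```python
def mismatch_counts(references, mixtures):
    """Return counts[i][j]: the number of loci where reference i carries a
    minor allele (1) that mixture j lacks (0).

    Assumption: profiles are equal-length lists of 0/1 flags, one per locus.
    """
    # Complement the mixtures so that a 1 marks "allele absent from mixture".
    absent = [[1 - bit for bit in profile] for profile in mixtures]
    # Matrix product references x absent^T: each entry is one dot product.
    return [
        [sum(r * a for r, a in zip(ref, missing)) for missing in absent]
        for ref in references
    ]


# Hypothetical toy data: 2 reference profiles, 2 mixtures, 6 loci.
references = [
    [1, 0, 1, 0, 0, 1],  # reference A
    [0, 1, 0, 0, 1, 0],  # reference B
]
mixtures = [
    [1, 0, 1, 1, 0, 1],  # contains every minor allele of A
    [0, 1, 1, 0, 0, 0],  # lacks one minor allele of B
]
counts = mismatch_counts(references, mixtures)
```

A count of zero means the reference cannot be excluded as a contributor to that mixture under this simple encoding; here only reference A against the first mixture reaches zero.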
DNA ANALYSIS USING GRAMMATICAL INFERENCE
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for searching for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly
Applications and Challenges of Real-time Mobile DNA Analysis
DNA sequencing is the process of identifying the exact order of
nucleotides within a given DNA molecule. The new portable and relatively
inexpensive DNA sequencers, such as Oxford Nanopore MinION, have the potential
to move DNA sequencing outside of the laboratory, leading to faster and more
accessible DNA-based diagnostics. However, portable DNA sequencing and analysis
are challenging for mobile systems, owing to high data throughputs and
computationally intensive processing performed in environments with unreliable
connectivity and power.
In this paper, we provide an analysis of the challenges that mobile systems
and mobile computing must address to maximize the potential of portable DNA
sequencing, and in situ DNA analysis. We explain the DNA sequencing process and
highlight the main differences between traditional and portable DNA sequencing
in the context of the actual and envisioned applications. We look at the
identified challenges from the perspective of both algorithms and systems
design, showing the need for careful co-design
Bayesian DNA copy number analysis
BACKGROUND: Some diseases, like tumors, can be related to chromosomal aberrations, leading to
changes of DNA copy number. The copy number of an aberrant genome can be represented as a
piecewise constant function, since it can exhibit regions of deletions or gains. In contrast, in a healthy
cell the copy number is two because we inherit one copy of each chromosome from each of our
parents.
Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are
noisy observations of a piecewise constant function. The method estimates the unknown segment
number, the endpoints of the segments and the value of the segment levels of the underlying
piecewise constant function. The Bayesian Regression Curve (BRC) estimates the same data with
a smoothing curve. However, in the original formulation, some estimators failed to properly
determine the corresponding parameters. For example, the boundary estimator did not take into
account the dependency among the boundaries and could estimate more than one
breakpoint at the same position, losing segments.
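To make the piecewise constant representation concrete, the sketch below averages noisy observations within segments whose breakpoints are already known. It is not BPCR: the Bayesian method also estimates the number of segments and the breakpoint positions, which are simply given here. The function names and the sample values are hypothetical.

```python
from statistics import mean


def segment_levels(values, breakpoints):
    """Average the observations inside each segment.

    Assumption: `breakpoints` lists the indices at which a new segment
    starts (excluding index 0), in increasing order.
    """
    edges = [0, *breakpoints, len(values)]
    return [mean(values[a:b]) for a, b in zip(edges, edges[1:])]


def piecewise_profile(levels, breakpoints, n):
    """Expand segment levels into a piecewise constant profile of length n."""
    edges = [0, *breakpoints, n]
    profile = []
    for level, (a, b) in zip(levels, zip(edges, edges[1:])):
        profile.extend([level] * (b - a))
    return profile


# Hypothetical noisy copy-number observations: normal, gain, loss.
observed = [2.1, 1.9, 2.0, 3.2, 2.8, 3.0, 0.9, 1.1]
levels = segment_levels(observed, [3, 6])
profile = piecewise_profile(levels, [3, 6], len(observed))
```

Two breakpoints placed at the same position would produce an empty segment in this representation, which is the kind of lost segment the passage above describes.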
RESULTS: We derived an improved version of the BPCR (called mBPCR) and BRC, changing the
segment number estimator and the boundary estimator to enhance the fitting procedure. We also
proposed an alternative estimator of the variance of the segment levels, which is useful in case of
data with high noise. Using artificial data, we compared the original and the modified version of
BPCR and BRC with other regression methods, showing that our improved version of BPCR
generally outperformed all the others. Similar results were also observed on real data.
CONCLUSION: We propose an improved method for DNA copy number estimation, mBPCR, which
performed very well compared to previously published algorithms. In particular, mBPCR was more
powerful in the detection of the true position of the breakpoints and of small aberrations in very
noisy data. Hence, from a biological point of view, our method can be very useful, for example, to
find targets of genomic aberrations in clinical cancer samples
Real-time DNA microarray analysis
We present a quantification method for affinity-based
DNA microarrays which is based on the
real-time measurements of hybridization kinetics.
This method, i.e. real-time DNA microarrays,
enhances the detection dynamic range of conventional
systems by being impervious to probe
saturation in the capturing spots, washing
artifacts, microarray spot-to-spot variations, and
other signal amplitude-affecting non-idealities. We
demonstrate in both theory and practice that the
time-constant of target capturing in microarrays,
similar to all affinity-based biosensors, is inversely
proportional to the concentration of the target
analyte, which we subsequently use as the fundamental
parameter to estimate the concentration
of the analytes. Furthermore, to empirically
validate the capabilities of this method in practical
applications, we present a FRET-based assay which
enables the real-time detection in gene expression
DNA microarrays
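The reason a time constant is insensitive to amplitude effects can be shown with a small sketch. It assumes a first-order capture curve s(t) = A(1 - exp(-t/tau)) with tau = 1/(k C); both the curve and the rate constant `rate_constant` are illustrative assumptions, not parameters taken from the paper.

```python
import math


def signal(t, amplitude, tau):
    """Assumed first-order capture curve: amplitude * (1 - exp(-t / tau))."""
    return amplitude * (1.0 - math.exp(-t / tau))


def time_constant(s0, s1, s2, dt):
    """Estimate tau from three samples taken dt apart.

    For the assumed curve, successive increments shrink by exp(-dt / tau),
    so the amplitude cancels in the ratio.
    """
    ratio = (s2 - s1) / (s1 - s0)
    return -dt / math.log(ratio)


def concentration(tau, rate_constant):
    """Invert the assumed relation tau = 1 / (rate_constant * concentration)."""
    return 1.0 / (rate_constant * tau)


# Two spots with the same kinetics but different (hypothetical) amplitudes.
tau_weak = time_constant(*(signal(t, 3.0, 50.0) for t in (0, 10, 20)), dt=10)
tau_strong = time_constant(*(signal(t, 7.0, 50.0) for t in (0, 10, 20)), dt=10)
```

Both spots yield the same estimated time constant even though their signal amplitudes differ, which is why spot-to-spot variation does not bias the concentration estimate under this model.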
Ploidy Analysis and DNA Content of Mutant Banana "Pisang Berangan" Using Flow Cytometry
Mutagens cause random changes in the nuclear DNA or cytoplasmic organelles, resulting in gene, chromosomal or genomic mutations and hence, create variability. In this study, flow cytometry (FCM) was used to determine ploidy levels and DNA content in gamma-irradiated variants of mutated Pisang Berangan (cv. Intan, AAA) - a local banana genotype. Induced variants such as short plant stature (stunted growth), late flowering plants (late maturity) and abnormalities in bunch characters were selected to study possible changes at the DNA level. The study showed that DNA content of mutated plants differed from non-irradiated control and that irradiation had the most effect at high doses (40 and 60 Gy). The increase of DNA content in 20 Gy and 30 Gy treated plants was not more than that of the control plants. The values of genomic DNA content of gamma-irradiation variants decreased as the dose of irradiation increased from 20 to 60 Gy, indicating that the high dose of gamma-irradiation had a significant effect on the genome of the plants. The analysis further showed that phenotypic variation due to mutagenesis was reflected in the DNA content of the plants. The results also showed that ploidy levels were not affected by gamma-irradiation even at high doses
Computational aspects of DNA mixture analysis
Statistical analysis of DNA mixtures is known to pose computational
challenges due to the enormous state space of possible DNA profiles. We propose
a Bayesian network representation for genotypes, allowing computations to be
performed locally involving only a few alleles at each step. In addition, we
describe a general method for computing the expectation of a product of
discrete random variables using auxiliary variables and probability propagation
in a Bayesian network, which in combination with the genotype network allows
efficient computation of the likelihood function and various other quantities
relevant to the inference. Lastly, we introduce a set of diagnostic tools for
assessing the adequacy of the model for describing a particular dataset
Having a direct look: analysis of DNA damage and repair mechanisms by next generation sequencing
Genetic information is under constant attack from endogenous and exogenous sources, and the use of model organisms has provided important frameworks to understand how genome stability is maintained and how various DNA lesions are repaired. The advance of high throughput next generation sequencing (NGS) provides new inroads for investigating mechanisms needed for genome maintenance. These emerging studies, which aim to link genetic toxicology and mechanistic analyses of DNA repair processes in vivo, rely on defining mutational signatures caused by faulty replication, endogenous DNA damaging metabolites, or exogenously applied genotoxins; the analysis of their nature, their frequency and distribution. In contrast to classical studies, where DNA repair deficiency is assessed by reduced cellular survival, the localization of DNA repair factors and their interdependence as well as limited analysis of single locus reporter assays, NGS based approaches reveal the direct, quantal imprint of mutagenesis genome-wide, at the DNA sequence level. As we will show, such investigations require the analysis of DNA derived from single genotoxin treated cells, or DNA from cell populations regularly passaged through single cell bottlenecks when naturally occurring mutation accumulation is investigated. We will argue that the life cycle of the nematode Caenorhabditis elegans, its genetic malleability combined with whole genome sequencing provides an exciting model system to conduct such analysis
Google matrix analysis of DNA sequences
For DNA sequences of various species we construct the Google matrix G of
Markov transitions between nearby words composed of several letters. The
statistical distribution of matrix elements of this matrix is shown to be
described by a power law with the exponent being close to those of outgoing
links in such scale-free networks as the World Wide Web (WWW). At the same time
the sum of ingoing matrix elements is characterized by the exponent being
significantly larger than those typical for WWW networks. This results in a
slow algebraic decay of the PageRank probability determined by the distribution
of ingoing elements. The spectrum of G is characterized by a large gap leading
to a rapid relaxation process on the DNA sequence networks. We introduce the
PageRank proximity correlator between different species which determines their
statistical similarity from the view point of Markov chains. The properties of
other eigenstates of the Google matrix are also discussed. Our results
establish scale-free features of DNA sequence networks showing their
similarities and distinctions with the WWW and linguistic networks. Comment: latex, 11 fig
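A Google matrix over DNA words can be sketched as below. The details are assumptions made for illustration: the sequence is cut into consecutive non-overlapping words of length `word_len`, the damping factor `alpha` is set to the conventional 0.85, columns with no outgoing transitions are replaced by uniform columns, and the function names and test sequence are hypothetical.

```python
from itertools import product


def google_matrix(sequence, word_len=2, alpha=0.85):
    """Build G[i][j], the damped probability of moving from word j to word i.

    Assumption: `sequence` contains only the letters A, C, G, T.
    """
    words = ["".join(p) for p in product("ACGT", repeat=word_len)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    # Cut the sequence into consecutive, non-overlapping words.
    chunks = [sequence[i:i + word_len]
              for i in range(0, len(sequence) - word_len + 1, word_len)]
    counts = [[0] * n for _ in range(n)]  # counts[to][from]
    for src, dst in zip(chunks, chunks[1:]):
        counts[index[dst]][index[src]] += 1
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # Dangling columns (no observed transitions) become uniform.
            s = counts[i][j] / col_sum if col_sum else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return words, G


def pagerank(G, iterations=100):
    """Power iteration for the leading eigenvector of a column-stochastic G."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


words, G = google_matrix("ACGTACGTTGCAACGT")
ranks = pagerank(G)
```

Each column of `G` sums to one, so the iteration preserves total probability and `ranks` can be read as the PageRank probability of each word.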