
    DNA-Based Kinship Analysis

    Relatedness between individuals and groups can be investigated using DNA markers. A child’s DNA profile is a combination of alleles passed down from the father and mother. This means that relationships can be investigated between alleged family members. DNA profiling is commonly used to test for potential paternity, parentage and sibship (whether people are related as brothers or sisters) relationships. In many forensic cases more complex relationships have to be considered
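The inheritance rule in this abstract (a child receives one allele per marker from each parent) can be sketched as a simple consistency check. This is a minimal illustration, not a forensic method: it assumes no mutations or typing errors, and the function name `consistent_trio` and the locus labels are hypothetical.

```python
def consistent_trio(child, mother, father):
    """Return True if, at every locus, the child's two alleles can be
    explained as one allele from the mother and one from the father.

    Each profile maps a locus label to a pair of alleles (repeat counts).
    Assumption: no mutation and no genotyping error.
    """
    for locus, (a, b) in child.items():
        m = mother[locus]
        f = father[locus]
        # Either allele a came from the mother and b from the father, or vice versa.
        if not ((a in m and b in f) or (b in m and a in f)):
            return False
    return True


# Hypothetical three-locus profiles.
child = {"locus1": (12, 15), "locus2": (8, 8), "locus3": (21, 24)}
mother = {"locus1": (12, 13), "locus2": (8, 10), "locus3": (20, 21)}
alleged_father = {"locus1": (15, 16), "locus2": (8, 9), "locus3": (24, 25)}
unrelated_man = {"locus1": (10, 11), "locus2": (8, 9), "locus3": (24, 25)}

print(consistent_trio(child, mother, alleged_father))
print(consistent_trio(child, mother, unrelated_man))
```

A real kinship analysis would weigh allele frequencies and report a likelihood ratio rather than a yes/no answer; the check above only captures the exclusion logic.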

    A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

    Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. The use of single nucleotide polymorphisms (SNPs) has utility for analysis of complex DNA mixtures. The use of tens of thousands of SNP loci for analysis poses significant computational challenges because the forensic analysis scales by the product of the loci count and number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by re-casting the algorithm in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algorithms for Dense Generalized Matrix-Multiply (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using an NVIDIA K80 GPU. Comment: Accepted for publication at the 2017 IEEE High Performance Extreme Computing conference
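The idea of recasting sample comparison as a matrix product can be illustrated with a small sketch. It assumes each sample is encoded as a 0/1 vector marking minor-allele presence at each SNP locus, and it uses ordinary multiply-add rather than the overloaded operators the abstract mentions, whose exact definition is not given here. The function name `mismatch_counts` is hypothetical.

```python
def mismatch_counts(queries, references):
    """counts[i][j] = number of loci where query i carries the minor allele
    (1) and reference j does not (0).

    This equals the matrix product Q * (1 - R)^T, so every query/reference
    pair is compared by one dense matrix multiplication.
    Assumption: samples are 0/1 vectors of equal length.
    """
    # Complement the reference matrix once: 1 where the allele is absent.
    complement = [[1 - allele for allele in ref] for ref in references]
    return [
        [sum(q * c for q, c in zip(query, ref_c)) for ref_c in complement]
        for query in queries
    ]


queries = [[1, 0, 1, 0], [0, 0, 1, 1]]
references = [[1, 0, 1, 1], [0, 1, 0, 0]]
print(mismatch_counts(queries, references))
```

Because the work is a single dense product, the same computation maps directly onto a GPU matrix-multiply routine, which is the property the abstract relies on.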

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed for searching for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing shows that the inferred languages for components of DNA are consistently accurate. By using the proposed algorithm, languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly

    Applications and Challenges of Real-time Mobile DNA Analysis

    DNA sequencing is the process of identifying the exact order of nucleotides within a given DNA molecule. The new portable and relatively inexpensive DNA sequencers, such as Oxford Nanopore MinION, have the potential to move DNA sequencing outside of the laboratory, leading to faster and more accessible DNA-based diagnostics. However, portable DNA sequencing and analysis are challenging for mobile systems, owing to high data throughputs and computationally intensive processing performed in environments with unreliable connectivity and power. In this paper, we provide an analysis of the challenges that mobile systems and mobile computing must address to maximize the potential of portable DNA sequencing and in situ DNA analysis. We explain the DNA sequencing process and highlight the main differences between traditional and portable DNA sequencing in the context of the actual and envisioned applications. We look at the identified challenges from the perspective of both algorithms and systems design, showing the need for careful co-design

    Bayesian DNA copy number analysis

    BACKGROUND: Some diseases, like tumors, can be related to chromosomal aberrations, leading to changes of DNA copy number. The copy number of an aberrant genome can be represented as a piecewise constant function, since it can exhibit regions of deletions or gains. In a healthy cell, by contrast, the copy number is two because we inherit one copy of each chromosome from each of our parents. Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are noisy observations of a piecewise constant function. The method estimates the unknown segment number, the endpoints of the segments and the value of the segment levels of the underlying piecewise constant function. The Bayesian Regression Curve (BRC) estimates the same data with a smoothing curve. However, in the original formulation, some estimators failed to properly determine the corresponding parameters. For example, the boundary estimator did not take into account the dependency among the boundaries and could estimate more than one breakpoint at the same position, losing segments. RESULTS: We derived an improved version of the BPCR (called mBPCR) and BRC, changing the segment number estimator and the boundary estimator to enhance the fitting procedure. We also proposed an alternative estimator of the variance of the segment levels, which is useful in case of data with high noise. Using artificial data, we compared the original and the modified version of BPCR and BRC with other regression methods, showing that our improved version of BPCR generally outperformed all the others. Similar results were also observed on real data. CONCLUSION: We propose an improved method for DNA copy number estimation, mBPCR, which performed very well compared to previously published algorithms. In particular, mBPCR was more powerful in the detection of the true position of the breakpoints and of small aberrations in very noisy data. Hence, from a biological point of view, our method can be very useful, for example, to find targets of genomic aberrations in clinical cancer samples
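To make the piecewise constant model concrete, the sketch below fits a fixed number of constant segments to noisy data. It is not mBPCR or BPCR: it swaps in plain least-squares segmentation by dynamic programming, and it assumes the number of segments `k` is known, whereas the Bayesian method estimates it. The function name `segment` is hypothetical.

```python
def segment(data, k):
    """Split data into k contiguous segments minimising total squared error.

    Returns (bounds, levels): bounds is a list of half-open index ranges
    (start, end), levels is the mean of each segment.
    Assumption: 1 <= k <= len(data).
    """
    n = len(data)
    # Prefix sums of values and squares give any segment's error in O(1).
    s, s2 = [0.0], [0.0]
    for x in data:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations from the mean over data[i:j]."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal error when data[:j] is split into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers from the end to recover the breakpoints.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    levels = [(s[j] - s[i]) / (j - i) for i, j in bounds]
    return bounds, levels


# Hypothetical profile: normal (2 copies), a gain (3), then a loss (1).
profile = [2.0, 2.1, 1.9, 3.0, 3.1, 2.9, 1.0, 1.1]
bounds, levels = segment(profile, 3)
print(bounds)
print(levels)
```

Each range in `bounds` ends where the next begins, so two breakpoints can never coincide here; the abstract's point is that the original Bayesian boundary estimator lacked that guarantee.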

    Real-time DNA microarray analysis

    We present a quantification method for affinity-based DNA microarrays which is based on the real-time measurements of hybridization kinetics. This method, i.e. real-time DNA microarrays, enhances the detection dynamic range of conventional systems by being impervious to probe saturation in the capturing spots, washing artifacts, microarray spot-to-spot variations, and other signal amplitude-affecting non-idealities. We demonstrate in both theory and practice that the time-constant of target capturing in microarrays, similar to all affinity-based biosensors, is inversely proportional to the concentration of the target analyte, which we subsequently use as the fundamental parameter to estimate the concentration of the analytes. Furthermore, to empirically validate the capabilities of this method in practical applications, we present a FRET-based assay which enables the real-time detection in gene expression DNA microarrays
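The relation between capture time constant and concentration can be sketched numerically. The example assumes a first-order capture curve, signal(t) = plateau * (1 - exp(-t / tau)), a known plateau, and the idealised relation tau = 1 / (k * c) with a hypothetical rate constant `k`. None of these specifics come from the abstract beyond the stated inverse proportionality, and the function names are illustrative.

```python
import math


def estimate_tau(times, signal, plateau):
    """Estimate the time constant of signal(t) = plateau * (1 - exp(-t / tau)).

    Linearises the curve as ln(1 - signal / plateau) = -t / tau and fits a
    least-squares line through the origin.
    Assumption: the plateau is known and every sample is below it.
    """
    num = 0.0  # sum of t * y
    den = 0.0  # sum of t * t
    for t, s in zip(times, signal):
        if t > 0 and s < plateau:
            y = math.log(1.0 - s / plateau)
            num += t * y
            den += t * t
    slope = num / den  # equals -1 / tau
    return -1.0 / slope


def concentration_from_tau(tau, k):
    """Invert the assumed relation tau = 1 / (k * c)."""
    return 1.0 / (k * tau)


# Synthetic noiseless curve with tau = 50 s (hypothetical values).
times = [10 * i for i in range(1, 20)]
signal = [100.0 * (1.0 - math.exp(-t / 50.0)) for t in times]
tau = estimate_tau(times, signal, plateau=100.0)
print(tau, concentration_from_tau(tau, k=0.01))
```

Because the estimate depends on the shape of the curve over time rather than on a single end-point intensity, it gives one way to see why a kinetic measurement can be less sensitive to amplitude-affecting artifacts.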

    Ploidy Analysis and DNA Content of Mutant Banana "Pisang Berangan" Using Flow Cytometry

    Mutagens cause random changes in the nuclear DNA or cytoplasmic organelles, resulting in gene, chromosomal or genomic mutations and hence, create variability. In this study, flow cytometry (FCM) was used to determine ploidy levels and DNA content in gamma-irradiated variants of mutated Pisang Berangan (cv. Intan, AAA) - a local banana genotype. Induced variants such as short plant stature (stunted growth), late flowering plants (late maturity) and abnormalities in bunch characters were selected to study possible changes at the DNA level. The study showed that DNA content of mutated plants differed from non-irradiated control and that irradiation had the most effect at high doses (40 and 60 Gy). The increase of DNA content in 20 Gy and 30 Gy treated plants was not more than that of the control plants. The values of genomic DNA content of gamma-irradiation variants decreased as the dose of irradiation increased from 20 to 60 Gy, indicating that the high dose of gamma-irradiation had a significant effect on the genome of the plants. The analysis further showed that phenotypic variation due to mutagenesis was reflected in the DNA content of the plants. The results also showed that ploidy levels were not affected by gamma-irradiation even at high doses

    Computational aspects of DNA mixture analysis

    Statistical analysis of DNA mixtures is known to pose computational challenges due to the enormous state space of possible DNA profiles. We propose a Bayesian network representation for genotypes, allowing computations to be performed locally involving only a few alleles at each step. In addition, we describe a general method for computing the expectation of a product of discrete random variables using auxiliary variables and probability propagation in a Bayesian network, which in combination with the genotype network allows efficient computation of the likelihood function and various other quantities relevant to the inference. Lastly, we introduce a set of diagnostic tools for assessing the adequacy of the model for describing a particular dataset

    Having a direct look: analysis of DNA damage and repair mechanisms by next generation sequencing

    Genetic information is under constant attack from endogenous and exogenous sources, and the use of model organisms has provided important frameworks to understand how genome stability is maintained and how various DNA lesions are repaired. The advance of high throughput next generation sequencing (NGS) provides new inroads for investigating mechanisms needed for genome maintenance. These emerging studies, which aim to link genetic toxicology and mechanistic analyses of DNA repair processes in vivo, rely on defining mutational signatures caused by faulty replication, endogenous DNA damaging metabolites, or exogenously applied genotoxins; the analysis of their nature, their frequency and distribution. In contrast to classical studies, where DNA repair deficiency is assessed by reduced cellular survival, the localization of DNA repair factors and their interdependence as well as limited analysis of single locus reporter assays, NGS based approaches reveal the direct, quantal imprint of mutagenesis genome-wide, at the DNA sequence level. As we will show, such investigations require the analysis of DNA derived from single genotoxin treated cells, or DNA from cell populations regularly passaged through single cell bottlenecks when naturally occurring mutation accumulation is investigated. We will argue that the life cycle of the nematode Caenorhabditis elegans, its genetic malleability combined with whole genome sequencing provides an exciting model system to conduct such analysis

    Google matrix analysis of DNA sequences

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks. Comment: latex, 11 fig
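The construction described in this abstract can be sketched on a toy sequence. The sketch assumes that "nearby words" means consecutive non-overlapping words of fixed length, uses the conventional damping factor 0.85, and fills columns with no observed transitions uniformly; the abstract does not specify any of these choices. Function names are hypothetical.

```python
from itertools import product


def google_matrix(seq, word_len=1, alpha=0.85):
    """Build a column-stochastic Google matrix from word-to-word transitions.

    G[i][j] is the probability of moving from word j to word i.
    Assumptions: consecutive non-overlapping words, damping factor alpha,
    uniform columns for words with no outgoing transitions.
    """
    words = ["".join(p) for p in product("ACGT", repeat=word_len)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0.0] * n for _ in range(n)]  # counts[to][from]
    chunks = [seq[i:i + word_len]
              for i in range(0, len(seq) - word_len + 1, word_len)]
    for a, b in zip(chunks, chunks[1:]):
        if a in index and b in index:  # skip words with ambiguous letters
            counts[index[b]][index[a]] += 1.0
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            s = counts[i][j] / col_sum if col_sum else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return words, G


def pagerank(G, iters=200):
    """Power iteration: repeatedly apply G to a uniform start vector."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


words, G = google_matrix("ACGTACGTTTGACAGGCTA", word_len=1)
for word, prob in zip(words, pagerank(G)):
    print(word, round(prob, 4))
```

With longer words the matrix grows as 4^word_len per side, so a study on real genomes would need sparse storage rather than the dense lists used here.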