Search CORE

19,031 research outputs found

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Author: Cook Cory
Publication venue: SJSU ScholarWorks
Publication date: 14/06/2016
Field of study

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing shows that the accuracy of inferred languages for components of DNA are consistently accurate. By using the proposed algorithm languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly

SJSU ScholarWorks

Constructing multiple unique input/output sequences using metaheuristic optimisation techniques

Author: Aho
Atkinson
Goldberg
Guo
Hierons
Hierons
Jones
K. Derderian
Kirkpatrick
M. Harman
McGookin
Metropolis
Miller
Naik
Pomeranz
Q. Guo
R.M. Hierons
Tracey
Wegener
Yang
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2005
Field of study

Multiple unique input/output sequences (UIOs) are often used to generate robust and compact test sequences in finite state machine (FSM) based testing. However, computing UIOs is NP-hard. Metaheuristic optimisation techniques (MOTs) such as genetic algorithms (GAs) and simulated annealing (SA) are effective in providing good solutions for some NP-hard problems. In the paper, the authors investigate the construction of UIOs by using MOTs. They define a fitness function to guide the search for potential UIOs and use sharing techniques to encourage MOTs to locate UIOs that are calculated as local optima in a search domain. They also compare the performance of GA and SA for UIO construction. Experimental results suggest that, after using a sharing technique, both GA and SA can find a majority of UIOs from the models under test

Crossref

UCL Discovery

King's Research Portal

Brunel University Research Archive

The Discrete Infinite Logistic Normal Distribution

Author: Blei David
Paisley John
Wang Chong
Publication venue
Publication date: 01/01/2012
Field of study

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare with online HDP and online LDA on the Nature magazine, which contains approximately 350,000 articles.Comment: This paper will appear in Bayesian Analysis. A shorter version of this paper appeared at AISTATS 2011, Fort Lauderdale, FL, US

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Crossref

Learning Concise Models from Long Execution Traces

Author: Jeppu Natasha Yogananda
Kroening Daniel
Melham Tom
O'Leary John
Publication venue
Publication date: 01/01/2020
Field of study

Abstract models of system-level behaviour have applications in design exploration, analysis, testing and verification. We describe a new algorithm for automatically extracting useful models, as automata, from execution traces of a HW/SW system driven by software exercising a use-case of interest. Our algorithm leverages modern program synthesis techniques to generate predicates on automaton edges, succinctly describing system behaviour. It employs trace segmentation to tackle complexity for long traces. We learn concise models capturing transaction-level, system-wide behaviour--experimentally demonstrating the approach using traces from a variety of sources, including the x86 QEMU virtual platform and the Real-Time Linux kernel

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive