367 research outputs found
Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability
Previously referred to as `miraculous' in the scientific literature because
of its powerful properties and its wide application as optimal solution to the
problem of induction/inference, (approximations to) Algorithmic Probability
(AP) and the associated Universal Distribution are (or should be) of the
greatest importance in science. Here we investigate the emergence, the rates of
emergence and convergence, and the Coding-theorem like behaviour of AP in
Turing-subuniversal models of computation. We investigate empirical
distributions of computing models in the Chomsky hierarchy. We introduce
measures of algorithmic probability and algorithmic complexity based upon
resource-bounded computation, in contrast to previously thoroughly investigated
distributions produced from the output distribution of Turing machines. This
approach allows for numerical approximations to algorithmic
(Kolmogorov-Chaitin) complexity-based estimations at each of the levels of a
computational hierarchy. We demonstrate that all these estimations are
correlated in rank and that they converge both in rank and values as a function
of computational power, despite fundamental differences between computational
models. In the context of natural processes that operate below the Turing
universal level because of finite resources and physical degradation, the
investigation of natural biases stemming from algorithmic rules may shed light
on the distribution of outcomes. We show that up to 60\% of the
simplicity/complexity bias in distributions produced even by the weakest of the
computational models can be accounted for by Algorithmic Probability in its
approximation to the Universal Distribution.Comment: 27 pages main text, 39 pages including supplement. Online complexity
calculator: http://complexitycalculator.com
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar
and a string of length , decide if can be obtained from
. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
time, where is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be . Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the -Clique problem: given a graph on nodes, decide if
there are that form a clique.
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14)
Efficient known ncRNA search including pseudoknots
BACKGROUND: Searching for members of characterized ncRNA families containing pseudoknots is an important component of genome-scale ncRNA annotation. However, the state-of-the-art known ncRNA search is based on context-free grammar (CFG), which cannot effectively model pseudoknots. Thus, existing CFG-based ncRNA identification tools usually ignore pseudoknots during search. As a result, dozens of sequences that do not contain the native pseudoknots are reported by these tools. When pseudoknot structures are vital to the functions of the ncRNAs, these sequences may not be true members. RESULTS: In this work, we design a pseudoknot search tool using multiple simple sub-structures, which are derived from knot-free and bifurcation-free structural motifs in the underlying family. We test our tool on a contiguous 22-Mb region of the Maize Genome. The experimental results show that our work competes favorably with other pseudoknot search methods. CONCLUSIONS: Our sub-structure based tool can conduct genome-scale pseudoknot-containing ncRNA search effectively and efficiently. It provides a complementary pseudoknot search tool to Infernal. The source codes are available at http://www.cse.msu.edu/~chengy/knotsearch
STARE: Spatio-Temporal Attention Relocation for multiple structured activities detection
We present a spatio-temporal attention relocation (STARE) method, an information-theoretic approach for efficient detection of simultaneously occurring structured activities. Given multiple human activities in a scene, our method dynamically focuses on the currently most informative activity. Each activity can be detected without complete observation, as the structure of sequential actions plays an important role on making the system robust to unattended observations. For such systems, the ability to decide where and when to focus is crucial to achieving high detection performances under resource bounded condition. Our main contributions can be summarized as follows: 1) information-theoretic dynamic attention relocation framework that allows the detection of multiple activities efficiently by exploiting the activity structure information and 2) a new high-resolution data set of temporally-structured concurrent activities. Our experiments on applications show that the STARE method performs efficiently while maintaining a reasonable level of accuracy
Network Analysis with Stochastic Grammars
Digital forensics requires significant manual effort to identify items of evidentiary interest from the ever-increasing volume of data in modern computing systems. One of the tasks digital forensic examiners conduct is mentally extracting and constructing insights from unstructured sequences of events. This research assists examiners with the association and individualization analysis processes that make up this task with the development of a Stochastic Context -Free Grammars (SCFG) knowledge representation for digital forensics analysis of computer network traffic. SCFG is leveraged to provide context to the low-level data collected as evidence and to build behavior profiles. Upon discovering patterns, the analyst can begin the association or individualization process to answer criminal investigative questions. Three contributions resulted from this research. First , domain characteristics suitable for SCFG representation were identified and a step -by- step approach to adapt SCFG to novel domains was developed. Second, a novel iterative graph-based method of identifying similarities in context-free grammars was developed to compare behavior patterns represented as grammars. Finally, the SCFG capabilities were demonstrated in performing association and individualization in reducing the suspect pool and reducing the volume of evidence to examine in a computer network traffic analysis use case
The ‘Shell Game’: Why Children Never Lose
Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/73236/1/1467-9612.00013.pd
Neural Mechanisms for Information Compression by Multiple Alignment, Unification and Search
This article describes how an abstract framework for perception and cognition may be realised in terms of neural mechanisms and neural processing.
This framework — called information compression by multiple alignment, unification and search (ICMAUS) — has been developed in previous research as a generalized model of any system for processing information, either natural or
artificial. It has a range of applications including the analysis and production of natural language, unsupervised inductive learning, recognition of objects and patterns, probabilistic reasoning, and others. The proposals in this article may be seen as an extension and development of
Hebb’s (1949) concept of a ‘cell assembly’.
The article describes how the concept of ‘pattern’ in the ICMAUS framework may be mapped onto a version of the cell
assembly concept and the way in which neural mechanisms may achieve the effect of ‘multiple alignment’ in the ICMAUS framework.
By contrast with the Hebbian concept of a cell assembly, it is proposed here that any one neuron can belong in one assembly and only one assembly. A key feature of present proposals, which is not part of the Hebbian concept, is that any cell assembly may contain ‘references’ or ‘codes’ that serve to identify one or more other cell assemblies. This mechanism allows information to be stored in a compressed form, it provides a robust mechanism by which assemblies may be connected to form hierarchies and other kinds of structure, it means that assemblies can express
abstract concepts, and it provides solutions to some of the other problems associated with cell assemblies.
Drawing on insights derived from the ICMAUS framework, the article also describes how learning may be achieved with neural mechanisms. This concept of learning is significantly different from the Hebbian concept and appears to provide a better account of what we know about human learning
- …