4,872 research outputs found
Justifying Information-Geometric Causal Inference
Information Geometric Causal Inference (IGCI) is a new approach to
distinguish between cause and effect for two variables. It is based on an
independence assumption between input distribution and causal mechanism that
can be phrased in terms of orthogonality in information space. We describe two
intuitive reinterpretations of this approach that makes IGCI more accessible to
a broader audience.
Moreover, we show that the described independence is related to the
hypothesis that unsupervised learning and semi-supervised learning only works
for predicting the cause from the effect and not vice versa.Comment: 3 Figure
DNA ANALYSIS USING GRAMMATICAL INFERENCE
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that the accuracy of inferred languages for components of DNA are consistently accurate. By using the proposed algorithm languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly
Automatic Network Fingerprinting through Single-Node Motifs
Complex networks have been characterised by their specific connectivity
patterns (network motifs), but their building blocks can also be identified and
described by node-motifs---a combination of local network features. One
technique to identify single node-motifs has been presented by Costa et al. (L.
D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett.,
87, 1, 2009). Here, we first suggest improvements to the method including how
its parameters can be determined automatically. Such automatic routines make
high-throughput studies of many networks feasible. Second, the new routines are
validated in different network-series. Third, we provide an example of how the
method can be used to analyse network time-series. In conclusion, we provide a
robust method for systematically discovering and classifying characteristic
nodes of a network. In contrast to classical motif analysis, our approach can
identify individual components (here: nodes) that are specific to a network.
Such special nodes, as hubs before, might be found to play critical roles in
real-world networks.Comment: 16 pages (4 figures) plus supporting information 8 pages (5 figures
Discovering Restricted Regular Expressions with Interleaving
Discovering a concise schema from given XML documents is an important problem
in XML applications. In this paper, we focus on the problem of learning an
unordered schema from a given set of XML examples, which is actually a problem
of learning a restricted regular expression with interleaving using positive
example strings. Schemas with interleaving could present meaningful knowledge
that cannot be disclosed by previous inference techniques. Moreover, inference
of the minimal schema with interleaving is challenging. The problem of finding
a minimal schema with interleaving is shown to be NP-hard. Therefore, we
develop an approximation algorithm and a heuristic solution to tackle the
problem using techniques different from known inference algorithms. We do
experiments on real-world data sets to demonstrate the effectiveness of our
approaches. Our heuristic algorithm is shown to produce results that are very
close to optimal.Comment: 12 page
Towards an Abstract Domain for Resource Analysis of Logic Programs Using Sized Types
We present a novel general resource analysis for logic programs based on
sized types.Sized types are representations that incorporate structural (shape)
information and allow expressing both lower and upper bounds on the size of a
set of terms and their subterms at any position and depth. They also allow
relating the sizes of terms and subterms occurring at different argument
positions in logic predicates. Using these sized types, the resource analysis
can infer both lower and upper bounds on the resources used by all the
procedures in a program as functions on input term (and subterm) sizes,
overcoming limitations of existing analyses and enhancing their precision. Our
new resource analysis has been developed within the abstract interpretation
framework, as an extension of the sized types abstract domain, and has been
integrated into the Ciao preprocessor, CiaoPP. The abstract domain operations
are integrated with the setting up and solving of recurrence equations for
both, inferring size and resource usage functions. We show that the analysis is
an improvement over the previous resource analysis present in CiaoPP and
compares well in power to state of the art systems.Comment: Part of WLPE 2013 proceedings (arXiv:1308.2055
- …