Independent component analysis: algorithms and applications
A fundamental problem in neural network research, as well as in many other disciplines, is finding a suitable representation of multivariate data, i.e. random vectors. For reasons of computational and conceptual simplicity, the representation is often sought as a linear transformation of the original data. In other words, each component of the representation is a linear combination of the original variables. Well-known linear transformation methods include principal component analysis, factor analysis, and projection pursuit. Independent component analysis (ICA) is a recently developed method in which the goal is to find a linear representation of non-Gaussian data so that the components are statistically independent, or as independent as possible. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation. In this paper, we present the basic theory and applications of ICA, and our recent work on the subject.
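As a brief illustration of the linear representation described in this abstract, the ICA model is commonly written as below. The symbols (observed vector x, mixing matrix A, source vector s, unmixing matrix W) are conventional notation assumed here, not taken from the abstract itself.

```latex
\mathbf{x} = \mathbf{A}\,\mathbf{s},
\qquad
\hat{\mathbf{s}} = \mathbf{W}\,\mathbf{x},
\qquad
p(s_1, \dots, s_n) = \prod_{i=1}^{n} p_i(s_i)
```

Here the components s_i are assumed to be non-Gaussian and statistically independent, and W is estimated so that the components of the estimate are as independent as possible.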
Finding Exogenous Variables in Data with Many More Variables than Observations
Many statistical methods have been proposed to estimate causal models in
classical situations with fewer variables than observations (p<n, p: the number
of variables and n: the number of observations). However, modern datasets
including gene expression data need high-dimensional causal modeling in
challenging situations with orders of magnitude more variables than
observations (p>>n). In this paper, we propose a method to find exogenous
variables in a linear non-Gaussian causal model, which requires much smaller
sample sizes than conventional methods and works even when p>>n. The key idea
is to identify which variables are exogenous based on non-Gaussianity instead
of estimating the entire structure of the model. Exogenous variables work as
triggers that activate a causal chain in the model, and their identification
leads to more efficient experimental designs and better understanding of the
causal mechanism. We present experiments with artificial data and real-world
gene expression data to evaluate the method.
Comment: A revised version of this was published in Proc. ICANN201
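For orientation, a linear non-Gaussian causal model of the kind named in this abstract is usually written in the following form. The notation (coefficients b_ij, disturbances e_i) is an assumption made for this sketch and does not come from the abstract.

```latex
x_i = \sum_{j \neq i} b_{ij}\, x_j + e_i ,
\qquad e_i \ \text{non-Gaussian and mutually independent}
```

Under this notation, a variable x_i is exogenous when b_ij = 0 for every j, so that x_i = e_i: it has no causes inside the model.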
Grid based propositional satisfiability solving
This work studies how grid and cloud computing can be applied to solve instances of the propositional satisfiability problem (SAT) efficiently. Propositional logic provides a convenient language for expressing problems that originate in real-world applications such as AI planning, automated test pattern generation, bounded model checking and cryptanalysis. Interest in SAT solving has increased mainly because of improvements in the solving algorithms, which have recently focused increasingly on the parallelism offered by multi-CPU computers. Partly orthogonally to these improvements, this work studies several novel approaches to parallel solving of SAT instances in a grid of widely distributed "virtual" computers instead of workstations or supercomputers.
Two types of parallel SAT solving approaches are analyzed and used as building blocks for more complex systems: using several solvers which compete to solve a given instance in parallel, and splitting the search space of the instance and solving the resulting partitions in parallel. The work presents several efficient partitioning functions, critical in successful splitting according to an analytical result, and presents novel solving systems that are less dependent on the partitioning function efficiency. Finally, the work studies combining clause learning, a key technique in modern SAT solvers, with the novel types of parallel solvers. Different heuristics are studied for filtering clauses learned in parallel, and the work proposes techniques which allow exchanging the clauses between different splits.
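The search-space splitting idea in the paragraph above can be sketched as follows. This is a minimal illustration, not the partitioning functions or solvers of the thesis: it assumes the simplest possible split (fixing a few chosen variables in all possible ways) and a tiny DPLL-style solver without clause learning. All function names are hypothetical.

```python
from itertools import product

def simplify(clauses, assignment):
    """Apply a partial assignment; return None if some clause is falsified."""
    result = []
    for clause in clauses:
        kept = []
        satisfied = False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
                    break
            else:
                kept.append(lit)
        if satisfied:
            continue
        if not kept:
            return None  # every literal of this clause is false
        result.append(kept)
    return result

def dpll(clauses, assignment):
    """Tiny DPLL-style search (no learning); returns a model dict or None."""
    clauses = simplify(clauses, assignment)
    if clauses is None:
        return None
    if not clauses:
        return dict(assignment)
    var = abs(clauses[0][0])
    for value in (True, False):
        model = dpll(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None

def split(variables):
    """Yield the 2^k partial assignments over the chosen splitting variables."""
    for values in product((True, False), repeat=len(variables)):
        yield dict(zip(variables, values))

def solve_by_splitting(clauses, split_vars):
    # The partitions are independent, so in a grid each dpll call
    # could be sent to a different worker; here they run one by one.
    for partial in split(split_vars):
        model = dpll(clauses, partial)
        if model is not None:
            return model
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
cnf = [[1, 2], [-1, 3], [-2, -3]]
model = solve_by_splitting(cnf, split_vars=[1, 2])
unsat = solve_by_splitting([[1], [-1]], split_vars=[1])
```

The instance is satisfiable if any partition is satisfiable and unsatisfiable only if all of them are, which is why the quality of the partitioning function matters: a poor split can leave one partition as hard as the original instance.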
The practical significance of the results is studied using large, standard benchmark sets from SAT competitions. Some of the approaches are able to solve several instances that either have not been solved at all by any other solver or are significantly slower to solve with other solvers.
Approaches to grid-based SAT solving
In this work we develop techniques for using distributed computing resources to efficiently solve instances of the propositional satisfiability problem (SAT). The computing resources considered in this work are assumed to be geographically distributed and connected by a non-dedicated network. Such systems are typically referred to as computational grid environments.
The time a modern SAT solver consumes while solving an instance varies according to a random distribution. Unlike many other methods for distributed SAT solving, this work identifies the random distribution as a valuable resource for solving-time reduction. The methods which use randomness in the run times of a search algorithm, such as the ones discussed in this work, are examples of multi-search. The main contribution of this work is in developing and analyzing the multi-search approach in SAT solving and showing its efficiency with several experiments. For the purpose of the analysis, the work introduces a grid simulation model which captures several of the properties of a grid environment which are not observed in more traditional parallel computing systems.
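The effect that the paragraph above relies on, namely that run-time randomness can be exploited by running several independent searches and taking the first answer, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Pareto run-time distribution and the function names are hypothetical choices made for illustration, and communication delays in a grid are ignored.

```python
import random

def sample_runtime(rng):
    """Hypothetical heavy-tailed solving time: usually short, sometimes very long."""
    return rng.paretovariate(1.5)

def mean_time_to_first_answer(num_solvers, trials, seed=0):
    """Average, over many trials, of the fastest of num_solvers independent runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(sample_runtime(rng) for _ in range(num_solvers))
    return total / trials

single = mean_time_to_first_answer(1, trials=20000)
parallel = mean_time_to_first_answer(8, trials=20000)
```

With a heavy-tailed distribution such as this one, the mean time of the fastest of eight runs is well below the mean time of a single run, because the occasional very long run is unlikely to occur in all eight searches at once.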
The work develops two algorithmic frameworks for multi-search in SAT. The first, SDSAT, is based on using properties of the distribution of the solving time so that the expected time required to solve an instance is reduced. Based on the analysis of SDSAT, the work proposes an algorithm for efficiently using a large number of computing resources simultaneously to solve collections of SAT instances. The analysis of SDSAT also motivates the second algorithmic framework, CL-SDSAT. The framework is used to efficiently solve many industrial SAT instances by carefully combining information learned in the distributed SAT solvers.
All methods described in the work are directly applicable in a wide range of grid environments and can be used together with virtually unmodified state-of-the-art SAT solvers. The methods are experimentally verified using standard benchmark SAT instances in a production-level grid environment. The experiments show that, using the relatively simple methods developed in the work, SAT instances which cannot be solved efficiently in sequential settings can now be solved in a grid environment.
Least Dependent Component Analysis Based on Mutual Information
We propose to use precise estimators of mutual information (MI) to find least
dependent components in a linearly mixed signal. On the one hand this seems to
lead to better blind source separation than with any other presently available
algorithm. On the other hand it has the advantage, compared to other
implementations of 'independent' component analysis (ICA), some of which are
based on crude approximations for MI, that the numerical values of the MI can
be used for:
(i) estimating residual dependencies between the output components;
(ii) estimating the reliability of the output, by comparing the pairwise MIs
with those of re-mixed components;
(iii) clustering the output according to the residual interdependencies.
For the MI estimator we use a recently proposed k-nearest neighbor based
algorithm. For time sequences we combine this with delay embedding, in order to
take into account non-trivial time correlations. After several tests with
artificial data, we apply the resulting MILCA (Mutual Information based Least
dependent Component Analysis) algorithm to a real-world dataset, the ECG of a
pregnant woman.
The software implementation of the MILCA algorithm is freely available at
http://www.fz-juelich.de/nic/cs/software
Comment: 18 pages, 20 figures, Phys. Rev. E (in press)
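For reference, the mutual information that this abstract uses as a dependence measure has the standard definition below for two continuous variables. The notation is conventional and assumed here; the k-nearest neighbor estimator mentioned in the abstract is not reproduced.

```latex
I(X;Y) = \iint p(x,y)\, \log \frac{p(x,y)}{p_X(x)\, p_Y(y)} \, dx\, dy \;\ge\; 0
```

The value is zero exactly when X and Y are independent, which is why small pairwise values between output components indicate a good separation.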
Tarmo: A Framework for Parallelized Bounded Model Checking
This paper investigates approaches to parallelizing Bounded Model Checking
(BMC) for shared memory environments as well as for clusters of workstations.
We present a generic framework for parallelized BMC named Tarmo. Our framework
can be used with any incremental SAT encoding for BMC but for the results in
this paper we use only the current state-of-the-art encoding for full PLTL.
Using this encoding allows us to check both safety and liveness properties,
contrary to an earlier work on distributing BMC that is limited to safety
properties only.
Despite our focus on BMC after it has been translated to SAT, existing
distributed SAT solvers are not well suited for our application. This is
because solving a BMC problem is not solving a set of independent SAT instances
but rather involves solving multiple related SAT instances, encoded
incrementally, where the satisfiability of each instance corresponds to the
existence of a counterexample of a specific length. Our framework includes a
generic architecture for a shared clause database that allows easy clause
sharing between SAT solver threads solving various such instances.
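The sequence of related instances described above, one per counterexample length, can be sketched with an explicit-state stand-in. This is not the Tarmo framework and uses no SAT solver or PLTL encoding: the sketch assumes a tiny transition system given as a dictionary, checks only a safety property, and uses hypothetical function names.

```python
def states_at_depth(initial, successors, k):
    """Set of states reachable from the initial states in exactly k transitions."""
    frontier = set(initial)
    for _ in range(k):
        frontier = {t for s in frontier for t in successors.get(s, ())}
    return frontier

def bounded_check(initial, successors, bad, max_bound):
    """Return the smallest k <= max_bound with a length-k counterexample, else None."""
    for k in range(max_bound + 1):
        # In SAT-based BMC this test would be one SAT call on the k-th instance.
        if states_at_depth(initial, successors, k) & bad:
            return k
    return None

# Hypothetical 4-state system in which state 3 violates the safety property.
toy = {0: [1], 1: [2], 2: [3, 0], 3: [3]}
shortest = bounded_check({0}, toy, bad={3}, max_bound=5)
none_found = bounded_check({0}, toy, bad={3}, max_bound=2)
```

Because the checks for different bounds share most of their structure, work done for one bound is useful for the others, which is the motivation for sharing clauses between solver threads working on different bounds.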
We present extensive experimental results obtained with multiple variants of
our Tarmo implementation. Our shared memory variants have a significantly
better performance than conventional single threaded approaches, which is a
result that many users can benefit from as multi-core and multi-processor
technology is widely available. Furthermore we demonstrate that our framework
can be deployed in a typical cluster of workstations, where several multi-core
machines are connected by a network.
Experiments on arranging automatic feeding in a dairy piggery. II
In the second year of feeding trials carried out in the piggery of the Ylitornio dairy of Meijerien Keskusosuusliike Valio, three diets were compared: a) basal feed plus fresh whey, b) basal feed with an oat hull meal supplement plus fresh whey, and c) basal feed plus dried whey. The following results were obtained. When the crude fibre content of the dry feed mixture was raised from 6.9 % to 9.2 % by adding 10 % oat hull meal to the basal feed from a live weight of 40 kg onward, this measure neither reduced the consumption of dry feed nor increased the consumption of whey. The result was a somewhat smaller weight gain and a higher relative feed consumption than in the control group. This outcome may be due in part to the fact that whey consumption in this trial remained lower than expected, at only about 1200 kg per animal. When dried whey was used instead of fresh whey, making up 50 % of the dry feed mixture, dried whey appeared in the early phase of the trial, up to a live weight of 45 kg, to have favourable effects on growth rate and feed utilization compared with fresh whey. Fed to larger pigs, dried whey proved less favourable judging by the animals' growth rate, but even then it achieved, on average, a relatively better production effect per feed unit consumed than fresh whey.