
    Independent component analysis: algorithms and applications

    A fundamental problem in neural network research, as well as in many other disciplines, is finding a suitable representation of multivariate data, i.e. random vectors. For reasons of computational and conceptual simplicity, the representation is often sought as a linear transformation of the original data. In other words, each component of the representation is a linear combination of the original variables. Well-known linear transformation methods include principal component analysis, factor analysis, and projection pursuit. Independent component analysis (ICA) is a recently developed method in which the goal is to find a linear representation of nongaussian data so that the components are statistically independent, or as independent as possible. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation. In this paper, we present the basic theory and applications of ICA, and our recent work on the subject.
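    The signal-separation task described here can be reproduced with off-the-shelf tooling. Below is a minimal sketch using scikit-learn's FastICA implementation (the fixed-point algorithm associated with the same authors) to unmix two artificially mixed nongaussian sources; the sources, mixing matrix, and parameters are illustrative assumptions, not taken from the paper.

        import numpy as np
        from sklearn.decomposition import FastICA

        t = np.linspace(0, 8, 2000)

        # Two nongaussian sources: a sine wave and a square wave.
        s1 = np.sin(2 * t)
        s2 = np.sign(np.sin(3 * t))
        S = np.c_[s1, s2]

        # Mix them with an arbitrary (illustrative) matrix A: x = A s.
        A = np.array([[1.0, 0.5],
                      [0.4, 1.0]])
        X = S @ A.T

        # Recover components that are as independent as possible.
        ica = FastICA(n_components=2, random_state=0)
        S_hat = ica.fit_transform(X)   # estimated sources
        print(ica.mixing_)             # estimated mixing matrix

    As usual in ICA, the sources are recovered only up to permutation and scaling.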

    Finding Exogenous Variables in Data with Many More Variables than Observations

    Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p < n, where p is the number of variables and n the number of observations). However, modern datasets, including gene expression data, call for high-dimensional causal modeling in challenging situations with orders of magnitude more variables than observations (p >> n). In this paper, we propose a method to find exogenous variables in a linear non-Gaussian causal model; it requires much smaller sample sizes than conventional methods and works even when p >> n. The key idea is to identify which variables are exogenous based on non-Gaussianity instead of estimating the entire structure of the model. Exogenous variables work as triggers that activate a causal chain in the model, and their identification leads to more efficient experimental designs and a better understanding of the causal mechanism. We present experiments with artificial data and real-world gene expression data to evaluate the method.
    Comment: A revised version of this was published in Proc. ICANN201
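    As a loose illustration of the key idea (not the authors' actual estimator): a variable generated as a sum of several independent non-Gaussian disturbances tends to look more Gaussian than a single raw disturbance, so exogenous variables can be expected to stand out as the most non-Gaussian columns of the data. The sketch below ranks variables by absolute excess kurtosis; the criterion, names, and toy data are assumptions made for the example.

        import numpy as np
        from scipy.stats import kurtosis

        def rank_by_nongaussianity(X):
            # X: n observations x p variables. Columns with the largest
            # |excess kurtosis| are candidate exogenous variables.
            scores = np.abs(kurtosis(X, axis=0))
            return np.argsort(scores)[::-1]   # most non-Gaussian first

        # Toy model: x0 exogenous (uniform disturbance), x1 caused by x0.
        rng = np.random.default_rng(1)
        x0 = rng.uniform(-1, 1, 200)
        x1 = 0.8 * x0 + rng.uniform(-1, 1, 200)
        print(rank_by_nongaussianity(np.c_[x0, x1]))   # expect [0 1]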

    Grid based propositional satisfiability solving

    This work studies how grid and cloud computing can be applied to solving instances of the propositional satisfiability problem (SAT) efficiently. Propositional logic provides a convenient language for expressing problems originating in the real world, such as AI planning, automated test pattern generation, bounded model checking, and cryptanalysis. Interest in SAT solving has increased mainly due to improvements in the solving algorithms, which have recently focused increasingly on exploiting the parallelism offered by multi-CPU computers. Partly orthogonally to these improvements, this work studies several novel approaches to parallel solving of SAT instances in a grid of widely distributed "virtual" computers instead of workstations or supercomputers. Two types of parallel SAT solving approaches are analyzed and used as building blocks for more complex systems: running several solvers that compete to solve a given instance in parallel, and splitting the search space of the instance and solving the resulting partitions in parallel. The work presents several efficient partitioning functions, which an analytical result shows to be critical for successful splitting, and presents novel solving systems that are less dependent on the efficiency of the partitioning function. Finally, the work studies combining clause learning, a key technique in modern SAT solvers, with the novel types of parallel solvers. Different heuristics are studied for filtering clauses learned in parallel, and the work proposes techniques that allow exchanging clauses between different splits. The practical significance of the results is studied using large, standard benchmark sets from SAT competitions. Some of the approaches are able to solve several instances that have either not been solved at all by any other solver, or that are significantly slower to solve with other solvers.
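    The search-space splitting building block can be illustrated concretely. The sketch below shows the simplest possible partitioning function, which splits an instance into disjoint partitions by fixing the truth values of a few chosen variables, and then solves the partitions in parallel. The brute-force solve function stands in for a real SAT solver; this illustrates the idea only, not any of the systems developed in the work.

        import itertools
        import multiprocessing as mp

        def cube_partitions(variables, instance):
            # Fix every combination of truth values of the chosen variables,
            # yielding 2^k derived instances whose search spaces are disjoint
            # and together cover the original search space.
            for values in itertools.product([False, True], repeat=len(variables)):
                cube = [v if val else -v for v, val in zip(variables, values)]
                yield instance + [[lit] for lit in cube]   # cube literals as unit clauses

        def solve(instance):
            # Stand-in for a real SAT solver: brute force over all assignments.
            n = max(abs(l) for clause in instance for l in clause)
            return any(
                all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
                    for clause in instance)
                for bits in itertools.product([False, True], repeat=n))

        if __name__ == "__main__":
            cnf = [[1, 2], [-1, 3]]                 # (x1 or x2) and (not x1 or x3)
            parts = list(cube_partitions([1, 2], cnf))
            with mp.Pool() as pool:
                results = pool.map(solve, parts)    # partitions solved in parallel
            print(any(results))                     # True: the instance is satisfiable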

    Approaches to grid-based SAT solving

    In this work we develop techniques for using distributed computing resources to efficiently solve instances of the propositional satisfiability problem (SAT). The computing resources considered in this work are assumed to be geographically distributed and connected by a non-dedicated network; such systems are typically referred to as computational grid environments. The time a modern SAT solver consumes while solving an instance varies according to a random distribution. Unlike many other methods for distributed SAT solving, this work identifies that random distribution as a valuable resource for reducing solving time. Methods which exploit randomness in the run times of a search algorithm, such as the ones discussed in this work, are examples of multi-search. The main contribution of this work is in developing and analyzing the multi-search approach in SAT solving and showing its efficiency with several experiments. For the purpose of the analysis, the work introduces a grid simulation model which captures several properties of a grid environment that are not observed in more traditional parallel computing systems. The work develops two algorithmic frameworks for multi-search in SAT. The first, SDSAT, is based on using properties of the distribution of the solving time so that the expected time required to solve an instance is reduced. Based on the analysis of SDSAT, the work proposes an algorithm for efficiently using a large number of computing resources simultaneously to solve collections of SAT instances. The analysis of SDSAT also motivates the second algorithmic framework, CL-SDSAT. The framework is used to efficiently solve many industrial SAT instances by carefully combining information learned in the distributed SAT solvers. All methods described in the work are directly applicable in a wide range of grid environments and can be used together with virtually unmodified state-of-the-art SAT solvers. The methods are experimentally verified using standard benchmark SAT instances in a production-level grid environment. The experiments show that, using the relatively simple methods developed in the work, SAT instances which cannot be solved efficiently in sequential settings can now be solved in a grid environment.
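    The value of the runtime distribution is easy to see in a small simulation. A randomized solver's runtime T is a random variable; running k independent copies and stopping at the first answer takes min(T_1, ..., T_k), and for heavy-tailed distributions this minimum falls quickly as k grows. The lognormal model below is an illustrative assumption, not a distribution measured in the work.

        import numpy as np

        rng = np.random.default_rng(42)
        # 100,000 simulated runs of up to 16 parallel copies of a solver
        # whose runtime is heavy-tailed (illustrative lognormal model).
        samples = rng.lognormal(mean=3.0, sigma=2.0, size=(100_000, 16))

        for k in (1, 2, 4, 8, 16):
            expected = samples[:, :k].min(axis=1).mean()
            print(f"k={k:2d} copies: expected runtime ~ {expected:8.1f}")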

    Least Dependent Component Analysis Based on Mutual Information

    We propose to use precise estimators of mutual information (MI) to find least dependent components in a linearly mixed signal. On the one hand, this seems to lead to better blind source separation than with any other presently available algorithm. On the other hand, it has the advantage, compared to other implementations of `independent' component analysis (ICA), some of which are based on crude approximations for MI, that the numerical values of the MI can be used for: (i) estimating residual dependencies between the output components; (ii) estimating the reliability of the output, by comparing the pairwise MIs with those of re-mixed components; (iii) clustering the output according to the residual interdependencies. For the MI estimator we use a recently proposed k-nearest neighbor based algorithm. For time sequences we combine this with delay embedding, in order to take into account non-trivial time correlations. After several tests with artificial data, we apply the resulting MILCA (Mutual Information based Least dependent Component Analysis) algorithm to a real-world dataset, the ECG of a pregnant woman. The software implementation of the MILCA algorithm is freely available at http://www.fz-juelich.de/nic/cs/software
    Comment: 18 pages, 20 figures, Phys. Rev. E (in press)
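    Use (i), estimating residual dependencies between output components, can be sketched with a k-nearest neighbor MI estimator of the same Kraskov-style family the paper builds on. The code below uses scikit-learn's mutual_info_regression rather than the MILCA software itself, and the toy data are assumptions for the example.

        import numpy as np
        from sklearn.feature_selection import mutual_info_regression

        def pairwise_mi(S, n_neighbors=3):
            # Estimate MI between every pair of columns of S with a k-NN
            # estimator; values near zero indicate nearly independent components.
            p = S.shape[1]
            mi = np.zeros((p, p))
            for i in range(p):
                for j in range(i + 1, p):
                    mi[i, j] = mi[j, i] = mutual_info_regression(
                        S[:, [i]], S[:, j], n_neighbors=n_neighbors)[0]
            return mi

        rng = np.random.default_rng(0)
        a = rng.uniform(size=2000)
        b = rng.uniform(size=2000)           # independent of a
        c = a + 0.1 * rng.normal(size=2000)  # strongly dependent on a
        print(pairwise_mi(np.c_[a, b, c]))   # MI(a,c) large, MI(a,b) near zero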

    Tarmo: A Framework for Parallelized Bounded Model Checking

    This paper investigates approaches to parallelizing Bounded Model Checking (BMC) for shared memory environments as well as for clusters of workstations. We present a generic framework for parallelized BMC named Tarmo. Our framework can be used with any incremental SAT encoding for BMC, but for the results in this paper we use only the current state-of-the-art encoding for full PLTL. Using this encoding allows us to check both safety and liveness properties, contrary to earlier work on distributing BMC that is limited to safety properties only. Despite our focus on BMC after it has been translated to SAT, existing distributed SAT solvers are not well suited for our application, because solving a BMC problem is not solving a set of independent SAT instances but rather involves solving multiple related SAT instances, encoded incrementally, where the satisfiability of each instance corresponds to the existence of a counterexample of a specific length. Our framework includes a generic architecture for a shared clause database that allows easy clause sharing between SAT solver threads solving various such instances. We present extensive experimental results obtained with multiple variants of our Tarmo implementation. Our shared memory variants have significantly better performance than conventional single-threaded approaches, a result that many users can benefit from as multi-core and multi-processor technology is widely available. Furthermore, we demonstrate that our framework can be deployed in a typical cluster of workstations, where several multi-core machines are connected by a network.
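    The shared clause database at the center of this architecture can be sketched in a few lines: solver threads working on different bounds publish the clauses they learn, filtered by a simple heuristic, and each thread periodically fetches the clauses added since its last visit. This is an illustration of the concept with invented names, not Tarmo's actual implementation.

        import threading

        class SharedClauseDB:
            def __init__(self):
                self._clauses = []
                self._lock = threading.Lock()

            def publish(self, clause, max_len=8):
                # Filtering heuristic: only share short clauses, which are
                # more likely to be useful to the other solver threads.
                if len(clause) <= max_len:
                    with self._lock:
                        self._clauses.append(clause)

            def fetch_since(self, cursor):
                # Return clauses added after `cursor` and a new cursor value.
                with self._lock:
                    return self._clauses[cursor:], len(self._clauses)

        db = SharedClauseDB()
        db.publish([-3, 7])              # thread A learns and shares a clause
        new, cur = db.fetch_since(0)     # thread B imports it
        print(new, cur)                  # [[-3, 7]] 1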

    Experiments on arranging automatic feeding in a dairy piggery. II

    In the second-year feeding experiments carried out in the piggery of the Ylitornio dairy of Meijerien Keskusosuusliike Valio, which compared a) basal feed plus fresh whey, b) basal feed with an oat-husk meal supplement plus fresh whey, and c) basal feed plus dried whey, the following results were obtained. When the crude fibre content of the dry feed mixture was raised from 6.9% to 9.2% by adding 10% oat-husk meal to the basal feed from a live weight of 40 kg onwards, this measure neither reduced the consumption of dry feed nor increased that of whey. The consequence was a somewhat smaller weight gain and a higher relative feed consumption than in the control group. This result may partly be explained by the fact that whey consumption in this experiment remained lower than expected, only about 1200 kg per animal. When dried whey was used instead of fresh whey, making up 50% of the dry feed mixture, dried whey appeared to have favourable effects on growth rate and feed utilization compared with fresh whey during the early phase of the experiment, up to a live weight of 45 kg. When fed to larger pigs, however, dried whey proved less advantageous as judged by the animals' growth rate, although even then it achieved, per feed unit (ry) used, a relatively better production effect on average than fresh whey.