
    Missing Data Imputation and Corrected Statistics for Large-Scale Behavioral Databases

    This paper presents a new methodology to solve problems resulting from missing data in large-scale item performance behavioral databases. Useful statistics corrected for missing data are described, and a new method of imputation for missing data is proposed. This methodology is applied to the DLP database recently published by Keuleers et al. (2010), which allows us to conclude that this database fulfills the conditions of use of the method recently proposed by Courrieu et al. (2011) to test item performance models. Two application programs in Matlab code are provided for the imputation of missing data in databases, and for the computation of corrected statistics to test models. Comment: Behavior Research Methods (2011), in press.

    Bandit Algorithms for Tree Search

    Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go (Gelly et al., 2006). The UCT algorithm (Kocsis and Szepesvari, 2006), a tree search method based on Upper Confidence Bounds (UCB) (Auer et al., 2002), is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is too "optimistic" in some cases, leading to a regret O(exp(exp(D))), where D is the depth of the tree. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially with the horizon depth is proven to have a regret O(2^D \sqrt{n}), but does not adapt to possible smoothness in the tree. We then analyze Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce a UCB-based Bandit Algorithm for Smooth Trees, which takes into account the actual smoothness of the rewards to perform efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree search version which applies when the full tree is too big (possibly infinite) to be entirely represented, and show that, with high probability, essentially only the optimal branches are indefinitely developed. We illustrate these methods on a global optimization problem of a Lipschitz function, given noisy data.
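As a point of reference for the UCB rule that UCT builds on, the following is a minimal sketch of plain UCB1 (Auer et al., 2002) on a flat set of Bernoulli arms. It is not the modified UCT, Flat-UCB, or Bandit Algorithm for Smooth Trees proposed in the abstract; the function names, the Bernoulli reward model, and the arm means are assumptions made for illustration only.

```python
import math
import random


def ucb1_index(mean, pulls, total):
    """UCB1 index: empirical mean plus the exploration bonus sqrt(2 ln(total) / pulls)."""
    return mean + math.sqrt(2.0 * math.log(total) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Play `horizon` rounds of UCB1 on hypothetical Bernoulli arms.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound,
            # where t - 1 is the number of pulls made so far.
            arm = max(
                range(k),
                key=lambda i: ucb1_index(sums[i] / pulls[i], pulls[i], t - 1),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


if __name__ == "__main__":
    # Hypothetical arm means; the second arm is the better one.
    print(run_ucb1([0.2, 0.8], horizon=2000))
```

In a tree search setting, UCT applies an index of this kind at every internal node to choose which child to descend into; the abstract's point is that this per-node optimism can be too aggressive on deep trees.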

    Therapeutic target discovery using Boolean network attractors: avoiding pathological phenotypes

    Target identification, one of the steps of drug discovery, aims at identifying biomolecules whose function should be therapeutically altered in order to cure the considered pathology. This work proposes an algorithm for in silico target identification using Boolean network attractors. It assumes that attractors of dynamical systems, such as Boolean networks, correspond to phenotypes produced by the modeled biological system. Under this assumption, and given a Boolean network modeling a pathophysiology, the algorithm identifies target combinations able to remove attractors associated with pathological phenotypes. It is tested on a Boolean model of the mammalian cell cycle bearing a constitutive inactivation of the retinoblastoma protein, as seen in cancers, and its applications are illustrated on a Boolean model of Fanconi anemia. The results show that the algorithm returns target combinations able to remove attractors associated with pathological phenotypes, and thus succeeds in performing the proposed in silico target identification. However, as with any in silico evidence, there is a bridge to cross between theory and practice, so the algorithm must be used in combination with wet lab experiments. Nevertheless, it is expected that the algorithm is of interest for target identification, notably by exploiting the inexpensiveness and predictive power of computational approaches to optimize the efficiency of costly wet lab experiments. Comment: Since the publication of this article and among the possible improvements mentioned in the Conclusion, two improvements have been made: extending the algorithm to multivalued logic and considering the basins of attraction of the pathological attractors for selecting the therapeutic bullets.
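To make the notion of an attractor concrete, here is a minimal sketch that enumerates the attractors of a small Boolean network under synchronous updating, and shows how clamping one node can remove attractors. It is not the paper's algorithm: the two-node network, the update functions, and the choice of synchronous updating are hypothetical and chosen only to keep the example small.

```python
from itertools import product


def find_attractors(update, n):
    """Enumerate attractors of an n-node Boolean network by exhaustive simulation.

    `update` maps a state (tuple of 0/1) to its synchronous successor.
    Each attractor is returned as a tuple of states, rotated so that the
    smallest state comes first.
    """
    attractors = set()
    for state in product((0, 1), repeat=n):
        seen = {}
        trajectory = []
        # Follow the trajectory until a state repeats.
        while state not in seen:
            seen[state] = len(trajectory)
            trajectory.append(state)
            state = update(state)
        # The repeated state marks the start of the cycle (the attractor).
        cycle = trajectory[seen[state]:]
        i = cycle.index(min(cycle))
        attractors.add(tuple(cycle[i:] + cycle[:i]))
    return attractors


def toy_update(state):
    """Hypothetical unperturbed network: each node copies the other."""
    x, y = state
    return (y, x)


def clamped_update(state):
    """Same network with node x forced to 0 (a hypothetical intervention)."""
    x, _ = state
    return (0, x)


if __name__ == "__main__":
    print(sorted(find_attractors(toy_update, 2)))
    print(sorted(find_attractors(clamped_update, 2)))
```

In this toy case the unperturbed network has two fixed points and one cycle of length two, while the clamped network keeps only the fixed point (0, 0). Exhaustive enumeration scales as 2^n and is only practical for small networks.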

    Interacting Markov chain Monte Carlo methods for solving nonlinear measure-valued equations

    We present a new class of interacting Markov chain Monte Carlo algorithms for numerically solving discrete-time measure-valued equations. The associated stochastic processes belong to the class of self-interacting Markov chains. In contrast to traditional Markov chains, their time evolutions depend on the occupation measure of their past values. This general methodology provides a natural way to sample from a sequence of target probability measures of increasing complexity. We develop an original theoretical analysis of the behavior of these iterative algorithms, which relies on measure-valued processes and semigroup techniques. We establish a variety of convergence results, including exponential estimates and a uniform convergence theorem with respect to the number of target distributions. We also illustrate these algorithms in the context of Feynman-Kac distribution flows. Comment: Published at http://dx.doi.org/10.1214/09-AAP628 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Indecomposable polynomials and their spectrum

    We address some questions concerning indecomposable polynomials and their spectrum. How does the spectrum behave via reduction or specialization, or via a more general ring morphism? Are the indecomposability properties equivalent over a field and over its algebraic closure? How many polynomials are decomposable over a finite field? Comment: 22 pages.

    A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables

    The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, the Bayesian approach identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
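For readers unfamiliar with the plug-in estimate of mutual information mentioned above, the sketch below covers the simplest case only: two scalar, jointly normal variables, for which the mutual information is -0.5 ln(1 - r^2), with r the correlation coefficient. Replacing r by the sample correlation gives the plug-in estimate. This is not the paper's log Bayes factor, and it omits the multivariate case and the dimensionality correction; the function names are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi_plugin(xs, ys):
    """Plug-in mutual information (in nats) under a bivariate normal model.

    Uses I(X; Y) = -0.5 * ln(1 - r^2) with r replaced by the sample correlation.
    Diverges when |r| = 1, i.e. for perfectly collinear samples.
    """
    r = pearson_r(xs, ys)
    return -0.5 * math.log(1.0 - r * r)


if __name__ == "__main__":
    # Hypothetical data: strongly but not perfectly correlated.
    print(gaussian_mi_plugin([0, 1, 2, 3], [0, 1, 2, 4]))
```

In the general multivariate normal case, the same quantity is half the log ratio of the product of the marginal covariance determinants to the joint covariance determinant, which is where the dimensionality of the variables enters.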