Missing Data Imputation and Corrected Statistics for Large-Scale Behavioral Databases
This paper presents a new methodology to solve problems resulting from
missing data in large-scale item performance behavioral databases. Useful
statistics corrected for missing data are described, and a new method of
imputation for missing data is proposed. This methodology is applied to the DLP
database recently published by Keuleers et al. (2010), which allows us to
conclude that this database fulfills the conditions of use of the method
recently proposed by Courrieu et al. (2011) to test item performance models.
Two application programs in Matlab code are provided for the imputation of
missing data in databases, and for the computation of corrected statistics to
test models.

Comment: Behavior Research Methods (2011), in press.
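The abstract above describes imputation for an item-performance matrix without giving the method. As a minimal illustrative sketch (not the paper's algorithm, whose programs are in Matlab), missing cells of an item x participant matrix can be imputed iteratively under an additive two-way model:

```python
import numpy as np

def impute_missing(data, n_iter=20):
    """Iteratively impute missing cells of an item x participant matrix.

    Hypothetical sketch, not the paper's method: each missing cell is
    re-estimated as (row mean + column mean - grand mean) until the
    imputed values stabilise (an additive two-way model).
    """
    data = np.array(data, dtype=float)
    mask = np.isnan(data)
    # start every missing cell at the grand mean of the observed cells
    filled = np.where(mask, np.nanmean(data), data)
    for _ in range(n_iter):
        grand = filled.mean()
        row = filled.mean(axis=1, keepdims=True)
        col = filled.mean(axis=0, keepdims=True)
        # only missing cells are updated; observed cells stay fixed
        filled[mask] = (row + col - grand)[mask]
    return filled
```

For a perfectly additive matrix the iteration converges to the value that completes the additive pattern, which makes it easy to sanity-check.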
Bandit Algorithms for Tree Search
Bandit-based methods for tree search have recently gained popularity when
applied to huge trees, e.g. in the game of Go (Gelly et al., 2006). The UCT
algorithm (Kocsis and Szepesvari, 2006), a tree search method based on Upper
Confidence Bounds (UCB) (Auer et al., 2002), is believed to adapt locally to
the effective smoothness of the tree. However, we show that UCT is too
"optimistic" in some cases, leading to a regret O(exp(exp(D))) where D is the
depth of the tree. We propose alternative bandit algorithms for tree search.
First, a modification of UCT using a confidence sequence that scales
exponentially with the horizon depth is proven to have a regret O(2^D
\sqrt{n}), but does not adapt to possible smoothness in the tree. We then
analyze Flat-UCB performed on the leaves and provide a finite regret bound with
high probability. Then, we introduce a UCB-based Bandit Algorithm for Smooth
Trees which takes into account actual smoothness of the rewards for performing
efficient "cuts" of sub-optimal branches with high confidence. Finally, we
present an incremental tree search version which applies when the full tree is
too big (possibly infinite) to be entirely represented and show that with high
probability, essentially only the optimal branch is indefinitely developed.
We illustrate these methods on a global optimization problem of a Lipschitz
function, given noisy data.
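The Flat-UCB baseline mentioned in this abstract treats the leaves as arms of a standard bandit. A minimal sketch of the underlying UCB1 rule (Auer et al., 2002), with an illustrative exploration constant:

```python
import math, random

def ucb1(arms, n_rounds, c=2.0):
    """Minimal UCB1 sketch on a flat set of arms, in the spirit of the
    'Flat-UCB performed on the leaves' baseline. `arms` is a list of
    reward-sampling callables; the constant c is illustrative.
    """
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, n_rounds + 1):
        if t <= len(arms):
            i = t - 1                      # pull each arm once first
        else:
            # arm maximising empirical mean + confidence radius
            i = max(range(len(arms)),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(c * math.log(t) / counts[a]))
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
    return counts
```

With a clearly better arm, the pull counts concentrate on it while sub-optimal arms are pulled only logarithmically often.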
Therapeutic target discovery using Boolean network attractors: avoiding pathological phenotypes
Target identification, one of the steps of drug discovery, aims at
identifying biomolecules whose function should be therapeutically altered in
order to cure the considered pathology. This work proposes an algorithm for in
silico target identification using Boolean network attractors. It assumes that
attractors of dynamical systems, such as Boolean networks, correspond to
phenotypes produced by the modeled biological system. Under this assumption,
and given a Boolean network modeling a pathophysiology, the algorithm
identifies target combinations able to remove attractors associated with
pathological phenotypes. It is tested on a Boolean model of the mammalian cell
cycle bearing a constitutive inactivation of the retinoblastoma protein, as
seen in cancers, and its applications are illustrated on a Boolean model of
Fanconi anemia. The results show that the algorithm returns target combinations
able to remove attractors associated with pathological phenotypes and thus
succeeds in performing the proposed in silico target identification. However,
as with any in silico evidence, there is a bridge to cross between theory and
practice, thus requiring it to be used in combination with wet lab experiments.
Nevertheless, it is expected that the algorithm is of interest for target
identification, notably by exploiting the inexpensiveness and predictive power
of computational approaches to optimize the efficiency of costly wet lab
experiments.

Comment: Since the publication of this article and among the possible
improvements mentioned in the Conclusion, two improvements have been made:
extending the algorithm for multivalued logic and considering the basins of
attraction of the pathological attractors for selecting the therapeutic
bullet.
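The abstract assumes attractors of a Boolean network correspond to phenotypes. For small synchronous networks, attractors can be enumerated exhaustively; the following is an illustrative sketch, not the paper's algorithm:

```python
from itertools import product

def attractors(update, n):
    """Enumerate attractors of a synchronous Boolean network with n nodes
    by exhaustive simulation (feasible only for small n). `update` maps a
    state tuple to the next state tuple. Illustrative sketch, not the
    paper's algorithm.
    """
    found = set()
    for state in product((0, 1), repeat=n):
        seen = {}
        while state not in seen:
            seen[state] = len(seen)   # record visit order
            state = update(state)
        start = seen[state]           # first state of the limit cycle
        # canonical form: sorted tuple of the cycle's states
        cycle = tuple(sorted(s for s, i in seen.items() if i >= start))
        found.add(cycle)
    return found
```

A target intervention can then be simulated by clamping a node inside `update` and checking whether the attractor associated with the pathological phenotype disappears.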
Interacting Markov chain Monte Carlo methods for solving nonlinear measure-valued equations
We present a new class of interacting Markov chain Monte Carlo algorithms for
solving numerically discrete-time measure-valued equations. The associated
stochastic processes belong to the class of self-interacting Markov chains. In
contrast to traditional Markov chains, their time evolutions depend on the
occupation measure of their past values. This general methodology allows us to
provide a natural way to sample from a sequence of target probability measures
of increasing complexity. We develop an original theoretical analysis to
analyze the behavior of these iterative algorithms which relies on
measure-valued processes and semigroup techniques. We establish a variety of
convergence results including exponential estimates and a uniform convergence
theorem with respect to the number of target distributions. We also illustrate
these algorithms in the context of Feynman-Kac distribution flows.

Comment: Published at http://dx.doi.org/10.1214/09-AAP628 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
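The key idea above is a chain whose moves depend on the occupation measure of another chain's past values. A minimal sketch, with illustrative names and a Gaussian example in the test: a chain targeting pi_1 proposes draws from the stored history of a chain targeting pi_0 and accepts with the importance-weight ratio w = pi_1/pi_0 (an independence Metropolis-Hastings step).

```python
import math, random

def interacting_mcmc(pi0_sampler, log_w, n):
    """Sketch of one interacting MCMC stage: the chain for pi_1 proposes
    from the occupation measure (past samples) of a chain targeting pi_0
    and accepts with the log importance-weight difference. Illustrative,
    not the paper's full construction.
    """
    history0 = [pi0_sampler() for _ in range(n)]  # chain targeting pi_0
    x = history0[0]
    out = []
    for _ in range(n):
        y = random.choice(history0)          # draw from occupation measure
        if math.log(random.random()) < log_w(y) - log_w(x):
            x = y                            # independence-MH acceptance
        out.append(x)
    return out
```

This mirrors how the methodology moves from an easy target measure to a harder one in a sequence of increasing complexity.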
Indecomposable polynomials and their spectrum
We address some questions concerning indecomposable polynomials and their
spectrum. How does the spectrum behave via reduction or specialization, or via
a more general ring morphism? Are the indecomposability properties equivalent
over a field and over its algebraic closure? How many polynomials are
decomposable over a finite field?

Comment: 22 pages.
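Decomposability here means writing a polynomial as a functional composition g(h(x)) with both factors of degree at least 2. SymPy's `decompose`/`compose` give a quick way to experiment with this (over the rationals; the finite-field questions in the abstract need more machinery):

```python
from sympy import symbols, decompose, compose, expand

x = symbols('x')

# f is decomposable: it is g(h) with g = x**2 + 2*x + 1 and h = x**2,
# since g(h(x)) = (x**2)**2 + 2*x**2 + 1.
f = x**4 + 2 * x**2 + 1
parts = decompose(f)   # outermost factor first

# composing the factors back recovers f
g_of_h = compose(parts[0], parts[1])
```

An indecomposable polynomial would come back from `decompose` as a single factor, which gives a simple computational test on examples.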
A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables
The use of mutual information as a similarity measure in agglomerative
hierarchical clustering (AHC) raises an important issue: some correction needs
to be applied for the dimensionality of variables. In this work, we formulate
the decision of merging dependent multivariate normal variables in an AHC
procedure as a Bayesian model comparison. We found that the Bayesian
formulation naturally shrinks the empirical covariance matrix towards a matrix
set a priori (e.g., the identity), provides an automated stopping rule, and
corrects for dimensionality using a term that scales up the measure as a
function of the dimensionality of the variables. Also, the resulting log Bayes
factor is asymptotically proportional to the plug-in estimate of mutual
information, with an additive correction for dimensionality in agreement with
the Bayesian information criterion. We investigated the behavior of these
Bayesian alternatives (in exact and asymptotic forms) to mutual information on
simulated and real data. An encouraging result was first derived on
simulations: the hierarchical clustering based on the log Bayes factor
outperformed off-the-shelf clustering techniques as well as raw and normalized
mutual information in terms of classification accuracy. On a toy example, we
found that the Bayesian approaches led to results that were similar to those of
mutual information clustering techniques, with the advantage of an automated
thresholding. On real functional magnetic resonance imaging (fMRI) datasets
measuring brain activity, it identified clusters consistent with the
established outcome of standard procedures. On this application, normalized
mutual information had a highly atypical behavior, in the sense that it
systematically favored very large clusters. These initial experiments suggest
that the proposed Bayesian alternatives to mutual information are a useful new
tool for hierarchical clustering.
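The abstract states that the log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information. For multivariate normal variables that plug-in estimate has a closed form in terms of covariance determinants; the sketch below shows it (without the shrinkage and dimensionality correction the paper adds):

```python
import numpy as np

def gaussian_mi(X, Y):
    """Plug-in mutual information between two multivariate normal
    variables: I = 0.5 * log(|Sxx| |Syy| / |S|), with S the joint sample
    covariance. Illustrative; the paper's Bayes factor adds covariance
    shrinkage and a BIC-style dimensionality correction on top.
    """
    Z = np.hstack([X, Y])
    S = np.cov(Z, rowvar=False)
    dx = X.shape[1]
    # log-determinants via slogdet for numerical stability
    _, logdet = np.linalg.slogdet(S)
    _, ldx = np.linalg.slogdet(S[:dx, :dx])
    _, ldy = np.linalg.slogdet(S[dx:, dx:])
    return 0.5 * (ldx + ldy - logdet)
```

In an agglomerative procedure this quantity (or the corrected Bayesian version) would serve as the similarity driving which pair of variable blocks to merge next.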