2,228 research outputs found
Fringe analysis for parallel MacroSplit insertion algorithms in 2--3 trees
We extend the fringe analysis (used to study the expected behavior of balanced search trees under sequential insertions) to deal with synchronous parallel insertions on 2--3 trees. Given an insertion of k keys in a tree with n nodes, the fringe evolves following a transition matrix whose coefficients take care of the precise form of the algorithm but does not depend on k or n. The derivation of this matrix uses the binomial transform recently developed by P. Poblete, J. Munro and Th. Papadakis. Due to the complexity of the preceding exact analysis, we develop also two approximations. A first one based on a simplified parallel model, and a second one based on the sequential model.
These two approximated analysis prove that the parallel insertions case does not differ significantly from the sequential case, namely
on the terms O(1/n^2).Postprint (published version
Counterexample-Guided Polynomial Loop Invariant Generation by Lagrange Interpolation
We apply multivariate Lagrange interpolation to synthesize polynomial
quantitative loop invariants for probabilistic programs. We reduce the
computation of an quantitative loop invariant to solving constraints over
program variables and unknown coefficients. Lagrange interpolation allows us to
find constraints with less unknown coefficients. Counterexample-guided
refinement furthermore generates linear constraints that pinpoint the desired
quantitative invariants. We evaluate our technique by several case studies with
polynomial quantitative loop invariants in the experiments
Query Chains: Learning to Rank from Implicit Feedback
This paper presents a novel approach for using clickthrough data to learn
ranked retrieval functions for web search results. We observe that users
searching the web often perform a sequence, or chain, of queries with a similar
information need. Using query chains, we generate new types of preference
judgments from search engine logs, thus taking advantage of user intelligence
in reformulating queries. To validate our method we perform a controlled user
study comparing generated preference judgments to explicit relevance judgments.
We also implemented a real-world search engine to test our approach, using a
modified ranking SVM to learn an improved ranking function from preference
data. Our results demonstrate significant improvements in the ranking given by
the search engine. The learned rankings outperform both a static ranking
function, as well as one trained without considering query chains.Comment: 10 page
MNP: R Package for Fitting the Multinomial Probit Model
MNP is a publicly available R package that fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP software can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005).
Efficient & Effective Selective Query Rewriting with Efficiency Predictions
To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which has impacts upon the user satisfaction, if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine
Simplicial Homology of Random Configurations
Given a Poisson process on a -dimensional torus, its random geometric
simplicial complex is the complex whose vertices are the points of the Poisson
process and simplices are given by the \u{C}ech complex associated to the
coverage of each point. By means of Malliavin calculus, we compute explicitly
the n order moment of the number of -simplices. The two first order
moments of this quantity allow us to find the mean and the variance of the
Euler caracteristic. Also, we show that the number of any connected geometric
simplicial complex converges to the Gaussian law when the intensity of the
Poisson point process tends to infinity. We use a concentration inequality to
find bounds for the for the distribution of the Betti number of first order and
the Euler characteristic in such simplicial complex
Machine Learning at Microsoft with ML .NET
Machine Learning is transitioning from an art and science into a technology
available to every developer. In the near future, every application on every
platform will incorporate trained models to encode data-based decisions that
would be impossible for developers to author. This presents a significant
engineering challenge, since currently data science and modeling are largely
decoupled from standard software development processes. This separation makes
incorporating machine learning capabilities inside applications unnecessarily
costly and difficult, and furthermore discourage developers from embracing ML
in first place. In this paper we present ML .NET, a framework developed at
Microsoft over the last decade in response to the challenge of making it easy
to ship machine learning models in large software applications. We present its
architecture, and illuminate the application demands that shaped it.
Specifically, we introduce DataView, the core data abstraction of ML .NET which
allows it to capture full predictive pipelines efficiently and consistently
across training and inference lifecycles. We close the paper with a
surprisingly favorable performance study of ML .NET compared to more recent
entrants, and a discussion of some lessons learned
- …