Hierarchical testing designs for pattern recognition
We explore the theoretical foundations of a ``twenty questions'' approach to
pattern recognition. The object of the analysis is the computational process
itself rather than probability distributions (Bayesian inference) or decision
boundaries (statistical learning). Our formulation is motivated by applications
to scene interpretation in which there are a great many possible explanations
for the data, one (``background'') is statistically dominant, and it is
imperative to restrict intensive computation to genuinely ambiguous regions.
The focus here is then on pattern filtering: Given a large set $\mathcal{Y}$ of possible
patterns or explanations, narrow down the true one $Y$ to a small (random) subset
$\hat Y\subset\mathcal{Y}$ of ``detected'' patterns to be subjected to further, more
intense, processing. To this end, we consider a family of hypothesis tests for
$Y\in A$ versus the nonspecific alternatives $Y\in A^c$. Each test has null type I
error and the candidate sets $A\subset\mathcal{Y}$ are arranged in a hierarchy of nested
partitions. These tests are then characterized by scope ($|A|$), power (or type
II error) and algorithmic cost. We consider sequential testing strategies in
which decisions are made iteratively, based on past outcomes, about which test
to perform next and when to stop testing. The set $\hat Y$ is then taken to be
the set of patterns that have not been ruled out by the tests performed. The
total cost of a strategy is the sum of the ``testing cost'' and the
``postprocessing cost'' (proportional to $|\hat Y|$) and the corresponding
optimization problem is analyzed.
Comment: Published at http://dx.doi.org/10.1214/009053605000000174 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
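The coarse-to-fine filtering idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the patterns are the integers 0 to 7, the hierarchy is a binary split of half-open ranges, every test has unit cost, the strategy is a fixed depth-first descent (the paper instead optimizes over strategies), and the names `children`, `make_test`, `coarse_to_fine` and `total_cost` are hypothetical.

```python
def children(cell):
    """Split a half-open range (lo, hi) into two halves; single patterns are leaves."""
    lo, hi = cell
    if hi - lo <= 1:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def make_test(true_pattern, false_alarm_cells):
    """Build a test with zero type I error: it always accepts a cell that
    contains the true pattern. Cells in false_alarm_cells are also accepted,
    which models imperfect power (a made-up stand-in for type II error)."""
    def test(cell):
        lo, hi = cell
        return lo <= true_pattern < hi or cell in false_alarm_cells
    return test


def coarse_to_fine(cell, test):
    """Depth-first descent: refine a cell only if its test accepts.
    Returns (set of detected patterns, number of tests performed)."""
    if not test(cell):
        return set(), 1          # the whole cell is ruled out by one test
    kids = children(cell)
    if not kids:
        return {cell[0]}, 1      # an accepted leaf is a detected pattern
    detected, n_tests = set(), 1
    for kid in kids:
        found, used = coarse_to_fine(kid, test)
        detected |= found
        n_tests += used
    return detected, n_tests


def total_cost(n_tests, detected, test_cost=1.0, post_cost=1.0):
    """Testing cost plus postprocessing cost proportional to the detected set size."""
    return test_cost * n_tests + post_cost * len(detected)


# True pattern 5, with one hypothetical false alarm on the coarse cell (0, 4).
detected, n_tests = coarse_to_fine((0, 8), make_test(5, {(0, 4)}))
print(detected, n_tests, total_cost(n_tests, detected))
```

In this toy run the false alarm on the cell (0, 4) costs two extra tests before both of its halves are rejected, which is the kind of trade-off between test power and total cost that the abstract describes.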
The State-of-the-arts in Focused Search
The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has gone beyond searching for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.
A hybrid constraint programming and semidefinite programming approach for the stable set problem
This work presents a hybrid approach to solve the maximum stable set problem,
using constraint and semidefinite programming. The approach consists of two
steps: subproblem generation and subproblem solution. First we rank the
variable domain values, based on the solution of a semidefinite relaxation.
Using this ranking, we generate the most promising subproblems first, by
exploring a search tree using a limited discrepancy strategy. The
subproblems are then solved using a constraint programming solver. To
strengthen the semidefinite relaxation, we propose to infer additional
constraints from the discrepancy structure. Computational results show that the
semidefinite relaxation is very informative, since solutions of good quality
are found in the first subproblems, or optimality is proven immediately.
Comment: 14 pages
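The ordering step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `scores` list is a made-up stand-in for values obtained from a semidefinite relaxation, the 0.5 threshold is arbitrary, complete assignments are enumerated instead of handing subproblems to a constraint programming solver, and the function names are hypothetical. A discrepancy is counted each time a variable takes its non-preferred value, and leaves are visited in order of increasing discrepancy count.

```python
from itertools import combinations


def preferred_values(scores, threshold=0.5):
    """Round hypothetical relaxation scores to a preferred 0/1 value per vertex."""
    return [1 if s >= threshold else 0 for s in scores]


def lds_leaves(preferred):
    """Yield (discrepancies, assignment) pairs in order of increasing
    discrepancy count, i.e. the number of variables that deviate from
    their preferred value."""
    n = len(preferred)
    for k in range(n + 1):
        for flipped in combinations(range(n), k):
            leaf = list(preferred)
            for i in flipped:
                leaf[i] = 1 - leaf[i]
            yield k, tuple(leaf)


def is_stable(assignment, edges):
    """A vertex set is stable if no edge has both endpoints selected."""
    return all(not (assignment[u] and assignment[v]) for u, v in edges)


# Path graph 0-1-2-3 with made-up scores favouring the two end vertices.
edges = [(0, 1), (1, 2), (2, 3)]
scores = [0.9, 0.1, 0.2, 0.8]
leaves = list(lds_leaves(preferred_values(scores)))
best = next(a for k, a in leaves if is_stable(a, edges))
print(leaves[0], best)
```

With these scores the zero-discrepancy leaf already selects vertices 0 and 3, a stable set of maximum size on this path, which mirrors the abstract's observation that good solutions tend to appear in the first subproblems when the ranking is informative.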
Approximative filtering of XML documents in a publish/subscribe system
Publish/subscribe systems filter published documents and inform their subscribers about documents matching their interests. Recent systems have focussed on documents or messages sent in XML format. Subscribers have to be familiar with the underlying XML format to create meaningful subscriptions. A service might support several providers with slightly differing formats, e.g., several publishers of books. This makes the definition of a successful subscription almost impossible. This paper proposes the use of an approximative language for subscriptions. We introduce the design of our ApproXFilter algorithm for approximative filtering in a publish/subscribe system. We present the results of our performance analysis of a prototypical implementation
Postponing Branching Decisions
Solution techniques for Constraint Satisfaction and Optimisation Problems
often make use of backtrack search methods, exploiting variable and value
ordering heuristics. In this paper, we propose and analyse a very simple method
to apply in case the value ordering heuristic produces ties: postponing the
branching decision. To this end, we group together values in a tie, branch on
this sub-domain, and defer the decision among them to lower levels of the
search tree. We show theoretically and experimentally that this simple
modification can dramatically improve the efficiency of the search strategy.
Although in practice similar methods may have been applied already, to our
knowledge, no empirical or theoretical study has been reported in the
literature to identify when and to what extent this strategy should be used.
Comment: 11 pages, 3 figures
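The tie-grouping step described in this abstract can be sketched in a few lines. The scoring function below is a made-up heuristic (lower is better) and `group_ties` is a hypothetical name; the paper's actual heuristics and search procedure are not reproduced here. Values with equal scores are placed in one sub-domain, so a search would branch on the sub-domains and leave the choice inside each one to deeper levels of the tree.

```python
from itertools import groupby


def group_ties(domain, score):
    """Partition a domain into sub-domains of values tied under `score`,
    ordered from best (lowest score) to worst. Ties keep their original
    relative order because Python's sort is stable."""
    ranked = sorted(domain, key=score)
    return [list(group) for _, group in groupby(ranked, key=score)]


# Hypothetical heuristic: even values are preferred and all evens tie.
domain = [1, 2, 3, 4, 5]
branches = group_ties(domain, score=lambda v: v % 2)
print(branches)
```

A plain value ordering would create five branches at this node, whereas grouping the ties creates two, one per sub-domain.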
Scalable Parallel Numerical CSP Solver
We present a parallel solver for numerical constraint satisfaction problems
(NCSPs) that scales across multiple cores. Our proposed method runs worker
solvers on the available cores, and the workers simultaneously cooperate to
distribute and balance the search space. In the experiments, we attained up to
119-fold speedup using 256 cores of a parallel computer.
Comment: The final publication is available at Springe
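As a rough reading of the reported figure, parallel efficiency is speedup divided by core count. The calculation below assumes the 119-fold speedup was measured on all 256 cores; the symbols $S$, $p$ and $E$ are introduced here for illustration and do not appear in the abstract.

```latex
E = \frac{S}{p} = \frac{119}{256} \approx 0.46
```

Under that assumption, the best case corresponds to roughly 46% of ideal linear scaling.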