Effective retrieval and new indexing method for case based reasoning: Application in chemical process design
In this paper, we aim to improve the retrieval step of case-based reasoning (CBR) for preliminary design. The improvement concerns three major parts of our CBR system. First, in the preliminary design step, some uncertainties, such as imprecise or unknown values, remain in the problem description because resolving them requires deeper analysis. To deal with this issue, the description of the problem at hand is softened using fuzzy set theory. Each feature is described by a central value, a percentage of imprecision, and a relation with respect to the central value. These additional data allow us to build a domain of possible values for each attribute. This representation changes how the similarity function is calculated: the characteristic function is used to compute the local similarity between two features. Second, we focus on the main goal of the retrieval step in CBR, which is to find relevant cases for adaptation. In this second part, we discuss the assumption that the most similar case is the most appropriate one. We highlight that in some situations this classical similarity must be supplemented with further knowledge to facilitate case adaptation. To avoid failure during the adaptation step, we implement a method that couples similarity measurement with adaptability measurement in order to approximate the utility of cases more accurately. The latter gives deeper information for the reuse of cases. In the last part, we present a generic indexing technique for the case base and a new algorithm for searching the memory for relevant cases. The sphere indexing algorithm is a domain-independent index whose performance is equivalent to that of decision trees. Its main strength, however, is that it places the current problem at the center of the search area, avoiding boundary issues. All these points are discussed and exemplified through the preliminary design of a chemical engineering unit operation.
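The fuzzy feature representation and the coupling of similarity with adaptability described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the relation labels (`about`, `at_least`, `at_most`), the triangular and ramp membership shapes used as the characteristic function, the weighted-mean aggregation, the linear utility formula, its weight `alpha`, and the attribute names in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Literal, Tuple

# Hypothetical relation labels; the abstract only says each feature carries
# "a relation with respect to the central value".
Relation = Literal["about", "at_least", "at_most"]


@dataclass
class FuzzyFeature:
    center: float       # central value of the attribute
    imprecision: float  # imprecision as a fraction of the center (0.10 = 10 %)
    relation: Relation  # how the true value relates to the center

    def spread(self) -> float:
        return abs(self.center) * self.imprecision

    def domain(self) -> Tuple[float, float]:
        """Bounds of the domain of possible values for this attribute."""
        s = self.spread()
        if self.relation == "about":
            return (self.center - s, self.center + s)
        if self.relation == "at_least":
            return (self.center - s, float("inf"))
        return (float("-inf"), self.center + s)

    def membership(self, x: float) -> float:
        """Degree in [0, 1] to which a crisp case value x fits this feature."""
        s, c = self.spread(), self.center
        if self.relation == "about":
            if s == 0:
                return 1.0 if x == c else 0.0
            return max(0.0, 1.0 - abs(x - c) / s)  # triangular shape
        if self.relation == "at_least":
            if x >= c:
                return 1.0
            return 0.0 if s == 0 else max(0.0, 1.0 - (c - x) / s)  # left ramp
        # relation == "at_most"
        if x <= c:
            return 1.0
        return 0.0 if s == 0 else max(0.0, 1.0 - (x - c) / s)  # right ramp


def global_similarity(problem: Dict[str, FuzzyFeature],
                      case: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted mean of the local similarities over the problem's attributes."""
    total = sum(weights[name] for name in problem)
    score = sum(weights[name] * feat.membership(case[name])
                for name, feat in problem.items())
    return score / total


def utility(similarity: float, adaptability: float, alpha: float = 0.5) -> float:
    """Hypothetical linear coupling of similarity and adaptability, both in [0, 1]."""
    return alpha * similarity + (1.0 - alpha) * adaptability


# Toy example with hypothetical attribute names.
problem = {
    "flow_rate": FuzzyFeature(100.0, 0.10, "about"),
    "pressure": FuzzyFeature(50.0, 0.20, "at_least"),
}
case = {"flow_rate": 105.0, "pressure": 45.0}
weights = {"flow_rate": 1.0, "pressure": 1.0}
sim = global_similarity(problem, case, weights)  # 0.5
```

Ranking candidate cases by `utility` rather than by `sim` alone lets a slightly less similar but much more adaptable case win, which is the behavior the abstract argues for; how adaptability itself is estimated is not specified here.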
Learning optimization models in the presence of unknown relations
In a sequential auction with multiple bidding agents, it is highly
challenging to determine the order in which to sell the items so as to maximize
revenue, because the autonomy and private information of the
agents heavily influence the outcome of the auction.
This paper makes two main contributions. First, we demonstrate how to
apply machine learning techniques to solve the optimal ordering problem in
sequential auctions. We learn regression models from historical auctions, which
are subsequently used to predict the expected value of orderings for new
auctions. Given the learned models, we propose two types of optimization
methods: a black-box best-first search approach, and a novel white-box approach
that maps learned models to integer linear programs (ILP) which can then be
solved by any ILP-solver. Although the studied auction design problem is hard,
our proposed optimization methods obtain good orderings with high revenues.
Our second main contribution is the insight that the internal structure of
regression models can be efficiently evaluated inside an ILP solver for
optimization purposes. To this end, we provide efficient encodings of
regression trees and linear regression models as ILP constraints. This new way
of using learned models for optimization is promising. As the experimental
results show, it significantly outperforms the black-box best-first search in
nearly all settings.
Comment: 37 pages. Working paper.
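To make the idea of evaluating a regression tree inside an ILP concrete, here is a minimal sketch of one standard way to encode a tree over binary inputs as linear constraints. It is an illustration under assumptions, not the encoding from the paper: every split is assumed to test a single 0/1 variable (for example, a hypothetical "item i is sold in position p" indicator), each leaf gets a binary selector `z_k`, and, instead of calling an ILP solver, the code enumerates assignments to check that the constraints select exactly the leaf the tree would reach. A linear regression model needs no such machinery, since its prediction is already a linear expression in the inputs and can be added as a single equality constraint.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Split:
    var: int        # index of the binary input variable tested
    left: "Node"    # branch taken when x[var] == 0
    right: "Node"   # branch taken when x[var] == 1


Node = Union[Leaf, Split]
Path = Tuple[Tuple[int, int], ...]  # ((var, required bit), ...) from root to leaf


def predict(node: Node, x: Tuple[int, ...]) -> float:
    """Ordinary tree traversal, used as the reference answer."""
    while isinstance(node, Split):
        node = node.right if x[node.var] else node.left
    return node.value


def leaf_paths(node: Node, path: Path = ()) -> List[Tuple[float, Path]]:
    """List (leaf value, path conditions) for every leaf, left to right."""
    if isinstance(node, Leaf):
        return [(node.value, path)]
    return (leaf_paths(node.left, path + ((node.var, 0),))
            + leaf_paths(node.right, path + ((node.var, 1),)))


def encode(node: Node) -> List[str]:
    """Emit the tree as linear constraints over binary x_j, z_k and continuous y."""
    leaves = leaf_paths(node)
    cons = [" + ".join(f"z{k}" for k in range(len(leaves))) + " = 1"]
    for k, (_, path) in enumerate(leaves):
        for var, bit in path:
            # A leaf can only be selected if every split on its path agrees with x.
            cons.append(f"z{k} <= x{var}" if bit else f"z{k} <= 1 - x{var}")
    cons.append("y = " + " + ".join(f"{v} z{k}" for k, (v, _) in enumerate(leaves)))
    return cons


def feasible_z(leaves: List[Tuple[float, Path]], x: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Brute force: every binary z vector satisfying the constraints for a fixed x."""
    sols = []
    for z in product((0, 1), repeat=len(leaves)):
        if sum(z) != 1:
            continue
        if all(z[k] <= (x[var] if bit else 1 - x[var])
               for k, (_, path) in enumerate(leaves) for var, bit in path):
            sols.append(z)
    return sols


# Toy tree: split on x0, then (on the left branch) on x1.
tree = Split(0, Split(1, Leaf(10.0), Leaf(20.0)), Leaf(35.0))
leaves = leaf_paths(tree)
for x in product((0, 1), repeat=2):
    sols = feasible_z(leaves, x)
    assert len(sols) == 1  # the constraints select exactly one leaf...
    y = sum(v * z for (v, _), z in zip(leaves, sols[0]))
    assert y == predict(tree, x)  # ...and it is the leaf the tree reaches
```

In a full optimization model, constraints forcing `x` to describe a valid ordering would be added and `y` maximized by an ILP solver; that part is omitted here.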
Runtime Optimizations for Prediction with Tree-Based Models
Tree-based models have proven to be an effective solution for web ranking as
well as other problems in diverse domains. This paper focuses on optimizing the
runtime performance of applying such models to make predictions, given an
already-trained model. Although such prediction is conceptually simple, most
implementations of tree-based models do not efficiently utilize modern
superscalar processor architectures. By laying out data structures in memory in
a more cache-conscious fashion, removing branches from the execution flow using
a technique called predication, and micro-batching predictions using a
technique called vectorization, we are able to better exploit modern processor
architectures and significantly improve the speed of tree-based models over
hard-coded if-else blocks. Our work contributes to the exploration of
architecture-conscious runtime implementations of machine learning algorithms.
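The predication and micro-batching ideas above can be sketched on a complete binary tree stored in flat arrays. This is an illustrative Python sketch under assumptions, not the paper's implementation: the tree is assumed to be perfectly balanced with a fixed depth, children of node `i` sit at `2*i + 1` and `2*i + 2`, and the class and method names are hypothetical. Python is unlikely to show the hardware benefits the paper measures; the point is only the control-flow shape: the comparison result is added to the child index instead of selecting an if/else branch, and a small batch of instances is advanced through the tree one level at a time.

```python
from typing import List, Sequence


class FlatTree:
    """Perfect binary tree of the given depth stored as parallel arrays.

    Internal node i has children 2*i + 1 (left) and 2*i + 2 (right); the
    2**depth leaves follow the internal nodes in breadth-first order.
    """

    def __init__(self, depth: int, feature: List[int],
                 threshold: List[float], leaf_value: List[float]):
        assert len(feature) == len(threshold) == 2 ** depth - 1
        assert len(leaf_value) == 2 ** depth
        self.depth = depth
        self.feature = feature      # feature index tested at each internal node
        self.threshold = threshold  # go right when x[feature] > threshold
        self.leaf_value = leaf_value
        self.first_leaf = 2 ** depth - 1

    def predict_branchy(self, x: Sequence[float]) -> float:
        """Baseline: ordinary if/else traversal."""
        i = 0
        while i < self.first_leaf:
            if x[self.feature[i]] > self.threshold[i]:
                i = 2 * i + 2
            else:
                i = 2 * i + 1
        return self.leaf_value[i - self.first_leaf]

    def predict_predicated(self, x: Sequence[float]) -> float:
        """Predication: the comparison result (0 or 1) is added to the child index."""
        i = 0
        for _ in range(self.depth):  # fixed trip count, no data-dependent exit
            i = 2 * i + 1 + (x[self.feature[i]] > self.threshold[i])
        return self.leaf_value[i - self.first_leaf]

    def predict_batch(self, xs: Sequence[Sequence[float]]) -> List[float]:
        """Micro-batching: advance every instance in the batch one level per step."""
        idx = [0] * len(xs)
        for _ in range(self.depth):
            idx = [2 * i + 1 + (x[self.feature[i]] > self.threshold[i])
                   for i, x in zip(idx, xs)]
        return [self.leaf_value[i - self.first_leaf] for i in idx]


tree = FlatTree(depth=2, feature=[0, 1, 1], threshold=[0.5, 0.3, 0.7],
                leaf_value=[10.0, 20.0, 30.0, 40.0])
batch = [[0.2, 0.1], [0.2, 0.4], [0.9, 0.5], [0.9, 0.8]]
print(tree.predict_batch(batch))  # [10.0, 20.0, 30.0, 40.0]
```

Storing the nodes in contiguous breadth-first arrays is one simple form of the cache-conscious layout the abstract mentions; in a compiled language, the predicated loop can typically be lowered to conditional moves rather than jumps.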
Evaluation of an automatic f-structure annotation algorithm against the PARC 700 dependency bank
An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is described in (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O’Donovan et al., 2004). The annotation algorithm and the automatically generated f-structures are the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2002; Cahill et al., 2004a) and for the induction of LFG semantic forms (O’Donovan et al., 2004). The quality of the annotation algorithm and of the f-structures it generates is, therefore, extremely important. To date, annotation quality has been measured in terms of precision and recall against the DCU 105. The annotation algorithm currently achieves an f-score of 96.57% for complete f-structures and 94.3% for preds-only f-structures. There are a number of problems with evaluating against a gold standard of this size, most notably that of overfitting: there is a risk of assuming that the gold standard is a complete and balanced representation of the linguistic phenomena in a language and of basing design decisions on this assumption. It is, therefore, preferable to evaluate against a more extensive, external standard. Although the DCU 105 is publicly available, a larger, well-established external standard can provide a more widely recognised benchmark against which the quality of the f-structure annotation algorithm can be evaluated. For these reasons, we present an evaluation of the f-structure annotation algorithm of (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O’Donovan et al., 2004) against the PARC 700 Dependency Bank (King et al., 2003). Evaluation against an external gold standard is a non-trivial task, as linguistic analyses may differ systematically between the gold standard and the output to be evaluated with regard to feature geometry and nomenclature. We present conversion software that automatically accounts for many (but not all) of these systematic differences. Currently, against the PARC 700, we achieve an f-score of 87.31% for the f-structures generated from the original Penn-II trees and an f-score of 81.79% for f-structures from parse trees produced by Charniak’s (2000) parser in our pipeline parsing architecture.
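The evaluation described above scores the annotation output against a gold standard by precision, recall and f-score after converting systematic naming differences. A minimal sketch of that scoring, assuming dependency triples of the form (relation, head, dependent) and a hypothetical label-mapping table standing in for the conversion software; the real conversion also handles differences in feature geometry, which simple relabelling cannot, and the triples below are a toy example, not PARC 700 data:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (relation, head, dependent)


def convert(triples: Set[Triple], label_map: Dict[str, str]) -> Set[Triple]:
    """Rename relation labels to the gold standard's nomenclature."""
    return {(label_map.get(rel, rel), head, dep) for rel, head, dep in triples}


def prf(test: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and balanced f-score of test triples against gold."""
    matched = len(test & gold)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, for illustration only.
gold = {("subj", "sleep", "dog"), ("tense", "sleep", "past"),
        ("adjunct", "sleep", "soundly")}
output = {("SUBJ", "sleep", "dog"), ("TENSE", "sleep", "past"),
          ("OBJ", "sleep", "bed"), ("ADJUNCT", "sleep", "loudly")}
# Hypothetical mapping from the annotator's labels to the gold-standard labels.
label_map = {"SUBJ": "subj", "TENSE": "tense", "OBJ": "obj", "ADJUNCT": "adjunct"}

p, r, f = prf(convert(output, label_map), gold)  # p = 0.5, r = 2/3, f = 4/7
```

Scoring the unconverted output against the same gold set matches nothing, which illustrates why conversion is needed before an external gold standard can be used at all.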