Search CORE

18,190 research outputs found

Boosting Applied to Word Sense Disambiguation

Author: Escudero Gerard
Marquez Lluis
Rigau German
Publication venue
Publication date: 01/01/2000
Field of study

In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.Comment: 12 page

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Ensemble learning of model hyperparameters and spatiotemporal data for calibration of low-cost PM2.5 sensors.

Author: Bhanu Bir
Day Rong-Fuh
Tsai Chih-Chun
Tung Ching-Ying
Yin Peng-Yeng
Publication venue: eScholarship, University of California
Publication date: 01/07/2019
Field of study

he PM2.5 air quality index (AQI) measurements from government-built supersites are accurate but cannot provide a dense coverage of monitoring areas. Low-cost PM2.5 sensors can be used to deploy a fine-grained internet-of-things (IoT) as a complement to government facilities. Calibration of low-cost sensors by reference to high-accuracy supersites is thus essential. Moreover, the imputation for missing-value in training data may affect the calibration result, the best performance of calibration model requires hyperparameter optimization, and the affecting factors of PM2.5 concentrations such as climate, geographical landscapes and anthropogenic activities are uncertain in spatial and temporal dimensions. In this paper, an ensemble learning for imputation method selection, calibration model hyperparameterization, and spatiotemporal training data composition is proposed. Three government supersites are chosen in central Taiwan for the deployment of low-cost sensors and hourly PM2.5 measurements are collected for 60 days for conducting experiments. Three optimizers, Sobol sequence, Nelder and Meads, and particle swarm optimization (PSO), are compared for evaluating their performances with various versions of ensembles. The best calibration results are obtained by using PSO, and the improvement ratios with respect to R2, RMSE, and NME, are 4.92%, 52.96%, and 56.85%, respectively

eScholarship - University of California

Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

Author: Hollander E.S.
Keulen M. van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2010
Field of study

This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based (inlining) as well as a representative schemaless technique (XPath Accelerator). We investigate the maturity of probabilistic rela- tional databases for this task with experiments with one of the state-of- the-art systems, called Trio

University of Twente Research Information

Liveness of Randomised Parameterised Systems under Arbitrary Schedulers (Technical Report)

Author: Lin Anthony W.
Ruemmer Philipp
Publication venue
Publication date: 01/01/2016
Field of study

We consider the problem of verifying liveness for systems with a finite, but unbounded, number of processes, commonly known as parameterised systems. Typical examples of such systems include distributed protocols (e.g. for the dining philosopher problem). Unlike the case of verifying safety, proving liveness is still considered extremely challenging, especially in the presence of randomness in the system. In this paper we consider liveness under arbitrary (including unfair) schedulers, which is often considered a desirable property in the literature of self-stabilising systems. We introduce an automatic method of proving liveness for randomised parameterised systems under arbitrary schedulers. Viewing liveness as a two-player reachability game (between Scheduler and Process), our method is a CEGAR approach that synthesises a progress relation for Process that can be symbolically represented as a finite-state automaton. The method is incremental and exploits both Angluin-style L*-learning and SAT-solvers. Our experiments show that our algorithm is able to prove liveness automatically for well-known randomised distributed protocols, including Lehmann-Rabin Randomised Dining Philosopher Protocol and randomised self-stabilising protocols (such as the Israeli-Jalfon Protocol). To the best of our knowledge, this is the first fully-automatic method that can prove liveness for randomised protocols.Comment: Full version of CAV'16 pape

arXiv.org e-Print Archive

Publikationer från Uppsala Universitet

Oxford University Research Archive

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Learning to Prove Safety over Parameterised Concurrent Systems (Full Version)

Author: bertrand
boigelot
boigelot
bollig
donaldson
esparza
fokkink
kearns
legay
lehmann
lin
neider
nilsson
vardhan
Publication venue
Publication date: 02/10/2017
Field of study

We revisit the classic problem of proving safety over parameterised concurrent systems, i.e., an infinite family of finite-state concurrent systems that are represented by some finite (symbolic) means. An example of such an infinite family is a dining philosopher protocol with any number n of processes (n being the parameter that defines the infinite family). Regular model checking is a well-known generic framework for modelling parameterised concurrent systems, where an infinite set of configurations (resp. transitions) is represented by a regular set (resp. regular transducer). Although verifying safety properties in the regular model checking framework is undecidable in general, many sophisticated semi-algorithms have been developed in the past fifteen years that can successfully prove safety in many practical instances. In this paper, we propose a simple solution to synthesise regular inductive invariants that makes use of Angluin's classic L* algorithm (and its variants). We provide a termination guarantee when the set of configurations reachable from a given set of initial configurations is regular. We have tested L* algorithm on standard (as well as new) examples in regular model checking including the dining philosopher protocol, the dining cryptographer protocol, and several mutual exclusion protocols (e.g. Bakery, Burns, Szymanski, and German). Our experiments show that, despite the simplicity of our solution, it can perform at least as well as existing semi-algorithms.Comment: Full version of FMCAD'17 pape

arXiv.org e-Print Archive

Crossref