Search CORE

61 research outputs found

Structured parameter estimation for LFG-DOP using Backoff

Author: Hearne Mary
Sima'an Khalil
Publication date: 01/01/2003

Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and its good performance seems to result more from ad hoc adjustments than from correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structure annotations (so-called Tree-DOP models). Backoff Estimation deviates from earlier methods in that it treats the model parameters as a highly structured space of correlated events (backoffs), rather than a set of disjoint events. In this paper we show that the problem of biased estimates also holds for DOP models that are based on Lexical-Functional Grammar annotations (i.e. LFG-DOP), and that the LFG-DOP parameters also constitute a hierarchically structured space. Subsequently, we adapt the Backoff Estimation algorithm from Tree-DOP to LFG-DOP models. Backoff Estimation turns out to be a natural solution to some of the specific problems of robust parsing under LFG-DOP.

Irish Universities

DCU Online Research Access Service

Disambiguation strategies for data-oriented translation

Author: Hearne Mary
Way Andy
Publication date: 01/01/2006

The Data-Oriented Translation (DOT) model, originally proposed in (Poutsma, 1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. (Bod, Scha, & Sima'an, 2003)), is best described as a hybrid model of translation, as it combines examples, linguistic information and a statistical translation model. Although theoretically interesting, it inherits the computational complexity associated with DOP. In this paper, we focus on one computational challenge for this model: efficiently selecting the 'best' translation to output. We present four different disambiguation strategies in terms of how they are implemented in our DOT system, along with experiments which investigate how they compare in terms of accuracy and efficiency.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Data Mining: Past, Present, and Future (Data Mining : Masa Lalu, Sekarang, dan Masa Mendatang)

Author: Purba R. (Ronsen)
Publication date: 01/01/2012

Data mining has become an established discipline within the domains of artificial intelligence (AI) and knowledge engineering (KE). It is rooted in machine learning and statistics, but extends into other areas of computer science and into other fields such as biology, the environment, finance, and networking. Data mining has attracted a great deal of attention over the last decade, owing to hardware developments that provide extraordinary computing power and make the processing of large data sets possible. Unlike other areas of study in AI and KE, data mining can arguably be regarded as an application rather than a technology, and it is therefore expected to remain a hotly discussed topic in the future, given the exponential growth of data. This paper gives a retrospective of the history of data mining, its current state, and some views on future developments.

Neliti

E-Jurnal Mikroskil (STMIK - STIE Mikroskil)

Exploring Features and Classifiers for Dialogue Act Segmentation

Author: op den Akker Harm
op den Akker Hendrikus J.A.
Schulz Christian
Publication venue: Springer
Publication date: 20/09/2008

University of Twente Research Information

Ontology Based Query Expansion with a Probabilistic Retrieval Model

Author: J. Bhogal
K. Sparck-Jones
S.E. Robertson
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

This paper examines the use of ontologies for defining query context. The information retrieval system used is based on the probabilistic retrieval model. We extend the use of relevance feedback (RFB) and pseudo-relevance feedback (PF) query expansion techniques using information from a news domain ontology. The aim is to assess the impact of the ontology on the query expansion results with respect to recall and precision. We also tested the results for varying the relevance feedback parameters (number of terms or number of documents). The factors which influence the success of ontology-based query expansion are outlined. Our findings show that ontology-based query expansion has had mixed success. The use of the ontology has vastly increased the number of relevant documents retrieved; however, we conclude that for both types of query expansion, the PF results are better than the RFB results.

City Research Online

Crossref

Birmingham City University Open Access Repository

BCU Open Access

Are you being addressed?: real-time addressee detection to support remote participants in hybrid meetings

Author: op den Akker Harm
op den Akker Hendrikus J.A.
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 11/09/2009

A meeting assistant agent for (remote) participants in hybrid meetings has been developed. Its task is to monitor the meeting conversation and notify the user when he is being addressed. This paper presents the experiments that have been performed to develop machine classifiers to decide if “You are being addressed”, where “You” refers to a fixed (remote) participant in a meeting. The experimental results back up the choices made regarding the selection of data, features, and classification methods. We discuss variations of the addressee classification problem that have been considered in the literature and how suitable they are for addressing detection in a system that plays a role in a live meeting.

University of Twente Research Information

Data selection based on decision tree for SVM classification on large data sets

Author: Cervantes Canales Jair
García Lamont Farid
LOPEZ CHAU ASDRUBAL
Rodríguez Mazahua Lisbeth
RUIZ CASTILLA JOSE SERGIO
Publication venue: Elsevier BV
Publication date: 18/08/2015

The Support Vector Machine (SVM) has important properties such as a strong mathematical background and better generalization capability than other classification methods. On the other hand, the major drawback of SVM lies in its training phase, which is computationally expensive and highly dependent on the size of the input data set. In this study, a new algorithm to speed up SVM training is presented; this method selects a small, representative amount of data from data sets to improve the training time of SVM. The novel method uses an induction tree to reduce the training data set for SVM, producing a very fast and high-accuracy algorithm. According to the results, the proposed algorithm achieves accuracy similar to that of current SVM implementations while running faster. UAEM Project 3771/2014/C.

Red Mexicana de Repositorios Institucionales

Repositorio Institucional de la Universidad Autónoma del Estado de México

A Frame Work for Text Mining using Learned Information Extraction System

Author: Sathish Kuppani
Publication venue: Global Journals Inc. (US)
Publication date: 15/05/2016

Text mining is a very exciting research area, as it tries to discover knowledge from unstructured texts. These texts can be found on a computer desktop, intranets and the internet. The aim of this paper is to give an overview of text mining in the contexts of its techniques, application domains and the most challenging issue. Learned Information Extraction (LIE) is about locating specific items in natural-language documents. This paper presents a framework for text mining called DTEX (Discovery Text Extraction), which uses a learned information extraction system to transform text into more structured data that is then mined for interesting relationships. The initial version of DTEX integrates an LIE module acquired by an LIE learning system and a standard rule induction module. In addition, rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of the underlying extraction system. The best results of applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup are presented.

Global Journal of Computer Science and Technology (GJCST)