Structured parameter estimation for LFG-DOP using Backoff
Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and its good performance seems to be more the result of ad hoc adjustments than of correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structure annotations (so-called Tree-DOP models). Backoff Estimation deviates from earlier methods in that it treats the model parameters as a highly structured space of correlated events (backoffs), rather than as a set of disjoint events. In this paper we show that the problem of biased estimates also holds for DOP models that are based on Lexical-Functional Grammar annotations (i.e. LFG-DOP), and that the LFG-DOP parameters also constitute a hierarchically structured space. Subsequently, we adapt the Backoff Estimation algorithm from Tree-DOP to LFG-DOP models. Backoff Estimation turns out to be a natural solution to some of the specific problems of robust parsing under LFG-DOP.
Disambiguation strategies for data-oriented translation
The Data-Oriented Translation (DOT) model, originally proposed in (Poutsma, 1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. (Bod, Scha, & Sima'an, 2003)), is best described as a hybrid model of translation, as it combines examples, linguistic information and a statistical translation model. Although theoretically interesting, it inherits the computational complexity associated with DOP. In this paper, we focus on one computational challenge for this model: efficiently selecting the 'best' translation to output. We present four different disambiguation strategies in terms of how they are implemented in our DOT system, along with experiments which investigate how they compare in terms of accuracy and efficiency.
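The abstract does not name its four strategies. As a hedged illustration only, the sketch below contrasts two disambiguation criteria commonly discussed for DOP-style models: taking the output of the single most probable derivation versus summing probabilities over all derivations that yield the same output. The toy derivation list and both function names are hypothetical; this is not the paper's DOT system.

```python
from collections import defaultdict

# Hypothetical toy input: (translation, derivation probability) pairs.
# In DOP/DOT-style models, several derivations can produce the same output string.
derivations = [
    ("T1", 0.30),
    ("T2", 0.20),
    ("T2", 0.15),
]

def most_probable_derivation(derivs):
    """Return the output of the single highest-probability derivation."""
    translation, _ = max(derivs, key=lambda d: d[1])
    return translation

def most_probable_translation(derivs):
    """Sum probabilities over all derivations of each output and return the best total."""
    totals = defaultdict(float)
    for translation, prob in derivs:
        totals[translation] += prob
    return max(totals, key=totals.get)

print(most_probable_derivation(derivations))   # T1 (0.30 is the largest single derivation)
print(most_probable_translation(derivations))  # T2 (0.20 + 0.15 = 0.35 > 0.30)
```

The two criteria can disagree, as in this toy case, which is why the choice of strategy affects both accuracy and efficiency.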
Data Mining: Past, Present, and Future
Data mining has become an established discipline within the domains of artificial intelligence (AI) and knowledge engineering (KE). Data mining is rooted in machine learning and statistics, but it extends into other areas of computer science and into other sciences such as biology, the environment, finance, networking, and so on. Data mining has received a great deal of attention in the last decade, owing to developments in hardware that provide extraordinary computing power and make it possible to process large amounts of data. Unlike other fields of study in AI and KE, data mining can arguably be regarded as an application rather than a technology; it is therefore expected to remain a hotly discussed topic in the future, given the exponential growth of data. This paper looks back at the historical development of data mining, its current state, and some views on its future development.
Ontology Based Query Expansion with a Probabilistic Retrieval Model
This paper examines the use of ontologies for defining query context. The information retrieval system used is based on the probabilistic retrieval model. We extend the relevance feedback (RFB) and pseudo-relevance feedback (PF) query expansion techniques using information from a news-domain ontology. The aim is to assess the impact of the ontology on query expansion results with respect to recall and precision. We also tested the effect of varying the relevance feedback parameters (number of terms or number of documents). The factors that influence the success of ontology-based query expansion are outlined. Our findings show that ontology-based query expansion has had mixed success. The use of the ontology vastly increased the number of relevant documents retrieved; however, we conclude that for both types of query expansion, the PF results are better than the RFB results.
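A minimal sketch of plain pseudo-relevance feedback, assuming the top-ranked documents are already tokenized and that expansion terms are chosen by raw frequency. The two parameters mirror the ones varied in the abstract (number of documents, number of terms); the ontology component and the probabilistic model's actual term weighting are not shown, and the function name and data are hypothetical.

```python
from collections import Counter

def pseudo_relevance_expand(query_terms, ranked_docs, num_docs=3, num_terms=2):
    """Expand a query by assuming the top `num_docs` documents are relevant.

    ranked_docs: list of token lists, already ordered by retrieval score.
    Returns the original terms plus the `num_terms` most frequent new terms
    (plain frequency counting, not a probabilistic term weight).
    """
    counts = Counter()
    for doc in ranked_docs[:num_docs]:
        # Count only terms that are not already in the query.
        counts.update(t for t in doc if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return list(query_terms) + expansion

# Hypothetical ranked result list for the query "election".
docs = [
    ["election", "vote", "parliament", "minister"],
    ["election", "vote", "campaign"],
    ["parliament", "vote", "budget"],
    ["football", "match"],
]
expanded = pseudo_relevance_expand(["election"], docs, num_docs=3, num_terms=2)
print(expanded)  # ['election', 'vote', 'parliament']
```

In true relevance feedback (RFB), the same expansion step would run over documents a user has judged relevant instead of the top-ranked ones.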
Are you being addressed?: real-time addressee detection to support remote participants in hybrid meetings
A meeting assistant agent for (remote) participants in hybrid meetings has been developed. Its task is to monitor the meeting conversation and notify the user when he is being addressed. This paper presents the experiments that have been performed to develop machine classifiers that decide whether “You are being addressed”, where “You” refers to a fixed (remote) participant in a meeting. The experimental results back up the choices made regarding the selection of data, features, and classification methods. We discuss variations of the addressee classification problem that have been considered in the literature and how suitable they are for addressee detection in a system that plays a role in a live meeting.
Data selection based on decision tree for SVM classification on large data sets
Support Vector Machines (SVMs) have important properties, such as a strong mathematical foundation and better generalization capability than other classification methods. On the other hand, the major drawback of SVMs lies in their training phase, which is computationally expensive and highly dependent on the size of the input data set. In this study, a new algorithm to speed up SVM training is presented; the method selects a small, representative subset of the data to reduce SVM training time. The novel method uses an induction (decision) tree to reduce the training data set for the SVM, producing a very fast and highly accurate algorithm. According to the results, the proposed algorithm achieves accuracy similar to that of current SVM implementations while training faster.
Proyecto UAEM 3771/2014/C
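The abstract does not detail how the tree selects data. As a hedged sketch, assuming scikit-learn and NumPy are available, one plausible heuristic keeps only the training samples that fall into impure decision-tree leaves (regions where the classes mix, i.e. near the boundary, where support vectors tend to lie) and trains the SVM on that subset. The helper name, tree depth, and synthetic data are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def select_boundary_samples(X, y, max_depth=4):
    """Keep samples whose decision-tree leaf is impure (contains more than one class).

    Assumption (hypothetical heuristic): impure leaves approximate the region
    near the class boundary, so pure leaves can be dropped before SVM training.
    """
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    leaves = tree.apply(X)                      # leaf node index for each sample
    impure = tree.tree_.impurity[leaves] > 0.0  # Gini impurity of that leaf
    return X[impure], y[impure]

# Synthetic, linearly separable toy data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_small, y_small = select_boundary_samples(X, y)
svm = SVC(kernel="linear").fit(X_small, y_small)  # train on the reduced set only
print(len(X_small), "of", len(X), "samples kept")
print("accuracy on full set:", svm.score(X, y))
```

The design trade-off is the tree depth: a shallower tree keeps more samples (safer, slower), while a deeper tree keeps fewer samples but risks discarding points the SVM needs.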
A Framework for Text Mining using Learned Information Extraction System
Text mining is a very exciting research area, as it tries to discover knowledge from unstructured texts. These texts can be found on computer desktops, intranets, and the internet. The aim of this paper is to give an overview of text mining in the context of its techniques, application domains, and most challenging issues. Learned Information Extraction (LIE) is about locating specific items in natural-language documents. This paper presents a framework for text mining called DTEX (Discovery Text Extraction), which uses a learned information extraction system to transform text into more structured data that is then mined for interesting relationships. The initial version of DTEX integrates an LIE module acquired by an LIE learning system and a standard rule induction module. In addition, rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of the underlying extraction system. The best results obtained by applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup are presented.
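DTEX's rule induction module is described only as "standard". As a simplified, hypothetical stand-in, the sketch below mines pairwise rules between extracted (slot, value) items using support and confidence thresholds, then uses those rules to suggest items the extractor may have missed in a new document. The record format, thresholds, and function names are assumptions.

```python
from collections import Counter
from itertools import permutations

def mine_pair_rules(records, min_support=2, min_confidence=0.6):
    """Mine rules a -> b between (slot, value) items co-occurring in extracted records.

    records: list of sets of (slot, value) pairs, one set per document.
    Returns a dict mapping each antecedent item to a list of (consequent, confidence).
    """
    item_counts = Counter()
    pair_counts = Counter()
    for items in records:
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered pairs (a, b) with a != b
    rules = {}
    for (a, b), n_ab in pair_counts.items():
        confidence = n_ab / item_counts[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules.setdefault(a, []).append((b, confidence))
    return rules

def predict_extra(extracted, rules):
    """Suggest (slot, value) items the extractor may have missed, given what it found."""
    suggestions = set()
    for item in extracted:
        for consequent, _ in rules.get(item, []):
            if consequent not in extracted:
                suggestions.add(consequent)
    return suggestions

# Hypothetical records extracted from job postings.
records = [
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web"), ("os", "linux")},
    {("language", "java"), ("area", "web")},
    {("language", "c++"), ("os", "linux")},
]
rules = mine_pair_rules(records)
print(predict_extra({("language", "java")}, rules))  # {('area', 'web')}
```

A suggestion is only a hint about where to look; the extractor would still need to find supporting text in the new document before adding the item.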
- …