Search CORE

1,482 research outputs found

Dictionary writing system (DWS) plus corpus query package (CQP): the case of TshwaneLex

Author: DE PAUW Guy
de Schryver Gilles-Maurice
Publication venue
Publication date: 01/01/2007
Field of study

In this article the integrated corpus query functionality of the dictionary compilation software TshwanelLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as Such the encouraging outcomes of this study are far-reaching

Ghent University Academic Bibliography

Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

Author: Turney Peter D.
Publication venue
Publication date: 01/01/2004
Field of study

This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus

arXiv.org e-Print Archive

NRC Publications Archive

The Art of Fugue: Heidegger on Rhythm

Author: Nowell Smith David
Publication venue
Publication date: 03/11/2017
Field of study

University of East Anglia digital repository

Information scraps: how and why information eludes our personal information management tools

Author: Boardman R.
Bruce H.
Csikszentmihalyi M.
David Karger
Halbert D. C.
Jones W.
Karger D. R.
Lichtenstein S.
M. C. Schraefel
Max Van Kleek
Michael Bernstein
Rhodes B.
Ross L.
Van Kleek M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

In this paper we describe information scraps -- a class of personal information whose content is scribbled on Post-it notes, scrawled on corners of random sheets of paper, buried inside the bodies of e-mail messages sent to ourselves, or typed haphazardly into text files. Information scraps hold our great ideas, sketches, notes, reminders, driving directions, and even our poetry. We define information scraps to be the body of personal information that is held outside of its natural or We have much still to learn about these loose forms of information capture. Why are they so often held outside of our traditional PIM locations and instead on Post-its or in text files? Why must we sometimes go around our traditional PIM applications to hold on to our scraps, such as by e-mailing ourselves? What are information scraps' role in the larger space of personal information management, and what do they uniquely offer that we find so appealing? If these unorganized bits truly indicate the failure of our PIM tools, how might we begin to build better tools? We have pursued these questions by undertaking a study of 27 knowledge workers. In our findings we describe information scraps from several angles: their content, their location, and the factors that lead to their use, which we identify as ease of capture, flexibility of content and organization, and avilability at the time of need. We also consider the personal emotive responses around scrap management. We present a set of design considerations that we have derived from the analysis of our study results. We present our work on an application platform, jourknow, to test some of these design and usability findings

CiteSeerX

Southampton (e-Prints Soton)

Crossref

A lexicographic approach to profiling nomenclature usage in the biodiversity literature

Author: Young Sandra
Publication venue
Publication date: 01/01/2020
Field of study

University of Brighton Research Portal

Recommended from our members

Detecting dangerous coordination ambiguities using word distribution

Author: Chantree Francis
De Roeck Anne
Kilgarriff Adam
Willis Alistair
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2007
Field of study

In this paper we present heuristics for resolving coordination ambiguities. We test the hypothesis that the most likely reading of a coordination can be predicted using word distribution information from a generic corpus. Our heuristics are based upon the relative frequency of the coordination in the corpus, the distributional similarity of the coordinated words, and the collocation frequency between the coordinated words and their modifiers. These heuristics have varying but useful predictive power. They also take into account our view that many ambiguities cannot be effectively disambiguated, since human perceptions vary widely

Open Research Online (The Open University)

Conversational OLAP

Author: Francia M.
Gallinucci E.
Golfarelli M.
Publication venue: CEUR-WS
Publication date: 01/01/2021
Field of study

The democratization of data access and the adoption of OLAP in scenarios requiring hand-free interfaces push towards the creation of smart OLAP interfaces. In this paper, we describe COOL, a framework devised for COnversational OLap applications. COOL interprets and translates a natural language dialog into an OLAP session that starts with a GPSJ (Generalized Projection, Selection, and Join) query and continues with the application of OLAP operators. The interpretation relies on a formal grammar and on a repository storing metadata and values from a multidimensional cube. In case of ambiguous text description, COOL can obtain the correct query either through automatic inference or user interactions to disambiguate the text

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna