A prototypical approach to machine learning
This paper presents an overview of a research programme on machine learning based on the fundamental process of categorization. The work draws on the psychological theory of prototypical concepts. This theory holds that concepts learnt naturally from interaction with the environment (basic categories) are not structured or defined in logical terms, but are clustered according to their similarity to a central prototype that represents the "most typical" member.
The structure of a computer model designed to achieve categorization is outlined, and the forms of knowledge representation and developmental learning associated with this approach are discussed.
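As a rough illustration of categorization by similarity to a prototype, the following Python sketch assigns an item to the category whose prototype is nearest. It assumes that items are numeric feature vectors, that a category's prototype is the mean of its known members, and that similarity is measured by Euclidean distance. These choices, the example features, and the names `build_prototypes`, `categorize` and `typicality` are hypothetical and are not taken from the paper's model.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def build_prototypes(examples):
    """Compute one prototype per category as the mean of its member vectors.

    Assumption: `examples` maps a category name to a list of equal-length
    numeric feature vectors.
    """
    prototypes = {}
    for category, members in examples.items():
        n = len(members)
        prototypes[category] = [sum(values) / n for values in zip(*members)]
    return prototypes


def categorize(item, prototypes):
    """Assign `item` to the category whose prototype is closest to it."""
    return min(prototypes, key=lambda c: dist(item, prototypes[c]))


def typicality(item, category, prototypes):
    """Graded membership: 1.0 at the prototype, decaying with distance."""
    return 1.0 / (1.0 + dist(item, prototypes[category]))


# Hypothetical features: (can_fly, relative_size)
examples = {
    "bird": [[1.0, 0.2], [1.0, 0.3], [0.0, 0.9]],
    "fish": [[0.0, 0.1], [0.0, 0.2]],
}
prototypes = build_prototypes(examples)
robin, ostrich = [1.0, 0.25], [0.0, 0.9]
print(categorize(robin, prototypes))
print(typicality(robin, "bird", prototypes), typicality(ostrich, "bird", prototypes))
```

Membership here is graded rather than all-or-nothing. A robin-like item close to the bird prototype is rated more typical than an atypical member further away. This mirrors the contrast the abstract draws with logically defined categories.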
Extracting tag hierarchies
Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of a given entity. Over the years, several methods have been proposed for extracting a hierarchy between tags in systems with a "flat", egalitarian tag organization, which is very common when the tags are free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we also introduce different quality measures that enable a detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer-generated benchmark that provides a versatile testing tool, with a couple of tunable parameters capable of generating a wide range of test beds. Besides the computer-generated input, we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchies, as well as the seemingly meaningful hierarchies obtained for other real systems, indicates that tag hierarchy extraction is a very promising direction for further research, with great potential for practical applications.
Comment: 25 pages with 21 pages of supporting information, 25 figures
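One simple way to turn tag co-occurrence statistics into a hierarchy is to attach each tag to a more frequent tag that it almost always appears with. The sketch below implements that heuristic. It is an illustrative assumption, not one of the algorithms proposed in the paper. The function name `extract_tag_hierarchy`, the `min_conditional` threshold and the example tags are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_tag_hierarchy(tagged_items, min_conditional=0.5):
    """Return a {child: parent} mapping inferred from tag co-occurrence.

    `tagged_items` is an iterable of tag collections, one per annotated item.
    A tag's parent is a strictly more frequent tag p maximizing
    P(p | child) = cooc(p, child) / count(child); score ties go to the less
    frequent (more specific) candidate. Tags with no candidate scoring at
    least `min_conditional` become roots (parent None).
    """
    counts = Counter()  # number of items carrying each tag
    cooc = Counter()    # number of items carrying both tags of an ordered pair
    for tags in tagged_items:
        tags = set(tags)
        counts.update(tags)
        for a, b in combinations(sorted(tags), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    parents = {}
    for child, n_child in counts.items():
        candidates = [
            (cooc[(p, child)] / n_child, -n_p, p)
            for p, n_p in counts.items()
            if n_p > n_child  # only more general (more frequent) tags qualify
        ]
        candidates = [c for c in candidates if c[0] >= min_conditional]
        parents[child] = max(candidates)[2] if candidates else None
    return parents


items = [
    {"animal", "mammal", "dog"},
    {"animal", "mammal", "cat"},
    {"animal", "mammal"},
    {"animal", "bird"},
    {"animal", "bird", "parrot"},
    {"animal"},
]
print(extract_tag_hierarchy(items))
```

Because a parent must be strictly more frequent than its child, the result cannot contain cycles and forms a forest. Breaking score ties in favour of the less frequent candidate selects the most specific plausible parent. In the example, `dog` is placed under `mammal` rather than directly under `animal`.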
Boosting terminology extraction through crosslingual resources
Terminology extraction is an important natural language processing task with applications in many areas. The task has been approached from different points of view using different techniques, and language- and domain-independent systems have also been proposed. Our contribution in this paper focuses on improving terminology extraction with crosslingual resources, specifically Wikipedia, and on using a variant of PageRank to score the candidate terms.
Peer reviewed. Postprint (published version)
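The abstract does not specify which PageRank variant is used. The sketch below therefore shows standard weighted PageRank over a graph in which candidate terms are linked when they co-occur in the same sentence, a TextRank-style setup. The graph construction, the damping factor of 0.85, the example terms and the function names are assumptions for illustration. The crosslingual Wikipedia evidence that the paper adds is not modelled.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Link candidate terms that appear in the same sentence.

    `sentences` is a list of lists of candidate terms (already extracted).
    Returns a symmetric graph {term: {neighbour: co-occurrence count}}.
    """
    graph = defaultdict(dict)
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on a symmetric graph."""
    nodes = list(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    # Total edge weight leaving each node; positive for every node that has a neighbour.
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        # Each node keeps a teleport share and receives weighted mass from neighbours.
        scores = {
            v: (1 - damping) / n
            + damping * sum(scores[u] * w / out_weight[u] for u, w in graph[v].items())
            for v in nodes
        }
    return scores


sentences = [
    ["terminology extraction", "wikipedia"],
    ["terminology extraction", "pagerank"],
    ["terminology extraction", "candidate term"],
    ["wikipedia", "pagerank"],
]
scores = pagerank(build_cooccurrence_graph(sentences))
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {term}")
```

Terms that co-occur with many other well-connected candidates accumulate the highest scores. A ranked list of this kind could then be thresholded to select the final terminology.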
Extracting corpus specific knowledge bases from Wikipedia
Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers: the people who produce Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how it can be exploited directly to provide WikiSauri: manually defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval.
Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs
To make machines better understand sentiment, research needs to move from polarity identification to understanding the reasons that underlie the expression of sentiment. Categorizing the goals or needs of humans is one way to explain the expression of sentiment in text. Humans are good at understanding situations described in natural language and can easily connect them to a character's psychological needs using commonsense knowledge. We present a novel method to extract, rank, filter and select multi-hop relation paths from a commonsense knowledge resource in order to interpret the expression of sentiment in terms of its underlying human needs. We efficiently integrate the acquired knowledge paths into a neural model that interfaces context representations with knowledge using a gated attention mechanism. We assess the model's performance on a recently published dataset for categorizing human needs. Selectively integrating knowledge paths boosts performance and establishes a new state of the art. Our model offers interpretability through the learned attention map over commonsense knowledge paths. Human evaluation highlights the relevance of the encoded knowledge.
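The gated attention step can be pictured as follows. A context vector attends over the vectors of the selected knowledge paths, and a sigmoid gate then decides how much of the attended knowledge to mix back into the context. This pure-Python sketch is a generic formulation assumed for illustration, not the paper's exact architecture. The dot-product scoring, the gate layout, the weights and all names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_knowledge_fusion(context, paths, gate_weights, gate_bias=0.0):
    """Fuse a context vector with knowledge-path vectors via gated attention.

    context: list of d floats (e.g. an encoded sentence).
    paths: list of vectors of length d, one per selected knowledge path.
    gate_weights: list of 2d floats, a hypothetical learned gate.
    Returns (fused_vector, attention_weights).
    """
    # Attention: score each path against the context, normalise with softmax.
    attention = softmax([dot(context, p) for p in paths])
    # Attended knowledge: attention-weighted sum of the path vectors.
    knowledge = [
        sum(a * p[i] for a, p in zip(attention, paths)) for i in range(len(context))
    ]
    # Scalar gate in (0, 1) computed from the concatenation [context; knowledge].
    g = 1.0 / (1.0 + math.exp(-(dot(gate_weights, context + knowledge) + gate_bias)))
    fused = [g * c + (1 - g) * k for c, k in zip(context, knowledge)]
    return fused, attention


context = [1.0, 0.0]
paths = [[1.0, 0.0], [0.0, 1.0]]
fused, attention = gated_knowledge_fusion(context, paths, gate_weights=[0.0] * 4)
print(attention, fused)
```

The returned attention weights play the role of the attention map over knowledge paths that the abstract cites as the source of the model's interpretability. Paths more aligned with the context receive larger weights.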