26,066 research outputs found
Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
An automatic word classification system has been designed which processes
word unigram and bigram frequency statistics extracted from a corpus of natural
language utterances. The system implements a binary top-down form of word
clustering which employs an average class mutual information metric. Resulting
classifications are hierarchical, allowing variable class granularity. Words
are represented as structural tags --- unique -bit numbers the most
significant bit-patterns of which incorporate class information. Access to a
structural tag immediately provides access to all classification levels for the
corresponding word. The classification system has successfully revealed some of
the structure of English, from the phonemic to the semantic level. The system
has been compared --- directly and indirectly --- with other recent word
classification systems. Class based interpolated language models have been
constructed to exploit the extra information supplied by the classifications
and some experiments have shown that the new models improve model performance.Comment: 17 Page Paper. Self-extracting PostScript Fil
Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. Results of phases 1 and 2 informed on the
design of the test system and the user interface was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion the project dissemination programme and future work are
outlined
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline
Improving Hypernymy Extraction with Distributional Semantic Classes
In this paper, we show how distributionally-induced semantic classes can be
helpful for extracting hypernyms. We present methods for inducing sense-aware
semantic classes using distributional semantics and using these induced
semantic classes for filtering noisy hypernymy relations. Denoising of
hypernyms is performed by labeling each semantic class with its hypernyms. On
the one hand, this allows us to filter out wrong extractions using the global
structure of distributionally similar senses. On the other hand, we infer
missing hypernyms via label propagation to cluster terms. We conduct a
large-scale crowdsourcing study showing that processing of automatically
extracted hypernyms using our approach improves the quality of the hypernymy
extraction in terms of both precision and recall. Furthermore, we show the
utility of our method in the domain taxonomy induction task, achieving the
state-of-the-art results on a SemEval'16 task on taxonomy induction.Comment: In Proceedings of the 11th Conference on Language Resources and
Evaluation (LREC 2018). Miyazaki, Japa
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with semantic representations. The lexicon learned consists of phrases paired
with meaning representations. WOLFIE is part of an integrated system that
learns to transform sentences into representations such as logical database
queries. Experimental results are presented demonstrating WOLFIE's ability to
learn useful lexicons for a database interface in four different natural
languages. The usefulness of the lexicons learned by WOLFIE are compared to
those acquired by a similar system, with results favorable to WOLFIE. A second
set of experiments demonstrates WOLFIE's ability to scale to larger and more
difficult, albeit artificially generated, corpora. In natural language
acquisition, it is difficult to gather the annotated data needed for supervised
learning; however, unannotated data is fairly plentiful. Active learning
methods attempt to select for annotation and training only the most informative
examples, and therefore are potentially very useful in natural language
applications. However, most results to date for active learning have only
considered standard classification tasks. To reduce annotation effort while
maintaining accuracy, we apply active learning to semantic lexicons. We show
that active learning can significantly reduce the number of annotated examples
required to achieve a given level of performance
Research Directions, Challenges and Issues in Opinion Mining
Rapid growth of Internet and availability of user reviews on the web for any product has provided a need for an effective system to analyze the web reviews. Such reviews are useful to some extent, promising both the customers and product manufacturers. For any popular product, the number of reviews can be in hundreds or even thousands. This creates difficulty for a customer to analyze them and make important decisions on whether to purchase the product or to not. Mining such product reviews or opinions is termed as opinion mining which is broadly classified into two main categories namely facts and opinions. Though there are several approaches for opinion mining, there remains a challenge to decide on the recommendation provided by the system. In this paper, we analyze the basics of opinion mining, challenges, pros & cons of past opinion mining systems and provide some directions for the future research work, focusing on the challenges and issues
- …