8,835 research outputs found
Improving Hypernymy Extraction with Distributional Semantic Classes
In this paper, we show how distributionally-induced semantic classes can be
helpful for extracting hypernyms. We present methods for inducing sense-aware
semantic classes using distributional semantics and using these induced
semantic classes for filtering noisy hypernymy relations. Denoising of
hypernyms is performed by labeling each semantic class with its hypernyms. On
the one hand, this allows us to filter out wrong extractions using the global
structure of distributionally similar senses. On the other hand, we infer
missing hypernyms via label propagation to cluster terms. We conduct a
large-scale crowdsourcing study showing that processing of automatically
extracted hypernyms using our approach improves the quality of the hypernymy
extraction in terms of both precision and recall. Furthermore, we show the
utility of our method in the domain taxonomy induction task, achieving the
state-of-the-art results on a SemEval'16 task on taxonomy induction.Comment: In Proceedings of the 11th Conference on Language Resources and
Evaluation (LREC 2018). Miyazaki, Japa
Simplifying Deep-Learning-Based Model for Code Search
To accelerate software development, developers frequently search and reuse
existing code snippets from a large-scale codebase, e.g., GitHub. Over the
years, researchers proposed many information retrieval (IR) based models for
code search, which match keywords in query with code text. But they fail to
connect the semantic gap between query and code. To conquer this challenge, Gu
et al. proposed a deep-learning-based model named DeepCS. It jointly embeds
method code and natural language description into a shared vector space, where
methods related to a natural language query are retrieved according to their
vector similarities. However, DeepCS' working process is complicated and
time-consuming. To overcome this issue, we proposed a simplified model
CodeMatcher that leverages the IR technique but maintains many features in
DeepCS. Generally, CodeMatcher combines query keywords with the original order,
performs a fuzzy search on name and body strings of methods, and returned the
best-matched methods with the longer sequence of used keywords. We verified its
effectiveness on a large-scale codebase with about 41k repositories.
Experimental results showed the simplified model CodeMatcher outperforms DeepCS
by 97% in terms of MRR (a widely used accuracy measure for code search), and it
is over 66 times faster than DeepCS. Besides, comparing with the
state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by
73%. We also observed that: fusing the advantages of IR-based and
deep-learning-based models is promising because they compensate with each other
by nature; improving the quality of method naming helps code search, since
method name plays an important role in connecting query and code
Unsupervised Sense-Aware Hypernymy Extraction
In this paper, we show how unsupervised sense representations can be used to
improve hypernymy extraction. We present a method for extracting disambiguated
hypernymy relationships that propagates hypernyms to sets of synonyms
(synsets), constructs embeddings for these sets, and establishes sense-aware
relationships between matching synsets. Evaluation on two gold standard
datasets for English and Russian shows that the method successfully recognizes
hypernymy relationships that cannot be found with standard Hearst patterns and
Wiktionary datasets for the respective languages.Comment: In Proceedings of the 14th Conference on Natural Language Processing
(KONVENS 2018). Vienna, Austri
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Recommended from our members
Ideation as an intellectual information acquisition and use context: Investigating game designers’ information-based ideation behavior
Human Information Behavior (HIB) research commonly examines behavior in the context of why information is acquired and how it will be used, but usually at the level of the work or everyday-life tasks the information will support. HIB has not been examined in detail at the broader contextual level of intellectual purpose (i.e. the higher-order conceptual tasks the information was acquired to support). Examination at this level can enhance holistic understanding of HIB as a ‘means to an intellectual end’ and inform the design of digital information environments that support information interaction for specific intellectual purposes. We investigate information-based ideation (IBI) as a specific intellectual information acquisition and use context by conducting Critical Incident-style interviews with ten game designers, focusing on how they interact with information to generate and develop creative design ideas. Our findings give rise to a framework of their ideation-focused HIB, which systems designers can leverage to reason about how best to support certain behaviors to drive design ideation. These findings emphasize the importance of intellectual purpose as a driver for acquisition and desired outcome of use
- …