A Semantic Distance of Natural Language Queries Based on Question-Answer Pairs
Many Natural Language Processing (NLP) techniques have been applied in the field
of Question Answering (QA) to understand natural language queries. Practical QA
systems classify a natural language query into vertical domains and determine whether
it is similar to a question with known or latent answers. Current mobile personal
assistant applications likewise process queries recognized from voice input or
translated from cross-lingual queries. In theory, all of these problems rely on an
intuitive notion of semantic distance; however, such a distance is neither formally
definable nor computable. Many studies approximate it heuristically, for instance
with distances based on synonym dictionaries. In this paper, we propose a unified
algorithm that approximates semantic distance using well-defined information
distance theory. The algorithm depends on a pre-constructed data structure, semantic
clusters, built automatically from 35 million question-answer pairs. Based on this
semantic measurement of questions, we implement two practical NLP systems: a
question classifier and a translation corrector. We then conduct a series of
comparison experiments on both implementations. The results demonstrate that our
distance-based approach produces fewer classification errors than comparable
published systems, and our translation correction system achieves significant
improvements over Google translation results.
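Information distance, being defined in terms of Kolmogorov complexity, is uncomputable, which is why practical systems approximate it. A standard computable stand-in (illustrative only; this is not the paper's cluster-based algorithm) is the normalized compression distance, sketched below in Python with invented example queries:

```python
import zlib

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: a computable approximation of the
    (uncomputable) information distance, using a real compressor in place
    of Kolmogorov complexity."""
    cx = len(zlib.compress(x.encode()))
    cy = len(zlib.compress(y.encode()))
    cxy = len(zlib.compress((x + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Queries about the same topic share substrings, so their concatenation
# compresses well together, yielding a smaller distance.
q1 = "how do I reset my email password"
q2 = "how can I reset the password for my email account"
q3 = "what is the capital city of France"
assert ncd(q1, q2) < ncd(q1, q3)
```

The same intuition (semantically close questions carry largely redundant information) underlies distance measures built from question-answer pair statistics rather than a general-purpose compressor.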
Papers on predicative constructions: Proceedings of the workshop on secondary predication, October 16-17, 2000, Berlin
This volume presents a collection of papers touching on various issues concerning the syntax and semantics of predicative constructions.
A hot topic in the study of predicative copula constructions, with direct implications for the treatment of be (how many be's do we need?) and wider implications for theories of predication, event-based semantics, and aspect, is the nature and source of the situation argument. Closer examination of copula-less predications is becoming increasingly relevant to all these issues, as the present collection clearly illustrates.
A survey on the development status and application prospects of knowledge graph in smart grids
With the advent of the electric power big data era, semantic interoperability
and interconnection of power data have received extensive attention. Knowledge
graph technology is a new method for describing the complex relationships between
concepts and entities in the objective world, and it has attracted wide interest
because of its robust knowledge inference ability. In particular, the proliferation
of measurement devices and the exponential growth of electric power data give the
electric power knowledge graph new opportunities to resolve the contradiction
between massive power resources and the continuously increasing demand for
intelligent applications. In an attempt to fulfil the potential of knowledge
graphs, address the various challenges they face, and obtain insights towards
business applications in smart grids, this work first presents a holistic study
of knowledge-driven intelligent application integration. Specifically, a detailed
overview of electric power knowledge mining is provided. Then, an overview of the
knowledge graph in smart grids is introduced. Moreover, the architecture of the
big knowledge graph platform for smart grids and its critical technologies are
described. Furthermore, this paper comprehensively elaborates on the application
prospects enabled by knowledge graphs oriented to smart grids: power consumer
service, decision-making in dispatching, and operation and maintenance of power
equipment. Finally, open issues and challenges are summarised.
Comment: IET Generation, Transmission & Distribution
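The knowledge inference ability the survey emphasises can be illustrated in miniature: grid assets and their relationships stored as subject-predicate-object triples, queried by following a relation transitively. All entity and relation names below are invented for illustration and do not come from the survey:

```python
# A toy power-grid knowledge graph as subject-predicate-object triples.
triples = {
    ("substation_A", "feeds", "feeder_1"),
    ("feeder_1", "feeds", "transformer_T7"),
    ("transformer_T7", "supplies", "customer_42"),
    ("transformer_T7", "has_status", "overloaded"),
}

def reachable(graph, start, relation="feeds"):
    """Follow one relation transitively: a simple form of the knowledge
    inference that graph platforms perform at scale."""
    frontier, seen = [start], set()
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if s == node and p == relation and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen

# Which downstream assets does substation_A ultimately feed?
assert reachable(triples, "substation_A") == {"feeder_1", "transformer_T7"}
```

A dispatcher-facing application would combine such traversals with status facts (e.g. the "has_status" triple above) to flag, say, every customer downstream of overloaded equipment.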
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in the linguistic knowledge captured in model parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge that leverages native speakers’ intuitions about verb meaning to support the development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially for under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1]; ERC Consolidator Grant LEXICAL [648909]
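The spatial-arrangement elicitation mentioned above can be illustrated in miniature: annotators place items on a 2D canvas so that closer items are more similar, and a single arrangement yields all pairwise judgments at once. The verbs and coordinates below are invented for illustration and do not come from the thesis dataset:

```python
import math

# One (toy) annotator arrangement: verb -> (x, y) position on a unit canvas.
arrangement = {
    "walk": (0.1, 0.2),
    "stroll": (0.15, 0.25),
    "run": (0.3, 0.2),
    "eat": (0.9, 0.8),
}

def pairwise_distances(placement):
    """Turn one spatial arrangement into a full set of pairwise
    dissimilarity judgments (Euclidean distance on the canvas)."""
    dists = {}
    verbs = sorted(placement)
    for i, v in enumerate(verbs):
        for w in verbs[i + 1:]:
            (x1, y1), (x2, y2) = placement[v], placement[w]
            dists[(v, w)] = math.hypot(x1 - x2, y1 - y2)
    return dists

d = pairwise_distances(arrangement)
# "walk" sits nearer to "stroll" than to "eat" on the canvas, so the
# derived judgment rates walk/stroll as the more similar pair.
assert d[("stroll", "walk")] < d[("eat", "walk")]
```

This is what makes the method time-efficient: n items placed once produce n(n-1)/2 similarity scores, instead of one elicitation per pair.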
Incorporating Fine-grained Events in Stock Movement Prediction
Considering event structure information has proven helpful in text-based
stock movement prediction. However, existing works mainly adopt
coarse-grained events, which lose the specific semantic information of diverse
event types. In this work, we propose to incorporate fine-grained events in
stock movement prediction. First, we present a professional finance event
dictionary built by domain experts and use it to extract fine-grained events
automatically from finance news. Then we design a neural model that combines
finance news, fine-grained event structure, and stock trade data to predict
stock movement. In addition, to improve the generalizability of the
proposed method, we design an advanced model that uses the extracted
fine-grained events as distant supervision labels to train a multi-task
framework of event extraction and stock prediction. The experimental results
show that our method outperforms all the baselines and generalizes well.
Comment: Accepted by the 2nd ECONLP workshop at EMNLP 2019
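The dictionary-driven extraction step described above can be sketched as a simple trigger lookup. The trigger phrases, event type names, and headline below are invented for illustration; the paper's expert-built dictionary is not reproduced here:

```python
# Toy finance event dictionary: trigger phrase -> fine-grained event type.
event_dictionary = {
    "share repurchase": "BUYBACK",
    "cuts dividend": "DIVIDEND_CUT",
    "files for bankruptcy": "BANKRUPTCY",
    "beats earnings estimates": "EARNINGS_BEAT",
}

def extract_events(sentence: str):
    """Return every fine-grained event type whose trigger phrase occurs
    in the (lowercased) news sentence."""
    text = sentence.lower()
    return sorted(
        event_type
        for trigger, event_type in event_dictionary.items()
        if trigger in text
    )

headline = "Acme Corp files for bankruptcy after it cuts dividend"
assert extract_events(headline) == ["BANKRUPTCY", "DIVIDEND_CUT"]
```

In the multi-task setup, labels produced this way would serve as distant supervision for a learned event extractor, so the system is not limited to literal dictionary matches at test time.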
Learning Sentence-internal Temporal Relations
In this paper we propose a data-intensive approach for inferring
sentence-internal temporal relations. Temporal inference is relevant for
practical NLP applications which either extract or synthesize temporal
information (e.g., summarisation, question answering). Our method bypasses the
need for manual coding by exploiting the presence of markers like "after", which
overtly signal a temporal relation. We first show that models trained on main
and subordinate clauses connected with a temporal marker achieve good
performance on a pseudo-disambiguation task simulating temporal inference
(during testing the temporal marker is treated as unseen and the models must
select the right marker from a set of possible candidates). Secondly, we assess
whether the proposed approach holds promise for the semi-automatic creation of
temporal annotations. Specifically, we use a model trained on noisy and
approximate data (i.e., main and subordinate clauses) to predict
intra-sentential relations present in TimeBank, a corpus annotated with rich
temporal information. Our experiments compare and contrast several
probabilistic models differing in their feature space, linguistic assumptions
and data requirements. We evaluate performance against gold standard corpora
and also against human subjects.
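The pseudo-disambiguation setup described above (hide the overt marker at test time, then ask the model to pick it from a candidate set) can be sketched with toy counts. The clause pairs, markers, and back-off weighting below are invented and far simpler than the paper's probabilistic models:

```python
from collections import Counter

# Toy training data: (main-clause verb, subordinate-clause verb, marker)
# tuples harvested from sentences where the temporal marker is overt.
train = [
    ("leave", "finish", "after"), ("leave", "finish", "after"),
    ("wait", "arrive", "until"), ("wait", "arrive", "until"),
    ("leave", "arrive", "before"),
]
counts = Counter(train)
markers = ["after", "before", "until"]

def predict_marker(main_verb, sub_verb):
    """Pick the candidate marker with the highest training count for this
    verb pair, backing off to overall marker frequency when unseen."""
    scores = {
        m: counts[(main_verb, sub_verb, m)]
        + 0.1 * sum(c for (mv, sv, mk), c in counts.items() if mk == m)
        for m in markers
    }
    return max(scores, key=scores.get)

assert predict_marker("leave", "finish") == "after"
assert predict_marker("wait", "arrive") == "until"
```

During evaluation the gold marker is removed from the test sentence, so choosing it correctly from the candidate set simulates inferring the temporal relation itself.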