9,193 research outputs found
Living Knowledge
Diversity, especially manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. With the project, foundational ideas emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed in concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser providing users with better structured information while coping with Web scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA) which operates from a social sciences perspective; Multimodal Genre Analysis (MGA) which operates from a semiotic perspective and Facet Analysis (FA) which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following
Recommended from our members
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010)
Empirical Study of Named Entity Recognition Performance Using Distribution-aware Word Embedding
With the fast development of Deep Learning techniques, Named Entity
Recognition (NER) is becoming more and more important in the information
extraction task. The greatest difficulty that the NER task faces is to keep the
detectability even when types of NE and documents are unfamiliar. Realizing
that the specificity information may contain potential meanings of a word and
generate semantic-related features for word embedding, we develop a
distribution-aware word embedding and implement three different methods to make
use of the distribution information in a NER framework. And the result shows
that the performance of NER will be improved if the word specificity is
incorporated into existing NER methods.Comment: Want to correc
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
NLSC: Unrestricted Natural Language-based Service Composition through Sentence Embeddings
Current approaches for service composition (assemblies of atomic services)
require developers to use: (a) domain-specific semantics to formalize services
that restrict the vocabulary for their descriptions, and (b) translation
mechanisms for service retrieval to convert unstructured user requests to
strongly-typed semantic representations. In our work, we argue that effort to
developing service descriptions, request translations, and matching mechanisms
could be reduced using unrestricted natural language; allowing both: (1)
end-users to intuitively express their needs using natural language, and (2)
service developers to develop services without relying on syntactic/semantic
description languages. Although there are some natural language-based service
composition approaches, they restrict service retrieval to syntactic/semantic
matching. With recent developments in Machine learning and Natural Language
Processing, we motivate the use of Sentence Embeddings by leveraging richer
semantic representations of sentences for service description, matching and
retrieval. Experimental results show that service composition development
effort may be reduced by more than 44\% while keeping a high precision/recall
when matching high-level user requests with low-level service method
invocations.Comment: This paper will appear on SCC'19 (IEEE International Conference on
Services Computing) on July 1
Punctuation Restoration Improves Structure Understanding without Supervision
Unsupervised learning objectives like language modeling and de-noising
constitute a significant part in producing pre-trained models that perform
various downstream applications from natural language understanding to
conversational tasks. However, despite impressive generative capabilities of
recent large language models, their abilities to capture syntactic or semantic
structure within text lag behind. We hypothesize that the mismatch between
linguistic performance and competence in machines is attributable to
insufficient transfer of linguistic structure knowledge to computational
systems with currently popular pre-training objectives. We show that
punctuation restoration as a learning objective improves in- and
out-of-distribution performance on structure-related tasks like named entity
recognition, open information extraction, chunking, and part-of-speech tagging.
Punctuation restoration is an effective learning objective that can improve
structure understanding and yield a more robust structure-aware representations
of natural language.Comment: 10 pages, 1 figure, 6 table
Zero-Resource Hallucination Prevention for Large Language Models
The prevalent use of large language models (LLMs) in various domains has
drawn attention to the issue of "hallucination," which refers to instances
where LLMs generate factually inaccurate or ungrounded information. Existing
techniques for hallucination detection in language assistants rely on intricate
fuzzy, specific free-language-based chain of thought (CoT) techniques or
parameter-based methods that suffer from interpretability issues. Additionally,
the methods that identify hallucinations post-generation could not prevent
their occurrence and suffer from inconsistent performance due to the influence
of the instruction format and model style. In this paper, we introduce a novel
pre-detection self-evaluation technique, referred to as {\method}, which
focuses on evaluating the model's familiarity with the concepts present in the
input instruction and withholding the generation of response in case of
unfamiliar concepts. This approach emulates the human ability to refrain from
responding to unfamiliar topics, thus reducing hallucinations. We validate
{\method} across four different large language models, demonstrating
consistently superior performance compared to existing techniques. Our findings
propose a significant shift towards preemptive strategies for hallucination
mitigation in LLM assistants, promising improvements in reliability,
applicability, and interpretability
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets
In the swiftly expanding domain of Natural Language Processing (NLP), the
potential of GPT-based models for the financial sector is increasingly evident.
However, the integration of these models with financial datasets presents
challenges, notably in determining their adeptness and relevance. This paper
introduces a distinctive approach anchored in the Instruction Tuning paradigm
for open-source large language models, specifically adapted for financial
contexts. Through this methodology, we capitalize on the interoperability of
open-source models, ensuring a seamless and transparent integration. We begin
by explaining the Instruction Tuning paradigm, highlighting its effectiveness
for immediate integration. The paper presents a benchmarking scheme designed
for end-to-end training and testing, employing a cost-effective progression.
Firstly, we assess basic competencies and fundamental tasks, such as Named
Entity Recognition (NER) and sentiment analysis to enhance specialization.
Next, we delve into a comprehensive model, executing multi-task operations by
amalgamating all instructional tunings to examine versatility. Finally, we
explore the zero-shot capabilities by earmarking unseen tasks and incorporating
novel datasets to understand adaptability in uncharted terrains. Such a
paradigm fortifies the principles of openness and reproducibility, laying a
robust foundation for future investigations in open-source financial large
language models (FinLLMs).Comment: Workshop on Instruction Tuning and Instruction Following at NeurIPS
202
- …