A New Multilingual Authoring Tool of Semistructured Legal Documents
Current approaches to multilingual document management rely on human translation, machine translation (MT), and computer-assisted translation (CAT) to produce versions of a single document in several languages. However, recent advances in natural language generation (NLG) suggest that it is possible to implement language-independent systems that produce documents in several languages, independently of any source language, more efficiently and cost-effectively. In this article we present GenTur, an authoring tool for producing tourism contracts in several languages. Particular attention is paid to two basic elements of its implementation: on the one hand, the xgtling interlingua used for the discourse representation of the contracts, and on the other, the development of an architecture that allows this interlingua to generate tourism contracts by means of the GT-Mth generation algorithm.
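The abstract does not describe the internals of xgtling or GT-Mth, so the following is only a minimal sketch of the general idea of producing several language versions from one source-language-independent representation. It assumes a hypothetical `Clause` structure and hand-written per-language templates (simple template-based realisation, not the GT-Mth algorithm); the predicates and wording are illustrative only.

```python
from dataclasses import dataclass, field


# Hypothetical language-neutral clause; NOT the xgtling format, just an illustration.
@dataclass(frozen=True)
class Clause:
    predicate: str                      # e.g. "cancel_fee"
    args: dict = field(default_factory=dict)


# Hypothetical per-language realisation templates, keyed by predicate.
TEMPLATES = {
    "en": {
        "cancel_fee": "The client shall pay a cancellation fee of {percent}% of the total price.",
        "check_in": "Check-in is available from {hour}:00.",
    },
    "es": {
        "cancel_fee": "El cliente abonará una penalización por cancelación del {percent}% del precio total.",
        "check_in": "La entrada está disponible a partir de las {hour}:00.",
    },
}


def realise(clauses, lang):
    """Render every clause of the abstract document in the target language."""
    table = TEMPLATES[lang]
    return [table[c.predicate].format(**c.args) for c in clauses]


# One abstract contract, realised in each supported language.
contract = [Clause("cancel_fee", {"percent": 20}), Clause("check_in", {"hour": 14})]
for lang in ("en", "es"):
    print(lang, realise(contract, lang))
```

Because no language version is derived from another, adding a language means adding one template table rather than a new translation step.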
A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture
This study presents a method for implementing generative AI services using a
Large Language Model (LLM) application architecture. With recent advances in
generative AI technology, LLMs have gained prominence across many domains. In
this context, the research addresses the problem of information scarcity and
proposes specific remedies that harness LLM capabilities. It examines the
efficacy of fine-tuning techniques and of direct document integration for
alleviating data insufficiency. A key contribution of this work is the
development of a Retrieval-Augmented Generation (RAG) model that addresses
these challenges. The RAG model is designed to improve information storage and
retrieval and, in turn, content generation. The paper describes the key phases
of the RAG-based information storage and retrieval methodology and analyzes
each step with respect to its role in addressing data scarcity. Illustrative
examples demonstrate the applicability of the proposed method. By implementing
the RAG model for information storage and retrieval, the research contributes
to a deeper understanding of generative AI technology and facilitates its
practical use within enterprises that rely on LLMs. The work aims to advance
the field of generative AI by offering insights into enhancing data-driven
content generation and into the active use of LLM-based services in corporate
settings.
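The storage and retrieval phases of a generic RAG pipeline can be sketched as follows. This is a minimal sketch, not the architecture described in the study: it assumes a toy bag-of-words embedding in place of a learned embedding model, an in-memory list in place of a vector database, and a hypothetical prompt format; the actual LLM call is omitted.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase term counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Storage phase: keep each document chunk alongside its vector."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Retrieval phase: return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Augmentation: prepend retrieved context to the question before calling an LLM."""
    context = "\n".join(f"- {c}" for c in store.search(question, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"


# Hypothetical enterprise documents.
store = VectorStore()
for doc in [
    "Employees accrue 15 vacation days per year.",
    "The VPN must be used when accessing internal systems remotely.",
    "Expense reports are due within 30 days of purchase.",
]:
    store.add(doc)

print(build_prompt("How many vacation days do employees get?", store, k=1))
```

Grounding the prompt in retrieved enterprise text is what lets a general-purpose LLM answer questions about data it was never trained on, which is the scarcity problem the abstract targets.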
Natural Language Interfaces to Data
Recent advances in NLU and NLP have renewed interest in natural language
interfaces to data, which give non-technical users an easy way to access and
query data. While early systems evolved from keyword search and focused on
simple factual queries, both the input sentences and the generated SQL queries
have grown more complex over time. More recently, there has also been
considerable focus on conversational interfaces for data analytics, which give
non-technical users quick insights into the data. There are three main
challenges in natural language querying (NLQ): (1) identifying the entities
involved in the user utterance, (2) connecting the different entities in a
meaningful way over the underlying data source to interpret user intents, and
(3) generating a structured query in the form of SQL or SPARQL.
There are two main approaches to interpreting a user's NLQ. Rule-based
systems use semantic indices, ontologies, and knowledge graphs (KGs) to
identify the entities in the query and the intended relationships between them,
and use grammars to generate the target queries. With advances in deep
learning (DL)-based language models, many text-to-SQL approaches have emerged
that try to interpret the query holistically using DL models. Hybrid approaches
that combine rule-based techniques with DL models are also emerging, aiming to
exploit the strengths of both.
Conversational interfaces are the natural next step beyond one-shot NLQ: they
exploit the query context across multiple turns of conversation for
disambiguation. In this article, we review the background technologies used in
natural language interfaces and survey the different approaches to NLQ. We
also describe conversational interfaces for data analytics and discuss several
benchmarks used for NLQ research and evaluation.
Comment: The full version of this manuscript, as published by Foundations and
Trends in Databases, is available at http://dx.doi.org/10.1561/190000007
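To make the three NLQ challenges concrete, here is a minimal rule-based sketch. It is a toy illustration, not any system surveyed in the article, and assumes a hypothetical two-table schema, a hand-written `LEXICON` standing in for a semantic index, and a hard-coded `JOINS` graph; it runs the generated SQL against an in-memory SQLite database.

```python
import re
import sqlite3

# Hypothetical semantic index mapping words to schema elements.
LEXICON = {
    "average": ("agg", "AVG"),
    "total": ("agg", "SUM"),
    "salary": ("column", "employees.salary"),
    "employees": ("table", "employees"),
    "sales": ("value", "departments.name", "Sales"),
    "engineering": ("value", "departments.name", "Engineering"),
}
# Hypothetical join graph: (table_a, table_b) -> join condition.
JOINS = {("employees", "departments"): "employees.dept_id = departments.id"}


def identify_entities(question):
    """Challenge 1: map tokens of the utterance to schema elements."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [LEXICON[t] for t in tokens if t in LEXICON]


def connect(entities):
    """Challenge 2: collect the tables involved and the joins that link them."""
    tables = set()
    for ent in entities:
        if ent[0] == "table":
            tables.add(ent[1])
        elif ent[0] in ("column", "value"):
            tables.add(ent[1].split(".")[0])
    joins = [cond for (a, b), cond in JOINS.items() if a in tables and b in tables]
    return sorted(tables), joins


def generate_sql(question):
    """Challenge 3: emit a parameterized SQL query and its parameters."""
    entities = identify_entities(question)
    tables, joins = connect(entities)
    agg = next((e[1] for e in entities if e[0] == "agg"), None)
    col = next((e[1] for e in entities if e[0] == "column"), "*")
    select = f"{agg}({col})" if agg else col
    filters = [(e[1], e[2]) for e in entities if e[0] == "value"]
    where = [f"{c} = ?" for c, _ in filters] + joins
    sql = f"SELECT {select} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, [v for _, v in filters]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary REAL);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO employees VALUES (1, 'Ana', 1, 50000), (2, 'Ben', 1, 70000), (3, 'Chen', 2, 90000);
""")
sql, params = generate_sql("What is the average salary of employees in sales?")
print(sql)
print(conn.execute(sql, params).fetchone()[0])
```

A lexicon lookup like this breaks down quickly on paraphrases and ambiguous words, which is the gap that the DL-based and hybrid approaches described above try to close.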
PaLM: Scaling Language Modeling with Pathways
Large language models have been shown to achieve remarkable performance
across a variety of natural language tasks using few-shot learning, which
drastically reduces the number of task-specific training examples needed to
adapt the model to a particular application. To further our understanding of
the impact of scale on few-shot learning, we trained a 540-billion parameter,
densely activated, Transformer language model, which we call Pathways Language
Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML
system which enables highly efficient training across multiple TPU Pods. We
demonstrate continued benefits of scaling by achieving state-of-the-art
few-shot learning results on hundreds of language understanding and generation
benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough
performance, outperforming the fine-tuned state of the art on a suite of
multi-step reasoning tasks, and outperforming average human performance on the
recently released BIG-bench benchmark. A significant number of BIG-bench tasks
showed discontinuous improvements from model scale, meaning that performance
steeply increased as we scaled to our largest model. PaLM also has strong
capabilities in multilingual tasks and source code generation, which we
demonstrate on a wide array of benchmarks. We additionally provide a
comprehensive analysis of bias and toxicity, and study the extent of training
data memorization with respect to model scale. Finally, we discuss the ethical
considerations related to large language models and potential mitigation
strategies.
Surface Realisation from Knowledge-Bases
We present a simple, data-driven approach to generation from knowledge bases (KBs). A key feature of this approach is that grammar induction is driven by the extended domain of locality principle of TAG (Tree Adjoining Grammar) and takes into account both syntactic and semantic information. The resulting extracted TAG includes a unification-based semantics and can be used by an existing surface realiser to generate sentences from KB data. Experimental evaluation on the KBGen data shows that our model outperforms a data-driven generate-and-rank approach based on an automatically induced probabilistic grammar, and is comparable with a handcrafted symbolic approach.
Adaptive hypertext and hypermedia : workshop : proceedings, 3rd, Sonthofen, Germany, July 14, 2001 and Aarhus, Denmark, August 15, 2001
This paper presents two empirical usability studies, based on techniques from Human-Computer Interaction (HCI) and software engineering, that were used to elicit requirements for the design of a hypertext generation system. We discuss the findings of these studies, which motivated the choice of adaptivity techniques. The results showed dependencies between different ways of adapting the explanation content and the document length and formatting, so the system's architecture had to be modified to meet this requirement. In addition, the system had to be made adaptable, not only adaptive, to satisfy the users' elicited preferences.