2,370 research outputs found
エンティティ・リンキングのための候補検索とランキング方法に関する研究
Tohoku University乾健太郎課
Application of machine learning techniques to the flexible assessment and improvement of requirements quality
It is already common to compute quantitative metrics of requirements to assess their quality. However, the risk is to build assessment methods and tools that are both arbitrary and rigid in the parameterization and combination of metrics. Specifically, we show that a linear combination of metrics is insufficient to adequately compute a global measure of quality. In this work, we propose to develop a flexible method to assess and improve the quality of requirements that can be adapted to different contexts, projects, organizations, and quality standards, with a high degree of automation. The domain experts contribute with an initial set of requirements that they have classified according to their quality, and we extract their quality metrics. We then use machine learning techniques to emulate the implicit expert’s quality function. We provide also a procedure to suggest improvements in bad requirements. We compare the obtained rule-based classifiers with different machine learning algorithms, obtaining measurements of effectiveness around 85%. We show as well the appearance of the generated rules and how to interpret them. The method is tailorable to different contexts, different styles to write requirements, and different demands in quality. The whole process of inferring and applying the quality rules adapted to each organization is highly automatedThis research has received funding from the CRYSTAL project–Critical System Engineering Acceleration (European Union’s Seventh Framework Program FP7/2007-2013, ARTEMIS Joint Undertaking grant agreement no 332830); and from the AMASS project–Architecture-driven, Multi-concern and Seamless Assurance and Certification of Cyber-Physical Systems (H2020-ECSEL grant agreement no 692474; Spain’s MINECO ref. PCIN-2015-262)
Information Extraction from Biomedical Texts
V poslední době bylo vynaloženo velké úsilí k tomu, aby byly biomedicínské znalosti, typicky uložené v podobě vědeckých článků, snadněji přístupné a bylo možné je efektivně sdílet. Ve skutečnosti ale nestrukturovaná podstata těchto textů způsobuje velké obtíže při použití technik pro získávání a vyvozování znalostí. Anotování entit nesoucích jistou sémantickou informaci v textu je prvním krokem k vytvoření znalosti analyzovatelné počítačem. V této práci nejdříve studujeme metody pro automatickou extrakci informací z textů přirozeného jazyka. Dále zhodnotíme hlavní výhody a nevýhody současných systémů pro extrakci informací a na základě těchto znalostí se rozhodneme přijmout přístup strojového učení pro automatické získávání exktrakčních vzorů při našich experimentech. Bohužel, techniky strojového učení často vyžadují obrovské množství trénovacích dat, která může být velmi pracné získat. Abychom dokázali čelit tomuto nepříjemnému problému, prozkoumáme koncept tzv. bootstrapping techniky. Nakonec ukážeme, že během našich experimentů metody strojového učení pracovaly dostatečně dobře a dokonce podstatně lépe než základní metody. Navíc v úloze využívající techniky bootstrapping se podařilo významně snížit množství dat potřebných pro trénování extrakčního systému.Recently, there has been much effort in making biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. As a matter of fact, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units with semantic information in these texts is the first step to make the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. Then we discuss the main benefits and disadvantages of the state-of-art information extraction systems and, as a result of this, we adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a huge amount of training data, which can be sometimes laborious to gather. In order to face up to this tedious problem, we investigate the concept of weakly supervised or bootstrapping techniques. Finally, we show in our experiments that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially bring down the amount of labeled data needed for training of the extraction system.
Emotion Analysis and Dialogue Breakdown Detection in Dialogue of Chat Systems Based on Deep Neural Networks
In dialogues between robots or computers and humans, dialogue breakdown analysis is an important tool for achieving better chat dialogues. Conventional dialogue breakdown detection methods focus on semantic variance. Although these methods can detect dialogue breakdowns based on semantic gaps, they cannot always detect emotional breakdowns in dialogues. In chat dialogue systems, emotions are sometimes included in the utterances of the system when responding to the speaker. In this study, we detect emotions from utterances, analyze emotional changes, and use them as the dialogue breakdown feature. The proposed method estimates emotions by utterance unit and generates features by calculating the similarity of the emotions of the utterance and the emotions that have appeared in prior utterances. We employ deep neural networks using sentence distributed representation vectors as the feature. In an evaluation of experimental results, the proposed method achieved a higher dialogue breakdown detection rate when compared to the method using a sentence distributed representation vectors
Which Melbourne? Augmenting geocoding with maps
The purpose of text geolocation is to associate geographic information contained in a document with a set (or sets) of coordinates, either implicitly by using linguistic features and/or explicitly by using geographic metadata combined with heuristics. We introduce a geocoder (location mention disambiguator) that achieves state-of-the-art (SOTA) results on three diverse datasets by exploiting the implicit lexical clues. Moreover, we propose a new method for systematic encoding of geographic metadata to generate two distinct views of the same text. To that end, we introduce the Map Vector (MapVec), a sparse representation obtained by plotting prior geographic probabilities, derived from population figures, on a World Map. We then integrate the implicit (language) and explicit (map) features to significantly improve a range of metrics. We also introduce an open-source dataset for geoparsing of news events covering global disease outbreaks and epidemics to help future evaluation in geoparsing
Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence
The increasing capacities of large language models (LLMs) present an
unprecedented opportunity to scale up data analytics in the humanities and
social sciences, augmenting and automating qualitative analytic tasks
previously typically allocated to human labor. This contribution proposes a
systematic mixed methods framework to harness qualitative analytic expertise,
machine scalability, and rigorous quantification, with attention to
transparency and replicability. 16 machine-assisted case studies are showcased
as proof of concept. Tasks include linguistic and discourse analysis, lexical
semantic change detection, interview analysis, historical event cause inference
and text mining, detection of political stance, text and idea reuse, genre
composition in literature and film; social network inference, automated
lexicography, missing metadata augmentation, and multimodal visual cultural
analytics. In contrast to the focus on English in the emerging LLM
applicability literature, many examples here deal with scenarios involving
smaller languages and historical texts prone to digitization distortions. In
all but the most difficult tasks requiring expert knowledge, generative LLMs
can demonstrably serve as viable research instruments. LLM (and human)
annotations may contain errors and variation, but the agreement rate can and
should be accounted for in subsequent statistical modeling; a bootstrapping
approach is discussed. The replications among the case studies illustrate how
tasks previously requiring potentially months of team effort and complex
computational pipelines, can now be accomplished by an LLM-assisted scholar in
a fraction of the time. Importantly, this approach is not intended to replace,
but to augment researcher knowledge and skills. With these opportunities in
sight, qualitative expertise and the ability to pose insightful questions have
arguably never been more critical
- …