3,720 research outputs found

    Semantic Representation and Inference for NLP

    Semantic representation and inference are essential for Natural Language Processing (NLP). The state of the art for semantic representation and inference is deep learning, in particular Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformer self-attention models. This thesis investigates the use of deep learning for novel semantic representation and inference and makes contributions in three areas: creating training data, improving semantic representations, and extending inference learning. In terms of creating training data, we contribute the largest publicly available dataset of real-life factual claims for automatic claim verification (MultiFC), and we present a novel inference model composed of multi-scale CNNs with different kernel sizes that learn from external sources to infer fact-checking labels. In terms of improving semantic representations, we contribute a novel model that captures non-compositional semantic indicators. By definition, the meaning of a non-compositional phrase cannot be inferred from the individual meanings of its component words (e.g., hot dog). Motivated by this, we operationalize the compositionality of a phrase contextually by enriching the phrase representation with external word embeddings and knowledge graphs. Finally, in terms of inference learning, we propose a series of novel deep learning architectures that improve inference by using syntactic dependencies, ensembling role-guided attention heads, incorporating gating layers, and concatenating multiple heads in novel and effective ways. This thesis consists of seven publications (five published and two under review). Comment: PhD thesis, University of Copenhagen.
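
    The thesis's own models are not reproduced in this listing, but the idea of multi-scale CNNs with different kernel sizes is easy to sketch. The PyTorch snippet below is a minimal, hypothetical illustration: the kernel sizes, layer dimensions, and three-way label set are assumptions for illustration, not the architecture used in the thesis.

```python
# Hypothetical sketch of a multi-scale CNN text encoder for claim verification.
# Kernel sizes, dimensions, and the three-way label set are illustrative
# assumptions, not the thesis's actual model.
import torch
import torch.nn as nn

class MultiScaleCNNClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, kernel_sizes=(2, 3, 5),
                 num_filters=100, num_labels=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per kernel size captures n-gram patterns
        # at several granularities ("multi-scale").
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k, padding=k // 2)
            for k in kernel_sizes
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # (batch, embed_dim, seq_len)
        # Max-pool each scale over time, then concatenate the scales.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # (batch, num_labels)

scores = MultiScaleCNNClassifier(vocab_size=30_000)(
    torch.randint(1, 30_000, (4, 64)))       # 4 claims, 64 tokens each
```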

    Data-Driven Design-by-Analogy: State of the Art and Future Directions

    Design-by-Analogy (DbA) is a design methodology wherein new solutions, opportunities, or designs are generated in a target domain based on inspiration drawn from a source domain; it can help designers mitigate design fixation and improve design ideation outcomes. Recently, increasingly available design databases and rapidly advancing data science and artificial intelligence technologies have presented new opportunities for developing data-driven methods and tools to support DbA. In this study, we survey existing data-driven DbA studies and categorize them according to their data, methods, and applications into four categories: analogy encoding, retrieval, mapping, and evaluation. Based on both a qualitative review and a structured analysis, this paper elucidates the state of the art of data-driven DbA research to date and benchmarks it against the frontier of data science and AI research to identify promising research opportunities and directions for the field. Finally, we propose a future conceptual data-driven DbA system that integrates all propositions. Comment: A preprint version.
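
    As a rough illustration of the "analogy retrieval" category the survey defines, the sketch below retrieves the source-domain design descriptions most similar to a target-domain query using TF-IDF vectors and cosine similarity. The corpus, query, and scoring are invented for illustration; the surveyed systems use far richer representations such as functional models, embeddings, and knowledge graphs.

```python
# Toy illustration of analogy retrieval: find source-domain designs whose
# descriptions are most similar to a target-domain problem statement.
# The corpus and query below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_designs = [
    "burr seeds hook onto animal fur for dispersal",          # -> Velcro
    "kingfisher beak pierces water with minimal splash",      # -> train nose
    "termite mounds ventilate passively through convection",  # -> HVAC design
]
target_query = "nose shape that pierces air with minimal splash and noise"

vectorizer = TfidfVectorizer(stop_words="english")
design_vectors = vectorizer.fit_transform(source_designs)
query_vector = vectorizer.transform([target_query])

scores = cosine_similarity(query_vector, design_vectors).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. score={scores[idx]:.3f}  {source_designs[idx]}")
```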

    Bridging the gap between textual and formal business process representations

    In the era of digital transformation, an increasing number of organizations are starting to think in terms of business processes. Processes are at the very heart of every business and must be understood and carried out by a wide range of actors, from both technical and non-technical backgrounds alike. When embracing digital transformation practices, all involved parties need to be aware of the underlying business processes in an organization. However, the representational complexity and biases of the state-of-the-art modeling notations pose a challenge to understandability. On the other hand, plain-language representations, accessible by nature and easily understood by everyone, are often frowned upon by technical specialists due to their ambiguity. The aim of this thesis is precisely to bridge this gap: between the world of technical, formal languages and the world of simpler, accessible natural languages. Structured as an article compendium, this thesis presents four main contributions that address specific problems at the intersection of natural language processing and business process management.
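
    Purely as a toy illustration of the gap between textual and formal process representations that the thesis addresses (not of its actual methods), the sketch below turns a plain-language process description into a minimal formal model of numbered activities and sequence flows using hand-written rules.

```python
# Toy conversion of a textual process description into a minimal formal model
# (activities + sequence flows). Hand-written rules only; real text-to-process
# approaches use full NLP pipelines (parsing, semantic role labelling, etc.).
import re

description = (
    "The clerk receives the order. Then the clerk checks the stock. "
    "If the item is available, the warehouse ships it. Finally, "
    "the accountant sends the invoice."
)

# Split into clauses and strip simple sequence markers.
clauses = [c.strip() for c in re.split(r"[.?!]", description) if c.strip()]
markers = re.compile(r"^(then|finally|next|after that)[, ]+", re.IGNORECASE)
activities = [markers.sub("", c) for c in clauses]

# A minimal formal representation: numbered activities plus sequence flows.
process = {
    "activities": {f"a{i}": text for i, text in enumerate(activities, start=1)},
    "flows": [(f"a{i}", f"a{i + 1}") for i in range(1, len(activities))],
}

for node_id, text in process["activities"].items():
    print(node_id, "=", text)
print("flows:", process["flows"])
```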

    Improving Editorial Workflow and Metadata Quality at Springer Nature

    Identifying the research topics that best describe the scope of a scientific publication is a crucial task for editors, in particular because the quality of these annotations determines how effectively users are able to discover the right content in online libraries. For this reason, Springer Nature, the world's largest academic book publisher, has traditionally entrusted this task to its most expert editors. These editors manually analyse all new books, possibly including hundreds of chapters, and produce a list of the most relevant topics. Hence, this process has traditionally been very expensive, time-consuming, and confined to a few senior editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-driven application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science. Since then, STM has been used regularly by editors in Germany, China, Brazil, India, and Japan, for a total of about 800 volumes per year. Over the past three years the initial prototype has evolved iteratively in response to user feedback and changing requirements. In this paper we present the most recent version of the tool and describe the evolution of the system over the years, the key lessons learnt, and the impact on the Springer Nature workflow. In particular, our solution has drastically reduced the time needed to annotate proceedings and significantly improved their discoverability, resulting in 9.3 million additional downloads. We also present a user study involving 9 editors, which yielded excellent results in terms of usability, and report an evaluation of the new topic classifier used by STM, which outperforms previous versions in recall and F-measure.
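
    STM's internal pipeline is not reproduced here; the sketch below only illustrates the general idea of ontology-driven topic annotation: match n-grams from chapter titles against ontology topic labels, propagate matches to broader topics, and rank topics by coverage. The toy ontology, chapter titles, and scoring are assumptions for illustration.

```python
# Hedged sketch of ontology-driven topic annotation: match title n-grams
# against ontology topic labels, propagate to broader topics, and rank
# topics by how many chapters mention them. All data below is invented.
from collections import Counter
from itertools import islice

broader = {  # topic -> broader topic (a tiny ontology fragment)
    "neural networks": "machine learning",
    "semantic web": "artificial intelligence",
    "machine learning": "artificial intelligence",
}
topic_labels = set(broader) | set(broader.values())

chapter_titles = [
    "Neural Networks for Ontology Alignment on the Semantic Web",
    "A Survey of Machine Learning for Scholarly Data",
    "Scalable Reasoning over Semantic Web Knowledge Graphs",
]

def ngrams(text, max_n=3):
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(islice(words, i, i + n))

counts = Counter()
for title in chapter_titles:
    matched = {g for g in ngrams(title) if g in topic_labels}
    # Propagate each matched topic to its broader topic as well.
    matched |= {broader[t] for t in matched if t in broader}
    counts.update(matched)

for topic, score in counts.most_common():
    print(f"{topic}: mentioned in {score} chapter(s)")
```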

    Semantic Technologies for Business Decision Support

    To improve and remain competitive, enterprises must know how to seize the opportunities offered by data coming from the Web. This strategic vision implies a high level of communication sharing and the integration of practices across every business level. It does not mean that enterprises need a disruptive change in their information systems, but rather their conversion, reusing existing business data and integrating new data. However, data are heterogeneous, so maximising their value requires extracting meaning from them in the context in which they evolve. The proliferation of new linguistic data, linked to the growth of textual resources on the Web, makes the analysis and integration of data in the enterprise increasingly inadequate. Thus, Semantic Technologies based on Natural Language Processing (NLP) applications are required. This study is a first approach to the development of a document-driven Decision Support System (DSS) based on NLP technology, within the theoretical framework of Maurice Gross's Lexicon-Grammar. Our research project has two main objectives: the first is to recognize and codify the innovative language with which companies express and describe their business, in order to standardize it and make it actionable by machines. The second is to use the information resulting from text analysis to support strategic decisions, since Text Mining can capture the meaning hidden in business documents. In the first chapter we examine the concept, characteristics, and different types of DSS (with particular reference to document-driven analysis) and the changes these systems have undergone with the development of the Web and, consequently, of information systems within companies. In the second chapter we give a brief review of Computational Linguistics, paying particular attention to its goals, resources, and applications. In the third chapter we provide a state-of-the-art review of Semantic Technology Enterprises (STEs) and their process of integration into the innovation market, analysing their diffusion, the types of technologies involved, and the main sectors in which they operate. In the fourth chapter we propose a model of linguistic support and analysis, in accordance with Lexicon-Grammar, in order to create an enriched solution for document-driven decision systems: we describe specific features of business language, derived from experimental research in the startup ecosystem. Finally, we recognize that formalizing all linguistic phenomena is extremely complex, but the results of our analysis encourage us to continue this line of research. Applying linguistic support to the business technology environment yields results that are more efficient and constantly updated, even under conditions of strong resistance to change. [edited by author]
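
    As a loose sketch of the dictionary-based extraction step that such a document-driven DSS relies on (the thesis's Lexicon-Grammar resources are far richer and are not reproduced here), the snippet below tags business documents with entries from a toy domain lexicon and aggregates the hits into a simple per-category signal.

```python
# Loose sketch of dictionary-based extraction for a document-driven DSS:
# tag documents with entries from a small domain lexicon and aggregate the
# hits into a per-category signal. Lexicon, documents, and categories are
# invented for illustration.
import re
from collections import Counter

lexicon = {  # surface form -> semantic category
    "seed funding": "FINANCE",
    "series a": "FINANCE",
    "churn rate": "RISK",
    "burn rate": "RISK",
    "customer acquisition": "GROWTH",
}

documents = [
    "After closing seed funding, the startup reduced its burn rate.",
    "Customer acquisition improved, but the churn rate is still high.",
    "The board is preparing a Series A round next quarter.",
]

pattern = re.compile("|".join(re.escape(term) for term in lexicon), re.IGNORECASE)

signal = Counter()
for doc in documents:
    for match in pattern.findall(doc):
        signal[lexicon[match.lower()]] += 1

# A decision maker (or a downstream DSS rule) can now read the aggregate signal.
for category, hits in signal.most_common():
    print(f"{category}: {hits} mention(s) across {len(documents)} documents")
```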

    A Systematic Review of Automated Query Reformulations in Source Code Search

    Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. They then execute the query with a search engine to find the exact locations within the software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trial and error during code search. Over the years, many studies have attempted to reformulate developers' ad hoc queries to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulation from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, the vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss best practices and future opportunities to advance the state of research in search query reformulation. Comment: 81 pages, accepted at TOSEM.
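
    The review proposes no single tool, but the term-weighting family of techniques it catalogues can be sketched briefly: expand an ad hoc query with the highest TF-IDF terms drawn from the top pseudo-relevant documents. The toy corpus, query, and cut-offs below are illustrative assumptions.

```python
# Minimal sketch of term-weighting-based query reformulation (one of the
# methodology families the review catalogues): expand an ad hoc query with
# the highest TF-IDF terms from the top pseudo-relevant documents.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [  # e.g., identifiers and comments extracted from source files
    "parse config file and validate yaml schema before loading settings",
    "render user profile page with avatar and session token refresh",
    "retry failed http request with exponential backoff and timeout handling",
    "load configuration defaults then merge environment variable overrides",
]
query = "config loading bug"

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform([query])

# Pseudo-relevance feedback: take the top-2 documents for the initial query.
top_docs = cosine_similarity(query_vec, doc_matrix).ravel().argsort()[::-1][:2]

# Pick the highest-weighted terms from those documents as expansion terms.
weights = np.asarray(doc_matrix[top_docs].sum(axis=0)).ravel()
terms = vectorizer.get_feature_names_out()
expansion = [terms[i] for i in weights.argsort()[::-1][:5]]

print("reformulated query:", query, "+", " ".join(expansion))
```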