Natural Language Processing and Cognitive Science: Proceedings 2018

Authors: Lubaszewski Wiesław; Sedes Florence; Sharp Bernadette
Publication venue: Jagiellonian Library
Publication date: 01/01/2018

Design of a Controlled Language for Critical Infrastructures Protection

Authors: Cantarella Simona; Ferigato Carlo; Owusu Evans Boateng
Publication venue: European Language Resources Association
Publication date: 28/03/2012

We describe a project to construct a controlled language for critical infrastructure protection (CIP). The project originates from the need to coordinate and categorize communications on CIP at the European level. These communications can take the form of official documents, incident reports, informal communications and plain e-mail. To achieve this goal, we explore applying traditional library science tools to the construction of controlled languages. Our starting point is analogous work done in the 1960s in the field of nuclear science, known as the Euratom Thesaurus.

JRC.G.6-Security technology assessment

JRC Publications Repository

Improving search engines with open Web-based SKOS vocabularies

Author: Martins Flávio Nuno Fernandes
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2012

Dissertation submitted for the degree of Master in Computer Science and Engineering.

The volume of digital information keeps growing, and even though organizations are making more of it available, users without the proper tools have great difficulty retrieving documents on subjects of interest. Good information retrieval mechanisms are crucial for meeting users' information needs. Search engines are now indispensable: they are an essential feature of document management systems. However, achieving good relevance is difficult, particularly in specialized technical domains where vocabulary mismatch can be harmful. Numerous studies have found that exploiting the lexical or semantic relations between terms in a collection mitigates this problem. In this dissertation, we aim to improve search results and user experience by investigating the use of potentially connected Web vocabularies in information retrieval engines. In the context of open Web-based SKOS vocabularies, we propose a query expansion framework implemented in a widely used IR system (Lucene/Solr) and evaluated on standard IR evaluation datasets. The components described in this thesis were applied in the development of a new search system that was integrated with a rapid application development tool during an internship at Quidgest S.A.

Funding: Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest

Repositório da Universidade Nova de Lisboa
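
The SKOS-based query expansion described in this abstract can be illustrated with a small sketch. It is not the thesis's Lucene/Solr framework: it assumes a hypothetical in-memory vocabulary in which each concept carries a preferred label, alternative labels and narrower concepts (mirroring skos:prefLabel, skos:altLabel and skos:narrower), and the names Concept, expand_query and to_lucene_query, as well as the weights 0.7 and 0.5, are illustrative choices. Parsing real SKOS RDF data is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a SKOS concept."""
    pref_label: str                                 # analogous to skos:prefLabel
    alt_labels: list = field(default_factory=list)  # analogous to skos:altLabel
    narrower: list = field(default_factory=list)    # ids of skos:narrower concepts


def expand_query(terms, vocab, alt_weight=0.7, narrower_weight=0.5):
    """Collect weighted expansion terms for each query term.

    Original terms keep weight 1.0; all labels of a matched concept get
    alt_weight; preferred labels of its narrower concepts get narrower_weight.
    If a term is reached several ways, the highest weight wins.
    """
    weights = {}

    def add(term, weight):
        key = term.lower()
        if weight > weights.get(key, 0.0):
            weights[key] = weight

    for term in terms:
        add(term, 1.0)
        for concept in vocab.values():
            labels = [concept.pref_label, *concept.alt_labels]
            if term.lower() not in {label.lower() for label in labels}:
                continue  # this concept does not match the query term
            for label in labels:
                add(label, alt_weight)
            for child_id in concept.narrower:
                child = vocab.get(child_id)
                if child is not None:
                    add(child.pref_label, narrower_weight)
    return weights


def to_lucene_query(weights):
    """Render the weights as a boosted OR query in Lucene classic query syntax."""
    parts = []
    for term, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        quoted = f'"{term}"' if " " in term else term  # multi-word labels become phrases
        parts.append(f"{quoted}^{weight:g}")
    return " OR ".join(parts)


vocab = {
    "c1": Concept("information retrieval", ["IR", "document retrieval"], ["c2"]),
    "c2": Concept("query expansion", ["query reformulation"]),
}
weights = expand_query(["IR"], vocab)
print(to_lucene_query(weights))
# prints: ir^1 OR "document retrieval"^0.7 OR "information retrieval"^0.7 OR "query expansion"^0.5
```

Keeping the maximum weight when a term is reached through several relations prevents an original query term from being down-weighted by its own synonym entry; how far to follow broader or narrower links, and with what weights, is exactly the kind of choice such a framework would need to evaluate on IR test collections.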

Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing

Author: Majewska Olga
Publication venue: University of Cambridge
Publication date: 01/02/2021

Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in the linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs. To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge that leverages native speakers' intuitions about verb meaning to support the development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external, manually curated information about verbs' lexical properties can support data-driven models in tasks where accurate verb processing is key.
Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.

Funding: ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909]

Apollo (Cambridge)

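Disambiguation of the English Verbs "open" and "send" within an Object-Oriented Approach (Polish title: Dezambiguizacja angielskich czasowników open i send w ramach ujęcia zorientowanego obiektowo)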

Author: Drzazga Anna
Publication venue: Uniwersytet Śląski, Katowice
Publication date: 01/01/2012

The subject of this doctoral dissertation is the disambiguation of two English causative verbs, open (Polish otworzyć/otwierać) and send (wysłać/wysyłać), within a project to build electronic morphological, syntactic and lexical databases used in creating modifié-modifieur-type electronic dictionaries of general language as well as of specialized languages. The selected verbs were disambiguated and analysed using Wiesław Banyś's object-oriented model, whose parameters allow each lexical unit to be described precisely, completely and in line with the requirements of machine translation. The key concept of the adopted lexicographic method is the object class, which contains elements identified on the basis of the attributes and operators specific to that class; these make it possible to reveal the polysemy of predicates and to distinguish their individual uses. Using the object-oriented model, the set of uses of the analysed verbs is established from a corpus, taking traditional dictionaries into account; the occurrences found are then grouped into sets sharing syntactic, semantic and lexical features, each set of uses is assigned a translation in the target language, and the conclusions of the analysis are recorded both descriptively and in tables. The point of view presented in this dissertation implies that a word in the source language has as many meanings as it has translations in the target language.

Repozytorium Uniwersytetu Śląskiego RE-BUŚ
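
The grouping-and-translation procedure, together with the "as many meanings as translations" principle, can be illustrated with a toy sketch. It is not Banyś's object-oriented model: object classes, attributes and operators are reduced here to a single hypothetical class label per occurrence, and the example sentences, frame labels, class names and Polish equivalents are invented for illustration. The names Occurrence, group_uses and count_senses are likewise hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Occurrence(NamedTuple):
    sentence: str
    frame: str         # hypothetical syntactic frame label
    object_class: str  # hypothetical object class of the direct object


def group_uses(occurrences):
    """Group corpus occurrences into sets of uses sharing frame and object class."""
    groups = defaultdict(list)
    for occ in occurrences:
        groups[(occ.frame, occ.object_class)].append(occ.sentence)
    return dict(groups)


def count_senses(groups, translation_of):
    """Count senses as the number of distinct target-language translations
    assigned to the groups ("as many meanings as translations")."""
    return len({translation_of[key] for key in groups})


corpus = [
    Occurrence("She opened the door.", "V + NP", "<doors/windows>"),
    Occurrence("He opened the window.", "V + NP", "<doors/windows>"),
    Occurrence("He opened the bottle.", "V + NP", "<containers>"),
    Occurrence("I opened a bank account.", "V + NP", "<accounts>"),
]
# Hypothetical translation assignment: one Polish equivalent per set of uses.
translation_of = {
    ("V + NP", "<doors/windows>"): "otworzyć",
    ("V + NP", "<containers>"): "otworzyć",
    ("V + NP", "<accounts>"): "założyć",
}
groups = group_uses(corpus)
print(len(groups), count_senses(groups, translation_of))  # prints: 3 2
```

In this toy run the corpus yields three sets of uses, but two of them share the translation otworzyć, so under the translation-based principle they count as a single meaning, giving two senses in total.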

Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages

Publication venue: Croatian Language Technologies Society, Faculty of Humanities and Social Science
Publication date: 01/01/2008

The Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages contain 22 papers presented at the conference held in Dubrovnik, Croatia, on 25-28 September 2008.

Repozitorij Filozofskog fakulteta u Zagrebu (University of Zagreb)

Action Categorisation in Multimodal Instructions

Authors: Redeker Gisela; van der Sluis Ielka; Vergeer Renate
Publication date: 07/05/2018

Dissertations of the University of Groningen

Use of Web mining for an actualized and coherent chatterbot dialogue

Authors: Dosquet Benjamin; Magnant Xavier
Publication date: 01/01/2004

Repository of the University of Namur

A Survey on Semantic Processing Techniques

Authors: Cambria Erik; Chen Guanyi; He Kai; Mao Rui; Ni Jinjie; Yang Zonglin; Zhang Xulang
Publication date: 22/10/2023

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, research progress in this domain appears to be slowing. However, the study of semantics in linguistics is multi-dimensional, and new technologies could substantially deepen and broaden research in computational semantic processing. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study the relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.

Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing from the published version due to publication policies. Please contact Prof. Erik Cambria for details.

arXiv.org e-Print Archive