
    Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation

    Accessing up-to-date, high-quality scientific literature is a critical preliminary step in any research activity. Identifying the scholarly literature relevant to a given task or application is, however, a complex and time-consuming activity. Despite the large number of tools developed over the years to support scholars in surveying the literature, such as Google Scholar, Microsoft Academic Search, and others, the best way to access quality papers remains asking a domain expert who is actively involved in the field and knows its research trends and directions. State-of-the-art systems, in fact, either do not support exploratory search, such as identifying the active research directions within a given topic, or do not offer proactive features, such as content recommendation, both of which are critical to researchers. To overcome these limitations, we strongly advocate a paradigm shift in the development of scholarly data access tools: moving from traditional information retrieval and filtering tools towards automated agents able to make sense of the textual content of published papers and thereby monitor the state of the art. Building such a system is, however, a complex task that entails tackling non-trivial problems in the fields of Natural Language Processing, Big Data Analysis, User Modelling, and Information Filtering. In this work, we introduce the concept of an Automatic Curator System and present its fundamental components. (PhD thesis, Dottorato di ricerca in Informatica; De Nart, Dario)
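
    To make the envisioned components concrete, here is a minimal sketch of how an automatic curator's pipeline could be wired together, covering NLP-based topic extraction, user modelling, and information filtering. All names here (Paper, TopicExtractor, UserModel, recommend) are hypothetical illustrations, not the thesis's actual design.

```python
# Hypothetical skeleton of an automatic-curator pipeline; not the thesis's API.
from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    abstract: str
    topics: set[str] = field(default_factory=set)

class TopicExtractor:
    """NLP component: derive coarse topics from a paper's text (toy keyword match)."""
    def __init__(self, vocabulary: set[str]):
        self.vocabulary = vocabulary

    def extract(self, paper: Paper) -> set[str]:
        return set(paper.abstract.lower().split()) & self.vocabulary

@dataclass
class UserModel:
    """User-modelling component: interests accumulated from papers the user read."""
    interests: set[str] = field(default_factory=set)

    def update(self, paper: Paper) -> None:
        self.interests |= paper.topics

def recommend(candidates: list[Paper], user: UserModel, k: int = 5) -> list[Paper]:
    """Information-filtering component: rank papers by topic overlap with the user."""
    return sorted(candidates, key=lambda p: len(p.topics & user.interests),
                  reverse=True)[:k]
```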

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics, and the depth and breadth of computational semantic processing research can be greatly improved with new technologies. In this survey, we analyze five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing in the published version due to the publication policies; please contact Prof. Erik Cambria for details.
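
    As a concrete illustration of one of the five surveyed tasks, the sketch below runs named entity recognition with spaCy, a common off-the-shelf pipeline. The survey covers many methods; this snippet only assumes the en_core_web_sm model is installed.

```python
# Named entity recognition, one of the five surveyed semantic processing tasks.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Erik Cambria published the survey in Information Fusion in 2024.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Erik Cambria PERSON", "2024 DATE"
```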

    Representing relational knowledge with language models

    Relational knowledge is the ability to recognize the relationship between instances, and it plays an important role in how humans understand concepts and perform commonsense reasoning. We humans structure our knowledge by understanding individual instances together with the relationships among them, which enables us to expand that knowledge further. Nevertheless, modelling relational knowledge with computational models is a long-standing challenge in Natural Language Processing (NLP). The main difficulty in acquiring relational knowledge arises from the need for generalization capability. For pre-trained Language Models (LMs), despite their huge impact on NLP, relational knowledge remains understudied. In fact, GPT-3 (Brown et al., 2020), one of the largest LMs at the time with 175 billion parameters, was shown to perform worse than a traditional statistical baseline on an analogy benchmark. Our initial results hinted at the type of relational knowledge encoded in some LMs. However, we found that such knowledge can hardly be extracted even with a carefully designed method tuned on a task-specific validation set. Motivated by this finding, we propose a method (RelBERT) for distilling relational knowledge via LM fine-tuning. This method successfully retrieves flexible relation embeddings that achieve state-of-the-art (SoTA) results on various analogy benchmarks. Moreover, it exhibits high generalization ability, handling relation types that are not included in the training data. Finally, we propose a new task of modelling graded relations between named entities, which reveals some limitations of recent SoTA LMs as well as RelBERT, suggesting future research directions for modelling relational knowledge in the current LM era, especially when it comes to named entities.
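
    A minimal sketch of the relation-embedding idea follows: verbalize a word pair with a prompt, encode it, and solve an analogy by cosine similarity between pair embeddings. The prompt template and the use of an off-the-shelf roberta-base encoder with mean pooling are assumptions for illustration; without RelBERT's fine-tuning the scores are not expected to be reliable.

```python
# Sketch of relation embeddings for analogy solving, in the spirit of RelBERT.
# Prompt template and plain roberta-base encoder are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()

def relation_embedding(head: str, tail: str) -> torch.Tensor:
    # Verbalize the word pair with a prompt and mean-pool the encoder states.
    inputs = tokenizer(f"The relation between {head} and {tail}",
                       return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# Solve "paris : france :: tokyo : ?" by picking the candidate pair whose
# relation embedding is closest to the query pair's embedding.
query = relation_embedding("paris", "france")
candidates = {"japan": ("tokyo", "japan"), "china": ("tokyo", "china")}
scores = {ans: torch.cosine_similarity(query, relation_embedding(*pair), dim=0).item()
          for ans, pair in candidates.items()}
print(max(scores, key=scores.get))
```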

    Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

    Peer reviewed

    The Future of Information Sciences: INFuture2009: Digital Resources and Knowledge Sharing


    One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

    When learning a new skill, you take advantage of your pre-existing skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play the cello. Similarly, when learning a new language you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the lexical overlap between these two languages will likely benefit your rate of language acquisition. This thesis deals with the intersection of learning multiple tasks and learning multiple languages in the context of Natural Language Processing (NLP), which can be defined as the study of computational processing of human language. Although these two types of learning may seem different on the surface, we will see that they share many similarities. The traditional approach in NLP is to consider a single task for a single language at a time. However, recent advances allow this approach to be broadened by considering data for multiple tasks and languages simultaneously. This is an important approach to explore further, as the key to improving the reliability of NLP, especially for low-resource languages, is to take advantage of all relevant data whenever possible. In doing so, the hope is that in the long term, low-resource languages can benefit from the advances made in NLP, which are currently to a large extent reserved for high-resource languages. This, in turn, may have positive consequences for, e.g., language preservation, as speakers of minority languages will be under less pressure to use high-resource languages. In the short term, answering the specific research questions posed should be of use to NLP researchers working towards the same goal. Comment: PhD thesis, University of Groningen
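
    To ground the multitask idea, below is a minimal sketch of hard parameter sharing: a single shared encoder with one lightweight classification head per lexical-analysis task. The task names, label counts, and architecture are illustrative assumptions, not the thesis's exact models.

```python
# Minimal hard-parameter-sharing multitask tagger (illustrative sizes/tasks).
import torch
import torch.nn as nn

class MultitaskTagger(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int,
                 task_sizes: dict[str, int]):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared BiLSTM encoder: every task (and language) updates these weights.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # One lightweight head per task, e.g. POS tags and morphological features.
        self.heads = nn.ModuleDict({
            task: nn.Linear(2 * hidden_dim, n_labels)
            for task, n_labels in task_sizes.items()
        })

    def forward(self, token_ids: torch.Tensor, task: str) -> torch.Tensor:
        states, _ = self.encoder(self.embed(token_ids))
        return self.heads[task](states)  # per-token label scores for `task`

model = MultitaskTagger(vocab_size=10_000, embed_dim=64, hidden_dim=128,
                        task_sizes={"pos": 17, "morph": 40})
logits = model(torch.randint(0, 10_000, (2, 12)), task="pos")  # (2, 12, 17)
```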

    Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation

    With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human's reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Search technologies therefore need to handle content written in multiple languages, which requires techniques that account for linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is the special case of IR in which the search takes place over a multilingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (producing fluent output that retains the original meaning), which helps people understand what is written regardless of the source language. Putting these together, we observe that search and translation technologies are part of an important user application, calling for a better integration of search (IR) and translation (MT), since these two technologies need to work together to produce high-quality output. In this dissertation, the main goal is to build better connections between IR and MT, for which we present solutions to two problems: Searching to translate explores approximate search techniques for extracting bilingual data from multilingual Wikipedia collections to train better translation models; Translating to search explores the integration of a modern statistical MT system into cross-language search processes. In both cases, our best-performing approach yielded improvements over strong baselines for a variety of language pairs. Finally, we propose a general architecture in which various components of IR and MT systems can be connected into a feedback loop, with potential improvements to both search and translation tasks. We hope that the ideas presented in this dissertation will spur more interest in the integration of search and translation technologies.
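
    The "translating to search" direction can be sketched as query translation followed by monolingual retrieval. The snippet below uses BM25 via the rank-bm25 package, with a toy translate stub standing in for the dissertation's statistical MT system; the documents and the lookup table are made up for illustration.

```python
# Query-translation CLIR sketch: translate the query, then retrieve with BM25.
# Assumes: pip install rank-bm25
from rank_bm25 import BM25Okapi

documents = [
    "machine translation converts text between languages",
    "information retrieval finds documents relevant to a query",
    "cross-language retrieval searches multilingual collections",
]
bm25 = BM25Okapi([doc.split() for doc in documents])

def translate(query: str, source_lang: str, target_lang: str) -> str:
    # Placeholder: a real system would call an MT model here.
    lookup = {"recherche d'information": "information retrieval"}
    return lookup.get(query, query)

query = translate("recherche d'information", "fr", "en")
scores = bm25.get_scores(query.split())
best = max(range(len(documents)), key=lambda i: scores[i])
print(documents[best])  # highest-scoring English document for the French query
```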

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.
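
    As a toy illustration of the core entity-ranking task from Part I (given a textual query, return a ranked list of entities), the sketch below scores entities against a keyword query with TF-IDF over their textual descriptions. Real systems combine many fields and signals; the entities and descriptions here are invented for illustration.

```python
# Toy entity ranking: TF-IDF similarity between a query and entity descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

entities = {
    "Ada Lovelace": "English mathematician, regarded as the first programmer",
    "Alan Turing": "British mathematician, pioneer of computer science",
    "Marie Curie": "Polish-French physicist, pioneer of radioactivity research",
}
names, descriptions = zip(*entities.items())

vectorizer = TfidfVectorizer()
entity_matrix = vectorizer.fit_transform(descriptions)
query_vector = vectorizer.transform(["pioneer of computer science"])

scores = cosine_similarity(query_vector, entity_matrix).ravel()
for name, score in sorted(zip(names, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {name}")  # ranked list of entities for the query
```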

    Applied Deep Learning: Case Studies in Computer Vision and Natural Language Processing

    Deep learning has proved successful for many computer vision and natural language processing applications. In this dissertation, three studies were conducted to show the efficacy of deep learning models for computer vision and natural language processing. In the first study, an efficient deep learning model was proposed for seagrass scar detection in multispectral images, producing robust and accurate scar mappings. In the second study, an arithmetic deep learning model was developed to fuse multispectral images collected at different times and at different resolutions into high-resolution images for downstream tasks including change detection, object detection, and land cover classification. In addition, a super-resolution deep model was implemented to further enhance remote sensing images. In the third study, a deep learning-based framework was proposed for fact-checking on social media to spot fake scientific news. The framework leverages deep learning, information retrieval, and natural language processing techniques to retrieve pertinent scholarly papers for a given piece of scientific news and to evaluate the news's credibility.
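
    The retrieval step of the third study's fact-checking framework can be sketched as matching a news claim against paper abstracts. The Jaccard-overlap scorer below is a deliberately simple stand-in for the framework's deep IR components, and all texts are invented for illustration.

```python
# Toy claim-to-paper retrieval: rank abstracts by token overlap with the claim.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

claim = "new study shows vaccine X prevents infection entirely"
abstracts = [
    "randomized trial shows vaccine X reduces infection rates",
    "deep learning improves land cover classification from satellite imagery",
]
ranked = sorted(abstracts, key=lambda ab: jaccard(claim, ab), reverse=True)
print(ranked[0])  # most pertinent paper for the claim, to be fact-checked next
```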