
    PersoNER: Persian named-entity recognition

    Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the resulting difficulty of training an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
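
    A minimal sketch of what a sequential tagger over word embeddings can look like. The paper describes a max-margin classifier; as a stand-in, this sketch uses a structured perceptron with Viterbi decoding, and the tag set, toy sentence, and random embeddings are illustrative assumptions only, not the paper's data or model.

    # Structured-perceptron sketch of a sequential NER tagger over word embeddings.
    # Stand-in for the max-margin classifier described above; all data is toy data.
    import numpy as np

    TAGS = ["O", "B-PER", "I-PER"]   # toy BIO tag set (assumption)
    DIM = 8                          # toy embedding dimensionality

    rng = np.random.default_rng(0)
    emb = {w: rng.normal(size=DIM) for w in ["ali", "met", "sara", "today"]}

    W = np.zeros((len(TAGS), DIM))        # emission weights: one vector per tag
    T = np.zeros((len(TAGS), len(TAGS)))  # transition scores between tags

    def viterbi(words):
        """Return the highest-scoring tag sequence under the current weights."""
        E = np.array([[W[t] @ emb[w] for t in range(len(TAGS))] for w in words])
        back = np.zeros((len(words), len(TAGS)), dtype=int)
        score = E[0].copy()
        for i in range(1, len(words)):
            cand = score[:, None] + T + E[i]   # prev_tag x cur_tag scores
            back[i] = cand.argmax(axis=0)
            score = cand.max(axis=0)
        tags = [int(score.argmax())]
        for i in range(len(words) - 1, 0, -1):
            tags.append(int(back[i][tags[-1]]))
        return tags[::-1]

    def train(sentences, epochs=5):
        """Perceptron updates toward the gold tag sequence."""
        for _ in range(epochs):
            for words, gold in sentences:
                pred = viterbi(words)
                for i in range(len(words)):
                    if pred[i] != gold[i]:
                        W[gold[i]] += emb[words[i]]
                        W[pred[i]] -= emb[words[i]]
                    if i > 0 and (pred[i - 1], pred[i]) != (gold[i - 1], gold[i]):
                        T[gold[i - 1], gold[i]] += 1
                        T[pred[i - 1], pred[i]] -= 1

    data = [(["ali", "met", "sara", "today"], [1, 0, 1, 0])]  # B-PER O B-PER O
    train(data)
    print([TAGS[t] for t in viterbi(["ali", "met", "sara", "today"])])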

    Learning Linear Transformations between Counting-based and Prediction-based Word Embeddings

    Despite the growing interest in prediction-based word embedding learning methods, it remains unclear how the vector spaces learnt by the prediction-based methods differ from those of the counting-based methods, or whether one can be transformed into the other. To study the relationship between counting-based and prediction-based embeddings, we propose a method for learning a linear transformation between two given sets of word embeddings. Our proposal contributes to word embedding learning research in three ways: (a) we propose an efficient method to learn a linear transformation between two sets of word embeddings, (b) using the transformation learnt in (a), we empirically show that it is possible to predict distributed word embeddings for novel unseen words, and (c) we show empirically that it is possible to linearly transform counting-based embeddings to prediction-based embeddings, for frequent words, different POS categories, and varying degrees of ambiguity.
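
    An illustrative sketch of the core idea: fit a linear map from counting-based vectors to prediction-based vectors on a shared vocabulary, then apply it to words that lack a prediction-based embedding. Ordinary least squares, the toy dimensionalities, and the synthetic matrices are assumptions here; the paper's exact objective and optimiser may differ.

    # Learn a linear map W between two embedding spaces aligned by vocabulary.
    import numpy as np

    rng = np.random.default_rng(0)
    n_shared, d_count, d_pred = 1000, 300, 100

    # Toy stand-ins for the two embedding spaces (rows aligned by word).
    C = rng.normal(size=(n_shared, d_count))                       # counting-based
    true_map = rng.normal(size=(d_count, d_pred))
    P = C @ true_map + 0.01 * rng.normal(size=(n_shared, d_pred))  # prediction-based

    # Solve min_W ||C W - P||_F^2 (one least-squares problem per output column).
    W, *_ = np.linalg.lstsq(C, P, rcond=None)

    # Predict a prediction-based embedding for a "novel" word from its counting vector.
    c_novel = rng.normal(size=d_count)
    p_hat = c_novel @ W

    print("map shape:", W.shape,
          "relative reconstruction error:",
          float(np.linalg.norm(C @ W - P) / np.linalg.norm(P)))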

    DeepOnto: A Python Package for Ontology Engineering with Deep Learning

    Applying deep learning techniques, particularly language models (LMs), in ontology engineering has raised widespread attention. However, deep learning frameworks like PyTorch and Tensorflow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).Comment: under review at Semantic Web Journa
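
    A minimal usage sketch, assuming DeepOnto's documented entry point of loading an ontology through its OWL API-backed Ontology class. The file path is a placeholder, and anything beyond the import and constructor is an assumption; consult the package documentation for the exact interface.

    # Sketch only: assumes the deeponto package is installed and a JVM is
    # available, since the underlying OWL API is Java-based.
    from deeponto.onto import Ontology

    # Load a local OWL file (placeholder path, not a real resource).
    onto = Ontology("path/to/ontology.owl")

    # Downstream tools (alignment, completion, verbalisation, ...) are built on
    # top of this object together with pre-trained language models.
    print(onto)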

    Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019

    One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web", and described in its report, is that of a "Public FAIR Knowledge Graph of Everything": "We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of 'everything', ranging from common sense concepts to location-based entities. This knowledge graph should be 'open to the public' in a FAIR manner, democratizing this mass amount of knowledge." Although linked open data (LOD) is only one knowledge graph, it is the closest realisation (and probably the only one) of a public FAIR Knowledge Graph (KG) of everything. Certainly, LOD provides a unique testbed for experimenting with and evaluating research hypotheses on open and FAIR KGs. One of the most neglected FAIR issues concerning KGs is their ongoing evolution and long-term preservation. We want to investigate this problem, that is, to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports on a collaborative effort performed by nine teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective on the problem of knowledge graph evolution, substantiated by a set of research questions as the main subject of their investigation. In addition, they provide their working definition for KG preservation and evolution.

    Coherence in Machine Translation

    Coherence ensures that individual sentences work together to form a meaningful document. When properly translated, a coherent document in one language should result in a coherent document in the other language. In machine translation, however, for reasons of modelling and computational complexity, sentences are pieced together from words or phrases based on short context windows and with no access to extra-sentential context. In this thesis I propose ways to automatically assess the coherence of machine translation output. The work is structured around three dimensions: entity-based coherence, coherence as evidenced via syntactic patterns, and coherence as evidenced via discourse relations. For the first time, I evaluate existing monolingual coherence models on this new task, identifying issues and challenges that are specific to the machine translation setting. To address these issues, I adapt a state-of-the-art syntax model, which also results in improved performance on the monolingual task. The results clearly indicate how much more difficult the new task is than the task of detecting shuffled texts. I propose a new coherence model that explores the crosslingual transfer of discourse relations in machine translation. This model is novel in that it measures the correctness of a discourse relation by comparison to the source text rather than to a reference translation. I identify patterns of incoherence common across different language pairs, and create a corpus of machine-translated output annotated with coherence errors for evaluation purposes. I then examine lexical coherence in a multilingual context, as a preliminary study for crosslingual transfer. Finally, I determine how the new and adapted models correlate with human judgements of translation quality, and suggest that general evaluation within machine translation would benefit from a coherence component that evaluates the translation output with respect to the source text.
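
    A generic sketch of entity-based coherence in the entity-grid style, offered only to illustrate the first of the three dimensions above; the thesis's actual models and features are not reproduced here, and the entities, grammatical roles, and toy document are illustrative assumptions.

    # Build an entity grid (sentences x entities, cells = grammatical role) and
    # turn role transitions between adjacent sentences into a feature vector.
    from collections import Counter
    from itertools import product

    ROLES = ["S", "O", "X", "-"]  # subject, object, other, absent

    # Toy document: per sentence, the role of each mentioned entity.
    doc = [
        {"translation": "S", "coherence": "O"},
        {"translation": "S"},
        {"coherence": "S", "document": "O"},
    ]

    entities = sorted({e for sent in doc for e in sent})
    grid = [[sent.get(e, "-") for e in entities] for sent in doc]

    # Count each entity's role transition between adjacent sentences; the
    # normalised distribution over transitions is the coherence feature vector.
    transitions = Counter()
    for prev, cur in zip(grid, grid[1:]):
        for a, b in zip(prev, cur):
            transitions[(a, b)] += 1

    total = sum(transitions.values())
    features = {t: transitions.get(t, 0) / total for t in product(ROLES, repeat=2)}
    print({t: round(v, 2) for t, v in features.items() if v > 0})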