371 research outputs found
Predicate Matrix: an interoperable lexical knowledge base for predicates
183 p.La Matriz de Predicados (Predicate Matrix en inglés) es un nuevo recurso léxico-semántico resultado de la integración de múltiples fuentes de conocimiento, entre las cuales se encuentran FrameNet, VerbNet, PropBank y WordNet. La Matriz de Predicados proporciona un léxico extenso y robusto que permite mejorar la interoperabilidad entre los recursos semánticos mencionados anteriormente. La creación de la Matriz de Predicados se basa en la integración de Semlink y nuevos mappings obtenidos utilizando métodos automáticos que enlazan el conocimiento semántico a nivel léxico y de roles. Asimismo, hemos ampliado la Predicate Matrix para cubrir los predicados nominales (inglés, español) y predicados en otros idiomas (castellano, catalán y vasco). Como resultado, la Matriz de predicados proporciona un léxico multilingüe que permite el análisis semántico interoperable en múltiples idiomas
Workshop Proceedings of the 12th edition of the KONVENS conference
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars designed tools to perform the automatic scansion of poetry in many languages, but none of these tools
deal with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We
implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon
and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript, and
we used the resulting corpus as labeled dataset to train the model. The evaluation of the performance of the algorithm
reached a 97% for the accuracy and a 99% of weighted average for precision, recall and F1 Score. In addition, we tested
the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that
the model predicted almost all Old Saxon metrical patterns correctly misclassified the majority of the Old English input
verses
The automatic processing of multiword expressions in Irish
It is well-documented that Multiword Expressions (MWEs) pose a unique challenge
to a variety of NLP tasks such as machine translation, parsing, information retrieval,
and more. For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data, and a lack of research in this topic. In order to
improve handling of MWEs in various NLP tasks for Irish, this thesis will address
both the lack of resources specifically targeting MWEs in Irish, and examine how
these resources can be applied to said NLP tasks.
We report on the creation and analysis of a number of lexical resources as part
of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created through extract-
ing MWEs from other lexical resources such as dictionaries. A corpus annotated
with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME
Shared Task 1.2. Additionally, MWEs were tagged in a bilingual EN-GA corpus
for inclusion in experiments in machine translation. For the purposes of annotation, a categorisation scheme for nine categories of MWEs in Irish is created, based
on combining linguistic analysis on these types of constructions and cross-lingual
frameworks for defining MWEs.
A case study in applying MWEs to NLP tasks is undertaken, with the exploration of incorporating MWE information while training Neural Machine Translation
systems. Finally, the topic of automatic identification of Irish MWEs is explored,
documenting the training of a system capable of automatically identifying Irish
MWEs from a variety of categories, and the challenges associated with developing
such a system.
This research contributes towards a greater understanding of Irish MWEs and
their applications in NLP, and provides a foundation for future work in exploring
other methods for the automatic discovery and identification of Irish MWEs, and
further developing the MWE resources described above
Geographic information extraction from texts
A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although large progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss the recent advances, new ideas, and concepts but also identify research gaps in geographic information extraction
Predictive embodied concepts: an exploration of higher cognition within the predictive processing paradigm
Predictive processing, an increasingly popular paradigm in cognitive sciences, has
focused primarily on giving accounts of perception, motor control and a host of
psychological phenomena, including consciousness. But higher cognitive processes,
like conceptual thought, language, and logic, have received only limited attention to
date and PP still stands disconnected from a huge body of research in those areas.
In this thesis, I aim to address this gap and I attempt to go some way towards
developing and defending a cognitive-computational approach to higher cognition
within the predictive processing paradigm. To test its explanatory potential, I apply it
to a range of linguistic and conceptual phenomena. I proceed in three steps. First, I
lay out an account of concepts and suggest how concepts are represented, how they
can be context-sensitively processed, and how the apparent diversity of formats
arise. Secondly, I propose how paradigmatic higher cognitive competencies, like
language and logical reasoning, could fit into the PP picture. Thirdly, I apply the PP
account of concepts and language to a range of linguistic-conceptual phenomena as
test cases, namely: metaphor, the semantic paradox (specifically the Liar Paradox)
and copredication. Finally, I discuss some challenges and objections to the PP
framework as applied to higher cognition and in general
Modern trends in digital transformation of marketing & management
The monograph examines the current trends in the development of digital technologies in marketing, management and business administration. The prospects for the development of digital technologies in various sectors of the economy of Ukraine and the trends of the influence of digital technologies on global shifts in the systems of marketing management and business administration are determined. The transformations of business models in the conditions of the digital economy are analyzed, the impact of blockchain technologies on the development of promising areas of the marketing management system and business administration is analyzed. Reasonable impact of digital technologies on the transformation of management systems in social, public, legal and administrative spheres and various sectors of the economy. The contours of the formation of the digital economy in the sectors of economic activity and the social sphere have been developed
Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023
Jornadas Nacionales de Investigación en Ciberseguridad (8ª. 2023. Vigo)atlanTTicAMTEGA: Axencia para a modernización tecnolóxica de GaliciaINCIBE: Instituto Nacional de Cibersegurida
Translation Alignment Applied to Historical Languages: methods, evaluation, applications, and visualization
Translation alignment is an essential task in Digital Humanities and Natural
Language Processing, and it aims to link words/phrases in the source
text with their translation equivalents in the translation. In addition to
its importance in teaching and learning historical languages, translation
alignment builds bridges between ancient and modern languages through
which various linguistics annotations can be transferred. This thesis focuses
on word-level translation alignment applied to historical languages in general
and Ancient Greek and Latin in particular. As the title indicates, the thesis
addresses four interdisciplinary aspects of translation alignment.
The starting point was developing Ugarit, an interactive annotation tool
to perform manual alignment aiming to gather training data to train an
automatic alignment model. This effort resulted in more than 190k accurate
translation pairs that I used for supervised training later. Ugarit has been
used by many researchers and scholars also in the classroom at several
institutions for teaching and learning ancient languages, which resulted
in a large, diverse crowd-sourced aligned parallel corpus allowing us to
conduct experiments and qualitative analysis to detect recurring patterns in
annotators’ alignment practice and the generated translation pairs.
Further, I employed the recent advances in NLP and language modeling to
develop an automatic alignment model for historical low-resourced languages,
experimenting with various training objectives and proposing a training
strategy for historical languages that combines supervised and unsupervised
training with mono- and multilingual texts. Then, I integrated this alignment
model into other development workflows to project cross-lingual annotations
and induce bilingual dictionaries from parallel corpora.
Evaluation is essential to assess the quality of any model. To ensure employing the best practice, I reviewed the current evaluation procedure, defined
its limitations, and proposed two new evaluation metrics. Moreover, I introduced a visual analytics framework to explore and inspect alignment gold
standard datasets and support quantitative and qualitative evaluation of
translation alignment models. Besides, I designed and implemented visual
analytics tools and reading environments for parallel texts and proposed
various visualization approaches to support different alignment-related tasks
employing the latest advances in information visualization and best practice.
Overall, this thesis presents a comprehensive study that includes manual and
automatic alignment techniques, evaluation methods and visual analytics
tools that aim to advance the field of translation alignment for historical
languages
- …