214 research outputs found
Language technologies for an eLearning scenario
One of the problems with eLearning platforms that collate documents from different sources is the retrieval and accessibility of those documents. Providing documents with additional metadata through Language Technologies enables users to access information more effectively. In this paper we present an overview of the objectives and results of the LT4eL Project, which aims to provide Language Technologies to eLearning platforms and to integrate semantic knowledge to facilitate the management, distribution and retrieval of learning material. Peer-reviewed.
Approaches towards a Lexical Web: the role of Interoperability
After highlighting some of the major dimensions that are relevant for Language Resources (LR) and contribute to their infrastructural role, I underline some priority areas of concern today with respect to implementing an open Language Infrastructure, and specifically what we could call a "Lexical Web". My objective is to show that it is imperative to define an underlying global strategy behind the set of initiatives which are or can be launched in Europe and world-wide, and that an all-embracing vision and cooperation among different communities are necessary to achieve more coherent and useful results. I end by mentioning two new European initiatives that work in this direction and promise to be influential in shaping the future of the LR area.
Romanian Language Technology — a view from an academic perspective
The article reports on research and development pursued by the Research Institute for Artificial Intelligence "Mihai Draganescu" of the Romanian Academy in order to narrow the gaps identified by the in-depth analysis of European languages presented in the Meta-Net white papers, published by Springer in 2012. Except for English, all European languages needed significant research and development in order to reach an adequate technological level, in line with the expectations and requirements of the knowledge society.
YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 80 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology, the integration of the spatio-temporal dimension, and our knowledge representation SPOTL, an extension of the original SPO-triple model to time and space.
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised representation learning as well as semi-supervised learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16 languages and their aligned oral interpretations into 5 other languages totaling 5.1K hours. We provide speech recognition baselines and validate the versatility of VoxPopuli unlabelled data in semi-supervised learning under challenging out-of-domain settings. We will release the corpus at https://github.com/facebookresearch/voxpopuli under an open license. Comment: Accepted to ACL 2021 (long paper)
Computational Etymology: Word Formation and Origins
While there are over seven thousand languages in the world, substantial language technologies exist only for a small percentage of these. The large majority of world languages do not have enough bilingual or even monolingual data for developing technologies like machine translation using current approaches. The computational study and modeling of word origins and word formation is a key step in developing comprehensive translation dictionaries for low-resource languages. This dissertation presents novel foundational work in computational etymology, a promising field which this work is pioneering. The dissertation also includes novel models of core vocabulary, dictionary information distillation, and of the diverse linguistic processes of word formation and concept realization between languages, including compounding, derivation, sense-extension, borrowing, and historical cognate relationships, utilizing statistical and neural models trained on the unprecedented scale of thousands of languages. Collectively these are important components in tackling the grand challenges of universal translation, endangered language documentation and revitalization, and supporting technologies for speakers of thousands of underserved languages
MSPGI : a geoportal feasibility study - Planning Authority MSP geoportal MSP Implementation Initiative
Directive 2014/89/EU calls for Member States to apply Maritime Spatial Planning (MSP) in their marine waters. In applying this framework, Member States are required to adopt a process to analyse and organise human activities to achieve ecological, economic and social objectives. The preparation of an MSP plan is the key deliverable expected from Member States; in preparing it, they are expected to organise the use of the best available data and to decide how to share the information necessary for MSP plans. The availability of information for stakeholders can also contribute towards effective co-ordination at a national level, particularly in regulating different maritime sectors. EASME/EMFF/2015/1.2.1.3/02/SI2.742101. Peer-reviewed.
Multidimensional opinion mining from social data
The popularity and importance of social media are on the increase, as people use it for various types of social interaction across multiple channels. This thesis focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm, and irony, from user-generated content represented across multiple social media platforms and in various media formats, like textual, visual, and audio. Mining people’s social opinions from social sources, such as social media platforms and newswire commenting sections, is a valuable business asset that can be utilised in many ways and in multiple domains, such as Politics, Finance, and Government. The main objective of this research is to investigate how a multidimensional approach to Social Opinion Mining affects fine-grained opinion search and summarisation at an aspect-based level, and whether such a multidimensional approach outperforms single-dimension approaches in the context of an extrinsic human evaluation conducted in a real-world context: the Malta Government Budget, where five social opinion dimensions are taken into consideration, namely subjectivity, sentiment polarity, emotion, irony, and sarcasm. This human evaluation determines whether the multidimensional opinion summarisation results provide added value to potential end-users, such as policy-makers and decision-takers, thereby providing a nuanced voice to the general public on their social opinions on topics of national importance. Results obtained indicate that a more fine-grained aspect-based opinion summary based on the combined dimensions of subjectivity, sentiment polarity, emotion, and sarcasm or irony is more informative and more useful than one based on sentiment polarity only. This research contributes towards the advancement of intelligent search and information retrieval from social data and impacts entities utilising Social Opinion Mining results towards effective policy formulation, policy-making, decision-making, and decision-taking at a strategic level.
Blogging the hyperlocal : the disruption and renegotiation of hegemony in Malta
This thesis examines how blogging is being deployed to disrupt institutional hegemony in Malta. The island state is an example of a hyperlocal context that includes strong political, ecclesiastical and media institutions, advanced take-up of social technologies and a popular culture adjusting to the promise of modernity represented by EU membership. Popular discourse is dominated by political partisanship and advocacy journalism, with Malta being the only European country that permits political parties to directly own broadcasting stations. The primary evidence in this study is derived from an analysis of online texts during an organic crisis that eventually led to a national referendum to consider the introduction of divorce legislation in Malta. Using netnography supplemented by critical discourse analysis, the research identifies a set of strategies bloggers used to resist, challenge and disrupt the discourse of a hegemonic alliance that included the ruling political party, the Roman Catholic Church and their media. The empirical results indicate that blogging in Malta is contributing to the erosion of the Church’s hegemony. Subjects that were previously marginalised as alternative are increasingly finding an online outlet in blog posts, social media networks and commentary on newspaper portals. Nevertheless, a culture of social surveillance together with the natural barriers of size and the permeability of the social web facilitates the appropriation of blogging by political blocs, who remain vigilant to the opportunity of extending their influence in new media to disrupt horizontal networks of information exchange. Blogging is increasingly operating as a component of a hybrid media ecosystem that thrives on reflexive cycles of entertainment: the independent newspaper media, for long an active partner in the hegemonic set up in Malta, are being transformed and rendered more permeable at the same time as their power and influence are being eroded.
The study concludes that a new episteme is more likely to emerge through the symbiosis of hybrid media and reflexive waves of networked individualism than systemic, organised attempts at online political disruption