Bulgariana: A Bulgarian Aggregator to Europeana
Europeana is the European virtual museum, established in
2008. Its ambition is to create a common space giving access to the cultural
heritage of Europe from a single portal, by building a network across all European
countries. To make the initiative technically possible, Europeana has built a
technological infrastructure that aggregates metadata from the participating
countries and memory institutions, while keeping the original digitized content
on their sites. This paper presents Bulgariana, the Bulgarian chapter of
Europeana. It is part of the technical infrastructure of Europeana, an established
technical aggregator of Bulgarian cultural heritage content. It uses the two
Europeana representation models, ESE and EDM. Bulgariana is also a community-building initiative that is putting in place a Bulgaria-wide network of professionals and institutions working together to preserve and present Bulgarian cultural heritage around the world.
Adverbs in the transfer module of MDS
This manuscript describes the treatment of adverbs in the transfer module of the MDS version of VerbMobil. It discusses the following topics: 1) the necessity, methodology, and results of the contrastive linguistic analysis carried out, 2) the amount of data, 3) the connection between the semantic construction module and the transfer module, 4) transfer rules for adverbs, 5) the English semantic lexicon, 6) an analysis of what was achieved, the remaining problems, and suggestions for improvement.
Query-Based Summarization: A survey
This paper presents a survey of recent extractive query-based
summarization techniques. We explore approaches for single document
and multi-document summarization. Knowledge-based and machine
learning methods for choosing the most relevant sentences from
documents with respect to a given query are considered. Further, we
present summarization techniques tailored to particular domains such as
medical texts. The most recent developments in the field are presented
with opinion summarization of blog entries. This research is supported by the SmartBook project,
subsidized by the Bulgarian National Science Fund, under Grant D002-111/15.12.2008.
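The core step this abstract describes, choosing the sentences most relevant to a given query, can be illustrated with a minimal sketch. It is not a method from the survey: it assumes a plain bag-of-words cosine similarity as the relevance score, and the function names (`tokenize`, `cosine`, `summarize`) and the sample document are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, query, k=2):
    """Return the k sentences most similar to the query, in document order."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(s)), query_vec), i)
              for i, s in enumerate(sentences)]
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Restore the original sentence order for readability.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


# Hypothetical sample input, for illustration only.
document = [
    "Aspirin reduces fever and relieves mild pain.",
    "The clinic opens at nine in the morning.",
    "High doses of aspirin may cause stomach bleeding.",
    "Parking is available behind the building.",
]
summary = summarize(document, "aspirin side effects and pain", k=2)
print(summary)
```

Knowledge-based or machine learning approaches of the kind the survey covers would replace the `cosine` score with a richer relevance model; the selection step stays the same.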
ISO-DR-core plugs into ISO-dialogue acts for a cross-linguistic taxonomy of discourse markers
The present paper proposes an interoperable
taxonomy to represent the meaning of discourse
markers based on ISO DR-core (ISO
24617-8) but with a plug-in to ISO-dialogue
acts (ISO 24617-2). The proposed taxonomy
encompasses two dimensions: the semantic,
with values regarding the discourse relations
signalled by discourse markers, and the pragmatic,
with values concerning the communicative
function realized by discourse markers.
We present a proof of concept for this two-dimensional
taxonomy in a multilingual parallel
dataset in three languages, English, European
Portuguese and Bulgarian, comprising
165 textual segments with multiword discourse
markers obtained from publicly available
TED Talk transcripts. We show that the
two-dimensional taxonomy can be used to annotate
the meaning of discourse markers
cross-linguistically, and we discuss linguistic evidence
indicating where an extension of the proposed taxonomy may
be relevant.
Semi-automatic generation of quizzes and learning artifacts from Linked Data
In this position paper, we illustrate how Linked Data can be used effectively in a Technology-enhanced Learning scenario. Specifically, we aim to use structured data to semi-automatically generate artifacts that support learning delivery and assessment: natural language facts, Q&A systems and quizzes, also used with a gaming flavour, can be creatively generated to help teachers and learners support and improve the learning path. Moreover, those artifacts can in turn be published on the Web as Linked Data, thus directly contributing to making the Web a global data space for learning purposes as well.
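As a rough illustration of how natural language facts and quizzes might be generated from structured data, the sketch below turns subject-predicate-object triples into a fact sentence and a multiple-choice question. It is not the paper's system: the triples, the templates, and the function names (`make_fact`, `make_quiz`) are hypothetical, and a real setup would read the triples from a Linked Data source instead of a hard-coded list.

```python
import random

# Hypothetical (subject, predicate, object) triples standing in for Linked Data.
TRIPLES = [
    ("Sofia", "capitalOf", "Bulgaria"),
    ("Lisbon", "capitalOf", "Portugal"),
    ("Rome", "capitalOf", "Italy"),
    ("Zagreb", "capitalOf", "Croatia"),
]

# Hypothetical per-predicate templates: (fact sentence, quiz question).
TEMPLATES = {
    "capitalOf": ("{s} is the capital of {o}.",
                  "Which city is the capital of {o}?"),
}


def make_fact(triple):
    """Verbalize one triple as a natural language fact."""
    s, p, o = triple
    return TEMPLATES[p][0].format(s=s, o=o)


def make_quiz(triple, triples, rng, n_distractors=2):
    """Build a multiple-choice question whose answer is the triple's subject."""
    s, p, o = triple
    # Distractors are other subjects of the same predicate, so they are plausible.
    pool = sorted({t[0] for t in triples if t[1] == p and t[0] != s})
    options = rng.sample(pool, min(n_distractors, len(pool))) + [s]
    rng.shuffle(options)
    return {"question": TEMPLATES[p][1].format(o=o),
            "options": options,
            "answer": s}


rng = random.Random(0)  # fixed seed so the draft quiz is reproducible
fact = make_fact(TRIPLES[0])
quiz = make_quiz(TRIPLES[0], TRIPLES, rng)
print(fact)
print(quiz["question"], quiz["options"])
```

The output is a draft that a teacher would review before use, which is one reading of "semi-automatically" in the abstract.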
Machine Learning Methods for Discourse Marker Detection in Italian
The latest advances in NLP, more precisely NLP Transformers, show great performance in building universal language representations. The trained vectors of words
or sentences can provide unique representation for multiple languages, exclusively extracting semantic information from texts that is mapped into shared embedding space.
This semantic information can be leveraged to train a model for specific downstream
tasks, such as text classification, clustering, and others, while also leveraging semantic
information for language understanding. The resulting model from the training phase
can be universally used for all languages whose shared vector space is encompassed,
thus avoiding the need to train separate models for each language individually.
Speaker Attitudes Detection through Discourse Markers Analysis
Speaker attitude detection is important for processing opinionated text. Survey data provide a valuable source of information and research material for different scientific disciplines. They are also of interest to practitioners such as policymakers, politicians, government bodies, educators, journalists, and all other stakeholders with occupations related to people and society. Survey data provide evidence about particular language phenomena and public attitudes, giving a broader picture of the clusters of social attitudes. In this regard, attitudinal discourse markers play a central role because they are pointers to the speaker's attitudes.
Developing an OWL ontology for representing, linking, and querying SemAF discourse annotations (Izrada OWL ontologije za prikaz, povezivanje i pretraživanje SemAF diskursnih oznaka)
Discourse markers are linguistic cues that indicate how an utterance relates to the discourse context and what role it plays in the conversation. Linguistic Linked Open Data (LLOD) are emerging technologies that provide a powerful instrument for representing and interpreting language phenomena on a web scale. The main objective of this paper is to demonstrate how LLOD technologies can be applied to represent and annotate a corpus composed of multiword discourse markers, and what the effects of this are. In particular, it is our aim to apply semantic web standards such as RDF and the Web Ontology Language (OWL) for publishing and integrating data. We present a novel scheme for discourse annotation that combines ISO standards describing discourse relations and dialogue acts, ISO DR-Core (ISO 24617-8) and ISO-Dialogue Acts (ISO 24617-2), in nine languages (cf. Silvano and Damova 2022; Silvano, et al. 2022). We develop an OWL ontology to formalize that scheme, provide a newly annotated dataset, and link its RDF edition with the ontology. Consequently, we describe the conjoint querying of the ontology and the annotations by means of SPARQL, the standard query language for the web of data. The ultimate result is that we are able to perform queries over multiple, interlinked datasets with complex internal structure without the need for any specialized software. Instead, off-the-shelf technologies based on web standards are used, which can be ported effortlessly across operating systems, databases, and programming languages. This is a first but essential step in developing novel, powerful, and groundbreaking means for the corpus-based study of multilingual discourse, communication analysis, and attitude discovery.