    Bulgariana: A Bulgarian Aggregator to Europeana

    Europeana is the European virtual museum, established in 2008. Its ambition is to create a common space that gives access to the cultural heritage of Europe from a single portal, by building a network across all European countries. To make the initiative technically possible, Europeana has built a technological infrastructure that aggregates metadata from the participating countries and memory institutions, while the original digitized content stays on their own sites. This paper presents Bulgariana, the Bulgarian chapter of Europeana. It is part of Europeana's technical infrastructure, an established technical aggregator of Bulgarian cultural heritage content. It uses the two Europeana representation models, ESE and EDM. Bulgariana is also a community-building initiative that is putting in place a Bulgaria-wide network of professionals and institutions working together to preserve and present Bulgarian cultural heritage around the world.

    Adverbs in the transfer module of MDS

    This manuscript describes the treatment of adverbs in the transfer module of the MDS version of VerbMobil. It discusses the following topics: 1) the necessity, methodology, and results of the contrastive linguistic analysis carried out, 2) the amount of data, 3) the connection between the semantic construction module and the transfer module, 4) transfer rules for adverbs, 5) the English semantic lexicon, and 6) an analysis of what has been achieved, the remaining problems, and suggestions for improvement.

    Query-Based Summarization: A survey

    This paper presents a survey of recent extractive query-based summarization techniques. We explore approaches for single-document and multi-document summarization. Knowledge-based and machine learning methods for choosing the most relevant sentences from documents with respect to a given query are considered. Further, we present summarization techniques tailored to particular domains, such as medical texts. The most recent developments in the field are illustrated with opinion summarization of blog entries. This research is supported by the SmartBook project, subsidized by the Bulgarian National Science Fund, under Grant D002-111 /15.12.2008.

    ISO-DR-core plugs into ISO-dialogue acts for a cross-linguistic taxonomy of discourse markers

    The present paper proposes an interoperable taxonomy to represent the meaning of discourse markers, based on ISO DR-core (ISO 24617-8) but with a plug-in to ISO-dialogue acts (ISO 24617-2). The proposed taxonomy encompasses two dimensions: the semantic, with values for the discourse relations signalled by discourse markers, and the pragmatic, with values for the communicative function realized by discourse markers. We present a proof of concept for this two-dimensional taxonomy on a multilingual parallel dataset in three languages, English, European Portuguese, and Bulgarian, comprising 165 textual segments with multiword discourse markers obtained from publicly available TED Talk transcripts. We show that the two-dimensional taxonomy can successfully annotate the meaning of discourse markers cross-linguistically, and we discuss linguistic evidence for where an extension of the proposed taxonomy may be relevant.
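
    A minimal sketch of how one annotation in such a two-dimensional scheme could be held in code. The class name, field names, and the example marker are hypothetical, and the value sets are small illustrative subsets rather than the inventories of the standards or of the paper. Allowing either dimension to be empty is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subsets only (assumption): the real inventories are defined
# in ISO 24617-8 and ISO 24617-2 and are not reproduced here.
SEMANTIC_RELATIONS = {"Cause", "Concession", "Contrast", "Elaboration", "Exemplification"}
COMMUNICATIVE_FUNCTIONS = {"Inform", "Agreement", "Correction", "Question"}


@dataclass(frozen=True)
class MarkerAnnotation:
    """One discourse marker with a semantic and a pragmatic value (hypothetical type)."""

    marker: str                      # surface form, e.g. a multiword marker
    language: str                    # language code of the segment
    semantic: Optional[str] = None   # discourse relation signalled by the marker
    pragmatic: Optional[str] = None  # communicative function realized by the marker

    def __post_init__(self) -> None:
        # Reject labels outside the (illustrative) value sets.
        if self.semantic is not None and self.semantic not in SEMANTIC_RELATIONS:
            raise ValueError(f"unknown semantic value: {self.semantic}")
        if self.pragmatic is not None and self.pragmatic not in COMMUNICATIVE_FUNCTIONS:
            raise ValueError(f"unknown pragmatic value: {self.pragmatic}")
        # Sketch assumption: at least one of the two dimensions must be filled.
        if self.semantic is None and self.pragmatic is None:
            raise ValueError("at least one dimension must be annotated")


# Hypothetical example annotation, not taken from the dataset.
example = MarkerAnnotation("for example", "en", semantic="Exemplification", pragmatic="Inform")
```

    Keeping the two dimensions as separate fields lets the same marker be compared across languages on either dimension independently.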

    Semi-automatic generation of quizzes and learning artifacts from Linked Data

    In this position paper, we illustrate how Linked Data can be used effectively in a Technology-enhanced Learning scenario. Specifically, we aim to use structured data to semi-automatically generate artifacts that support learning delivery and assessment: natural language facts, Q&A systems, and quizzes, possibly with a gaming flavour, can be creatively generated to help teachers and learners support and improve the learning path. Moreover, those artifacts can in turn be published on the Web as Linked Data, thus directly contributing to making the Web a global data space for learning purposes as well.
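
    The idea of turning structured data into learning artifacts can be sketched as follows. This is only an illustration under stated assumptions: the triples are a hand-written stand-in for Linked Data, and the templates, function names, and distractor strategy are hypothetical rather than the approach of the paper.

```python
import random
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hand-written stand-in for triples that would come from a Linked Data source.
TRIPLES: List[Triple] = [
    ("Bulgaria", "capital", "Sofia"),
    ("Portugal", "capital", "Lisbon"),
    ("Italy", "capital", "Rome"),
    ("France", "capital", "Paris"),
]

# Hypothetical per-predicate templates: (fact sentence, quiz question).
TEMPLATES: Dict[str, Tuple[str, str]] = {
    "capital": ("The capital of {s} is {o}.", "What is the capital of {s}?"),
}


def fact_sentence(triple: Triple) -> str:
    """Render a triple as a natural language fact."""
    s, p, o = triple
    return TEMPLATES[p][0].format(s=s, o=o)


def make_quiz(triple: Triple, triples: List[Triple], n_distractors: int = 2,
              rng: Optional[random.Random] = None) -> Dict[str, object]:
    """Build a multiple-choice item; distractors are objects of the same predicate."""
    rng = rng or random.Random()
    s, p, o = triple
    candidates = sorted({obj for (_, pred, obj) in triples if pred == p and obj != o})
    distractors = rng.sample(candidates, k=min(n_distractors, len(candidates)))
    options = distractors + [o]
    rng.shuffle(options)
    return {"question": TEMPLATES[p][1].format(s=s), "options": options, "answer": o}


quiz = make_quiz(TRIPLES[0], TRIPLES, rng=random.Random(0))
print(fact_sentence(TRIPLES[0]))
print(quiz["question"], quiz["options"])
```

    In a semi-automatic workflow, a teacher would review and edit the generated items before they reach learners.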

    Machine Learning Methods for Discourse Marker Detection in Italian

    The latest advances in NLP, more precisely NLP Transformers, show great performance in building universal language representations. The trained vectors of words or sentences can provide a single representation for multiple languages by extracting only the semantic information from texts and mapping it into a shared embedding space. This semantic information can be leveraged to train a model for specific downstream tasks, such as text classification and clustering, as well as for language understanding. The model resulting from the training phase can be used for all languages covered by the shared vector space, thus avoiding the need to train a separate model for each language.

    Speaker Attitudes Detection through Discourse Markers Analysis

    Speaker attitude detection is important for processing opinionated text. Survey data provide a valuable source of information for research in different scientific disciplines. They are also of interest to practitioners such as policymakers, politicians, government bodies, educators, journalists, and all other stakeholders whose occupations relate to people and society. Survey data provide evidence about particular language phenomena and public attitudes, giving a broader picture of the clusters of social attitudes. In this regard, attitudinal discourse markers play a central role in the sense that they are pointers to the speaker's attitudes.

    Izrada OWL ontologije za prikaz, povezivanje i pretraživanje SemAF diskursnih oznaka (Creating an OWL ontology for representing, linking, and querying SemAF discourse annotations)

    Discourse markers are linguistic cues that indicate how an utterance relates to the discourse context and what role it plays in conversation. Linguistic Linked Open Data (LLOD) are emerging technologies that provide a powerful instrument for representing and interpreting language phenomena on a web scale. The main objective of this paper is to demonstrate how LLOD technologies can be applied to represent and annotate a corpus of multiword discourse markers, and what the effects of this are. In particular, our aim is to apply semantic web standards such as RDF and the Web Ontology Language (OWL) for publishing and integrating data. We present a novel scheme for discourse annotation that combines ISO standards describing discourse relations and dialogue acts, ISO DR-Core (ISO 24617-8) and ISO-Dialogue Acts (ISO 24617-2), in nine languages (cf. Silvano and Damova 2022; Silvano, et al. 2022). We develop an OWL ontology to formalize that scheme, provide a newly annotated dataset, and link its RDF edition with the ontology. Accordingly, we describe the conjoint querying of the ontology and the annotations by means of SPARQL, the standard query language for the web of data. The ultimate result is that we are able to perform queries over multiple, interlinked datasets with complex internal structure, without the need for any specialized software. Instead, off-the-shelf technologies based on web standards are used, which can be ported effortlessly to different operating systems, databases, and programming languages. This is a first but essential step in developing novel, powerful, and (eventually) accessible means for the corpus-based study of multilingual discourse, communication analysis, and attitude discovery.
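
    The conjoint querying of an ontology and its annotations can be pictured with a small pure-Python stand-in for a SPARQL basic graph pattern. This is a sketch under stated assumptions, not the paper's setup: the prefixes, property names (`ex:marker`, `ex:signals`), class names, and data are all hypothetical, and a real deployment would run an actual SPARQL query against an RDF store instead of the toy matcher below.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical ontology fragment (class hierarchy).
ONTOLOGY: List[Triple] = [
    ("dr:Elaboration", "rdfs:subClassOf", "ex:SemanticRelation"),
    ("da:Inform", "rdfs:subClassOf", "ex:CommunicativeFunction"),
]

# Hypothetical annotation data that links to the ontology classes.
ANNOTATIONS: List[Triple] = [
    ("ex:seg1", "ex:marker", "in other words"),
    ("ex:seg1", "ex:signals", "dr:Elaboration"),
    ("ex:seg2", "ex:marker", "I mean"),
    ("ex:seg2", "ex:signals", "da:Inform"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or fail if it is already bound to another value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns: List[Triple], graph: Iterable[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by joining them one at a time."""
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Markers whose annotated value is a subclass of ex:SemanticRelation.
# The join spans both datasets, which is the point of linking them.
results = query(
    [
        ("?seg", "ex:marker", "?marker"),
        ("?seg", "ex:signals", "?rel"),
        ("?rel", "rdfs:subClassOf", "ex:SemanticRelation"),
    ],
    ONTOLOGY + ANNOTATIONS,
)
print(results)
```

    The third pattern is answered from the ontology while the first two are answered from the annotations, so a single query crosses both resources once they share identifiers.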