16 research outputs found
A Speaker De-Identification System Based on Sound Processing
In the context of products employing speech recognition, where the speech signal is sent from the device to centralized servers that process data, or simply products that involve data storage on servers, privacy for audio data is an important issue, just as it is for other types of data. Ignoring privacy has consequences for both, speakers (information leaks) and server administrators (legal issues). In this paper, we propose a speaker de-identification solution based on sound processing, altering voice characteristics, along with an API. Our solution consisting of pitch shift and noise mix (the latter is an optional augmentation method) has a great speaker de-identification performance, without an important loss in terms of word intelligibility. It is worth mentioning that sometimes the recordings may not be easy to understand in the initial (i.e., not de-identified) form, due to the speakerâs pronunciation, talking speed, and other related factors
A LargeâScale Eâvoting System Based on Blockchain
E-voting systems are increasingly used, considering the various facilities they offer: casting and counting votes in real time. The current voting systems are currently the target of attempted fraud and this is a major problem globally, which has not been solved even to this day. In the field of computer science, these e-voting platforms need to provide integrated security, thus enhancing the scalability and performance of the blockchainâbased eâvoting system. Our aim is to develop a secure internet-based voting system to maximize user participation, by allowing them to vote from anywhere. This paper proposes a system architecture based on blockchain technology along with a web interface in order to securely authenticate the voters on the platform. It should be noted in addition that these two components can be used together or separately, depending on the applicationâs needs
What Makes Your Writing Style Unique? Significant Differences Between Two Famous Romanian Orators
This paper introduces a novel, in-depth approach of analyzing the
differences in writing style between two famous Romanian orators, based on
automated textual complexity indices for Romanian language. The considered
authors are: (a) Mihai Eminescu, Romaniaâs national poet and a remarkable
journalist of his time, and (b) Ion C. BrÄtianu, one of the most important
Romanian politicians from the middle of the 18th century. Both orators have a
common journalistic interest consisting in their desire to spread the word about
political issues in Romania via the printing press, the most important public
voice at that time. In addition, both authors exhibit writing style particularities,
and our aim is to explore these differences through our ReaderBench framework
that computes a wide range of lexical and semantic textual complexity indices
for Romanian and other languages. The used corpus contains two collections of
speeches for each orator that cover the period 1857â1880. The results of this
study highlight the lexical and cohesive textual complexity indices that reflect
very well the differences in writing style, measures relying on Latent Semantic
Analysis (LSA) and Latent Dirichlet Allocation (LDA) semantic models.This study is part of the RAGE project. The RAGE project has received funding from the European Unionâs Horizon 2020 research and innovation programme under grant agreement No 644187. This publication reflects only the author's view. The European Commission is not responsible for any use that may be made of the information it contains
Opinion and Sentiment Analysis of Italian print press
As it is known, the success of a newspaper article for the public opinion can be measured by the degree in which the journalist is able to report and modify (if needed) attitudes, opinions, feelings and political beliefs. We present a symbolic system for Italian, derived from GETARUNS, which integrates a range of natural language processing tools with the intent to characterise the print press discourse. The system is multilingual and can produce deep text understanding. This has been done on some 500K words of text, extracted from three Italian newspaper in order to characterize their stance on a deep political crisis situation. We tried two different approaches: a lexicon-based approach for semantic polarity using off-the-shelf dictionaries with the addition of manually supervised domain related concepts; another one is a feature-based semantic and pragmatic approach, which computes propositional level analysis with the intent to better characterize important component like factuality and subjectivity. Results are quite revealing and confirm the otherwise common knowledge about the political stance of each newspaper on such topic as the change of government, that took placeatthe end of lastyear,2011
Opinion and Factivity Analysis of Italian political discourse
The success of a newspaper article for the public opinion can be measured by the degree in which the journalist is able to report and modify (if needed) attitudes, opinions, feelings and political beliefs. We present a symbolic system for Italian, derived from GETARUNS, which integrates a range of natural language processing tools with the intent to characterise the print press discourse from a semantic and pragmatic point of view. This has been done on some 500K words of text, extracted from three Italian newspapers in order to characterize their stance on a deep political crisis situation. We tried two different approaches: a lexicon-based approach for semantic polarity using off-the-shelf dictionaries with the addition of manually supervised domain related concepts; another one is a feature-based semantic and pragmatic approach, which computes propositional level analysis with the intent to better characterize important component like factuality and subjectivity. Results are quite revealing and confirm the otherwise common knowledge about the political stance of each newspaper on such topic as the change of government that took place at the end of last year, 2011
A survey of guidelines and best practices for the generation, interlinking, publication, and validation of linguistic linked data
This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey
LLODIA: A Linguistic Linked Open Data Model for Diachronic Analysis
editorial reviewedThis article proposes a linguistic linked open data model for diachronic analysis (LLODIA) that combines data derived from diachronic analysis of multilingual corpora with dictionary-based evidence. A humanities use case was devised as a proof of concept that includes examples in five languages (French, Hebrew, Latin, Lithuanian and Romanian) related to various meanings of the term ârevolutionâ considered at different time intervals. The examples were compiled through diachronic word embedding and dictionary alignment
Workflow Reversal and Data Wrangling in Multilingual Diachronic Analysis and Linguistic Linked Open Data Modelling
peer reviewedThe article deals with data wrangling in a multilingual collection intended for diachronic analysis and linguistic linked open data modelling for tracing concept change over time. Two types of static word embeddings are used: word2vec (French and Hebrew data sets), and fastText (Latin and Lithuanian data sets). We model examples from these embeddings via the OntoLex-FrAC formalism. To address the challenge of heterogeneity, we use a minimalist workflow design allowing for both convergence and flexibility in attaining the project goals.CA18209 - European network for Web-centred linguistic data science (NexusLinguarum
When linguistics meets web technologies. Recent advances in modelling linguistic linked data
This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital humanities with LLD. Next, we give an overview of some of the most well known vocabularies and models in LLD. After this we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon as well as recent work which has been in carried out in corpora and annotation and LLD including a discussion of the LLD metadata vocabularies META-SHARE and lime and language identifiers. In the following part of the paper we look at work which has been realised in a number of recent projects and which has a significant impact on LLD vocabularies and models