
    Multilingual Twitter Sentiment Classification: The Role of Human Annotators

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of the training data than on the type of model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that model performance approaches the inter-annotator agreement when the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements, since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.

    The strategic impact of META-NET on the regional, national and international level

    This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015; we describe its impact on the regional, national and international level, mainly with regard to politics and the funding situation for LT topics. The article documents the initiative's work throughout Europe in order to boost progress and innovation in our field. Peer reviewed. Postprint (author's final draft).

    Hungarian neutral vowels

    In Hungarian, stems containing only front unrounded (neutral) vowels fall into two groups: one group taking front suffixes, the other taking back suffixes in vowel harmony. The distinction is traditionally thought of as purely lexical. Beňuš and Gafos (2007) have recently challenged this position, claiming that there are significant articulatory differences between the vowels in the two groups. Neutral vowels also occur in vacillating stems. These typically contain one back vowel and one or more neutral vowels, and accept both front and back suffixes, with extensive inter- and intra-speaker variation. Based on Beňuš and Gafos’s line of argument, the expectation is that vacillating stems will display a kind of phonetic realisation that is distinct from both harmonic and anti-harmonic stems. We present the results of an ongoing acoustic study of neutral vowels, partly re-creating Beňuš and Gafos’s conditions, but also including vacillating stems. To map the extent of individual and dialectal variation regarding vacillating stems, a grammaticality judgement test was also carried out on speakers of two dialects of Hungarian, crucially differing in the surface inventory of neutral vowels. We present our first findings about how this phonetic difference influences the phonological behaviour of vacillating stems.

    Vektorski prikaz riječi utemeljen na velikim mrežnim korpusima kao moćan leksikografski alat [Word embeddings based on large web corpora as a powerful lexicographic tool]

    The Aranea Project offers a set of comparable corpora for two dozen (mostly European) languages, providing a convenient dataset for NLP applications that require training on large amounts of data. The article presents word embedding models trained on the Aranea corpora and an online interface to query the models and visualize the results. The implementation is aimed at lexicographic use but can also be useful in other fields of linguistic study, since the vector space is a plausible model of the semantic space of word meanings. Three different models are available: one for a combination of part of speech and lemma, one for raw word forms, and one based on the fastText algorithm, which uses subword vectors and is not limited to whole or known words in finding semantic relations. The article describes the interface and the major modes of its functionality; it does not attempt a detailed linguistic analysis of the presented examples.

    Associates or zamestnanci? Language choice, attitudes and code-switching practices: The case of workplace email communication in Slovakia

    Code-switching (CS) is subject to a wide range of interrelations between medium and situation factors. Generally, from a linguistic point of view, CS occurs when a speaker alternates between two or more languages, or language varieties, in the course of a single conversation. The practice has been observed all around the world in many contexts and in many language and culture contact situations. Building on earlier studies of the CS phenomenon, but shifting towards a more specific environment, the workplace, the present study aims to fill a considerable gap in scholarly knowledge about the online/written CS practices of Slovak native speakers in the context of workplace email communication. The study focuses on language choice, language attitudes and CS practices among colleagues in the multilingual workplace environment of a multinational hospitality company in Slovakia. It looks solely at the participants' workplace interactions, in particular their email messages written in Slovak (the national language) with switches to English. Due to the interdisciplinary nature of this research project, as well as its dual focus on language attitudes on the one hand and actual CS practices on the other, this thesis addresses a number of research questions and provides a series of analyses centring around the following objectives. The main focus of the quantitative, questionnaire-based study is to examine the participants' metalinguistic awareness of the extent of their switching to English during communication (particularly in their computer-mediated communication, CMC) and to determine their reasons for doing so, while uncovering the attitudes they hold towards this phenomenon. Furthermore, as the depth of knowledge obtained through a questionnaire survey is limited, a corpus analysis of email interactions is conducted in order to investigate more closely the extent of switching and the types, forms and functions of CS involved. Using this mixed-method approach, the study examines the motivations and reasons why the participants choose English over their native language. The study thus represents the first comprehensive analysis of its kind on Slovak-English CS in CMC using authentic, naturally occurring computer-mediated corporate interactions.
    Keywords: code-switching, CMC, email, workplace communication, attitudes
    The traineeship was partly funded by an Erasmus+ grant, for which I am grateful.
    Lengyelová, A. (2019). Associates or zamestnanci? Language choice, attitudes and code-switching practices: The case of workplace email communication in Slovakia [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/124352