
    Categorization of unorganized text corpora for better domain-specific language modeling

    This paper describes the categorization of unorganized text data gathered from the Internet into in-domain and out-of-domain data for better domain-specific language modeling and speech recognition. An algorithm for text categorization and topic detection based on the most frequent key phrases is presented. In this scheme, each document entering the categorization process is represented in a vector space model, with term weights computed from term frequency and inverse document frequency. Documents are then classified automatically as in-domain or out-of-domain by comparing them with the list of key phrases using one of the selected distance/similarity measures and a predefined threshold. Experimental results for language modeling and adaptation to the judicial domain show a significant improvement in model perplexity of about 19% and a relative reduction of about 5.54% in the word error rate of the Slovak transcription and dictation system.
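The scheme in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity, the single-word key terms, the sample texts, and the threshold of 0.2 are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def idf(term, doc_freq, n_docs):
    """Smoothed inverse document frequency; defined even for unseen terms."""
    return math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def categorize(texts, key_terms, threshold):
    """Label each text in-domain or out-of-domain by TF-IDF similarity."""
    docs = [tokenize(t) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # The key-term list is treated as a reference vector weighted by IDF.
    key_vec = {t: idf(t, doc_freq, n_docs) for t in key_terms}
    labels = []
    for doc in docs:
        tf = Counter(doc)
        doc_vec = {t: (c / len(doc)) * idf(t, doc_freq, n_docs)
                   for t, c in tf.items()}
        similar = cosine(doc_vec, key_vec) >= threshold
        labels.append("in-domain" if similar else "out-of-domain")
    return labels


# Hypothetical sample data for a judicial domain.
texts = [
    "The court ruled on the appeal of the defendant",
    "The football match ended with a late goal",
]
key_terms = ["court", "appeal", "defendant", "judge"]
labels = categorize(texts, key_terms, threshold=0.2)
print(labels)
```

In practice the threshold and the similarity measure would be tuned on held-out data; the abstract only states that one of several distance/similarity measures is selected.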

    Information in the Context of Philosophy and Cognitive Sciences

    This textbook briefly maps as many areas and contexts as possible in which information plays an important role. It also seeks to connect areas of research that are not commonly associated with one another, such as informatics, information and library science, information physics, and information ethics. Because the text is intended primarily for students of the Master's Degree in Cognitive Studies, emphasis is placed on a humanistic, philosophical and interdisciplinary approach. It offers directions of thought, questions, and contexts rather than a complete theory developed in mathematical and technical detail.

    A Multiplicative Model for Learning Distributed Text-Based Attribute Representations

    In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to document indicators (to learn sentence vectors), language indicators (to learn distributed language representations), meta-data and side information (such as the age, gender and industry of a blogger) or representations of authors. We describe a third-order model where word context and attribute vectors interact multiplicatively to predict the next word in a sequence. This leads to the notion of conditional word similarity: how meanings of words change when conditioned on different attributes. We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation. Comment: 11 pages. An earlier version was accepted to the ICML-2014 Workshop on Knowledge-Powered Deep Learning for Text Mining.
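The multiplicative interaction mentioned in the abstract above can be illustrated with a generic factored three-way form, in which the context and attribute vectors are each projected onto shared factors and combined elementwise before predicting the next word. This is a sketch under stated assumptions, not the paper's exact parameterisation: the matrix names `W_fc`, `W_fx`, `W_vf`, the dimensions, and the random weights are all hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def next_word_probs(context, attribute, W_fc, W_fx, W_vf):
    """Gate context factors by attribute factors, then score the vocabulary."""
    context_factors = matvec(W_fc, context)
    attribute_factors = matvec(W_fx, attribute)
    # Elementwise product: the attribute rescales each factor of the context.
    gated = [c * a for c, a in zip(context_factors, attribute_factors)]
    return softmax(matvec(W_vf, gated))


rng = random.Random(0)
vocab_size, n_factors, context_dim, attr_dim = 5, 4, 3, 2
W_fc = random_matrix(n_factors, context_dim, rng)
W_fx = random_matrix(n_factors, attr_dim, rng)
W_vf = random_matrix(vocab_size, n_factors, rng)

context = [0.5, -0.2, 0.1]        # hypothetical word-context vector
probs_a = next_word_probs(context, [1.0, 0.0], W_fc, W_fx, W_vf)
probs_b = next_word_probs(context, [0.0, 1.0], W_fc, W_fx, W_vf)
print(probs_a)
print(probs_b)
```

The same context yields a different next-word distribution under each attribute vector, which is the intuition behind the conditional word similarity described in the abstract.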

    Civil Society in the 'Visegrad Four': Data and Literature in the Czech Republic, Hungary, Poland and Slovakia

    The first of three publications on the '25 Years After -- Mapping Civil Society in the Visegrád Four' project contains an overview of existing data and literature in the Czech Republic, Hungary, Poland and Slovakia. It looks at where and what kind of research on civil society has been and is being done, who is doing it and where the gaps are. To be consistent and comparable, the four country reports include the same core sections: relevant publications on civil society in the respective country; existing databases and other data sources; active centres of research, training, and policy studies. Rather than just providing a list, this report looks at how these sources can be evaluated in terms of scope, accuracy and depth. Finally, it considers what the most crucial gaps in research and funding in the four countries are. An academic volume is slated for the end of 2014. For other publications in English and German, see www.maecenata.eu

    Associates or zamestnanci? Language choice, attitudes and code-switching practices: The case of workplace email communication in Slovakia

    Code-switching (CS) is subject to a wide range of interrelations between medium and situation factors. From a linguistic point of view, CS occurs when a speaker alternates between two or more languages, or language varieties, in the course of a single conversation. The practice has been observed around the world in many contexts and in many language and culture contact situations. Building on earlier studies of CS but shifting towards a more specific environment, the workplace, the present study aims to fill a considerable gap in scholarly knowledge about the online/written CS practices of Slovak native speakers in workplace email communication. The study focuses on language choice, language attitudes and CS practices among colleagues in the multilingual workplace of a multinational hospitality company in Slovakia. It considers only the participants' workplace interactions, in particular their email messages written in Slovak (the national language) with switches to English. Owing to the interdisciplinary nature of this research project, and its dual focus on language attitudes on the one hand and actual CS practices on the other, the thesis addresses a number of research questions and provides a series of analyses centred on the following objectives. The main aim of the quantitative, questionnaire-based study is to examine the participants' metalinguistic awareness of how much they switch to English during their communication, particularly in their computer-mediated communication (CMC), to determine their reasons for doing so, and to uncover the attitudes they hold towards this phenomenon. Because the depth of knowledge obtained through a questionnaire survey is limited, a corpus analysis of email interactions is also conducted to investigate more closely the extent of switching and the types, forms and functions of CS involved. This mixed-method approach is used to examine the motivations and reasons why the participants choose English over their native language. The study thus represents the first comprehensive analysis of its kind of Slovak-English CS in CMC using authentic, naturally occurring computer-mediated corporate interactions.
    Keywords: code-switching, CMC, email, workplace communication, attitudes
    The traineeship was partly funded by an Erasmus+ grant, for which I am grateful.
    Lengyelová, A. (2019). Associates or zamestnanci? Language choice, attitudes and code-switching practices: The case of workplace email communication in Slovakia [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/124352