859 research outputs found

    Detecting New Word Meanings: A Comparison of Word Embedding Models in Spanish

    Semantic neologisms (SN) are words that acquire a new meaning while maintaining their form. Given the nature of this kind of neologism, the task of identifying these new word meanings is currently performed manually by specialists at observatories of neology. To detect SN semi-automatically, we developed a system that combines the following strategies: topic modeling, keyword extraction, and word sense disambiguation. The role of topic modeling is to detect the themes treated in the input text. Themes within a text give clues about the particular meaning of the words used; for example, viral has one meaning in the context of computer science (CS) and another when talking about health. To extract keywords, we used TextRank with POS tag filtering. With this method, we can obtain relevant words that are already part of the Spanish lexicon. We use a deep learning model to determine whether a given keyword could have a new meaning. Embeddings that differ from all the known meanings (or topics) indicate that a word might be a valid SN candidate. In this study, we examine the following word embedding models: Word2Vec, Sense2Vec, and FastText. The models were trained with equivalent parameters using the Spanish Wikipedia as the corpus. We then used a list of words and their concordances (obtained from our database of neologisms) to show the different embeddings that each model yields. Finally, we compare these outcomes with the concordances of each word to show how we can determine whether a word could be a valid SN candidate. Comment: 16 pages, 3 figures
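
The idea that an embedding far from every known meaning signals a possible SN can be illustrated with a minimal sketch. It is not the authors' system: the three-dimensional vectors, the function names `cosine` and `is_sn_candidate`, and the 0.5 threshold are all assumptions chosen for illustration. A real setup would take vectors from a trained Word2Vec, Sense2Vec, or FastText model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def is_sn_candidate(context_vec, known_sense_vecs, threshold=0.5):
    """Flag a word as a candidate when its usage vector is far from every known sense.

    The threshold is a hypothetical value, not one reported in the abstract.
    """
    best = max(cosine(context_vec, sense) for sense in known_sense_vecs)
    return best < threshold

# Toy vectors standing in for the known senses of "viral" (health, computer science).
known_viral = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(is_sn_candidate([0.9, 0.1, 0.0], known_viral))  # False: close to the health sense
print(is_sn_candidate([0.0, 0.0, 1.0], known_viral))  # True: unlike both known senses
```

A flagged word would still go to a specialist for review, in line with the semi-automatic workflow the abstract describes.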

    CLIR teknikak baliabide urriko hizkuntzetarako (CLIR techniques for low-resource languages)

    152 pp. When developing a cross-language information retrieval (CLIR) system, query translation is the most widely used approach to overcoming the language barrier. The most successful query translation strategies rely on machine translation systems or parallel corpora, but these resources are scarce in low-resource language scenarios. In such situations, a query translation strategy based on more readily available resources would be more appropriate. In this thesis we aim to show that the main such resources can be a bilingual dictionary, complemented by comparable corpora and query sessions.
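
Dictionary-based query translation, the core resource named in this thesis abstract, can be sketched as follows. This is an illustrative assumption, not the thesis's method: the function name `translate_query` and the two-entry Basque-English dictionary are hypothetical, and the comparable corpora and query sessions that the thesis proposes as complements (for instance, to choose among candidate translations) are left out.

```python
def translate_query(query, bilingual_dict):
    """Replace each query term with all of its dictionary translations.

    Terms missing from the dictionary are kept unchanged, a common fallback
    for names and acronyms.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(bilingual_dict.get(term, [term]))
    return translated

# Toy Basque-English entries, for illustration only.
toy_dict = {
    "liburu": ["book"],
    "berri": ["new", "news"],  # ambiguous term: both candidates are kept
}

print(translate_query("liburu berri CLIR", toy_dict))
# ['book', 'new', 'news', 'clir']
```

Keeping every candidate translation widens recall at the cost of precision, which is where complementary resources could help select the intended sense.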

    Climate change and ‘climategate’ in online reader comments: a mixed methods study

    Climate change has rarely been out of the public spotlight in the first decade of this century. The high-profile international meetings and controversies such as ‘climategate’ have highlighted the fact that it is as much a political issue as it is a scientific one, while also drawing our attention to the role of social media in reflecting, promoting or resisting such politicisation. In this article, we propose a framework for analysing one type of social media venue that so far has received little attention from social scientists – online reader comments. Like media reporting on climate change, reader comments on this reporting contribute to the diverse, complex and contested discourses on climate change, and can reveal the meanings and discursive resources brought to the ongoing debate by laypeople rather than political elites. The proposed framework draws on research in computer-mediated communication, corpus linguistics and discourse analysis and takes into account both the content of such ‘lay talk’ and its linguistic characteristics within the specific parameters of the web-based context. Using word frequencies, qualitative study of co-text and user ratings, we analyse a large volume of comments published on the UK tabloid newspaper website at two different points in time – before and after the East Anglia controversy. The results reveal how stereotypes of science and politics are appropriated in this type of discourse, how readers’ constructions of climate science have changed after ‘climategate’, and how climate-sceptic arguments are adopted and contested in computer-mediated peer-to-peer interaction.

    Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport

    Users voluntarily generate large amounts of textual content by expressing their opinions, in social media and specialized portals, on every possible issue, including transport and sustainability. In this work we have leveraged such User Generated Content to obtain a high-accuracy sentiment analysis model which automatically analyses the negative and positive opinions expressed in the transport domain. In order to develop such a model, we have semi-automatically generated an annotated corpus of opinions about transport, which has then been used to fine-tune a large pretrained language model based on recent deep learning techniques. Our empirical results demonstrate the robustness of our approach, which can be applied to automatically process massive amounts of opinions about transport. We believe that our method can help to complement data from official statistics and traditional surveys about transport sustainability. Finally, apart from the model and annotated dataset, we also provide a transport classification score with respect to the sustainability of the transport types found in the use case dataset. This work has been partially funded by the Spanish Ministry of Science, Innovation and Universities (DeepReading RTI2018-096846-B-C21, MCIU/AEI/FEDER, UE), Ayudas Fundación BBVA a Equipos de Investigación Científica 2018 (BigKnowledge), DeepText (KK-2020/00088), funded by the Basque Government, and the COLAB19/19 project funded by the UPV/EHU. Rodrigo Agerri is also funded by the RYC-2017-23647 fellowship and acknowledges the donation of a Titan V GPU by the NVIDIA Corporation.

    MEDIA DISCOURSE AND INTERNATIONAL POLITICS: FOCUS ON THE REPRESENTATION OF IRAN'S NUCLEAR PROGRAM IN THE MEDIA

    After the 1979 Revolution, the establishment of the Islamic Republic government in Iran and the resulting change in political perspectives, Iran’s nuclear programme became a controversial issue between Iran and the West, and has featured as one of the most important topics in newspapers and the media ever since. Iran and the West (especially the US) were said to be on the brink of war before the Joint Comprehensive Plan of Action (JCPOA), so the agreement signed in Vienna on 14 July 2015 between Iran and the P5+1 on Iran’s nuclear programme marked a real turning point in the controversy. In this PhD thesis I looked at the role of language and media discourse in the coverage and representation of events related to Iran’s nuclear programme from January to July 2015, during Barack Obama’s administration, which brought about an important change in the attitude towards Iran’s nuclear programme in both the Iranian and the Western media. The main questions this study sought to answer are: 1) How were events, players and policies related to the Iranian nuclear programme question portrayed in the media, in Iran and the West respectively, in the period under consideration? 2) What was the role of language and media discourse in the dissemination of information about events related to Iran’s nuclear programme in both the Iranian and the Western printed media? 3) How were the main actors and the most important themes presented in the Persian and the Western newspaper articles on Iranian nuclear discourse? 4) To what extent did the views put forth by the two sides of the negotiations differ in the six months before the JCPOA?
Unlike previous studies, the English-language corpus draws on both British and American newspapers to widen the spectrum of Western viewpoints on the issue: “The Washington Post” and “The New York Times” (two American newspapers), and “The Guardian” and “The Times” (two British newspapers). For the Persian-language corpus I selected four significant Iranian newspapers from two important political parties: two reformist newspapers, “Shargh” (شرق) and “Etemad” (اعتماد), and two principalist newspapers, “Keyhan” (کیهان) and “Resalat” (رسالت). The Western corpus was collected from Lexis-Nexis, which provides electronic access to legal and journalistic documents. The Persian news articles were gathered from an archive of the Iranian press called “Magiran” (www.magiran.com). I used critical discourse analysis, an interdisciplinary approach widely used to study news discourse and, more generally, an approach to discourse that views “language as social practice” (Fairclough and Wodak 1997). At the same time, I benefit from an important advantage of the corpus-based approach to discourse analysis, namely reduced research bias. The study provides confirmation that underlying ideological filters frame the news, working most often as an invisible hand, which makes every media text biased even when ostensibly reporting the same facts in different language versions. The use of different linguistic resources frames the same news event differently, relying on different assumptions and triggering different interpretations.
The discourse is clearly differentiated: for example, headlines differ in length, thematic structure, quantity of information given, lexical choices, and syntactic and functional structures, depending on the assumed background knowledge of the two audiences. The idea that recourse to hedges and boosters varies across cultures is supported by the overuse of hedges and boosters by the Western journalists. All in all, this PhD thesis confirms the impact of culture and national identity on political practices in mass media.

    Language games and nature: a corpus-based analysis of ecological discourse

    This dissertation approaches environmental discourse from the perspective of intercultural communication research. As a discipline, intercultural communication has encompassed a range of analytical levels, from micro-analysis of everyday communicative interactions to the macro-level structural factors that were brought to light by the critical turn. Given planetary environmental issues, some researchers have called for an “ecological turn” as a new research paradigm. However, the complexity of integrating communication, culture, and the natural world into a coherent research program poses significant conceptual and methodological challenges. This dissertation seeks to provide both a methodological and conceptual framework for discourse at the interface of human cultures and the natural world.