
    Monitoring English Sandhi Linking – A Study of Polish Listeners’ L2 Perception

    This paper presents a set of word monitoring experiments with Polish learners of English. Listeners heard short recordings of native English speech and were instructed to respond when they recognized an English target word that had been presented on a computer screen. Motivated by phonological considerations, we compared reaction times to two types of vowel-initial words: those produced with glottalization, and those joined to the preceding word via sandhi linking processes. Results showed that the effects of glottalization as a boundary cue were less robust than expected. Implications of these findings for models of L2 speech are discussed. It is suggested that the prevalence of glottalization in L1 production makes listeners less sensitive to its effects as a boundary cue in L2.

    Blog Style Classification: Refining Affective Blogs

    In the constantly growing blogosphere, with no restrictions on form or topic, a number of writing styles and genres have emerged. Recognition and classification of these styles have become significant for information processing, with the aim of improving blog search or sentiment mining. One of the main issues in this field is distinguishing informative from affective articles. However, this two-way distinction no longer suffices. In this paper we extend it and suggest a fine-grained set of subcategories for affective articles. We propose and evaluate a classification method employing novel lexical, morphological, lightweight syntactic and structural features of written text. The results show that our method outperforms the existing approaches.

    Croatian Speech Recognition


    KORPUSNI PRISTUP ENGLESKIM POSUĐENICAMA: BAZA ENGLESKIH RIJEČI U HRVATSKOME (A Corpus Approach to English Loanwords: The Database of English Words in Croatian)

    Unadapted English loanwords have become part of informal communication in many languages, including Croatian. Their use is often motivated by the lack of adequate native equivalents and by exposure to English through the media, but also by the prestigious status of the English language. A vast body of research has been dedicated to lexical borrowing, especially from English. At the same time, corpus analyses have mostly been conducted on smaller, ad hoc corpora. Therefore, the goal of this paper is to present the database of English loanwords in Croatian. The database was developed by algorithmic and manual classification of words from the Corpus of Croatian news portals, ENGRI, and provides a list of 9,452 unadapted English loanwords together with data on their absolute and relative frequencies. The analysis showed that most loanwords (75.85%) appear fewer than 50 times, while a total of 44.78% of words appear 10 times or fewer. The biggest drop in the number of loanwords is observed in the categories of occurrence above 500, while only 27 words appear 5,000 times or more. The most frequent English loanword in the corpus is ‘show’ with 80,805 occurrences, which is 0.0122% of all words in the corpus. The analysis of loanwords that occur more than 5,000 times showed that most of them have Croatian translation equivalents, which confirms the role of the media in the introduction of new words. In addition to providing an insight into the occurrence of English loanwords in Croatian, this database also represents a valuable contribution to Croatian computational linguistics resources and enables future experimental research by providing data on word frequency.
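The frequency figures in the abstract above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function and variable names are assumptions, and the corpus size it derives is implied by the rounded figures in the abstract (80,805 occurrences, 0.0122%), not a number reported in the source.

```python
def relative_frequency_percent(count: int, corpus_tokens: float) -> float:
    """Share of all corpus tokens accounted for by one word, in percent."""
    return 100.0 * count / corpus_tokens


def implied_corpus_tokens(count: int, percent: float) -> float:
    """Corpus size implied by an absolute count and its relative frequency."""
    return count / (percent / 100.0)


# Figures quoted in the abstract.
SHOW_COUNT = 80_805        # occurrences of 'show'
SHOW_PERCENT = 0.0122      # share of all words in the corpus, in percent
TOTAL_LOANWORDS = 9_452    # unadapted English loanwords in the database

# Derived values (approximate, because the published percentages are rounded).
corpus_tokens = implied_corpus_tokens(SHOW_COUNT, SHOW_PERCENT)
under_50 = round(TOTAL_LOANWORDS * 75.85 / 100)    # loanwords seen fewer than 50 times
at_most_10 = round(TOTAL_LOANWORDS * 44.78 / 100)  # loanwords seen 10 times or fewer

print(f"Implied corpus size: about {corpus_tokens:,.0f} tokens")
print(f"Loanwords appearing fewer than 50 times: about {under_50}")
print(f"Loanwords appearing 10 times or fewer: about {at_most_10}")
```

Under these assumptions the implied corpus size is on the order of several hundred million tokens, and the two percentage bands correspond to roughly 7,200 and 4,200 loanwords respectively.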

    Towards the Global SentiWordNet


    Territory, Place, and Identity in Slovak Church-state Conflict: 1948-1989

    This paper focuses on the development and utilization of a conceptual framework for studying religion from a spatial perspective, drawing on themes and methodologies from human geography. The goal of this research is to help reconnect the geography of religion as a subdiscipline with broader themes in the discipline. Through an examination of Catholicism in Slovakia between 1948 and 1989, it examines how the Church utilized and organized geographic space, how it crafted a Catholic sense of place, and how the Communist government in Slovakia competed with the Church for authority and control within these spatial 'realms.' Examining issues of territoriality, power relations, and identity formation at a number of spatial scales, ranging from the local to the international, the paper attempts to show their interrelation. This project draws on a collection of primary documents obtained from state and ecclesiastic archives in Eastern Slovakia.

    Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

    There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks (English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, and English to Danish and Swedish) and on one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.

    A survey on author profiling, deception, and irony detection for the Arabic language

    The possibility of inferring people's traits from what they write is a field of growing interest named author profiling. Inferring a user's gender, age, native language, language variety, or even when the user lies, simply by analyzing her texts, opens a wide range of possibilities from the point of view of security. In this paper, we review the state of the art about some of the main author profiling problems, as well as deception and irony detection, especially focusing on the Arabic language. Funding: Qatar National Research Fund, Grant/Award Number: NPRP 9-175-1-033. Citation: Rosso, P.; Rangel-Pardo, FM.; Hernandez-Farias, DI.; Cagnina, L.; Zaghouani, W.; Charfi, A. (2018). A survey on author profiling, deception, and irony detection for the Arabic language. Language and Linguistics Compass. 12(4):1-20. https://doi.org/10.1111/lnc3.12275