64 research outputs found

    Automatic Speech Recognition Errors Detection Using Supervised Learning Techniques

    Over the last years, many advances have been made in the field of Automatic Speech Recognition (ASR). However, the persistent presence of ASR errors limits the widespread adoption of speech technology in real-life applications. This motivates attempts to find alternative techniques that automatically detect and correct ASR errors. Such techniques can be very effective, especially when the user cannot tune the features, the models, or the decoder of the ASR system, or when the transcription serves as input to downstream systems such as machine translation, information retrieval, and question answering. In this paper, we present an ASR error detection system targeting substitution and insertion errors. The proposed system is based on supervised learning techniques and uses input features derived only from the ASR output words, and hence should be usable with any ASR system. Applying this system to TV program transcription data identifies 40.30% of the recognition errors generated by the ASR system.

    Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search

    Vertical selection is the task of selecting the verticals most relevant to a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals, but also ensuring that these verticals are the ones the user expects to be relevant to their particular information need. Most existing work has focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, performing vertical selection with high accuracy remains a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce the user's browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, each vector is used as input to a convolutional neural network model that enriches the representation of the query with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experimentation on various datasets. Our experimental findings show that our system achieves high accuracy and makes accurate predictions on new, unseen data.

    A survey on author profiling, deception, and irony detection for the Arabic language

    The possibility of knowing people's traits on the basis of what they write is a field of growing interest named author profiling. Inferring a user's gender, age, native language, language variety, or even whether the user lies, simply by analyzing her texts, opens a wide range of possibilities from the point of view of security. In this paper, we review the state of the art about some of the main author profiling problems, as well as deception and irony detection, focusing especially on the Arabic language.
    Qatar National Research Fund, Grant/Award Number: NPRP 9-175-1-033
    Rosso, P.; Rangel-Pardo, FM.; Hernandez-Farias, DI.; Cagnina, L.; Zaghouani, W.; Charfi, A. (2018). A survey on author profiling, deception, and irony detection for the Arabic language. Language and Linguistics Compass. 12(4):1-20. https://doi.org/10.1111/lnc3.12275

    Survey on IoT: Security Threats and Applications

    The rapid growth of the Internet of Things (IoT) worldwide in recent years is due to its wide range of usability, adaptability, and smartness. Most IoT applications perform their jobs automatically, without interaction with humans or physical objects. Current and upcoming devices are required to be smart, efficient, and able to provide services to users so that this new technology is implemented in a secure manner. Researchers are therefore exploring security issues day by day. IoT devices are mostly portable and lightweight, so they face several issues such as battery consumption and memory; and because these devices operate in the open, the most important issue is security. In this survey paper, we elaborate on security attacks with reference to the different IoT layers. Finally, we present some applications of the IoT. This study will assist researchers and manufacturers in evaluating and reducing the range of attacks on IoT devices.

    Smart Learning

    Artificial intelligence applied to the educational field has vast potential, especially after the worldwide effects of the COVID-19 pandemic. Online or blended educational modes are needed to respond to the health situation we are living in. The tutorial effort is higher than in the traditional face-to-face approach. Thus, educational systems are calling for smarter learning technologies that are not intended to replace the faculty but to make their teaching activities easier. This Special Issue presents a collection of papers on original advances in educational applications and services propelled by artificial intelligence, big data, machine learning, and deep learning.

    A study on IoT-related security issues, challenges, and solutions.

    The Internet of Things is now being developed to be the most cutting-edge and user-centric technology in the works. The goal of this endeavour is to raise both an individual's and society's standard of living. As a technology advances, it inevitably acquires flaws that are open to attack and exploitation. In this work, the problems posed by the Internet of Things (IoT) are discussed on the basis of the fundamental security principles of confidentiality, integrity, and availability. An overview is also given of the security restrictions, requirements, processes, and solutions implemented for the challenges that arise in secure communication inside the IoT ecosystem. The vulnerabilities of the underlying IoT network are brought to light, and security concerns on multiple tiers of the IoT ecosystem are examined. Based on the findings of our research into current vulnerabilities, a variety of potential solutions are proposed to address the ongoing problems plaguing the IoT ecosystem. In addition, the paper provides an overview of the various protocols used for security in IoT.

    A novel approach for partial shape matching and similarity based on data envelopment analysis

    Due to the growing number of 3D objects in digital libraries, the task of searching and browsing models in an extensive 3D database has been the focus of considerable research. In the last decade, several approaches to retrieving 3D models based on shape similarity have been proposed. The majority of the existing methods address the problem of similarity between objects as a global matching problem. Consequently, most of these techniques do not support using a part of an object as a query, and they perform poorly on classes with globally dissimilar shape models and on articulated objects. Partial matching seems to be a suitable solution to these problems. In this paper, we address the problem of shape matching and retrieval. We propose a new approach based on partial matching in which each 3D object is segmented into its constituent parts, and shape descriptors are computed from these parts to compare similarities. Several experiments showed that our technique enables fast computation for content-based 3D shape retrieval and significantly improves on the results of our earlier method, which uses a Data Envelopment Analysis descriptor for global matching.

    Blockchain Challenges and Security Schemes: A Survey

    With the increasing number of connected devices and online transactions today, managing all these transactions and devices while maintaining network security is a research issue. Current solutions are mainly based on cloud computing infrastructures, which require high-end servers and broadband networks to provide data storage and computing services. These solutions have a number of significant disadvantages, such as the high maintenance costs of centralized servers, critical weaknesses in Internet of Things applications, and security and trust issues. The blockchain is seen as a promising technique for addressing these security issues and designing new decentralization frameworks. Moreover, this new technology has great potential in the most diverse technological fields. In this paper, we present an overview of blockchain technology, highlighting its advantages, limitations, and areas of application. The originality of this work resides in the comparison between the different blockchain systems and their security schemes, and in the perspective of integrating this technology into secure system models for our comfort and our private life.