101 research outputs found

Prediction of Part of Speech Tags for Punjabi using Support Vector Machines

Author: Dinesh Kumar
Gurpreet Josan
Publication date: 01/05/2020

Abstract: Part-of-Speech (POS

CiteSeerX

Natural language processing for similar languages, varieties, and dialects: A survey

Author: Nakov Preslav
Scherrer Yves
Zampieri Marcos
Publication date: 20/11/2020

There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects. Non peer reviewed

Helsingin yliopiston digitaalinen arkisto

Sentiment analysis in geo social streams by using machine learning technique

Author: Alqahtani Faisal
Dawood Omar Mhaidi Dawood
Kim Hong Yeol
Kumar Rakesh
Migliorato Massimiliano
Missous Mohamed
Monteverde Umberto
Sexton James
Young Robert
Publication date: 02/03/2018

Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies. Massive amounts of sentiment-rich data are generated on social media in the form of tweets, status updates, blog posts, reviews, etc. Different people and organizations are using this user-generated content for decision making. Symbolic techniques (knowledge-based approaches) and machine learning techniques are the two main approaches used to analyze sentiment in text. The rapid increase in the volume of sentiment-rich data on the web has resulted in increased interaction among researchers regarding sentiment analysis and opinion (Kaushik & Mishra, 2014). However, limited research has been conducted considering location as another dimension along with the sentiment-rich data. In this work, we analyze the sentiments of Geotweets, tweets containing latitude and longitude coordinates, and visualize the results in the form of a map in real time. We collect tweets from Twitter using its Streaming API, filtered by English language and location (bounding box). For those tweets which don’t have geographic coordinates, we geocode them using the geocoder from GeoPy. TextBlob, an open-source Python library, was used to calculate the sentiments of Geotweets. Map visualization was implemented using Leaflet. Plugins for clusters, heat maps and real-time updates have been used in this visualization. The visualization gives insight into location sentiments.
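
The filtering and scoring steps described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the `Tweet` class, the function names, and the tiny word lists are hypothetical, and the toy lexicon is a stand-in for TextBlob's polarity score so that the example runs with the standard library alone. Streaming collection, geocoding with GeoPy, and the Leaflet map are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon; a stand-in for TextBlob's polarity score.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


@dataclass
class Tweet:
    text: str
    lat: Optional[float] = None
    lon: Optional[float] = None


def in_bounding_box(tweet, west, south, east, north):
    """True if the tweet has coordinates inside the box (no antimeridian handling)."""
    if tweet.lat is None or tweet.lon is None:
        return False
    return west <= tweet.lon <= east and south <= tweet.lat <= north


def polarity(text):
    """Score in [-1, 1]: (positive - negative) / matched words, 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def score_geotweets(tweets, box):
    """Return (lat, lon, polarity) for every tweet located inside the box."""
    return [
        (t.lat, t.lon, polarity(t.text))
        for t in tweets
        if in_bounding_box(t, *box)
    ]
```

Each returned triple is the kind of record a map layer could consume; tweets without coordinates are simply skipped here, whereas the abstract describes geocoding them first.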

Directory of Open Access Journals

Repositório da Universidade Nova de Lisboa

The University of Manchester - Institutional Repository

Classifier combination approach for question classification for Bengali question answering system

Author: Banerjee Somnath
Bandyopadhyay Sivaji
Kumar Naskar Sudip
Rosso Paolo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2019

Question classification (QC) is a prime constituent of an automated question answering system. The work presented here demonstrates that a combination of multiple models achieves better classification performance than those obtained with existing individual models for the QC task in Bengali. We have exploited state-of-the-art multiple model combination techniques, i.e., ensemble, stacking and voting, to increase QC accuracy. Lexical, syntactic and semantic features of Bengali questions are used for four well-known classifiers, namely Naive Bayes, kernel Naive Bayes, Rule Induction and Decision Tree, which serve as our base learners. Single-layer question-class taxonomy with 8 coarse-grained classes is extended to two-layer taxonomy by adding 69 fine-grained classes. We carried out the experiments both on single-layer and two-layer taxonomies. Experimental results confirmed that classifier combination approaches outperform single-classifier classification approaches by 4.02% for coarse-grained question classes. Overall, the stacking approach produces the best results for fine-grained classification and achieves 87.79% of accuracy. The approach presented here could be used in other Indo-Aryan or Indic languages to develop a question answering system.

Somnath Banerjee and Sudip Kumar Naskar are supported by Digital India Corporation (formerly Media Lab Asia), MeitY, Government of India, under the Visvesvaraya Ph.D. Scheme for Electronics and IT. The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project PGC2018-096212-B-C31.

Banerjee, S.; Kumar Naskar, S.; Rosso, P.; Bandyopadhyay, S. (2019). Classifier combination approach for question classification for Bengali question answering system. Sadhana. 44(12):1-14. https://doi.org/10.1007/s12046-019-1224-8
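
Of the combination techniques this abstract names, voting is the simplest to show. The sketch below is only an illustration of majority voting, not the authors' system: the three rule-based base classifiers, the English toy questions, and the labels `PERSON`, `LOCATION` and `OTHER` are hypothetical stand-ins for the trained Naive Bayes, Rule Induction and Decision Tree learners and the Bengali taxonomy.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label predicted first.

    Assumes a non-empty list of predictions.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_ensemble(classifiers, question):
    """Ask every base classifier for a label and combine them by majority vote."""
    return majority_vote([clf(question) for clf in classifiers])


# Hypothetical base learners; real ones would be trained models.
def by_wh_word(question):
    words = question.lower().split()
    first = words[0] if words else ""
    return {"who": "PERSON", "where": "LOCATION"}.get(first, "OTHER")


def by_keyword(question):
    q = question.lower()
    if "city" in q or "country" in q:
        return "LOCATION"
    if "author" in q or "president" in q:
        return "PERSON"
    return "OTHER"


def always_other(question):
    return "OTHER"
```

Stacking, which the abstract reports as the best performer for fine-grained classes, would instead train a second-level model on the base classifiers' outputs; that step is not shown here.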

arXiv.org e-Print Archive

RiuNet

Learning to Behave: Internalising Knowledge

Publication venue: 'University Library/University of Twente'
Publication date: 21/11/2000

University of Twente Research Information

Discourse Analysis of Argumentative Essays of English Learners based on their CEFR Level

Author: Hanel Blaise
Publication date: 20/07/2023

This thesis aims to explore the relationship between discourse information and the CEFR-level (Common European Framework of Reference for Languages) in argumentative English learner essays. The study leverages two prominent frameworks: the Rhetorical Structure Theory (RST) and the Penn Discourse TreeBank (PDTB), to analyze essays obtained from The International Corpus Network of Asian Learners (ICNALE) and the Corpus and Repository of Writing (CROW). The research investigates the influence of different discourse relations and connectives on the language proficiency level of the writers, and further explores the potential of using discourse information as additional features for automated CEFR-level determination. The analysis of the collected essays reveals significant findings regarding the utilization of discourse relations by English learners. Notably, the RST relations of EXPLANATION and BACKGROUND are statistically used more often by writers with a CEFR level below fluency. In addition, as the CEFR level increases, the use of the PDTB relation of CONTINGENCY decreases. These results provide empirical evidence of the relationship between discourse relations and language proficiency, highlighting the differential usage patterns among learners at various CEFR levels. To validate these findings computationally, discourse relations and connectives are employed as supplementary features for machine learning models. The experimental results indicate that incorporating discourse information into the automated CEFR-level determination process leads to a mild increase in performance compared to relying solely on lexical and grammatical features. However, it is important to note that the proposed approach does not outperform the use of large language models, such as RoBERTa, which have demonstrated superior performance in various natural language processing tasks. 
Nevertheless, this study contributes valuable insights into the relationship between discourse relations and argumentative English learner essays. The findings highlight the potential influence of discourse relations on language proficiency and suggest avenues for further research and development in language assessment methodologies.
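
One simple way to picture "connectives as supplementary features" is a relative-frequency vector per essay. The sketch below is an assumption-laden illustration, not the thesis's pipeline: the five-word connective list is a small hypothetical sample, the function name is invented, and plain string matching ignores the sense ambiguity that a PDTB or RST parser would resolve.

```python
import re

# Hypothetical sample of explicit connectives; a real inventory is much larger.
CONNECTIVES = ("because", "however", "therefore", "although", "if")


def connective_features(essay):
    """Map each connective to its relative frequency among the essay's tokens."""
    tokens = re.findall(r"[a-z']+", essay.lower())
    n = len(tokens)
    return {c: (tokens.count(c) / n if n else 0.0) for c in CONNECTIVES}
```

A vector like this could be appended to lexical and grammatical features before training a classifier over CEFR levels.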

Concordia University Research Repository

EXPERIMENTAL-COMPUTATIONAL ANALYSIS OF VIGILANCE DYNAMICS FOR APPLICATIONS IN SLEEP AND EPILEPSY

Author: Yaghouby Farid
Publication venue: UKnowledge
Publication date: 01/01/2015

Epilepsy is a neurological disorder characterized by recurrent seizures. Sleep problems can co-occur with epilepsy, and adversely affect seizure diagnosis and treatment. In fact, the relationship between sleep and seizures in individuals with epilepsy is a complex one. Seizures disturb sleep and sleep deprivation aggravates seizures. Antiepileptic drugs may also impair sleep quality at the cost of controlling seizures. In general, particular vigilance states may inhibit or facilitate seizure generation, and changes in vigilance state can affect the predictability of seizures. A clear understanding of sleep-seizure interactions will therefore benefit epilepsy care providers and improve quality of life in patients. Notable progress in neuroscience research, particularly in sleep and epilepsy, has been achieved through experimentation on animals. Experimental models of epilepsy provide us with the opportunity to explore or even manipulate the sleep-seizure relationship in order to decipher different aspects of their interactions. Important in this process is the development of techniques for modeling and tracking sleep dynamics using electrophysiological measurements. In this dissertation, experimental and computational approaches are proposed for modeling vigilance dynamics, and their utility is demonstrated in nonepileptic control mice. The general framework of hidden Markov models is used to automatically model and track sleep state and dynamics from electrophysiological as well as novel motion measurements. In addition, a closed-loop sensory stimulation technique is proposed that, in conjunction with this model, provides the means to concurrently track and modulate vigilance dynamics in animals. The feasibility of the proposed techniques for modeling and altering sleep is demonstrated for experimental applications related to epilepsy. 
Finally, preliminary data from a mouse model of temporal lobe epilepsy are employed to suggest applications of these techniques and directions for future research. The methodologies developed here have clear implications for the design of intelligent neuromodulation strategies for clinical epilepsy therapy.
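
Tracking a hidden vigilance state from a stream of measurements is the decoding problem for a hidden Markov model, commonly solved with the Viterbi algorithm. The sketch below is a generic log-domain Viterbi decoder, not the dissertation's model: the function name and its parameter layout (nested dictionaries of probabilities) are assumptions, and it expects a non-empty observation sequence and strictly positive probabilities.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for the observations (log-domain Viterbi).

    Assumes a non-empty observation list and strictly positive probabilities.
    """
    # best[s] = log-probability of the best path that ends in state s
    best = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: prev_best[p] + math.log(trans_p[p][s]))
            best[s] = (
                prev_best[prev]
                + math.log(trans_p[prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In a usage example one might take two hypothetical states such as `"wake"` and `"sleep"` with discretized motion readings `"high"` and `"low"` as observations; any probabilities plugged in would be illustrative values rather than figures from the dissertation.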

University of Kentucky