
    Classifier combination approach for question classification for Bengali question answering system

    Question classification (QC) is a prime constituent of an automated question answering system. The work presented here demonstrates that a combination of multiple models achieves better classification performance than existing individual models for the QC task in Bengali. We have exploited state-of-the-art model combination techniques, i.e., ensemble, stacking and voting, to increase QC accuracy. Lexical, syntactic and semantic features of Bengali questions are used for four well-known classifiers, namely Naive Bayes, kernel Naive Bayes, Rule Induction and Decision Tree, which serve as our base learners. The single-layer question-class taxonomy with 8 coarse-grained classes is extended to a two-layer taxonomy by adding 69 fine-grained classes. We carried out the experiments on both the single-layer and two-layer taxonomies. Experimental results confirmed that classifier combination approaches outperform single-classifier approaches by 4.02% for coarse-grained question classes. Overall, the stacking approach produces the best results for fine-grained classification and achieves 87.79% accuracy. The approach presented here could be used in other Indo-Aryan or Indic languages to develop a question answering system.

    Somnath Banerjee and Sudip Kumar Naskar are supported by Digital India Corporation (formerly Media Lab Asia), MeitY, Government of India, under the Visvesvaraya Ph.D. Scheme for Electronics and IT. The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project PGC2018-096212-B-C31.

    Banerjee, S.; Kumar Naskar, S.; Rosso, P.; Bandyopadhyay, S. (2019). Classifier combination approach for question classification for Bengali question answering system. Sadhana. 44(12):1-14. https://doi.org/10.1007/s12046-019-1224-8
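As a rough illustration of the voting style of classifier combination mentioned in this abstract, the sketch below combines three toy base classifiers by majority vote. It is not the authors' method: the rule-based classifiers, the English stand-in questions, and the labels `PERSON`, `LOCATION`, `TIME` and `OTHER` are all hypothetical placeholders chosen for the example.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical type alias: a base learner maps a question to a class label.
Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], question: str) -> str:
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the earliest classifier in the sequence.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    predictions: List[str] = [clf(question) for clf in classifiers]
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk predictions in classifier order so ties resolve deterministically.
    return next(label for label in predictions if counts[label] == best)


# Toy base classifiers (assumptions for illustration only).
def by_wh_word(question: str) -> str:
    words = question.lower().split()
    first = words[0] if words else ""
    return {"who": "PERSON", "where": "LOCATION", "when": "TIME"}.get(first, "OTHER")


def by_keyword(question: str) -> str:
    text = question.lower()
    if "city" in text or "country" in text:
        return "LOCATION"
    if "year" in text or "date" in text:
        return "TIME"
    return "OTHER"


def always_other(question: str) -> str:
    return "OTHER"


BASE_LEARNERS = [by_wh_word, by_keyword, always_other]
```

Stacking differs from voting in that a further classifier is trained on the base learners' predictions instead of counting them; that step is omitted here.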

    Improved Chinese Language Processing for an Open Source Search Engine

    Natural Language Processing (NLP) is the analysis of human languages by computers. It spans many areas, including speech recognition, natural language understanding, and natural language generation. Information retrieval and natural language processing for Asian languages pose unique challenges not present for Indo-European languages, among them text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of, and experiments with, improvements to the Chinese language processing sub-component of an open source search engine, Yioop. In particular, we rewrote and improved the following sub-systems of Yioop to try to make them as state-of-the-art as possible: Chinese text segmentation, part-of-speech (POS) tagging, named entity recognition (NER), and the question answering system. Compared to the previous system, we achieved a 9% improvement in Chinese word segmentation accuracy. We built a POS tagger with 89% accuracy and an NER system with 76% accuracy.

    Comparative Study of College Admission System Based on Baccalauréat in Cameroon and Gaokao in China

    This study aims to answer the question of how the Cameroonian and Chinese college admission systems resemble and differ from each other, and which factors determine students' qualifications for higher education. Drawing on documentary sources, interviews, observation and personal experience, the authors first give an overview of admission systems around the world and then present the college admission systems of both countries, Cameroon and China. They then analyze the administrative structure, exam content, examination period and release of results, assessment, and enrollment policies. Finally, based on some of the best practices showcased by both countries, they offer practical suggestions that each country could implement. Keywords: Comparative study; College admission system; Higher education; Cameroon; China

    Question Paraphrase Generation for Question Answering System

    Queries to a practical Question Answering (QA) system range from keywords and phrases to badly written questions and, occasionally, grammatically perfect questions. Among the different question analysis approaches, pattern matching works well for analyzing such queries. Building a pattern matching module is costly, however, because a tremendous amount of manual labor is needed to extend its coverage to the many variations found in natural language questions. This thesis proposes to save that manual labor through paraphrase generation, which can automatically generate semantically similar paraphrases of a natural language question. Previous approaches to paraphrase generation either require a large-scale corpus and a dependency parser, or handle only simple question queries of the relation-entity type. By introducing a method for inferring transformation operations between paraphrases, together with a description of sentence structure, this thesis develops a paraphrase generation method and implements it for Chinese with a very limited amount of corpus data. The evaluation results show that the implementation can help humans efficiently create a pattern matching module for QA systems: it greatly outperforms human editors in coverage of natural language questions, with acceptable precision in the generated paraphrases.

    LCC-DCU C-C question answering task at NTCIR-5

    This paper describes our participation in the NTCIR-5 Chinese to Chinese Question Answering task. Our strategy is based on the “Retrieval plus Extraction” approach. We first retrieve relevant documents, then retrieve short passages from those documents, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which can cover most questions. The Lemur toolkit with the OKAPI model is used for document retrieval. Results of our task submission are given and some preliminary conclusions are drawn.
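For readers unfamiliar with Okapi-style ranking, the sketch below scores tokenized documents against a query with the widely used BM25 formula. It illustrates the general technique only: it does not reproduce the Lemur toolkit's implementation or the paper's settings, and the function name `bm25_scores`, the defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.2,   # term-frequency saturation (assumed default)
    b: float = 0.75,   # document-length normalization (assumed default)
) -> List[float]:
    """Return one BM25 score per document for the given query terms."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores: List[float] = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

In a retrieval-plus-extraction pipeline, the same scoring could in principle be applied first to whole documents and then to short passages, with answer extraction run only on the top-ranked passages.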

    An analysis of question processing of English and Chinese for the NTCIR 5 cross-language question answering task

    An important element in question answering systems is the analysis and interpretation of questions. Using the NTCIR 5 Cross-Language Question Answering (CLQA) question test set, we demonstrate that the accuracy of deep question analysis depends on the quantity and suitability of the available linguistic resources. We further demonstrate that applying question analysis tools developed on monolingual training materials to questions machine-translated from Chinese to English and from English to Chinese greatly reduces the effectiveness of question interpretation. This latter result indicates that question analysis for CLQA should primarily be conducted in the question language prior to translation.

    ICT-DCU question answering task at NTCIR-6

    This paper describes details of our participation in the NTCIR-6 Chinese-to-Chinese Question Answering task. We use the “retrieval plus extraction” approach to get answers for questions. We first split the documents into short passages, then retrieve potentially relevant passages for a question, and finally extract named entity answers from the most relevant passages. For question type identification, we use simple heuristic rules which cover most questions. The Lemur toolkit was used with the Okapi model for document retrieval. Results of our task submission are given and some preliminary conclusions are drawn.