56 research outputs found

    A Hybrid Question Answering System based on Ontology and Topic Modeling

    A Question Answering (QA) system is an application that provides accurate answers to natural language questions. However, QA systems built on a knowledge-based approach have a notable weakness: they require various triple patterns to be pre-defined in order to handle different question types. The goal of this paper is to propose an automated QA system using a hybrid approach that combines the knowledge-based and text-based approaches. Our approach requires only two SPARQL queries to retrieve candidate answers from the ontology, without defining any question patterns, and then uses a topic model to select the most relevant candidates as answers. We also investigate and evaluate different language models (unigram and bigram). Our results show that the proposed QA system performs beyond the random baseline and solves up to 44 out of 80 questions with a Mean Reciprocal Rank (MRR) of 38.73% using bigram LDA.
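
    The abstract above reports performance as Mean Reciprocal Rank (MRR). As a quick reference, the sketch below shows how MRR is computed over ranked candidate-answer lists; the toy questions and answers are illustrative and are not taken from the paper.

```python
# Minimal sketch of Mean Reciprocal Rank (MRR), the metric reported above.
# The ranked candidates and gold answers below are toy data, not the paper's.

def mean_reciprocal_rank(ranked_answers, gold_answers):
    """ranked_answers: one ranked candidate list per question.
    gold_answers: one set of correct answers per question, aligned by index."""
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / rank   # reciprocal rank of the first correct answer
                break
    return total / len(ranked_answers)

# Toy run: first question answered at rank 2, second at rank 1, third missed.
ranked = [["a", "b", "c"], ["x", "y"], ["p", "q"]]
gold = [{"b"}, {"x"}, {"z"}]
print(mean_reciprocal_rank(ranked, gold))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```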

    An empirical study on CO2 emissions in ASEAN countries


    Joint Distance and Information Content Word Similarity Measure

    Measuring semantic similarity between words is important to many applications in information retrieval and natural language processing. In this paper, we show that distance-based word similarity metrics suffer from the drawback of assigning equal similarity to word pairs that share the same path length and depth values in WordNet. Likewise, information content methods, which depend on word probabilities estimated from a corpus, exhibit the same drawback. This paper proposes a new hybrid semantic similarity measure that overcomes these drawbacks by exploiting the advantages of the Li and Lin methods. On benchmark sets of human judgments, the Miller-Charles and Rubenstein-Goodenough data sets, the proposed approach outperforms existing distance-based and information-content-based methods.
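
    The hybrid measure above combines a distance/depth component (Li) with an information-content component (Lin). The sketch below restates the two standard formulas and joins them with a simple weighted average; that weighting is an assumption for illustration, as the abstract does not give the paper's actual combination, and the path, depth and IC values are toy numbers rather than WordNet statistics.

```python
# Sketch of a hybrid word similarity combining Li's path/depth measure with
# Lin's information-content measure. The weighted-average combination and all
# numeric inputs are illustrative assumptions, not the paper's formulation.
import math

def li_similarity(path_length, depth, alpha=0.2, beta=0.45):
    # Li et al.: exponential decay with path length, scaled by the depth of
    # the subsuming concept ((e^bh - e^-bh)/(e^bh + e^-bh) == tanh(bh)).
    return math.exp(-alpha * path_length) * math.tanh(beta * depth)

def lin_similarity(ic_w1, ic_w2, ic_lcs):
    # Lin: information content of the least common subsumer, normalised by
    # the information content of the two words.
    return 2.0 * ic_lcs / (ic_w1 + ic_w2) if (ic_w1 + ic_w2) > 0 else 0.0

def hybrid_similarity(path_length, depth, ic_w1, ic_w2, ic_lcs, w=0.5):
    # Hypothetical joint score: two word pairs with identical path and depth
    # can still be distinguished by their corpus-based information content.
    return w * li_similarity(path_length, depth) + (1 - w) * lin_similarity(ic_w1, ic_w2, ic_lcs)

# Same path/depth, different IC values -> different hybrid scores.
print(hybrid_similarity(3, 5, ic_w1=7.1, ic_w2=6.8, ic_lcs=5.2))
print(hybrid_similarity(3, 5, ic_w1=9.4, ic_w2=8.9, ic_lcs=3.0))
```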

    An empirical study of feature selection for text categorization based on term weightage

    This paper proposes a local feature selection (FS) measure, namely the Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme, TF-IDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization, we believe that incorporating only relevant features can be highly effective. An experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results show that our proposed method performs comparably to the other FS measures, especially on collections with highly overlapping topics.
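
    As a rough illustration of the local, per-category selection idea described above, the sketch below scores each term by an average TF-IDF computed only over the documents of one category and keeps the top-ranked terms. The scoring function and toy corpus are stand-ins; the abstract does not give the actual CTD formula.

```python
# Generic sketch of local, per-category feature selection driven by a
# TF-IDF-style score. The exact CTD measure is not given in the abstract,
# so this average-TF-IDF scoring is an illustrative stand-in.
import math
from collections import Counter, defaultdict

def select_local_features(docs_by_category, k=3):
    """docs_by_category: {category: [tokenised documents]} -> top-k terms per category."""
    # Document frequency over the whole collection, for the IDF component.
    all_docs = [doc for docs in docs_by_category.values() for doc in docs]
    n_docs = len(all_docs)
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))

    selected = {}
    for category, docs in docs_by_category.items():
        scores = defaultdict(float)
        for doc in docs:                       # only documents of the relevant category
            tf = Counter(doc)
            for term, count in tf.items():
                idf = math.log(n_docs / df[term])
                scores[term] += (count / len(doc)) * idf
        ranked = sorted(scores, key=lambda t: scores[t] / len(docs), reverse=True)
        selected[category] = ranked[:k]
    return selected

corpus = {
    "sports": [["match", "goal", "team"], ["team", "coach", "goal"]],
    "finance": [["market", "stock", "team"], ["stock", "bond", "rate"]],
}
print(select_local_features(corpus, k=2))
```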

    A Framework to Predict Software “Quality in Use” from Software Reviews

    Software reviews have proven to be a good source of information about users’ experience. Software “quality in use” concerns meeting users’ needs. Current software quality models, such as McCall and Boehm, are built to support the software development process rather than users’ perspectives. In this paper, opinion mining is used to extract and summarize software “quality in use” from software reviews. A framework to detect software “quality in use” as defined by the ISO/IEC 25010 standard is presented here. The framework employs opinion-feature double propagation to expand predefined lists of software “quality in use” features into domain-specific features. Clustering is then used to group the extracted software features into “quality in use” characteristic groups. Preliminary results on the extracted software features are promising.
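
    The clustering step mentioned above can be pictured as grouping extracted feature phrases into “quality in use” characteristic clusters. The sketch below uses TF-IDF vectors and k-means from scikit-learn as a generic, assumed choice; the abstract does not name the algorithm or representation the framework actually uses, and the review phrases are invented.

```python
# Assumed, minimal sketch of clustering review-mined feature phrases into
# characteristic groups; not the paper's specific method.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical feature phrases mined from software reviews.
feature_phrases = [
    "app crashes on startup", "frequent crashes after update",
    "very easy to use", "intuitive and simple interface",
    "battery drains quickly", "uses too much battery",
]

vectors = TfidfVectorizer().fit_transform(feature_phrases)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for phrase, label in zip(feature_phrases, labels):
    print(label, phrase)
```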

    Assessing Malaysian University English Test (MUET) Essay on Language and Semantic Features Using Intelligent Essay Grader (IEG)

    Automated Essay Scoring (AES) refers to Artificial Intelligence (AI) applications with the “intelligence” to assess and score essays. Several well-known commercial AES systems have been adopted in Western countries, and much research has investigated automated essay scoring. However, most of these products and studies are not related to the Malaysian English test context. AES products tend to score essays based on the scoring rubrics of a particular English test context (e.g., TOEFL, GMAT) by employing proprietary scoring algorithms that are not accessible to users. In Malaysia, the research and development of AES is scarce. This paper formulates a Malaysia-based AES, namely the Intelligent Essay Grader (IEG), for the Malaysian English test environment, using our collection of two Malaysian University English Test (MUET) essay datasets. We proposed an essay scoring rubric based on language and semantic features. We analyzed the correlation of the proposed language and semantic features with the essay grade using the Pearson Correlation Coefficient, and constructed an essay scoring model to predict essay grades. We found that language features such as vocabulary count and advanced part-of-speech usage were highly correlated with the essay grades, and that the language features showed a greater influence on essay grades than the semantic features. From the prediction model, we observed that accuracy improved when the model used the selected highly correlated essay features, followed by the language features.
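
    The analysis described above, correlating each feature with the grade via the Pearson Correlation Coefficient and then fitting a prediction model, can be sketched as below. The feature names echo those mentioned in the abstract, but the numbers and the linear model are hypothetical; the MUET data and the paper’s actual scoring model are not available here.

```python
# Sketch of the feature/grade correlation analysis and a simple grade
# predictor. All values are toy data; the linear model is an assumed choice.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

# Toy feature matrix: columns are [vocabulary_count, advanced_pos_ratio].
X = np.array([[180, 0.12], [240, 0.18], [300, 0.25], [150, 0.10], [280, 0.22]])
grades = np.array([3, 4, 5, 2, 5])

for name, column in zip(["vocabulary_count", "advanced_pos_ratio"], X.T):
    r, p = pearsonr(column, grades)          # Pearson correlation with the grade
    print(f"{name}: r={r:.2f}, p={p:.3f}")

# Fit a linear model on the (highly correlated) features and score a new essay.
model = LinearRegression().fit(X, grades)
print(model.predict(np.array([[260, 0.20]])))
```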