3,060 research outputs found

    A Performance Evaluation of Classifiers Employ Language Dependent Tools for Indonesian Text

    This paper evaluates the performance of Maximum Entropy (MaxEnt), Support Vector Machine (SVM), and Naïve Bayes (NB) techniques for Indonesian text classification. The performance of the MaxEnt and SVM techniques is compared against a baseline NB technique. We also investigate the effect that language-dependent tools such as Indonesian stemming and stop word removal have on the classification performance of these techniques. To date, there has been no experimental report on the effect of an Indonesian stemmer on text classification accuracy. From our experiments, we conclude that maximum entropy generally performs better than the other classifiers. Language-dependent tools such as stemming and stop word removal have only a small effect on classification accuracy. However, the stemmed approach scored the highest average accuracy, and because it reduces the dimension of the feature vectors used in classification, it is a viable step in the pre-processing stage.

    ANNOTATED DISJUNCT FOR MACHINE TRANSLATION

    Most information on the Internet is available in English. However, most people in the world are not English speakers. Hence, a reliable Machine Translation (MT) tool would be of great advantage to them. There are many approaches to developing MT systems, including direct, rule-based/transfer, interlingua, and statistical approaches. This thesis focuses on developing an MT system for less-resourced languages, i.e. languages that lack an available grammar formalism, parser, and corpus, such as some languages in South East Asia. The nonexistence of bilingual corpora motivates us to use direct or transfer approaches. Moreover, the unavailability of a grammar formalism and parser for the target languages motivates us to develop a hybrid of the direct and transfer approaches, referred to as the hybrid transfer approach. This approach uses the Annotated Disjunct (ADJ) method. This method, based on the Link Grammar (LG) formalism, can theoretically handle one-to-one, many-to-one, and many-to-many word translations. It consists of a transfer rules module that maps source words in a source sentence (SS) to target words in the correct position in a target sentence (TS). The developed transfer rules are demonstrated on English → Indonesian translation tasks. An experimental evaluation is conducted to compare the performance of the developed system with available English-Indonesian MT systems. The developed ADJ-based MT system translated simple, compound, and complex English sentences in the present, present continuous, present perfect, past, past perfect, and future tenses with better precision than the other systems, with an accuracy of 71.17% on the Subjective Sentence Error Rate metric.

    A scoring rubric for automatic short answer grading system

    Over the past decades, research on automatic grading has attracted growing interest. These studies focus on how machines can help humans assess students' learning outcomes. Automatic grading enables teachers to assess students' answers more objectively, consistently, and quickly. Essay questions come in two types: long essays and short answers. Most previous research developed automatic essay grading (AEG) rather than automatic short answer grading (ASAG). This study aims to assess the sentence similarity of short answers to the questions and answers in Indonesian without any language semantic tool. The pre-processing steps consist of case folding, tokenization, stemming, and stopword removal. The proposed approach is a scoring rubric obtained by measuring sentence similarity using string-based similarity methods and a keyword matching process. The dataset consists of 7 questions, 34 alternative reference answers, and 224 student answers. The experimental results show that the proposed approach achieves a Pearson's correlation between 0.65419 and 0.66383, with a Mean Absolute Error (MAE) between 0.94994 and 1.24295. The proposed approach also raises the correlation value and lowers the error value for each method.
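
The two evaluation measures named in this abstract, Pearson's correlation and Mean Absolute Error, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation; the function names and the toy score lists are assumptions made up for the example.

```python
import math


def pearson_correlation(x, y):
    """Pearson's r between two equal-length score lists.

    Raises ZeroDivisionError if either list is constant (zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mean_absolute_error(x, y):
    """Average absolute difference between paired scores."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical scores, chosen only so the arithmetic is easy to follow.
human_scores = [1, 2, 3, 4, 5]
system_scores = [2, 4, 6, 8, 10]

r = pearson_correlation(human_scores, system_scores)
mae = mean_absolute_error(human_scores, system_scores)
```

In this toy case the system scores are a perfect linear function of the human scores, so the correlation is 1 even though the MAE is 3.0, which is why the two measures are usually reported together.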

    Analysis of the Impact of Vectorization Methods on Machine Learning-Based Sentiment Analysis of Tweets Regarding Readiness for Offline Learning

    Twitter users use social media to express emotions about a topic, whether as criticism or praise. Analyzing the opinions or sentiments in the tweets that users send can identify their emotions about a particular topic. This study aims to determine the impact of vectorization methods on public sentiment analysis regarding readiness for offline learning in Indonesia during the Covid-19 pandemic. The authors labeled sentiment using two different approaches: manually, and automatically using the NLP TextBlob library. We compared three vectorization methods: count vectorization, TF-IDF, and a combination of both. The feature vectors were then classified using three classification methods (naïve Bayes, logistic regression, and k-nearest neighbor) for both manual and automatic labeling. To assess the performance of the sentiment analysis models, we used accuracy, precision, recall, and F1-score as performance metrics. The results showed that the logistic regression classifier with the feature extraction technique that combines count vectorization and TF-IDF provided the best performance on both manually and automatically labeled data.
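
One way to read "a combination of both" vectorization methods is to concatenate each document's count vector with its TF-IDF vector. The sketch below illustrates that reading in plain Python. It is an assumption, not the authors' pipeline: the abstract does not say how the two were combined, the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` is one common variant chosen here for illustration, and the example tweets are invented.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of the distinct whitespace-separated tokens in docs."""
    return sorted({token for doc in docs for token in doc.split()})


def count_vector(doc, vocab):
    """Raw term counts of doc, ordered by vocab."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """Term count times a smoothed inverse document frequency (assumed variant)."""
    n_docs = len(docs)
    rows = [count_vector(doc, vocab) for doc in docs]
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    return [[count * weight for count, weight in zip(row, idf)] for row in rows]


def combined_vectors(docs):
    """Concatenate count and TF-IDF features for every document."""
    vocab = build_vocabulary(docs)
    counts = [count_vector(doc, vocab) for doc in docs]
    tfidf = tfidf_vectors(docs, vocab)
    return vocab, [c + t for c, t in zip(counts, tfidf)]


# Invented example tweets, already lower-cased and tokenized by spaces.
tweets = [
    "sekolah siap tatap muka",
    "sekolah belum siap",
    "belum siap belajar tatap muka",
]
vocab, features = combined_vectors(tweets)
```

Each row of `features` is twice as long as the vocabulary: the first half holds raw counts and the second half holds TF-IDF weights. Such rows could then be passed to any classifier, for example logistic regression.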

    MANHATTAN DISTANCE AND DICE SIMILARITY EVALUATION ON INDONESIAN ESSAY EXAMINATION SYSTEM

    Every learning process requires an evaluation tool to measure students' level of understanding. Evaluations can take the form of multiple-choice questions, short entries, or essays. Some studies find essay exams to be better than other types of evaluation. Automatic essay assessment is needed to save teachers time in correcting answers. However, the development of such assessment is still ongoing, with the aim of obtaining better accuracy than the methods currently used. To address this, the study proposes a comparative analysis of similarity methods for online essay exam assessment. The methods compared are Dice similarity and Manhattan distance. Both produce coefficient values, which are then compared with manual assessment on the same scale. The data comprise 2162 records, obtained from 50 students who answered 40 questions (politics, sports, lifestyle, and technology). The data obtained in this study can be used to support other research and can be accessed at www.indonesian-ir.org. This research shows that the Dice similarity scheme is more accurate than Manhattan distance.
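
The two measures compared in this abstract can be sketched on word tokens as follows. This is a minimal illustration under stated assumptions, not the system described: the abstract does not specify the tokenization or how coefficients are mapped to exam scores, so the whitespace tokenizer, the function names, and the English toy sentences are all hypothetical.

```python
from collections import Counter


def dice_similarity(reference, answer):
    """Dice coefficient on token sets: 2 * |A ∩ B| / (|A| + |B|), in [0, 1]."""
    ref_tokens, ans_tokens = set(reference.split()), set(answer.split())
    if not ref_tokens and not ans_tokens:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(ref_tokens & ans_tokens) / (len(ref_tokens) + len(ans_tokens))


def manhattan_distance(reference, answer):
    """Sum of absolute differences between the two term-count vectors."""
    ref_counts, ans_counts = Counter(reference.split()), Counter(answer.split())
    terms = set(ref_counts) | set(ans_counts)
    return sum(abs(ref_counts[t] - ans_counts[t]) for t in terms)


# Hypothetical reference answer and student answer.
reference_answer = "the cat sat on the mat"
student_answer = "the cat sat"

dice = dice_similarity(reference_answer, student_answer)        # 0.75
manhattan = manhattan_distance(reference_answer, student_answer)  # 3
```

Note the opposite directions: Dice is a similarity (higher means closer to the reference), while Manhattan is a distance (lower means closer) and is unbounded, so it needs some normalization before it can be compared with a manual score on a fixed scale.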

    Study and Implementation of Monolingual Approach on Indonesian Question Answering for Factoid and Non-Factoid Question


    Summarizing Text for Indonesian Language by Using Latent Dirichlet Allocation and Genetic Algorithm

    The number of documents, especially electronic ones, is increasing steadily, which makes managing them less effective and efficient. Document management is therefore essential. Automatic text summarization addresses this by producing summaries of text documents. The goal of this research is to produce a tool for summarizing documents in Bahasa Indonesia (the Indonesian language), satisfying the user's need for relevant and consistent summaries. The algorithm is based on sentence feature scoring using Latent Dirichlet Allocation, with a Genetic Algorithm determining the sentence feature weights. It is evaluated by summarization speed, precision, recall, F-measure, and several subjective evaluations. Extractive summaries of the original text documents can represent the important information in a single document in Bahasa Indonesia, with faster summarization than the manual process. The best F-measure is 0.556926 (with precision of 0.53448 and recall of 0.58134) at a summary ratio of 30%.
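
The reported F-measure is consistent with the balanced (F1) harmonic mean of the stated precision and recall, which can be checked with a short sketch. The function name is an assumption; the abstract does not state which F-measure variant was used, so F1 is assumed here.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract.
best_f = f_measure(0.53448, 0.58134)  # approximately 0.556926
```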

    An Attempt to Create an Automatic Scoring Tool of Short Text Answer in Bahasa Indonesia

    Closed questions offer poor information about a student's ability to manage and apply knowledge. Open questions, on the other hand, have the advantage that they can be used to gauge students' conceptual maturity and communication ability. However, scoring answers to open questions is non-trivial and time-consuming, so an automatic scoring tool becomes necessary. An attempt was made to create a scoring tool for open, short-text answers in Bahasa Indonesia that resembles the way school teachers score. Automatic scoring of a student answer was based on the similarity between the answer and predefined key answers. The scores from the proposed tool correlate with human scoring, so the model may be used to predict teacher scoring.