
    Sentiment Analysis or Opinion Mining: A Review

    Opinion Mining (OM) or Sentiment Analysis (SA) can be defined as the task of detecting, extracting and classifying opinions about something. It is a natural language processing (NLP) task used to track the public mood toward a particular law, policy, marketing campaign, etc. It involves developing methods to collect and examine comments and opinions about legislation, laws, policies, etc. that are posted on social media. Information extraction is a very useful technique but also a challenging task: extracting sentiment about an object at web scale requires automated opinion-mining systems. Existing techniques for sentiment analysis include machine learning (supervised and unsupervised) and lexicon-based approaches. The main aim of this paper is to present a survey of sentiment analysis (SA) and opinion mining (OM) approaches and of the various techniques used in this field. It also discusses the application areas and challenges of sentiment analysis, with insight into past researchers' work.
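
    The lexicon-based approach mentioned above can be sketched in a few lines: each word carries a prior polarity score, and a text's label follows from the sign of the summed scores. This is a minimal illustration under stated assumptions, not a method taken from the surveyed papers; the tiny `LEXICON`, the `NEGATORS` set and the one-word negation rule are all hypothetical choices made for the example.

```python
# Hypothetical toy lexicon: word -> prior polarity score (illustration only).
LEXICON = {
    "good": 1.0, "great": 2.0, "fair": 1.0, "support": 1.0,
    "bad": -1.0, "terrible": -2.0, "unfair": -1.0, "oppose": -1.0,
}
# Hypothetical negation words that flip the polarity of the next token.
NEGATORS = {"not", "no", "never"}


def lexicon_sentiment(text: str) -> str:
    """Label text positive, negative or neutral from summed lexicon scores."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0.0)
        # A preceding negator flips this token's polarity, then is cleared.
        score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("The new policy is good and fair."))  # positive
print(lexicon_sentiment("This law is not good."))             # negative
```

    Real lexicon-based systems use large curated lexicons and richer negation and intensifier handling; the sketch only shows the scoring principle.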

    Schema Matching for Large-Scale Data Based on Ontology Clustering Method

    Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge in holistic schema matching lies in selecting a method that maintains both effectiveness and efficiency. Effectiveness refers to the quality of matching, while efficiency refers to the time and memory consumed by the matching process. Several approaches have been proposed for holistic schema matching, mainly based on clustering techniques, which aim to group similar fields within the schemas into multiple clusters. However, schema fields carry complex semantic relations that vary with the schema level. An ontology, which is a hierarchy of taxonomies, can identify semantic correspondences at various levels. Hence, this study proposes an ontology-based clustering approach for holistic schema matching. Two datasets from the ICQ query interfaces, Airfare and Job, consisting of 40 interfaces, have been used. The ontology used in this study has been built using XBenchMatch, a benchmark lexicon that contains rich semantic correspondences for schema matching. To perform schema matching with the ontology, a rule-based clustering approach is used with multiple distance measures, including Dice, Cosine and Jaccard. The evaluation uses the common information retrieval metrics: precision, recall and F-measure. To assess the performance of the proposed ontology-based clustering, two experiments have been compared. The first experiment applies the ontology-based clustering approach (i.e., ontology plus rule-based clustering), while the second applies traditional clustering approaches without the ontology.
Results show that the proposed ontology-based clustering approach outperformed the traditional clustering approaches without ontology, achieving an F-measure of 94% for the Airfare dataset and 92% for the Job dataset. This emphasizes the strength of ontology in identifying correspondences with semantic-level variation.
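
    The Dice, Cosine and Jaccard measures named above can be illustrated on schema field labels treated as sets of word tokens. The sketch below is an assumption-laden illustration, not the study's rule-based clustering: the tokenizer, the set-based form of cosine, the greedy `cluster_fields` helper, the example labels and the 0.5 threshold are all hypothetical. Purely lexical similarity of this kind cannot relate synonymous labels that share no tokens, which is the gap the ontology is meant to close.

```python
import math
import re


def tokens(label: str) -> set[str]:
    """Split a field label such as 'Departure_City' into lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def dice(a: set[str], b: set[str]) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def cosine(a: set[str], b: set[str]) -> float:
    # Set (binary-vector) form of cosine similarity.
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def cluster_fields(fields, sim=jaccard, threshold=0.5):
    """Greedy clustering: put each field in the first cluster whose
    first member is similar enough, otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for f in fields:
        for c in clusters:
            if sim(tokens(f), tokens(c[0])) >= threshold:
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters


fields = ["departure city", "Departure_City", "return date",
          "date of return", "number of adults"]
print(cluster_fields(fields))
# [['departure city', 'Departure_City'], ['return date', 'date of return'], ['number of adults']]
```

    Swapping `sim=dice` or `sim=cosine` changes only the scoring; for "return date" versus "date of return" the three measures give 2/3, 0.8 and about 0.816 respectively.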

    Classification of Encouragement (Targhib) And Warning (Tarhib) Using Sentiment Analysis on Classical Arabic

    The Holy Qur’an is the main religious text of Islam. The Qur’an has its own methods of Targhib (encouragement) and Tarhib (warning), which are important features of the Qur’an. Most Quranic verses urge and encourage people to do right and good deeds, and also warn them against committing evil and bad deeds. Classifying a text into two opposing opinions has previously been used to solve the problem of sentiment analysis; here it is applied to distinguishing Targhib (encouragement) from Tarhib (warning) verses in the Qur’an. Each verse of the Qur’an can be treated as encouragement, warning or neutral. The language of the Holy Qur’an is one of the most challenging natural languages for sentiment analysis. The aim of this work is to classify verses of encouragement and warning using sentiment analysis and NLP techniques. Sentiment analysis classification can follow several approaches, such as the machine learning approach, the lexicon-based approach and the hybrid approach. This work applies the machine learning approach and evaluates the impact of techniques such as POS tagging, N-grams and correlation-based feature selection. An accuracy of 95.6% was achieved using Naïve Bayes (NB) and 91.5% using Support Vector Machines (SVM). This study is significant for extracting information and knowledge from the Holy Qur’an, both for researchers in Islamic studies and for non-specialist researchers.
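
    As a rough sketch of the N-gram and Naïve Bayes combination described above, the code below trains a multinomial Naïve Bayes classifier with Laplace smoothing on unigram and bigram counts. It assumes whitespace tokenization and two classes only; POS tagging, correlation-based feature selection, the neutral class and the SVM comparison from the study are not reproduced, and the toy English training sentences are invented for illustration, not taken from the Qur'an or the study's data.

```python
import math
from collections import Counter, defaultdict


def ngrams(text: str, n_max: int = 2) -> list[str]:
    """Unigram and bigram features from whitespace-tokenized, lowercased text."""
    words = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(feature | class)
            score = math.log(count / n_docs)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for f in ngrams(doc):
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences, for illustration only.
docs = [
    "do good deeds and you will be rewarded",
    "give charity and receive a great reward",
    "beware of the punishment for evil deeds",
    "those who do evil will face a severe punishment",
]
labels = ["targhib", "targhib", "tarhib", "tarhib"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("a great reward for good deeds"))  # targhib
```

    Working in log space avoids numeric underflow when many small feature probabilities are multiplied.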

    Named entity recognition for quranic text using rule based approaches

    The variety of and differences between textual domains require customizing Natural Language Processing components, especially Named Entity Recognition (NER), since different domains contain different types of entities. Current NER models are deemed unfit to accurately extract entities from Quranic text because of its unique content. This paper describes the construction of a rule-based NER method that extracts the entities in the English translation of the meaning of the Quranic text, and evaluates its performance. Named entity tagging is a common text-annotation task in which entities (nouns) in unstructured text are identified and assigned a class. A few rules are built to extract several types of entities, such as the names of prophets and people, creation, location, time, and the various names of God. The rules are built mainly using regular expressions and gazetteers. They achieve high precision and recall, with an F-score of over 90%. The results of this experiment can be used as annotations for building a machine learning model to extract entities from the same domain, specifically Quranic text, or more generally Islamic-domain text.
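
    The regular-expression and gazetteer rules described above can be sketched as follows. The gazetteer entries, the `TIME_PATTERN` rule and the `tag_entities` helper are hypothetical stand-ins, since the paper's actual rule set is not given here; the sketch also does not resolve overlapping matches, which a real rule-based tagger would need to handle.

```python
import re

# Hypothetical gazetteers (entity class -> known names); illustration only.
GAZETTEERS = {
    "PROPHET": ["Moses", "Abraham", "Noah"],
    "LOCATION": ["Egypt", "Mecca"],
    "GOD": ["Allah", "the Most Merciful"],
}
# Hypothetical regular-expression rule for time expressions.
TIME_PATTERN = re.compile(
    r"\b(?:the Day of (?:Judgement|Resurrection)|the night|the morning)\b"
)


def tag_entities(text: str) -> list[tuple[str, str, int]]:
    """Return (entity text, class, start offset) triples sorted by offset."""
    found = []
    # Gazetteer rules: whole-word matches of each listed name.
    for label, names in GAZETTEERS.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((m.group(), label, m.start()))
    # Pattern rules: regular expressions for time expressions.
    for m in TIME_PATTERN.finditer(text):
        found.append((m.group(), "TIME", m.start()))
    return sorted(found, key=lambda e: e[2])


sentence = ("Moses was sent to the people of Egypt, and Allah "
            "will judge them on the Day of Judgement.")
for entity in tag_entities(sentence):
    print(entity)
```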

    The effectiveness of url features on phishing emails classification using machine learning approach

    Phishing email classification requires well-chosen features to achieve good accuracy. One reason for the slow development of phishing-email detection models is the complexity of feature selection. Feature selection is essential for obtaining good classification results; commonly used features come from the header, the body, and Uniform Resource Locators (URLs). Besides the body text, the URL is one of the leading indicators of a successful phishing attack. The URL is commonly placed in the body of the phishing email to attract the victim's attention, and it redirects the victim to a fake website that collects their personal information. Little is known about how URL features affect phishing email classification results. Therefore, this work uses URL features to determine whether an email is phishing or legitimate using machine learning approaches. Two public datasets are used: the Online Phishing Corpus and the Enron Corpus. The URL features are extracted using the Beautiful Soup library. Two machine learning classifiers are used: Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experiments were divided into two, based on the features used: the first used raw email data with URL features, while the second used only raw email data. The first experiment showed higher accuracy for both classifiers, SVM and ANN. Hence, this research shows that selecting URL features improves classification performance.
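
    A sketch of URL feature extraction from an HTML email body follows. The study used the Beautiful Soup library; this version swaps in Python's standard-library `html.parser` so it runs without dependencies. The specific features (link count, IP-address hosts, longest URL, and anchor text naming a different host than the link target) and the `url_features` helper are illustrative assumptions, not the paper's feature set.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags in an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


IP_HOST = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def url_features(html_body: str) -> dict:
    parser = LinkCollector()
    parser.feed(html_body)
    hosts = [urlparse(href).hostname or "" for href, _ in parser.links]
    return {
        "num_links": len(parser.links),
        "has_ip_host": any(IP_HOST.match(h) for h in hosts),
        "max_url_length": max((len(href) for href, _ in parser.links), default=0),
        # Anchor text that itself looks like a URL but names a different host.
        "text_href_mismatch": any(
            text.startswith("http") and urlparse(text).hostname != host
            for (_, text), host in zip(parser.links, hosts)
        ),
    }


phish = ('<p>Verify your account: <a href="http://192.168.10.5/login">'
         'https://www.example-bank.com</a></p>')
print(url_features(phish))
```

    A numeric vector built from such a dictionary could then be passed to an SVM or ANN classifier alongside the raw email features.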

    Preliminary study on an ontology learning from textual data

    Natural language understanding is needed to intelligently handle the large volumes of information on the WWW, which have grown explosively over the last decade. Ontologies may help with analyzing and understanding text, since an ontology can represent objects, concepts and other entities and the relationships between them. Ontologies may be used as a tool for finding the possible meanings of words in a text, and the meaning of text in general. Much ontology development is now directed towards extraction from textual data, as human language is a primary mode of knowledge transfer. The aim of this paper is to give a general overview and preliminary study of ontology learning from text, which plays a prominent role in knowledge retrieval, and of how ontological semantics can be improved through the adoption of Semantic Web technologies.

    Pashto language stemming algorithm

    This paper presents a stemming algorithm for the morphological analysis of a less popular, minor language such as Pashto. Resources and tools are lacking for applications such as document indexing, clustering, language processing, text analysis, database search systems, information retrieval, and linguistic applications. The literature review shows that only a few morphological studies have been conducted on Pashto, and research focused on automatic stemming has not yet been fully analysed. In addition, no stemming algorithm applicable to the functions mentioned above has been proposed for extracting Pashto root words from a Pashto corpus. Therefore, the objective of the current thesis is to develop a rule-based stemming algorithm for Pashto. The Pashto corpus is used directly as input, and the stemming algorithm handles both inflectional and derivational morphemes. The output is a meaningful root word without affixes. The accuracy and strength of the proposed algorithm are evaluated using the word count method. To validate the developed algorithm, two native speakers of Pashto were recruited to evaluate it in terms of accuracy and strength. The results show that the proposed algorithm has an accuracy of 87%. This study can contribute substantially to Pashto by extracting root words useful for purposes including data indexing, information retrieval and linguistic applications. It also lays the groundwork for further studies on Pashto language analysis.
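
    A rule-based affix-stripping stemmer of the general kind described above can be sketched as below, together with a simple accuracy measure (correctly stemmed words divided by total words). The Pashto affix rules are not given in the abstract, so the example uses placeholder English affixes purely to exercise the logic; the longest-match-first order, the `min_stem` length guard and both helper functions are assumptions, not the thesis's algorithm.

```python
def stem(word: str, prefixes: list[str], suffixes: list[str], min_stem: int = 2) -> str:
    """Strip at most one prefix and one suffix (longest match first),
    never leaving fewer than min_stem characters."""
    for p in sorted(prefixes, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break
    for s in sorted(suffixes, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[: -len(s)]
            break
    return word


def accuracy(words, gold_stems, prefixes, suffixes) -> float:
    """Share of words whose stem matches the reference (e.g. native-speaker) stem."""
    correct = sum(stem(w, prefixes, suffixes) == g for w, g in zip(words, gold_stems))
    return correct / len(words)


# Placeholder English affixes, only to exercise the logic.
PREFIXES = ["un"]
SUFFIXES = ["ness", "s"]
print(stem("unkindness", PREFIXES, SUFFIXES))  # kind
print(accuracy(["unkindness", "books", "news"], ["kind", "book", "news"],
               PREFIXES, SUFFIXES))
```

    The `news` case shows over-stemming: a length guard alone cannot tell a suffix from part of a root, which is one reason native-speaker evaluation is useful.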

    Ontology extraction

    Ontology is an important emerging discipline with huge potential to improve information organization, management and understanding. Ontology has become an important means of structuring knowledge and building knowledge-intensive systems. The importance of domain ontologies is widely recognized, particularly in relation to the expected advent of the Semantic Web. The term refers to a shared understanding of some domain of interest, often conceived as a set of concepts, relations, functions, axioms and instances (Gruber, 1993); the goal of a domain ontology is to reduce conceptual and terminological confusion among the members of a virtual community of users who need to share electronic documents and information of various kinds. According to Uschold and Jasper (1999), ‘An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are interrelated which collectively impose a structure on the domain and constrain the possible interpretations of terms.’ Gruber (1993) defines ontology as ‘the specification of conceptualizations, used to help programs and humans share knowledge’. The conceptualization is the couching of knowledge about the world in terms of entities (things, the relationships they hold and the constraints between them). The specification is the representation of this conceptualization in a concrete form. One step in this specification is the encoding of the conceptualization in a knowledge representation language.
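
    To make the definition above concrete, here is a minimal, hypothetical in-memory ontology with concepts linked by is-a relations, typed instances, and named relations between instances. The `Ontology` class and the example vocabulary are illustrative assumptions; functions and general axioms from Gruber's list are not modeled, and a real system would encode the conceptualization in a knowledge representation language such as OWL rather than in Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concepts with is-a links, typed instances, and relations."""
    parents: dict[str, set[str]] = field(default_factory=dict)         # concept -> direct superconcepts
    instance_of: dict[str, str] = field(default_factory=dict)          # instance -> concept
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (subject, relation, object)

    def add_concept(self, name: str, *supers: str) -> None:
        self.parents.setdefault(name, set()).update(supers)
        for s in supers:
            self.parents.setdefault(s, set())

    def ancestors(self, concept: str) -> set[str]:
        """All superconcepts reachable through is-a links."""
        seen, stack = set(), [concept]
        while stack:
            for p in self.parents.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def is_a(self, instance: str, concept: str) -> bool:
        """Does the instance belong to the concept directly or by inheritance?"""
        direct = self.instance_of[instance]
        return direct == concept or concept in self.ancestors(direct)


onto = Ontology()
onto.add_concept("Document")
onto.add_concept("Report", "Document")
onto.add_concept("Person")
onto.instance_of["q3-report"] = "Report"
onto.instance_of["alice"] = "Person"
onto.relations.add(("alice", "authorOf", "q3-report"))
print(onto.is_a("q3-report", "Document"))  # True
```

    The is-a links act as the structure that, in Uschold and Jasper's words, constrains the possible interpretations of terms: a report is also a document, but a person is not.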

    Towards context-sensitive domain of Islamic knowledge ontology extraction

    Ontology is one of the essential topics in current computer science and the Semantic Web. Ontologies present a well-defined, straightforward and standardized form of knowledge repository (vast and reliable knowledge) that is interoperable and machine-understandable. Ontologies have many possible uses, from the automatic annotation of web resources to domain representation and reasoning tasks. Ontologies are an effective means of conceptualization for the Semantic Web. However, no research has yet attempted to construct an ontology of Islamic knowledge, which comprises the Holy Quran, the Hadiths, etc. Therefore, as a first stage, this paper proposes a simple methodology for extracting concepts from Al-Quran. Finally, we discuss the experiment that has been conducted.