
    KACST Arabic Text Classification Project: Overview and Preliminary Results

    Electronically formatted Arabic free text can be found in abundance on the World Wide Web today, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining can help with text classification and categorization. Text classification aims to automatically assign a text to a predefined category based on identifiable linguistic features. This process has many useful applications, including, but not restricted to, e-mail spam detection, web page content filtering, and automatic message routing. This paper presents an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.
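    Assigning a text to a predefined category from its linguistic features can be illustrated with a minimal multinomial Naive Bayes classifier over word tokens. This is a generic sketch and not the KACST project's method. The class name, the naive whitespace tokenizer, and the toy spam/work training data are hypothetical. Real Arabic text would need language-specific tokenization.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_doc_counts = Counter()        # number of training documents per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        # Hypothetical, deliberately naive tokenizer: lowercase + whitespace split.
        return text.lower().split()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_doc_counts[label] += 1
            tokens = self.tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Score = log P(category) + sum over tokens of log P(token | category).
            score = math.log(doc_count / total_docs)
            total_tokens = sum(self.word_counts[label].values())
            for token in self.tokenize(text):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayesTextClassifier().fit(
    ["win a free prize now", "claim your free money",
     "meeting agenda for monday", "project report attached"],
    ["spam", "spam", "work", "work"],
)
print(clf.predict("free prize money"))  # spam
```

    Add-one smoothing keeps a single unseen token from driving a category's probability to zero. Working in log space avoids underflow on long texts.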

    Fast web page categorization without the web page

    The World Wide Web grows enormously day by day, which makes it necessary to classify web pages. We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is faster than typical web page classification because the pages do not have to be fetched and analyzed. Uniform Resource Locators (URLs) mark the address of a resource on the World Wide Web, are often human-readable, and can indicate metadata about the resource [11]. Our approach segments the URL into meaningful tokens. We construct a binary tree for the entire set of tokens used in the hyperlinks and use the J48 classifier. Our results show that in certain scenarios, the URL-based approach shows better performance.
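    The segmentation step can be sketched without fetching the page. The paper's exact segmentation rules are not stated here, so several details below are assumptions. The sketch splits on non-alphanumeric characters and camelCase boundaries, and it uses a hypothetical `URL_STOPWORDS` list to drop scheme and boilerplate parts such as `www`. The resulting tokens could then serve as features for a decision-tree learner such as J48.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stop-list of URL parts that rarely carry topical meaning.
URL_STOPWORDS = {"www", "http", "https", "html", "htm", "php", "index"}


def url_tokens(url):
    """Split a URL into lowercase word-like tokens (no page fetch required)."""
    parts = urlsplit(url)
    raw = " ".join([parts.netloc, parts.path, parts.query])
    tokens = []
    # Break on anything that is not a letter or digit (dots, slashes, hyphens, ...).
    for chunk in re.split(r"[^A-Za-z0-9]+", raw):
        # Further split camelCase / acronym chunks, e.g. "SportsNews" -> "Sports", "News".
        for piece in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", chunk):
            word = piece.lower()
            if len(word) > 1 and word not in URL_STOPWORDS:
                tokens.append(word)
    return tokens


print(url_tokens("https://www.example.com/SportsNews/football-results.html"))
# ['example', 'com', 'sports', 'news', 'football', 'results']
```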

    Ontology network analysis for safety learning in the railway domain

    Ontologies have been used in diverse areas such as Knowledge Management (KM), Artificial Intelligence (AI), Natural Language Processing (NLP) and the Semantic Web, as they allow software applications to integrate, query and reason about concepts and relations within a knowledge domain. For Big Data Risk Analysis (BDRA) in railways, ontologies are a key enabler for obtaining valuable safety insights from the large amount of data available from the railway. Traditionally, ontology building has been an entirely manual process that requires considerable human effort and development time. During the last decade, the information explosion due to the Internet, together with the need for large-scale methods that extract patterns systematically, has given rise to the research area of “ontology learning”. Despite recent research efforts, ontology learning systems still struggle to extract terms (single words or multi-word expressions) from text-based data. This manuscript explores the benefits of visual analytics for supporting ontology construction in one particular part of railway safety management: possessions. In railways, possession operations are the protection arrangements for engineering work that ensure track workers remain separated from moving trains. A network of terms from possession operations standards is represented in order to extract the ontology concepts that enable safety learning from events related to possession operations.
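    The abstract does not say how the term network is built. One common representation is a co-occurrence graph, in which two terms are linked when they appear in the same sentence and each edge weight counts those sentences. The sketch below assumes that representation. It also assumes terms have already been extracted: the hypothetical `sentences` list stands in for possession-operations text and is not taken from any real standard. Terms are ranked by weighted degree as candidate ontology concepts.

```python
from collections import defaultdict
from itertools import combinations


def build_term_network(sentences):
    """Weighted co-occurrence graph: edge (a, b) counts sentences containing both terms."""
    edges = defaultdict(int)
    for terms in sentences:
        # Deduplicate and sort so each unordered pair gets one canonical key.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def rank_terms(edges):
    """Rank terms by weighted degree (sum of incident edge weights), ties alphabetical."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, pre-extracted terms per sentence (illustrative only).
sentences = [
    ["possession", "track worker", "protection"],
    ["possession", "engineering work", "line blockage"],
    ["track worker", "protection", "lookout"],
]
network = build_term_network(sentences)
print(rank_terms(network)[:3])
# [('possession', 4), ('protection', 4), ('track worker', 4)]
```

    Weighted degree is only one possible centrality measure. Betweenness or eigenvector centrality would surface different "hub" concepts.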

    Biocom_Usp: tweet sentiment analysis with adaptive boosting ensemble

    We describe our approach for SemEval-2014 Task 9: Sentiment Analysis in Twitter. We use an ensemble learning method for sentiment classification of tweets that relies on varied features such as feature hashing, part-of-speech, and lexical features. Our system was evaluated in the Twitter message-level task.
    CAPES, FAPESP, CNP
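    Feature hashing, one of the feature types listed above, maps an unbounded vocabulary of tweet tokens into a fixed-length vector by hashing each token to a bucket index. This is a generic illustration rather than the authors' configuration. The bucket count, the CRC32 hash, the signed-hashing variant, and the whitespace tokenizer are all assumptions. `zlib.crc32` is used instead of Python's built-in `hash`, whose string hashing is randomized between runs.

```python
import zlib


def hashed_features(text, n_buckets=16):
    """Map tokens to a fixed-length vector via the (signed) hashing trick."""
    vector = [0] * n_buckets
    for token in text.lower().split():  # assumed naive tokenizer
        digest = zlib.crc32(token.encode("utf-8"))  # deterministic 32-bit hash
        index = digest % n_buckets
        # Signed variant (an assumption): the hash's top bit picks +1 or -1,
        # so colliding tokens tend to cancel rather than pile up.
        sign = 1 if (digest >> 31) & 1 == 0 else -1
        vector[index] += sign
    return vector


print(hashed_features("loving this sunny day :)"))
```

    Because the vector length is fixed in advance, unseen words at test time need no vocabulary lookup. The vectors could then be passed to any classifier, such as a boosted ensemble.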

    Publications and Talks 2004 of the Members of the Faculty of Computer Science (Veröffentlichungen und Vorträge 2004 der Mitglieder der Fakultät für Informatik)


    Automatic message annotation and semantic interface for context aware mobile computing

    In this thesis, the concept of mobile messaging awareness is investigated by designing and implementing a framework that annotates short text messages with a context ontology for semantic reasoning, inference, and classification. Keywords in a text message are identified and annotated with concepts, entities, and knowledge drawn from the ontology without the need for a learning process, and the proposed framework supports semantic-reasoning-based message awareness for categorization. The first stage of the research develops a framework for facilitating mobile communication with short text annotated messages (SAMS), which annotates short text messages with part-of-speech tags augmented with internal and external metadata. In the SAMS framework, annotation is carried out automatically while a message is being composed. Metadata collected from the device’s file system and the message header information is combined with the message’s tagged keywords into an XML file at the same time. The annotation process helps the framework identify the tagged keywords during search and retrieval, and Semantic Web technologies are used to improve the reasoning mechanism. The framework is then extended into “Contextual Ontology based Short Text Messages reasoning (SOIM)”. SOIM enhances the search capabilities of SAMS by combining short text message annotation with semantic reasoning over a domain ontology. That ontology is modeled as a set of ontological knowledge modules capturing the features of contextual entities and of particular events or situations. Fundamentally, SOIM relies on hierarchical semantic distance to compute an approximate degree of match between a new set of relevant keywords and their corresponding abstract class in the domain ontology.
    Adopting a contextual ontology improves the framework’s text comprehension and message categorization. Fuzzy Set and Rough Set theory have been integrated with SOIM to improve its inference capabilities and efficiency. Because SOIM uses the degree of similarity to choose the pattern that matches a message, the problem of choosing the best retrieved pattern arises at the decision-making stage. A rule-based fuzzy reasoning classifier that applies Fuzzy Set theory to decision making has been added on top of the SOIM framework to increase classification accuracy and produce clearer decisions. Uncertainty in the system is addressed using Rough Set theory: irrelevant and indecisive properties that degrade the framework’s efficiency are ignored during the matching process.
    EThOS - Electronic Theses Online Service; Ministry of Higher Education and Scientific Research (Iraq); United Kingdom (GB)
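    The thesis matches keywords to an abstract ontology class by hierarchical semantic distance, but its exact formula is not given here. The sketch below therefore assumes a simple one. The distance between two concepts is the number of edges on the path through their lowest common ancestor in a tree-shaped ontology. A keyword's match degree to a class is 1 / (1 + distance), and a message's degree is the average over its keywords. The toy `PARENT` ontology and the assumption that keywords are already mapped to concepts are hypothetical.

```python
# Hypothetical tree-shaped ontology: child concept -> parent concept (None = root).
PARENT = {
    "Event": None,
    "Meeting": "Event",
    "Travel": "Event",
    "Lecture": "Meeting",
    "Conference": "Meeting",
    "Flight": "Travel",
    "Train": "Travel",
}


def ancestors(concept):
    """Path from a concept up to the root, including the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def hierarchical_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(path_b)}
    for i, c in enumerate(path_a):
        if c in depth_in_b:  # first shared ancestor = lowest common ancestor
            return i + depth_in_b[c]
    raise ValueError("concepts share no common ancestor")


def match_degree(keyword_concepts, abstract_class):
    """Assumed scoring: average of 1 / (1 + distance) over the keywords' concepts."""
    scores = [1 / (1 + hierarchical_distance(c, abstract_class))
              for c in keyword_concepts]
    return sum(scores) / len(scores)


def best_class(keyword_concepts, candidate_classes):
    """Pick the candidate abstract class with the highest match degree."""
    return max(candidate_classes, key=lambda cls: match_degree(keyword_concepts, cls))


print(best_class(["Lecture", "Conference"], ["Meeting", "Travel"]))  # Meeting
```

    Taking a plain maximum is the step where near-ties become a problem. This is the decision-making issue the thesis addresses with fuzzy rules and rough sets, which this sketch does not model.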