24 research outputs found

    Ontea: Platform for Pattern Based Automated Semantic Annotation

    Automated annotation of web documents is a key challenge of the Semantic Web effort. Semantic metadata can be created manually or by using automated annotation or tagging tools. The automated semantic annotation tools with the best results are built on various machine learning algorithms, which require training sets. Another approach is to use pattern-based semantic annotation solutions built on natural language processing, information retrieval or information extraction methods. The paper presents the Ontea platform for automated semantic annotation or semantic tagging. An implementation based on regular expression patterns is presented together with an evaluation of the results, along with an extensible architecture for integrating pattern-based approaches. Most existing semi-automatic annotation solutions cannot prove their real-world usability on large-scale data such as the web or email communication, but the Semantic Web can be exploited only when computer-understandable metadata reaches critical mass. Thus we also present an approach to large-scale pattern-based annotation.
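
As a rough illustration of regular-expression-based annotation, the sketch below maps a few patterns to ontology classes and returns (class, value) pairs. The class names, the patterns and the `annotate` function are assumptions made for this example; they are not taken from the Ontea implementation.

```python
import re

# Hypothetical pattern table (ontology class -> regex); not Ontea's own patterns.
PATTERNS = {
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "Settlement": re.compile(r"\b(?:in|at|from)\s+([A-Z][a-z]+)"),
    "Organization": re.compile(
        r"\b([A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\s+(?:Inc|Ltd|GmbH))\b"
    ),
}


def annotate(text):
    """Return (ontology_class, matched_value) pairs found in the text."""
    annotations = []
    for ontology_class, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the first capture group when the pattern defines one,
            # otherwise the whole match.
            value = match.group(1) if pattern.groups else match.group(0)
            annotations.append((ontology_class, value))
    return annotations


if __name__ == "__main__":
    sample = "Please contact Acme Software Ltd in Bratislava at info@acme.example."
    for ontology_class, value in annotate(sample):
        print(ontology_class, "->", value)
```

A pattern table like this needs no training set, which is the trade-off the abstract describes: coverage depends entirely on how well the hand-written patterns fit the text.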

    Semantic Annotation of Unstructured Documents Using Concepts Similarity

    Email Analysis and Information Extraction for Enterprise Benefit

    In spite of rapid advances in multimedia and interactive technologies, enterprise users prefer to battle with email spam and overload rather than lose the benefits of communicating, collaborating and solving business tasks over email. Many aspects of email have significantly improved over time, but its overall integration with the enterprise environment has remained practically the same. In this paper we describe and evaluate a lightweight approach to enterprise email communication analysis and information extraction. We provide several use cases exploiting the extracted information, such as the enrichment of emails with relevant contextual information, social network extraction and its subsequent search, and the creation of semantic objects. We also discuss the relationship between email analysis and information extraction on one hand, and email protocols and email servers on the other. The proposed approach was partially tested in several small and medium enterprises (SMEs) and seems promising for enterprise interoperability and collaboration in SMEs that depend on email to accomplish their daily business tasks.

    Knowledge Discovery in Online Repositories: A Text Mining Approach

    Before the advent of the Internet, newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of political competition and communication. Consequently, most political information exists in unstructured form, hence the need to tap into it using a text mining algorithm. This paper implements a text mining algorithm on unstructured data from some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. Following these steps, the association rule mining technique of data mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW). The main contribution of the technique is that it integrates an information retrieval scheme, Term Frequency Inverse Document Frequency (TF-IDF), which automatically selects the most discriminative keywords for use in association rule generation, with a data mining technique for association rule discovery. The program is applied to pre-election information obtained from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the document collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy.
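
The two stages described above, TF-IDF keyword selection followed by association rule discovery, can be sketched roughly as follows. This is not the GARW algorithm itself: it is a simplified pairwise rule miner, and the stop-word list, function names and thresholds are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "was", "are"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, drop stop words and short tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def tfidf_keywords(docs, top_k):
    """Return, per document, the set of up to top_k terms with the highest positive TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append({t for t in ranked[:top_k] if scores[t] > 0})
    return keywords


def association_rules(transactions, min_support, min_confidence):
    """Mine rules x -> y over keyword pairs; returns (x, y, support, confidence) tuples."""
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules
```

In use, the keyword sets returned by `tfidf_keywords` would serve as the transactions passed to `association_rules`, so that only discriminative terms take part in rule generation.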

    Discovering Relations by Entity Search in Lightweight Semantic Text Graphs

    Entity search is becoming a popular alternative to full-text search. Recently Google released its entity search based on confirmed, human-generated data such as Wikipedia. In spite of these developments, the task of entity discovery, search, or relation search in unstructured text remains a major challenge in the fields of information retrieval and information extraction. This paper tries to address that challenge, focusing specifically on entity relation discovery. This is achieved by processing unstructured text using simple information extraction methods, building lightweight semantic graphs and reusing them for entity relation discovery by applying algorithms from graph theory. An important part is also user interaction with semantic graphs, which can significantly improve information extraction results and entity relation search. Entity relations can be discovered by various text mining methods, but the advantage of the presented method lies in the similarity between the lightweight semantics extracted from a text and the information networks available as structured data. Both graph structures have similar properties, and similar relation discovery algorithms can be applied. In addition, we can benefit from the integration of such graph data. We provide both relevance and performance evaluations of the approach and showcase it in several use case applications.
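
One simple way to picture relation discovery over a lightweight graph is a co-occurrence graph searched with breadth-first search. The sketch below assumes entities have already been extracted per document; the graph construction rule (link entities that share a document) and the function names are illustrative assumptions, not the method evaluated in the paper.

```python
from collections import deque
from itertools import combinations


def build_graph(docs_entities):
    """Link every pair of entities that co-occur in the same document."""
    graph = {}
    for entities in docs_entities:
        for entity in entities:
            graph.setdefault(entity, set())
        for a, b in combinations(set(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def relation_path(graph, source, target):
    """Breadth-first search; returns a shortest entity chain or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

The intermediate nodes on the returned path act as the evidence for the relation: two entities that never co-occur can still be connected through a shared neighbour.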

    News Extraction from Web Pages

    This diploma thesis deals with news extraction from the web, a problem that falls under the well-known problem of mining structured data from HTML documents on the Internet. I surveyed various existing approaches to this problem; the survey is summarized at the beginning of the thesis. I then examined existing web wrappers and their possible use for extracting news from the web. I also carried out an extensive observation of the best-known news portals and the news published on them. The knowledge gained was then applied in developing my own solution to the problem. I define what a news item is and how it differs from information. I then tested my solution under real conditions on real, well-known news portals. The results of this testing are presented in the final chapter of the thesis. Import 05/08/2014. Department: 460 - Katedra informatiky (Department of Computer Science). Grade: výborn

    Comparing Instances of Ontological Concepts for Personalized Recommendation in Large Information Spaces

    We present a novel method for instance comparison of ontological concepts with regard to personalized content presentation and/or navigation in large information spaces. We assume that comparing properties of documents that users found interesting leads to the discovery of information about users' interests, specifically when considering Semantic Web applications where documents or their parts are represented by ontological concepts. We employ the ontology structure and different similarity metrics for datatype and object properties and investigate the reasons behind user interest in the presented content. Moreover, we propose and evaluate an approach to instance similarity computation for a particular user while also considering the user's individual preferences.

    Automatic message annotation and semantic interface for context aware mobile computing

    In this thesis, the concept of mobile messaging awareness is investigated by designing and implementing a framework that annotates short text messages with a context ontology for semantic reasoning, inference and classification purposes. Message keywords are identified and annotated with concepts, entities and knowledge drawn from the ontology without the need for a learning process, and the proposed framework supports semantic-reasoning-based message awareness for categorization purposes. The first stage of the research develops a framework for facilitating mobile communication with short text annotated messages (SAMS), which annotates a short text message with part-of-speech tags augmented with internal and external metadata. In the SAMS framework the annotation process is carried out automatically at the time of composing a message. The metadata is collected from the device’s file system and the message header information and is then combined with the message’s tagged keywords to form an XML file. The annotation process assists the proposed framework in identifying the tagged keywords during search and retrieval, and Semantic Web technologies are utilised to improve the reasoning mechanism. Later, the proposed framework is further improved as “Contextual Ontology based Short Text Messages reasoning (SOIM)”. SOIM further enhances the search capabilities of SAMS by combining short text message annotation and semantic reasoning with a domain ontology; the domain ontology is modeled as a set of ontological knowledge modules that capture features of contextual entities and features of a particular event or situation. Fundamentally, the SOIM framework relies on hierarchical semantic distance to compute an approximate match degree between a new set of relevant keywords and their corresponding abstract class in the domain ontology.
    Adopting a contextual ontology improves the framework's performance in text comprehension and message categorization. Fuzzy Sets and Rough Sets theory have been integrated with SOIM to improve its inference capabilities and system efficiency. Since SOIM chooses the pattern matched to a message based on the degree of similarity, the issue of choosing the best retrieved pattern arises at the decision-making stage. A rule-based fuzzy reasoning classifier that adopts Fuzzy Set theory for decision making has been applied on top of the SOIM framework in order to increase the accuracy of the classification process and yield clearer decisions. The issue of uncertainty in the system has been addressed by utilising Rough Sets theory, in which the irrelevant and indecisive properties that negatively affect the framework's efficiency are ignored during the matching process.
    EThOS - Electronic Theses Online Service; Ministry of Higher Education and Scientific Research (Iraq); GB; United Kingdo
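
To make the idea of a hierarchical semantic distance concrete, the sketch below counts the edges between two classes through their nearest common ancestor in a class hierarchy and turns that count into a match degree. The toy hierarchy, the edge-counting distance and the `1 / (1 + distance)` formula are assumptions for illustration only; the thesis abstract does not specify the measure SOIM uses.

```python
# Hypothetical class hierarchy as a child -> parent map (not from the thesis).
PARENT = {
    "Event": "Thing",
    "Place": "Thing",
    "Meeting": "Event",
    "Party": "Event",
}


def ancestors(cls, parent):
    """Return the chain from cls up to the root, starting with cls itself."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def semantic_distance(a, b, parent):
    """Number of hierarchy edges between a and b via their nearest common ancestor."""
    depth_from_b = {c: i for i, c in enumerate(ancestors(b, parent))}
    for steps_from_a, cls in enumerate(ancestors(a, parent)):
        if cls in depth_from_b:
            return steps_from_a + depth_from_b[cls]
    return None  # the classes share no ancestor


def match_degree(a, b, parent):
    """Map distance to (0, 1]: identical classes score 1.0, unrelated classes 0.0."""
    distance = semantic_distance(a, b, parent)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)
```

With this toy hierarchy, sibling classes such as `Meeting` and `Party` score higher than classes that only meet at the root, which is the kind of graded match a similarity-based classifier can then rank.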