24 research outputs found

Ontea: Platform for Pattern Based Automated Semantic Annotation

Author: Ciglan Marek
Hluchý Ladislav
Laclavík Michal
Šeleng Martin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

Automated annotation of web documents is a key challenge of the Semantic Web effort. Semantic metadata can be created manually or with automated annotation or tagging tools. The automated semantic annotation tools with the best results are built on various machine learning algorithms, which require training sets. Another approach is to use pattern-based semantic annotation solutions built on natural language processing, information retrieval or information extraction methods. The paper presents the Ontea platform for automated semantic annotation or semantic tagging. An implementation based on regular expression patterns is presented together with an evaluation of results. An extensible architecture for integrating pattern-based approaches is also presented. Most existing semi-automatic annotation solutions cannot demonstrate real usage on large-scale data such as web or email communication, but the Semantic Web can be exploited only when computer-understandable metadata reaches critical mass. Thus we also present an approach to large-scale pattern-based annotation.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
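
As an illustration of the regular-expression annotation idea summarized in the Ontea abstract above, the following is a minimal sketch and not the Ontea implementation. The pattern table, the concept labels `Email` and `Date`, and the function name `annotate` are all assumptions made for this example.

```python
import re

# Hypothetical pattern table: each assumed concept label maps to one regular expression.
PATTERNS = {
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "Date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def annotate(text):
    """Return (concept, matched_text, start_offset) tuples sorted by offset."""
    found = []
    for concept, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((concept, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    for annotation in annotate("Contact jan.novak@example.org before 26/01/2012."):
        print(annotation)
```

Keeping the patterns in a plain dictionary means a new concept can be added without touching the matching loop, which loosely mirrors the extensibility goal the abstract mentions.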

Semantic Annotation of Unstructured Documents Using Concepts Similarity

Publication venue: Hindawi Limited
Publication date: 01/01/2017

Crossref

Email Analysis and Information Extraction for Enterprise Benefit

Author: Balogh Zoltán
Dlugolinský Štefan
Gatial Emil
Hluchý Ladislav
Kvassay Marcel
Laclavík Michal
Šeleng Martin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

In spite of rapid advances in multimedia and interactive technologies, enterprise users prefer to battle with email spam and overload rather than lose the benefits of communicating, collaborating and solving business tasks over email. Many aspects of email have significantly improved over time, but its overall integration with the enterprise environment has remained practically the same. In this paper we describe and evaluate a lightweight approach to enterprise email communication analysis and information extraction. We provide several use cases exploiting the extracted information, such as the enrichment of emails with relevant contextual information, social network extraction and its subsequent search, and the creation of semantic objects, and we discuss the relationship between email analysis and information extraction on the one hand and email protocols and email servers on the other. The proposed approach was partially tested in several small and medium enterprises (SMEs) and seems promising for enterprise interoperability and collaboration in SMEs that depend on email to accomplish their daily business tasks.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Knowledge Discovery in Online Repositories: A Text Mining Approach

Author: Afolabi I. T.
Ayo C. K.
Musa G. A.
Sofoluwe A. B.
Publication venue: EuroJournals Publishing
Publication date: 01/01/2008

Before the advent of the Internet, newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of political competition and communication. Consequently, most political information exists in unstructured form, hence the need to tap into it using a text mining algorithm. This paper implements a text mining algorithm on unstructured data in some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. Following the natural language processing steps, association rule mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW). The main contribution of the technique is that it integrates an information retrieval scheme, Term Frequency Inverse Document Frequency (used for keyword/feature selection, automatically selecting the most discriminative keywords for association rule generation), with a data mining technique for association rule discovery. The program is applied to pre-election information obtained from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the document collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy.

Covenant University Repository
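
The keyword-selection step named in the abstract above (tokenization, filtering, then TF-IDF ranking) can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list, the function names and the unsmoothed `tf * log(N / df)` weighting are choices made for this example, and the GARW association rule step itself is not reproduced.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and filter stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(documents, k=2):
    """Return the k highest-scoring TF-IDF terms for each document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    result = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:k])
    return result


if __name__ == "__main__":
    print(top_keywords(["election vote election", "vote budget"], k=1))
```

Terms that occur in every document receive a score of zero under this weighting, so only the more discriminative keywords would be passed on to a rule-mining stage.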

Discovering Relations by Entity Search in Lightweight Semantic Text Graphs

Author: Ciglan Marek
Dlugolinský Štefan
Laclavík Michal
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 10/02/2015

Entity search is becoming a popular alternative to full-text search. Recently Google released its entity search based on confirmed, human-generated data such as Wikipedia. In spite of these developments, the task of entity discovery, search, or relation search in unstructured text remains a major challenge in the fields of information retrieval and information extraction. This paper tries to address that challenge, focusing specifically on entity relation discovery. This is achieved by processing unstructured text using simple information extraction methods, building lightweight semantic graphs and reusing them for entity relation discovery by applying algorithms from graph theory. User interaction with the semantic graphs is also an important part, since it can significantly improve information extraction results and entity relation search. Entity relations can be discovered by various text mining methods, but the advantage of the presented method lies in the similarity between the lightweight semantics extracted from a text and the information networks available as structured data. Both graph structures have similar properties, so similar relation discovery algorithms can be applied. In addition, we can benefit from the integration of such graph data. We provide both relevance and performance evaluations of the approach and showcase it in several use case applications.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
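
To make the idea of relation discovery over a lightweight graph concrete, here is a minimal sketch. It assumes that entities have already been extracted per text unit, links entities that co-occur, and uses breadth-first search as one possible graph algorithm; the abstract above does not specify which algorithms the authors apply, and the function names and sample entities are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(entity_sets):
    """Link entities that co-occur in the same text unit (undirected graph)."""
    graph = defaultdict(set)
    for entities in entity_sets:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def relation_path(graph, source, target):
    """Breadth-first search for a shortest path; returns None if unconnected."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    # Hypothetical entity sets, one per sentence or message.
    g = build_graph([["Alice", "Acme"], ["Acme", "Bratislava"]])
    print(relation_path(g, "Alice", "Bratislava"))
```

The returned path lists the intermediate entities that connect the two endpoints, which is one simple way to present a discovered relation to a user.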

News Extraction from Web Pages

Author: Blanár Štefan
Publication venue: Vysoká škola báňská - Technická univerzita Ostrava
Publication date: 01/01/2014

This diploma thesis deals with news extraction from the web, a problem that falls under the well-known problem of mining structured data from HTML documents on the Internet. A survey of existing approaches to this problem is summarized in the first part of the thesis. Next, I examine existing web wrappers and their possible use for extracting news from the web. I also carry out an extensive observation of the best-known news portals and the news published on them. The knowledge acquired is then applied to develop my own solutions to the problem. I define what a news item is and how it differs from information. Finally, I test my solutions under real conditions on real, well-known news portals. The results of this testing are presented in the final part of the thesis. Department: 460 - Katedra informatiky (Department of Computer Science).

DSpace at VSB Technical University of Ostrava

Automatic message annotation and semantic interface for context aware mobile computing

Author: Al-Sultany Ghaidaa
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2012

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In this thesis, the concept of mobile messaging awareness has been investigated by designing and implementing a framework that is able to annotate short text messages with a context ontology for semantic reasoning, inference and classification purposes. The keywords of a text message are identified and annotated with concepts, entities and knowledge drawn from the ontology without the need for a learning process, and the proposed framework supports semantic-reasoning-based message awareness for categorization purposes. The first stage of the research is developing the framework for facilitating mobile communication with short annotated text messages (SAMS), which annotates a short text message with part-of-speech tags augmented with internal and external metadata. In the SAMS framework the annotation process is carried out automatically at the time of composing a message. The metadata is collected from the device’s file system and the message header information and is then combined with the message’s tagged keywords to form an XML file. The purpose of the annotation process is to help the proposed framework identify the tagged keywords during search and retrieval, and Semantic Web technologies are utilised to improve the reasoning mechanism. Later, the proposed framework is further improved as “Contextual Ontology based Short Text Messages reasoning (SOIM)”. SOIM further enhances the search capabilities of SAMS by combining short text message annotation and semantic reasoning capabilities with a domain ontology; the domain ontology is modeled as a set of ontological knowledge modules that capture features of contextual entities and features of a particular event or situation. Fundamentally, the SOIM framework relies on the hierarchical semantic distance to compute an approximate degree of match between a new set of relevant keywords and their corresponding abstract class in the domain ontology. Adopting a contextual ontology improves the framework's performance in text comprehension and message categorization. Fuzzy Sets and Rough Sets theory have been integrated with SOIM to improve the inference capabilities and system efficiency. Since SOIM chooses the pattern matched to a message based on the degree of similarity, the issue of choosing the best retrieved pattern arises during the decision-making stage. Fuzzy-reasoning classifier rules that adopt Fuzzy Set theory for decision making have been applied on top of the SOIM framework in order to increase the accuracy of the classification process with clearer decisions. The issue of uncertainty in the system has been addressed by utilising Rough Sets theory, in which the irrelevant and indecisive properties that negatively affect the framework's efficiency are ignored during the matching process. The Ministry of Higher Education and Scientific Research (Iraq).

Brunel University Research Archive

Comparing Instances of Ontological Concepts for Personalized Recommendation in Large Information Spaces

Author: Andrejko Anton
Bieliková Mária
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

We present a novel method for instance comparison of ontological concepts with regard to personalized content presentation and/or navigation in large information spaces. We assume that comparing properties of documents that users found interesting leads to the discovery of information about users' interests, specifically when considering Semantic Web applications where documents or their parts are represented by ontological concepts. We employ the ontology structure and different similarity metrics for datatype and object properties and investigate the reasons behind user interest in the presented content. Moreover, we propose and evaluate an approach to instance similarity computation for a particular user that also considers the user's individual preferences.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Automatic message annotation and semantic interface for context aware mobile computing

Author: Al-Raweshidy H
Al-Sultany Ghaidaa Abdalhussein Billal
Li M
Publication date: 01/01/2012

In this thesis, the concept of mobile messaging awareness has been investigated by designing and implementing a framework that is able to annotate short text messages with a context ontology for semantic reasoning, inference and classification purposes. The keywords of a text message are identified and annotated with concepts, entities and knowledge drawn from the ontology without the need for a learning process, and the proposed framework supports semantic-reasoning-based message awareness for categorization purposes. The first stage of the research is developing the framework for facilitating mobile communication with short annotated text messages (SAMS), which annotates a short text message with part-of-speech tags augmented with internal and external metadata. In the SAMS framework the annotation process is carried out automatically at the time of composing a message. The metadata is collected from the device’s file system and the message header information and is then combined with the message’s tagged keywords to form an XML file. The purpose of the annotation process is to help the proposed framework identify the tagged keywords during search and retrieval, and Semantic Web technologies are utilised to improve the reasoning mechanism. Later, the proposed framework is further improved as “Contextual Ontology based Short Text Messages reasoning (SOIM)”. SOIM further enhances the search capabilities of SAMS by combining short text message annotation and semantic reasoning capabilities with a domain ontology; the domain ontology is modeled as a set of ontological knowledge modules that capture features of contextual entities and features of a particular event or situation. Fundamentally, the SOIM framework relies on the hierarchical semantic distance to compute an approximate degree of match between a new set of relevant keywords and their corresponding abstract class in the domain ontology. Adopting a contextual ontology improves the framework's performance in text comprehension and message categorization. Fuzzy Sets and Rough Sets theory have been integrated with SOIM to improve the inference capabilities and system efficiency. Since SOIM chooses the pattern matched to a message based on the degree of similarity, the issue of choosing the best retrieved pattern arises during the decision-making stage. Fuzzy-reasoning classifier rules that adopt Fuzzy Set theory for decision making have been applied on top of the SOIM framework in order to increase the accuracy of the classification process with clearer decisions. The issue of uncertainty in the system has been addressed by utilising Rough Sets theory, in which the irrelevant and indecisive properties that negatively affect the framework's efficiency are ignored during the matching process. EThOS - Electronic Theses Online Service. Ministry of Higher Education and Scientific Research (Iraq). GB, United Kingdom.

OpenGrey Repository

IMPROVED INTEGRATED MINING OF HETEROGENEOUS DATA IN DECISION SUPPORT SYSTEMS

Author: Afolabi I. T.
Publication date: 01/03/2012

Covenant University Repository