4 research outputs found

    A Semi-Automatic and Low-Cost Method to Learn Patterns for Named Entity Recognition

    Named Entity Recognition is a basic task in Information Extraction that aims at identifying entities of interest within full-text documents. The patterns used to recognize entities can be rule-based, as in the popular JAPE system. However, hand-crafting effective patterns is often difficult, and yet there is little research devoted to methods capable of learning human-readable patterns, possibly with arbitrary sets of features. In this paper, we present a semi-automatic method to generate both regular expressions and a subset of the JAPE language. It does not need a corpus annotated beforehand. Instead, it employs active learning and combines clustering with an algorithm that finds alignments between symbols present in the entities discovered during the learning process. The method currently supports a fixed set of character features and an arbitrary set of token features, but it can incorporate other kinds of features as well. Through several experiments with an English corpus, we show the ability of the method to generate effective patterns at a low annotation cost, and how it can successfully help in the annotation of brand new corpora.
    Accepted author manuscript. Multimedia Computin
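To make the idea of learning a human-readable pattern from character features concrete, here is a minimal sketch. It is not the paper's algorithm: active learning and clustering are omitted, and the alignment step is replaced by a much simpler assumption that all example entities share the same sequence of character classes. The function names (`char_class`, `signature`, `learn_pattern`) and the ASCII-only character classes are hypothetical choices made for illustration.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a coarse character feature (ASCII-oriented assumption)."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha() and ch.isupper():
        return "[A-Z]"
    if ch.isalpha():
        return "[a-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # punctuation is kept as a literal


def signature(entity):
    """Collapse runs of the same character class into (class, run_length) pairs."""
    classes = (char_class(c) for c in entity)
    return [(cls, len(list(run))) for cls, run in groupby(classes)]


def learn_pattern(examples):
    """Generalize examples into one regex, or return None if their class
    sequences differ (a real system would align the sequences instead)."""
    if not examples:
        raise ValueError("at least one example is required")
    sigs = [signature(e) for e in examples]
    first = [cls for cls, _ in sigs[0]]
    if any([cls for cls, _ in s] != first for s in sigs):
        return None
    parts = []
    for i, cls in enumerate(first):
        lengths = [s[i][1] for s in sigs]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quant = ""
        elif lo == hi:
            quant = f"{{{lo}}}"
        else:
            quant = f"{{{lo},{hi}}}"
        parts.append(cls + quant)
    return "".join(parts)


pattern = learn_pattern(["AB-123", "XYZ-4567"])
print(pattern)
print(bool(re.fullmatch(pattern, "CD-999")))
```

Here two made-up identifiers are generalized into one regular expression that also accepts an unseen string of the same shape. Examples with different class sequences return `None`, which is the case where a genuine alignment between symbols would be needed.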

    Development and Evaluation of a Holistic, Cloud-driven and Microservices-based Architecture for Automated Semantic Annotation of Web Documents

    The Semantic Web is based on the concept of representing information on the web such that computers can both understand and process it. This implies defining context for web information to give it a well-defined meaning. Semantic annotation is the process of adding annotation data to web information to provide this much-needed context. However, despite several solutions and techniques for semantic annotation, it still faces challenges that have hindered the growth of the Semantic Web. Despite recent significant technological innovations such as Cloud Computing, the Internet of Things and Mobile Computing, and their various integrations with semantic technologies to provide IT solutions, little has been done to leverage these technologies to address semantic annotation challenges. Hence, this research investigates leveraging the cloud computing paradigm to address some semantic annotation challenges, with a focus on an automated system for providing semantic annotation as a service. Firstly, considering the disparate nature of most current semantic annotation solutions, a holistic perspective on semantic annotation is proposed based on a set of requirements. Then, a capability assessment of the feasibility of leveraging cloud computing is conducted, which produces a Cloud Computing Capability Model for Holistic Semantic Annotation. Furthermore, an investigation into application deployment patterns in the cloud and how they relate to holistic semantic annotation was conducted. A set of determinant factors that define different patterns for application deployment in the cloud was identified, and these resulted in the development of a Cloud Computing Maturity Model and the conceptualisation of a “Cloud-Driven” development methodology for holistic semantic annotation in the cloud. Key components of the “Cloud-Driven” concept include Microservices, Operating System-Level Virtualisation and Orchestration.
    Given the role that Microservices software architectural patterns play in developing solutions that can fully exploit cloud computing benefits, CloudSea, a holistic, cloud-driven and microservices-based architecture for automated semantic annotation of web documents, is proposed as a novel approach to semantic annotation. The architecture draws on the theory of “Design Patterns” in Software Engineering for its design and development, which subsequently resulted in twelve Design Patterns and a Pattern Language for Holistic Semantic Annotation, based on the CloudSea architectural design. As a proof of concept, a prototype implementation of CloudSea was developed and deployed in the cloud following the “Cloud-Driven” methodology, and a functionality evaluation was carried out on it. A comparative evaluation of the CloudSea architecture was also conducted against current semantic annotation solutions, both those proposed in academic literature and those existing as industry solutions. In addition, to evaluate the proposed Cloud Computing Maturity Model for Holistic Semantic Annotation, an experimental evaluation was conducted by developing six instances of the prototype and deploying them differently, based on the patterns described in the model. This empirical investigation tested the instances for performance through a series of API load tests, and the results obtained confirmed the validity of both the “Cloud-Driven” methodology and the entire model.
