9 research outputs found

    Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis

    The Market Blended Insight project aims to improve UK business-to-business marketing performance using semantic web technologies. In this project, we are implementing an ontology-driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It handles both semi-structured data and unstructured text on the web, annotating the extracted data and then translating it according to the backend schema.

    Analysing the problem and main approaches for ontology population

    Knowledge systems are a suitable computational approach for solving complex problems and providing decision support. Ontologies are an approach to knowledge representation, and Ontology Population aims to instantiate the constituent elements of an ontology, such as properties and non-taxonomic relationships. Manual population by domain experts and knowledge engineers is an expensive and time-consuming task. Thus, automatic or semi-automatic approaches are needed. This paper discusses the problem of Automatic Ontology Population (AOP) and proposes a generic process specifying its phases and the kinds of techniques that can be used to perform the activities of each phase. Some techniques representing the state of the art of this field are also described, along with the solutions they adopt for each phase of the AOP process and their advantages and limitations. This work is part of HERMES, a Brazil/Portugal research cooperation project seeking techniques and tools for automating the process of ontology learning and population. This work is supported by CNPq, CAPES and FAPEMA, research funding agencies of the Brazilian government.

    Ontology validation algorithm on data driven approach and vocabulary aspect

    Ontology evaluation is required before an ontology is used within applications. As in software practice, the purpose of ontology evaluation is to determine whether requirement criteria have been met. Users who require coverage criteria often seek ontologies that contain the terms related to their domain of interest. Users find it difficult to select a suitable ontology given the variety of ontology evaluation approaches. Conceptualizing the information related to ontology evaluation helps to identify the important components within an ontology that contribute to meeting the coverage criteria. This work proposes an algorithm that extracts ontology documents obtained from public ontology repositories such as Falcons into their vocabulary parts, focusing on classes and literals. The algorithm then processes the extracted ontology components with a similarity algorithm and displays the coverage match between the ontology and the provided terms, including terms expanded with synonyms from WordNet.
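
    The coverage check described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the similarity measure (the standard library's `difflib.SequenceMatcher`), the 0.85 threshold, the function names, and the small `SYNONYMS` table standing in for a WordNet lookup are all assumptions made for illustration. The vocabulary is assumed to be already extracted as a list of class and literal labels.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYMS = {
    "car": {"automobile", "auto"},
    "person": {"individual", "human"},
}


def normalize(term):
    """Lower-case a label and treat underscores as spaces."""
    return term.strip().lower().replace("_", " ")


def expand(term, synonyms=SYNONYMS):
    """Return the normalized term together with its normalized synonyms."""
    base = normalize(term)
    return {base} | {normalize(s) for s in synonyms.get(base, set())}


def similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def coverage(user_terms, vocabulary, threshold=0.85):
    """Match each user term (or a synonym) against the ontology vocabulary.

    Returns the fraction of user terms covered and, per term, the best
    matching vocabulary label (or None when nothing reaches the threshold).
    """
    vocab = [normalize(v) for v in vocabulary]
    matches = {}
    for term in user_terms:
        best = None  # (label, score) of the best match found so far
        for candidate in sorted(expand(term)):
            for label in vocab:
                score = similarity(candidate, label)
                if score >= threshold and (best is None or score > best[1]):
                    best = (label, score)
        matches[term] = best[0] if best else None
    covered = sum(1 for m in matches.values() if m is not None)
    ratio = covered / len(user_terms) if user_terms else 0.0
    return ratio, matches


if __name__ == "__main__":
    ratio, matches = coverage(
        ["car", "person", "engine"], ["Automobile", "Person", "Wheel"]
    )
    print(ratio, matches)
```

    In this sketch a term counts as covered when either the term itself or one of its synonyms is sufficiently similar to some vocabulary label, which mirrors the synonym-expanded matching the abstract mentions.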

    Ontology-based information extraction from learning management systems

    In this work we present an ontology-based system for information extraction from Learning Management Systems. It retrieves information according to the structure of the ontology in order to populate the ontology. We also present statistics about the ontology data graphically. These statistics reveal latent knowledge that is difficult to see in a traditional Learning Management System. To answer questions about the ontology data, a question answering system was developed that uses Natural Language Processing to convert natural language questions into an ontology query language.

    Model-based Approach for Product Requirement Representation and Generation in Product Lifecycle Management

    The requirement specification is an official documentation activity that collects the information needed to specify a product and its life-cycle activities in terms of functions, features, performance, constraints, production, maintenance, disposal process, etc. It consists mainly of two phases: product requirement generation and representation. Appropriate criteria for the product design and further life-cycle activities are determined based on the requirement specification as well as the interrelations of product requirements with other life-cycle information such as materials, manufacturing, working environments, finance, and regulations. The determination of these criteria is normally error-prone. It is difficult to identify and maintain the completeness and consistency of the requirement information across the product life-cycle. Product requirements are normally expressed in abstract and conceptual terms with a document-based representation, which yields an unstructured and heterogeneous information base that is unsuitable for intelligent machine interpretation. Most of the time, designers and engineers determine the requirements and develop the requirement specification documents based on their own experience, which may lead to incompleteness and inconsistency. This research work proposes a model-based product requirement representation and generation architecture to help designers and engineers specify product requirements across the product life-cycle. A requirement knowledge management architecture is developed to enhance the capabilities of current Product Life-cycle Management (PLM) platforms in terms of product requirement representation and generation. After a systematic study on the categorization of product requirements, an ontological framework is developed for the specification of the requirements and related product life-cycle domain information. The ontological framework is embedded in an existing PLM system. A computational platform is developed and integrated into the PLM system for the intelligent machine processing of the product requirements and related information. This architecture supports product requirement representation in terms of the ontological framework, as well as information retrieval, inference, and requirement text generation activities.

    Modelo de acesso a fontes em linguagem natural no governo electrónico [Model for access to natural-language sources in e-government]

    Doctorate in Informatics Engineering. For e-government to exist in practice, it is necessary and crucial to provide public information and documentation and to make access to it simple for citizens. A portion of these documents, not necessarily a small one, is in unstructured form and in natural language, and is consequently beyond what current search systems are generally able to handle effectively. Thus, the thesis is that access to these contents can be improved using systems that process natural language and create structured information, particularly if supported by semantics. In order to put this thesis to the test, this work was developed in three major phases: (1) design of a conceptual model integrating the creation of structured information and making it available to various actors, in line with the vision of e-government 2.0; (2) definition and development of a prototype instantiating the key modules of this conceptual model, including ontology-based information extraction supported by examples of relevant information, knowledge management, and access based on natural language; (3) assessment of the usability and acceptability of querying information as made possible by the prototype, and consequently by the conceptual model, by users in a realistic scenario that included comparison with existing forms of access. In addition to this evaluation, at another level more related to technology assessment than to the model, the performance of the subsystem responsible for information extraction was evaluated. The evaluation results show that the proposed model was perceived as more effective and useful than the alternatives. Together with the performance of the prototype in extracting information from documents, which is comparable to the state of the art, the results demonstrate the feasibility and advantages, with current technology, of using natural language processing and the integration of semantic information to improve access to unstructured contents in natural language. The conceptual model and the prototype demonstrator are intended to contribute to the future existence of more sophisticated search systems that are also more suitable for e-government. To achieve transparency in governance, active citizenship, and greater agility in interaction with the public administration, among other goals, citizens and businesses need quick and easy access to official information, even if it was originally created in natural language.

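    Multilinguisation d'ontologies dans le cadre de la recherche d'information translingue dans des collections d'images accompagnées de textes spontanés [Multilingualization of ontologies for cross-lingual information retrieval in collections of images accompanied by spontaneous texts]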

    The World Wide Web is a proliferating source of multimedia objects described in various natural languages. In order to use semantic Web techniques for the retrieval of such objects (images, videos, etc.), we propose a content extraction method for multilingual text collections that takes one or several ontologies as parameters. The content extraction process is used on the one hand to index multimedia objects using their textual content, and on the other to build formal queries from spontaneous user requests. The process is based on an interlingual annotation of texts that keeps ambiguities (polysemy and segmentation) in graphs. This first step allows disambiguation processes to be factored out at the level of a pivot lexicon (of interlingual lexemes). An ontology is passed as a parameter of the system by automatically aligning its elements with the interlingual lexemes of the pivot lexicon. It is thus possible to use ontologies that were not designed for multilingual use, and to add languages or extend their lexical coverage without modifying the ontologies. A demonstrator for multilingual image retrieval, developed for the ANR OMNIA project, made the proposed approaches concrete. It was thus possible to evaluate the scalability of the approach and the quality of the annotations produced.

    ontoX- A Method for Ontology-Driven Information Extraction

    Information Extraction (IE) is an important research field within the Artificial Intelligence community, as it tries to extract relevant information from vast amounts of data. In this paper, we propose an extraction method that utilises the content and pre-defined semantics of ontologies formulated in the Web Ontology Language (OWL) to perform the extraction task. We also propose a method to detect out-of-date constructs in the ontology in order to suggest changes to the user. After stating the results of our evaluation, we conclude that the use of ontologies in conjunction with IE systems can indeed yield feasible results and contribute to better scalability and portability of the system.

    Des spécifications en langage naturel aux spécifications formelles via une ontologie comme modèle pivot [From natural language specifications to formal specifications via an ontology as a pivot model]

    The main objective of system development is to address requirements. As such, success in its realisation is highly dependent on the requirement specification phase, which aims to describe precisely and unambiguously all the characteristics of the system to be developed. In order to arrive at a set of requirements, a user needs analysis is carried out that involves different parties (stakeholders). The system requirements are generally written in natural language (NL) to guarantee a wider understanding. However, since NL texts can contain semantic ambiguities, implicit information, or other inconsistencies, this can lead to diverse interpretations. Hence, it is not easy to specify a complete and consistent set of requirements, and the specified requirements must therefore be formally checked. Specifications written in NL are not considered to be formal and do not allow for a direct application of formal verification methods. We must therefore transform NL requirements into formal specifications. The work presented in this thesis was carried out in this context. The main difficulty of such a transformation is the size of the gap between NL requirements and formal specifications. The objective of this work is to propose an approach for the automatic verification of user requirements that are written in natural language and describe a system's expected behaviour. To this end, we explored the possibilities offered by a representation model based on a logical formalism. Our contribution has three main aspects: 1) an OWL-DL ontology based on description logics, used as a pivot representation model that serves as a link between NL requirements and formal specifications; 2) an approach for the instantiation of the pivot ontology, based on an analysis driven by the semantics of the ontology, which allows an automatic transformation of NL requirements into their conceptual representations; and 3) an approach exploiting the logical formalism of the ontology in order to automatically translate the pivot representation model into a formal specification language called Maude.