247 research outputs found

    Corporate Memory Management through Agents: The CoMMA project final report

    This document is the final report of the CoMMA project. It gives an overview of the different search activities carried out during the project. First, the general requirements are described through the definition of two scenarios. The report then presents the technical aspects of the project and the solution that was proposed and implemented.

    Knowledge Organization and Terminology: application to Cork

    This PhD thesis aims to prove the relevance of texts within the conceptual strand of terminological work. Our methodology demonstrates how linguists can infer knowledge from texts and subsequently systematise it through semi-formal or formal representations. We focus mainly on the terminological analysis of specialised corpora, using semi-automatic text analysis tools to systematise the lexical-semantic relationships observed in specialised discourse and then to model the underlying conceptual system. The ultimate goal of this methodology is to propose a typology that can help lexicographers write definitions. Based on the double dimension of Terminology, we hypothesise that text and logic modelling do not go hand in hand, since the latter does not directly relate to the former. We highlight that knowledge and language are both crucial for knowledge systematisation, while keeping in mind that they pertain to different levels of analysis, for they are not isomorphic.

    To meet our goals, we used specialised texts produced within the cork industry. These texts provide a test bed of knowledge-rich data that enables us to demonstrate our deductive mechanisms, employing the Aristotelian formula X = Y + DC, through the linguistic and conceptual analysis of semi-automatically extracted textual data. To explore the corpus, we used text mining strategies in which regular expressions play a central role. The final goal of this study is to create a terminological resource for the cork industry in which two types of resources interlink, namely the CorkCorpus and the OntoCork.

    TermCork is a project that stems from the organisation of knowledge in the specialised field of cork. For that purpose, a terminological knowledge database is being developed to feed an e-dictionary. This e-dictionary is designed as a multilingual and multimodal product in which several resources, both linguistic and conceptual, are paired. OntoCork is a micro domain-ontology in which the concepts are enriched with natural language definitions and complemented with images, either annotated with metainformation or enriched with hyperlinks to additional information, such as a lexicographic resource. This type of e-dictionary embodies what we consider a useful terminological tool in the current digital information society: it accounts for the main features of such a tool and uses an electronic format that can be integrated into the Semantic Web thanks to its interoperable data format. This aspect emphasises its contribution to reducing ambiguity as much as possible and to increasing effective communication between experts of the domain, future experts, and language professionals.
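The abstract mentions regular expressions and the Aristotelian formula X = Y + DC without showing either in use. The following is a minimal sketch under stated assumptions, not the thesis's actual method: it reads X as the defined term, Y as its genus (superordinate concept) and DC as the distinguishing characteristics, which is one plausible reading of the notation. The pattern, the function name `extract_definition` and the example sentences are all hypothetical.

```python
import re

# Hypothetical definitional pattern: "<X> is a/an/the <Y> that|which <DC>."
# X = defined term, Y = genus, DC = distinguishing characteristics (assumed reading).
DEFINITION = re.compile(
    r"^(?:(?:a|an|the)\s+)?"                      # optional leading article
    r"(?P<term>[\w-]+(?:\s[\w-]+)*?)"             # X: shortest run of words before "is"
    r"\s+is\s+(?:a|an|the)\s+"
    r"(?P<genus>[\w-]+(?:\s[\w-]+)*?)"            # Y: shortest run of words before that/which
    r"\s+(?P<differentia>(?:that|which)\s[^.]+)"  # DC: relative clause up to the full stop
    r"\.?$",
    re.IGNORECASE,
)

def extract_definition(sentence):
    """Return (term, genus, differentia) if the sentence fits the pattern, else None."""
    match = DEFINITION.match(sentence.strip())
    if match is None:
        return None
    return (
        match.group("term").lower(),
        match.group("genus").lower(),
        match.group("differentia"),
    )

# Invented example sentence, not taken from the CorkCorpus.
print(extract_definition("A cork stopper is a closure that seals wine bottles."))
```

A single pattern like this only covers one surface form of a definition; a real corpus study would presumably need a family of patterns and manual validation of the matches.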

    From Semantic Search & Integration to Analytics


    Ontology Based E-Learning Systems in the Semantic Web

    The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C). It is an extension of the current web that provides an easier way to find, share, reuse and combine information. An ontology formally represents knowledge as a set of concepts within a domain and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about concepts. This work shows how ontologies take an important place in e-learning systems, where an ontology is used to classify the things that such systems need. This work will be useful for students who are interested in data mining. I have used very simple words to explain the concept of ontology in e-learning systems.

    Semi-automated Ontology Generation for Biocuration and Semantic Search

    Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies (controlled, hierarchical vocabularies) are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing.

    Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods.

    Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child-ancestor relations for up to 54%. No other validated system exists that achieves comparable results.

    To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available online at www.Go3R.org
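The abstract says a query is expanded "using the structure and terminology of the ontology" but gives no detail. As a rough illustration only, the sketch below expands a query term with its descendants and their synonyms in a toy directed acyclic graph. The terms, the `CHILDREN` and `SYNONYMS` tables and the function `expand_query` are hypothetical and are not taken from Go3R or its ontology.

```python
from collections import deque

# Toy ontology with invented terms: parent -> children (a DAG), plus synonyms.
CHILDREN = {
    "alternative method": ["in vitro method", "in silico method"],
    "in vitro method": ["cell culture assay"],
}
SYNONYMS = {
    "in silico method": ["computer simulation"],
}

def expand_query(term):
    """Return the term, its descendants and their synonyms in breadth-first order."""
    expanded = []
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node in a DAG can be reached through several parents
        seen.add(current)
        expanded.append(current)
        for synonym in SYNONYMS.get(current, []):
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
        queue.extend(CHILDREN.get(current, []))
    return expanded

print(expand_query("alternative method"))
```

A search for the broad term would then also retrieve documents that mention only the narrower terms or their synonyms, which is the general idea behind ontology-based expansion.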

    A customized semantic service retrieval methodology for the digital ecosystems environment

    With the emergence of the Web and its pervasive intrusion on individuals, organizations, businesses etc., people now realize that they are living in a digital environment analogous to the ecological ecosystem. Consequently, no individual or organization can ignore the huge impact of the Web on social well-being, growth and prosperity, or the changes that it has brought about to the world economy, transforming it from a self-contained, isolated, and static environment to an open, connected, dynamic environment. Recently, the European Union initiated a research vision in relation to this ubiquitous digital environment, known as Digital (Business) Ecosystems. In the Digital Ecosystems environment, there exist ubiquitous and heterogeneous species, and ubiquitous, heterogeneous, context-dependent and dynamic services provided or requested by species. Nevertheless, existing commercial search engines lack sufficient semantic support, so they cannot disambiguate user queries or provide trustworthy and reliable service retrieval. Furthermore, current semantic service retrieval research focuses on service retrieval in the Web service field, and so cannot provide service retrieval functions that take into account the features of Digital Ecosystem services.

    Hence, in this thesis, we propose a customized semantic service retrieval methodology, enabling trustworthy and reliable service retrieval in the Digital Ecosystems environment, by considering the heterogeneous, context-dependent and dynamic nature of services and the heterogeneous and dynamic nature of service providers and service requesters in Digital Ecosystems. The customized semantic service retrieval methodology comprises: 1) a service information discovery, annotation and classification methodology; 2) a service retrieval methodology; 3) a service concept recommendation methodology; 4) a quality of service (QoS) evaluation and service ranking methodology; and 5) a service domain knowledge updating, and service-provider-based Service Description Entity (SDE) metadata publishing, maintenance and classification methodology.

    The service information discovery, annotation and classification methodology is designed for discovering ubiquitous service information from the Web, annotating the discovered service information with ontology mark-up languages, and classifying the annotated service information by means of specific service domain knowledge, taking into account the heterogeneous and context-dependent nature of Digital Ecosystem services and the heterogeneous nature of service providers. The methodology is realized by the prototype of a Semantic Crawler, the aim of which is to discover service advertisements and service provider profiles from webpages and to annotate the information with service domain ontologies.

    The service retrieval methodology enables service requesters to precisely retrieve the annotated service information, taking into account the heterogeneous nature of Digital Ecosystem service requesters. The methodology is presented by the prototype of a Service Search Engine. Since service requesters can be divided into a group that has relevant knowledge with regard to their service requests and a group that does not, we provide two different service retrieval modules. The module for the first group enables service requesters to directly retrieve service information by querying its attributes. The module for the second group enables service requesters to interact with the search engine to denote their queries by means of service domain knowledge, and then retrieve service information based on the denoted queries.

    The service concept recommendation methodology concerns the issue of incomplete or incorrect queries. The methodology enables the search engine to recommend relevant concepts to service requesters once they find that the service concepts eventually selected cannot be used to denote their service requests. We assume that there is some overlap between the selected concepts and the concepts denoting service requests, because service requesters' understanding of their requests shapes the concepts they select over a series of human-computer interactions. Therefore, a semantic similarity model is designed that seeks semantically similar concepts based on the selected concepts.

    The QoS evaluation and service ranking methodology is proposed to allow service requesters to evaluate the trustworthiness of a service advertisement and rank retrieved service advertisements based on their QoS values, taking into account the context-dependent nature of services in Digital Ecosystems. The core of this methodology is an extended set of CCCI (Correlation of Interaction, Correlation of Criterion, Clarity of Criterion, and Importance of Criterion) metrics, which allows a service requester to evaluate the performance of a service provider in a service transaction based on QoS evaluation criteria in a specific service domain. The evaluation result is then combined with the previous results to produce the eventual QoS value of the service advertisement in a service domain. Service requesters can rank service advertisements by considering their QoS values under each criterion in a service domain.

    The methodology for service domain knowledge updating, service-provider-based SDE metadata publishing, maintenance, and classification is initiated to allow: 1) knowledge users to update service domain ontologies employed in the service retrieval methodology, taking into account the dynamic nature of services in Digital Ecosystems; and 2) service providers to update their service profiles and manually annotate their published service advertisements by means of service domain knowledge, taking into account the dynamic nature of service providers in Digital Ecosystems. The methodology for service domain knowledge updating is realized by a voting system for any proposals for changes in service domain knowledge, and by assigning different weights to the votes of domain experts and normal users.

    In order to validate the customized semantic service retrieval methodology, we build a prototype, a Customized Semantic Service Search Engine. Based on the prototype, we test the mathematical algorithms involved in the methodology by a simulation approach and validate the proposed functions of the methodology by a functional testing approach.
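The abstract describes a voting system in which the votes of domain experts and normal users carry different weights, but it does not give the weights or the acceptance rule. The sketch below shows one simple way such a scheme could work. The weights, the 50% threshold and the function `proposal_accepted` are assumptions made for illustration, not values from the thesis.

```python
# Assumed weights and threshold; the abstract does not state the actual values.
EXPERT_WEIGHT = 2.0
USER_WEIGHT = 1.0

def proposal_accepted(votes, threshold=0.5):
    """Decide a change proposal from (is_expert, approves) pairs.

    The proposal passes when the weighted share of approving votes
    is strictly greater than the threshold.
    """
    total = 0.0
    approving = 0.0
    for is_expert, approves in votes:
        weight = EXPERT_WEIGHT if is_expert else USER_WEIGHT
        total += weight
        if approves:
            approving += weight
    return total > 0 and approving / total > threshold

# One expert and one user approve, two users reject:
# weighted approval is 3.0 out of 5.0, so the proposal passes,
# although an unweighted count would be tied at 2 to 2.
votes = [(True, True), (False, False), (False, False), (False, True)]
print(proposal_accepted(votes))
```

The example is chosen so that the expert's extra weight changes the outcome, which is the point of weighting votes by role.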
