
    SWI-Prolog and the Web

    Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates with other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers, development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications, ranging from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP); 31 pages, 24 figures and 2 tables.
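    The architecture described above puts Prolog behind its own HTTP server rather than embedding it. A minimal client-side sketch in Python illustrates the idea; the /solve path and the JSON reply shape are illustrative assumptions, not the paper's actual API.

    ```python
    # Client sketch: a Web component asks a Prolog HTTP server to solve a goal.
    # The endpoint path and reply format below are hypothetical.
    import json
    from urllib.parse import urlencode

    def build_query_url(base, goal):
        """URL a client would GET to ask the Prolog server to solve a goal."""
        return f"{base}/solve?{urlencode({'goal': goal})}"

    def parse_solutions(body):
        """Decode a JSON array of variable bindings returned by the server."""
        return [item["bindings"] for item in json.loads(body)]

    url = build_query_url("http://localhost:8080", "member(X,[a,b])")
    reply = '[{"bindings": {"X": "a"}}, {"bindings": {"X": "b"}}]'  # canned response
    print(url)
    print(parse_solutions(reply))  # [{'X': 'a'}, {'X': 'b'}]
    ```

    Because the interface is plain HTTP plus a standard document format, any component in any language can act as the client, which is exactly the deployment advantage the paper argues for.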

    Constructing a Personal Knowledge Graph from Disparate Data Sources

    This thesis revolves around the idea of a Personal Knowledge Graph as a uniform, coherent structure of personal data collected from multiple disparate sources: a knowledge base consisting of entities such as persons, events, locations and companies, interlinked with semantically meaningful relationships in a graph structure with the user at its center. The personal knowledge graph is intended to be a valuable resource for a digital personal assistant, expanding its capabilities to answer questions and perform tasks that require personal knowledge about the user. We explored techniques within Knowledge Representation, Knowledge Extraction/Information Extraction and Information Management for the purpose of constructing such a graph. We show the practical advantages of using Knowledge Graphs for personal information management, utilizing the structure for extracting and inferring answers and for handling resources like documents, emails and calendar entries. We have proposed a framework for aggregating user data and shown how existing ontologies can be used to model personal knowledge. We have shown that a personal knowledge graph based on the user's personal resources is a viable concept; however, we were not able to enrich our personal knowledge graph with knowledge extracted from unstructured private sources. This was mainly due to the sparsity of relevant information, the informal nature of personal correspondence, and its lack of context.
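    The graph structure the thesis describes can be sketched as subject-predicate-object triples with the user at the center; answering a personal question then becomes a traversal. The entity and relation names below are illustrative, not taken from the thesis.

    ```python
    # Toy personal knowledge graph: entities linked by named relations,
    # queried by following edges outward from the user node "me".
    from collections import defaultdict

    class PersonalKG:
        def __init__(self):
            self._out = defaultdict(list)   # subject -> [(predicate, object), ...]

        def add(self, subject, predicate, obj):
            self._out[subject].append((predicate, obj))

        def answer(self, subject, predicate):
            """Return all objects linked to subject via predicate."""
            return [o for p, o in self._out[subject] if p == predicate]

    kg = PersonalKG()
    kg.add("me", "worksFor", "Acme Corp")
    kg.add("me", "attends", "Project kickoff")
    kg.add("Project kickoff", "heldAt", "Room 4.01")

    print(kg.answer("me", "worksFor"))  # ['Acme Corp']
    # Two-hop question: where is the meeting I attend held?
    print([kg.answer(e, "heldAt") for e in kg.answer("me", "attends")])
    ```

    The two-hop lookup is the kind of inference-by-structure the thesis exploits for assistant-style question answering over calendar entries, emails and documents.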

    Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

    Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experience has shown that text mining can assist in many of its phases, especially in the triage of relevant documents and the extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as manual validation of the extracted data. Preliminary results are presented for a data set of 2376 full texts, from which >4500 gene expression events in cells or anatomical parts have been extracted. Validation of half of this data resulted in a precision of ~50%, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve better performance for event extraction. Database URL: http://www.cellfinder.org
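    The ~50% precision figure comes from manual validation: curators mark each extracted gene-expression event correct or not, and precision is the fraction marked correct. A small sketch, with illustrative events rather than the actual CellFinder data:

    ```python
    # Precision of an extraction step from manual curation: each extracted
    # event is checked by a curator and marked correct (True) or not (False).
    # The sample events below are made up for illustration.
    def precision(validated):
        """validated: list of (event, is_correct) pairs from manual curation."""
        if not validated:
            return 0.0
        return sum(1 for _, ok in validated if ok) / len(validated)

    sample = [("CD34 expressed in kidney", True),
              ("EPO expressed in podocyte", False),
              ("WT1 expressed in nephron", True),
              ("ACE2 expressed in liver", False)]
    print(precision(sample))  # 0.5, i.e. the ~50% reported in the evaluation
    ```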

    Knowledgebase Representation for Royal Bengal Tiger In The Context of Bangladesh

    The Royal Bengal Tiger is one of the most threatened animals in the Sundarbans forest of Bangladesh. In this work we concentrate on establishing a robust knowledgebase for the Royal Bengal Tiger, improving on our previous work to achieve a more efficient knowledgebase representation. We separate tigers from other animals in the collected data using Support Vector Machines (SVM), and organize the collected data in a structured way by XML parsing on the Java platform. Our proposed system generates N-Triples from the parsed data. We then construct an ontology in Protégé containing information about names, places and awards. A straightforward goal of this work is to make the knowledgebase representation of the Royal Bengal Tiger more reliable on the Web. Our experiments show the effectiveness of the knowledgebase construction: the complete knowledgebase integrates the raw data in a structured way, and our experimental results show the strength of the system by retrieving information from the ontology in a reliable way.
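    The N-Triple generation step described above can be sketched as serializing a parsed record into one RDF triple per property. The URIs and property names here are placeholders for illustration, not the system's actual vocabulary.

    ```python
    # Sketch: turn a parsed record ({predicate URI: literal value}) into
    # RDF N-Triples lines. Identifiers are hypothetical examples.
    def to_ntriples(subject_uri, properties):
        """Serialize a property map as N-Triples, one '<s> <p> "o" .' per line."""
        lines = []
        for pred, value in sorted(properties.items()):
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{subject_uri}> <{pred}> "{literal}" .')
        return "\n".join(lines)

    record = {"http://example.org/name": "Royal Bengal Tiger",
              "http://example.org/habitat": "Sundarbans"}
    print(to_ntriples("http://example.org/tiger/1", record))
    ```

    Each emitted line is independently parseable, which is what makes N-Triples convenient for loading the parsed data into an ontology tool afterwards.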

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
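    The RSS side of the approach above rests on the fact that a blog's feed already carries structured per-post records. A minimal sketch of pulling that structure out with the Python standard library, using an inline example feed:

    ```python
    # Sketch of feed-based extraction: read the structured item list out of
    # an RSS 2.0 feed. The feed text is a minimal inline example.
    import xml.etree.ElementTree as ET

    RSS = """<rss version="2.0"><channel>
      <title>Example blog</title>
      <item><title>First post</title><link>http://blog.example/1</link></item>
      <item><title>Second post</title><link>http://blog.example/2</link></item>
    </channel></rss>"""

    def feed_items(rss_text):
        """Return (title, link) for each item in an RSS 2.0 feed."""
        root = ET.fromstring(rss_text)
        return [(i.findtext("title"), i.findtext("link"))
                for i in root.iter("item")]

    print(feed_items(RSS))
    ```

    In the report's approach the feed records then serve as labeled anchors for learning where the same posts sit in the blog's HTML pages.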

    Exploration of documents concerning Foundlings in Fafe along XIX Century

    Master's dissertation in Informatics Engineering. The abandonment of children and newborns is a problem in our society. In the last few decades, the introduction of contraceptive methods, the development of social programs and family planning were fundamental to control undesired pregnancies and support families in need. But these developments were not enough to solve the abandonment epidemic. Anonymous abandonment has a dangerous aspect: in order to preserve the family's identity, a child is usually left in a public place at night. Since children and newborns are among the most vulnerable groups in our society, the time between the abandonment and the assistance of the child is potentially deadly. The establishment of public institutions in the past, such as the foundling wheel, was extremely important as a strategy to save lives. These institutions supported the abandoned children, while simultaneously providing a safer abandonment process, without compromising the anonymity of the family. The focus of the Master's Project discussed in this dissertation is the analysis and processing of nineteenth-century documents concerning the Foundling Wheel of Fafe, preserved by the Arquivo Municipal de Fafe. The analysis of sample documents is the initial step in the development of an ontology. The ontology has a fundamental role in the organization and structuring of the information contained in these historical documents. The identification of concepts and the relationships between them culminates in a structured knowledge repository. Another important component is the development of a digital platform, where users are able to access the content stored in the knowledge repository and explore the digital archive, which incorporates the digitized version of documents and books from these historical institutions. The development of this project is important for several reasons. Directly, the implementation of a knowledge repository and a digital platform preserves information.
These documents are mostly unique records and, due to their age and advanced state of degradation, replacing physical access with digital access reduces the wear associated with each consultation. Additionally, the digital archive facilitates the dissemination of valuable information: research groups and the general public are able to use the platform as a tool to discover the past, performing biographic, cultural or socio-economic studies over documents dated to the nineteenth century.

    Internet based molecular collaborative and publishing tools

    The scientific electronic publishing model has hitherto been an Internet-based delivery of electronic articles that are essentially replicas of their paper counterparts. They contain little in the way of added semantics that may better expose the science, assist the peer review process and facilitate follow-on collaborations, even though the enabling technologies have been around for some time and are mature. This thesis will examine the evolution of chemical electronic publishing over the past 15 years. It will illustrate, with the help of two frameworks, how publishers should be exploiting technologies to improve the semantics of chemical journal articles, namely their value-added features and relationships with other chemical resources on the Web. The first framework is an early exemplar of structured and scalable electronic publishing where a Web content management system and a molecular database are integrated. It employs a test bed of articles from several RSC journals and supporting molecular coordinate and connectivity information. The value of converting 3D molecular expressions in chemical file formats, such as the MOL file, into more generic 3D graphics formats, such as Web3D, is assessed. This exemplar highlights the use of metadata management for bidirectional hyperlink maintenance in electronic publishing. The second framework repurposes this metadata management concept into a Semantic Web application called SemanticEye. SemanticEye demonstrates how relationships between chemical electronic articles and other chemical resources are established. It adapts the successful semantic model used for digital music metadata management by popular applications such as iTunes. Globally unique identifiers enable relationships to be established between articles and other resources on the Web, and SemanticEye implements two: the Digital Object Identifier (DOI) for articles and the IUPAC International Chemical Identifier (InChI) for molecules.
    SemanticEye's potential as a framework for seeding collaborations between researchers, who have hitherto never met, is explored using FOAF, the friend-of-a-friend Semantic Web standard for social networks.
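    The identifier-based linking described above can be sketched simply: articles keyed by DOI, molecules keyed by InChI, and candidate collaborations seeded by articles that share a molecule. The identifiers below are placeholders, not real DOIs or InChIs, and the pairing heuristic is an illustration rather than SemanticEye's actual algorithm.

    ```python
    # Sketch: find pairs of articles that discuss at least one common molecule,
    # as candidate collaborations. Identifiers are placeholders.
    from collections import defaultdict

    def shared_molecule_pairs(article_molecules):
        """article_molecules: {doi: set of InChI strings}.
        Return the set of article pairs sharing at least one molecule."""
        by_inchi = defaultdict(set)
        for doi, inchis in article_molecules.items():
            for inchi in inchis:
                by_inchi[inchi].add(doi)
        pairs = set()
        for dois in by_inchi.values():
            for a in dois:
                for b in dois:
                    if a < b:          # each unordered pair once
                        pairs.add((a, b))
        return pairs

    articles = {"doi:10.0/aaa": {"InChI=A", "InChI=B"},
                "doi:10.0/bbb": {"InChI=B"},
                "doi:10.0/ccc": {"InChI=C"}}
    print(shared_molecule_pairs(articles))  # {('doi:10.0/aaa', 'doi:10.0/bbb')}
    ```

    Because both identifier schemes are global, the same join works across publishers and databases, which is what makes them suitable hooks for FOAF-style collaboration seeding.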
