602 research outputs found

    CRIS-IR 2006

    Get PDF
    The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge as well as to support knowledge management applications. The challenge lies on how to extract and correlate entities, aiming to answer key knowledge management questions, such as; who works with whom, on which projects, with which customers and on what research areas. The present work proposes a knowledge mining approach supported by information retrieval and text mining tasks in which its core is based on the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperform better than other correlation methods. Also, we present an application in order to demonstrate the approach over knowledge management scenarios.Fundação para a Ciência e a Tecnologia (FCT) Denmark's Electronic Research Librar

    Fortifying Applications Against Xpath Injection Attacks

    Get PDF
    Code injection derives from a software vulnerability that allows a malicious user to inject custom code into the server engine. In recent years, there have been a great number of such exploits targeting web applications. In this paper we propose an approach that prevents a specific kind of code injection attacks known as xpath injection in a novel way. To detect an attack, our scheme uses location-specific identifiers to validate the executable xpath code. These identifiers represent all the unique fragments of this code along with their call sites within the application

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

    Modeling image databases using Xml schema

    Full text link
    This thesis presents a model for still images in order to support content-based querying and browsing by hierarchical tree structures and object relational graphs. We use the extensible markup language (XML) schema to illustrate and exemplify the proposed model because of its interoperability and flexibility advantages. Of primary interest is the notion of complex types and referential integrity to fully describe the physical and semantic properties of images. XQuery is used to support query processing. We further show how these complex types of XML schema can be used to overcome the shortcomings of reported image database descriptions in the literature

    Yale Leaf Morphology Digitization and Network Project

    Get PDF
    This article describes a digitization project inspired by the innovative leaf morphology classification work of a faculty member in the Geology and Geophysics Department and the Peabody Museum at Yale University. We began our initiative by scanning the Flora Fossilis Arctica, a 7-volume fossil leaf identification tool covering various geological areas, published between 1868 and 1883. This classic paleobotany resource was digitized, creating tiff, pdf, and searchable pdf files. We are now converting the searchable pdf files into ASCII text, enhancing the raw data with metadata elements, placing this material on the web for searching and display; and linking this material to an existing set of preserved leaf plates, a locally created index of annotated article clippings, an online leaf morphology tutorial, and the published online literature. Many decisions must be made in terms of host platforms, mark-up standards, search and linking options, and preservation documentation. This article will outline our decision process as we explore the post-digitization dataset handling, which may prove instructive for others attempting to create and link locally digitized materials

    An MPEG-7 scheme for semantic content modelling and filtering of digital video

    Get PDF
    Abstract Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS); that is, the format multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme which can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the preferred content requirements of the user. We present details of the scheme, front-end systems used for content modelling and filtering and experiences with a number of users

    Annotation-based storage and retrieval of models and simulation descriptions in computational biology

    Get PDF
    This work aimed at enhancing reuse of computational biology models by identifying and formalizing relevant meta-information. One type of meta-information investigated in this thesis is experiment-related meta-information attached to a model, which is necessary to accurately recreate simulations. The main results are: a detailed concept for model annotation, a proposed format for the encoding of simulation experiment setups, a storage solution for standardized model representations and the development of a retrieval concept.Die vorliegende Arbeit widmete sich der besseren Wiederverwendung biologischer Simulationsmodelle. Ziele waren die Identifikation und Formalisierung relevanter Modell-Meta-Informationen, sowie die Entwicklung geeigneter Modellspeicherungs- und Modellretrieval-Konzepte. Wichtigste Ergebnisse der Arbeit sind ein detailliertes Modellannotationskonzept, ein Formatvorschlag für standardisierte Kodierung von Simulationsexperimenten in XML, eine Speicherlösung für Modellrepräsentationen sowie ein Retrieval-Konzept

    Just-in-time hypermedia

    Get PDF
    Many analytical applications, especially legacy systems, create documents and display screens in response to user queries dynamically or in real time . These documents and displays do not exist in advance, and thus hypermedia must be generated \u27just in time -automatically and dynamically. This dissertation details the idea of \u27just-in-time hypermedia and discusses challenges encountered in this research area. A fully detailed literature review about the research issues and related research work is given. A framework for the \u27just-in-time hypermedia compares virtual documents with static documents, as well as dynamic with static hypermedia functionality. Conceptual \u27just-in-time hypermedia architecture is proposed in terms of requirements and logical components. The \u27just-in-time hypermedia engine is described in terms of architecture, functional components, information flow, and implementation details. Then test results are described and evaluated. Lastly, contributions, limitations, and future work are discussed
    corecore