Search CORE


CRIS-IR 2006

Publication date: 01/11/2006

The recognition of entities and their relationships in document collections is an important step towards discovering latent knowledge and supporting knowledge management applications. The challenge lies in how to extract and correlate entities in order to answer key knowledge management questions, such as: who works with whom, on which projects, with which customers, and in which research areas. The present work proposes a knowledge mining approach supported by information retrieval and text mining tasks, whose core is the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperforms other correlation methods. We also present an application that demonstrates the approach in knowledge management scenarios.
Fundação para a Ciência e a Tecnologia (FCT); Denmark's Electronic Research Library
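
The abstract does not describe how LRD computes its correlation scores, so the sketch below is only a simple baseline for the same kind of question ("who works with whom"): it counts document-level co-occurrences of already-recognized entities and normalizes them with pointwise mutual information (PMI). It is not the LRD method; the function name `pmi_relations` and the sample entities are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_relations(docs: list[set[str]]) -> dict[tuple[str, str], float]:
    """Score entity pairs by document-level pointwise mutual information.

    docs: one set of already-recognized entity names per document.
    Returns {(a, b): pmi} for every pair that co-occurs at least once,
    with a < b so each unordered pair appears only once.
    """
    n = len(docs)
    single = Counter()  # number of documents mentioning each entity
    pair = Counter()    # number of documents mentioning both entities
    for ents in docs:
        single.update(ents)
        pair.update(combinations(sorted(ents), 2))
    scores = {}
    for (a, b), n_ab in pair.items():
        # PMI = log(P(a, b) / (P(a) * P(b))), probabilities estimated over documents
        scores[(a, b)] = math.log((n_ab / n) / ((single[a] / n) * (single[b] / n)))
    return scores


if __name__ == "__main__":
    # Hypothetical entity sets, e.g. the output of a named-entity recognizer.
    docs = [
        {"Alice", "Bob", "ProjectX"},
        {"Alice", "Bob"},
        {"Carol", "ProjectY"},
        {"Alice", "ProjectX"},
    ]
    for (a, b), s in sorted(pmi_relations(docs).items(), key=lambda kv: -kv[1]):
        print(f"{a} -- {b}: {s:.3f}")
```

Pairs that co-occur more often than their individual frequencies predict get positive scores, pairs that co-occur exactly as often as chance predicts get zero, and pairs that never co-occur are omitted.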

Universidade do Minho: RepositoriUM

Fortifying Applications Against XPath Injection Attacks

Author: Karakoidas Vassilios
Mitropoulos Dimitris
Spinellis Diomidis
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2009

Code injection derives from a software vulnerability that allows a malicious user to inject custom code into the server engine. In recent years, a great number of such exploits have targeted web applications. In this paper we propose a novel approach that prevents a specific kind of code injection attack known as XPath injection. To detect an attack, our scheme uses location-specific identifiers to validate the executable XPath code. These identifiers represent all the unique fragments of this code, along with their call sites within the application.
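
The following is a rough illustration of validating XPath code against location-specific identifiers, not the authors' implementation. Assumptions: an identifier is a hash of the call site (file and line) plus the query's "skeleton" with literal values replaced by `?`, and identifiers are collected during a trusted training run and enforced afterwards. The `QueryGuard` class, the `skeleton` function and the `find_user` example are hypothetical.

```python
import hashlib
import inspect
import re

# XPath string literals ('...' or "...") and numeric literals.
_LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"|\b\d+(?:\.\d+)?\b")


def skeleton(xpath: str) -> str:
    """Replace literal values with '?' so only the query structure remains."""
    return _LITERAL.sub("?", xpath)


class QueryGuard:
    """Hypothetical whitelist of (call site, query skeleton) identifiers."""

    def __init__(self) -> None:
        self.known: set[str] = set()
        self.training = True  # record identifiers until switched off

    def check(self, xpath: str) -> str:
        caller = inspect.stack()[1]  # frame that issued the query
        site = f"{caller.filename}:{caller.lineno}"
        ident = hashlib.sha256(f"{site}|{skeleton(xpath)}".encode()).hexdigest()
        if self.training:
            self.known.add(ident)
        elif ident not in self.known:
            raise PermissionError(f"unexpected XPath structure at {site}: {xpath}")
        return xpath


guard = QueryGuard()


def find_user(name: str, password: str) -> str:
    # Concatenating user input into the query is what enables XPath injection.
    query = "//user[name='" + name + "' and pass='" + password + "']"
    return guard.check(query)


if __name__ == "__main__":
    find_user("alice", "secret")         # trusted run: records the structure
    guard.training = False
    find_user("bob", "hunter2")          # same structure, new values: accepted
    try:
        find_user("bob", "' or '1'='1")  # injected predicate: rejected
    except PermissionError as err:
        print("blocked:", err)
```

The injected password turns the skeleton `//user[name=? and pass=?]` into `//user[name=? and pass=? or ?=?]`, so its identifier is unknown at that call site and the query is rejected before it reaches an XPath engine.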

AIS Electronic Library (AISeL)

Web Data Extraction, Applications and Techniques: A Survey

Author: Abel
Amalfitano
Balduzzi
Baumgartner
Berger
Berthold
Bettencourt
Califf
Catanese
Chang
Chen
Collins
Conover
Crandall
Crescenzi
Dalvi
De Meo
Doan
Emilio Ferrara
Ferrara
Flesca
Freitag
Furche
Gatterbauer
Giacomo Fiumara
Gjoka
Gkotsis
Gottlob
Hammersley
Han
Hecht
Hsu
Irmak
Khare
Kim
Kinsella
Kleinberg
Kohlschütter
Kokkoras
Krüpl
Kushmerick
Kwak
Laender
Liu
Manning
Masanès
Mathes
Meng
Mislove
Monge
Muslea
Oro
Pan
Pasquale De Meo
Perito
Phan
Plake
Rahm
Reis
Robert Baumgartner
Sahuguet
Sarawagi
Schifanella
Selkow
Shi
Soderland
Szomszor
Turmo
Vosecky
Wang
Weikum
Wilson
Winograd
Yang
Ye
Zafarani
Zanasi
Zhai
Zhang
Publication venue: Elsevier BV
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains.
Comment: Knowledge-Based Systems

arXiv.org e-Print Archive

Crossref

Modeling image databases using XML Schema

Author: Xu Min
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2002

This thesis presents a model for still images that supports content-based querying and browsing through hierarchical tree structures and object-relational graphs. We use XML (Extensible Markup Language) Schema to illustrate and exemplify the proposed model because of its interoperability and flexibility. Of primary interest are the notions of complex types and referential integrity, which are used to fully describe the physical and semantic properties of images. XQuery is used to support query processing. We further show how these complex types of XML Schema can be used to overcome the shortcomings of image database descriptions reported in the literature.
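
To illustrate the two structures the abstract mentions, the sketch below parses a small image description with nested regions (the hierarchical tree) and `relation` elements that point at object ids (the object-relational graph). The element and attribute names are hypothetical, not the thesis's schema. Python's standard library can neither validate against XML Schema nor run XQuery, so referential integrity (which a schema would express with `xs:key`/`xs:keyref`) is checked by hand, and the query uses ElementTree's limited XPath subset instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; names are illustrative only.
IMAGE_XML = """
<image id="img1" width="640" height="480">
  <region id="r1" label="landscape">
    <object id="o1" label="tree"/>
    <region id="r2" label="sky">
      <object id="o2" label="sun"/>
    </region>
  </region>
  <relation type="above" from="o2" to="o1"/>
  <relation type="left-of" from="o1" to="o3"/>
</image>
"""


def dangling_references(root: ET.Element) -> list[str]:
    """Relation endpoints that match no declared id (what a key/keyref
    constraint would reject during schema validation)."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    missing = []
    for rel in root.iter("relation"):
        for end in ("from", "to"):
            if rel.get(end) not in ids:
                missing.append(rel.get(end))
    return missing


def objects_in_region(root: ET.Element, label: str) -> list[str]:
    """Labels of all objects nested, at any depth, inside the named region."""
    region = root.find(f".//region[@label='{label}']")
    if region is None:
        return []
    return [obj.get("label") for obj in region.iter("object")]


if __name__ == "__main__":
    root = ET.fromstring(IMAGE_XML.strip())
    print(dangling_references(root))             # ['o3']
    print(objects_in_region(root, "landscape"))  # ['tree', 'sun']
```

Because regions nest, a query on the outer region also returns objects from its sub-regions, which is the browsing-by-hierarchy behaviour the abstract describes.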

University of Nevada, Las Vegas Repository

Yale Leaf Morphology Digitization and Network Project

Author: Stern David
Publication venue: ISU ReD: Research and eData
Publication date: 01/01/2006

This article describes a digitization project inspired by the innovative leaf morphology classification work of a faculty member in the Geology and Geophysics Department and the Peabody Museum at Yale University. We began our initiative by scanning the Flora Fossilis Arctica, a 7-volume fossil leaf identification tool covering various geological areas, published between 1868 and 1883. This classic paleobotany resource was digitized, creating TIFF, PDF, and searchable PDF files. We are now converting the searchable PDF files into ASCII text, enhancing the raw data with metadata elements, placing this material on the web for searching and display, and linking it to an existing set of preserved leaf plates, a locally created index of annotated article clippings, an online leaf morphology tutorial, and the published online literature. Many decisions must be made regarding host platforms, markup standards, search and linking options, and preservation documentation. This article outlines our decision process as we work through the post-digitization handling of the dataset, which may prove instructive for others attempting to create and link locally digitized materials.

ISU ReD: Research and eData

An MPEG-7 scheme for semantic content modelling and filtering of digital video

Author: A. Vakali
A. Vetro
B.L. Tseng
C. Okoli
C.S. Goldfarb
F. Golshani
F. Kretz
G. Rowe
H. Kosch
H.W. Agius
Harry Agius
J. Hunter
J. Magalhães
J.F. Allen
L. Al-Safadi
L. Wenyin
M. Davis
M. Echiffre
M. Eirinaki
M.C. Angelides
M.R. Naphande
Marios C. Angelides
N. Adami
P. Correia
P. Salembier
P.M. Fonseca
R. Zhao
S. Adali
S.R. Newcomb
S.W. Ambler
T. Meyer-Boudnik
U. Westermann
Y.F. Day
É Germain
Publication venue: Springer Science and Business Media LLC
Publication date: 10/02/2006

Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS), that is, the format that multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme that can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that analyses only the content relating directly to the user's preferred content requirements. We present details of the scheme, the front-end systems used for content modelling and filtering, and experiences with a number of users.
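
A minimal sketch of preference-driven filtering, assuming a much simplified content model: each video segment carries sets of event and object terms, and the user's preferences are a set of terms that must all be present. An inverted index means that only the index entries for the preferred terms are read. The `Segment` class and the term names are hypothetical; real MPEG-7 descriptions are XML documents, and COSMOS-7's actual filtering is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical, heavily simplified stand-in for an annotated video segment."""
    seg_id: str
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    events: set[str] = field(default_factory=set)
    objects: set[str] = field(default_factory=set)


def build_index(segments: list[Segment]) -> dict[str, set[str]]:
    """Inverted index from annotation term to the ids of segments carrying it."""
    index: dict[str, set[str]] = defaultdict(set)
    for seg in segments:
        for term in seg.events | seg.objects:
            index[term].add(seg.seg_id)
    return index


def filter_segments(index: dict[str, set[str]], required: set[str]) -> set[str]:
    """Ids of segments annotated with every required term.

    Only the index entries for the user's terms are read, so segments that
    share no term with the preferences are never examined.
    """
    if not required:
        return set()
    postings = [index.get(term, set()) for term in required]
    return set.intersection(*postings)


if __name__ == "__main__":
    video = [
        Segment("s1", 0, 12, events={"goal"}, objects={"player:Smith", "ball"}),
        Segment("s2", 12, 30, events={"foul"}, objects={"player:Jones"}),
        Segment("s3", 30, 41, events={"goal"}, objects={"player:Jones", "ball"}),
    ]
    idx = build_index(video)
    print(sorted(filter_segments(idx, {"goal", "player:Jones"})))  # ['s3']
```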

Crossref

Brunel University Research Archive

Annotation-based storage and retrieval of models and simulation descriptions in computational biology

Author: Waltemath Dagmar (gnd: 1016855753)
Publication venue: Universität Rostock, Rostock
Publication date: 01/01/2011

This work aimed at enhancing the reuse of computational biology models by identifying and formalizing relevant meta-information, and by developing suitable concepts for model storage and retrieval. One type of meta-information investigated in this thesis is experiment-related meta-information attached to a model, which is necessary to recreate simulations accurately. The main results are a detailed concept for model annotation, a proposed XML format for the standardized encoding of simulation experiment setups, a storage solution for standardized model representations, and a retrieval concept.
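
The abstract does not give the details of the proposed format. The sketch below only illustrates why such an encoding matters for recreating simulations: an experiment setup (model reference, time course, reported variables) is written to XML and read back without loss. All element and attribute names, and the `ExperimentSetup` fields, are hypothetical and are not the format proposed in the thesis.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ExperimentSetup:
    """Hypothetical minimal description of a time-course simulation run."""
    model_uri: str      # reference to the standardized model file
    algorithm: str      # solver or method identifier
    start_time: float
    end_time: float
    points: int         # number of output time points
    outputs: list[str]  # model variables to report


def to_xml(setup: ExperimentSetup) -> str:
    """Serialize the setup as a small, self-describing XML document."""
    root = ET.Element("simulationExperiment")
    ET.SubElement(root, "model", source=setup.model_uri)
    ET.SubElement(
        root, "timeCourse",
        algorithm=setup.algorithm,
        start=str(setup.start_time),
        end=str(setup.end_time),
        points=str(setup.points),
    )
    out = ET.SubElement(root, "outputs")
    for var in setup.outputs:
        ET.SubElement(out, "variable", target=var)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> ExperimentSetup:
    """Rebuild the setup, so the same simulation can be run again later."""
    root = ET.fromstring(text)
    tc = root.find("timeCourse")
    return ExperimentSetup(
        model_uri=root.find("model").get("source"),
        algorithm=tc.get("algorithm"),
        start_time=float(tc.get("start")),
        end_time=float(tc.get("end")),
        points=int(tc.get("points")),
        outputs=[v.get("target") for v in root.iter("variable")],
    )


if __name__ == "__main__":
    setup = ExperimentSetup("models/example.xml", "ode-solver", 0.0, 100.0, 50, ["A", "B"])
    text = to_xml(setup)
    print(text)
    print(from_xml(text) == setup)
```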

Rostocker Dokumentenserver

Universität Rostock, Lehrstuhl Datenbank- und Informationssysteme: Dbis Repository

Just-in-time hypermedia

Author: Zhang Li
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2005

Many analytical applications, especially legacy systems, create documents and display screens dynamically or in real time in response to user queries. These documents and displays do not exist in advance, so their hypermedia must be generated "just in time", that is, automatically and dynamically. This dissertation details the idea of just-in-time hypermedia and discusses the challenges encountered in this research area. A detailed review of the literature on the research issues and related work is given. A framework for just-in-time hypermedia compares virtual documents with static documents, as well as dynamic with static hypermedia functionality. A conceptual just-in-time hypermedia architecture is proposed in terms of requirements and logical components. The just-in-time hypermedia engine is described in terms of its architecture, functional components, information flow, and implementation details. Test results are then described and evaluated. Lastly, contributions, limitations, and future work are discussed.
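
One way to picture just-in-time link generation, under illustrative assumptions: a legacy application emits a plain-text report on demand, and a hypothetical rule table maps identifier patterns (order and customer numbers here) to URL templates. Links are computed every time the virtual document is produced, because there is no stored page to attach them to. This is not the engine described in the dissertation; `LinkRule`, `add_links` and the URL scheme are illustrative.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class LinkRule:
    """Hypothetical link rule: text matching `pattern` becomes an anchor.

    `pattern` must not define its own named groups; `target` is a URL
    template in which '{0}' is replaced by the matched text.
    """
    pattern: str
    target: str


def add_links(virtual_doc: str, rules: list[LinkRule]) -> str:
    """HTML-escape a dynamically generated document and insert anchors.

    Nothing is precomputed or stored: links are derived from the document
    text each time it is generated. All rules are combined into one regular
    expression, so a single left-to-right pass never re-links text inside an
    anchor that was just inserted.
    """
    text = html.escape(virtual_doc)
    if not rules:
        return text
    combined = re.compile(
        "|".join(f"(?P<r{i}>{rule.pattern})" for i, rule in enumerate(rules))
    )

    def to_anchor(m: re.Match) -> str:
        # Find which rule's named group produced this match.
        i = next(int(name[1:]) for name, val in m.groupdict().items() if val is not None)
        url = rules[i].target.format(m.group(0))
        return f'<a href="{html.escape(url)}">{m.group(0)}</a>'

    return combined.sub(to_anchor, text)


if __name__ == "__main__":
    rules = [
        LinkRule(r"ORD-\d+", "/orders/{0}"),
        LinkRule(r"CUST-\d+", "/customers/{0}"),
    ]
    report = "Order ORD-1042 for CUST-7 shipped"  # e.g. produced by a legacy query
    print(add_links(report, rules))
```

Keeping the rules separate from the documents is the design point: the same rule table can link any report the legacy system produces, including ones that did not exist when the rules were written.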

Digital Commons @ New Jersey Institute of Technology (NJIT)

That obscure object of desire: multimedia metadata on the web

Author: Hardman L. (Lynda)
Nack F.-M. (Frank)
Ossenbruggen J.R. (Jacco) van
Publication date: 01/01/2003

CWI's Institutional Repository