
    Web scraping technologies in an API world

    Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, to answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well known in clinical microbiology and similar domains, do not yet offer programmatic interfaces. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as a means to support gene set enrichment analysis.
    This work was partially funded by (i) the [TIN2009-14057-C03-02] project from the Spanish Ministry of Science and Innovation, the Plan E from the Spanish Government and the European Union through the European Regional Development Fund (ERDF), (ii) the Portugal-Spain cooperation action sponsored by the Foundation of Portuguese Universities [E 48/11] and the Spanish Ministry of Science and Innovation [AIB2010PT-00353], and (iii) the Agrupamento INBIOMED [2012/273] from the DXPCTSUG (Direccion Xeral de Promocion Cientifica e Tecnoloxica do Sistema Universitario de Galicia) of the Galician Government and the European Union through the ERDF ("a way of making Europe"). H. L. F. was supported by a pre-doctoral fellowship from the University of Vigo.
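    To make the "minimal programming effort" point concrete, here is a minimal sketch of such a scraping pipeline in Python using the widely used requests and BeautifulSoup libraries; the URL, CSS selectors and field names are hypothetical placeholders, not the data sources or tools discussed in the article.

```python
# Minimal scraping pipeline sketch: fetch a page, parse its HTML, and extract
# tabular records. The URL, selectors and field names below are hypothetical
# placeholders, not the clinical-microbiology sources discussed in the article.
import requests
from bs4 import BeautifulSoup

def scrape_records(url: str) -> list[dict]:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    records = []
    # Assume each result is rendered as a table row with two or more cells.
    for row in soup.select("table#results tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) >= 2:
            records.append({"gene": cells[0], "annotation": cells[1]})
    return records

if __name__ == "__main__":
    for record in scrape_records("https://example.org/results"):
        print(record)
```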

    Hybrid Rules with Well-Founded Semantics

    A general framework is proposed for the integration of rules and external first-order theories. It is based on the well-founded semantics of normal logic programs and inspired by ideas of Constraint Logic Programming (CLP) and constructive negation for logic programs. Hybrid rules are normal clauses extended with constraints in their bodies; constraints are certain formulae in the language of the external theory. A hybrid program is a pair of a set of hybrid rules and an external theory. Instances of the framework are obtained by specifying the class of external theories and the class of constraints. An example instance is the integration of (non-disjunctive) Datalog with ontologies formalized as description logics. The paper defines a declarative semantics of hybrid programs and a goal-driven formal operational semantics. The latter can be seen as a generalization of SLS-resolution. It provides a basis for hybrid implementations combining Prolog with constraint solvers. Soundness of the operational semantics is proven. Sufficient conditions are given for decidability of the declarative semantics and for completeness of the operational semantics.
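    As a minimal illustration of the kind of rule the framework admits (assuming the Datalog-with-description-logics instance mentioned above; the predicate names are invented for this sketch), a hybrid rule is an ordinary clause whose body also carries a constraint interpreted in the external theory:

```latex
% Schematic hybrid rule: a normal clause extended with a constraint in its body.
% applicant/1 and rejected/1 are ordinary rule predicates; Student(X) is a
% formula of the external theory (here, a description-logic concept membership).
\[
  \mathit{eligible}(X) \;\leftarrow\;
  \mathit{applicant}(X),\;
  \neg\, \mathit{rejected}(X),\;
  \mathit{Student}(X)
\]
```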

    Coherent Integration of Databases by Abductive Logic Programming

    We introduce an abductive method for a coherent integration of independent data sources. The idea is to compute a list of data facts that should be inserted into the amalgamated database or retracted from it in order to restore its consistency. This method is implemented by an abductive solver, called Asystem, that applies SLDNFA-resolution on a meta-theory that relates different, possibly contradicting, input databases. We also give a pure model-theoretic analysis of the possible ways to 'recover' consistent data from an inconsistent database in terms of those models of the database that exhibit as little inconsistent information as reasonably possible. This allows us to characterize the 'recovered databases' in terms of the 'preferred' (i.e., most consistent) models of the theory. The outcome is an abduction-based application that is sound and complete with respect to a corresponding model-based, preferential semantics, and -- to the best of our knowledge -- is more expressive (thus more general) than any other implementation of coherent integration of databases.
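    The following toy sketch in Python illustrates the flavour of the repair computation described above (it is not the Asystem solver): facts and their explicit negations are treated as conflicts, and the candidate repairs that retract the fewest facts are kept.

```python
# Toy illustration of restoring consistency in an amalgamated database by
# retracting facts. A fact f and its explicit negation "-f" form a conflict;
# a repair retracts one side of every conflict, and we keep the smallest
# repairs. This is an invented sketch, not the Asystem implementation.
from itertools import product

def repairs(amalgamated: set[str]) -> list[set[str]]:
    conflicts = [(f, "-" + f) for f in amalgamated if "-" + f in amalgamated]
    if not conflicts:
        return [set()]  # already consistent: nothing to retract
    candidates = [set(choice) for choice in product(*conflicts)]
    smallest = min(len(c) for c in candidates)
    return [c for c in candidates if len(c) == smallest]

db = {"supplier(s1)", "-supplier(s1)", "part(p1)"}
for retraction in repairs(db):
    print("retract", retraction, "->", db - retraction)
```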

    Probabilistic Methodology and Techniques for Artefact Conception and Development

    The purpose of this paper is to present a state of the art of probabilistic methodology and techniques for artefact conception and development. It is the 8th deliverable of the BIBA (Bayesian Inspired Brain and Artefacts) project. We first present the incompleteness problem as the central difficulty that both living creatures and artefacts have to face: how can they perceive, infer, decide and act efficiently with incomplete and uncertain knowledge? We then introduce a generic probabilistic formalism called Bayesian Programming. This formalism is then used to review the main probabilistic methodologies and techniques. The review is organized in three parts: first, the probabilistic models, from Bayesian networks to Kalman filters and from sensor fusion to CAD systems; second, the inference techniques; and finally, the learning, model acquisition and comparison methodologies. We conclude with the perspectives of the BIBA project as they arise from this state of the art.
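    As one concrete instance of the probabilistic machinery surveyed (sensor fusion, the elementary step behind Kalman filtering), the sketch below fuses two noisy Gaussian measurements of the same scalar quantity by precision weighting; the numerical values are invented for illustration.

```python
# Bayesian fusion of two Gaussian measurements of one scalar quantity:
# the posterior mean is the precision-weighted average and the posterior
# variance is the inverse of the summed precisions. Values are illustrative.
def fuse(mean_a: float, var_a: float, mean_b: float, var_b: float) -> tuple[float, float]:
    precision = 1.0 / var_a + 1.0 / var_b
    fused_var = 1.0 / precision
    fused_mean = fused_var * (mean_a / var_a + mean_b / var_b)
    return fused_mean, fused_var

# e.g., a range sensor reading 10.2 (variance 0.5) fused with a camera-based
# estimate of 9.8 (variance 0.2) yields a tighter combined estimate.
print(fuse(10.2, 0.5, 9.8, 0.2))
```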