Integrating data warehouses with web data : a survey
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML
technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper
reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces
the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for
XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of
information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover
the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research
lines.
State-of-the-art on evolution and reactivity
This report starts, in Chapter 1, by outlining aspects of querying and updating resources on
the Web and on the Semantic Web, including the development of query and update languages
to be carried out within the Rewerse project.
From this outline, it becomes clear that several existing research areas and topics are of
interest for this work in Rewerse. In the remainder of this report we further present state of
the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give
an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs;
in Chapter 4 event-condition-action rules, both in the context of active database systems and
in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agent frameworks.
CELO: A System for Efficiently Building Informatics Solutions to Manage Biomedical Research Data
Traditional data management methods are unable to sufficiently support growing trends in biomedical research such as collection of larger data sets, use of diverse data types, and sharing of data among multiple laboratories. Although many technologies are readily available to help laboratories build data management solutions, many laboratories are not taking advantage of them. This may be due to hardware and software costs, the need for an informaticist to build customized solutions, and long development times.
Several systems already exist which attempt to address the informatics needs of biomedical researchers. A review of these systems has revealed the benefits and drawbacks of various system design approaches, and has helped us to identify a set of core requirements for a system that will successfully serve the biomedical research community. In consideration of these requirements, we developed the Customizable Electronic Laboratory Online (CELO) system to help laboratories efficiently build cost-effective informatics solutions. CELO automatically creates a generic database and web interface for laboratories that submit a simple web registration form. Researchers can then build their own customized data management systems using web-based features such as configurable user permissions, customizable user interfaces, support for multimedia files, and templates for defining research data representations.
An evaluation of the CELO system has demonstrated its ability to efficiently create customized solutions for research laboratories with basic data management needs. The evaluation has also highlighted areas in which CELO can be improved and has elucidated potential research problems that may be of interest to the biomedical informatics field.
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on "Annotation Graphs," a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic "signals," including both naturally occurring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture.
Comment: 8 pages, 9 figures
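To make the annotation graph idea in the ATLAS abstract concrete, the following is a minimal Python sketch, not the ATLAS API itself. It assumes a graph whose nodes may carry a time offset into a linear signal and whose arcs carry a type and a label. The names Node, Arc, AnnotationGraph, annotate, and overlapping are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A point in the signal; the offset may be unknown (None)."""
    id: str
    offset: Optional[float] = None


@dataclass(frozen=True)
class Arc:
    """A labeled annotation spanning the interval between two nodes."""
    start: Node
    end: Node
    type: str
    label: str


@dataclass
class AnnotationGraph:
    """Hypothetical container: a list of arcs over shared nodes."""
    arcs: List[Arc] = field(default_factory=list)

    def annotate(self, start: Node, end: Node, type_: str, label: str) -> Arc:
        arc = Arc(start, end, type_, label)
        self.arcs.append(arc)
        return arc

    def overlapping(self, lo: float, hi: float) -> List[Arc]:
        """Return arcs whose anchored interval intersects [lo, hi).

        Arcs with an unanchored endpoint are skipped in this sketch.
        """
        return [
            a for a in self.arcs
            if a.start.offset is not None
            and a.end.offset is not None
            and a.start.offset < hi
            and a.end.offset > lo
        ]


# Two word-level arcs and one phrase-level arc over the same nodes.
n0, n1, n2 = Node("n0", 0.0), Node("n1", 0.4), Node("n2", 0.9)
graph = AnnotationGraph()
graph.annotate(n0, n1, "word", "hello")
graph.annotate(n1, n2, "word", "world")
graph.annotate(n0, n2, "phrase", "greeting")

labels = [a.label for a in graph.overlapping(0.5, 1.0)]
print(labels)
```

Because arcs of different types share nodes, several annotation layers can sit over the same signal, and an interval query reduces to a comparison on node offsets. A linear scan is used here for brevity; a database index on offsets would serve the same query at scale.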