6,906 research outputs found
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows data sources to be added or
cross-referenced without modifying existing data sets, while still allowing
the structure to evolve. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
both exhaustive warehousing and recursive queries through contents, with
far less dependence on the initial storage. We finally present the possibility
of exporting the data stored in the warehouse to commonly used advanced
software devoted to sociological analysis.
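As a rough illustration of what a recursive query over nested mailing-list content might look like, here is a minimal sketch in Python. It does not reproduce the paper's XML Schema: the element and attribute names (`list`, `message`, `author`, `subject`) and the nesting of replies inside their parent message are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical warehouse fragment. The layout (replies nested inside the
# message they answer) is an assumption, not the schema from the paper.
SAMPLE = """
<list name="example-list">
  <message id="m1" author="alice">
    <subject>Draft schema</subject>
    <message id="m2" author="bob">
      <subject>Re: Draft schema</subject>
      <message id="m3" author="alice">
        <subject>Re: Re: Draft schema</subject>
      </message>
    </message>
  </message>
</list>
"""


def messages_by_author(root):
    """Count messages per author at any nesting depth."""
    counts = {}
    # iter() walks the whole subtree, so replies of replies are included.
    for msg in root.iter("message"):
        author = msg.get("author")
        counts[author] = counts.get(author, 0) + 1
    return counts


def thread_depth(message):
    """Return the length of the longest reply chain starting at `message`."""
    replies = message.findall("message")  # direct replies only
    return 1 + max((thread_depth(r) for r in replies), default=0)


root = ET.fromstring(SAMPLE)
```

Calling `messages_by_author(root)` counts posts per participant regardless of depth, and `thread_depth(root.find("message"))` measures how deep a discussion goes; both traverse the content recursively without a fixed number of joins.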
An Intelligent Data Mining System to Detect Health Care Fraud
The chapter begins with an overview of the types of healthcare fraud. Next, there is a brief discussion of issues with the current fraud detection approaches. The chapter then develops information-technology-based approaches and illustrates how these technologies can improve current practice. Finally, there is a summary of the major findings and the implications for healthcare practice.
On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research
Scientific research requires access, analysis, and sharing of data that is
distributed across various heterogeneous data sources at the scale of the
Internet. An eager ETL process constructs an integrated data repository as its
first step, integrating and loading data in its entirety from the data sources.
The bootstrapping of this process is not efficient for scientific research that
requires access to data from very large and typically numerous distributed data
sources. A lazy ETL process loads only the metadata, but still eagerly. Lazy
ETL is faster in bootstrapping. However, queries on the integrated data
repository of eager ETL perform faster, due to the availability of the entire
data beforehand.
In this paper, we propose a novel ETL approach for scientific data
integration, as a hybrid of eager and lazy ETL approaches, applied to both
data and metadata. This way, Hybrid ETL supports incremental integration
and loading of metadata and data from the data sources. We incorporate a
human-in-the-loop approach, to enhance the hybrid ETL, with selective data
integration driven by the user queries and sharing of integrated data between
users. We implement our hybrid ETL approach in a prototype platform, Obidos,
and evaluate it in the context of data sharing for medical research. Obidos
outperforms both the eager ETL and lazy ETL approaches, for scientific research
data integration and sharing, through its selective loading of data and
metadata, while storing the integrated data in a scalable integrated data
repository.
Comment: Pre-print submitted to the DMAH Special Issue of the Springer DAPD
Journal
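The difference between eager loading and the query-driven, incremental loading described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Obidos implementation: the `Source` and `HybridRepository` classes, the `fetches` counter, and the record layout are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical remote data source: a name plus records keyed by id."""
    name: str
    records: dict
    fetches: int = 0  # counts how many records were actually pulled

    def list_ids(self):
        # Metadata access: cheap, returns identifiers only.
        return list(self.records)

    def fetch(self, record_id):
        # Data access: the expensive step an ETL process tries to limit.
        self.fetches += 1
        return self.records[record_id]


def eager_load(sources):
    """Eager ETL: copy every record from every source up front."""
    return {(s.name, rid): s.fetch(rid) for s in sources for rid in s.list_ids()}


@dataclass
class HybridRepository:
    """Hybrid sketch: metadata and data are loaded incrementally, per query."""
    sources: list
    metadata: dict = field(default_factory=dict)  # source name -> known ids
    data: dict = field(default_factory=dict)      # (source, id) -> record

    def query(self, source_name, record_id):
        source = next(s for s in self.sources if s.name == source_name)
        # Load this source's metadata only the first time it is queried.
        if source_name not in self.metadata:
            self.metadata[source_name] = set(source.list_ids())
        if record_id not in self.metadata[source_name]:
            raise KeyError(record_id)
        key = (source_name, record_id)
        # Fetch the record only if no earlier query already integrated it.
        if key not in self.data:
            self.data[key] = source.fetch(record_id)
        return self.data[key]
```

In this sketch, a repeated query is answered from the integrated repository without touching the source again, while sources nobody queries are never contacted; `eager_load` pays the full cost before the first query.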
FROM DOCUMENT MANAGEMENT TO KNOWLEDGE MANAGEMENT
Documents circulating in paper form are increasingly being replaced by their electronic equivalents in the modern office, so that any stored document can be retrieved whenever it is needed later on. The office worker is already burdened with information overload, so effective and efficient retrieval facilities become an important factor affecting worker productivity. The key thrust of this article is to analyse the benefits and importance of interaction between document management and knowledge management. Information stored in text-based documents represents a valuable repository for both the individual worker and the enterprise as a whole, and it has to be tapped into as part of the knowledge generation process.
Keywords: document management, knowledge management, information and communication technologies
BUSINESS INTELLIGENT AGENTS FOR ENTERPRISE APPLICATION
Fierce competition in an increasingly crowded market and frequent changes in consumer requirements are the main forces that will cause companies to change their current organization and management. One solution is to move to open, virtual architectures, which requires business methods and technologies based on distributed multi-agent systems. Intelligent agents are one of the most important areas of artificial intelligence, which deals with the development of hardware and software systems able to reason, learn, recognize natural language, speak, make decisions, recognize objects in the working environment, etc. In this paper, we present some aspects of smart business, intelligent agents, intelligent systems, and intelligent systems models, and we especially emphasize their role in managing business processes, which have become highly complex systems in permanent change to meet the requirements of timely decision making. The purpose of this paper is to argue that modern business depends on integrating Business Process Management, Web Services, and intelligent agents.
Keywords: business intelligence, intelligent agents, intelligent systems, management, enterprise, web services
Carpe Diem: Transforming Services in Academic Libraries
Amid a global economic crisis and spurred, in my country, by a great influx of funding intended to stimulate the economy quickly, librarians are confronted by other factors that could have transformative powers, if they choose to seize the opportunity. This paper focuses on the future of academic library services and the opportunities that await those who reject hunkering down in troubled times.
Unpublished; not peer reviewed
A Service Late Binding Enabled Solution for Data Integration from Autonomous and Evolving Databases
Integrating data from autonomous, distributed and heterogeneous data sources to provide a unified vision is a common demand for many businesses. Since the data sources may evolve frequently to satisfy their own independent business needs, solutions which use hard coded queries to integrate participating databases may cause high maintenance costs when evolution occurs. Thus a new solution which can handle database evolution with lower maintenance effort is required.
This thesis presents a new solution, Service Late binding Enabled Data Integration (SLEDI), which is set into a framework modeling the essential processes of the data integration activity. It integrates schematically heterogeneous relational databases with decreased maintenance costs for handling database evolution. An algorithm named Information Provision Unit Describing (IPUD) is designed to describe each database as a set of Information Provision Units (IPUs). The IPUs are represented as Directed Acyclic Graph (DAG) structured data instead of hard coded queries, and are further realized as data services. Hence the data integration is achieved through service invocations. Furthermore, a set of processes is defined to handle database evolution by automatically identifying and modifying the IPUs that are affected by the evolution.
An extensive evaluation based on a case study is presented. The results show that the schematic heterogeneities defined in this thesis can be solved by IPUD, except for the relation isomorphism discrepancy. Ten out of thirteen types of schematic database evolution can be handled automatically by the evolution handling processes, as long as the evolution is represented by the designed data model. The computational costs of the automatic evolution handling show a slow linear growth with the number of participating databases. Other characteristics addressed include SLEDI's scalability and its independence of application domain and database model. The descriptive comparison with other data integration approaches shows that although the Data as a Service approach may result in lower performance under some circumstances, it offers better flexibility for integrating data from autonomous and evolving data sources.