Search CORE

27,643 research outputs found

XML content warehousing: Improving sociological studies of mailing lists and web data

Author: Colazzo Dario
Dudouet François-Xavier
Manolescu Ioana
Nguyen Benjamin
Senellart Pierre
Vion Antoine
Publication venue
Publication date: 01/01/2011
Field of study

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Crossref

INRIA a CCSD electronic archive server

HAL UVSQ

HAL-Rennes 1

Designing the venue logistics management operations for a World Exposition

Author: Colicchia Claudia
Creazza Alessandro
Dallari Fabrizio
Publication venue: 'Informa UK Limited'
Publication date: 16/07/2014
Field of study

World Expositions, due to their size and peculiar features, pose a number of logistics challenges. This paper aims at developing a design framework for the venue logistics management (VLM) operations to replenish food products to the event site, through a combination of qualitative and quantitative research approaches. First, an in-depth interview methodology, combined with the outcomes of a literature review, is adopted for defining the key variables for the tactical and operational set-up of the VLM system. Second, a quantitative approach is developed to define the necessary logistics resources. The framework is then applied to the case of Milan 2015 World Exposition. It is the first time that such a design framework for a World Exposition is presented: the originality of this research lies in the proposal of a systematic approach that adds to the experiential practices constituting the current body of knowledge on event logistics

Repository@Hull - Worktribe

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

ARL - Archivio della Ricerca LIUC

Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

Author: Fienberg Stephen E.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006
Field of study

The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of ``selective revelation'' and their confidentiality implications.Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

Author: Bichutskiy Vadim Y
Brachmann Rainer K
Colman Richard
Lathrop Richard H
Publication venue: eScholarship, University of California
Publication date: 01/01/2006
Field of study

Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)

Directory of Open Access Journals

eScholarship - University of California

Graph-based Modelling of Concurrent Sequential Patterns

Author: Chen Weiru
Keech Malcolm
Lu Jing
Publication venue: Idea Group Inc
Publication date: 19/04/2010
Field of study

Structural relation patterns have been introduced recently to extend the search for complex patterns often hidden behind large sequences of data. This has motivated a novel approach to sequential patterns post-processing and a corresponding data mining method was proposed for Concurrent Sequential Patterns (ConSP). This article refines the approach in the context of ConSP modelling, where a companion graph-based model is devised as an extension of previous work. Two new modelling methods are presented here together with a construction algorithm, to complete the transformation of concurrent sequential patterns to a ConSP-Graph representation. Customer orders data is used to demonstrate the effectiveness of ConSP mining while synthetic sample data highlights the strength of the modelling technique, illuminating the theories developed