137 research outputs found
Notices bibliographiques en Unimarc dans les catalogues collectifs algériens - Diaporama
Diaporama accompagnant l\u27intervention de Fatima Zahra Ali Pacha sur les notices bibliographiques en Unimarc en Algérie
Data Mining-based Fragmentation of XML Data Warehouses
With the multiplication of XML data sources, many XML data warehouse models
have been proposed to handle data heterogeneity and complexity in a way
relational data warehouses fail to achieve. However, XML-native database
systems currently suffer from limited performances, both in terms of manageable
data volume and response time. Fragmentation helps address both these issues.
Derived horizontal fragmentation is typically used in relational data
warehouses and can definitely be adapted to the XML context. However, the
number of fragments produced by classical algorithms is difficult to control.
In this paper, we propose the use of a k-means-based fragmentation approach
that allows to master the number of fragments through its parameter. We
experimentally compare its efficiency to classical derived horizontal
fragmentation algorithms adapted to XML data warehouses and show its
superiority
Evolving NoSQL Databases Without Downtime
NoSQL databases like Redis, Cassandra, and MongoDB are increasingly popular
because they are flexible, lightweight, and easy to work with. Applications
that use these databases will evolve over time, sometimes necessitating (or
preferring) a change to the format or organization of the data. The problem we
address in this paper is: How can we support the evolution of high-availability
applications and their NoSQL data online, without excessive delays or
interruptions, even in the presence of backward-incompatible data format
changes?
We present KVolve, an extension to the popular Redis NoSQL database, as a
solution to this problem. KVolve permits a developer to submit an upgrade
specification that defines how to transform existing data to the newest
version. This transformation is applied lazily as applications interact with
the database, thus avoiding long pause times. We demonstrate that KVolve is
expressive enough to support substantial practical updates, including format
changes to RedisFS, a Redis-backed file system, while imposing essentially no
overhead in general use and minimal pause times during updates.Comment: Update to writing/structur
Model selection for semi-supervised clustering
Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like \must-link" or \cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the problems associated with evaluation have yet to be addressed. Here we\ud
summarize these problems and provide a solution.\ud
Furthermore, in order to demonstrate practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information\ud
(labels or constraints), the most appropriate clustering model (e.g., number of clusters, density-parameters) for a given problem.NSERC (Canada)FAPESP (Brazil)CNPq (Brazil
Towards Mobility Data Science (Vision Paper)
Mobility data captures the locations of moving objects such as humans,
animals, and cars. With the availability of GPS-equipped mobile devices and
other inexpensive location-tracking technologies, mobility data is collected
ubiquitously. In recent years, the use of mobility data has demonstrated
significant impact in various domains including traffic management, urban
planning, and health sciences. In this paper, we present the emerging domain of
mobility data science. Towards a unified approach to mobility data science, we
envision a pipeline having the following components: mobility data collection,
cleaning, analysis, management, and privacy. For each of these components, we
explain how mobility data science differs from general data science, we survey
the current state of the art and describe open challenges for the research
community in the coming years.Comment: Updated arXiv metadata to include two authors that were missing from
the metadata. PDF has not been change
- …