4,088 research outputs found
Diamond Dicing
In OLAP, analysts often select an interesting sample of the data. For
example, an analyst might focus on products bringing revenues of at least 100
000 dollars, or on shops having sales greater than 400 000 dollars. However,
current systems do not allow the application of both of these thresholds
simultaneously, selecting products and shops satisfying both thresholds. For
such purposes, we introduce the diamond cube operator, filling a gap among
existing data warehouse operations.
Because of the interaction between dimensions the computation of diamond
cubes is challenging. We compare and test various algorithms on large data sets
of more than 100 million facts. We find that while it is possible to implement
diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a
hundred times faster than popular database engines (including a row-store and a
column-store).Comment: 29 page
Scalable aggregation predictive analytics: a query-driven machine learning approach
We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queriesâ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Sparkâs COUNT method
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and âenablersâ, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
Coarse-Grained Kinetic Computations for Rare Events: Application to Micelle Formation
We discuss a coarse-grained approach to the computation of rare events in the
context of grand canonical Monte Carlo (GCMC) simulations of self-assembly of
surfactant molecules into micelles. The basic assumption is that the {\it
computational} system dynamics can be decomposed into two parts -- fast (noise)
and slow (reaction coordinates) dynamics, so that the system can be described
by an effective, coarse grained Fokker-Planck (FP) equation. While such an
assumption may be valid in many circumstances, an explicit form of FP equation
is not always available. In our computations we bypass the analytic derivation
of such an effective FP equation. The effective free energy gradient and the
state-dependent magnitude of the random noise, which are necessary to formulate
the effective Fokker-Planck equation, are obtained from ensembles of short
bursts of microscopic simulations {\it with judiciously chosen initial
conditions}. The reaction coordinate in our micelle formation problem is taken
to be the size of a cluster of surfactant molecules. We test the validity of
the effective FP description in this system and reconstruct a coarse-grained
free energy surface in good agreement with full-scale GCMC simulations. We also
show that, for very small clusters, the cluster size seizes to be a good
reaction coordinate for a one-dimensional effective description. We discuss
possible ways to improve the current model and to take higher-dimensional
coarse-grained dynamics into account
Treillis des concepts skylines : Analyse multidimensionnelle des skylines fond\'ee sur les ensembles en accord
The skyline concept has been introduced in order to exhibit the best objects
according to all the criterion combinations and makes it possible to analyse
the relationships between skyline objects. Like the data cube, the skycube is
so voluminous that reduction approaches are really necessary. In this paper, we
define an approach which partially materializes the skycube. The underlying
idea is to discard from the representation the skycuboids which can be computed
again the most easily. To meet this reduction objective, we characterize a
formal framework: the agree concept lattice. From this structure, we derive the
skyline concept lattice which is one of its constrained instances. The strong
points of our approach are: (i) it is attribute oriented; (ii) it provides a
boundary for the number of lattice nodes; (iii) it facilitates the navigation
within the Skycuboids
Are traditions of facet theory geographically bounded or transcendent?
[Abstract]
By drawing on a variety of sources, including personal correspondence with Brian Vickery, this paper
draws upon a socio-historical approach in order to provide a platform for continued conversations among
facet theorists and those who seek to create faceted applications. Once common ground is established, it is
but a small step to the creation of operational definitions and functional requirements as Slavic (2008) and
others have discussed. With variant terminology under control, facet theorists can move quickly to identify
and promote exemplars of best practice for those seeking to implement facets as search and discovery
structures in contemporary information spaces
Proceedings of the ECCS 2005 satellite workshop: embracing complexity in design - Paris 17 November 2005
Embracing complexity in design is one of the critical issues and challenges of the 21st century. As the realization grows that design activities and artefacts display properties associated with complex adaptive systems, so grows the need to use complexity concepts and methods to understand these properties and inform the design of better artifacts. It is a great challenge because complexity science represents an epistemological and methodological swift that promises a holistic approach in the understanding and operational support of design. But design is also a major contributor in complexity research. Design science is concerned with problems that are fundamental in the sciences in general and complexity sciences in particular. For instance, design has been perceived and studied as a ubiquitous activity inherent in every human activity, as the art of generating hypotheses, as a type of experiment, or as a creative co-evolutionary process. Design science and its established approaches and practices can be a great source for advancement and innovation in complexity science. These proceedings are the result of a workshop organized as part of the activities of a UK government AHRB/EPSRC funded research cluster called Embracing Complexity in Design (www.complexityanddesign.net) and the European Conference in Complex Systems (complexsystems.lri.fr). Embracing complexity in design is one of the critical issues and challenges of the 21st century. As the realization grows that design activities and artefacts display properties associated with complex adaptive systems, so grows the need to use complexity concepts and methods to understand these properties and inform the design of better artifacts. It is a great challenge because complexity science represents an epistemological and methodological swift that promises a holistic approach in the understanding and operational support of design. But design is also a major contributor in complexity research. Design science is concerned with problems that are fundamental in the sciences in general and complexity sciences in particular. For instance, design has been perceived and studied as a ubiquitous activity inherent in every human activity, as the art of generating hypotheses, as a type of experiment, or as a creative co-evolutionary process. Design science and its established approaches and practices can be a great source for advancement and innovation in complexity science. These proceedings are the result of a workshop organized as part of the activities of a UK government AHRB/EPSRC funded research cluster called Embracing Complexity in Design (www.complexityanddesign.net) and the European Conference in Complex Systems (complexsystems.lri.fr)
- âŠ