4,088 research outputs found

    Diamond Dicing

    In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow both thresholds to be applied simultaneously, selecting only the products and shops that satisfy both. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages
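The interaction between dimensions mentioned in this abstract can be illustrated with a small sketch. The abstract does not specify an algorithm, so the code below is only an assumed, naive fixed-point pruning loop: the function name `diamond_dice`, the `(product, shop, amount)` fact layout, and the sample data are all hypothetical, and amounts are assumed to be non-negative.

```python
from collections import defaultdict


def diamond_dice(facts, min_product_total, min_shop_total):
    """Keep only facts whose product AND shop both meet their thresholds.

    Assumed sketch, not the authors' implementation. Removing a shop lowers
    product totals (and vice versa), so pruning repeats until nothing changes.
    """
    facts = list(facts)
    while True:
        product_total = defaultdict(float)
        shop_total = defaultdict(float)
        for product, shop, amount in facts:
            product_total[product] += amount
            shop_total[shop] += amount
        kept = [
            (product, shop, amount)
            for product, shop, amount in facts
            if product_total[product] >= min_product_total
            and shop_total[shop] >= min_shop_total
        ]
        if len(kept) == len(facts):  # fixed point reached
            return kept
        facts = kept


# Hypothetical data: (product, shop, sales amount).
facts = [
    ("tv", "paris", 60), ("tv", "rome", 50),
    ("radio", "paris", 10), ("radio", "oslo", 95),
    ("phone", "rome", 70), ("phone", "paris", 40),
]
result = diamond_dice(facts, min_product_total=100, min_shop_total=100)
print(result)
```

In this toy data every product initially passes its threshold, but dropping the shop "oslo" pushes "radio" below 100, so "radio" is removed on the next pass. That cascade is why a single filter per dimension is not enough.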

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology uses historical queries and their answers to accurately predict the answers to ad-hoc queries. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as National initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Coarse-Grained Kinetic Computations for Rare Events: Application to Micelle Formation

    We discuss a coarse-grained approach to the computation of rare events in the context of grand canonical Monte Carlo (GCMC) simulations of self-assembly of surfactant molecules into micelles. The basic assumption is that the computational system dynamics can be decomposed into two parts, fast (noise) and slow (reaction coordinates) dynamics, so that the system can be described by an effective, coarse-grained Fokker-Planck (FP) equation. While such an assumption may be valid in many circumstances, an explicit form of the FP equation is not always available. In our computations we bypass the analytic derivation of such an effective FP equation. The effective free energy gradient and the state-dependent magnitude of the random noise, which are necessary to formulate the effective FP equation, are obtained from ensembles of short bursts of microscopic simulations with judiciously chosen initial conditions. The reaction coordinate in our micelle formation problem is taken to be the size of a cluster of surfactant molecules. We test the validity of the effective FP description in this system and reconstruct a coarse-grained free energy surface in good agreement with full-scale GCMC simulations. We also show that, for very small clusters, the cluster size ceases to be a good reaction coordinate for a one-dimensional effective description. We discuss possible ways to improve the current model and to take higher-dimensional coarse-grained dynamics into account.
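The idea of reading Fokker-Planck coefficients off short simulation bursts can be sketched as follows. The abstract gives no formulas, so this is an assumed illustration using the standard first and second conditional-moment estimates (drift from the mean displacement, diffusion from its variance). The function name `estimate_fp_coefficients`, the burst duration, and the sample displacements are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance


def estimate_fp_coefficients(increments, burst_time):
    """Estimate local drift and diffusion at one value of the reaction coordinate.

    `increments` holds the change in the coordinate (e.g. cluster size) over
    each short burst started from the same initial value. Assumed sketch:
      drift     ~ mean(increment) / burst_time
      diffusion ~ variance(increment) / (2 * burst_time)
    """
    drift = fmean(increments) / burst_time
    diffusion = pvariance(increments) / (2.0 * burst_time)
    return drift, diffusion


# Hypothetical displacements from four bursts of duration 0.1 (arbitrary units).
increments = [0.1, -0.1, 0.3, -0.1]
drift, diffusion = estimate_fp_coefficients(increments, burst_time=0.1)
print(drift, diffusion)
```

Repeating such an estimate over a grid of initial cluster sizes would give state-dependent drift and noise profiles, from which an effective free energy could be integrated. How the initial conditions are chosen is the part the abstract flags as requiring care, and this sketch does not address it.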

    Flexible hierarchies and fuzzy knowledge-based OLAP

    Treillis des concepts skylines : Analyse multidimensionnelle des skylines fondée sur les ensembles en accord

    The skyline concept was introduced to exhibit the best objects according to a set of criteria. The skycube, which gathers the skylines for all criterion combinations, makes it possible to analyse the relationships between skyline objects. Like the data cube, the skycube is so voluminous that reduction approaches are necessary. In this paper, we define an approach which partially materializes the skycube. The underlying idea is to discard from the representation the skycuboids which can be recomputed most easily. To meet this reduction objective, we characterize a formal framework: the agree concept lattice. From this structure, we derive the skyline concept lattice, which is one of its constrained instances. The strong points of our approach are: (i) it is attribute oriented; (ii) it provides a bound on the number of lattice nodes; (iii) it facilitates navigation within the skycuboids.
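To make the notions of skyline and skycube concrete, here is a minimal sketch. It is a naive full materialization, not the agree concept lattice or the partial materialization described in the abstract. The names `dominates`, `skyline`, and `skycube`, the convention that smaller values are better, and the sample data are assumptions for illustration.

```python
from itertools import combinations


def dominates(a, b, dims):
    """True if a is at least as good as b on every dimension in dims and
    strictly better on at least one (smaller values assumed better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)


def skyline(points, dims):
    """Points not dominated by any other point on the given dimensions."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def skycube(points, n_dims):
    """One skyline (skycuboid) per non-empty subset of dimensions."""
    cube = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            cube[dims] = skyline(points, dims)
    return cube


# Hypothetical hotels as (price, distance to beach).
hotels = [(50, 8), (80, 2), (60, 9), (90, 1)]
cube = skycube(hotels, 2)
for dims, result in cube.items():
    print(dims, result)
```

With d criteria this builds 2^d - 1 skycuboids, each by a quadratic scan, which is the volume problem that motivates reduction approaches such as the one in the abstract.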

    Are traditions of facet theory geographically bounded or transcendent?

    Drawing on a variety of sources, including personal correspondence with Brian Vickery, this paper takes a socio-historical approach in order to provide a platform for continued conversations among facet theorists and those who seek to create faceted applications. Once common ground is established, it is but a small step to the creation of operational definitions and functional requirements, as Slavic (2008) and others have discussed. With variant terminology under control, facet theorists can move quickly to identify and promote exemplars of best practice for those seeking to implement facets as search and discovery structures in contemporary information spaces.

    Multidimensional process discovery

    Proceedings of the ECCS 2005 satellite workshop: embracing complexity in design - Paris 17 November 2005

    Embracing complexity in design is one of the critical issues and challenges of the 21st century. As the realization grows that design activities and artefacts display properties associated with complex adaptive systems, so grows the need to use complexity concepts and methods to understand these properties and inform the design of better artefacts. It is a great challenge because complexity science represents an epistemological and methodological shift that promises a holistic approach in the understanding and operational support of design. But design is also a major contributor to complexity research. Design science is concerned with problems that are fundamental in the sciences in general and complexity sciences in particular. For instance, design has been perceived and studied as a ubiquitous activity inherent in every human activity, as the art of generating hypotheses, as a type of experiment, or as a creative co-evolutionary process. Design science and its established approaches and practices can be a great source for advancement and innovation in complexity science. These proceedings are the result of a workshop organized as part of the activities of a UK government AHRB/EPSRC funded research cluster called Embracing Complexity in Design (www.complexityanddesign.net) and the European Conference in Complex Systems (complexsystems.lri.fr).