
    Query-driven learning for predictive analytics of data subspace cardinality

    Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace exploration, data subspace visualization, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too expensive, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain or maintain, or (iii) infeasible, e.g., because of privacy constraints. We contribute a novel, query-driven function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodates the well-known selection queries: multi-dimensional range queries and distance-nearest neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space by learning the analysts' access patterns over a data space, (ii) associates query vectors with the corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
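The abstract above outlines steps (i) to (iii): quantize the query space, attach a cardinality to each quantized region, and predict by query similarity. A minimal sketch of that general idea follows. It is not the authors' model: the class name `QueryDrivenEstimator`, the `merge_radius` parameter, the running-mean update, and the inverse-distance weighting are all assumptions made for illustration, and the optimal-stopping change detection of step (iv) is left out entirely. A query is assumed to be encoded as a plain numeric vector, e.g. `[center_x, center_y, radius]` for a radius query.

```python
import math


def _dist(a, b):
    # Euclidean distance between two equal-length numeric vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class QueryDrivenEstimator:
    """Toy query-driven cardinality estimator (illustrative assumption only)."""

    def __init__(self, merge_radius=0.5):
        # merge_radius is a hypothetical tuning knob: a query closer than this
        # to an existing prototype updates it instead of creating a new one.
        self.merge_radius = merge_radius
        self.prototypes = []  # each entry: [vector, mean_cardinality, count]

    def observe(self, query, cardinality):
        """Learn from one executed query and the cardinality it returned."""
        best_i, best_d = None, math.inf
        for i, (vec, _, _) in enumerate(self.prototypes):
            d = _dist(vec, query)
            if d < best_d:
                best_i, best_d = i, d
        if best_i is None or best_d > self.merge_radius:
            # No nearby prototype: this query starts a new quantization cell.
            self.prototypes.append([list(query), float(cardinality), 1])
            return
        vec, mean_card, n = self.prototypes[best_i]
        n += 1
        # Running means move the prototype and its cardinality toward the sample.
        new_vec = [v + (q - v) / n for v, q in zip(vec, query)]
        new_card = mean_card + (cardinality - mean_card) / n
        self.prototypes[best_i] = [new_vec, new_card, n]

    def predict(self, query, k=2):
        """Estimate the cardinality of an unseen query from its k nearest prototypes."""
        if not self.prototypes:
            raise ValueError("no queries observed yet")
        ranked = sorted(self.prototypes, key=lambda p: _dist(p[0], query))[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (_dist(p[0], query) + 1e-9) for p in ranked]
        total = sum(weights)
        return sum(w * p[1] for w, p in zip(weights, ranked)) / total


est = QueryDrivenEstimator(merge_radius=0.5)
est.observe([0.0, 0.0, 1.0], 100)  # radius query near the origin
est.observe([0.2, 0.0, 1.0], 120)  # similar query, merged into the same prototype
est.observe([5.0, 5.0, 1.0], 10)   # distant query, becomes a second prototype
estimate = est.predict([0.1, 0.0, 1.0], k=1)
```

Note that the estimator only ever sees query vectors and their answers, never the underlying data, which is the property the abstract emphasizes for settings where data-driven statistics are costly or unavailable.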

    Cooperation between expert knowledge and data mining discovered knowledge: Lessons learned

    Expert systems are built from knowledge traditionally elicited from the human expert. It is precisely knowledge elicitation from the expert that is the bottleneck in expert system construction. On the other hand, a data mining system, which automatically extracts knowledge, needs expert guidance on the successive decisions to be made in each of the system phases. In this context, expert knowledge and data mining discovered knowledge can cooperate, maximizing their individual capabilities: data mining discovered knowledge can be used as a complementary source of knowledge for the expert system, whereas expert knowledge can be used to guide the data mining process. This article summarizes different examples of systems where there is cooperation between expert knowledge and data mining discovered knowledge and reports our experience of such cooperation gathered from a medical diagnosis project called Intelligent Interpretation of Isokinetics Data, which we developed. From that experience, a series of lessons were learned throughout project development. Some of these lessons are generally applicable and others pertain exclusively to certain project types.

    Data Science and Ebola

    Data Science: Today, everybody and everything produces data. People produce large amounts of data in social networks and in commercial transactions. Medical, corporate, and government databases continue to grow. Sensors continue to get cheaper and are increasingly connected, creating an Internet of Things, and generating even more data. In every discipline, large, diverse, and rich data sets are emerging, from astrophysics, to the life sciences, to the behavioral sciences, to finance and commerce, to the humanities and to the arts. In every discipline people want to organize, analyze, optimize and understand their data to answer questions and to deepen insights. The science that is transforming this ocean of data into a sea of knowledge is called data science. This lecture will discuss how data science has changed the way in which one of the most visible challenges to public health was handled: the 2014 Ebola outbreak in West Africa. Comment: Inaugural lecture Leiden Universit

    Big-Data-Driven Materials Science and its FAIR Data Infrastructure

    This chapter addresses the fourth paradigm of materials research -- big-data driven materials science. Its concepts and state of the art are described, and its challenges and opportunities are discussed. For furthering the field, Open Data and an all-embracing sharing, an efficient data infrastructure, and the rich ecosystem of computer codes used in the community are of critical importance. For shaping this fourth paradigm and contributing to the development or discovery of improved and novel materials, data must be what is now called FAIR -- Findable, Accessible, Interoperable and Re-purposable/Re-usable. This sets the stage for advances of methods from artificial intelligence that operate on large data sets to find trends and patterns that cannot be obtained from individual calculations and not even directly from high-throughput studies. Recent progress is reviewed and demonstrated, and the chapter concludes with a forward-looking perspective that addresses important challenges not yet solved. Comment: submitted to the Handbook of Materials Modeling (eds. S. Yip and W. Andreoni), Springer 2018/201