Scalable aggregation predictive analytics: a query-driven machine learning approach
We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high-quality prediction results. The significance of our contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark’s COUNT method.
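To make the query-driven idea concrete, here is a minimal sketch that predicts a COUNT answer purely from past queries and their answers, without touching the underlying data. It is not the authors' model: it swaps in plain inverse-distance-weighted nearest neighbours for the local linear regression estimators named in the abstract. The query encoding (centre coordinates plus radius), the function names, and the sample history are all assumptions made for illustration.

```python
import math


def query_distance(q1, q2):
    """Euclidean distance between two query vectors (assumed encoding:
    centre coordinates followed by a radius)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))


def predict_count(history, query, k=3):
    """Estimate the COUNT answer for `query` from the k most similar past queries.

    history: list of (query_vector, observed_count) pairs.
    Uses inverse-distance weighting; an exact match returns its stored answer.
    """
    if not history:
        raise ValueError("history must contain at least one past query")
    nearest = sorted(history, key=lambda item: query_distance(item[0], query))[:k]
    weighted = []
    for past_query, count in nearest:
        d = query_distance(past_query, query)
        if d == 0.0:
            return float(count)  # identical query seen before
        weighted.append((1.0 / d, count))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * c for w, c in weighted) / total_weight


# Hypothetical history of (query, COUNT answer) pairs.
history = [
    ((0.0, 0.0, 1.0), 100),
    ((1.0, 0.0, 1.0), 120),
    ((0.0, 1.0, 1.0), 80),
    ((5.0, 5.0, 1.0), 10),
]
estimate = predict_count(history, (0.5, 0.0, 1.0), k=2)
```

Appending each newly answered query to `history` gives a crude form of the incremental behaviour the abstract describes; a fuller version would maintain query prototypes and fit a local linear model per prototype.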
A Compositional Object-Based Approach to Learning Physical Dynamics
We present the Neural Physics Engine (NPE), an object-based neural network architecture for learning predictive models of intuitive physics. We propose a factorization of a physical scene into composable object-based representations and also the NPE architecture whose compositional structure factorizes object dynamics into pairwise interactions. Our approach draws on the strengths of both symbolic and neural approaches: like a symbolic physics engine, the NPE is endowed with generic notions of objects and their interactions, but as a neural network it can also be trained via stochastic gradient descent to adapt to specific object properties and dynamics of different worlds. We evaluate the efficacy of our approach on simple rigid body dynamics in two-dimensional worlds. By comparing to less structured architectures, we show that our model's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize to different numbers of objects, and infer latent properties of objects such as mass. National Science Foundation (U.S.) (Award CCF-1231216). United States. Office of Naval Research (Grant N00014-16-1-2007).
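The pairwise factorization can be illustrated with a small sketch in which the next velocity of each object is computed by summing one pairwise effect per neighbouring object. This is only an assumed toy: the hand-written `pairwise_effect` stands in for what would be a learned neural encoder in the NPE, and the dictionary layout, the repulsive force law, and the time step are hypothetical choices.

```python
def pairwise_effect(focus, other):
    """Hypothetical stand-in for a learned pairwise encoder: a repulsive
    push on `focus` away from `other`, with magnitude 1 / distance."""
    dx = focus["pos"][0] - other["pos"][0]
    dy = focus["pos"][1] - other["pos"][1]
    dist_sq = dx * dx + dy * dy
    if dist_sq == 0.0:
        return (0.0, 0.0)
    return (dx / dist_sq, dy / dist_sq)


def predict_velocity(focus, others, dt=0.1):
    """Sum the pairwise effects of all context objects on `focus`,
    then update its velocity; mass scales the response."""
    effects = [pairwise_effect(focus, o) for o in others]
    fx = sum(e[0] for e in effects)
    fy = sum(e[1] for e in effects)
    vx, vy = focus["vel"]
    return (vx + dt * fx / focus["mass"], vy + dt * fy / focus["mass"])


def step(objects, dt=0.1):
    """Apply the same per-object function to every object in the scene,
    so the code is unchanged for any number of objects."""
    return [
        predict_velocity(obj, objects[:i] + objects[i + 1:], dt)
        for i, obj in enumerate(objects)
    ]


scene = [
    {"pos": (0.0, 0.0), "vel": (0.0, 0.0), "mass": 1.0},
    {"pos": (1.0, 0.0), "vel": (0.0, 0.0), "mass": 2.0},
]
new_velocities = step(scene)
```

Because `step` reuses one function per object and one term per pair, adding a third object requires no change to the code, which mirrors the generalization to different numbers of objects claimed in the abstract.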
Abstract Learning Frameworks for Synthesis
We develop abstract learning frameworks (ALFs) for synthesis that embody the
principles of CEGIS (counterexample-guided inductive synthesis) strategies that
have become widely applicable in recent years. Our framework defines a general
abstract framework of iterative learning, based on a hypothesis space that
captures the synthesized objects, a sample space that forms the space on which
induction is performed, and a concept space that abstractly defines the
semantics of the learning process. We show that a variety of synthesis
algorithms in current literature can be embedded in this general framework.
While studying these embeddings, we also generalize some of the synthesis
problems these instances are of, resulting in new ways of looking at synthesis
problems using learning. We also investigate convergence issues for the general
framework, and exhibit three recipes for convergence in finite time. The first
two recipes generalize current techniques for convergence used by existing
synthesis engines. The third technique is a more involved one, of which we
know no existing instantiation, and we instantiate it to concrete synthesis
problems.
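A generic CEGIS-style loop of the kind this abstract refers to can be sketched as follows. This is an illustrative toy, not the paper's formal framework: the function names, the threshold-synthesis task, and the finite domain are assumptions. The learner proposes the first hypothesis consistent with all samples seen so far, and a teacher either accepts it or returns a counterexample that is added to the sample set.

```python
def cegis(hypotheses, consistent, find_counterexample, max_rounds=100):
    """Iterative learner/teacher loop.

    hypotheses: finite iterable of candidate objects (the hypothesis space).
    consistent(h, s): True if hypothesis h agrees with sample s.
    find_counterexample(h): a sample refuting h, or None if h is correct.
    Returns (hypothesis or None, list of samples collected).
    """
    hypotheses = list(hypotheses)
    samples = []
    for _ in range(max_rounds):
        candidate = next(
            (h for h in hypotheses if all(consistent(h, s) for s in samples)),
            None,
        )
        if candidate is None:
            return None, samples  # hypothesis space exhausted
        cex = find_counterexample(candidate)
        if cex is None:
            return candidate, samples  # teacher accepts the candidate
        samples.append(cex)
    return None, samples


# Toy task (assumed): synthesize a threshold t so that "x >= t" matches a
# hidden target predicate on the domain 0..20.
DOMAIN = range(21)


def target(x):
    return x >= 7


def consistent(t, sample):
    x, label = sample
    return (x >= t) == label


def find_counterexample(t):
    for x in DOMAIN:
        if (x >= t) != target(x):
            return (x, target(x))
    return None


result, samples = cegis(range(11), consistent, find_counterexample)
```

Here termination in finite time follows from the simplest possible argument: the hypothesis space is finite and every counterexample rules out at least the current candidate.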
Methodological framework and design process for applying evolutionary simulation to musical interactions
This paper focuses on a methodological framework where the creative design process evolves through iterative cycles. The design process undertakes a complex network of tasks for integrating two domain models: dynamical simulation and musical interaction. The framework accounts for engineering technical and compositional affordances to accommodate evolving behaviors to be expressed in real time performance interplay. This is illustrated with a case study of simulated swarms of heterogeneous agents. Highly integrated parallel work streams are elucidated with sub-process elicitation in simulation, system integration and software engineering, composition, and performance. Framework formalization draws upon the established RAD model with significant modification to present the extended version that can be multi-threaded for concurrent creative processes. Two landmarks of 20th century music automation are drawn diachronically to frame the technical discussion in a social context of listening practice, developed by modeling creative process and testing musical assumptions. The revisited canon is redirected from bygone exemplars to ongoing practice, illuminating three baseline requirements for a methodological framework: interdisciplinary platform architecture, complex systems model of music creation, and agile listening. Concluding theses on second order listening and interdisciplinary architecture summarize the proposed methodological framework addressing contextual listening and technical culture.
Learning from Sequential User Data: Models and Sample-efficient Algorithms
Recent advances in deep learning have made learning representations from ever-growing datasets possible in the domains of vision, natural language processing (NLP), and robotics, among others. However, deep networks are notoriously data-hungry; for example, training language models with attention mechanisms sometimes requires trillions of parameters and tokens. In contrast, we can often access only a limited number of samples in many tasks. It is crucial to learn models from these 'limited' datasets. Learning with limited datasets can take several forms. In this thesis, we study how to select data samples sequentially such that downstream task performance is maximized. Moreover, we study how to introduce prior knowledge in the deep networks to maximize prediction performance. We focus on four sequential tasks: computerized adaptive testing in psychometrics, sketching in recommender systems, knowledge tracing in computer-assisted education, and career path modeling in the labor market.
In the first two tasks, we devise novel sample-efficient algorithms to query a minimal number of sequential samples to improve future predictions. We propose a Bilevel Optimization-Based framework for computerized adaptive testing to learn a data-driven question selection algorithm that improves existing data selection policies. We also tackle the sketching problem in the recommender system, with the task of recommending the next item using a stored subset of prior data samples. In this setting, we develop a data-driven sequential selection algorithm that tackles evolving downstream task distribution. In the last two tasks, we devise novel neural models to introduce prior knowledge exploiting limited data samples. For knowledge tracing, we propose a novel neural architecture, inspired by cognitive and psychometric models, to improve the prediction of students' future performance and utilize the labeled data samples efficiently. For career path modeling, we propose a novel and interpretable monotonic nonlinear state-space model to analyze online user professional profiles and provide actionable feedback and recommendations to users on how they can reach their career goals.
The data-driven differentiable data selection algorithms for the first two tasks open up future directions to query (a non-differentiable operation) a minimal number of samples optimally to maximize prediction performance. The structures introduced in the neural architectures for the last two tasks, which draw on prior knowledge, open up future directions to learn deep models augmented with prior knowledge using limited data samples.