
    Modeling human intuitions about liquid flow with particle-based simulation

    Humans can easily describe, imagine, and, crucially, predict a wide variety of behaviors of liquids--splashing, squirting, gushing, sloshing, soaking, dripping, draining, trickling, pooling, and pouring--despite tremendous variability in their material and dynamical properties. Here we propose and test a computational model of how people perceive and predict these liquid dynamics, based on coarse approximate simulations of fluids as collections of interacting particles. Our model is analogous to a "game engine in the head", drawing on techniques for interactive simulations (as in video games) that optimize for efficiency and natural appearance rather than physical accuracy. In two behavioral experiments, we found that the model accurately captured people's predictions about how liquids flow among complex solid obstacles, and was significantly better than two alternatives based on simple heuristics and deep neural networks. Our model was also able to explain how people's predictions varied as a function of the liquids' properties (e.g., viscosity and stickiness). Together, the model and empirical results extend the recent proposal that human physical scene understanding for the dynamics of rigid, solid objects can be supported by approximate probabilistic simulation, to the more complex and unexplored domain of fluid dynamics. Comment: Under review at PLOS Computational Biology.
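
    To make the "game engine in the head" idea concrete, the sketch below is a toy illustration, not the authors' model: a cloud of particles is integrated under gravity with crude damping and simple obstacle handling, and an outcome statistic of the kind a prediction task might ask about (the fraction of liquid reaching a container) is read off at the end. All functions, parameters, and geometry here are assumptions chosen only to make the example runnable.

```python
import numpy as np

def simulate_particles(positions, velocities, obstacles, steps=200, dt=0.01,
                       gravity=(0.0, -9.8), damping=0.98):
    """Advance a crude particle-based liquid proxy: gravity, velocity damping
    (a stand-in for viscosity), a floor at y = 0, and axis-aligned rectangular
    obstacles that particles come to rest on top of."""
    g = np.asarray(gravity, dtype=float)
    pos = positions.astype(float).copy()
    vel = velocities.astype(float).copy()
    for _ in range(steps):
        vel += dt * g                      # gravity
        vel *= damping                     # crude drag / viscosity stand-in
        pos += dt * vel                    # explicit Euler step
        # floor at y = 0: clamp position and weakly reflect vertical velocity
        below = pos[:, 1] < 0.0
        pos[below, 1] = 0.0
        vel[below, 1] *= -0.3
        # obstacles: push particles that entered a box back up to its top face
        for (xmin, xmax, ymin, ymax) in obstacles:
            inside = ((pos[:, 0] > xmin) & (pos[:, 0] < xmax) &
                      (pos[:, 1] > ymin) & (pos[:, 1] < ymax))
            pos[inside, 1] = ymax
            vel[inside, 1] = np.abs(vel[inside, 1]) * 0.3
    return pos

# Fraction of particles ending up in a "cup" region approximates the
# predicted probability that liquid pours into that container.
rng = np.random.default_rng(0)
start = rng.uniform([0.4, 2.0], [0.6, 2.2], size=(500, 2))
final = simulate_particles(start, np.zeros_like(start),
                           obstacles=[(0.0, 0.45, 1.0, 1.1)])
in_cup = np.mean((final[:, 0] > 0.3) & (final[:, 0] < 0.7) & (final[:, 1] < 0.2))
print(f"predicted fraction reaching the cup: {in_cup:.2f}")
```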

    Envisioning the qualitative effects of robot manipulation actions using simulation-based projections

    Autonomous robots that are to perform complex everyday tasks such as making pancakes have to understand how the effects of an action depend on the way the action is executed. Within Artificial Intelligence, classical planning reasons about whether actions are executable, but makes the assumption that the actions will succeed (with some probability). In this work, we have designed, implemented, and analyzed a framework that allows us to envision the physical effects of robot manipulation actions. We consider envisioning to be a qualitative reasoning method that reasons about actions and their effects based on simulation-based projections. It thereby allows a robot to infer what could happen when it performs a task in a certain way. This is achieved by translating a qualitative physics problem into a parameterized simulation problem; performing a detailed physics-based simulation of a robot plan; logging the state evolution into appropriate data structures; and then translating these sub-symbolic data structures into interval-based, first-order, symbolic, qualitative representations, called timelines. The result of the envisioning is a set of detailed narratives represented by timelines, which are then used to infer answers to qualitative reasoning problems. By envisioning the outcome of actions before committing to them, a robot is able to reason about physical phenomena and can therefore prevent itself from ending up in unwanted situations. Using this approach, robots can perform manipulation tasks more efficiently, robustly, and flexibly, and they can even successfully accomplish previously unknown variations of tasks.
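
    A minimal sketch of the logging-then-abstraction step described above: a per-timestep simulation log is compressed into interval-based qualitative timelines. This is an illustration under assumed conventions only; the Interval class, predicate names, and log format are stand-ins, not the paper's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    predicate: str   # qualitative state, e.g. "in(liquid, container)"
    start: float     # simulation time at which the state begins to hold
    end: float       # simulation time at which the state stops holding

def to_timeline(log, predicates):
    """Compress a per-timestep simulation log into interval-based qualitative
    timelines: one Interval per maximal run during which a predicate holds."""
    timeline = []
    for name, holds in predicates.items():
        current_start = None
        for t, state in log:
            if holds(state) and current_start is None:
                current_start = t                 # predicate starts holding
            elif not holds(state) and current_start is not None:
                timeline.append(Interval(name, current_start, t))
                current_start = None              # predicate stops holding
        if current_start is not None:             # still holding at the end
            timeline.append(Interval(name, current_start, log[-1][0]))
    return timeline

# Hypothetical log of a pouring action: (time, height of the liquid blob).
log = [(0.0, 0.30), (0.1, 0.18), (0.2, 0.07), (0.3, 0.02), (0.4, 0.02)]
timeline = to_timeline(log, {
    "airborne(liquid)": lambda h: h > 0.05,
    "in(liquid, container)": lambda h: h <= 0.05,
})
for iv in timeline:
    print(f"{iv.predicate}: [{iv.start}, {iv.end}]")
```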

    Benchmarking computational fluid dynamics models of lava flow simulation for hazard assessment, forecasting, and risk management

    Numerical simulations of lava flow emplacement are valuable for assessing lava flow hazards, forecasting active flows, designing flow mitigation measures, interpreting past eruptions, and understanding the controls on lava flow behavior. Existing lava flow models vary in simplifying assumptions, physics, dimensionality, and the degree to which they have been validated against analytical solutions, experiments, and natural observations. In order to assess existing models and guide the development of new codes, we conduct a benchmarking study of computational fluid dynamics (CFD) models for lava flow emplacement, including VolcFlow, OpenFOAM, FLOW-3D, COMSOL, and MOLASSES. We model viscous, cooling, and solidifying flows over horizontal planes, sloping surfaces, and into topographic obstacles. We compare model results to physical observations made during well-controlled analogue and molten basalt experiments, and to analytical theory when available. Overall, the models accurately simulate viscous flow with some variability in flow thickness where flows intersect obstacles. OpenFOAM, COMSOL, and FLOW-3D can each reproduce experimental measurements of cooling viscous flows, and OpenFOAM and FLOW-3D simulations with temperature-dependent rheology match results from molten basalt experiments. We assess the goodness-of-fit of the simulation results and the computational cost. Our results guide the selection of numerical simulation codes for different applications, including inferring emplacement conditions of past lava flows, modeling the temporal evolution of ongoing flows during eruption, and probabilistic assessment of lava flow hazard prior to eruption. Finally, we outline potential experiments and desired key observational data from future flows that would extend existing benchmarking data sets.
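
    One simple way to quantify goodness-of-fit in a benchmark like this is to interpolate the simulated flow front onto the observation times and compute an error norm. The snippet below is only a hedged illustration of that idea, with made-up numbers standing in for the experimental and CFD data; it is not the metric the study itself reports.

```python
import numpy as np

def benchmark_fit(observed_t, observed_front, simulated_t, simulated_front):
    """Goodness-of-fit of a simulated flow-front position against experimental
    measurements: interpolate the simulation onto the observation times and
    report RMSE and RMSE normalized by the observed range."""
    sim_on_obs = np.interp(observed_t, simulated_t, simulated_front)
    rmse = np.sqrt(np.mean((sim_on_obs - observed_front) ** 2))
    nrmse = rmse / (observed_front.max() - observed_front.min())
    return rmse, nrmse

# Hypothetical analogue-experiment data (times in s, front position in m).
obs_t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
obs_x = np.array([0.00, 0.12, 0.21, 0.27, 0.31])
sim_t = np.linspace(0.0, 40.0, 81)
sim_x = 0.05 * np.sqrt(sim_t)            # stand-in for a CFD flow-front curve
rmse, nrmse = benchmark_fit(obs_t, obs_x, sim_t, sim_x)
print(f"RMSE = {rmse:.3f} m, NRMSE = {nrmse:.1%}")
```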

    MULTIMODAL LEARNING FOR AUDIO AND VISUAL PROCESSING

    The world contains vast amounts of information which can be sensed and captured in a variety of ways and formats. Virtual environments also lend themselves to endless possibilities and diversity of data. Often our experiences draw from these separate but complementary parts, which can be combined in a way that provides a comprehensive representation of the events. Multimodal learning focuses on these types of combinations. By fusing multiple modalities, multimodal learning can improve results beyond individual mode performance. However, many of today's state-of-the-art techniques in computer vision, robotics, and machine learning rely solely or primarily on visual inputs, even when the visual data is obtained from video where corresponding audio may also be readily available to augment learning. Vision-only approaches can experience challenges in cases of highly reflective, transparent, or occluded objects and scenes, where audio, used alone or in conjunction with vision, may improve task performance. To address these challenges, this thesis explores coupling multimodal information to enhance task performance through learning-based methods for audio and visual processing using real and synthetic data. Physically-based graphics pipelines can naturally be extended for audio and visual synthetic data generation. To enhance the rigid body sound synthesis pipeline for objects containing a liquid, I used an added mass operator for fluid-structure coupling as a pre-processing step. My method is fast and practical for use in interactive 3D systems where live sound synthesis is desired. By fusing audio and visual data from real and synthetic videos, we also demonstrate enhanced processing and performance for object classification, tracking, and reconstruction tasks. As has been shown in visual question answering and other related work, multiple modalities have the ability to complement one another and outperform single-modality systems. To the best of my knowledge, I introduced the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container. Prior work often required predefined source weights or visual data. My contribution was to use the sound from a pouring sequence (a liquid being poured into a target container) to train a multimodal convolutional neural network (CNN) that fuses mel-scaled spectrograms as audio inputs with corresponding visual data based on video images. I described the first use of an audio-visual neural network for tracking tabletop-sized objects and enhancing visual object trackers. Like object detection of reflective surfaces, object trackers can also run into challenges when objects collide, occlude, appear similar, or come close to one another. By using the impact sounds of the objects during collision, my audio-visual object tracking (AVOT) neural network can correct trackers that drift from the objects they were assigned to before collision. Reflective and textureless surfaces are not only difficult to detect and classify; they are also often poorly reconstructed and filled with depth discontinuities and holes. I proposed the first use of an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction, referred to as "Echoreconstruction". The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification.
    Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. EchoCNN inferences from these classifications enhance scene 3D reconstructions containing open spaces and reflective surfaces by depth filtering, inpainting, and placement of unmixed sound sources in the scene. In addition to enhancing scene reconstructions, I proposed a multimodal single- and multi-frame reconstruction LSTM autoencoder for 3D reconstructions using audio-visual inputs. Our neural network produces high-quality 3D reconstructions using a voxel representation. It is the first audio-visual reconstruction neural network for 3D geometry and material representation. Contributions of this thesis include new neural network designs, new enhancements to real and synthetic audio-visual datasets, and prototypes that demonstrate audio and audio-augmented performance for sound synthesis, inference, and reconstruction.
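
    The pouring-classification and tracking networks described above fuse mel-scaled spectrograms with video frames. The sketch below is a generic late-fusion architecture in PyTorch, not the thesis's actual network; the layer sizes, input resolutions, and three-way output are assumptions chosen only to make the example self-contained and runnable.

```python
import torch
import torch.nn as nn

class AudioVisualFusionNet(nn.Module):
    """Late-fusion CNN: one branch over mel spectrograms, one over video
    frames; the branch features are concatenated and classified jointly."""
    def __init__(self, n_classes=3):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.audio = branch(1)   # 1-channel mel spectrogram
        self.video = branch(3)   # RGB frame
        self.head = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, spectrogram, frame):
        fused = torch.cat([self.audio(spectrogram), self.video(frame)], dim=1)
        return self.head(fused)

# Dummy batch: 8 spectrograms (1x64x64) and 8 video frames (3x128x128).
net = AudioVisualFusionNet(n_classes=3)
logits = net(torch.randn(8, 1, 64, 64), torch.randn(8, 3, 128, 128))
print(logits.shape)  # torch.Size([8, 3])
```

    The late-fusion choice (separate branches, concatenated features) is only one of several reasonable designs; early or intermediate fusion of the two modalities would follow the same pattern with the concatenation moved earlier.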

    Stochastic particle advection velocimetry (SPAV): theory, simulations, and proof-of-concept experiments

    Particle tracking velocimetry (PTV) is widely used to measure time-resolved, three-dimensional velocity and pressure fields in fluid dynamics research. Inaccurate localization and tracking of particles is a key source of error in PTV, especially for single-camera defocusing, plenoptic imaging, and digital in-line holography (DIH) sensors. To address this issue, we developed stochastic particle advection velocimetry (SPAV): a statistical data loss that improves the accuracy of PTV. SPAV is based on an explicit particle advection model that predicts particle positions over time as a function of the estimated velocity field. The model can account for non-ideal effects like drag on inertial particles. A statistical data loss that compares the tracked and advected particle positions, accounting for arbitrary localization and tracking uncertainties, is derived and approximated. We implement our approach using a physics-informed neural network, which simultaneously minimizes the SPAV data loss, a Navier-Stokes physics loss, and a wall boundary loss, where appropriate. Results are reported for simulated and experimental DIH-PTV measurements of laminar and turbulent flows. Our statistical approach significantly improves the accuracy of PTV reconstructions compared to a conventional data loss, resulting in an average error reduction close to 50%. Furthermore, our framework can be readily adapted to work with other data assimilation techniques like state-observer, Kalman-filter, and adjoint-variational methods.
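
    As a rough illustration of the advection-based data-loss idea only (the Navier-Stokes physics loss and wall boundary loss are omitted), the snippet below advects tracked particle positions one explicit Euler step through a candidate velocity network and penalizes the mismatch with the next-frame positions. The network, step size, and uncertainty scale are placeholders for illustration, not the paper's implementation.

```python
import torch

def advection_data_loss(velocity_net, x0, x1, dt, sigma=1.0e-3):
    """Toy advection data loss: advect tracked particle positions x0 through
    the estimated velocity field over dt (explicit Euler) and penalize the
    mismatch with positions x1 tracked at the next frame, scaled by an
    assumed localization uncertainty sigma."""
    x_pred = x0 + dt * velocity_net(x0)            # one Euler advection step
    return torch.mean((x_pred - x1) ** 2) / sigma ** 2

# Hypothetical setup: an MLP maps 3D positions to 3D velocities.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 3))
x0 = torch.rand(256, 3)                            # tracked positions, frame k
x1 = x0 + 0.01 * torch.tensor([1.0, 0.0, 0.0])     # frame k+1 (uniform drift)
loss = advection_data_loss(net, x0, x1, dt=0.01)
loss.backward()                                    # gradients for training
print(float(loss))
```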

    Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

    Recent progress in deep learning is essentially based on a "big data for small tasks" paradigm, under which massive amounts of data are used to train a classifier for a single narrow task. In this paper, we call for a shift that flips this paradigm upside down. Specifically, we propose a "small data for big tasks" paradigm, wherein a single artificial intelligence (AI) system is challenged to develop "common sense", enabling it to solve a wide range of tasks with little training data. We illustrate the potential power of this new paradigm by reviewing models of common sense that synthesize recent breakthroughs in both machine and human vision. We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense. When taken as a unified concept, FPICU is concerned with the questions of "why" and "how", beyond the dominant "what" and "where" framework for understanding vision. They are invisible in terms of pixels but nevertheless drive the creation, maintenance, and development of visual scenes. We therefore coin them the "dark matter" of vision. Just as our universe cannot be understood by merely studying observable matter, we argue that vision cannot be understood without studying FPICU. We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning. In summary, we argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks. Comment: For high quality figures, please refer to http://wellyzhang.github.io/attach/dark.pd