
    Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations

    The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and of how they affect learning guarantees is still missing. In this paper, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both the stability and the generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms.
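    As a rough sketch of the kind of building block such a multilayer kernel stacks (an illustrative assumption, not the paper's exact construction), each layer can apply a homogeneous dot-product kernel to signal patches; homogeneity is what lets the resulting RKHS contain networks with homogeneous activations. The profile kappa(u) = exp(u - 1) below is one admissible choice.

        import numpy as np

        def homogeneous_patch_kernel(x, y, kappa=lambda u: np.exp(u - 1.0)):
            # One-layer building block: k(x, y) = |x| |y| kappa(<x, y> / (|x| |y|)),
            # a homogeneous dot-product kernel on two flattened signal patches.
            nx, ny = np.linalg.norm(x), np.linalg.norm(y)
            if nx == 0.0 or ny == 0.0:
                return 0.0
            return nx * ny * kappa(np.dot(x, y) / (nx * ny))

        # Example: kernel value between two random 5x5 patches, flattened.
        rng = np.random.default_rng(0)
        p, q = rng.standard_normal(25), rng.standard_normal(25)
        print(homogeneous_patch_kernel(p, q))

    Scaling either patch by a factor t scales the kernel value by t, the homogeneity property that the RKHS-norm analysis of model complexity exploits.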

    Disentangling Geometric Deformation Spaces in Generative Latent Shape Models

    A complete representation of 3D objects requires characterizing the space of deformations in an interpretable manner, from articulations of a single instance to changes in shape across categories. In this work, we improve on a prior generative model of geometric disentanglement for 3D shapes, wherein the space of object geometry is factorized into rigid orientation, non-rigid pose, and intrinsic shape. The resulting model can be trained from raw 3D shapes, without correspondences, labels, or even rigid alignment, using a combination of classical spectral geometry and probabilistic disentanglement of a structured latent representation space. Our improvements include more sophisticated handling of rotational invariance and the use of a diffeomorphic flow network to bridge latent and spectral space. The geometric structuring of the latent space imparts an interpretable characterization of the deformation space of an object. Furthermore, it enables tasks like pose transfer and pose-aware retrieval without requiring supervision. We evaluate our model on its generative modelling, representation learning, and disentanglement performance, showing improved rotation invariance and intrinsic-extrinsic factorization quality over the prior model.
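    One classical spectral-geometry ingredient mentioned above can be sketched concretely: the low end of a Laplacian spectrum is invariant to rigid motion and robust to near-isometric pose changes, which is what makes it useful for isolating intrinsic shape. The snippet below is a stand-in using a graph Laplacian rather than the cotangent Laplace-Beltrami operator usually computed on meshes; edges is an assumed (m, 2) array of vertex-index pairs, not a structure from the paper.

        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        def laplacian_spectrum(n_vertices, edges, k=16):
            # Smallest k eigenvalues of the unnormalized graph Laplacian L = D - A.
            # Rigid orientation and near-isometric pose changes barely move these
            # values, so they serve as a simple intrinsic shape descriptor.
            i, j = edges[:, 0], edges[:, 1]
            a = sp.coo_matrix((np.ones(len(edges)), (i, j)),
                              shape=(n_vertices, n_vertices))
            a = ((a + a.T) > 0).astype(float)   # symmetrize the adjacency
            lap = sp.diags(np.asarray(a.sum(axis=1)).ravel()) - a
            return np.sort(spla.eigsh(lap, k=k, which="SM")[0])

    Two meshes of the same object in different poses would then yield nearly identical spectra, while genuinely different intrinsic shapes separate.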

    Segmentation and Recognition of Eating Gestures from Wrist Motion Using Deep Learning

    This research considers training a deep neural network to segment and classify eating-related gestures from recordings of subjects eating unscripted meals in a cafeteria environment. It is inspired by the recent success of deep learning across a wide variety of machine learning tasks such as image annotation, classification, and segmentation. Image segmentation is a particularly important inspiration, and this work proposes a novel deep learning classifier for segmenting time-series data based on the work done in [25] and [30]. While deep learning has established itself as the state-of-the-art approach in image segmentation, particularly in works such as [2], [25] and [31], very little work has been done on segmenting time-series data with deep learning models.

    Wrist-mounted IMU sensors such as accelerometers and gyroscopes can record activity from a subject in a free-living environment while being encapsulated in an inconspicuous watch-like device. Such a device can be used to monitor eating-related activities, and is thought to be useful for monitoring energy intake both for healthy individuals and for those affected by conditions such as overweight and obesity.

    The data set used in this study is the Clemson Cafeteria Dataset, available publicly at [14]. It contains data for 276 people eating a meal at the Harcombe Dining Hall at Clemson University, a large cafeteria environment. The data includes wrist motion measurements (accelerometer x, y, z; gyroscope yaw, pitch, roll) recorded while each subject ate an unscripted meal. Each meal consisted of 1-4 courses, of which 488 were used as part of this research. The ground-truth gesture labels were created by a set of 18 trained human raters and include 'bite', indicating when the subject starts to put food in their mouth before moving the hand away for more 'bites' or other activities; 'drink' for liquid intake; 'rest' for stationary hands; and 'utensiling' for actions such as cutting food into bite-sized pieces, stirring a liquid, or dipping food in sauce. All other activities are labeled 'other' by the human raters.

    Previous work in our group focused on recognizing these gesture types from manually segmented data using hidden Markov models [24], [27]. This thesis builds on that work by considering a deep learning classifier that automatically segments and recognizes gestures. The proposed neural network classifier performs well at recognizing intake gestures, with 79.6% of 'bite' and 80.7% of 'drink' gestures recognized correctly on average per meal. Overall, 77.7% of all gestures were recognized correctly on average per meal, indicating that a deep learning classifier can successfully segment and identify eating gestures simultaneously from wrist motion measured through IMU sensors.
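    A minimal sketch of such a simultaneous segmenter-recognizer, assuming a fully convolutional PyTorch model over the six IMU channels; the layer sizes, the five-way label set ('bite', 'drink', 'rest', 'utensiling', 'other'), and the sampling rate are illustrative assumptions, not the architecture actually trained in the thesis.

        import torch
        import torch.nn as nn

        class GestureSegmenter(nn.Module):
            # Emits per-timestep class logits over 6-channel wrist IMU input
            # (accelerometer x, y, z; gyroscope yaw, pitch, roll).
            def __init__(self, n_classes=5):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(6, 32, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(64, n_classes, kernel_size=1),
                )

            def forward(self, x):        # x: (batch, 6, time)
                return self.net(x)       # (batch, n_classes, time)

        # Example: label every sample of a one-minute recording at an assumed 15 Hz.
        model = GestureSegmenter()
        imu = torch.randn(1, 6, 900)
        labels = model(imu).argmax(dim=1)   # (1, 900) predicted class per sample

    Training such a model per-timestep and then merging runs of identical labels into gesture segments is one way a single network can perform segmentation and recognition at once.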