255 research outputs found

    Joint-Based Action Progress Prediction

    Action understanding is a fundamental branch of computer vision, with applications ranging from surveillance to robotics. Most works deal with localizing and recognizing actions in both time and space, without characterizing their evolution. Recent works have addressed the prediction of action progress, an estimate of how far the action has advanced as it is performed. In this paper, we propose to predict action progress using a different modality than previous methods: body joints. Human body joints carry very precise information about human poses, which we believe are a much more lightweight and effective way of characterizing actions and therefore their execution. Action progress can in fact be estimated by understanding how key poses follow each other during the development of an activity. We show how an action progress prediction model can exploit body joints and be integrated with modules providing keypoint and action information, so as to run directly from raw pixels. The proposed method is experimentally validated on the Penn Action Dataset.
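As a minimal illustration of the idea behind joint-based progress estimation (not the paper's network: the poses, joint layout, and matching rule below are all invented), one can estimate progress by comparing the current body-joint configuration against an ordered sequence of key poses and returning the normalized position of the best match:

```python
def pose_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding 2D joints."""
    assert len(pose_a) == len(pose_b)
    total = 0.0
    for (xa, ya), (xb, yb) in zip(pose_a, pose_b):
        total += ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
    return total / len(pose_a)

def action_progress(current_pose, key_poses):
    """Progress in [0, 1]: index of the nearest key pose, normalized."""
    dists = [pose_distance(current_pose, kp) for kp in key_poses]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best / (len(key_poses) - 1)

# Toy "squat" action with 3 key poses for 2 joints (hip, knee).
key_poses = [
    [(0.0, 1.0), (0.0, 0.5)],   # standing
    [(0.0, 0.7), (0.1, 0.4)],   # halfway down
    [(0.0, 0.5), (0.2, 0.3)],   # full squat
]
print(action_progress([(0.0, 0.69), (0.1, 0.41)], key_poses))  # 0.5
```

A learned model would replace the nearest-key-pose rule with a regressor over joint sequences, but the input modality is the same: joint coordinates rather than raw pixels.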

    Multiple future prediction leveraging synthetic trajectories


    A Sparse and Locally Coherent Morphable Face Model for Dense Semantic Correspondence Across Heterogeneous 3D Faces

    The 3D Morphable Model (3DMM) is a powerful statistical tool for representing 3D face shapes. To build a 3DMM, a training set of face scans in full point-to-point correspondence is required, and its modeling capabilities directly depend on the variability contained in the training data. Thus, to increase the descriptive power of the 3DMM, establishing a dense correspondence across heterogeneous scans with sufficient diversity in terms of identities, ethnicities, or expressions becomes essential. In this manuscript, we present a fully automatic approach that leverages a 3DMM to transfer its dense semantic annotation across raw 3D faces, establishing a dense correspondence between them. We propose a novel formulation to learn a set of sparse deformation components with local support on the face that, together with an original non-rigid deformation algorithm, allow the 3DMM to precisely fit unseen faces and transfer its semantic annotation. We extensively evaluated our approach, showing it can effectively generalize to highly diverse samples and accurately establish a dense correspondence even in the presence of complex facial expressions. The accuracy of the dense registration is demonstrated by building a heterogeneous, large-scale 3DMM from more than 9,000 fully registered scans obtained by joining three large datasets together.
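The core fitting step of any linear morphable model can be sketched as a least-squares problem: find coefficients for the deformation components that bring the model as close as possible to a target shape. The toy example below assumes a generic linear model shape = mean + C·alpha (it does not reproduce the paper's sparse components or non-rigid algorithm; all data is random):

```python
import numpy as np

def fit_morphable_model(mean, components, target):
    """Solve for coefficients alpha minimizing ||mean + C @ alpha - target||."""
    alpha, *_ = np.linalg.lstsq(components, target - mean, rcond=None)
    return alpha

rng = np.random.default_rng(0)
mean = rng.normal(size=9)               # 3 vertices * 3 coordinates, flattened
components = rng.normal(size=(9, 2))    # 2 deformation components
true_alpha = np.array([0.5, -1.0])
target = mean + components @ true_alpha # target lies in the model's span

alpha = fit_morphable_model(mean, components, target)
print(np.allclose(alpha, true_alpha))   # True: exact fit recovers coefficients
```

In the paper's setting the components additionally have local support, which keeps each coefficient's influence confined to a region of the face; the least-squares structure of the fit is unchanged.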

    Increasing Video Perceptual Quality with GANs and Semantic Coding


    Explaining autonomous driving by learning end-to-end visual attention


    Space-time Zernike Moments and Pyramid Kernel Descriptors for Action Classification

    Action recognition in videos is a relevant and challenging task in automatic semantic video analysis. The most successful approaches exploit local space-time descriptors, usually carefully engineered to obtain invariance to photometric and geometric variations. The main drawbacks of space-time descriptors are their high dimensionality and computational cost. In this paper we propose a novel descriptor based on 3D Zernike moments computed on space-time patches. Moments are by construction non-redundant and therefore optimal for compactness. Given the hierarchical structure of our descriptor, we propose a novel similarity procedure that exploits this structure by comparing features as pyramids. The approach is tested on a public dataset and compared with state-of-the-art descriptors.
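The idea of comparing features "as pyramids" can be sketched with the generic pyramid-match scheme: histograms of the features are intersected at several resolutions, and matches found at finer levels receive larger weights (this is the general technique, not necessarily the paper's exact procedure; the scalar features below are toy data):

```python
def histogram(values, level, lo=0.0, hi=1.0):
    """Histogram of values in [lo, hi) using 2**level equal-width bins."""
    bins = 2 ** level
    h = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        h[idx] += 1
    return h

def pyramid_similarity(a, b, levels=3):
    """Weighted sum of histogram intersections; finer levels weigh more."""
    score, prev = 0.0, 0
    for level in range(levels, -1, -1):   # finest to coarsest
        inter = sum(min(x, y)
                    for x, y in zip(histogram(a, level), histogram(b, level)))
        new = inter - prev                # matches first appearing at this level
        score += new / 2 ** (levels - level)
        prev = inter
    return score

print(pyramid_similarity([0.1, 0.6], [0.1, 0.6]))  # 2.0: all match at finest level
```

Identical feature sets match fully at the finest level and get full weight, while features that only co-occur in coarse bins contribute a fraction of a match.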

    Forgery detection from printed images: a tool in crime scene analysis

    The preliminary analysis of the genuineness of a photo has become, over time, the first step of any forensic examination involving images, whenever their intrinsic authenticity cannot be taken for granted. Digital cameras have largely replaced film-based devices; yet, until a few years ago, in some countries only images made from film negatives were considered fully reliable in Court. A widespread prejudice held that a digital image could never be considered legal proof, owing to its "inconsistent digital nature". Great efforts have been made by the forensic science community in this field, and different approaches are now available to discover and expose possible malicious frauds, thus establishing whether an image is authentic or, at least, assessing a certain degree of probability of its "pureness". Nowadays it is easy to manipulate digital images with powerful photo-editing tools. Copy-move forgery is one of the most common ways of altering the original meaning of an image: a portion of the image is copied and pasted one or more times elsewhere into the same image, to hide something or change its real meaning. Whenever a digital (or printed) image is presented as evidence in Court, it should be analyzed with a forensic approach to determine whether it contains traces of manipulation. The image forensics literature offers several detectors for such manipulation; among them, the most recent and effective are those based on Zernike moments and those based on the Scale Invariant Feature Transform (SIFT). In particular, the capability of SIFT to discover correspondences among similar visual contents allows forensic analysis to detect even very accurate and realistic copy-move forgeries.
    In some situations, however, only the analog version of a document may be available instead of the digital one. It is then interesting to ask whether tampering can be identified from a printed picture rather than from its digital counterpart. Scanned documents, and printed documents recaptured by a digital camera, are widely used in a number of scenarios, from medical imaging and law enforcement to banking and daily consumer use. In this paper, the problem of identifying copy-move forgery from a printed picture is therefore investigated. The manipulation is detected by proving the presence of copy-move patches in the scanned image using CADET (Cloned Area DETector), a tool based on our previous methodology, adapted in a version tailored to the printed-image case (e.g., choice of the minimum number of matched keypoints, size of the input image, etc.). A real murder case is presented, in which an image of the crime scene, submitted as printed documentary evidence, had been modified by the defense advisors to reject the theory of accusation put forward by the Prosecutor. The goal of this paper is to experimentally investigate the requirements under which reliable copy-move forgery detection is possible on printed images, so that the forgery test can become the very first step of an appropriate operational checklist.
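The keypoint-based copy-move test described above can be sketched as follows: keypoints whose descriptors are near-identical but whose image locations are far apart indicate a cloned region. Real detectors such as CADET use SIFT descriptors extracted from the image; here the keypoints, descriptors, and thresholds are all toy values chosen for illustration:

```python
def copy_move_pairs(keypoints, desc_thresh=0.1, min_offset=20.0):
    """Return index pairs of keypoints with similar descriptors but distant
    positions — candidate evidence of a copy-move manipulation."""
    pairs = []
    for i in range(len(keypoints)):
        for j in range(i + 1, len(keypoints)):
            (xi, yi), di = keypoints[i]
            (xj, yj), dj = keypoints[j]
            desc_dist = sum((a - b) ** 2 for a, b in zip(di, dj)) ** 0.5
            spatial = ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5
            # Similar appearance far away from itself: likely cloned patch.
            if desc_dist < desc_thresh and spatial > min_offset:
                pairs.append((i, j))
    return pairs

keypoints = [
    ((10, 10), (0.9, 0.1, 0.3)),   # original patch
    ((90, 40), (0.9, 0.1, 0.3)),   # identical descriptor far away: clone
    ((12, 11), (0.2, 0.8, 0.5)),   # unrelated nearby feature
]
print(copy_move_pairs(keypoints))  # [(0, 1)]
```

The printed-image case mainly changes the tuning (minimum number of matched keypoints, input image size) rather than this matching logic, since scanning and printing distort the descriptors.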

    Determining the Optical Flow Using Wavelet Coefficients

    The optical flow (OF) can be used to perform motion-based segmentation or 3D reconstruction. Many techniques have been developed to estimate the OF. Some approaches are based on global assumptions; others deal with local information. Although OF has been studied for more than a decade, reducing the estimation error is still a difficult problem. Generally, algorithms to determine the OF are based on an equation which links the gradient components of the luminance signal so as to impose its invariance over time. Therefore, to determine the OF, it is usually necessary to calculate the gradient components in space and time. A new way to approximate this gradient information from a spatio-temporal wavelet decomposition is proposed here. In other words, assuming that the luminance information of the video sequences is represented in a multiresolution structure for compression or transmission purposes, we propose to estimate the luminance gradient components directly from the coefficients of the wavelet transform. Using a multiresolution formalism, we provide a way to estimate the motion field at different resolution levels. OF estimates obtained at low resolution can be projected to higher resolution levels so as to improve the robustness of the estimation to noise and to better locate flow discontinuities, while remaining computationally efficient. Results are shown for both synthetic and real-world sequences, comparing the method with a non-multiresolution approach.
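The key observation can be illustrated with a one-level Haar transform in 1D: a Haar detail coefficient is, up to a known scale factor, a finite difference of the signal, so the luminance gradients needed by the optical-flow equation can be read off the wavelet coefficients instead of being recomputed (a simplification of the spatio-temporal case in the paper; the signal below is toy data):

```python
def haar_step(signal):
    """One Haar level: (approximation, detail) with orthonormal scaling."""
    s = 2 ** -0.5
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

signal = [1.0, 3.0, 4.0, 4.0]
_, detail = haar_step(signal)
# detail[k] = (signal[2k] - signal[2k+1]) / sqrt(2), i.e. a scaled finite
# difference; rescaling recovers the local gradient estimate.
gradients = [-d * 2 ** 0.5 for d in detail]
print(gradients)  # ≈ [2.0, 0.0]
```

Applying the same identity along rows, columns, and frames of a wavelet-coded sequence yields the spatial and temporal gradients without returning to the pixel domain.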

    PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation

    In this paper, we are interested in understanding how customers perceive fashion recommendations, in particular when observing a proposed combination of garments composing an outfit. Automatically understanding how a suggested item is perceived, without any kind of active engagement, is in fact an essential building block for interactive applications. We propose a pixel-landmark mutual enhanced framework for implicit preference estimation, named PLM-IPE, which is capable of inferring the user's implicit preferences from visual cues, without any active or conscious engagement. PLM-IPE consists of three key modules: a pixel-based estimator, a landmark-based estimator, and mutual-learning-based optimization. The first two modules capture the implicit reaction of the user at the pixel level and the landmark level, respectively. The last module serves to transfer knowledge between the two parallel estimators. For evaluation, we collected a real-world dataset, named SentiGarment, which contains 3,345 facial reaction videos paired with suggested outfits and human-labeled reaction scores. Extensive experiments show the superiority of our model over state-of-the-art approaches.
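The mutual-learning idea in the third module can be sketched with scalar predictions: each estimator is trained on its own error with respect to the target plus a mimicry term pulling it toward the peer estimator's prediction (the abstract does not specify the actual losses or networks; the squared-error form and weight below are assumptions for illustration):

```python
def mutual_losses(pred_pixel, pred_landmark, target, mimicry_weight=0.5):
    """Per-estimator loss = squared error to the target
    + weighted squared gap to the peer estimator's prediction."""
    gap = (pred_pixel - pred_landmark) ** 2
    loss_pixel = (pred_pixel - target) ** 2 + mimicry_weight * gap
    loss_landmark = (pred_landmark - target) ** 2 + mimicry_weight * gap
    return loss_pixel, loss_landmark

# Pixel-based estimator says 0.8, landmark-based says 0.6, true score is 0.7.
lp, ll = mutual_losses(0.8, 0.6, target=0.7)
```

Minimizing each loss with respect to its own estimator's parameters both fits the reaction score and keeps the two parallel estimators consistent, which is one common way knowledge is transferred between peers.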

    Level Set Methods in an EM Framework for Shape Classification and Estimation

    In this paper, we propose an Expectation-Maximization (EM) approach to separate a shape database into different shape classes, while simultaneously estimating the shape contours that best exemplify each of the different shape classes. We begin our formulation by employing the level set function as the shape descriptor. Next, for each shape class we assume that there exists an unknown underlying level set function whose zero level set describes the contour that best represents the shapes within that shape class. The level set function for each example shape is modeled as a noisy measurement of the appropriate shape class's unknown underlying level set function. Based on this measurement model and the judicious introduction of the class labels as hidden data, our EM formulation calculates the labels for shape classification and estimates the shape contours that best typify the different shape classes. The resulting iterative algorithm is computationally efficient, simple, and accurate. We demonstrate the utility and performance of this algorithm by applying it to two medical applications.
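A schematic version of the EM loop described above, with each shape's level set function flattened to a vector and an isotropic Gaussian noise model (a simplification of the paper's measurement model; the data and the deterministic initialization are toy choices):

```python
import numpy as np

def em_shapes(shapes, n_classes, n_iter=20, sigma=1.0):
    """Return (responsibilities, class mean level-set vectors)."""
    # Deterministic init: spread initial class means across the dataset.
    means = shapes[np.linspace(0, len(shapes) - 1, n_classes).astype(int)]
    for _ in range(n_iter):
        # E-step: soft class labels from Gaussian likelihoods.
        d2 = ((shapes[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        log_r = -d2 / (2 * sigma ** 2)
        r = np.exp(log_r - log_r.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)
        # M-step: class means = responsibility-weighted shape averages,
        # i.e. the estimated underlying level set function per class.
        means = (r.T @ shapes) / r.sum(0)[:, None]
    return r, means

# Two well-separated toy "classes" of 4-dimensional level-set vectors.
shapes = np.array([[0.0, 0, 0, 0], [0.1, 0, 0, 0],
                   [5.0, 5, 5, 5], [5.1, 5, 5, 5]])
r, means = em_shapes(shapes, n_classes=2)
labels = r.argmax(1)
print(labels)  # first two shapes share one class, last two share the other
```

The real formulation works on level set functions over the image domain rather than tiny vectors, but the alternation is the same: soft labels from the measurement model, then per-class contour estimates from the weighted averages.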