    Volume Component Analysis for Classification of LiDAR Data

    One of the most difficult challenges of working with LiDAR data is the large number of data points produced. Analysing these large data sets is extremely time-consuming. For this reason, automatic perception of LiDAR scenes is a growing area of research. Currently, most LiDAR feature extraction relies on geometrical features specific to the point cloud of interest. These geometrical features are scene-specific, and often rely on the scale and orientation of the object for classification. This paper proposes a robust method for reduced-dimensionality feature extraction of 3-D objects using a volume component analysis (VCA) approach. The VCA approach is based on principal component analysis (PCA), a method of reduced feature extraction that computes a covariance matrix from the original input vectors. The eigenvectors corresponding to the largest eigenvalues of the covariance matrix are used to describe an image. Block-based PCA is an adaptation used for feature extraction in facial images, because PCA performed on local areas of an image can extract more significant features than PCA performed on the entire image. The image space is split into several blocks, and PCA is computed individually for each block. VCA proposes that a LiDAR point cloud can be represented as a series of voxels whose values correspond to the point density within that relative location. From this voxelized space, block-based PCA is used to analyze sections of the space; combined, these sections represent features of the entire 3-D object. These features are then used as the input to a support vector machine, which is trained to identify four classes of objects (vegetation, vehicles, buildings and barriers) with an overall accuracy of 93.8%.
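The voxelization and block-based PCA steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the grid size, block size, number of components `k`, and all function names are assumptions chosen for the example, and the final support vector machine stage is left out.

```python
import numpy as np

def voxelize_density(points, grid=8):
    """Map an (N, 3) point cloud to a grid^3 array of point fractions.

    The grid resolution is an assumption; the abstract does not state one.
    """
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid division by zero for flat clouds
    # Normalize to the bounding box so voxel indices do not depend on scale.
    idx = np.floor((points - mins) / spans * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)  # points on the max face go to the last voxel
    vox = np.zeros((grid, grid, grid))
    np.add.at(vox, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return vox / len(points)  # density as a fraction of all points

def split_blocks(vox, block=4):
    """Split a cubic voxel grid into flattened block^3 sub-volumes."""
    n = vox.shape[0] // block
    v = vox.reshape(n, block, n, block, n, block)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # block indices first, then offsets
    return v.reshape(n ** 3, block ** 3)

def fit_block_pca(block_samples, k=3):
    """PCA for one block position: return the mean and top-k eigenvectors."""
    mean = block_samples.mean(axis=0)
    centered = block_samples - mean
    cov = centered.T @ centered / max(len(block_samples) - 1, 1)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return mean, eigvecs[:, ::-1][:, :k]  # largest eigenvalues first

def block_pca_features(voxel_grids, block=4, k=3):
    """Concatenate per-block PCA projections into one vector per object."""
    blocks = np.stack([split_blocks(v, block) for v in voxel_grids])
    feats = []
    for b in range(blocks.shape[1]):
        mean, top = fit_block_pca(blocks[:, b, :], k)
        feats.append((blocks[:, b, :] - mean) @ top)
    return np.hstack(feats)
```

With an 8×8×8 grid and 4×4×4 blocks, each object yields 8 blocks and a feature vector of 8·k values; a vector of this kind is what a classifier such as a support vector machine would take as input.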

    TRECVID 2007 - Overview

    Rate-Distortion Classification for Self-Tuning IoT Networks

    Many future wireless sensor networks and the Internet of Things are expected to follow a software-defined paradigm, where protocol parameters and behaviors are dynamically tuned as a function of the signal statistics. New protocols will then be injected as software when certain events occur. For instance, new data compressors could be (re)programmed on the fly as the monitored signal type or its statistical properties change. We consider a lossy compression scenario, where the application tolerates some distortion of the gathered signal in return for improved energy efficiency. To reap the full benefits of this paradigm, we discuss an automatic sensor profiling approach where the signal class, and in particular the corresponding rate-distortion curve, is automatically assessed using machine learning tools (namely, support vector machines and neural networks). We show that this curve can be reliably estimated on the fly by computing a small number (from ten to twenty) of statistical features on time windows of a few hundred samples.
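As a rough illustration of computing statistical features over fixed-length time windows, consider the sketch below. The abstract does not list which ten to twenty features are used, so the four here (mean, standard deviation, lag-1 autocorrelation, and standard deviation of first differences) and the function name `window_features` are assumptions made purely for the example.

```python
import numpy as np

def window_features(signal, window=256):
    """Return one row of illustrative statistics per non-overlapping window.

    Feature choice and window length are assumptions, not the paper's set.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x) // window                    # drop the incomplete tail window
    w = x[: n * window].reshape(n, window)
    mean = w.mean(axis=1)
    std = w.std(axis=1)
    centered = w - mean[:, None]
    # Lag-1 autocorrelation; defined as 0 for constant windows.
    num = (centered[:, :-1] * centered[:, 1:]).sum(axis=1)
    den = (centered ** 2).sum(axis=1)
    ac1 = np.divide(num, den, out=np.zeros(n), where=den > 0)
    # Spread of sample-to-sample changes, a crude smoothness indicator.
    diff_std = np.diff(w, axis=1).std(axis=1)
    return np.column_stack([mean, std, ac1, diff_std])
```

Each row could then be passed to a classifier or regressor that predicts the signal class or its rate-distortion curve, which is the role the abstract assigns to support vector machines and neural networks.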

    Deep Multimodal Speaker Naming

    Automatic speaker naming is the problem of localizing as well as identifying each speaking character in a TV, movie or live show video. The problem is challenging mainly because of its multimodal nature: the face cue alone is insufficient to achieve good performance. Previous multimodal approaches to this problem usually process the data of different modalities individually and merge them using handcrafted heuristics. Such approaches work well for simple scenes, but fail to achieve high performance for speakers with large appearance variations. In this paper, we propose a novel learning framework based on convolutional neural networks (CNNs) that automatically learns the fusion function of both face and audio cues. We show that, without using face tracking, facial landmark localization or subtitles/transcripts, our system with robust multimodal feature extraction achieves state-of-the-art speaker naming performance, evaluated on two diverse TV series. The dataset and implementation of our algorithm are publicly available online.