    Low-Power CMOS Vision Sensor for Gaussian Pyramid Extraction

    This paper introduces a CMOS vision sensor chip in a standard 0.18 μm CMOS technology for Gaussian pyramid extraction. The Gaussian pyramid provides computer vision algorithms with scale invariance, so that the response is the same regardless of the distance between the scene and the camera. The chip comprises 176 × 120 photosensors arranged into 88 × 60 processing elements (PEs). The Gaussian pyramid is generated with a double-Euler switched-capacitor (SC) network. Every PE comprises four photodiodes, one 8-bit single-slope analog-to-digital converter, one correlated double sampling circuit, and four state capacitors with their corresponding switches to implement the double-Euler SC network. Every PE occupies 44 × 44 μm². Measurements from the chip are presented to assess the accuracy of the generated Gaussian pyramid for visual tracking applications. Error levels are below 2% of the full-scale output, making the chip feasible for these applications. The energy cost is 26.5 nJ/px at 2.64 Mpx/s, outperforming conventional solutions that pair an imager with a microprocessor unit.
    Funding: Office of Naval Research, USA (N00014-14-1-0355); Ministerio de Economía y Competitividad (TEC2015-66878-C3-1-R, TEC2015-66878-C3-3-R); Junta de Andalucía (TIC 2338, EM2013/038, EM2014/01).
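
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not something taken from the paper: it assumes that the 2.64 Mpx/s throughput counts photosensor pixels and that one frame is a full readout of the 176 × 120 array. The function name `derived_figures` is hypothetical.

```python
# Figures quoted in the abstract.
PHOTOSENSORS = (176, 120)       # photosensor array, columns x rows
PES = (88, 60)                  # processing elements, columns x rows
PHOTODIODES_PER_PE = 4
ENERGY_PER_PIXEL_J = 26.5e-9    # 26.5 nJ/px
THROUGHPUT_PX_PER_S = 2.64e6    # 2.64 Mpx/s


def derived_figures():
    """Derive secondary quantities (assumption: one frame = one full array readout)."""
    pixels = PHOTOSENSORS[0] * PHOTOSENSORS[1]
    pes = PES[0] * PES[1]
    return {
        "pixels": pixels,
        "processing_elements": pes,
        # Four photodiodes per PE should account for every photosensor.
        "pixels_match_pes": pixels == pes * PHOTODIODES_PER_PE,
        # Energy per pixel times pixels per second gives average power.
        "average_power_w": ENERGY_PER_PIXEL_J * THROUGHPUT_PX_PER_S,
        "energy_per_frame_j": ENERGY_PER_PIXEL_J * pixels,
        "frames_per_s": THROUGHPUT_PX_PER_S / pixels,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

    Under those assumptions the array holds 21,120 pixels in 5,280 PEs, and the quoted energy and throughput correspond to roughly 70 mW of average power at 125 frames per second.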

    Gaussian Pyramid Extraction with a CMOS Vision Sensor

    Paper presented at the 2014 14th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA 2014), University of Notre Dame, United States, 29 to 31 July 2014.
    This paper addresses a CMOS vision sensor with 176 × 120 pixels in standard 0.18 μm CMOS technology that computes the Gaussian pyramid. The Gaussian pyramid is extracted with a double-Euler switched-capacitor network, giving an RMSE below 1.2% of the full-scale value. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each, with an energy cost of 26.5 nJ at 2.64 Mpx/s.
    Funding: Gobierno de España; ONR N000141410355; MICINN TEC2009-12686; MINECO TEC2012-38921-C02 (FEDER); MINECO IPT-2011-1625-430000, IPC-20111009; Junta de Andalucía TIC 2338-2013; Xunta de Galicia EM2013/038 (FEDER); FEDER CN2012/151, GPC2013/04.
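
    For readers unfamiliar with the data structure, a Gaussian pyramid of 3 octaves with 6 scales each can be sketched in software as follows. This is a generic illustration, not the chip's double-Euler switched-capacitor network: it swaps in repeated passes of a [1, 2, 1] / 4 binomial filter as the smoothing step, and the amount of smoothing per scale is an assumption, since the abstract does not give it. All function names are hypothetical.

```python
def blur_1d(row):
    """One pass of the [1, 2, 1] / 4 binomial kernel with edge replication."""
    n = len(row)
    return [
        (row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


def blur_2d(img):
    """Separable smoothing: filter every row, then every column."""
    rows = [blur_1d(r) for r in img]
    cols = [blur_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def gaussian_pyramid(img, octaves=3, scales=6):
    """Return pyramid[octave][scale] as nested lists of floats.

    Assumption: each scale adds one more binomial pass, and each new octave
    starts from the most-smoothed scale of the previous one, keeping every
    other row and column.
    """
    pyramid = []
    base = [[float(v) for v in row] for row in img]
    for _ in range(octaves):
        level = base
        octave = [level]
        for _ in range(scales - 1):
            level = blur_2d(level)
            octave.append(level)
        pyramid.append(octave)
        base = [row[::2] for row in octave[-1][::2]]
    return pyramid


if __name__ == "__main__":
    image = [[0] * 16 for _ in range(12)]
    image[6][8] = 1  # single bright pixel
    pyr = gaussian_pyramid(image)
    for o, octave in enumerate(pyr):
        print("octave", o, "size", len(octave[0]), "x", len(octave[0][0]))
```

    Each octave halves the resolution, so later octaves are cheap to compute and store; this is the property that makes the pyramid attractive for scale-invariant processing.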

    Beyond Gaussian Pyramid: Multi-skip Feature Stacking for Action Recognition

    Most state-of-the-art action feature extractors involve differential operators, which act as highpass filters and tend to attenuate low-frequency action information. This attenuation introduces bias into the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature-enhancing technique that encodes scale-invariant characteristics into the feature space in an attempt to deal with this attenuation. However, at the core of the Gaussian Pyramid is a convolutional smoothing operation, which makes it incapable of generating new features at coarse scales. To address this problem, we propose a novel feature-enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks features extracted using a family of differential filters parameterized with multiple time skips and encodes shift-invariance into the frequency space. MIFS compensates for information lost from using differential operators by recapturing information at coarse scales. This recaptured information allows us to match actions at different speeds and ranges of motion. We prove that MIFS enhances the learnability of differential-based features exponentially. The resulting feature matrices from MIFS have much smaller condition numbers and variances than those from conventional methods. Experimental results show significantly improved performance on challenging action recognition and event detection tasks. Specifically, our method exceeds the state of the art on the Hollywood2, UCF101 and UCF50 datasets and is comparable to the state of the art on the HMDB51 and Olympic Sports datasets. MIFS can also be used as a speedup strategy for feature extraction with minimal or no accuracy cost.
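
    The idea of stacking differential features computed at several time skips can be illustrated on a one-dimensional signal. The sketch below is a toy analogue under stated assumptions, not the authors' video pipeline: it uses a plain temporal difference as the differential filter, and the names `skip_differences` and `multi_skip_stack` as well as the skip values are hypothetical.

```python
def skip_differences(signal, skip):
    """Temporal difference x[t + skip] - x[t], a first-order differential filter."""
    return [signal[t + skip] - signal[t] for t in range(len(signal) - skip)]


def multi_skip_stack(signal, skips=(1, 2, 4)):
    """Collect the difference features computed at each time skip."""
    return {skip: skip_differences(signal, skip) for skip in skips}


if __name__ == "__main__":
    # A slow, steady motion: position grows by one unit per frame.
    slow_motion = list(range(10))
    for skip, feats in multi_skip_stack(slow_motion).items():
        print(f"skip={skip}: {feats}")
```

    A slow motion produces a small response at skip 1 and a proportionally larger one at skip 4, which is the sense in which larger skips recover low-frequency information that a single differential filter attenuates.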

    Multi-scale digital soil mapping with deep learning

    We compared different methods of multi-scale terrain feature construction and their relative effectiveness for digital soil mapping (DSM) with a Deep Learning algorithm. The most common approach for multi-scale feature construction in DSM is to filter terrain attributes based on different neighborhood sizes; however, results can be difficult to interpret because the approach is affected by outliers. Alternatively, one can derive the terrain attributes on decomposed elevation data, but the resulting maps can have artefacts that render the approach undesirable. Here, we introduce 'mixed scaling', a new method that overcomes these issues and preserves the landscape features that are identifiable at different scales. The new method also extends the Gaussian pyramid by introducing additional intermediate scales. This reduces the risk that scales important for soil formation are missing from the model. In our extended implementation of the Gaussian pyramid, we tested four intermediate scales between any two consecutive octaves of the Gaussian pyramid and modelled the data with Deep Learning and Random Forests. We performed the experiments using three different datasets and show that mixed scaling with the extended Gaussian pyramid produced the best-performing set of covariates and that modelling with Deep Learning produced the most accurate predictions, which on average were 4–7% more accurate than those obtained with Random Forests.
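
    To make the notion of intermediate scales concrete, the sketch below lists the scale factors of a pyramid with four intermediate scales between consecutive octaves. The geometric spacing used here is an assumption chosen for illustration; the abstract does not say how the intermediate scales are spaced, and the function name `extended_scales` is hypothetical.

```python
def extended_scales(octaves=3, intermediate=4):
    """Scale factors relative to the original resolution.

    Assumption: octaves sit at factors 1, 2, 4, ... and the intermediate
    scales are spaced geometrically between consecutive octaves.
    """
    steps = intermediate + 1
    scales = [
        2 ** (octave + i / steps)
        for octave in range(octaves - 1)
        for i in range(steps)
    ]
    scales.append(2.0 ** (octaves - 1))  # close the list with the last octave
    return scales


if __name__ == "__main__":
    print([round(s, 3) for s in extended_scales()])
```

    With three octaves this yields eleven scales in total: the octaves at factors 1, 2 and 4, plus four intermediate scales in each of the two gaps.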

    CMOS-3D smart imager architectures for feature detection

    This paper reports a multi-layered smart image sensor architecture for feature extraction based on detection of interest points. The architecture is conceived for 3-D integrated circuit technologies consisting of two layers (tiers) plus memory. The top tier includes sensing and processing circuitry that performs Gaussian filtering and generates Gaussian pyramids in a fully concurrent way. The circuitry in this tier operates in the mixed-signal domain. It embeds in-pixel correlated double sampling, a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel analog-to-digital conversion. This tier can be further split into two for improved resolution: one containing the sensors and another containing a capacitor per sensor plus the mixed-signal processing circuitry. The bottom tier embeds digital circuitry dedicated to the calculation of Harris, Hessian, and difference-of-Gaussian detectors. The overall system can hence be configured by the user to detect interest points with whichever of these three algorithms is best suited to the application. The paper describes the different kinds of algorithms featured and the circuitry employed at the top and bottom tiers. The Gaussian pyramid is implemented with a switched-capacitor network in less than 50 μs, outperforming more conventional solutions.
    Funding: Xunta de Galicia (10PXIB206037PR); Ministerio de Ciencia e Innovación (TEC2009-12686, IPT-2011-1625-430000); Office of Naval Research (N00014111031).

    Multiscale approaches to music audio feature learning

    Content-based music information retrieval (MIR) tasks are typically solved with a two-stage approach: features are extracted from music audio signals, and are then used as input to a regressor or classifier. These features can be engineered or learned from data. Although the former approach was dominant in the past, feature learning has started to receive more attention from the MIR community in recent years. Recent results in feature learning indicate that simple algorithms such as K-means can be very effective, sometimes surpassing more complicated approaches based on restricted Boltzmann machines, autoencoders or sparse coding. Furthermore, interest in multiscale representations of music audio has increased recently. Such representations are more versatile because music audio exhibits structure on multiple timescales, which are relevant for different MIR tasks to varying degrees. We develop and compare three approaches to multiscale audio feature learning using the spherical K-means algorithm. We evaluate them in an automatic tagging task and a similarity metric learning task on the Magnatagatune dataset.
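
    Spherical K-means, the learning algorithm named above, differs from ordinary K-means in that data and centroids are kept on the unit sphere and similarity is measured by the dot product. The following is a minimal sketch of the textbook algorithm, not the authors' implementation: it leaves out any preprocessing they may apply, and the initial centroids are passed in explicitly (an assumption made to keep the example deterministic).

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, init_centroids, iterations=10):
    """Cluster points by cosine similarity; returns (centroids, labels)."""
    data = [normalize(p) for p in points]
    centroids = [normalize(c) for c in init_centroids]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment: most similar centroid (dot product of unit vectors).
        labels = [
            max(range(len(centroids)), key=lambda j: dot(p, centroids[j]))
            for p in data
        ]
        # Update: sum the members, then project back onto the unit sphere.
        for j in range(len(centroids)):
            members = [p for p, label in zip(data, labels) if label == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return centroids, labels


if __name__ == "__main__":
    pts = [[1, 0.1], [2, 0.1], [0.1, 1], [0.2, 3]]
    cents, labs = spherical_kmeans(pts, [[1, 0], [0, 1]])
    print(labs, cents)
```

    Because only direction matters, points of very different magnitude, such as `[1, 0.1]` and `[2, 0.1]`, fall into the same cluster when they point the same way.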

    Single image example-based super-resolution using cross-scale patch matching and Markov random field modelling

    Example-based super-resolution has become increasingly popular over the last few years for its ability to overcome the limitations of the classical multi-frame approach. In this paper we present a new example-based method that uses the input low-resolution image itself as a search space for high-resolution patches by exploiting self-similarity across different resolution scales. The examples found are combined into a high-resolution image by means of Markov Random Field modelling, which enforces their global agreement. Additionally, we apply back-projection and steering kernel regression as post-processing techniques. In this way, we are able to produce sharp and artefact-free results that are comparable to or better than standard interpolation and state-of-the-art super-resolution techniques.

    Salient Regions for Query by Image Content

    Much previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This paper discusses how this problem can be circumvented by using salient interest points. It also compares and contrasts an extension to previous work in which the concept of scale is incorporated into the selection of salient regions, so that the most interesting areas of the image are selected and local descriptors are generated to describe the image characteristics in each region. The paper describes and contrasts two such salient region descriptors and compares them through their repeatability rate under a range of common image transforms. Finally, the paper investigates the performance of one of the salient region detectors in an image retrieval setting.