5 research outputs found

    A sub-mW IoT-endnode for always-on visual monitoring and smart triggering

    Full text link
    This work presents a fully-programmable Internet of Things (IoT) visual sensing node that targets sub-mW power consumption in always-on monitoring scenarios. The system features a spatial-contrast $128\mathrm{x}64$ binary pixel imager with focal-plane processing. The sensor, when working at its lowest power mode ($10\mu W$ at 10 fps), provides as output the number of changed pixels. Based on this information, a dedicated camera interface, implemented on a low-power FPGA, wakes up an ultra-low-power parallel processing unit to extract context-aware visual information. We evaluate the smart sensor on three always-on visual triggering application scenarios. Triggering accuracy comparable to RGB image sensors is achieved at nominal lighting conditions, while consuming an average power between $193\mu W$ and $277\mu W$, depending on context activity. The digital sub-system is extremely flexible, thanks to a fully-programmable digital signal processing engine, but still achieves 19x lower power consumption compared to MCU-based cameras with significantly lower on-board computing capabilities. Comment: 11 pages, 9 figures, submitted to IEEE IoT Journal
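
    As a rough illustration of the triggering scheme the abstract describes (the sensor reports only a changed-pixel count, and the camera interface wakes the processing unit when that count indicates activity), a minimal Python sketch follows. The threshold value, function names, and simulated sensor readings are assumptions made for illustration, not the paper's actual FPGA logic.

```python
# Minimal sketch of an event-triggered wake-up loop. The sensor and processor
# interfaces and the activity threshold are assumed placeholders; they are not
# the paper's actual hardware implementation.

import random
import time

WAKE_THRESHOLD = 50      # changed-pixel count treated as "activity" (assumed)
FRAME_PERIOD_S = 0.1     # 10 fps low-power mode, per the abstract

def read_changed_pixel_count() -> int:
    """Stand-in for reading the focal-plane spatial-contrast sensor output."""
    return random.randint(0, 200)   # simulated; the real sensor reports this count

def wake_parallel_processor(changed_pixels: int) -> None:
    """Stand-in for waking the ultra-low-power parallel processing unit."""
    print(f"wake-up: {changed_pixels} pixels changed, running visual analysis")

def monitoring_loop(frames: int = 20) -> None:
    for _ in range(frames):
        changed = read_changed_pixel_count()   # only a count, not a full frame
        if changed >= WAKE_THRESHOLD:
            # Activity detected: hand off to the processing engine; otherwise
            # the node stays in its lowest power mode.
            wake_parallel_processor(changed)
        time.sleep(FRAME_PERIOD_S)

if __name__ == "__main__":
    monitoring_loop()
```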

    Exploring tradeoffs in accuracy, energy and latency of scale invariant feature transform in wireless camera networks

    No full text
    Advances in DSP technology create important avenues of research for embedded vision. One such avenue is the investigation of tradeoffs among system parameters that affect the energy, accuracy, and latency of the overall system. This paper reports work on benchmarking the performance and cost of the Scale Invariant Feature Transform (SIFT) for visual classification on a Blackfin DSP processor. Through measurements and modeling of the camera sensor node, we investigate system performance (classification accuracy, latency, energy consumption) in light of image resolution, arithmetic precision, location of processing (local vs. server-side), and processor speed. A case study on counting eggs during avian nesting season is used to experimentally determine the tradeoffs of different design parameters and discuss implications for other application domains. Index Terms — embedded vision, system tradeoffs, DSP, SIFT, object recognition.
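
    A toy model of the local-versus-server tradeoff the paper studies is sketched below. Every constant (DSP power, SIFT runtime per megapixel, radio power and throughput, bits per pixel after compression) is an invented placeholder rather than a value measured on the Blackfin platform; the point is only to show how resolution and processing location feed into energy and latency.

```python
# Back-of-the-envelope tradeoff model. All constants are illustrative
# placeholders, not the paper's measured values.

P_DSP_W = 0.28            # active DSP power while running SIFT (assumed)
SIFT_S_PER_MPIX = 4.0     # SIFT runtime per megapixel (assumed)
P_RADIO_W = 0.9           # radio power while transmitting (assumed)
RADIO_BPS = 250_000       # link throughput in bits/s (assumed)

def local_cost(width: int, height: int) -> tuple[float, float]:
    """Energy (J) and latency (s) for running SIFT on the node itself."""
    mpix = width * height / 1e6
    t = SIFT_S_PER_MPIX * mpix
    return P_DSP_W * t, t

def server_cost(width: int, height: int, bits_per_pixel: float = 2.0) -> tuple[float, float]:
    """Energy (J) and latency (s) for compressing and shipping the image to a server."""
    t = width * height * bits_per_pixel / RADIO_BPS
    return P_RADIO_W * t, t

for res in [(160, 120), (320, 240), (640, 480)]:
    e_l, t_l = local_cost(*res)
    e_s, t_s = server_cost(*res)
    print(f"{res}: local {e_l:.2f} J / {t_l:.2f} s   server {e_s:.2f} J / {t_s:.2f} s")
```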

    A study of the scale-invariant feature transform on a parallel pipeline

    Get PDF
    In this thesis we study the running of the Scale Invariant Feature Transform (SIFT) algorithm on a pipelined computational platform. The SIFT algorithm is one of the most widely used methods for image feature extraction. We develop a tile-based template for running SIFT that facilitates the analysis while abstracting away lower-level details. We formalize the computational pipeline and the time to execute any algorithm on it based on the relative times taken by the pipeline stages. In the context of the SIFT algorithm, this reduces the time to that of running the entire image through a bottlenecked stage plus the time to run either the first or last tile through the remaining stages. Through an experimental study of the SIFT algorithm on a broad collection of test images, we determined image feature fraction values that relate the sizes of the image extracts as the computation proceeds through the stages of the SIFT algorithm. We show that for a single-chip uniprocessor pipeline, the computational stage is the bottleneck. Specifically, we show that for an $N \times N$ image with $n \times n$ tiles the overall time complexity is $\Theta\!\left(\frac{(n+x)^2}{p_i}\Gamma_0 + \alpha\beta N^2 x^2\,\Gamma_1 + \frac{(\alpha\beta+\gamma)\,n^2\log x}{p_o}\Gamma_2\right)$; here $x$ is the neighborhood of the tile, $p_i$, $p_o$ are the number of input and output pins of the chip, $\alpha, \beta, \gamma$ are the feature fractions, and $\Gamma_0$, $\Gamma_1$, $\Gamma_2$ are the input, compute, and output clocks. The three terms in the expression represent the time complexities of the input, compute, and output stages. The input and output stages can be slowed down substantially without appreciable degradation of the overall performance. This slowdown can be traded off for lower power and higher signal quality. For multicore chips, we show that for an $N \times N$ image on a $P$-core chip, the overall time complexity to process the image is $\Theta\!\left(\frac{N^2}{p_i}\Gamma_0 + \frac{n^2 w^2 + \alpha\beta n^2 x^2}{P}\Gamma_1 + \frac{(\alpha\beta+\gamma)\,n^2\log x}{p_o}\Gamma_2\right)$; in addition to the quantities described earlier, $w$ is the window size used for the Gaussian blurring. Overall we establish that without improvements in the input bandwidth, the power of multicore processing cannot be used efficiently for SIFT.
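
    To make the two bounds above concrete, the following sketch plugs illustrative numbers into the three terms of each expression. The parameter values are invented, not taken from the thesis's experiments, but they reproduce the qualitative conclusion that compute dominates on a uniprocessor while input bandwidth dominates on a multicore chip.

```python
# Numerical plug-in of the two asymptotic bounds quoted above. All parameter
# values are invented for illustration, not measurements from the thesis.

from math import log2

def uniprocessor_time(N, n, x, p_i, p_o, a, b, g, G0, G1, G2):
    """Theta-level uniprocessor pipeline time: input, compute, output terms."""
    t_in  = (n + x) ** 2 / p_i * G0          # first tile through the input stage
    t_cmp = a * b * N ** 2 * x ** 2 * G1     # whole image through the compute stage
    t_out = (a * b + g) * n ** 2 * log2(x) / p_o * G2
    return t_in, t_cmp, t_out

def multicore_time(N, n, x, w, P, p_i, p_o, a, b, g, G0, G1, G2):
    """Theta-level P-core pipeline time; compute work is shared across P cores."""
    t_in  = N ** 2 / p_i * G0                # the entire image must cross the input pins
    t_cmp = (n ** 2 * w ** 2 + a * b * n ** 2 * x ** 2) / P * G1
    t_out = (a * b + g) * n ** 2 * log2(x) / p_o * G2
    return t_in, t_cmp, t_out

# Illustrative values only: image size, tile size, neighborhood, pins,
# feature fractions, and unit clock periods.
params = dict(N=1024, n=64, x=16, p_i=32, p_o=32, a=0.1, b=0.2, g=0.05,
              G0=1.0, G1=1.0, G2=1.0)
print("uniprocessor (in, compute, out):", uniprocessor_time(**params))
print("multicore    (in, compute, out):", multicore_time(w=5, P=16, **params))
```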

    Creating cohesive video with the narrative-informed use of ubiquitous wearable and imaging sensor networks

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Includes bibliographical references (p. 222-231).
    In today's digital era, elements of anyone's life can be captured, by themselves or others, and be instantly broadcast. With little or no regulation on the proliferation of camera technology and the increasing use of video for social communication, entertainment, and education, we have undoubtedly entered the age of ubiquitous media. A world permeated by connected video devices promises a more democratized approach to mass-media culture, enabling anyone to create and distribute personalized content. While these advancements present a plethora of possibilities, they are not without potential negative effects, particularly with regard to privacy, ownership, and the general decrease in quality associated with minimal barriers to entry. This dissertation presents a first-of-its-kind research platform designed to investigate the world of ubiquitous video devices in order to confront inherent problems and create new media applications. This system takes a novel approach to the creation of user-generated, documentary video by augmenting a network of video cameras integrated into the environment with on-body sensing. The distributed video camera network can record the entire life of anyone within its coverage range, and it will be shown that it records, almost instantly, more audio and video than can be viewed without prohibitive human resource cost.
    This drives the need to develop a mechanism to automatically understand the raw audiovisual information in order to create a cohesive video output that is understandable, informative, and/or enjoyable to its human audience. We address this need with the SPINNER system. As humans, we are inherently able to transform disconnected occurrences and ideas into cohesive narratives as a method to understand, remember, and communicate meaning. The design of the SPINNER application and ubiquitous sensor platform is informed by research into narratology, in other words how stories are created from fragmented events. The SPINNER system maps low-level sensor data from the wearable sensors to higher-level social signal and body language information. This information is used to label the raw video data. The SPINNER system can then build a cohesive narrative by stitching together the appropriately labeled video segments. The results from three test runs are shown, each resulting in one or more automatically edited video pieces. The creation of these videos is evaluated through review by their intended audience and by comparing the system to a human trying to perform similar actions. In addition, the mapping of the wearable sensor data to meaningful information is evaluated by comparing the calculated results to those from human observation of the actual video.
    by Mathew Laibowitz. Ph.D.
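
    A minimal sketch of the labeling-and-stitching idea described above follows: video segments carry labels derived from wearable-sensor data, and a narrative template selects one matching segment per story beat in temporal order. The label names, data structures, and template are invented for illustration and are not the SPINNER system's actual schema.

```python
# Toy narrative stitcher over labeled video segments. Labels would come from
# mapping wearable-sensor data to social-signal information; here they are
# hand-assigned placeholders.

from dataclasses import dataclass

@dataclass
class Segment:
    camera_id: str
    start_s: float
    end_s: float
    labels: set[str]       # e.g. social-signal / body-language tags (assumed)

# Story beats the output video should follow, in order (assumed template).
NARRATIVE_ARC = ["introduction", "rising_activity", "climax", "resolution"]

def stitch(segments: list[Segment], arc: list[str]) -> list[Segment]:
    """Pick the earliest unused segment matching each beat, in story order."""
    timeline, cursor = [], 0.0
    for beat in arc:
        candidates = [s for s in segments if beat in s.labels and s.start_s >= cursor]
        if not candidates:
            continue                      # skip beats with no usable footage
        chosen = min(candidates, key=lambda s: s.start_s)
        timeline.append(chosen)
        cursor = chosen.end_s
    return timeline

if __name__ == "__main__":
    demo = [
        Segment("cam3", 0, 20, {"introduction"}),
        Segment("cam1", 40, 70, {"rising_activity"}),
        Segment("cam2", 90, 120, {"climax"}),
        Segment("cam1", 130, 150, {"resolution"}),
    ]
    for seg in stitch(demo, NARRATIVE_ARC):
        print(seg.camera_id, seg.start_s, seg.end_s, sorted(seg.labels))
```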