
    A new adaptive algorithm for video super-resolution with improved outlier handling capability

    Master's thesis - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2016. Abstract: Super-resolution reconstruction (SRR) is a technique that essentially consists of combining multiple low-resolution images of a single scene in order to create an image with higher resolution. The main characteristics considered in evaluating the performance of SRR algorithms are the resulting image quality, the robustness to outliers, and the computational cost. Among the super-resolution algorithms in the literature, the R-LMS has a very small computational cost, making it suitable for real-time operation. However, like many SRR techniques, the R-LMS algorithm is also highly susceptible to outliers, which can make the quality of the reconstructed image lower than that of the low-resolution observations. Although robust techniques have been proposed to mitigate this problem, the computational cost of even the simplest of them is not comparable to that of the R-LMS, making real-time operation impractical. It is therefore desirable to devise new algorithms that offer a better compromise between quality, robustness, and computational cost. In this work, a new SRR technique based on the R-LMS algorithm is proposed. Based on the proximal-point cost-function representation of the gradient descent iterative equation, an intuitive interpretation of the behavior of the R-LMS algorithm is obtained, both under ideal conditions and in the presence of outliers. Using a statistical model for the innovation outliers, a new regularization is then proposed to increase the robustness of the algorithm by allowing faster convergence on the subspace corresponding to the innovations while at the same time preserving the estimated image details. Two new algorithms are then derived.
Computer simulations have shown that the new algorithms deliver performance comparable to that of the R-LMS in the absence of outliers, and significantly better performance in the presence of outliers, both quantitatively and visually. The computational cost of the proposed solution remains comparable to that of the R-LMS. Super-resolution reconstruction (SRR) is a technique that essentially consists of combining multiple low-resolution images to form a single image with higher resolution. The main characteristics considered in the evaluation of SRR algorithms are the quality of the reconstructed image, its robustness to outliers, and the associated computational cost. Higher quality in the reconstructed images implies a greater effective increase in their resolution. Greater robustness, in turn, means that a good-quality result is obtained even when the processed images do not faithfully follow the adopted mathematical model. The computational cost is extremely relevant in SRR applications, given the very large dimension of the problem. One of the main applications of SRR is the reconstruction of video sequences. To facilitate real-time processing, a frequent requirement in video SRR applications, iterative algorithms have been proposed that process only one image at each time instant, using information from the estimates obtained at previous time instants. Among the iterative super-resolution algorithms in the literature, the R-LMS has an extremely low computational cost while providing reconstructions of competitive quality. Nevertheless, like most existing SRR techniques, the R-LMS is quite susceptible to the presence of outliers, which can make the quality of the reconstructed images lower than that of the low-resolution observations.
To mitigate this problem, robust SRR techniques have been proposed in the literature. Even so, the computational cost of even the simplest robust algorithms is not comparable to that of the R-LMS, making real-time processing infeasible. It is therefore desirable to develop new algorithms that offer a better compromise between quality, robustness, and computational cost. In this work, a new SRR technique based on the R-LMS algorithm is proposed. Based on the proximal-point cost-function representation of the gradient-method iterative equation, an intuitive interpretation of the behavior of the R-LMS algorithm is obtained, both under ideal conditions and in the presence of innovation outliers, which represent significant scene changes between adjacent frames of a video sequence. It is shown that the robustness problem the R-LMS exhibits with respect to innovation outliers is mainly due to its low convergence rate. Furthermore, a direct trade-off is observed between a fast convergence rate and the preservation of the information estimated at previous time instants, which makes it infeasible to obtain, simultaneously, good quality when processing well-behaved sequences and good robustness in the presence of large innovations. The goal is therefore to design an algorithm for real-time video sequence reconstruction that is more robust to large outliers without compromising the preservation of the information estimated from the low-resolution sequence. Using a statistical model for the outliers arising from innovations, a new regularization is proposed to increase the robustness of the algorithm, simultaneously allowing faster convergence in the image subspace corresponding to the innovations and the preservation of previously estimated details. Two new algorithms are then derived.
The proposed regularization penalizes variations between adjacent estimates in the video sequence within a subspace approximately orthogonal to the content of the innovations. It is verified that the image subspace in which the innovations contain the least energy is precisely the one containing the image details. This shows that the proposed regularization leads not only to greater robustness, but also to the preservation of the details estimated from the video sequence at previous time instants. Computer simulations show that, although the proposed solution does not yield significant performance improvements under near-ideal conditions, when outliers are present in the image sequence the proposed method considerably outperforms the R-LMS, both quantitatively and visually. The computational cost of the proposed solution remains comparable to that of the R-LMS algorithm.
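The gradient-descent structure underlying this family of algorithms can be made concrete with a toy example. The sketch below runs a regularized LMS-style iteration on a 1-D signal; the block-averaging decimation operator, the second-difference smoothness regularizer, and all parameter values are illustrative assumptions, not the thesis' actual 2-D, motion-compensated R-LMS or its proposed innovation-aware regularization.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 16, 64                 # low-res / high-res sample counts (assumed)
D = np.zeros((M, N))          # decimation operator: average blocks of 4
for i in range(M):
    D[i, 4 * i:4 * i + 4] = 0.25

x_true = np.sin(np.linspace(0, 4 * np.pi, N))    # unknown high-res signal
y = D @ x_true + 0.01 * rng.standard_normal(M)   # low-res observation

# Second-difference operator, used here as a generic smoothness regularizer.
L = np.zeros((N - 2, N))
for i in range(N - 2):
    L[i, i:i + 3] = [1.0, -2.0, 1.0]

mu, lam = 1.0, 0.05           # step size and regularization weight (assumed)
x = np.zeros(N)
for _ in range(200):
    # gradient of ||Dx - y||^2 / data term plus lam * ||Lx||^2 / prior term
    grad = D.T @ (D @ x - y) + lam * (L.T @ (L @ x))
    x -= mu * grad
```

Swapping the generic smoothness term for a regularization matched to the innovation statistics is, in essence, the direction the thesis pursues.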

    Rover and Telerobotics Technology Program

    The Jet Propulsion Laboratory's (JPL's) Rover and Telerobotics Technology Program, sponsored by the National Aeronautics and Space Administration (NASA), responds to opportunities presented by NASA space missions and systems, and seeds commercial applications of the emerging robotics technology. The scope of the JPL Rover and Telerobotics Technology Program comprises three major segments of activity: NASA robotic systems for planetary exploration, robotic technology and terrestrial spin-offs, and technology for non-NASA sponsors. Significant technical achievements have been reached in each of these areas, including complete telerobotic system prototypes that have been built and tested in realistic scenarios relevant to prospective users. In addition, the program has conducted complementary basic research, created innovative technology and terrestrial applications, and enabled a variety of commercial spin-offs.

    Marshall Space Flight Center Research and Technology Report 2017

    This report features over 60 technology development and scientific research efforts that collectively aim to enable new capabilities in spaceflight, expand the reach of human exploration, and reveal new knowledge about the universe in which we live. These efforts include a wide array of strategic developments: launch propulsion technologies that facilitate more reliable, routine, and cost-effective access to space; in-space propulsion developments that provide new solutions to space transportation requirements; autonomous systems designed to increase our utilization of robotics to accomplish critical missions; life support technologies that target our ability to implement closed-loop environmental resource utilization; science instruments that enable terrestrial, solar, planetary, and deep-space observations and discovery; and manufacturing technologies that will change the way we fabricate everything from rocket engines to in-situ-generated fuel and consumables.

    Automatic video segmentation employing object/camera modeling techniques

    Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, like composing new scenes from pre-existing video objects or enabling user-interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. In the case that the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process.
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene-background. This approach is fully compatible with the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthesis of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains.
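The RANSAC principle behind the robust camera-motion estimation can be illustrated with a minimal example. The sketch below fits a 2-D line to data contaminated with gross outliers by repeatedly hypothesizing a model from a minimal sample and keeping the largest consensus set; the data, tolerance, and iteration count are arbitrary assumptions, and the thesis' numerically stabilized variant is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Inliers on the line y = 2x + 1 plus gross outliers, standing in for
# the feature correspondences used in camera-motion estimation.
x_in = rng.uniform(0, 10, 80)
pts = np.column_stack([x_in, 2 * x_in + 1 + 0.05 * rng.standard_normal(80)])
outliers = rng.uniform(0, 10, (20, 2)) * np.array([1.0, 3.0])
data = np.vstack([pts, outliers])

def ransac_line(data, iters=200, tol=0.2):
    best_inliers = np.zeros(len(data), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(data), 2, replace=False)  # minimal sample
        (x1, y1), (x2, y2) = data[i], data[j]
        if abs(x2 - x1) < 1e-9:
            continue
        a = (y2 - y1) / (x2 - x1)          # candidate slope
        b = y1 - a * x1                    # candidate intercept
        resid = np.abs(data[:, 1] - (a * data[:, 0] + b))
        inliers = resid < tol              # consensus set
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on the best consensus set with least squares
    a, b = np.polyfit(data[best_inliers, 0], data[best_inliers, 1], 1)
    return a, b, best_inliers

a, b, inliers = ransac_line(data)
```

The final least-squares refit on the consensus set is what makes the estimate accurate; the random hypotheses only serve to separate inliers from outliers.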
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, it is exploited that the times in which foreground objects appear in an image region are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is even impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower. Finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration.
Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building-blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms. The first one is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm.
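The risk-map idea for change detection can be sketched in a few lines. The toy below thresholds the difference between a registered input frame and the pure background image, and simply ignores pixels flagged in a risk map; the image sizes, noise level, threshold, and the choice of risky pixels are all illustrative assumptions, not the thesis' actual detector.

```python
import numpy as np

rng = np.random.default_rng(3)

background = rng.uniform(0, 255, (8, 8))                 # pure background image
frame = background + 2.0 * rng.standard_normal((8, 8))   # registered input frame
frame[2:5, 2:5] += 80.0                                  # synthetic foreground object

# Risk map: pixels where misregistration would likely cause false
# detections (here the top row is arbitrarily marked as risky).
risk_map = np.zeros((8, 8), dtype=bool)
risk_map[0, :] = True

diff = np.abs(frame - background)
# Change detection: large differences count as foreground, except where
# the risk map tells us to disregard the difference values.
mask = (diff > 30.0) & ~risk_map
```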
The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal length) using camera autocalibration techniques. A distinctive feature of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field.
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is projection onto a cylinder surface, which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because they look in all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation enables virtual walk-throughs in the reconstructed rooms and therefore provides a better orientation for the user. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects.
These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthesis in Part I. The results of this thesis show that implicit descriptions, which extract their definition from the video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into segmentation approaches that are based on implicit models. Instead, those semantics should be added as post-processing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures, since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research.

    Recognition of Activities of Daily Living Based on Environmental Analyses Using Audio Fingerprinting Techniques: A Systematic Review

    An increase in the accuracy of identification of Activities of Daily Living (ADL) is very important for different goals of Enhanced Living Environments and for Ambient Assisted Living (AAL) tasks. This increase may be achieved through identification of the surrounding environment. Although this is usually used to identify the location, ADL recognition can be improved with the identification of the sound in that particular environment. This paper reviews audio fingerprinting techniques that can be used with the acoustic data acquired from mobile devices. A comprehensive literature search was conducted in order to identify relevant English-language works aimed at the identification of the environment of ADLs using data acquired with mobile devices, published between 2002 and 2017. In total, 40 studies were analyzed and selected from 115 citations. The results highlight several audio fingerprinting techniques, including the modified discrete cosine transform (MDCT), mel-frequency cepstrum coefficients (MFCC), principal component analysis (PCA), the fast Fourier transform (FFT), Gaussian mixture models (GMM), likelihood estimation, the logarithmic modulated complex lapped transform (LMCLT), support vector machines (SVM), the constant Q transform (CQT), symmetric pairwise boosting (SPB), Philips robust hash (PRH), linear discriminant analysis (LDA), and the discrete cosine transform (DCT). This work was supported by FCT project UID/EEA/50008/2013. The authors would also like to acknowledge the contribution of the COST Action IC1303—AAPELE—Architectures, Algorithms and Protocols for Enhanced Living Environments.
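As a concrete illustration of one of the listed techniques, the sketch below computes MFCC-style features from a waveform using only NumPy. The frame length, mel-band count, and coefficient count are assumed values; production fingerprinting systems add pre-emphasis, liftering, and hashing steps omitted here.

```python
import numpy as np

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, n_fft=512, n_mels=20, n_ceps=12):
    # Frame the signal with a Hann window (50% overlap).
    hop = n_fft // 2
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(power @ fbank.T + 1e-10)

    # DCT-II to decorrelate the log-mel energies (keep n_ceps coefficients).
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
feats = mfcc(sig)   # one 12-coefficient feature vector per frame
```

A fingerprinting system would then quantize or hash these per-frame vectors for fast environment matching.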

    Exploiting Natural On-chip Redundancy for Energy Efficient Memory and Computing

    Power density is currently the primary design constraint across most computing segments and the main performance-limiting factor. For years, industry has kept power density constant while increasing frequency and lowering transistor supply (Vdd) and threshold (Vth) voltages. However, Vth scaling has stopped because leakage current is exponentially related to it. Transistor count and integration density keep doubling every process generation (Moore's Law), but the power budget caps the amount of hardware that can be active at the same time, leading to dark silicon. With each new generation, there are more resources available, but we cannot fully exploit their performance potential. In recent years, different research trends have explored how to cope with dark silicon and unlock the energy efficiency of chips, including Near-Threshold voltage Computing (NTC) and approximate computing. NTC aggressively lowers Vdd to values near Vth. This allows a substantial reduction in power, as dynamic power scales quadratically with supply voltage. The resultant power reduction could be used to activate more chip resources and potentially achieve performance improvements. Unfortunately, Vdd scaling is limited by the tight functionality margins of on-chip SRAM transistors. When scaling Vdd down to near-threshold values, manufacture-induced parameter variations affect the functionality of SRAM cells, which eventually become unreliable. A large number of emerging applications, on the other hand, feature an intrinsic error-resilience property, tolerating a certain amount of noise. In this context, approximate computing takes advantage of this observation and exploits the gap between the level of accuracy required by the application and the level of accuracy given by the computation, provided that reducing the accuracy translates into an energy gain.
However, deciding which instructions and data, and which techniques, are best suited for approximation still poses a major challenge. This dissertation contributes in these two directions. First, it proposes a new approach to mitigate the impact of SRAM failures due to parameter variation for effective operation at ultra-low voltages. We identify two levels of natural on-chip redundancy: cache level and content level. The first arises because of the replication of blocks in multi-level cache hierarchies. We exploit this redundancy with a cache management policy that allocates blocks to entries taking into account the nature of the cache entry and the use pattern of the block. This policy obtains performance improvements between 2% and 34% with respect to block disabling, a technique of similar complexity, while incurring no additional storage overhead. The latter (content-level redundancy) arises because of the redundancy of data in real-world applications. We exploit this redundancy by compressing cache blocks to fit them in partially functional cache entries. At the cost of a slight overhead increase, we can obtain performance within 2% of that obtained when the cache is built with fault-free cells, even if more than 90% of the cache entries have at least one faulty cell. Then, we analyze how the intrinsic noise tolerance of emerging applications can be exploited to design an approximate Instruction Set Architecture (ISA). Exploiting the ISA redundancy, we explore a set of techniques to approximate the execution of instructions across a set of emerging applications, pointing out the potential of reducing the complexity of the ISA, and the trade-offs of the approach. In a proof-of-concept implementation, the ISA is shrunk in two dimensions: breadth (i.e., simplifying instructions) and depth (i.e., dropping instructions). This proof-of-concept shows that energy can be reduced on average by 20.6% at around 14.9% accuracy loss.
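The content-level idea of fitting compressed blocks into partially functional entries can be illustrated with a toy model. The sketch below uses a simple zero-run-length encoding and a per-entry count of faulty bytes; the block size, fault counts, and compression scheme are illustrative assumptions, not the dissertation's actual mechanism.

```python
import random

BLOCK = 64  # bytes per cache block (assumed)

def compressed_size(block):
    """Zero-RLE: each run of zeros costs 2 bytes (marker + length),
    every non-zero byte costs 1 byte."""
    size, i = 0, 0
    while i < len(block):
        if block[i] == 0:
            j = i
            while j < len(block) and block[j] == 0:
                j += 1
            size += 2          # one marker byte + one run-length byte
            i = j
        else:
            size += 1
            i += 1
    return size

def fits(block, faulty_bytes):
    """A block is storable in a partially faulty entry only if its
    compressed form fits in the entry's functional bytes."""
    return compressed_size(block) <= BLOCK - faulty_bytes

# A mostly-zero block (common in real workloads) fits in an entry with
# 20 faulty bytes; an incompressible random block does not.
sparse = [0] * 56 + [1, 2, 3, 4, 5, 6, 7, 8]
rnd = [random.randrange(1, 256) for _ in range(64)]
```

The point of the toy is the policy decision, not the codec: redundancy in the data is what lets faulty entries remain useful.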

    Fusion of Data from Heterogeneous Sensors with Distributed Fields of View and Situation Evaluation for Advanced Driver Assistance Systems

    In order to develop a driver assistance system for pedestrian protection, pedestrians in the environment of a truck are detected by radars and a camera and are tracked across distributed fields of view using a Joint Integrated Probabilistic Data Association filter. A robust approach for predicting the system vehicle's trajectory is presented. It serves to compute a probabilistic collision risk based on reachable sets, where different sources of uncertainty are taken into account.
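The probabilistic idea behind such a collision-risk measure can be sketched with a sampling-based toy. Below, ego vehicle and pedestrian are predicted with constant-velocity models, the pedestrian's velocity is perturbed with Gaussian noise, and the risk is the fraction of sampled futures in which the two come closer than a safety radius. The motion models, noise level, horizon, and radius are all illustrative assumptions; the paper's actual method uses reachable sets and a JIPDA tracker.

```python
import numpy as np

rng = np.random.default_rng(2)

def collision_risk(ego_p, ego_v, ped_p, ped_v, horizon=3.0, dt=0.1,
                   sigma=0.3, radius=1.5, n_samples=2000):
    t = np.arange(dt, horizon + dt, dt)
    hits = 0
    for _ in range(n_samples):
        # Sample an uncertain pedestrian velocity, then roll both
        # constant-velocity models forward over the horizon.
        ped_v_s = ped_v + sigma * rng.standard_normal(2)
        ego = ego_p + np.outer(t, ego_v)
        ped = ped_p + np.outer(t, ped_v_s)
        if np.min(np.linalg.norm(ego - ped, axis=1)) < radius:
            hits += 1
    return hits / n_samples

# Pedestrian crossing in front of the truck: high risk.
high = collision_risk(np.array([0., 0.]), np.array([10., 0.]),
                      np.array([15., -5.]), np.array([0., 3.]))
# Pedestrian walking away from the truck's path: negligible risk.
low = collision_risk(np.array([0., 0.]), np.array([10., 0.]),
                     np.array([15., 10.]), np.array([0., 3.]))
```

Reachable-set methods replace the Monte Carlo sampling with set-valued over-approximations of all feasible futures, trading sampling error for conservatism.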

    Multi Agent Systems

    Research on multi-agent systems is enlarging our future technical capabilities as humans and as an intelligent society. During recent years, many effective applications have been implemented and have become part of our daily life. These applications have agent-based models and methods as an important ingredient. Markets, the finance world, robotics, medical technology, social negotiation, video games, and big-data science are some of the branches where the knowledge gained through multi-agent simulations is necessary, and where new software engineering tools are continuously created and tested in order to achieve an effective technology transfer that impacts our lives. This book brings together researchers working in several fields that cover the techniques, the challenges, and the applications of multi-agent systems in a wide variety of aspects: learning algorithms for different devices such as vehicles, robots, and drones; computational optimization for more efficient energy distribution in power grids; and the use of social networks and decision strategies applied to smart learning and education environments in emerging countries. We hope that this book can be useful and become a guide or reference for an audience interested in the developments and applications of multi-agent systems.