457 research outputs found

    Multi-modal Fusion for Flasher Detection in a Mobile Video Chat Application

    Get PDF
    ABSTRACT This paper investigates the development of accurate and efficient classifiers to identify misbehaving users (i.e., "flashers") in a mobile video chat application. Our analysis is based on video session data collected from a mobile client that we built that connects to a popular random video chat service. We show that prior imagebased classifiers designed for identifying normal and misbehaving users in online video chat systems perform poorly on mobile video chat data. We present an enhanced image-based classifier that improves classification performance on mobile data. More importantly, we demonstrate that incorporating multi-modal mobile sensor data from accelerometer and the camera state (front/back) along with audio can significantly improve the overall image-based classification accuracy. Our work also shows that leveraging multiple image-based predictions within a session (i.e., temporal modality) has the potential to further improve the classification performance. Finally, we show that the cost of classification in terms of running time can be significantly reduced by employing a multilevel cascaded classifier in which high-complexity features and further image-based predictions are not generated unless needed

    Development of a text reading system on video images

    Get PDF
    Since the early days of computer science researchers sought to devise a machine which could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real-time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region¿s identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve the current state-of-the-art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground-truth of text regions

    A Low-cost Depth Imaging Mobile Platform for Canola Phenotyping

    Get PDF
    To meet the high demand for supporting and accelerating progress in the breeding of novel traits, plant scientists and breeders have to measure a large number of plants and their characteristics accurately. A variety of imaging methodologies are being deployed to acquire data for quantitative studies of complex traits. When applied to a large number of plants such as canola plants, however, a complete three-dimensional (3D) model is time-consuming and expensive for high-throughput phenotyping with an enormous amount of data. In some contexts, a full rebuild of entire plants may not be necessary. In recent years, many 3D plan phenotyping techniques with high cost and large-scale facilities have been introduced to extract plant phenotypic traits, but these applications may be affected by limited research budgets and cross environments. This thesis proposed a low-cost depth and high-throughput phenotyping mobile platform to measure canola plant traits in cross environments. Methods included detecting and counting canola branches and seedpods, monitoring canola growth stages, and fusing color images to improve images resolution and achieve higher accuracy. Canola plant traits were examined in both controlled environment and field scenarios. These methodologies were enhanced by different imaging techniques. Results revealed that this phenotyping mobile platform can be used to investigate canola plant traits in cross environments with high accuracy. The results also show that algorithms for counting canola branches and seedpods enable crop researchers to analyze the relationship between canola genotypes and phenotypes and estimate crop yields. In addition to counting algorithms, fusing techniques can be helpful for plant breeders with more comfortable access plant characteristics by improving the definition and resolution of color images. These findings add value to the automation, low-cost depth and high-throughput phenotyping for canola plants. These findings also contribute a novel multi-focus image fusion that exhibits a competitive performance with outperforms some other state-of-the-art methods based on the visual saliency maps and gradient domain fast guided filter. This proposed platform and counting algorithms can be applied to not only canola plants but also other closely related species. The proposed fusing technique can be extended to other fields, such as remote sensing and medical image fusion

    An investigation into Quadtree fractal image and video compression

    Get PDF
    Digital imaging is the representation of drawings, photographs and pictures in a format that can be displayed and manipulated using a conventional computer. Digital imaging has enjoyed increasing popularity over recent years, with the explosion of digital photography, the Internet and graphics-intensive applications and games. Digitised images, like other digital media, require a relatively large amount of storage space. These storage requirements can become problematic as demands for higher resolution images increases and the resolution capabilities of digital cameras improve. It is not uncommon for a personal computer user to have a collection of thousands of digital images, mainly photographs, whilst the Internet’s Web pages present a practically infinite source. These two factors 一 image size and abundance 一 inevitably lead to a storage problem. As with other large files, data compression can help reduce these storage requirements. Data compression aims to reduce the overall storage requirements for a file by minimising redundancy. The most popular image compression method, JPEG, can reduce the storage requirements for a photographic image by a factor of ten whilst maintaining the appearance of the original image 一 or can deliver much greater levels of compression with a slight loss of quality as a trade-off. Whilst JPEG's efficiency has made it the definitive image compression algorithm, there is always a demand for even greater levels of compression and as a result new image compression techniques are constantly being explored. One such technique utilises the unique properties of Fractals. Fractals are relatively small mathematical formulae that can be used to generate abstract and often colourful images with infinite levels of detail. This property is of interest in the area of image compression because a detailed, high-resolution image can be represented by a few thousand bytes of formulae and coefficients rather than the more typical multi-megabyte filesizes. The real challenge associated with Fractal image compression is to determine the correct set of formulae and coefficients to represent the image a user is trying to compress; it is trivial to produce an image from a given formula but it is much, much harder to produce a formula from a given image. เท theory, Fractal compression can outperform JPEG for a given image and quality level, if the appropiate formulae can be determined. Fractal image compression can also be applied to digital video sequences, which are typically represented by a long series of digital images 一 or 'frames'

    State of the Art in Face Recognition

    Get PDF
    Notwithstanding the tremendous effort to solve the face recognition problem, it is not possible yet to design a face recognition system with a potential close to human performance. New computer vision and pattern recognition approaches need to be investigated. Even new knowledge and perspectives from different fields like, psychology and neuroscience must be incorporated into the current field of face recognition to design a robust face recognition system. Indeed, many more efforts are required to end up with a human like face recognition system. This book tries to make an effort to reduce the gap between the previous face recognition research state and the future state

    Prioritizing Content of Interest in Multimedia Data Compression

    Get PDF
    Image and video compression techniques make data transmission and storage in digital multimedia systems more efficient and feasible for the system's limited storage and bandwidth. Many generic image and video compression techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their great success, we observe that these standard compression techniques are not the best solution for data compression in special types of multimedia systems such as microscopy videos and low-power wireless broadcast systems. In these application-specific systems where the content of interest in the multimedia data is known and well-defined, we should re-think the design of a data compression pipeline. We hypothesize that by identifying and prioritizing multimedia data's content of interest, new compression methods can be invented that are far more effective than standard techniques. In this dissertation, a set of new data compression methods based on the idea of prioritizing the content of interest has been proposed for three different kinds of multimedia systems. I will show that the key to designing efficient compression techniques in these three cases is to prioritize the content of interest in the data. The definition of the content of interest of multimedia data depends on the application. First, I show that for microscopy videos, the content of interest is defined as the spatial regions in the video frame with pixels that don't only contain noise. Keeping data in those regions with high quality and throwing out other information yields to a novel microscopy video compression technique. Second, I show that for a Bluetooth low energy beacon based system, practical multimedia data storage and transmission is possible by prioritizing content of interest. I designed custom image compression techniques that preserve edges in a binary image, or foreground regions of a color image of indoor or outdoor objects. Last, I present a new indoor Bluetooth low energy beacon based augmented reality system that integrates a 3D moving object compression method that prioritizes the content of interest.Doctor of Philosoph

    Codage de cartes de profondeur par deformation de courbes elastiques

    Get PDF
    In multiple-view video plus depth, depth maps can be represented by means of grayscale images and the corresponding temporal sequence can be thought as a standard grayscale video sequence. However depth maps have different properties from natural images: they present large areas of smooth surfaces separated by sharp edges. Arguably the most important information lies in object contours, as a consequence an interesting approach consists in performing a lossless coding of the contour map, possibly followed by a lossy coding of per-object depth values.In this context, we propose a new technique for the lossless coding of object contours, based on the elastic deformation of curves. A continuous evolution of elastic deformations between two reference contour curves can be modelled, and an elastically deformed version of the reference contours can be sent to the decoder with an extremely small coding cost and used as side information to improve the lossless coding of the actual contour. After the main discontinuities have been captured by the contour description, the depth field inside each region is rather smooth. We proposed and tested two different techniques for the coding of the depth field inside each region. The first technique performs the shape-adaptive wavelet transform followed by the shape-adaptive version of SPIHT. The second technique performs a prediction of the depth field from its subsampled version and the set of coded contours. It is generally recognized that a high quality view rendering at the receiver side is possible only by preserving the contour information, since distortions on edges during the encoding step would cause a sensible degradation on the synthesized view and on the 3D perception. We investigated this claim by conducting a subjective quality assessment test to compare an object-based technique and a hybrid block-based techniques for the coding of depth maps.Dans le format multiple-view video plus depth, les cartes de profondeur peuvent être représentées comme des images en niveaux de gris et la séquence temporelle correspondante peut être considérée comme une séquence vidéo standard en niveaux de gris. Cependant les cartes de profondeur ont des propriétés différentes des images naturelles: ils présentent de grandes surfaces lisses séparées par des arêtes vives. On peut dire que l'information la plus importante réside dans les contours de l'objet, en conséquence une approche intéressante consiste à effectuer un codage sans perte de la carte de contour, éventuellement suivie d'un codage lossy des valeurs de profondeur par-objet.Dans ce contexte, nous proposons une nouvelle technique pour le codage sans perte des contours de l'objet, basée sur la déformation élastique des courbes. Une évolution continue des déformations élastiques peut être modélisée entre deux courbes de référence, et une version du contour déformée élastiquement peut être envoyé au décodeur avec un coût de codage très faible et utilisé comme information latérale pour améliorer le codage sans perte du contour réel. Après que les principales discontinuités ont été capturés par la description du contour, la profondeur à l'intérieur de chaque région est assez lisse. Nous avons proposé et testé deux techniques différentes pour le codage du champ de profondeur à l'intérieur de chaque région. La première technique utilise la version adaptative à la forme de la transformation en ondelette, suivie par la version adaptative à la forme de SPIHT.La seconde technique effectue une prédiction du champ de profondeur à partir de sa version sous-échantillonnée et l'ensemble des contours codés. Il est généralement reconnu qu'un rendu de haute qualité au récepteur pour un nouveau point de vue est possible que avec la préservation de l'information de contour, car des distorsions sur les bords lors de l'étape de codage entraînerait une dégradation évidente sur la vue synthétisée et sur la perception 3D. Nous avons étudié cette affirmation en effectuant un test d'évaluation de la qualité perçue en comparant, pour le codage des cartes de profondeur, une technique basée sur la compression d'objects et une techniques de codage vidéo hybride à blocs

    Efficient HEVC-based video adaptation using transcoding

    Get PDF
    In a video transmission system, it is important to take into account the great diversity of the network/end-user constraints. On the one hand, video content is typically streamed over a network that is characterized by different bandwidth capacities. In many cases, the bandwidth is insufficient to transfer the video at its original quality. On the other hand, a single video is often played by multiple devices like PCs, laptops, and cell phones. Obviously, a single video would not satisfy their different constraints. These diversities of the network and devices capacity lead to the need for video adaptation techniques, e.g., a reduction of the bit rate or spatial resolution. Video transcoding, which modifies a property of the video without the change of the coding format, has been well-known as an efficient adaptation solution. However, this approach comes along with a high computational complexity, resulting in huge energy consumption in the network and possibly network latency. This presentation provides several optimization strategies for the transcoding process of HEVC (the latest High Efficiency Video Coding standard) video streams. First, the computational complexity of a bit rate transcoder (transrater) is reduced. We proposed several techniques to speed-up the encoder of a transrater, notably a machine-learning-based approach and a novel coding-mode evaluation strategy have been proposed. Moreover, the motion estimation process of the encoder has been optimized with the use of decision theory and the proposed fast search patterns. Second, the issues and challenges of a spatial transcoder have been solved by using machine-learning algorithms. Thanks to their great performance, the proposed techniques are expected to significantly help HEVC gain popularity in a wide range of modern multimedia applications

    Visual and Camera Sensors

    Get PDF
    This book includes 13 papers published in Special Issue ("Visual and Camera Sensors") of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors
    • …
    corecore