Image and Video Analytics for Document Processing and Event Recognition
The proliferation of handheld devices with cameras is one of many changes over the past several decades that have affected the document image analysis community: compared with traditional non-portable flatbed scanners, these devices offer a far less constrained document imaging experience. Although they provide more flexibility in capturing, users now have to contend with numerous environmental challenges, including 1) a limited field of view that keeps users from acquiring a high-quality image of a large source in a single frame, 2) light reflections on glossy surfaces that result in saturated regions, and 3) crumpled or non-planar documents that cannot be captured effectively from a single pose.
Another change is the application of deep neural networks, such as deep convolutional neural networks (CNNs), to text analysis, where they have shown unprecedented performance compared with classical approaches. Beginning with their success in character recognition, CNNs have shown their strength in many tasks in document analysis as well as computer vision. Researchers have explored the applicability of CNNs to tasks such as text detection and segmentation, with considerable success. These networks, originally trained to perform single tasks, have recently evolved to handle multiple tasks. This introduces several important challenges, including imposing multiple tasks on a single-architecture network and integrating multiple architectures with different tasks. In this dissertation, we make contributions in both of these areas.
First, we propose a novel graph-cut-based document image mosaicking method that seeks to overcome the known limitations of previous approaches. Our method does not require any prior knowledge of the content of the document images, making it more widely applicable and robust. Information about the geometric arrangement of the overlapping images is exploited to minimize errors in the boundary regions. We incorporate a sharpness measure that steers cut generation so that the resulting mosaic includes the sharpest pixels. Our method is shown to outperform previous methods both quantitatively and qualitatively.
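The abstract does not give the energy terms, so the following is only a minimal sketch under stated assumptions. It uses an absolute 4-neighbour Laplacian as the sharpness measure and, as a simpler stand-in for a full graph cut, a dynamic-programming vertical seam through an already-aligned grayscale overlap. Pixels left of the seam come from image `a` and the rest from image `b`. The data term rewards taking each pixel from its sharper source, and the smoothness term (weighted by the hypothetical `smooth_weight`) penalizes intensity mismatch across the seam. All function names are illustrative.

```python
def laplacian_sharpness(img):
    """Absolute 4-neighbour Laplacian per pixel; borders use replicated edges."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[abs(4 * px(y, x) - px(y - 1, x) - px(y + 1, x)
                 - px(y, x - 1) - px(y, x + 1))
             for x in range(w)] for y in range(h)]


def sharpest_seam(a, b, smooth_weight=1.0):
    """One seam column per row (0..w); columns < seam come from a, the rest from b.

    Assumes a and b are aligned overlaps of equal size (hypothetical setup).
    """
    h, w = len(a), len(a[0])
    sa, sb = laplacian_sharpness(a), laplacian_sharpness(b)

    def row_costs(y):
        costs = []
        for c in range(w + 1):
            # Data term: negative sharpness of the pixels each source contributes.
            data = -sum(sa[y][:c]) - sum(sb[y][c:])
            # Smoothness term: mismatch of the two sources on both sides of the seam.
            seam = 0.0
            if 0 < c < w:
                seam = abs(a[y][c - 1] - b[y][c - 1]) + abs(a[y][c] - b[y][c])
            costs.append(data + smooth_weight * seam)
        return costs

    cost, back = row_costs(0), []
    for y in range(1, h):
        rc, new_cost, ptr = row_costs(y), [], []
        for c in range(w + 1):
            # The seam may shift by at most one column between consecutive rows.
            best, p = min((cost[p], p) for p in (c - 1, c, c + 1) if 0 <= p <= w)
            new_cost.append(best + rc[c])
            ptr.append(p)
        cost = new_cost
        back.append(ptr)
    c = min(range(w + 1), key=lambda i: cost[i])
    seam = [c]
    for ptr in reversed(back):  # backtrack from the last row to the first
        c = ptr[c]
        seam.append(c)
    return seam[::-1]


def compose(a, b, seam):
    """Build the mosaic rows from the seam positions."""
    return [ra[:c] + rb[c:] for ra, rb, c in zip(a, b, seam)]
```

When one overlap is textured everywhere and the other is flat (for example, defocused), the seam moves entirely to the flat image's side, so the mosaic keeps only the sharp pixels. A real graph cut would allow arbitrary, non-monotone region shapes rather than a single seam.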
Second, we address the problem of removing highlight regions caused by light sources reflecting off glossy surfaces in indoor environments. We devise an efficient method that detects and removes the highlights from the target scene by jointly estimating separate homographies for the target scene and the highlights. Our method is based on the observation that, given two images captured from different viewpoints, the displacement of the target scene differs from that of the highlight regions. We show the effectiveness of our method in removing highlight reflections by comparing it with related state-of-the-art methods. Unlike previous methods, our method can handle saturated and relatively large highlights that completely obscure the content underneath.
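The observation that scene and highlight move differently between two views can be illustrated with a small sketch. It assumes that two 3×3 homographies (view 1 to view 2) are already available, one for the scene plane and one for the highlight. Each point match is labelled by whichever homography transfers it with lower error, and masked highlight pixels in view 1 are then filled from view 2 through the scene homography (nearest-neighbour sampling). This is not the dissertation's joint estimation procedure. The helper names, the known mask, and the assumption that the corresponding view-2 pixel is unobscured are all hypothetical.

```python
import math


def apply_h(H, pt):
    """Map a 2D point through a 3x3 homography (row-major nested lists)."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def transfer_error(H, p, q):
    """Euclidean distance between H(p) and the matched point q."""
    u, v = apply_h(H, p)
    return math.hypot(u - q[0], v - q[1])


def label_matches(matches, H_scene, H_highlight):
    """Label each (p, q) match by the homography that explains it better."""
    return ["scene" if transfer_error(H_scene, p, q) <= transfer_error(H_highlight, p, q)
            else "highlight"
            for p, q in matches]


def fill_from_second_view(img1, img2, mask, H_scene):
    """Replace masked pixels of img1 with img2 samples mapped by the scene homography."""
    out = [row[:] for row in img1]
    h2, w2 = len(img2), len(img2[0])
    for y, row in enumerate(mask):
        for x, masked in enumerate(row):
            if masked:
                u, v = apply_h(H_scene, (x, y))
                xi, yi = round(u), round(v)
                if 0 <= xi < w2 and 0 <= yi < h2:
                    out[y][x] = img2[yi][xi]
    return out
```

Because the highlight is tied to the light source rather than to the document plane, its apparent motion between views follows a different homography. This separation is what lets a second view supply the content hidden under a saturated highlight in the first.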
Third, we address the problem of selecting instances of a planar object in a video or set of images based on an evaluation of its "frontalness". We introduce the idea of "evaluating the frontalness" by computing how closely the object's surface normal aligns with the optical axis of the camera. The unique and novel aspect of our method is that, unlike previous planar object pose estimation methods, it does not require a frontal reference image. The intuition is that a true frontal image can be used to reproduce other non-frontal images by perspective projection, while non-frontal images have limited ability to do so. We show that the comparison between 'frontal' and 'non-frontal' images can be extended to a comparison between 'more frontal' and 'less frontal' images. Based on this observation, our method estimates the relative frontalness of an image by exploiting the objective space error. We also propose the use of a K-invariant space to evaluate frontalness even when the camera intrinsic parameters are unknown (e.g., images/videos from the web). Our method improves accuracy over a baseline method.
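As a concrete reading of the definition above, frontalness can be expressed as the angle between the plane's surface normal and the camera's optical axis, where 0° means perfectly frontal. The sketch below assumes the normals are already known in camera coordinates. The dissertation's method specifically avoids needing them (or a frontal reference image), so this only illustrates the quantity being ranked, not how it is estimated. The function names are hypothetical.

```python
import math


def frontalness_angle_deg(normal, optical_axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between a plane normal and the optical axis.

    The sign of the normal is ignored, so a plane facing the camera counts as
    frontal whichever way its normal is oriented.
    """
    dot = sum(n * a for n, a in zip(normal, optical_axis))
    nn = math.sqrt(sum(n * n for n in normal))
    na = math.sqrt(sum(a * a for a in optical_axis))
    c = min(1.0, abs(dot) / (nn * na))  # clamp against rounding above 1
    return math.degrees(math.acos(c))


def rank_by_frontalness(normals):
    """Frame indices ordered from most frontal to least frontal."""
    return sorted(range(len(normals)), key=lambda i: frontalness_angle_deg(normals[i]))
```

For example, normals `(1, 0, 1)`, `(0, 0, 1)` and `(0, 1, 0.2)` give angles of about 45°, 0° and 79°. The most frontal frame is therefore index 1.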
Lastly, we address the problem of integrating multiple deep neural networks (specifically CNNs) with different architectures and different tasks into a unified framework. To demonstrate the end-to-end integration of networks with different tasks and different architectures, we select event recognition and object detection. One of the novel aspects of our approach is that it is the first attempt to exploit the power of deep convolutional neural networks to directly integrate relevant object information into a unified network to improve event recognition performance. Our architecture shares the convolutional layers and a fully connected layer, which effectively integrates event recognition with rigid and non-rigid object detection.
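The idea of sharing layers between two tasks can be sketched with a toy, pure-Python forward pass. One shared dense layer stands in for the shared convolutional and fully connected layers. An object head predicts per-class object presence, and the event head sees both the shared features and the object scores. The network is trained with a joint event-plus-object loss. This is a hypothetical simplification: real object detection would also regress boxes, and the class name, layer sizes, and `obj_weight` are illustrative assumptions, not the dissertation's architecture.

```python
import math
import random


def dense(x, W, b, relu=True):
    """Fully connected layer: W is a list of rows, b a list of biases."""
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init(n_out, n_in, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


class SharedTwoHeadNet:
    """Hypothetical toy: shared trunk feeding an object head and an event head."""

    def __init__(self, n_in, n_hidden, n_events, n_objects, seed=0):
        rng = random.Random(seed)
        self.trunk = init(n_hidden, n_in, rng)
        self.object_head = init(n_objects, n_hidden, rng)
        # The event head also receives the object scores as extra inputs.
        self.event_head = init(n_events, n_hidden + n_objects, rng)

    def forward(self, x):
        h = dense(x, *self.trunk)  # shared representation used by both tasks
        obj_logits = dense(h, *self.object_head, relu=False)
        obj_prob = [1.0 / (1.0 + math.exp(-v)) for v in obj_logits]
        event_prob = softmax(dense(h + obj_prob, *self.event_head, relu=False))
        return event_prob, obj_prob


def joint_loss(event_prob, obj_prob, event_label, object_labels, obj_weight=1.0):
    """Cross-entropy for the event plus weighted binary cross-entropy for objects."""
    eps = 1e-7
    ce = -math.log(max(event_prob[event_label], eps))
    bce = 0.0
    for p, t in zip(obj_prob, object_labels):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return ce + obj_weight * bce
```

Because gradients from both losses flow into the shared trunk, object supervision shapes the features that the event classifier uses. This is the motivation the paragraph gives for integrating the two tasks.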
RECOGNITION OF FACES FROM SINGLE AND MULTI-VIEW VIDEOS
Face recognition has been an active research field for decades. In recent years, with videos playing an increasingly important role in our everyday lives, video-based face recognition has begun to attract considerable research interest, with a wide range of potential application areas, including TV/movie search and parsing, video surveillance, and access control. Preliminary research results in this field suggest that by exploiting the abundant spatio-temporal information contained in videos, we can greatly improve the accuracy and robustness of a visual recognition system. On the other hand, because this research area is still in its infancy, developing an end-to-end face processing pipeline that can robustly detect, track and recognize faces remains a challenging task. The goal of this dissertation is to study some of the related problems under different settings.
We address the video-based face association problem, in which one attempts to extract face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty handling this task, especially when challenging nuisance factors such as motion blur, low resolution or significant camera motion are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov Networks to infer labels for the detected faces. Unlike many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference and handling the false positives/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases.
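The core of online association can be sketched as follows: in each new frame, every detection is assigned to an existing track or to "new/none", which also covers false positives. Each pairing is scored by a weighted combination of feature potentials (here, appearance and position as a contextual cue), and labels must be unique across tracks. This is a simplified stand-in for the CRF and Max-Margin Markov Network formulation. The weights `w` would be learned (for example, with a max-margin objective), whereas here they are simply given. The feature layout, `new_track_score`, and exhaustive enumeration (feasible only for a handful of faces) are all hypothetical choices.

```python
from itertools import permutations


def pair_score(track, det, w):
    """Linear combination of feature potentials for assigning det to track."""
    app = -sum((a - b) ** 2 for a, b in zip(track["app"], det["app"]))
    pos = -((track["pos"][0] - det["pos"][0]) ** 2 + (track["pos"][1] - det["pos"][1]) ** 2)
    return w[0] * app + w[1] * pos


def associate(tracks, dets, w, new_track_score=-1.0):
    """Exact MAP labelling for one frame by enumeration.

    Returns a track index, or None (new track / false positive), per detection.
    Each existing track can be claimed by at most one detection.
    """
    options = list(range(len(tracks))) + [None] * len(dets)
    best, best_labels = float("-inf"), None
    for labels in set(permutations(options, len(dets))):
        score = sum(new_track_score if t is None else pair_score(tracks[t], d, w)
                    for t, d in zip(labels, dets))
        if score > best:
            best, best_labels = score, list(labels)
    return best_labels
```

Because only the current frame and the existing tracks are used, the decision is made online. Note that ties between equally scored labellings are broken arbitrarily in this sketch.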
We next propose a novel video-based face recognition framework. We address the problem from two different aspects. To handle pose variations, we learn a structural-SVM-based detector that simultaneously localizes the facial fiducial points and estimates the face pose. By adopting a different optimization criterion from existing algorithms, we are able to improve localization accuracy. To model other face variations, we use intra-personal/extra-personal dictionaries. Intra-personal/extra-personal modeling of human faces has been shown to work successfully in the Bayesian face recognition framework. It has additional advantages in scalability and generalization, which are of critical importance to real-world applications. Combining intra-personal/extra-personal models with dictionary learning enables us to achieve state-of-the-art performance on unconstrained video data, even when the training data come from a different database.
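The intra-personal/extra-personal idea can be illustrated with a small decision rule. The difference between two face feature vectors is reconstructed by an intra-personal dictionary (variations within one person) and by an extra-personal dictionary (variations between people), and the pair is declared the same person if the intra-personal dictionary explains the difference better. The sketch assumes each dictionary's atoms are orthonormal, so least-squares reconstruction reduces to projection. It does not show the dictionary learning or sparse coding itself. The atoms and function names are hypothetical.

```python
def residual_sq(x, atoms):
    """Squared residual of x after projecting onto orthonormal atoms (assumed orthonormal)."""
    energy = sum(v * v for v in x)
    captured = sum(sum(a * v for a, v in zip(atom, x)) ** 2 for atom in atoms)
    return energy - captured


def same_person(face_a, face_b, intra_atoms, extra_atoms):
    """True if the intra-personal dictionary better explains the difference vector."""
    diff = [a - b for a, b in zip(face_a, face_b)]
    return residual_sq(diff, intra_atoms) < residual_sq(diff, extra_atoms)
```

Because the decision depends only on difference vectors, the model does not need per-identity classes. This is one reason such modeling can scale to new subjects and transfer across databases, as the paragraph notes.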
Finally, we present an approach for video-based face recognition using camera networks. The focus is on handling pose variations by exploiting the strength of the multi-view camera network. However, rather than taking the typical approach of modeling these variations, which eventually requires explicit knowledge of pose parameters, we rely on a pose-robust feature that eliminates the need for pose estimation. The pose-robust feature is developed using Spherical Harmonic (SH) representation theory. It is extracted using the surface texture map of a spherical model that approximates the subject's head. Feature vectors extracted from a video are modeled as an ensemble of instances of a probability distribution in a Reproducing Kernel Hilbert Space (RKHS). The ensemble similarity measure in the RKHS improves both the robustness and the accuracy of the recognition system. The proposed approach outperforms traditional algorithms on a multi-view video database collected using a camera network.
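The abstract names an ensemble similarity in a reproducing kernel Hilbert space (RKHS) but not the specific measure. As a well-known, swapped-in example, the sketch below compares two sets of feature vectors by the squared maximum mean discrepancy (MMD), which is the distance between their kernel mean embeddings under a Gaussian RBF kernel. A probe video is then matched to the closest gallery ensemble. The SH features are assumed to be already extracted, and `gamma` and the gallery layout are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(X, Y, gamma):
    return sum(rbf(x, y, gamma) for x in X for y in Y) / (len(X) * len(Y))


def mmd2(X, Y, gamma=0.5):
    """Squared distance between RKHS mean embeddings of two ensembles (biased estimate)."""
    return mean_kernel(X, X, gamma) + mean_kernel(Y, Y, gamma) - 2.0 * mean_kernel(X, Y, gamma)


def identify(probe, gallery, gamma=0.5):
    """Return the gallery identity whose feature ensemble is closest to the probe's."""
    return min(gallery, key=lambda name: mmd2(probe, gallery[name], gamma))
```

Treating all frames as one ensemble, rather than matching frame by frame, lets a few poorly captured frames be outvoted by the rest of the video.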
NAVIGATING ‘NATIONAL FORM’ AND ‘SOCIALIST CONTENT’ IN THE GREAT LEADER’S HOMELAND: GEORGIAN PAINTING AND NATIONAL POLITICS UNDER STALIN, 1921-39
This thesis examines the interaction of Georgian painting and national politics in the first two decades of Soviet power in Georgia, 1921-1939, focussing in particular on the period following the consolidation of Stalin’s power at the helm of the Communist Party in 1926-7. In the Stalin era, Georgians enjoyed special status among Soviet nations thanks to Georgia’s prestige as the place of Stalin’s birth. However, Georgians’ advanced sense of their national sovereignty and initial hostility towards Bolshevik control following Georgia’s Sovietisation in 1921 also resulted in Georgia’s uniquely fraught relationship with Soviet power in Moscow in the decades that followed. In light of these circumstances, this thesis explores how and why the experience and activities of Georgian painters between 1926 and 1939 differed from those of other Soviet artists. One of its central arguments is that the experiences of Georgian artists and critics in this period not only differed significantly from those of artists and critics of other republics, but that the uniqueness of their experience was precipitated by a complex network of factors resulting from the interaction of various political imperatives and practical circumstances, including those relating to Soviet national politics.
Chapter one of this thesis introduces the key institutions and individuals involved in producing, evaluating and setting the direction of Georgian painting in the 1920s and early 1930s. Chapters two and three show that artists and critics in Georgia as well as commentators in Moscow in the 1920s and 30s were actively engaged in efforts to interpret the Party’s demand for ‘national form’ in Soviet culture and to suggest what that form might entail as regards Georgian painting. However, contradictions inherent in Soviet nationalities policy, which both demanded the active cultivation of cultural difference between Soviet nationalities and eagerly anticipated a time when national distinctions in all spheres would naturally disappear, made it impossible for an appropriate interpretation of ‘national form’ to be identified. Chapter three, moreover, demonstrates how frequent shifts in Soviet cultural and nationalities policies presented Moscow institutions with a range of practical challenges which ultimately prevented them from reflecting in their exhibitions and publications the contemporary artistic activity taking place in the republics of the Caucasus and Central Asia.
A key finding of chapters four and five concerns the uniquely significant role that Lavrenty Beria, Stalin’s ruthless deputy and the head of the Georgian and Transcaucasian Party organisations, played in differentiating Georgian painters’ experiences from those of Soviet artists of other nationalities. Beginning in 1934, Beria employed Georgian painters to produce an exhibition of monumental paintings, opening at the Tretyakov Gallery in Moscow in 1937, depicting episodes from his own falsified history of Stalin’s role in the revolutionary movement in Transcaucasia. As this thesis shows, the production of the exhibition introduced an unprecedented degree of direct Party supervision over Georgian painting as Beria personally critiqued works by Georgian painters produced on prescribed narrative subjects in a centralised collective studio. As well as representing a major contribution to Stalin’s personality cult, the exhibition, which conferred on Georgian painters special responsibility for representing Stalin and his activities, was also a public statement of the special status that the Georgians were now to enjoy, second only to that of the Russians. However, this special status involved both special privileges and special responsibilities. Georgians would enjoy special access to opportunities in Moscow and a special degree of autonomy in local governance, but in return they were required to lead the way in declaring allegiance to the Stalin regime.
Chapter six returns to the debate about ‘national form’ in Georgian painting by examining how the pre-Revolutionary self-taught Georgian painter, Niko Pirosmani, was discussed by cultural commentators in Georgia and Moscow in the 1920s and 30s as a source informing a Soviet or Soviet Georgian canon of painting. It shows that, in addition to presenting views on the suitability of Pirosmani’s painting in terms of either its formal or its class content, commentators perpetuated and developed a cult of Pirosmani steeped in stereotypes of a Georgian ‘national character.’ Further, the establishment of this cult during the late 1920s and early 1930s seems to have been a primary reason for the painter’s subsequent canonisation in the second half of the 1930s as a ‘Great Tradition’ of Soviet Georgian culture. It helped to articulate a version of Georgian national identity that was at once familiar and gratifying for Georgians and useful for the Soviet regime. The combined impression of cultural sovereignty embodied in this and other ‘Great Traditions’ of Soviet Georgian culture and the special status articulated through the 1937 exhibition allowed Georgian nationalism to be aligned, for a time, with support for Stalin and the Soviet regime.
Lander Doctoral Studentship in Art History at Pembroke College