    Text Localization in Video Using Multiscale Weber's Local Descriptor

    In this paper, we propose a novel approach for detecting text in videos and scene images based on the Multiscale Weber's Local Descriptor (MWLD). Given an input video, the shots are identified and the key frames are extracted based on their spatio-temporal relationship. From each key frame, we detect local region information using WLD with different radii and neighborhood relationships of pixel values, and hence obtain intensity-enhanced key frames at multiple scales. These multiscale WLD key frames are merged, and the horizontal gradients are then computed using morphological operations. The results are binarized, and false positives are eliminated based on geometrical properties. Finally, we employ connected component analysis and a morphological dilation operation to determine the text regions, which aids text localization. The experimental results obtained on the publicly available standard Hua, Horizontal-1 and Horizontal-2 video datasets illustrate that the proposed method can accurately detect and localize texts of various sizes, fonts and colors in videos. Comment: IEEE SPICES, 201
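The multiscale WLD step described in this abstract can be illustrated with a minimal sketch. It computes the standard WLD differential excitation, arctan of the summed neighbour-minus-centre differences divided by the centre intensity, on square rings of increasing radius. The abstract does not state the merge rule, so averaging the absolute excitation maps is an assumption here, and the function names `differential_excitation` and `multiscale_wld` are hypothetical.

```python
import math


def differential_excitation(img, radius=1, eps=1e-6):
    """WLD differential excitation using neighbours on the square ring at `radius`.

    `img` is a list of rows of non-negative grey levels. Pixels closer than
    `radius` to the border are left at 0.0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            centre = float(img[y][x])
            diff_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Keep only pixels on the ring (Chebyshev distance == radius).
                    if max(abs(dy), abs(dx)) == radius:
                        diff_sum += img[y + dy][x + dx] - centre
            # eps avoids division by zero for black pixels.
            out[y][x] = math.atan(diff_sum / (centre + eps))
    return out


def multiscale_wld(img, radii=(1, 2, 3)):
    """Merge excitation maps from several radii.

    Assumption: the merge is the mean of absolute excitations; the abstract
    does not specify how the scales are combined.
    """
    maps = [differential_excitation(img, r) for r in radii]
    h, w = len(img), len(img[0])
    return [
        [sum(abs(m[y][x]) for m in maps) / len(maps) for x in range(w)]
        for y in range(h)
    ]
```

A flat region yields zero excitation at every scale, while an isolated bright or dark pixel produces a strong response, which is why the merged map emphasises stroke-like structures such as text.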

    Human face recognition under degraded conditions

    This work presents comparative studies of state-of-the-art feature extraction and classification techniques for human face recognition under low resolution. The effect of applying resolution enhancement using interpolation techniques is also evaluated. A gradient-based, illumination-insensitive preprocessing technique is proposed that uses the ratio between the gradient magnitude and the current intensity level of the image, which is insensitive to severe lighting effects. In addition, a combination of multi-scale Weber analysis and enhanced DD-DT-CWT is shown to be noticeably stable under illumination variation. Moreover, applying illumination-insensitive image descriptors to the preprocessed image leads to further robustness against lighting effects. The proposed block-based face analysis decreases the effect of occlusion by assigning different weights to the image subblocks, according to their discrimination power, in score-level or decision-level fusion. In addition, a hierarchical structure of global and block-based techniques is proposed to improve recognition accuracy when different image degradation conditions occur. The complementary performance of global and local techniques leads to considerable improvement in face recognition accuracy. The effectiveness of the proposed algorithms is evaluated on the Extended Yale B, AR, CMU Multi-PIE, LFW, FERET and FRGC databases, with a large number of images under different degradation conditions. The experimental results show improved performance under poor illumination, facial expression and occlusion.
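The gradient-to-intensity ratio mentioned in this abstract can be sketched as follows. The abstract does not name the gradient operator, so central differences are an assumption, and `gradient_ratio` is a hypothetical name. The point of the sketch is that a multiplicative lighting change scales the gradient and the intensity by the same factor, so their ratio stays (almost exactly) the same.

```python
import math


def gradient_ratio(img, eps=1e-6):
    """Ratio of gradient magnitude to local intensity for each interior pixel.

    `img` is a list of rows of non-negative grey levels. Border pixels are
    left at 0.0. Central differences are an assumed choice of gradient.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            # eps avoids division by zero for black pixels.
            out[y][x] = math.hypot(gx, gy) / (img[y][x] + eps)
    return out
```

For example, multiplying every pixel of an image by 3 leaves the output essentially unchanged, up to the small effect of `eps`.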

    Content-based retrieval adapted to video surveillance

    Video surveillance systems are ubiquitous in busy public places, and their presence in private places keeps growing. While an airport or a train station can afford to employ a surveillance team to monitor video streams in real time, a private individual is unlikely to incur such an expense for a home surveillance system. Moreover, using surveillance video for forensic analysis often requires an after-the-fact analysis of the observed events. The recording history often spans several days, or even weeks, of video. If the moment at which an event of interest occurred is unknown, a video search tool is essential. The goal of such a tool is to identify the video segments whose content matches an approximate description of the event (or object) being sought. This thesis presents a data structure for indexing the content of long surveillance videos, together with a content-based search algorithm built on that structure. Given a description of an object based on attributes such as its size, its color and the direction of its motion, the system identifies in real time the video segments containing objects that match this description. We have shown empirically that our system works in several use cases, such as counting moving objects, recognizing trajectories, detecting abandoned objects and detecting parked vehicles. This thesis also includes a section on image quality assessment. The presented method makes it possible to determine qualitatively the type and amount of distortion applied to the image by an acquisition system. This technique can be used to estimate the parameters of the acquisition system in order to correct the images, or to assist in the development of new acquisition systems.
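The abstract does not describe the actual index structure, so the following is only a toy illustration of attribute-based retrieval over video segments, not the thesis's method. It assumes each detected object has already been reduced to quantized attributes (size, colour, direction); the names `build_index` and `search` and the attribute values are hypothetical.

```python
from collections import defaultdict


def build_index(observations):
    """Build an inverted index from attribute values to object postings.

    `observations` is an iterable of
    (segment_id, object_id, size, colour, direction) tuples.
    Each posting is a (segment_id, object_id) pair, so that a query only
    matches when a single object carries all requested attributes.
    """
    index = defaultdict(set)
    for segment_id, object_id, size, colour, direction in observations:
        posting = (segment_id, object_id)
        index[("size", size)].add(posting)
        index[("colour", colour)].add(posting)
        index[("direction", direction)].add(posting)
    return index


def search(index, **attributes):
    """Return sorted segment ids containing an object matching every attribute."""
    postings = [index.get(item, set()) for item in attributes.items()]
    if not postings:
        return []
    matches = set.intersection(*postings)
    return sorted({segment_id for segment_id, _ in matches})
```

A query such as `search(index, size="large", colour="red")` then costs a few set intersections rather than a scan of the video, which is the property that makes interactive search over long recordings plausible.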

    Visual scene recognition with biologically relevant generative models

    This research focuses on developing visual object categorization methodologies that are based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability in visual patterns, in the space of features extracted from them by an appropriate low-level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we have examined in detail two recent probabilistic models of vision: a simple multivariate Gaussian model as suggested by (Karklin & Lewicki, 2009) and a restricted Boltzmann machine (RBM) proposed by (Hinton, 2002). Both models have been widely used for visual object classification and scene analysis tasks before. This research highlights that these models are not, on their own, plausible enough to perform the classification task, and suggests the Fisher kernel as a means of inducing discrimination into these models for classification power. Our empirical results on standard benchmark data sets reveal that the classification performance of these generative models can be boosted significantly, to near state-of-the-art performance, by drawing a Fisher kernel from compact generative models that computes the data labels in a fraction of the total computation time. We compare the proposed technique with other distance-based and kernel-based classifiers to show how computationally efficient the Fisher kernels are. To the best of our knowledge, a Fisher kernel has not been drawn from the RBM before, so the work presented in the thesis is novel in terms of its idea and its application to vision problems.
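To make the Fisher kernel idea concrete, here is a minimal sketch for a much simpler model than those studied in the thesis: a Gaussian with diagonal covariance, differentiated with respect to its mean only. The Fisher score is the gradient of the log-likelihood, (x - mu) / var per dimension, and the kernel is the inner product of two scores weighted by the inverse Fisher information, which for this model is diag(var). The function names are hypothetical, and this is not the thesis's implementation for the Karklin & Lewicki model or the RBM.

```python
def fisher_score_mean(x, mu, var):
    """Gradient of log N(x | mu, diag(var)) with respect to mu."""
    return [(xi - mi) / vi for xi, mi, vi in zip(x, mu, var)]


def fisher_kernel(x, y, mu, var):
    """Fisher kernel K(x, y) = U_x^T F^{-1} U_y for the mean parameters.

    For a diagonal Gaussian, the Fisher information of the mean is
    diag(1 / var), so its inverse is diag(var).
    """
    ux = fisher_score_mean(x, mu, var)
    uy = fisher_score_mean(y, mu, var)
    return sum(a * v * b for a, v, b in zip(ux, var, uy))
```

With `mu = [0, 0]`, `var = [1, 4]`, `x = [1, 2]` and `y = [3, 4]`, the scores are `[1, 0.5]` and `[3, 1]`, and the kernel value is 1·1·3 + 0.5·4·1 = 5. The resulting kernel matrix can be passed to any kernel classifier, which is how a generative model acquires discriminative power.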

    Pre-processing, classification and semantic querying of large-scale Earth observation spaceborne/airborne/terrestrial image databases: Process and product innovations.

    As defined by Wikipedia, “big data is the term adopted for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The big data challenges typically include capture, curation, storage, search, sharing, transfer, analysis and visualization”. Proposed by the intergovernmental Group on Earth Observations (GEO), the visionary goal of the Global Earth Observation System of Systems (GEOSS) implementation plan for years 2005-2015 is the systematic transformation of multi-source Earth Observation (EO) “big data” into timely, comprehensive and operational EO value-adding products and services, submitted to the GEO Quality Assurance Framework for Earth Observation (QA4EO) calibration/validation (Cal/Val) requirements. To date, the GEOSS mission cannot be considered fulfilled by the remote sensing (RS) community. This is tantamount to saying that past and existing EO image understanding systems (EO-IUSs) have been outpaced by the rate of collection of EO sensory big data, whose quality and quantity are ever-increasing. This fact is supported by several observations. For example, no European Space Agency (ESA) EO Level 2 product has ever been systematically generated at the ground segment. By definition, an ESA EO Level 2 product comprises a single-date multi-spectral (MS) image radiometrically calibrated into surface reflectance (SURF) values corrected for geometric, atmospheric, adjacency and topographic effects, stacked with its data-derived scene classification map (SCM), whose thematic legend is general-purpose, user- and application-independent and includes quality layers, such as cloud and cloud-shadow. Since no GEOSS exists to date, present EO content-based image retrieval (CBIR) systems lack EO image understanding capabilities. 
Hence, no semantic CBIR (SCBIR) system exists to date either, where semantic querying is a synonym for semantics-enabled knowledge/information discovery in multi-source big image databases. In set theory, if set A is a strict superset of (or strictly includes) set B, then A ⊃ B. This doctoral project moved from the working hypothesis that SCBIR ⊃ computer vision (CV), where vision is a synonym for scene-from-image reconstruction and understanding, ⊃ EO image understanding (EO-IU) in operating mode, a synonym for GEOSS, ⊃ ESA EO Level 2 product ⊃ human vision. Meaning that CV in operating mode is a necessary but not sufficient pre-condition for SCBIR, this working hypothesis has two corollaries. First, human visual perception, encompassing well-known visual illusions such as the Mach bands illusion, acts as a lower bound of CV within the multi-disciplinary domain of cognitive science, i.e., CV is conditioned to include a computational model of human vision. Second, a necessary but not sufficient pre-condition for a yet-unfulfilled GEOSS development is the systematic generation at the ground segment of the ESA EO Level 2 product. Starting from this working hypothesis, the overarching goal of this doctoral project was to contribute in research and technical development (R&D) toward filling an analytic and pragmatic information gap from EO big sensory data to EO value-adding information products and services. This R&D objective was conceived to be twofold. The first objective was to develop an original EO-IUS in operating mode, a synonym for GEOSS, capable of systematic ESA EO Level 2 product generation from multi-source EO imagery. 
EO imaging sources vary in terms of: (i) platform, either spaceborne, airborne or terrestrial, (ii) imaging sensor, either: (a) optical, encompassing radiometrically calibrated or uncalibrated images, panchromatic or color images, either true- or false-color red-green-blue (RGB), multi-spectral (MS), super-spectral (SS) or hyper-spectral (HS) images, featuring spatial resolution from low (> 1km) to very high (< 1m), or (b) synthetic aperture radar (SAR), specifically, bi-temporal RGB SAR imagery. The second R&D objective was to design and develop a prototypical implementation of an integrated closed-loop EO-IU for semantic querying (EO-IU4SQ) system as a GEOSS proof-of-concept in support of SCBIR. The proposed closed-loop EO-IU4SQ system prototype consists of two subsystems for incremental learning. A primary (dominant, necessary but not sufficient) hybrid (combined deductive/top-down/physical model-based and inductive/bottom-up/statistical model-based) feedback EO-IU subsystem in operating mode requires no human-machine interaction to automatically transform, in linear time, a single-date MS image into an ESA EO Level 2 product as initial condition. A secondary (dependent) hybrid feedback EO Semantic Querying (EO-SQ) subsystem is provided with a graphical user interface (GUI) to streamline human-machine interaction in support of spatiotemporal EO big data analytics and SCBIR operations. EO information products generated as output by the closed-loop EO-IU4SQ system monotonically increase in added value with closed-loop iterations.

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and made available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization and pattern recognition. Together, these make a considerable contribution to the analysis of ‘Big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing and the confirmation of conclusions drawn from the data.