135 research outputs found

    Special issue on multimodal data fusion for multidimensional signal processing


    Deep background subtraction of thermal and visible imagery for pedestrian detection in videos

    In this paper, we introduce an efficient framework to subtract the background from both visible and thermal imagery for pedestrian detection in urban scenes. We use a deep neural network (DNN) to train the background subtraction model. To train the DNN, we first generate an initial background map and then use a randomly selected 5% of the video frames, the background map, and manually segmented ground truth. We then apply cognition-based post-processing to further smooth the foreground detection result. We evaluate our method against our previous work and 11 recent, widely cited methods on three challenging video sequences selected from the publicly available color-thermal benchmark dataset OTCBVS. The results show that the proposed DNN-based approach can detect pedestrians with well-preserved shape in most scenes, regardless of illumination changes and occlusion.

    Joint kernelized sparse representation classification for hyperspectral imagery

    In recent years, hyperspectral image (HSI) classification has received much attention due to its importance in military applications, food quality assessment [1], and land cover analysis [2-5]. Multiple classifiers have been adopted to label HSI pixels, including the support vector machine (SVM), random forest (RF), and, more recently, deep learning methods. Because HSI pixels belonging to the same class usually lie in a low-dimensional subspace, those pixels can be represented by training samples from the same class. On this basis, Sparse Representation Classification (SRC) methods have also been introduced to HSI. An unlabeled pixel can be sparsely represented by a few atoms from the constructed training dictionary. With the recovered sparse coefficients, the class label is determined by the residual between the test pixel and its approximation. As SRC has developed in HSI [5, 6], one severe problem has emerged in the classification process: the high dimensionality of HSI data may give rise to the Hughes phenomenon. Sufficient training samples are required to overcome the curse of dimensionality, but sufficient training data are not always available in real applications. For example, ground truth labelling for remote sensing data is rather inconvenient. To address this problem, we combine multiple types of features extracted from HSI data and apply a joint kernelized SRC to those extracted features. The aim of our work is to improve the performance of SRC with fewer training samples.
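The residual-based decision rule described in this abstract can be sketched as follows. This is a minimal illustration of plain (linear, single-feature) SRC, not the joint kernelized method the paper proposes. The sparse solver is a small orthogonal matching pursuit written here for the example, and the dictionary `D`, the `labels`, and the test pixel `y` are made-up toy values.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D."""
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit y on all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def src_classify(D, labels, y, k=2):
    """Assign y to the class whose atoms give the smallest reconstruction residual."""
    coef = omp(D, y, k)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y using only the coefficients that belong to class c.
        residuals[int(c)] = float(np.linalg.norm(y - D[:, mask] @ coef[mask]))
    return min(residuals, key=residuals.get), residuals


# Toy dictionary (assumption): 3 bands, two training pixels per class.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
D = D / np.linalg.norm(D, axis=0)  # unit-norm atoms
labels = np.array([0, 0, 1, 1])
y = np.array([0.95, 0.05, 0.0])    # test pixel close to class 0

predicted, residuals = src_classify(D, labels, y, k=2)
print(predicted, residuals)
```

A kernelized or multi-feature variant would replace the inner products and the residual computation; that part is left out because the abstract does not specify it.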

    Aggregating Local Features into Bundles for High-Precision Object Retrieval

    Due to the omnipresence of digital cameras and mobile phones, the number of images stored in image databases has grown tremendously in recent years. New data management and retrieval techniques are needed to deal with increasingly large image databases. This thesis presents new techniques for content-based image retrieval, where the image content itself is used to retrieve images by visual similarity from databases. We focus on the query-by-example scenario, assuming the image itself is provided as the query to the retrieval engine. In many image databases, images are associated with metadata, which may be exploited to improve retrieval performance. In this work, we present a technique that fuses cues from the visual domain and textual annotations into a single compact representation. This combined multimodal representation performs significantly better than the underlying unimodal representations, which we demonstrate on two large-scale image databases consisting of up to 10 million images. The main focus of this work is on feature bundling for object retrieval and logo recognition. We present two novel feature bundling techniques that aggregate multiple local features into a single visual description. In contrast to many other works, both approaches encode geometric information about the spatial layout of local features into the corresponding visual description itself. These descriptions are therefore highly distinctive and suitable for high-precision object retrieval. We demonstrate the use of both bundling techniques for logo recognition. Here, recognition is performed by retrieving visually similar images from a database of reference images, making the recognition systems easily scalable to a large number of classes. The results show that our retrieval-based methods can successfully identify small objects such as logos with an extremely low false positive rate.
In particular, our feature bundling techniques are beneficial because false positives are effectively avoided upfront due to the highly distinctive descriptions. We further demonstrate and thoroughly evaluate the use of our bundling technique based on min-Hashing for image and object retrieval. Compared to approaches based on conventional bag-of-words retrieval, it is much more efficient: the retrieved result lists are shorter and cleaner while recall remains at the same level. The results suggest that this bundling scheme may act as a pre-filtering step in a wide range of scenarios and underline the high effectiveness of this approach. Finally, we present a new variant for extremely fast re-ranking of retrieval results, which ranks the retrieved images according to the spatial consistency of their local features with those of the query image. The demonstrated method is robust to outliers, performs better than existing methods, and can process several hundred to several thousand images per second on a single thread.
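As background for the min-Hashing mentioned above, here is a minimal sketch of generic min-hash signatures over sets of visual-word IDs. It is not the thesis's bundling scheme, which additionally encodes spatial layout. The function names, the hash family `(a * w + b) % PRIME`, and the example word sets are assumptions made for illustration.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; word IDs are assumed to be smaller than this


def make_hash_funcs(n, seed=0):
    """Draw n random linear hash functions h(w) = (a * w + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(words, funcs):
    """One entry per hash function: the smallest hash value over the set."""
    return [min((a * w + b) % PRIME for w in words) for a, b in funcs]


def estimated_similarity(sig1, sig2):
    """Fraction of matching entries, an estimate of the sets' Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


funcs = make_hash_funcs(256)

# Hypothetical visual-word sets for three images.
query = {3, 17, 42, 99}
near_duplicate = {3, 17, 42, 100}   # shares 3 of 5 distinct words with query
unrelated = {5, 6, 7, 8}            # shares no words with query

sig_query = minhash_signature(query, funcs)
sim_same = estimated_similarity(sig_query, minhash_signature(query, funcs))
sim_near = estimated_similarity(sig_query, minhash_signature(near_duplicate, funcs))
sim_far = estimated_similarity(sig_query, minhash_signature(unrelated, funcs))
print(sim_same, sim_near, sim_far)
```

Because two signatures agree only where the sets share their minimum-hash element, images with no words in common never collide, which is one reason min-hash lookups tend to return short candidate lists.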

    Discovering a Domain Knowledge Representation for Image Grouping: Multimodal Data Modeling, Fusion, and Interactive Learning

    In visually-oriented specialized medical domains such as dermatology and radiology, physicians explore interesting image cases from medical image repositories for comparative case studies to aid clinical diagnoses, educate medical trainees, and support medical research. However, general image classification and retrieval approaches fail to group medical images from the physicians' viewpoint. This is because, in the absence of expert knowledge, fully automated learning techniques cannot yet bridge the gap between image features and domain-specific content. Understanding how experts get information from medical images is therefore an important research topic. As a prior study, we conducted data elicitation experiments in which physicians were instructed to inspect each medical image towards a diagnosis while describing the image content to a student seated nearby. Experts' eye movements and their verbal descriptions of the image content were recorded to capture various aspects of expert image understanding. This dissertation aims at an intuitive approach to extracting expert knowledge, which is to find patterns in expert data elicited from image-based diagnoses. These patterns are useful for understanding both the characteristics of the medical images and the experts' cognitive reasoning processes. The transformation from the viewed raw image features to their interpretation as domain-specific concepts requires experts' domain knowledge and cognitive reasoning. This dissertation also approximates this transformation using a matrix factorization-based framework, which helps project multiple expert-derived data modalities to high-level abstractions. To combine additional expert interventions with computational processing capabilities, an interactive machine learning paradigm is developed that treats experts as an integral part of the learning process.
Specifically, experts locally refine the medical image groups presented by the learned model, so that the model is incrementally re-learned globally. This paradigm avoids onerous expert annotation for model training while aligning the learned model with experts' sense-making.

    Weakly supervised conditional random fields model for semantic segmentation with image patches

    Image semantic segmentation (ISS) segments an image into regions, each labeled with a semantic category. Most existing ISS methods are based on fully supervised learning, which requires pixel-level labeling to train the model. As a result, labeling is often very time-consuming and labor-intensive, yet still subject to manual errors and subjective inconsistency. To tackle such difficulties, a weakly supervised ISS approach is proposed that addresses the challenging problem of inferring pixel-level labels from image-level labels, using image patches and conditional random fields (CRF). An improved simple linear iterative clustering (SLIC) algorithm is employed to extract superpixels for image segmentation. Specifically, it generates a different number of superpixels for each image, which guides the extraction of image patches based on the image-level label information. Based on the extracted image patches, a CRF model is constructed to infer semantic class labels, using the potential energy function to map from image-level to pixel-level labels. Finally, the patch-based CRF (PBCRF) model is used to accomplish weakly supervised ISS. Experiments conducted on two publicly available benchmark datasets, MSRC and PASCAL VOC 2012, demonstrate that the proposed algorithm yields very promising results compared to several state-of-the-art ISS methods, including some deep learning-based models.

    Contributions to the Content-Based Image Retrieval Using Pictorial Queries

    Widespread access to digital cameras, personal computers, and the Internet has led to the creation of large volumes of data in digital format. In this context, tools designed to organize information and make it easier to search are becoming increasingly important. Images are a particular kind of data that require specific description and indexing techniques. The area of computer vision that studies these techniques is called Content-Based Image Retrieval (CBIR). CBIR systems do not use text-based descriptions; they rely on features extracted from the images themselves. In contrast to the more than 6000 languages spoken in the world, descriptions based on visual features represent a universal means of expression. Intense research on CBIR systems has been applied in very diverse areas of knowledge. CBIR applications have thus been developed for medicine, intellectual property protection, journalism, graphic design, Internet information search, the preservation of cultural heritage, and so on. One of the important aspects of a CBIR application lies in the design of the user's functions. The user is responsible for formulating the queries from which the image search is performed. We have focused on systems in which the query is formulated from a pictorial representation. We have proposed a taxonomy of query systems composed of four different paradigms: Query-by-Selection, Query-by-Iconic-Composition, Query-by-Sketch, and Query-by-Illustration. Each paradigm offers the user a different level of expressive potential. From the simple selection of an image to the creation of a color illustration, it is the user who takes control of the system's input data.
Throughout the chapters of this thesis we have analyzed the influence that each query paradigm has on the internal processes of a CBIR system. In this way we have also proposed a set of contributions, which we have exemplified from a practical point of view through a final application.

    Contributions to the content-based image retrieval using pictorial queries

    Resource description: 2 November 2010. Widespread access to digital cameras, personal computers, and the Internet has led to the creation of large volumes of data in digital format. In this context, tools designed to organize information and make it easier to search are becoming increasingly important. Images are a particular kind of data that require specific description and indexing techniques. The area of computer vision that studies these techniques is called Content-Based Image Retrieval (CBIR). CBIR systems do not use text-based descriptions; they rely on features extracted from the images themselves. In contrast to the more than 6000 languages spoken in the world, descriptions based on visual features represent a universal means of expression. Intense research on CBIR systems has been applied in very diverse areas of knowledge. CBIR applications have thus been developed for medicine, intellectual property protection, journalism, graphic design, Internet information search, the preservation of cultural heritage, and so on. One of the important aspects of a CBIR application lies in the design of the user's functions. The user is responsible for formulating the queries from which the image search is performed. We have focused on systems in which the query is formulated from a pictorial representation. We have proposed a taxonomy of query systems composed of four different paradigms: Query-by-Selection, Query-by-Iconic-Composition, Query-by-Sketch, and Query-by-Illustration. Each paradigm offers the user a different level of expressive potential.
From the simple selection of an image to the creation of a color illustration, it is the user who takes control of the system's input data. Throughout the chapters of this thesis we have analyzed the influence that each query paradigm has on the internal processes of a CBIR system. In this way we have also proposed a set of contributions, which we have exemplified from a practical point of view through a final application.

    Study and Implementation of Watermarking Algorithms

    Watermarking is the process of embedding data, called a watermark, into a multimedia object such that the watermark can be detected or extracted later to make an assertion about the object. The object may be audio, an image, or video. A copy of a digital image is identical to the original. This has, in many instances, led to the use of digital content with malicious intent. One way to protect multimedia data against illegal recording and retransmission is to embed a signal, called a digital signature, copyright label, or watermark, that authenticates the owner of the data. Data hiding, that is, schemes to embed secondary data in digital media, has made considerable progress in recent years and attracted attention from both academia and industry. Techniques have been proposed for a variety of applications, including ownership protection, authentication, and access control. Imperceptibility, robustness against moderate processing such as compression, and the ability to hide many bits are the basic but rat..
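To make the embed-then-extract idea concrete, here is a minimal sketch using least-significant-bit (LSB) substitution on 8-bit grayscale pixel values. LSB substitution is chosen here only because it is simple; the abstract does not say which algorithms the study implements. The function names and sample values are hypothetical.

```python
def embed_lsb(pixels, bits):
    """Write each watermark bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the cover signal")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(pixels, n_bits):
    """Read the watermark back from the lowest bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical 8-bit grayscale pixel values and watermark bits.
cover = [120, 121, 200, 37, 255, 0, 64, 90]
watermark = [1, 0, 1, 1, 0, 1]

marked = embed_lsb(cover, watermark)
recovered = extract_lsb(marked, len(watermark))
max_change = max(abs(a - b) for a, b in zip(cover, marked))
print(marked, recovered, max_change)
```

Each pixel changes by at most one gray level, which illustrates imperceptibility, but a scheme this simple does not survive lossy compression, so it does not meet the robustness requirement mentioned in the abstract.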