6 research outputs found

    Creating Color Image Features Using Local Contrast Method

    Digital color images are now one of the most popular data types in digital processing. Color image recognition plays an important role in many vital applications, which makes improving image recognition and retrieval systems an important issue. An image can be recognized or retrieved directly from its color pixels, but the large size of color images means that doing so requires correspondingly more time and memory. In the current study, image local contrast was used to create a local contrast vector, which was then used as a key to recognize or retrieve the image. The proposed local contrast method was implemented and tested. The results obtained showed that it is efficient compared with other methods.

    An improved discrete cosine transformation block based scheme for copy-move image forgery detection

    Copy-move forgery is a common method of manipulating images. Several cases of image forgery have been discovered in which a region is duplicated and pasted onto another region of the same image in order to achieve selfish gain. Generally, copy-move forgery detection techniques fall into two classes: block-based and key-point-based. The block-based approach is the most widely used and divides the image into blocks during pre-processing, before features are extracted, whereas the key-point-based approach skips the division into blocks and extracts local features directly from the image. In this paper, we review various block-based and key-point-based approaches proposed by other researchers. A recurring problem is balancing improved detection accuracy against minimal computational complexity. The proposed technique, an improved DCT-based copy-move image forgery detection scheme (IDB-CFD), uses an octagonal block to reduce the number of features for matching, thereby improving detection accuracy while keeping complexity minimal. This work is compared with a previously proposed robust detection algorithm for copy-move image forgery (RDA-CF), which uses a circular block to reduce the number of features. The results show that the previous work represents about 79% of the quantized DCT coefficients in each image block, whereas the proposed work represents about 85%; the IDB-CFD technique therefore recovers about 6% more features than RDA-CF.
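
    The block-based idea summarized above can be illustrated with a minimal sketch. It is not the IDB-CFD scheme itself: it assumes square (not octagonal) overlapping blocks, keeps only a few low-frequency DCT coefficients, and groups blocks with identical quantized features instead of using any particular matching strategy. The function names and the parameters `size`, `quant` and `keep` are hypothetical.

```python
import math
from collections import defaultdict


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(n)]
    return [[scale[u] * scale[v] * sum(block[x][y] * cos[u][x] * cos[v][y]
                                        for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def find_copied_blocks(image, size=4, quant=4.0, keep=2):
    """Return pairs of block positions whose quantized DCT features coincide.

    Assumptions (not from the abstract): square blocks of side `size`,
    quantization step `quant`, and only the keep x keep lowest-frequency
    coefficients are used as the feature.
    """
    groups = defaultdict(list)
    for r in range(len(image) - size + 1):
        for c in range(len(image[0]) - size + 1):
            block = [row[c:c + size] for row in image[r:r + size]]
            coeffs = dct2(block)
            key = tuple(round(coeffs[u][v] / quant)
                        for u in range(keep) for v in range(keep))
            groups[key].append((r, c))
    pairs = []
    for positions in groups.values():
        for i, p in enumerate(positions):
            for q in positions[i + 1:]:
                # Ignore overlapping blocks, which are trivially similar.
                if max(abs(p[0] - q[0]), abs(p[1] - q[1])) >= size:
                    pairs.append((p, q))
    return pairs
```

    Because only a few coarse coefficients are compared, unrelated blocks can also collide; a practical detector would add a verification step, for example requiring many pairs to share the same shift vector.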

    Enhanced Block-Based Copy-Move Image Forgery Detection Using K-Means Clustering Technique

    In this thesis, the effect of feature type and matching method is analyzed by comparing different combinations of matching method and feature type for copy-move image forgery detection. The results showed an interaction between some of the features and some of the matching methods. Because of the importance of the matching process, this thesis focuses on improving it by proposing an enhanced block-based copy-move forgery detection pipeline. The proposed pipeline groups the image blocks into clusters and then matches the blocks within each cluster independently, which reduces the time required for matching and also increases the true positive ratio (TPR). To deploy the proposed pipeline, two combinations of matching method and feature type are considered. In the first case, Zernike Moments (ZMs) were combined with Locality Sensitive Hashing (LSH) and tested on three datasets. The experimental results showed that the proposed pipeline reduced the processing time by 73.05% to 84.70% and enhanced the accuracy of detection by 5.56% to 25.43%. In the second case, Polar Cosine Transform (PCT) was combined with Lexicographical Sort (LS). Although the proposed pipeline could not reduce the processing time, it enhanced the accuracy of detection by 32.46%. The results were statistically analyzed, and the comparison with two other methods showed that the proposed pipeline significantly enhances detection accuracy.
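
    The cluster-then-match idea described above can be sketched as follows. This is only an illustration under stated assumptions: the features are arbitrary numeric tuples rather than Zernike Moments or PCT coefficients, matching inside a cluster is a brute-force Euclidean threshold rather than LSH or lexicographical sort, and the k-means initialization (first `k` points) is an arbitrary choice. All names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def match_within_clusters(features, k, threshold):
    """Compare block features only against others in the same cluster."""
    labels = kmeans(features, k)
    pairs = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                if sq_dist(features[idx[a]], features[idx[b]]) <= threshold ** 2:
                    pairs.append((idx[a], idx[b]))
    return pairs
```

    Restricting comparisons to one cluster at a time reduces the number of candidate pairs, which is the source of the time saving the abstract reports; the trade-off is that two similar blocks assigned to different clusters are never compared.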

    Identification, synchronisation and composition of user-generated videos

    Joint supervision (cotutelle): Universitat Politècnica de Catalunya and Queen Mary University of London. The increasing availability of smartphones makes it easier for people to capture videos of their experiences when attending events such as concerts, sports competitions and public rallies. Smartphones are equipped with inertial sensors, which could be beneficial for event understanding. The captured User-Generated Videos (UGVs) are made available on media sharing websites. Searching and mining UGVs of the same event is challenging because of inconsistent tags or incorrect timestamps. A UGV recorded from a fixed location contains monotonic content and unintentional camera motions, which may make it less interesting to play back. In this thesis, we propose the following identification, synchronisation and video composition frameworks for UGVs. We propose a framework for the automatic identification and synchronisation of unedited multi-camera UGVs within a database. The proposed framework analyses the sound to match and cluster UGVs that capture the same spatio-temporal event, and estimates their relative time-shift to temporally align them. We design a novel descriptor derived from the pairwise matching of audio chroma features of UGVs. The descriptor facilitates the definition of a classification threshold for automatic query-by-example event identification. We contribute a database of 263 multi-camera UGVs of 48 real-world events. We evaluate the proposed framework on this database and compare it with state-of-the-art methods. Experimental results show the effectiveness of the proposed approach in the presence of audio degradations (channel noise, ambient noise, reverberations). Moreover, we present an automatic audio- and visual-based camera selection framework for composing an uninterrupted recording from synchronised multi-camera UGVs of the same event. We design an automatic audio-based cut-point selection method that provides a common reference for audio and video segmentation. 
To filter out low-quality video segments, spatial and spatio-temporal assessments are computed. The framework combines segments of UGVs using a rank-based camera selection strategy that considers visual quality scores and view diversity. The proposed framework is validated on a dataset of 13 events (93 UGVs) through subjective tests and compared with state-of-the-art methods. Suitable cut-point selection, specific visual quality assessments and rank-based camera selection contribute to the superiority of the proposed framework over the existing methods. Finally, we contribute a method for Camera Motion Detection using Gyroscope for UGVs captured from smartphones and design a gyro-based quality score for video composition. The gyroscope measures the angular velocity of the smartphone, which can be used for camera motion analysis. We evaluate the proposed camera motion detection method on a dataset of 24 multi-modal UGVs captured by us, and compare it with existing visual and inertial sensor-based methods. By designing a gyro-based score to quantify the goodness of the multi-camera UGVs, we develop a gyro-based video composition framework. The gyro-based score substitutes the spatial and spatio-temporal scores and reduces the computational complexity. We contribute a multi-modal dataset of 3 events (12 UGVs), which is used to validate the proposed gyro-based video composition framework. Postprint (published version).
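
    The time-shift estimation step mentioned in this abstract can be illustrated with a small sketch. It is not the thesis's chroma-based descriptor: it assumes each recording has already been reduced to one scalar audio feature per frame, and it picks the lag that minimizes the mean squared difference over the overlapping frames. The function name and the `max_lag` and `min_overlap` parameters are hypothetical.

```python
def estimate_shift(a, b, max_lag, min_overlap=4):
    """Return the lag such that b[i + lag] best matches a[i].

    a and b are per-frame feature sequences (assumed scalar here).
    Lags whose overlap is shorter than min_overlap frames are skipped,
    since a very short overlap can match by chance.
    """
    best_lag, best_err = None, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        diffs = [(a[i] - b[i + lag]) ** 2
                 for i in range(len(a)) if 0 <= i + lag < len(b)]
        if len(diffs) < min_overlap:
            continue
        err = sum(diffs) / len(diffs)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag
```

    A positive result means the content of `a` appears later in `b`, so `b` started recording earlier by that many frames. Real recordings differ in noise and gain, so a practical system would use a more robust similarity measure than an exact squared difference.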

    Improvements of local directional pattern for texture classification.

    Doctoral Degree. University of KwaZulu-Natal, Durban. The Local Directional Pattern (LDP) method has established its effectiveness and performance compared to the popular Local Binary Pattern (LBP) method in different applications. In this thesis, several extensions and modifications of LDP are proposed with the objective of increasing its robustness and discriminative power. LDP depends on the empirical choice of three for the number of significant bits used to code the responses of the Kirsch mask operation. In a first study, we applied LDP to informal settlements using various values for the number of significant bits k. It was observed that changing the number of significant bits changed the performance, depending on the application. LDP is based on computing the Kirsch mask response values in eight directions, but it ignores the gray value of the center pixel, which may lead to a loss of significant information. Centered Local Directional Pattern (CLDP) is introduced to solve this issue, using the value of the center pixel based on its relations with neighboring pixels. LDP also generates a code based on the absolute value of the edge response; however, the sign of the original value indicates two different trends (positive or negative) of the gradient. To capture the gradient trend, Signed Local Directional Pattern (SLDP) and Centered-SLDP (C-SLDP) are proposed, which compute the eight edge responses based on the two different directions (positive or negative) of the gradients. The Directional Local Binary Pattern (DLBP) is introduced, which adopts directional information to represent texture images. 
This method is more stable than both LDP and LBP because it uses the center pixel as a threshold for the edge response of a pixel in eight directions, instead of using the center pixel as the threshold for the pixel intensity of the neighbors, as in the LBP method. Angled Local Directional Pattern (ALDP) is also presented, with the objective of resolving two problems in the LDP method: the choice of the number of significant bits k, and taking the center pixel value into account. It computes the angle values for the edge response of a pixel in eight directions for each angle (0°, 45°, 90°, 135°). Each angle vector contains three values. The central value in each vector is chosen as a threshold for the other two neighboring pixels. Circular Local Directional Pattern (CILDP) is also presented, with the objective of a better analysis, especially for textures at different scales. The method is built around a circular shape to compute the directional edge vector using different radii. The performances of LDP, LBP, CLDP, SLDP, C-SLDP, DLBP, ALDP and CILDP are evaluated using five classifiers (K-nearest neighbour algorithm (k-NN), Support Vector Machine (SVM), Perceptron, Naive-Bayes (NB), and Decision Tree (DT)) applied to two different texture datasets: the Kylberg dataset and the KTH-TIPS2-b dataset. The experimental results demonstrated that the proposed methods outperform both LDP and LBP.
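
    The basic LDP code that this thesis extends can be sketched for a single 3x3 patch as follows. The sketch covers only the baseline described above (eight Kirsch responses, absolute values, k most significant directions); it does not implement CLDP, SLDP, DLBP, ALDP or CILDP. The bit order (bit d for direction d, starting at east and going counter-clockwise) and the tie-breaking rule are assumptions made for illustration.

```python
def kirsch_responses(patch):
    """Eight Kirsch edge responses for a 3x3 patch (a list of three rows)."""
    # Neighbours in ring order: E, NE, N, NW, W, SW, S, SE.
    ring_pos = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    ring = [patch[r][c] for r, c in ring_pos]
    total = sum(ring)
    responses = []
    for d in range(8):
        # The mask for direction d weights the three neighbours centred on d
        # by 5 and the other five by -3; the centre pixel has weight 0.
        s3 = ring[(d - 1) % 8] + ring[d] + ring[(d + 1) % 8]
        responses.append(5 * s3 - 3 * (total - s3))
    return responses


def ldp_code(patch, k=3):
    """Set one bit for each of the k directions with the largest |response|.

    Ties are broken in favour of the lower direction index (an assumption).
    """
    responses = kirsch_responses(patch)
    order = sorted(range(8), key=lambda d: (-abs(responses[d]), d))
    return sum(1 << d for d in order[:k])
```

    Note that the centre pixel never enters the computation and that the sign of each response is discarded, which are the two limitations the abstract says CLDP and SLDP are meant to address.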

    Face Mining in Wikipedia Biographies

    This thesis presents a number of research contributions related to the theme of creating an automated system for extracting faces from Wikipedia biography pages. The first major contribution of this work is the formulation of a solution to the problem based on a novel probabilistic graphical modeling technique. We use probabilistic inference to make structured predictions in dynamically constructed models so as to identify true examples of faces corresponding to the subject of a biography among all detected faces. Our probabilistic model takes into account information from multiple sources, including visual comparisons between detected faces, metadata about facial images and their detections, parent images, image locations, image file names, and caption texts. We believe this research is also unique in that we are the first to present a complete system and an experimental evaluation for the task of mining wild human faces on the scale of over 50,000 identities. The second major contribution of this work is the development of a new class of discriminative probabilistic models based on a novel generalized Beta-Bernoulli logistic function. Through our generalized Beta-Bernoulli formulation, we provide both a new smooth 0-1 loss approximation method and a new class of probabilistic classifiers. We present experiments using this technique for: 1) a new form of Logistic Regression which we call generalized Beta-Bernoulli Logistic Regression, 2) a kernelized version of the aforementioned technique, and 3) our probabilistic face mining model, which can be regarded as a structured prediction technique that combines information from multimedia sources. 
Through experiments, we show that the different forms of this novel Beta-Bernoulli formulation improve upon the performance of both widely used Logistic Regression methods and state-of-the-art linear and non-linear Support Vector Machine techniques for binary classification. To evaluate our technique, we have performed tests using a number of widely used benchmarks with different properties, ranging from comparatively small to comparatively large in size, as well as problems with both sparse and dense features. Our analysis shows that the generalized Beta-Bernoulli model improves upon the analogous forms of classical Logistic Regression and Support Vector Machine models and that when our evaluations are performed on larger-scale datasets, the results are statistically significant. Another finding is that the approach is also robust when dealing with outliers. Furthermore, our face mining model achieves its best performance when its sub-component consisting of a discriminative Maximum Entropy Model is replaced with our generalized Beta-Bernoulli Logistic Regression model. This shows the general applicability of our proposed approach for a structured prediction task. To the best of our knowledge, this represents the first time that a smooth approximation to the 0-1 loss has been used for structured predictions. Finally, we have explored in more depth an important problem related to our face extraction task: the localization of dense keypoints on human faces. Therein, we have developed a complete pipeline that solves the keypoint localization problem using an adaptively estimated, locally linear subspace technique. Our keypoint localization model performs on par with state-of-the-art methods.