10 research outputs found

    Context-aware part-based people detection for video monitoring

    Full text link
    This paper is a postprint of a paper submitted to and accepted for publication in Electronics Letters and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at the IEEE Digital Library. A novel approach for part-based people detection in images that uses contextual information is proposed. Two sources of context are distinguished: local (neighbour) information and the relative importance of the parts in the model. Local context determines part visibility, which is derived from the spatial location of static objects in the scene and from the relation between scales of analysis and detection-window sizes. Experimental results over various datasets show that the proposed use of context outperforms the related state of the art. This work was supported by the Spanish Government (HA-Video TEC2014-5317-R).
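    The local-context idea of deriving part visibility from static scene objects can be sketched roughly as follows; the bounding boxes, the coverage threshold and the decision rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: a body part counts as visible when static occluders
# cover less than a threshold fraction of its bounding box.
# Boxes are (x1, y1, x2, y2); threshold is an assumed parameter.
def part_visibility(part_box, occluder_boxes, threshold=0.5):
    """Return True when the part is considered visible."""
    x1, y1, x2, y2 = part_box
    area = (x2 - x1) * (y2 - y1)
    covered = 0.0
    for ox1, oy1, ox2, oy2 in occluder_boxes:
        # Intersection of the part box with each static occluder box.
        iw = max(0, min(x2, ox2) - max(x1, ox1))
        ih = max(0, min(y2, oy2) - max(y1, oy1))
        covered += iw * ih
    return covered / area < threshold
```

    Overlapping occluders are double-counted in this simple sketch; a real implementation would rasterise the occluder masks instead.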

    Foreign object detection (FOD) using multi-class classifier with single camera vs. distance map with stereo configuration

    Get PDF
    Detection of objects of interest is a fundamental problem in computer vision. Foreign object detection (FOD) is the task of detecting objects that are not expected to appear in a certain area. For this task, we need to first detect the position of foreign objects, and then compute the distance to them to judge whether they are within a dangerous zone. The three principal sources of difficulty in performing this task are: a) the huge number of foreign object categories, b) the calculation of distance using camera(s), and c) real-time system performance. Most state-of-the-art detectors focus on one type or one class of objects. To the best of our knowledge, there is no single solution that focuses on detection of a set of multiple foreign objects in an integrated manner. In some cases, multiple detectors can operate simultaneously to detect objects of interest in a given input, but this is not efficient. The goal of our research is to detect a set of objects identified as foreign objects in an integrated and efficient manner. We design a multi-class detector. Our approach is to use a coarse-to-fine strategy in which we divide the complicated space into finer and finer sub-spaces. For this purpose, a data-driven clustering algorithm is implemented to gather similar foreign object samples, and then an extended vector boosting algorithm is developed to train our multi-class classifier. The purpose of the extended vector boosting algorithm is to separate all foreign objects from the background. For the task of estimating the distance to the foreign objects, we design a look-up table based on the area of the detected foreign objects. Furthermore, we design a FOD framework. Our approach is to use a stereo matching algorithm to get the disparity information based on intensity images from stereo cameras, and then use the camera model to retrieve the distance information.
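    The area-based distance look-up table might be implemented along these lines; the calibration pairs below are hypothetical, and a deployed system would build them from measurements of known objects at known distances.

```python
# Minimal sketch of an area-to-distance look-up table with linear
# interpolation between calibration entries. Calibration values are
# assumed, not taken from the paper.
# (detected bounding-box area in pixels, distance in metres),
# sorted by descending area: a larger area means a closer object.
CALIBRATION = [(40000, 2.0), (10000, 4.0), (2500, 8.0), (625, 16.0)]

def distance_from_area(area: float) -> float:
    """Estimate distance by interpolating over the calibration table."""
    areas = [a for a, _ in CALIBRATION]
    dists = [d for _, d in CALIBRATION]
    if area >= areas[0]:   # larger than any calibrated area: clamp near
        return dists[0]
    if area <= areas[-1]:  # smaller than any calibrated area: clamp far
        return dists[-1]
    # Find the bracketing calibration entries (areas are descending).
    for (a_hi, d_near), (a_lo, d_far) in zip(CALIBRATION, CALIBRATION[1:]):
        if a_lo <= area <= a_hi:
            t = (a_hi - area) / (a_hi - a_lo)
            return d_near + t * (d_far - d_near)
```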
    The distance calculated using disparity is more accurate than the distance obtained from the look-up table. We calculate an initial distance map when no objects are in the scene. A block of interest (BOI) is an area whose distance is smaller than that of the corresponding area in the initial distance map. To detect foreign objects, we use a flood-fill method along with a noise suppression method to combine adjacent BOIs with a higher confidence level. The foreign object detection prototype system has been implemented and evaluated on a number of test sets under real working scenarios. The experimental results show that our algorithm and framework are efficient and robust.
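    The disparity-to-distance step follows the standard pinhole stereo model, and the BOI test compares the current distance against the empty-scene reference map; the focal length, baseline and noise margin below are assumed values, not the paper's calibration.

```python
# Minimal sketch of stereo distance computation and the block-of-interest
# (BOI) test. Camera parameters are hypothetical.
F_PX = 700.0       # focal length in pixels (assumed)
BASELINE_M = 0.12  # stereo baseline in metres (assumed)

def distance_from_disparity(disparity_px: float) -> float:
    """Pinhole stereo model: Z = f * B / d."""
    return F_PX * BASELINE_M / disparity_px

def is_block_of_interest(current_m: float, initial_m: float,
                         margin_m: float = 0.5) -> bool:
    """A block is of interest when it is closer than the empty-scene
    reference map by more than a noise margin."""
    return current_m < initial_m - margin_m
```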

    Context-based Information Fusion: A survey and discussion

    Get PDF
    This survey aims to provide a comprehensive status of recent and current research on context-based Information Fusion (IF) systems, tracing back the roots of the original thinking behind the development of the concept of “context”. It shows how its fortune in the distributed computing world eventually permeated the world of IF, discussing current strategies and techniques, and hinting at possible future trends. IF processes can represent context at different levels (structural and physical constraints of the scenario, a priori known operational rules between entities and environment, dynamic relationships modelled to interpret the system output, etc.). In addition to the survey, several novel context exploitation dynamics and architectural aspects peculiar to the fusion domain are presented and discussed.

    Visual Analysis of Extremely Dense Crowded Scenes

    Get PDF
    Visual analysis of dense crowds is particularly challenging due to the large number of individuals, occlusions, clutter, and few pixels per person, conditions which rarely occur in ordinary surveillance scenarios. This dissertation aims to address these challenges in images and videos of extremely dense crowds containing hundreds to thousands of humans. The goal is to tackle the fundamental problems of counting, detecting and tracking people in such images and videos using visual and contextual cues that are automatically derived from the crowded scenes. For counting in an image of an extremely dense crowd, we propose to leverage multiple sources of information to compute an estimate of the number of individuals present in the image. Our approach relies on sources such as low-confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with the confidence associated with observing individuals, in an image region. Furthermore, we employ a global consistency constraint on counts using a Markov Random Field, which caters for disparity in counts in local neighborhoods and across scales. We tested this approach on crowd images with head counts ranging from 94 to 4543 and obtained encouraging results. Through this approach, we are able to count people in images of high-density crowds, unlike previous methods, which are only applicable to videos of low- to medium-density crowded scenes. However, the counting procedure just outputs a single number for a large patch or an entire image. With just the counts, it becomes difficult to measure the counting error for a query image with an unknown number of people. For this, we propose to localize humans by finding repetitive patterns in the crowd image.
    Starting with detections from an underlying head detector, we correlate them within the image after their selection through several criteria: in a pre-defined grid, locally, or at multiple scales by automatically finding the patches that are most representative of recurring patterns in the crowd image. Finally, the set of generated hypotheses is selected using binary integer quadratic programming with Special Ordered Set (SOS) Type 1 constraints. Human detection is another important problem in the analysis of crowded scenes, where the goal is to place a bounding box on the visible parts of individuals. Primarily applicable to images depicting medium- to high-density crowds containing several hundred humans, it is a crucial prerequisite for many other visual tasks, such as tracking, action recognition or detection of anomalous behaviors exhibited by individuals in a dense crowd. For detecting humans, we explore context in dense crowds in the form of a locally-consistent scale prior which captures the similarity in scale in local neighborhoods with smooth variation over the image. Using the scale and confidence of detections obtained from an underlying human detector, we infer scale and confidence priors using a Markov Random Field. In an iterative mechanism, the confidences of detections are modified to reflect consistency with the inferred priors, and the priors are updated based on the new detections. The final set of detections is then reasoned about for occlusion using Binary Integer Programming, where overlaps and relations between parts of individuals are encoded as linear constraints. Both human detection and occlusion reasoning in this approach are solved with local neighbor-dependent constraints, thereby respecting the inter-dependence between individuals characteristic of dense crowd analysis. In addition, we propose a mechanism to detect different combinations of body parts without requiring annotations for individual combinations.
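    A heavily simplified, non-MRF sketch of the locally-consistent scale prior might look like this: each detection's prior is the confidence-weighted mean of its neighbours' scales, and the confidences of detections that disagree with the prior are reduced. The 1-D positions, window size, tolerance and penalty are all assumptions, not the dissertation's inference procedure.

```python
# Simplified stand-in for the scale-prior inference: neighbourhood
# averaging instead of a Markov Random Field. Parameters are assumed.
def update_confidences(dets, window=50.0, tol=0.5):
    """dets: list of (x, scale, confidence). Returns new confidences."""
    out = []
    for x, s, c in dets:
        # Confidence-weighted mean scale of spatial neighbours.
        nbrs = [(sn, cn) for xn, sn, cn in dets if abs(xn - x) <= window]
        wsum = sum(cn for _, cn in nbrs)
        prior = sum(sn * cn for sn, cn in nbrs) / wsum
        # Penalise detections whose scale deviates from the local prior.
        penalty = 1.0 if abs(s - prior) / prior <= tol else 0.5
        out.append(c * penalty)
    return out
```

    In the dissertation this update runs iteratively, with the priors re-inferred from the re-weighted detections at each step; one pass is shown here for clarity.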
    Once human detection and localization is performed, we then use it for tracking people in dense crowds. Similar to the use of context as a scale prior for human detection, we exploit it in the form of motion concurrence for tracking individuals in dense crowds. The proposed method for tracking provides an alternative and complementary approach to methods that require modeling of crowd flow. At the same time, it is less likely to fail in the case of dynamic crowd flows and anomalies because it relies minimally on previous frames. The approach begins with the automatic identification of prominent individuals in the crowd that are easy to track. Then, we use Neighborhood Motion Concurrence to model the behavior of individuals in a dense crowd, which predicts the position of an individual based on the motion of its neighbors. When the individual moves with the crowd flow, we use Neighborhood Motion Concurrence to predict motion, while leveraging five-frame instantaneous flow in the case of dynamically changing flow and anomalies. All these aspects are then embedded in a framework which imposes a hierarchy on the order in which the positions of individuals are updated. Results are reported on eight sequences of medium- to high-density crowds, and our approach performs on par with existing approaches without learning or modeling patterns of crowd flow. We experimentally demonstrate the efficacy and reliability of our algorithms by quantifying the performance of counting, localization, as well as human detection and tracking on new and challenging datasets containing hundreds to thousands of humans in a given scene.
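    The Neighborhood Motion Concurrence prediction step can be sketched as a proximity-weighted average of the neighbours' motion vectors; the Gaussian weighting and its bandwidth below are assumptions, not the dissertation's exact model.

```python
import math

# Minimal sketch: predict an individual's next position from the motion
# of nearby individuals, with closer neighbours weighted more strongly.
def predict_position(pos, neighbours, sigma=20.0):
    """pos: (x, y); neighbours: list of ((x, y), (vx, vy)).
    Returns the predicted next position of the individual."""
    wsum = vx = vy = 0.0
    for (nx, ny), (nvx, nvy) in neighbours:
        d2 = (nx - pos[0]) ** 2 + (ny - pos[1]) ** 2
        w = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian proximity weight
        wsum += w
        vx += w * nvx
        vy += w * nvy
    # Step along the weighted mean neighbour velocity.
    return (pos[0] + vx / wsum, pos[1] + vy / wsum)
```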

    Object detection for big data

    Get PDF
    "May 2014." Dissertation supervisor: Dr. Tony X. Han. Includes vita. We have observed significant advances in object detection over the past few decades and gladly seen the related research begin to contribute to the world: vehicles can automatically stop before hitting any pedestrian; face detectors have been integrated into smart phones and tablets; video surveillance systems can locate suspects and stop crimes. All these applications demonstrate the substantial research progress on object detection. However, learning a robust object detector is still quite challenging due to the fact that object detection is a very unbalanced big data problem. In this dissertation, we aim at improving the object detector's performance from different aspects. For object detection, state-of-the-art performance is achieved through supervised learning. The performance of object detectors of this kind is mainly determined by two factors: features and the underlying classification algorithms. We have done thorough research on both of these factors. Our contributions involve model adaptation, local learning, contextual boosting, template learning and feature development.
    Since object detection is an unbalanced problem, in which positive examples are hard to collect, we propose to adapt a general object detector to a specific scenario with a few positive examples. To handle the large intra-class variation inherent in the object detection task, we propose a local adaptation method to learn a set of efficient and effective detectors for a single object category. To extract effective context from the huge amount of negative data in object detection, we introduce a novel contextual descriptor to iteratively improve the detector. To detect objects with a depth sensor, we design an effective depth descriptor. To distinguish object categories with similar appearance, we propose a local feature embedding and template selection algorithm, which has been successfully incorporated into a real-world fine-grained object recognition application. All the proposed algorithms and features are evaluated in this dissertation. Includes bibliographical references (pages 117-130).

    PERSON RE-IDENTIFICATION USING RGB-DEPTH CAMERAS

    Full text link
    [EN] The presence of surveillance systems in our lives has drastically increased during the last years. Camera networks can be seen in almost every crowded public and private place, and they generate huge amounts of data with valuable information. The automatic analysis of these data plays an important role in extracting relevant information from the scene. In particular, person re-identification is a prominent topic that has become of great interest, especially for the fields of security and marketing. However, there are factors, such as changes in illumination conditions, variations in person pose, occlusions or the presence of outliers, that make this topic really challenging. Fortunately, the recent introduction of new technologies such as depth cameras opens new paradigms in the image processing field and brings new possibilities. This Thesis proposes a new complete framework to tackle the problem of person re-identification using commercial RGB-depth cameras. This work includes the analysis and evaluation of new approaches for the modules of segmentation, tracking, description and matching. To evaluate our contributions, a public dataset for person re-identification using RGB-depth cameras has been created. RGB-depth cameras provide accurate 3D point clouds with color information. Based on the analysis of the depth information, a novel algorithm for person segmentation is proposed and evaluated. This method accurately segments any person in the scene, and naturally copes with occlusions and connected people. The segmentation mask of a person generates a 3D person cloud, which can be easily tracked over time based on proximity. The accumulation of all the person point clouds over time generates a set of high-dimensional color features, named raw features, that provide useful information about the person's appearance. In this Thesis, we propose a family of methods to extract relevant information from the raw features in different ways.
    The first approach compacts the raw features into a single color vector, named Bodyprint, that provides a good generalisation of the person's appearance over time. Second, we introduce the concept of 3D Bodyprint, an extension of the Bodyprint descriptor that includes the angular distribution of the color features. Third, we characterise the person's appearance as a bag of color features that are independently generated over time. This descriptor receives the name Bag of Appearances because of its similarity with the concept of Bag of Words. Finally, we use different probabilistic latent variable models to reduce the feature vectors from a statistical perspective. The evaluation of the methods demonstrates that our proposals outperform the state of the art.
    Oliver Moll, J. (2015). PERSON RE-IDENTIFICATION USING RGB-DEPTH CAMERAS [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59227
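    The Bodyprint compaction described above, reduced to its simplest form, averages per-frame colour features over time into one normalised vector; the histogram layout here is an assumption, not the thesis's exact descriptor.

```python
# Minimal sketch of a Bodyprint-style descriptor: per-frame colour
# histograms of the segmented person cloud are averaged over time and
# L1-normalised into a single appearance vector.
def bodyprint(frames):
    """frames: list of colour histograms (equal-length lists of floats).
    Returns one normalised vector summarising appearance over time."""
    n = len(frames)
    acc = [sum(col) / n for col in zip(*frames)]  # temporal mean per bin
    total = sum(acc)
    return [v / total for v in acc]
```

    Matching two people then reduces to comparing their Bodyprint vectors with any histogram distance.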

    Stereo-based Pedestrian Detection and Path Prediction

    Get PDF
    In recent years there has been rapid development of Advanced Driver Assistance Systems (ADAS). These systems not only support the driver, but also increase the safety of all other road users by automatically initiating safety reactions of the vehicle itself. Future active pedestrian protection systems in intelligent vehicles must go one step further and learn to build an accurate picture of their environment and of the changes to be expected in it while driving. This work is dedicated to improving image-based pedestrian protection systems. New methods for region-of-interest (ROI) generation, pedestrian classification, path prediction and intention recognition are developed. The performance of pedestrian detection in real, dynamic environments with a moving camera is improved by using dense stereo in the different modules. In an experimental study, the efficiency of a monocular pedestrian detection system was compared with a system extended to use dense stereo for ROI generation and pedestrian tracking. The new system proved to be considerably more efficient than the monocular system. This performance gain motivated an extended use of dense stereo in pedestrian detection. ROI generation was further improved by dynamically estimating the camera orientation and the road profile. Especially on hilly roads, detection performance increased through the optimisation of the search region. In addition, classification performance was improved by fusing different features from image and depth information.
    Building on the successes in pedestrian detection, this work presents a system for active pedestrian protection that combines pedestrian detection, situation analysis and vehicle control. For pedestrian detection, the results of a motion-based object detection method were fused with the results of a pedestrian classifier. The system was installed in a test vehicle and helped to avoid accidents through active steering intervention or an emergency braking manoeuvre. The last part of the work addresses the problem of path prediction and recognising pedestrian intention in situations where the pedestrian does not move at a constant speed. Two new learning-based approaches are presented and compared with current methods. By using features generated from dense optical flow, it is possible to predict the path and intention of a pedestrian. The first method learns a low-dimensional manifold of the features that allows prediction of features, path and intention. The second method uses a search tree in which trajectories augmented with motion features are stored; a probabilistic search algorithm enables prediction of the pedestrian's path and intention. The performance of the systems was additionally compared with the performance of human subjects. Great importance was placed in this work on thorough analysis of the presented methods and the use of realistic test datasets. The experiments show that the performance of a pedestrian detection system can be improved by using dense stereo. The presented methods for path prediction and intention recognition enable early recognition of pedestrian intention.
    The reliability of future active pedestrian protection systems that avoid accidents through active steering intervention or emergency braking manoeuvres can be improved with the presented methods. As a result, accidents can be prevented entirely, or the severity of a collision reduced.
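    The trajectory-database approach to path prediction can be sketched as nearest-exemplar matching over motion histories; the 1-D positions and the exhaustive search below are deliberate simplifications of the probabilistic tree search described above, and the exemplar trajectories are hypothetical.

```python
# Minimal sketch: match the observed motion history against stored
# exemplar trajectories and return the continuation of the best match.
def predict_path(observed, exemplars, horizon=3):
    """observed: list of recent positions; exemplars: list of full
    trajectories. Returns the predicted next `horizon` positions."""
    k = len(observed)
    best, best_cost = None, float("inf")
    for traj in exemplars:
        # Slide the observation window over each stored trajectory.
        for start in range(len(traj) - k - horizon + 1):
            cost = sum((traj[start + i] - observed[i]) ** 2
                       for i in range(k))
            if cost < best_cost:
                best_cost = cost
                best = traj[start + k : start + k + horizon]
    return best
```

    A tree over trajectory prefixes, as in the thesis, avoids this exhaustive scan and allows a probabilistic rather than single-best prediction.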