
    Learning Independent Instance Maps for Crowd Localization

    Accurately locating each head's position in crowd scenes is a crucial task in crowd analysis. However, traditional density-based methods produce only coarse predictions, and segmentation/detection-based methods cannot handle extremely dense scenes or crowds with large scale variations. To this end, we propose an end-to-end and straightforward framework for crowd localization, named Independent Instance Map segmentation (IIM). Unlike density maps and box regression, each instance in IIM is non-overlapping. By segmenting crowds into independent connected components, the positions and the crowd counts (the centers and the number of components, respectively) are obtained. Furthermore, to improve segmentation quality across regions of different density, we present a differentiable Binarization Module (BM) that outputs structured instance maps. BM brings two advantages to localization models: 1) it adaptively learns a threshold map for each image, detecting every instance more accurately; 2) it allows the model to be trained directly with a loss on binary predictions and labels. Extensive experiments verify that the proposed method is effective and outperforms state-of-the-art methods on five popular crowd datasets. Notably, IIM improves the F1-measure by 10.4% on the NWPU-Crowd localization task. The source code and pre-trained models will be released at https://github.com/taohan10200/IIM.
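    As a rough illustration of the two components described above, a minimal sketch follows, assuming a DB-style soft binarization with a steepness factor k and SciPy connected-component labelling for inference; all names and values are illustrative, not the authors' released code.

        import torch
        import numpy as np
        from scipy import ndimage

        def differentiable_binarization(prob_map, threshold_map, k=50.0):
            # Soft, differentiable approximation of thresholding: gradients
            # flow through both the probability map and the learned threshold map.
            return torch.sigmoid(k * (prob_map - threshold_map))

        def localize_from_instance_map(binary_map):
            # At inference, each independent connected component is one instance:
            # its centre gives a head position, the component count the crowd count.
            labels, count = ndimage.label(binary_map)
            centers = ndimage.center_of_mass(binary_map, labels, list(range(1, count + 1)))
            return centers, count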

    A Wide Area Multiview Static Crowd Estimation System Using UAV and 3D Training Simulator

    Crowd size estimation is a challenging problem, especially when the crowd is spread over a significant geographical area. It has applications in monitoring rallies and demonstrations and in calculating assistance requirements in humanitarian disasters. Building a crowd surveillance system for large crowds therefore remains a significant challenge. UAV-based techniques are an appealing choice for crowd estimation over a large region, but they present a variety of interesting challenges, such as integrating per-frame estimates across a video without counting individuals twice. Large quantities of annotated training data are required to design, train, and test such a system. In this paper, we first review several crowd estimation techniques and the existing crowd simulators and datasets available for crowd analysis. We then describe a simulation system that provides such data, avoiding the need for tedious and error-prone manual annotation. Next, we evaluate synthetic video from the simulator using various existing single-frame crowd estimation techniques. Our findings show that the simulated data can be used to train and test crowd estimation methods, providing a suitable platform on which to develop such techniques. We also propose an automated UAV-based 3D crowd estimation system for approximately static or slow-moving crowds, such as public events, political rallies, and natural or man-made disasters. We evaluate the results by applying our new framework to a variety of scenarios with varying crowd sizes. The proposed system gives promising results under widely accepted metrics, including MAE, RMSE, precision, recall, and F1 score.
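    For concreteness, the metrics named above can be computed as in the short sketch below; this is the generic formulation, not the paper's evaluation code, and the matched-detection counts used for precision and recall are assumed inputs.

        import numpy as np

        def counting_errors(pred_counts, gt_counts):
            # MAE and RMSE over per-frame crowd count estimates.
            pred = np.asarray(pred_counts, dtype=float)
            gt = np.asarray(gt_counts, dtype=float)
            mae = np.mean(np.abs(pred - gt))
            rmse = np.sqrt(np.mean((pred - gt) ** 2))
            return mae, rmse

        def detection_scores(tp, fp, fn):
            # Precision, recall, and F1 from matched/unmatched detections.
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            return precision, recall, f1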

    Design, Implementation and Evaluation of a New Learning Strategy for Image Spatial Transformer Convolutional Neural Networks (STNs)

    This work will consist of designing, implementing, and evaluating different convergence methods for spatial transformer convolutional neural networks, in this case working on images of worms (C. elegans). Initially, the work will focus on the study and understanding of this type of network so that the different strategies to follow can be proposed. For the evaluation, a dataset of C. elegans image pairs captured by two cameras will be used, and the main objective of the experiments will be to transform one of the images, in which the worm does not appear centred, into the other, in which it does. The PyCharm tool will be used as the environment in which to carry out the experiments. This tool uses Python as its programming language, and both the neural networks and the different convergence methods to be evaluated will be designed with the PyTorch function library together with other typical Python libraries. Finally, the proposals will be evaluated against several criteria, including the success rate and the execution time. In addition, several applications will be proposed in which the results of this study can be used. Navarro Moya, F. (2021). Diseño, implementación y evaluación de una nueva estrategia de aprendizaje para redes neuronales convolucionales de transformación espacial de imágenes (STNs). Universitat Politècnica de València. http://hdl.handle.net/10251/173981 (TFG)
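    Since the thesis works with spatial transformer networks in PyTorch, a minimal generic STN sketch is given below for orientation; the layer sizes and identity initialization are illustrative assumptions, not the thesis's actual architecture.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class MinimalSTN(nn.Module):
            # A localization network regresses a 2x3 affine matrix, which warps
            # the input image through a differentiable sampling grid.
            def __init__(self):
                super().__init__()
                self.loc = nn.Sequential(
                    nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
                    nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(10, 6),
                )
                # Start at the identity transform so early training is stable.
                self.loc[-1].weight.data.zero_()
                self.loc[-1].bias.data.copy_(
                    torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

            def forward(self, x):
                theta = self.loc(x).view(-1, 2, 3)
                grid = F.affine_grid(theta, x.size(), align_corners=False)
                return F.grid_sample(x, grid, align_corners=False)

    Trained with, say, an L1 loss between the warped image and its centred counterpart, a module of this kind could learn the camera-to-camera alignment the work describes.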

    Video Stabilisation Based on Spatial Transformer Networks

    User-generated content is normally recorded with mobile phones by non-professionals, which leads to a poor viewing experience due to artifacts such as jitter and blur. Other jittery videos are those recorded with mounted cameras or moving platforms. In these scenarios, Digital Video Stabilization (DVS) has been used to create high-quality, professional-level videos. In industry and academia there are a number of traditional and Deep Learning (DL)-based DVS systems; however, both approaches have limitations: the former struggles to extract and track features in a number of scenarios, and the latter struggles with camera-path smoothing, a hard problem to define in this context. On the other hand, traditional methods have shown good performance in smoothing the camera path, whereas DL methods are effective in feature extraction, tracking, and motion parameter estimation. Hence, to the best of our knowledge, the available DVS systems struggle to stabilize videos in a wide variety of scenarios, especially with high motion and certain scene content, such as textureless areas, dark scenes, close objects, and lack of depth, amongst others. Another challenge faced by current DVS implementations is the artifacts that such systems add to the stabilized videos, degrading the viewing experience; these are mainly distortion, blur, zoom, and ghosting effects. In this thesis, we utilize the strengths of Deep Learning and traditional methods for video stabilization. Our approach is robust to a wide variety of scene content and camera motion, and avoids adding artifacts to the stabilized video. First, we provide a dataset and evaluation framework for Deep Learning-based DVS. Then, we present our image alignment module, which contains a Spatial Transformer Network (STN). Next, we leverage this module to propose a homography-based video stabilization system. Aiming to avoid the blur and distortion caused by homographies, our next proposal is a translation-based video stabilization method, which uses Exponentially Weighted Moving Averages (EWMAs) to smooth the camera path. Finally, instead of EWMAs, we study the use of filters in our approach, comparing a number of filters and choosing those with the best performance. Since a viewer's quality of experience depends not only on video stability but also on blur and distortion, we consider it a good trade-off to leave some jitter in the video while avoiding added distortion and blur. In all three cases, we show that this approach pays off, since our systems outperform state-of-the-art proposals.
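    As a rough sketch of the path-smoothing idea, an EWMA over a per-frame camera trajectory might look like the following; the translation-only motion model and the smoothing factor are assumptions for illustration, not the thesis's implementation.

        import numpy as np

        def ewma_smooth_path(path, alpha=0.9):
            # Exponentially weighted moving average over a camera path.
            # path: (T, 2) accumulated per-frame (dx, dy) translations;
            # higher alpha weights the smoothed history more (stronger smoothing).
            path = np.asarray(path, dtype=float)
            smoothed = np.empty_like(path)
            smoothed[0] = path[0]
            for t in range(1, len(path)):
                smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * path[t]
            return smoothed

        # Each frame is then shifted by (smoothed[t] - path[t]) so the video
        # follows the smoothed trajectory instead of the jittery one.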

    Advances in Object and Activity Detection in Remote Sensing Imagery

    The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection aims to find objects of target classes with precise localisation in an image and to assign each object instance a corresponding class label. Activity recognition, in turn, aims to determine the actions or activities of an agent or group of agents from sensor or video observation data. Detecting, identifying, tracking, and understanding the behaviour of objects through images and videos taken by various cameras is an important and challenging problem. Together, object and activity recognition in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in this field. In particular, many researchers have proposed methods to identify objects and their specific behaviours from airborne and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics in object and activity detection in remote sensing images and videos acquired by diverse platforms.