926 research outputs found

    Crowd detection and counting using a static and dynamic platform: state of the art

    Automated object detection and crowd density estimation are popular and important areas of visual surveillance research. Recent decades have seen much significant research in this field; however, it remains a challenging problem for automatic visual surveillance. The ever-increasing volume of research on crowd dynamics and crowd motion calls for a detailed and up-to-date survey of the techniques and trends in this field. This paper surveys the methods used for crowd detection and crowd density estimation from moving platforms. The review categorises and delineates several detection and counting methods that have been applied to the examination of scenes from static and moving platforms.

    Active Mapping and Robot Exploration: A Survey

    Simultaneous localization and mapping addresses the problem of building a map of the environment without any prior information, based on the data obtained from one or more sensors. In most situations, the robot is driven by a human operator, but some systems are capable of navigating autonomously while mapping, which is called active simultaneous localization and mapping. This strategy focuses on actively calculating the trajectories needed to explore the environment while building a map with minimum error. In this paper, a comprehensive review of the research work developed in this field is provided, targeting the most relevant contributions in indoor mobile robotics. This research was funded by the ELKARTEK project ELKARBOT KK-2020/00092 of the Basque Government.
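As a rough illustration of how a robot might choose where to explore next, the sketch below uses frontier-based exploration, a common technique that is not necessarily the one used by any work in this survey. It covers only the exploration part and ignores map-error minimisation. The cell labels, the grid, and the function names are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical cell labels for a 2-D occupancy grid.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1

Cell = Tuple[int, int]


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return free cells that touch at least one unknown cell (4-neighbourhood)."""
    rows, cols = len(grid), len(grid[0])
    frontiers: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def nearest_frontier(grid: List[List[int]], robot: Cell) -> Optional[Cell]:
    """Pick the frontier cell closest to the robot in Manhattan distance."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # nothing left to explore
    return min(frontiers, key=lambda f: abs(f[0] - robot[0]) + abs(f[1] - robot[1]))


example_grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
goal = nearest_frontier(example_grid, robot=(2, 0))
```

A full active SLAM system would also weigh the expected reduction in map and pose uncertainty when ranking candidate goals; the nearest-frontier rule here is only the simplest possible choice.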

    Tecnología para Tiendas Inteligentes (Technology for Smart Stores)

    Bachelor's thesis (Trabajo de Fin de Grado), Double Degree in Computer Engineering and Mathematics, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2020/2021. Smart store technologies exemplify how Artificial Intelligence and the Internet of Things can effectively join forces to shape the future of retailing. With an increasing number of companies proposing and implementing their own smart store concepts, such as Amazon Go or Tao Cafe, a new field is clearly emerging. Because the technologies used to build their infrastructure offer significant competitive advantages, companies are not publicly sharing their designs. For this reason, this work presents a new smart store model named Mercury, which aims to mitigate the lack of public and accessible information and research documents in this field. We not only introduce a comprehensive smart store model, but also work through a detailed, feasible implementation so that anyone can build their own system upon it.

    Understanding social relationships in egocentric vision

    Understanding how people interact with one another is a key component of recognizing social behavior, but it depends strongly on a personal point of view and is therefore difficult to model a priori. We propose adopting the unique first-person perspective of head-mounted cameras (ego-vision) to promptly detect interactions between people in different social contexts. The proposal relies on a complete and reliable system that extracts people's head pose by combining landmarks and shape descriptors in a temporally smoothed HMM framework. Interactions are then detected through supervised clustering on mutual head orientation and inter-person distances, exploiting a structural learning framework that adjusts the clustering measure to the specific scenario. Our solution is flexible enough to capture interactions regardless of the number of individuals involved and their level of acquaintance, in contexts with a variable degree of social involvement. The proposed system shows competitive performance on both publicly available ego-vision datasets and ad hoc benchmarks built from real-life situations.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in, and retrieved from, a pre-attentional store during this task, which gives further weight to that argument.
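The reported statistic can be checked by hand. The sketch below relies on the fact that an F(1, 4) variable is the square of a Student t variable with 4 degrees of freedom, whose distribution has a closed form. The function name is an assumption made for this example, and the code is a consistency check on the reported numbers, not a reanalysis of the study's data.

```python
import math


def p_value_f_1_4(f_stat: float) -> float:
    """Upper-tail p-value for an F statistic with (1, 4) degrees of freedom.

    With t^2 = f_stat and s = t / sqrt(t^2 + 4), the two-sided tail
    probability of a Student t variable with 4 degrees of freedom is
    1 - 1.5 * s * (1 - s^2 / 3).
    """
    s = math.sqrt(f_stat / (f_stat + 4.0))
    return 1.0 - 1.5 * s * (1.0 - s * s / 3.0)


p = p_value_f_1_4(2.565)
print(f"p = {p:.3f}")
```

The result agrees with the reported p=0.185 to within rounding, so the F value and the p value in the abstract are mutually consistent.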

    Video foreground extraction for mobile camera platforms

    Foreground object detection is a fundamental task in computer vision with many applications in areas such as object tracking, event identification, and behavior analysis. Most conventional foreground object detection methods work only in stable illumination environments using fixed cameras. In real-world applications, however, the algorithm often needs to operate under challenging conditions: drastic lighting changes, object shape complexity, moving cameras, low frame capture rates, and low-resolution images. This thesis presents four novel approaches for foreground object detection on real-world datasets using cameras deployed on moving vehicles. The first problem addresses passenger detection and tracking for public transport buses, investigating changing illumination conditions and low frame capture rates. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modelling method with a human shape model into a weighted Bayesian framework to detect passengers. To track multiple targets, we employ the Reversible Jump Markov Chain Monte Carlo tracking algorithm. Using an SVM classifier, the appearance transformation models capture changes in the appearance of the foreground objects across two consecutive frames under low frame rate conditions. In the second problem, we present a system for pedestrian detection in scenes captured by a mobile bus surveillance system. It integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data. In the first stage, SIFT homography is applied to cluster frames in terms of their structural similarity, and the second stage further clusters these aligned frames according to consistency in illumination. This produces clusters of images that are distinguished by viewpoint and lighting. A kernel density estimation (KDE) technique for colour and gradient is then used to construct a background model for each image cluster, which is in turn used to detect candidate foreground pixels. Finally, pedestrians are detected using a hierarchical template matching approach. In addition to the second problem, we present three direct pedestrian detection methods that extend the HOG (Histogram of Oriented Gradient) technique (Dalal and Triggs, 2005) and provide a comparative evaluation of these approaches. The three approaches are: a) a new histogram feature formed by the weighted sum of both the gradient magnitude and the filter responses from a set of elongated Gaussian filters (Leung and Malik, 2001) corresponding to the quantised orientation, which we refer to as the Histogram of Oriented Gradient Banks (HOGB) approach; b) the codebook-based HOG feature with the branch-and-bound (efficient subwindow search) algorithm (Lampert et al., 2008); and c) the codebook-based HOGB approach. In the third problem, a unified framework that combines 3D and 2D background modelling is proposed to detect scene changes using a camera mounted on a moving vehicle. The 3D scene is first reconstructed from a set of videos taken at different times. The 3D background modelling identifies inconsistent scene structures as foreground objects. For the 2D approach, foreground objects are detected using the spatio-temporal MRF algorithm. Finally, the 3D and 2D results are combined using morphological operations. The significance of this research is that it provides basic frameworks for automatic large-scale mobile surveillance applications and facilitates many higher-level applications such as object tracking and behaviour analysis.
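To make the KDE background-modelling step in the abstract above more concrete, here is a minimal single-pixel sketch. It is a simplification under stated assumptions: it uses one grayscale intensity channel rather than the colour and gradient features the thesis describes, and the bandwidth, threshold, and function names are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence


def kde_background_density(value: float, samples: Sequence[float],
                           bandwidth: float = 10.0) -> float:
    """Gaussian kernel density estimate of `value` given past background samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples
    )


def is_foreground(value: float, samples: Sequence[float],
                  bandwidth: float = 10.0, threshold: float = 1e-3) -> bool:
    """Flag a pixel as candidate foreground when its background density is low."""
    return kde_background_density(value, samples, bandwidth) < threshold


# Intensities observed at one pixel across the frames of one image cluster
# (made-up numbers for illustration).
history = [100.0, 102.0, 98.0, 101.0, 99.0]

similar = is_foreground(100.0, history)    # close to the background samples
different = is_foreground(200.0, history)  # far from every background sample
```

In a full system this test would run per pixel against the model of the matching image cluster, and the resulting candidate mask would be passed on to the pedestrian detection stage.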

    Advances in Object and Activity Detection in Remote Sensing Imagery

    The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection aims to find objects of target classes in an image with precise localisation and to assign each object instance a corresponding class label. Activity recognition, in turn, aims to determine the actions or activities of an agent or group of agents from sensor or video observation data. Detecting, identifying, tracking, and understanding the behaviour of objects in images and videos taken by various cameras is an important and challenging problem. Recognising objects and their activities in imaging data captured by remote sensing platforms is likewise a highly dynamic and challenging research topic. During the last decade, the number of publications on object and activity recognition has grown significantly. In particular, many researchers have proposed application domains to identify objects and their specific behaviours from air and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms.

    Congestion- and scale-aware design of network structure and training method for crowd density estimation

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022.2. Choi Jin-young. This dissertation presents novel deep learning-based crowd density estimation methods that take into account crowd congestion and the scale of people. Crowd density estimation is an important task for intelligent surveillance systems. It makes it easy to indicate regions of interest for public security and safety, and it can support computationally expensive computer vision algorithms such as pedestrian detection and tracking. Since the introduction of deep learning to crowd density estimation, most studies have followed the conventional scheme of training a convolutional neural network to estimate a crowd density map from training images. Deep learning-based crowd density estimation research can be divided into two perspectives: network structure and training strategy. In general, work from the network structure perspective proposes a novel network structure that extracts features representing the crowd well, whereas work from the training strategy perspective proposes a novel training methodology or loss function to improve counting performance. In this dissertation, I propose several works from both perspectives. In particular, I design the network models to have rich crowd representation characteristics according to crowd congestion and the scale of people. I propose two novel network structures, a selective ensemble network and a cascade residual dilated network, and one novel loss function for crowd density estimation, the congestion-aware Bayesian loss. First, I propose a selective ensemble deep network architecture for crowd density estimation. In contrast to existing deep network-based methods, the proposed method incorporates two sub-networks for local density estimation: one to learn sparse density regions and one to learn dense density regions. Locally estimated density maps from the two sub-networks are selectively combined in an ensemble fashion using a gating network to estimate an initial crowd density map. The initial density map is refined into a high-resolution map using another sub-network that draws on contextual information in the image. In training, a novel adaptive loss scheme is applied to resolve ambiguity in crowded regions. The proposed scheme improves both density map accuracy and counting accuracy by adjusting the weighting between density loss and counting loss according to the degree of crowdedness and the training epoch. Second, I propose a novel crowd density estimation architecture composed of multiple dilated convolutional neural network blocks with different scales. The architecture is motivated by the empirical observation that small-scale dilated convolution estimates the density of the center area of each person well, whereas large-scale dilated convolution estimates the density of the periphery of a person well. To estimate the crowd density map gradually from the center to the periphery of each person in a crowd, the multiple dilated CNN blocks are trained in a cascade from the small dilated CNN block to the large one. Third, I propose a novel congestion-aware Bayesian loss method that considers the person-scale and the crowd-sparsity. Deep learning-based crowd density estimation can greatly improve the accuracy of crowd counting. Although the Bayesian loss method resolves two problems, the need for a hand-crafted ground truth (GT) density and noisy annotations, counting accurately in highly congested scenes remains a challenging issue. In a crowd scene, people's appearances change according to the scale of each individual (i.e., the person-scale). Also, the lower the sparsity of a local region (i.e., the crowd-sparsity), the more difficult it is to estimate the crowd density. I estimate the person-scale based on scene geometry, and I then estimate the crowd-sparsity using the estimated person-scale. The estimated person-scale and crowd-sparsity are used in the novel congestion-aware Bayesian loss method to improve the supervising representation of the point annotations. The effectiveness of the proposed density estimators is validated through comparative experiments with state-of-the-art methods on widely used crowd counting benchmark datasets. The proposed methods achieve superior performance to the state-of-the-art density estimators in diverse surveillance environments. In addition, for all proposed crowd density estimation methods, the efficiency of each component is verified through several ablation experiments.
    Contents:
    1 Introduction.
    2 Related Works: 2.1 Detection-based Approaches; 2.2 Regression-based Approaches; 2.3 Deep learning-based Approaches (2.3.1 Network Structure Perspective; 2.3.2 Training Strategy Perspective).
    3 Selective Ensemble Network for Accurate Crowd Density Estimation: 3.1 Overview; 3.2 Combining Patch-based and Image-based Approaches (3.2.1 Local-Global Cascade Network; 3.2.2 Experiments; 3.2.3 Summary); 3.3 Selective Ensemble Network with Adjustable Counting Loss (SEN-ACL) (3.3.1 Overall Scheme; 3.3.2 Data Description; 3.3.3 Gating Network; 3.3.4 Sparse / Dense Network; 3.3.5 Refinement Network); 3.4 Experiments (3.4.1 Implementation Details; 3.4.2 Dataset and Evaluation Metrics; 3.4.3 Self-evaluation on WorldExpo'10 dataset; 3.4.4 Comparative Evaluation with State of the Art Methods; 3.4.5 Analysis on the Proposed Components); 3.5 Summary.
    4 Sequential Crowd Density Estimation from Center to Periphery of Crowd: 4.1 Overview; 4.2 Cascade Residual Dilated Network (CRDN) (4.2.1 Effects of Dilated Convolution in Crowd Counting; 4.2.2 The Proposed Network); 4.3 Experiments (4.3.1 Datasets and Experimental Settings; 4.3.2 Implementation Details; 4.3.3 Comparison with Other Methods; 4.3.4 Ablation Study; 4.3.5 Analysis on the Proposed Components); 4.4 Conclusion.
    5 Congestion-aware Bayesian Loss for Crowd Counting: 5.1 Overview; 5.2 Congestion-aware Bayesian Loss (5.2.1 Person-Scale Estimation; 5.2.2 Crowd-Sparsity Estimation; 5.2.3 Design of The Proposed Loss); 5.3 Experiments (5.3.1 Datasets; 5.3.2 Implementation Details; 5.3.3 Evaluation Metrics; 5.3.4 Ablation Study; 5.3.5 Comparisons with State of the Art; 5.3.6 Differences from Existing Person-scale Inference; 5.3.7 Analysis on the Proposed Components); 5.4 Summary.
    6 Conclusion.
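The dissertation's second contribution rests on dilated convolution, where a larger dilation rate lets the same kernel cover a wider area. The sketch below shows that relationship in one dimension using plain Python. It is an illustration of dilated convolution in general, not of the cascade residual dilated network itself, and the function names and sample signal are assumptions made for this example.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span of input positions covered by one dilated kernel application."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D dilated cross-correlation without padding ('valid' mode)."""
    span = effective_kernel_size(len(kernel), dilation)
    output = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        output.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)  # taps at offsets 0, 1, 2
wide = dilated_conv1d(signal, kernel, dilation=2)   # taps at offsets 0, 2, 4
```

With a three-tap kernel, dilation 1 covers 3 input samples and dilation 2 covers 5. This matches the abstract's intuition that small dilations focus on the center of each person while large dilations reach the periphery.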