2,620 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, including computer vision (CV), speech recognition, and natural language processing. Although remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.
    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Prioritized Planning for Target-Oriented Manipulation via Hierarchical Stacking Relationship Prediction

    In scenarios involving the grasping of multiple targets, learning the stacking relationships between objects is fundamental for robots to execute grasps safely and efficiently. However, current methods do not subdivide stacking relationships into hierarchical types. In scenes where objects are mostly stacked in an orderly manner, they are incapable of making human-like, highly efficient grasping decisions. This paper proposes a perception-planning method that distinguishes different stacking types between objects and generates prioritized manipulation order decisions based on given target designations. We utilize a Hierarchical Stacking Relationship Network (HSRN) to discriminate the hierarchy of stacking and generate a refined Stacking Relationship Tree (SRT) for relationship description. Considering that objects with high stacking stability can be grasped together if necessary, we introduce an elaborate decision-making planner based on the Partially Observable Markov Decision Process (POMDP), which leverages observations, robustly generates the decision chain requiring the fewest grasps, and is suitable for specifying multiple targets simultaneously. To verify our work, we set the scene to a dining table and augment the REGRAD dataset with a set of common tableware models for network training. Experiments show that our method effectively generates grasping decisions that conform to human requirements and improves execution efficiency compared with existing methods while maintaining the success rate.
    Comment: 8 pages, 8 figures

    Apperceptive patterning: Artefaction, extensional beliefs and cognitive scaffolding

    In “Psychopower and Ordinary Madness” my ambition, as it relates to Bernard Stiegler’s recent literature, was twofold: 1) critiquing Stiegler’s work on exosomatization and artefactual posthumanism (or, more specifically, nonhumanism) to problematize approaches to media archaeology that rely upon technical exteriorization; 2) challenging how Stiegler engages with Giuseppe Longo and Francis Bailly’s conception of negative entropy. These efforts were directed by a prevalent techno-cultural qualifier: the rise of Synthetic Intelligence (including neural nets, deep learning, predictive processing and Bayesian models of cognition). This paper continues this project but first directs a critical analytic lens at the Derridean practice of the ontologization of grammatization from which Stiegler emerges, while also distinguishing how metalanguages operate in relation to object-oriented environmental interaction by way of inferentialism. Stalking continental (Kapp, Simondon, Leroi-Gourhan, etc.) and analytic traditions (e.g., Carnap, Chalmers, Clark, Sutton, Novaes, etc.), we move from artefacts to AI and Predictive Processing so as to link theories related to technicity with philosophy of mind. Simultaneously drawing forth Robert Brandom’s conceptualization of the roles that commitments play in retrospectively reconstructing the social experiences that lead to our endorsement(s) of norms, we complement this account with Reza Negarestani’s deprivatized account of intelligence while analyzing the equipollent role between language and media (both digital and analog).

    Efficient Belief Propagation for Perception and Manipulation in Clutter

    Autonomous service robots are required to perform tasks in common human indoor environments. To achieve goals associated with these tasks, the robot should continually perceive, reason about its environment, and plan to manipulate objects, which we term goal-directed manipulation. Perception remains the most challenging of these stages, as common indoor environments typically pose problems in recognizing objects under inherent occlusions and physical interactions among the objects themselves. Despite recent progress in the field of robot perception, accommodating perceptual uncertainty due to partial observations remains challenging and needs to be addressed to achieve the desired autonomy. In this dissertation, we address the problem of perception under uncertainty for robot manipulation in cluttered environments using generative inference methods. Specifically, we aim to enable robots to perceive partially observable environments by maintaining an approximate probability distribution as a belief over possible scene hypotheses. This belief representation captures uncertainty resulting from inter-object occlusions and physical interactions, which are inherently present in cluttered indoor environments. The research efforts presented in this thesis are directed towards developing appropriate state representations and inference techniques to generate and maintain such a belief over contextually plausible scene states. We focus on providing the following features to generative inference while addressing the challenges due to occlusions: 1) generating and maintaining plausible scene hypotheses, 2) reducing the inference search space, which typically grows exponentially with the number of objects in a scene, 3) preserving scene hypotheses over continual observations. To generate and maintain plausible scene hypotheses, we propose physics-informed scene estimation methods that combine a Newtonian physics engine with a particle-based generative inference framework.
The proposed variants of our method, with and without a Monte Carlo step, showed promising results in generating and maintaining plausible hypotheses under complete occlusions. We show that estimating such scenarios would not be possible with the commonly adopted 3D registration methods, which lack the notion of physical context that our method provides. To scale up the context-informed inference to accommodate a larger number of objects, we describe a factorization of scene state into objects and object parts to perform collaborative particle-based inference. This resulted in the Pull Message Passing for Nonparametric Belief Propagation (PMPNBP) algorithm, which caters to the demands of the high-dimensional multimodal nature of cluttered scenes while being computationally tractable. We demonstrate that PMPNBP is orders of magnitude faster than the state-of-the-art Nonparametric Belief Propagation method. Additionally, we show that PMPNBP successfully estimates poses of articulated objects under various simulated occlusion scenarios. To extend our PMPNBP algorithm to tracking object states over continuous observations, we explore ways to propose and preserve hypotheses effectively over time. This resulted in an augmentation-selection method, where hypotheses are drawn from various proposals, followed by the selection, using PMPNBP, of a subset that explains the current state of the objects. We discuss our augmentation-selection method and compare it with its counterparts in the belief propagation literature. Furthermore, we develop an inference pipeline for pose estimation and tracking of articulated objects in clutter. In this pipeline, the message passing module with the augmentation-selection method is informed by segmentation heatmaps from a trained neural network.
In our experiments, we show that our proposed pipeline can effectively maintain belief and track articulated objects over a sequence of observations under occlusion.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163159/1/kdesingh_1.pd

    ํŠธ๋žœ์Šคํฌ๋จธ๋ฅผ ํ†ตํ•œ ๋ณต์žกํ•œ ์ถ”๋ก  ๋Šฅ๋ ฅ ์ •๋ณต์„ ์œ„ํ•œ ์—ฐ๊ตฌ: ์‹œ๊ฐ์ , ๋Œ€ํ™”์ , ์ˆ˜ํ•™์  ์ถ”๋ก ์—์˜ ์ ์šฉ

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2021. 2. Sungzoon Cho.
    As deep learning models have advanced, research has increasingly focused on sophisticated tasks that require complex reasoning rather than simple classification. These complex tasks require multiple reasoning steps that resemble human intelligence. Architecture-wise, recurrent neural networks and convolutional neural networks have long been the mainstream models for deep learning. However, both models suffer from shortcomings inherent to their architectures. Nowadays, the attention-based Transformer is replacing them due to its superior architecture and performance. In particular, the encoder of the Transformer has been extensively studied in the field of natural language processing. However, for the Transformer to be effective on data with distinct structures and characteristics, appropriate adjustments to its structure are required. In this dissertation, we propose novel architectures based on the Transformer encoder for various supervised learning tasks with different data types and characteristics. The tasks that we consider are visual IQ tests, dialogue state tracking and mathematical question answering. For the visual IQ test, the input is in a visual format with hierarchy. To deal with this, we propose a hierarchical Transformer encoder with structured representation that employs a novel neural network architecture to improve both perception and reasoning. The hierarchical arrangement of the Transformer encoders and the architecture of each individual encoder are both fitted to the characteristics of visual IQ test data. For dialogue state tracking, value prediction for multiple domain-slot pairs is required. To address this issue, we propose a dialogue state tracking model that uses a pre-trained language model, which is a pre-trained Transformer encoder, for domain-slot relationship modeling. We introduce special tokens for each domain-slot pair, which enable effective dependency modeling among domain-slot pairs through the pre-trained language encoder. Finally, for mathematical question answering, we propose a method to pre-train a Transformer encoder on a mathematical question answering dataset for improved performance. Our pre-training method, Question-Answer Masked Language Modeling, utilizes both the question and answer text, which makes it suitable for the mathematical question answering dataset. Through experiments, we show that each of our proposed methods is effective for its corresponding task and data type.
    Contents: Abstract; Contents; List of Tables; List of Figures.
    Chapter 1 Introduction.
    Chapter 2 Literature Review: 2.1 Related Works on Transformer; 2.2 Related Works on Visual IQ Tests (2.2.1 RPM-related studies; 2.2.2 Object Detection related studies); 2.3 Related works on Dialogue State Tracking; 2.4 Related Works on Mathematical Question Answering (2.4.1 Pre-training of Neural Networks; 2.4.2 Language Model Pre-training; 2.4.3 Mathematical Reasoning with Neural Networks).
    Chapter 3 Hierarchical end-to-end architecture of Transformer encoders for solving visual IQ tests: 3.1 Background (3.1.1 Perception; 3.1.2 Reasoning); 3.2 Proposed Model (3.2.1 Perception Module: Object Detection Model; 3.2.2 Reasoning Module: Hierarchical Transformer Encoder; 3.2.3 Contrasting Module and Loss function); 3.3 Experimental results (3.3.1 Dataset; 3.3.2 Experimental Setup; 3.3.3 Results for Perception Module; 3.3.4 Results for Reasoning Module; 3.3.5 Ablation studies); 3.4 Chapter Summary.
    Chapter 4 Domain-slot relationship modeling using Transformers for dialogue state tracking: 4.1 Background; 4.2 Proposed Method (4.2.1 Domain-Slot-Context Encoder; 4.2.2 Slot-gate classifier; 4.2.3 Slot-value classifier; 4.2.4 Total objective function); 4.3 Experimental Results (4.3.1 Dataset; 4.3.2 Experimental Setup; 4.3.3 Results for the MultiWOZ-2.1 dataset; 4.3.4 Ablation Studies); 4.4 Chapter Summary.
    Chapter 5 Pre-training of Transformers with Question-Answer Masked Language Modeling for Mathematical Question Answering: 5.1 Background; 5.2 Proposed Method (5.2.1 Pre-training: Question-Answer Masked Language Modeling; 5.2.2 Fine-tuning: Mathematical Question Answering); 5.3 Experimental Results (5.3.1 Dataset; 5.3.2 Experimental Setup; 5.3.3 Experimental Results on the Mathematics dataset); 5.4 Chapter Summary.
    Chapter 6 Conclusion: 6.1 Contributions; 6.2 Future Work.
    Bibliography; Abstract in Korean; Acknowledgments.

    Learning Visual Patterns: Imposing Order on Objects, Trajectories and Networks

    This work considers the understanding of observed visual patterns in static images and dynamic scenes, a problem fundamental to many tasks in computer vision. Within this broad domain, we focus on three particular subtasks, contributing novel solutions to: (a) the subordinate categorization of objects (avian species specifically), (b) the analysis of multi-agent interactions using the agent trajectories, and (c) the estimation of camera network topology. In contrast to object recognition, where the presence or absence of certain parts is generally indicative of basic-level category, the problem of subordinate categorization rests on the ability to establish salient distinctions amongst the characteristics of those parts which comprise the basic-level category. Focusing on an avian domain due to the fine-grained structure of the category taxonomy, we explore a pose-normalized appearance model based on a volumetric poselet scheme. The variation in shape and appearance properties of these parts across a taxonomy provides the cues needed for subordinate categorization. Our model associates the underlying image pattern parameters used for detection with corresponding volumetric part location, scale and orientation parameters. These parameters implicitly define a mapping from the image pixels into a pose-normalized appearance space, removing view and pose dependencies and facilitating fine-grained categorization with relatively few training examples. We next examine the problem of leveraging trajectories to understand interactions in dynamic multi-agent environments. We focus on perceptual tasks, those for which an agent's behavior is governed largely by the individuals and objects around it. We introduce kinetic accessibility, a model for evaluating the perceived, and thus anticipated, movements of other agents. This new model is then applied to the analysis of basketball footage.
The kinetic accessibility measures are coupled with low-level visual cues and domain-specific knowledge for determining which player has possession of the ball and for recognizing events such as passes, shots and turnovers. Finally, we present two differing approaches for estimating camera network topology. The first technique seeks to partition a set of observations made in the camera network into individual object trajectories. As exhaustive consideration of the partition space is intractable, partitions are considered incrementally, adding observations while pruning unlikely partitions. Partition likelihood is determined by the evaluation of a probabilistic graphical model, balancing the consistency of appearances across a hypothesized trajectory with the latest predictions of camera adjacency. A primary benefit of estimating object trajectories is that higher-order statistics, as opposed to just first-order adjacency, can be derived, yielding resilience to camera failure and the potential for improved tracking performance between cameras. Unlike the first, centralized technique, the second takes a decentralized approach, estimating the global network topology with local computations using sequential Bayesian estimation on a modified multinomial distribution. Key to this method is an information-theoretic appearance model for observation weighting. The inherently distributed nature of the approach allows the simultaneous utilization of all sensors as processing agents in collectively recovering the network topology.
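
The decentralized idea in the abstract above (each camera updating a local belief about where departing objects reappear, with observations weighted by appearance confidence) can be loosely illustrated as follows. This is a minimal sketch under stated assumptions, not the thesis's method: it swaps in a plain Dirichlet-multinomial update with fractional counts for the "modified multinomial distribution", and the class name `LocalTopologyBelief`, the `weight` parameter and the example weights are hypothetical.

```python
class LocalTopologyBelief:
    """Belief held by ONE camera about which camera its departing objects reach next.

    Assumption: a Dirichlet prior over a multinomial, updated with fractional
    counts. The appearance weight is supplied by the caller; how the thesis
    computes it is not described in the abstract.
    """

    def __init__(self, name, other_cameras, prior=1.0):
        self.name = name
        # One pseudo-count per candidate destination camera.
        self.counts = {cam: float(prior) for cam in other_cameras}

    def update(self, destination, weight=1.0):
        """Fold in one observed hand-off; weight in [0, 1] is the match confidence."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        self.counts[destination] += weight

    def posterior_mean(self):
        """Expected transition probabilities under the current Dirichlet belief."""
        total = sum(self.counts.values())
        return {cam: count / total for cam, count in self.counts.items()}


# Camera A sees three confident hand-offs to B and one doubtful hand-off to C.
node = LocalTopologyBelief("A", ["B", "C"])
for _ in range(3):
    node.update("B", weight=0.9)
node.update("C", weight=0.2)
posterior = node.posterior_mean()
print(posterior)
```

Because each node stores only its own row of counts and updates it one observation at a time, the computation stays local and sequential; combining the rows from all cameras would give an estimate of the global adjacency structure.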