
    Rethinking the competition between detection and ReID in Multi-Object Tracking

    Owing to their balanced accuracy and speed, one-shot models that jointly learn detection and ReID have drawn great attention in multi-object tracking (MOT). However, the differences between these two tasks in the one-shot tracking paradigm are often overlooked, leading to inferior performance compared with two-stage methods. In this paper, we dissect the reasoning processes of the two tasks. Our analysis reveals that the competition between them inevitably hurts the learning of task-dependent representations, which further impedes tracking performance. To remedy this issue, we propose a novel cross-correlation network that effectively impels the separate branches to learn task-dependent representations. Furthermore, we introduce a scale-aware attention network that learns discriminative embeddings to improve ReID capability. We integrate these delicately designed networks into a one-shot online MOT system, dubbed CSTrack. Without bells and whistles, our model achieves new state-of-the-art performance on MOT16 and MOT17. Our code is released at https://github.com/JudasDie/SOTS
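The abstract does not give the cross-correlation network's exact form, but the core idea — letting the detection and ReID branches exchange information through a learned channel affinity so each keeps task-dependent features — can be sketched as follows. All shapes, projection matrices, and the `cross_correlation_reweight` name are illustrative assumptions, not CSTrack's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_correlation_reweight(shared, w_det, w_reid):
    """Toy cross-correlation: each branch projects the shared backbone
    feature map, computes a channel-to-channel affinity against the
    other branch's view, and uses it to reweight its own channels."""
    C, H, W = shared.shape
    flat = shared.reshape(C, H * W)          # (C, HW)
    det = w_det @ flat                       # detection-branch view, (C, HW)
    reid = w_reid @ flat                     # ReID-branch view, (C, HW)
    # channel affinity between the two task-specific views
    affinity = softmax(det @ reid.T / np.sqrt(H * W), axis=-1)  # (C, C)
    det_out = (affinity @ det).reshape(C, H, W)
    reid_out = (affinity.T @ reid).reshape(C, H, W)
    return det_out, reid_out

C, H, W = 8, 4, 4
shared = rng.standard_normal((C, H, W))
w_det = rng.standard_normal((C, C)) * 0.1
w_reid = rng.standard_normal((C, C)) * 0.1
d, r = cross_correlation_reweight(shared, w_det, w_reid)
print(d.shape, r.shape)
```

Both branches receive the same spatial resolution back, so each can feed its own head (box regression vs. embedding) with features biased toward its task.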

    CREATING IDENTITY: HOW STEVE BIKO CULTURAL INSTITUTE’S BLACK CONSCIOUSNESS AND CITIZENSHIP INFLUENCES STUDENT IDENTITY FORMATION IN SALVADOR, BAHIA, BRAZIL

    The research presented in “Creating Identity” investigates Black identity formation within the Steve Biko Cultural Institute (Biko) in Salvador, Bahia, Brazil, a pre-vestibular – or college entrance exam preparation course – for Afro-Brazilian high school and aspiring college students. The curriculum, Cidadania e Consciência Negra (Black Consciousness and Citizenship; abbreviated CCN), serves as a vital pillar of the institutional approach to Black identity. In a Eurocentric society like Brazil, and a world where Black identity is widely discriminated against, including in educational spaces, Biko represents a movement to combat the exclusion of Afro-descendant youth from university, improve self-esteem and perceptions of the value of Black identity, and change who graduates from Bahia state universities. Over the course of nine months, in 2015 and 2016, field data were collected in the city of Salvador, Brazil, and at the Biko institute. Since the research was cross-linguistic, cross-cultural, and hosted internationally, I assumed a methodologically narrative approach. The research design incorporated a survey, interviews, observations, and document analysis. Forty-two students completed surveys; twenty-six Biko students, staff, and alumni participated in interviews; and well over 400 hours of participatory field observation were completed. Policy, demographic, and curricular documents were also analyzed. CCN heavily influenced participants’ identity development through student and teacher discourse. The institution is a center of critical activism in the community. Aside from being a major part of the instructional approach to preparation for the college entrance exam, CCN heavily influenced the relationships between participants and their families and friends over newly affirmed Black identities. Although Biko students and alumni became more socially alert to the racial issues in their communities, they remain at risk of being racially profiled.
Additionally, understanding blackness through the eyes of participants required an understanding of class and gender structures in Brazil. One major implication of the research for the participants is: blackness is CCN is Biko. As a result, knowledge production and interaction with universities by Biko students are heavily influenced by Biko tenets and ideologies addressing race and racism, prejudice, discrimination, women’s rights, and economic development.

    Investigating the effects of integrated systematic decoding, spelling and communication instruction for students with complex communication needs

    The purpose of the current study was to investigate the impact of an integrated decoding, spelling, and communication intervention on literacy and communication outcomes for students with complex communication needs (CCN). The study was conducted with three students with CCN, all of whom used a particular augmentative communication device. Using a non-concurrent multiple-baseline-across-subjects design and a descriptive case study design, the study tested the hypothesis that integrated instruction would lead to improvements in decoding, spelling, and communication using an AAC device. The intervention provided integrated, systematic, and explicit instruction through scripted lessons that taught students to decode, spell, and communicate the same corpus of high-frequency words. The intervention was grounded in general education constructivist-based practices and was provided daily by a consistent educator. Throughout the study, outside of directed instructional times, the frequency of spontaneous device use was measured across a baseline phase, an intervention phase, a 1-week post phase, and a 5-week post phase. Students’ progress was also measured across five pretest-posttest measures: word identification, developmental spelling, word generation, icon sequencing, and expressive communication. Results showed high day-to-day fluctuations in students’ spontaneous use of their communication devices. However, the most important finding was students’ progress on the literacy and communication pretest-posttests, which showed not only improvement in abilities but also generalization across reading, spelling, and communication measures. The findings suggest that integrated communication, decoding, and spelling instruction based on constructivist practices was successful.

    Action-oriented Scene Understanding

    In order to allow robots to act autonomously, it is crucial that they not only describe their environment accurately but also identify how to interact with their surroundings. While we have witnessed tremendous progress in descriptive computer vision, approaches that explicitly target action are comparatively scarce. This cumulative dissertation approaches the goal of interpreting visual scenes “in the wild” with respect to actions implied by the scene. We call this approach action-oriented scene understanding. It involves identifying and judging opportunities for interaction with constituents of the scene (e.g. objects and their parts) as well as understanding object functions and how interactions will impact the future. All of these aspects are addressed on three levels of abstraction: elements, perception, and reasoning. On the elementary level, we investigate semantic and functional grouping of objects by analyzing annotated natural image scenes. We compare object label-based and visual context definitions with respect to their suitability for generating meaningful object class representations. Our findings suggest that representations generated from visual context are on par in terms of semantic quality with those generated from large quantities of text. The perceptive level concerns action identification. We propose a system to identify possible interactions for robots and humans with the environment (affordances) on a pixel level using state-of-the-art machine learning methods. Pixel-wise part annotations of images are transformed into 12 affordance maps. Using these maps, a convolutional neural network is trained to densely predict affordance maps from unknown RGB images. In contrast to previous work, this approach operates exclusively on RGB images during both training and testing, and yet achieves state-of-the-art performance. At the reasoning level, we extend the question from asking what actions are possible to what actions are plausible.
For this, we gathered a dataset of household images associated with human ratings of the likelihoods of eight different actions. Based on the judgement provided by the human raters, we train convolutional neural networks to generate plausibility scores from unseen images. Furthermore, having considered only static scenes previously in this thesis, we propose a system that takes video input and predicts plausible future actions. Since this requires careful identification of relevant features in the video sequence, we analyze this particular aspect in detail using a synthetic dataset for several state-of-the-art video models. We identify feature learning as a major obstacle for anticipation in natural video data. The presented projects analyze the role of action in scene understanding from various angles and in multiple settings while highlighting the advantages of assuming an action-oriented perspective. We conclude that action-oriented scene understanding can augment classic computer vision in many real-life applications, in particular robotics.
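The dense-prediction setup described above — 12 affordance maps predicted per pixel from an RGB input — can be illustrated with a minimal sketch using a single 1×1 channel-mixing step. The weights and the `predict_affordance_maps` name are hypothetical; the real system stacks many convolutional layers, but the input/output shapes are the point here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_affordance_maps(rgb, weights, bias):
    """Toy dense predictor: one 1x1 convolution (pure channel mixing)
    mapping 3 RGB channels to 12 per-pixel affordance scores."""
    # weights: (12, 3); rgb: (3, H, W) -> logits: (12, H, W)
    logits = np.einsum('ac,chw->ahw', weights, rgb) + bias[:, None, None]
    return sigmoid(logits)  # one independent probability map per affordance

H, W = 16, 16
rgb = rng.random((3, H, W))
weights = rng.standard_normal((12, 3)) * 0.5
bias = np.zeros(12)
maps = predict_affordance_maps(rgb, weights, bias)
print(maps.shape)
```

Each of the 12 output maps is an independent sigmoid, reflecting that a single pixel can afford several interactions at once (e.g. both "grasp" and "lift").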

    Constrained Collective Movement in Human-Robot Teams

    This research focuses on improving human-robot co-navigation for teams of robots and humans navigating together as a unit while accomplishing a desired task. Frequently, the team’s co-navigation is strongly influenced by a predefined Standard Operating Procedure (SOP), which acts as a high-level guide for where agents should go and what they should do. In this work, I introduce the concept of Constrained Collective Movement (CCM) of a team to describe how members of the team perform inter-team and intra-team navigation to execute a joint task while balancing environmental and application-specific constraints. This work advances robots’ abilities to participate alongside humans in applications such as urban search and rescue, firefighters searching for people in a burning building, and military teams performing a building-clearing operation. Incorporating robots on such teams could reduce the number of human lives put in danger while increasing the team’s ability to conduct beneficial tasks such as carrying life-saving equipment to stranded people. Most previous work on generating more complex collaborative navigation for human-robot teams focuses solely on model-based methods. These methods usually require hard-coding the rules to follow, which demands much time and domain knowledge and can lead to unnatural behavior. This dissertation investigates merging high-level model-based knowledge representation with low-level behavior cloning to achieve CCM of a human-robot team performing collaborative co-navigation. To evaluate the approach, experiments are performed in simulation with the detail-rich game design engine Unity. Experiments show that the designed approach can learn elements of high-level behaviors with accuracies up to 88%. Additionally, the approach is shown to learn low-level robot control behaviors with accuracies up to 89%.
To the best of my knowledge, this is the first attempt to blend classical AI methods with state-of-the-art machine learning methods for human-robot team collaborative co-navigation. This not only allows for better human-robot team co-navigation, but also has implications for improving other teamwork-based human-robot applications such as joint manufacturing and socially assistive robotics.
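Behavior cloning, the low-level component mentioned above, reduces in its simplest form to supervised regression from observed states to demonstrated actions. A minimal sketch with synthetic data follows; the state and action dimensions, and the linear policy, are invented for illustration and do not reflect the dissertation's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations: a 4-dim robot state (e.g. own position plus
# offset to a teammate) paired with the 2-dim velocity command a human gave.
states = rng.standard_normal((200, 4))
true_policy = np.array([[0.5, -0.2, 0.1, 0.0],
                        [0.0, 0.3, -0.4, 0.2]])
actions = states @ true_policy.T + 0.01 * rng.standard_normal((200, 2))

# Behavior cloning at its simplest: regress demonstrated actions on states.
solution, *_ = np.linalg.lstsq(states, actions, rcond=None)
learned = solution.T                      # learned linear policy, (2, 4)

# Imitation quality on the demonstration set
pred = states @ learned.T
mse = float(np.mean((pred - actions) ** 2))
print(round(mse, 4))
```

Real systems replace the linear map with a neural network and add the high-level, model-based layer to decide *which* cloned behavior to execute, but the supervised-learning core is the same.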

    Towards a Fast and Accurate Face Recognition System from Deep Representations

    The key components of a machine perception algorithm are feature extraction followed by classification or regression. The features representing the input data should have the following desirable properties: 1) they should contain the discriminative information required for accurate classification, 2) they should be robust and adaptive to several variations in the input data due to illumination, translation/rotation, resolution, and input noise, 3) they should lie on a simple manifold for easy classification or regression. Over the years, researchers have come up with various hand-crafted techniques to extract meaningful features. However, these features do not perform well for data collected in unconstrained settings due to large variations in appearance and other nuisance factors. Recent developments in deep convolutional neural networks (DCNNs) have shown impressive performance improvements in various machine perception tasks such as object detection and recognition. DCNNs are highly non-linear regressors because of the presence of hierarchical convolutional layers with non-linear activation. Unlike hand-crafted features, DCNNs learn the feature extraction and feature classification/regression modules from the data itself in an end-to-end fashion. This enables the DCNNs to be robust to variations present in the data and at the same time improve their discriminative ability. Ever-increasing computation power and availability of large datasets have led to significant performance gains from DCNNs. However, these developments in deep learning are not directly applicable to face analysis tasks due to large variations in illumination, resolution, viewpoint, and attributes of faces acquired in unconstrained settings.
In this dissertation, we address this issue by developing efficient DCNN architectures and loss functions for multiple face analysis tasks such as face detection, pose estimation, landmark localization, and face recognition from unconstrained images and videos. In the first part of this dissertation, we present two face detection algorithms based on deep pyramidal features. The first face detector, called DP2MFD, utilizes the concepts of the deformable parts model (DPM) in the context of deep learning. It is able to detect faces of various sizes and poses in unconstrained conditions. It reduces the gap between training and testing of DPM on deep features by adding a normalization layer to the DCNN. The second face detector, called Deep Pyramid Single Shot Face Detector (DPSSD), is fast and capable of detecting faces with large scale variations (especially tiny faces). It makes use of the inbuilt pyramidal hierarchy present in a DCNN, instead of creating an image pyramid. Extensive experiments on publicly available unconstrained face detection datasets show that both face detectors are able to capture the meaningful structure of faces and perform significantly better than many traditional face detection algorithms. In the second part of this dissertation, we present two algorithms for simultaneous face detection, landmark localization, pose estimation, and gender recognition using DCNNs. The first method, called HyperFace, fuses the intermediate layers of a DCNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. The second approach, called All-In-One Face, extends HyperFace to incorporate the additional tasks of face verification, age estimation, and smile detection. HyperFace and All-In-One Face exploit the synergy among the tasks, which improves their individual performances.
In the third part of this dissertation, we focus on improving the task of face verification by designing a novel loss function that maximizes the inter-class distance and minimizes the intra-class distance in the feature space. We propose a new loss function, called Crystal Loss, that adds an L2 constraint to the feature descriptors which restricts them to lie on a hypersphere of a fixed radius. This module can be easily implemented using existing deep learning frameworks. We show that integrating this simple step in the training pipeline significantly boosts the performance of face verification. We additionally describe a deep learning pipeline for unconstrained face identification and verification which achieves state-of-the-art performance on several benchmark datasets. We provide the design details of the various modules involved in automatic face recognition: face detection, landmark localization and alignment, and face identification/verification. We present experimental results for end-to-end face verification and identification on IARPA Janus Benchmarks A, B and C (IJB-A, IJB-B, IJB-C), and the Janus Challenge Set 5 (CS5). Though DCNNs have surpassed human-level performance on tasks such as object classification and face verification, they can easily be fooled by adversarial attacks. These attacks add a small perturbation to the input image that causes the network to misclassify the sample. In the final part of this dissertation, we focus on safeguarding DCNNs and neutralizing adversarial attacks by compact feature learning. In particular, we show that learning features in a closed and bounded space improves the robustness of the network. We explore the effect of Crystal Loss, which enforces compactness in the learned features, resulting in enhanced robustness to adversarial perturbations. Additionally, we propose compact convolution, a novel method of convolution that, when incorporated into conventional CNNs, improves their robustness.
Compact convolution ensures feature compactness at every layer, such that the features are bounded and close to each other. Extensive experiments show that Compact Convolutional Networks (CCNs) neutralize multiple types of attacks and perform better than existing methods in defending against adversarial attacks, without incurring any additional training overhead compared to CNNs.
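The Crystal Loss idea described above — constraining feature descriptors to a hypersphere of fixed radius before the softmax — can be sketched in a few lines. This is a generic L2-constrained softmax illustration, not the authors' implementation; `alpha` and all shapes here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def crystal_loss(features, weights, labels, alpha=16.0):
    """L2-constrained softmax cross-entropy: scale each feature to
    L2 norm alpha (a hypersphere of fixed radius), then apply the
    usual softmax cross-entropy against the class labels."""
    norm = np.linalg.norm(features, axis=1, keepdims=True)
    feats = alpha * features / norm            # every row now has norm alpha
    logits = feats @ weights                   # (N, num_classes)
    # numerically stable log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(log_probs[np.arange(len(labels)), labels]))

features = rng.standard_normal((8, 32))
weights = rng.standard_normal((32, 5)) * 0.1
labels = rng.integers(0, 5, size=8)
loss = crystal_loss(features, weights, labels)
print(loss)
```

Because every feature sits at the same radius, the loss can no longer be reduced by inflating feature magnitudes; only the angular separation between classes matters, which is the compactness property the dissertation exploits for both verification accuracy and adversarial robustness.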