67 research outputs found

    Connecting Perception with Cognition for Deep Representations Learning

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    A long-standing goal in computer vision is to enable agents to perceive and understand the visual world. Significant progress has been made over the last decades thanks to advances in data collection and computing infrastructure. Perception is envisaged as an interface that allows systems to interact with and learn from the surrounding environment. However, most tasks still depend on assigning intensive labels (e.g., category, location and description) to visual data, and development has been limited for tasks where supervision is difficult or impossible to obtain. Humans naturally process information by "thinking", that is, through a cognitive process. If we can build generalized models by connecting perception with cognitive inference, tasks such as domain adaptation and cross-modal learning would benefit from the richer representations learned. The aim of this thesis is to explore the importance of cognition in representation learning for different visual tasks. In doing so, we model vision-based systems with the following objectives: i) it is critical to learn consistent and robust embeddings that are resistant to confounding factors, for a better understanding of the visual world; ii) imagination is essential for models to build connections between visual inputs and other modalities, e.g., 3D models for reconstruction, and language descriptions for visual grounding and reasoning; iii) it is important to leverage prior knowledge when learning from and exploring out-of-domain data in the wild, since models must be able to increase their universality without forgetting previously acquired knowledge. This thesis elaborates on alternative methods for learning deep representations for visual tasks. The key idea is to build fundamental correspondences as objectives for supervising the learning procedure.
    Accordingly, models and techniques are developed to connect visual inputs with latent semantics. In particular, we validate the proposed methodology under different scenarios with multiple modalities. We also investigate connecting perception with cognition to make the most of in-domain knowledge and to generalize to out-of-domain data, e.g., in domain adaptation and open-set recognition. We implemented these ideas on the related benchmarks, and the competitive performance demonstrates their effectiveness and universality.

    Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data

    Inferring the stereo structure of objects in the real world is a challenging yet practical task. Equipping deep models with this ability usually requires abundant 3D supervision, which is hard to acquire. Synthetic data is a promising alternative, because pairwise ground truth is easy to access there. Nevertheless, the domain gaps are nontrivial given the variation in texture, shape and context. To overcome these difficulties, we propose a Visio-Perceptual Adaptive Network for single-view 3D reconstruction, dubbed VPAN. To generalize the model to real scenarios, we address three aspects: (1) Look: visually incorporate spatial structure from the single view to enhance the expressiveness of the representation; (2) Cast: perceptually align the 2D image features to the 3D shape priors with cross-modal semantic contrastive mapping; (3) Mold: reconstruct the stereo shape of the target by transforming embeddings into the desired manifold. Extensive experiments on several benchmarks demonstrate the effectiveness and robustness of the proposed method in learning the 3D shape manifold from synthetic data from a single view. The proposed method outperforms state-of-the-art methods on the Pix3D dataset with IoU 0.292 and CD 0.108, and reaches IoU 0.329 and CD 0.104 on Pascal 3D+.
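The IoU and CD figures quoted above are overlap and distance metrics between a predicted and a reference shape. The following is a minimal sketch of how such metrics are commonly computed, not the evaluation code of the paper. It assumes shapes are given as sets of occupied voxel coordinates (for IoU) and as point lists (for Chamfer distance, CD), and it uses the unsquared, mean-based Chamfer variant; the abstract does not state which variant or normalization was used. The function names are illustrative.

```python
import math


def voxel_iou(pred, target):
    """Intersection over union of two sets of occupied voxel coordinates."""
    pred, target = set(pred), set(target)
    union = pred | target
    if not union:
        # Convention chosen for this sketch: two empty shapes count as identical.
        return 1.0
    return len(pred & target) / len(union)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance (assumed variant: mean nearest-neighbour
    Euclidean distance, summed over both directions)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


if __name__ == "__main__":
    pred = {(0, 0, 0), (1, 0, 0)}
    target = {(1, 0, 0), (2, 0, 0)}
    print(voxel_iou(pred, target))
    print(chamfer_distance([(0, 0, 0)], [(3, 4, 0)]))
```

A higher IoU and a lower CD both indicate a closer match, which is the sense in which the numbers in the abstract are reported.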

    Preliminary design of a short takeoff and landing cargo aircraft with a payload capacity of up to 30 tons

    This work is published in accordance with the order of the Rector of NAU dated 27.05.2021, No. 311/од, "On placing the qualification works of higher education applicants in the university repository". Supervisor: Associate Professor, Candidate of Technical Sciences Volodymyr Serhiiovych Krasnopolskyi.
    This thesis is devoted to the preliminary design of a short takeoff and landing cargo aircraft that meets international flight standards and provides safe, efficient and reliable cargo transportation. It considers the thickness of the leading edge of the wing rib, an optimized design of the holes to reduce weight, and a lightweight design of the wing rib. The thesis uses finite element analysis, parameter optimization, topology optimization, and DM system modeling in Ansys, together with its finite element analysis and optimization modules. The practical significance of the results of the master's thesis lies in making the aircraft lighter and increasing its range. The materials of the master's thesis can be used in the educational process and in the practical work of designers at professional design institutions.

    Uncertainty-Aware Consistency Regularization for Cross-Domain Semantic Segmentation

    Unsupervised domain adaptation (UDA) aims to adapt existing models of the source domain to a new target domain with only unlabeled data. Many adversarial UDA methods suffer from highly unstable training and require careful tuning of the optimization procedure. Some non-adversarial UDA methods apply a consistency regularization to the target predictions of a student model and a teacher model under different perturbations, where the teacher shares the same architecture as the student and is updated by the exponential moving average of the student. However, these methods suffer from noticeable negative transfer resulting from either the error-prone discriminator network or the unreasonable teacher model. In this paper, we propose an uncertainty-aware consistency regularization method for cross-domain semantic segmentation. By exploiting the latent uncertainty information of the target samples, more meaningful and reliable knowledge from the teacher model can be transferred to the student model. In addition, we reveal why current consistency regularization is often unstable in minimizing the distribution discrepancy. We also show that our method can effectively ease this issue by mining the most reliable and meaningful samples with a dynamic weighting scheme for the consistency loss. Experiments demonstrate that the proposed method outperforms the state-of-the-art methods on two domain adaptation benchmarks, i.e., GTAV → Cityscapes and SYNTHIA → Cityscapes.
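The teacher-student setup described above can be illustrated with a small sketch. It shows the exponential moving average (EMA) teacher update mentioned in the abstract, plus one possible way to down-weight the consistency loss where the teacher is uncertain. The entropy-based weighting, the squared-error consistency term, and all function names are assumptions made for illustration; the abstract does not specify the paper's actual uncertainty estimate or weighting scheme.

```python
import math


def ema_update(teacher, student, decay=0.99):
    """EMA teacher update: decay * teacher + (1 - decay) * student, per weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def entropy(probs):
    """Shannon entropy of a probability vector (zero entries are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weighted_consistency(student_probs, teacher_probs):
    """Squared-error consistency, scaled by teacher certainty.

    Assumption for this sketch: certainty = 1 - normalized teacher entropy,
    so a uniform (maximally uncertain) teacher prediction contributes nothing.
    Requires at least two classes.
    """
    max_entropy = math.log(len(teacher_probs))
    certainty = 1.0 - entropy(teacher_probs) / max_entropy
    mse = sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs))
    return certainty * mse / len(teacher_probs)


if __name__ == "__main__":
    print(ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9))
    print(weighted_consistency([0.5, 0.5], [1.0, 0.0]))
    print(weighted_consistency([0.9, 0.1], [0.5, 0.5]))
```

In this sketch a confident teacher prediction passes its full consistency signal to the student, while an uncertain one is suppressed, which is the general idea of transferring only reliable knowledge.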

    Context-Aware Mixup for Domain Adaptive Semantic Segmentation

    Unsupervised domain adaptation (UDA) aims to adapt a model of the labeled source domain to an unlabeled target domain. Existing UDA-based semantic segmentation approaches typically reduce the domain shift at the pixel level, feature level, and output level. However, almost all of them largely neglect contextual dependency, which is generally shared across different domains, leading to suboptimal performance. In this paper, we propose a novel Context-Aware Mixup (CAMix) framework for domain adaptive semantic segmentation, which exploits this important clue of context dependency as explicit prior knowledge in a fully end-to-end trainable manner to enhance adaptability toward the target domain. First, we present a contextual mask generation strategy that leverages the accumulated spatial distributions and prior contextual relationships. The generated contextual mask is critical in this work and guides the context-aware domain mixup on three different levels. In addition, given the context knowledge, we introduce a significance-reweighted consistency loss to penalize the inconsistency between the mixed student prediction and the mixed teacher prediction, which alleviates the negative transfer of the adaptation, e.g., early performance degradation. Extensive experiments and analysis demonstrate the effectiveness of our method against state-of-the-art approaches on widely used UDA benchmarks.
    Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

    DMT: Dynamic Mutual Training for Semi-Supervised Learning

    Recent semi-supervised learning methods use pseudo supervision as their core idea, especially self-training methods that generate pseudo labels. However, pseudo labels are unreliable. Self-training methods usually rely on the prediction confidence of a single model to filter out low-confidence pseudo labels, thus retaining high-confidence errors and wasting many correct low-confidence labels. In this paper, we point out that it is difficult for a model to counter its own errors. Instead, leveraging the disagreement between different models is key to locating pseudo-label errors. With this new viewpoint, we propose mutual training between two different models with a dynamically re-weighted loss function, called Dynamic Mutual Training (DMT). We quantify inter-model disagreement by comparing predictions from two different models to dynamically re-weight the loss in training, where a larger disagreement indicates a possible error and corresponds to a lower loss value. Extensive experiments show that DMT achieves state-of-the-art performance in both image classification and semantic segmentation. Our code is released at https://github.com/voldemortX/DST-CBC.
    Comment: Reformatte
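The idea of re-weighting a pseudo-label loss by inter-model disagreement can be sketched as follows. This is a simplified illustration, not the DMT loss itself: it assumes model A supplies the pseudo label, model B is the one being trained, and the weight is the probability that B assigns to A's label raised to a hypothetical exponent `gamma`. The function names and the exact weighting rule are assumptions; the released code at the URL above is the authoritative reference.

```python
import math


def pseudo_label(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def dynamic_weight(probs_a, probs_b, gamma=2.0):
    """Pseudo label from model A and its weight when training model B.

    Assumed rule for this sketch: the weight shrinks as model B assigns less
    probability to A's label, i.e. as the two models disagree more.
    """
    label = pseudo_label(probs_a)
    return label, probs_b[label] ** gamma


def reweighted_loss(probs_a, probs_b, gamma=2.0):
    """Cross-entropy of model B on A's pseudo label, scaled by the weight."""
    label, weight = dynamic_weight(probs_a, probs_b, gamma)
    return -weight * math.log(max(probs_b[label], 1e-12))


if __name__ == "__main__":
    agree = dynamic_weight([0.9, 0.1], [0.8, 0.2])
    disagree = dynamic_weight([0.9, 0.1], [0.2, 0.8])
    print(agree, disagree)
    print(reweighted_loss([0.9, 0.1], [0.8, 0.2]))
```

In mutual training the roles would then be swapped, so that each model is trained on the other's pseudo labels with its own disagreement-based weights.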

    Fixed and mobile energy storage coordination optimization method for enhancing photovoltaic integration capacity considering voltage offset

    Mobile energy storage is flexible and widely applicable. Combined with fixed energy storage, it can effectively address the supply-demand imbalance caused by connecting large-scale photovoltaics (PV), electric vehicles and other fluctuating loads to the grid. To this end, this paper proposes a coordinated two-layer optimization strategy for fixed and mobile energy storage that takes voltage offsets into account, with the aim of improving local PV consumption. The upper-layer optimization model minimizes the operating cost of fixed and mobile energy storage, and the lower-layer optimization model minimizes the voltage offset through 24-h optimal scheduling of fixed and mobile energy storage in order to improve local PV consumption capacity. In addition, considering the multidimensional nonlinear characteristics of the model, the interaction force between particles in the universe is introduced, and a hybrid particle swarm-gravitational search algorithm (PSO-GSA) is proposed to solve the model. It combines the individual optimization of the particle swarm algorithm with the local search capability of the gravitational search algorithm, which improves the optimization accuracy. Finally, the feasibility and effectiveness of the proposed model and method are verified by simulation analysis on an IEEE 33-node system.
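The hybrid PSO-GSA idea can be illustrated with a compact sketch. It follows one commonly described form of the hybrid, in which each agent's velocity combines a gravitational acceleration term (from GSA) with attraction toward the best solution found so far (from PSO). It is applied here to a toy sphere function, not to the two-layer storage scheduling model, and every parameter value and name (`w`, `c1`, `c2`, `g0`, `alpha`, the search bounds) is an assumption chosen for illustration rather than taken from the paper.

```python
import math
import random


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def pso_gsa(objective, dim=2, n_agents=10, iters=50, seed=0,
            w=0.5, c1=0.5, c2=1.5, g0=1.0, alpha=20.0):
    """Minimize `objective`; returns (best position, best value, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    fit = [objective(p) for p in pos]
    best_idx = min(range(n_agents), key=fit.__getitem__)
    gbest, gbest_fit = pos[best_idx][:], fit[best_idx]
    history = [gbest_fit]

    for t in range(iters):
        # GSA masses: better (lower) fitness gives a heavier agent.
        best, worst = min(fit), max(fit)
        spread = worst - best
        raw = [(worst - f) / spread if spread > 0 else 1.0 for f in fit]
        total = sum(raw)
        mass = [m / total for m in raw]
        g = g0 * math.exp(-alpha * t / iters)  # decaying gravitational constant

        # Accelerations are computed for all agents before any position moves.
        acc = [[0.0] * dim for _ in range(n_agents)]
        for i in range(n_agents):
            for j in range(n_agents):
                if i == j:
                    continue
                dist = math.dist(pos[i], pos[j]) + 1e-12
                for d in range(dim):
                    pull = g * mass[j] * (pos[j][d] - pos[i][d]) / dist
                    acc[i][d] += rng.random() * pull

        # PSO-style velocity update with the GSA acceleration as one term.
        for i in range(n_agents):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * acc[i][d]
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]

        fit = [objective(p) for p in pos]
        best_idx = min(range(n_agents), key=fit.__getitem__)
        if fit[best_idx] < gbest_fit:
            gbest, gbest_fit = pos[best_idx][:], fit[best_idx]
        history.append(gbest_fit)

    return gbest, gbest_fit, history


if __name__ == "__main__":
    position, value, _ = pso_gsa(sphere)
    print(position, value)
```

Because the best solution is only replaced when a strictly better one is found, the recorded best value never increases from one iteration to the next.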