
    Toward the Generalization of Vision Models

    In the field of computer vision, the ability to generalize is essential for robust and reliable performance across diverse datasets and real-world scenarios. Generalization refers to a model's capability to accurately interpret and process new, unseen data beyond the specific examples it was trained on. Real-world visual data vary widely in lighting conditions, backgrounds, viewpoints, occlusions, and other factors, making generalization crucial for effective performance. A computer vision system with strong generalization ability can identify objects, recognize patterns, and make accurate predictions even when presented with variations it has not encountered during training. Achieving robust generalization requires careful algorithm design, training data that captures real-world variability, regularization techniques to prevent overfitting, and rigorous evaluation methodologies. Without robust generalization, computer vision models may perform well on training data but fail on new, unseen data, limiting their real-world applicability. In this dissertation, I present my research on enhancing the generalization ability of vision models from several perspectives. First, I focus on designing vision algorithms that generalize to novel viewpoints and occlusions, tackling challenges commonly encountered in real-world scenarios. Second, I investigate the architectural differences between vision transformers and convolutional neural networks (CNNs) to better understand their impact on generalization, particularly for out-of-distribution data and adversarial attacks. Third, I propose a novel self-supervised learning algorithm, Point-Level Region Contrast for Object Detection Pre-Training, aimed at providing strong backbone features for generalization across various downstream tasks. Additionally, I introduce efficient masked autoencoding techniques for enhanced representation learning. Finally, I propose Sequential Modeling Enables Scalable Learning for Large Vision Models, a methodology designed to effectively leverage large volumes of real-world visual data and thereby further enhance generalization. Through these contributions, I aim to advance the field toward more generalized and robust vision models with broad applicability across diverse real-world scenarios.
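The masked autoencoding mentioned in this abstract can be illustrated with a minimal sketch. The shapes, the 75% mask ratio, and the function names below are illustrative assumptions, not the dissertation's implementation: the core idea is simply to hide most image patches and ask a model to reconstruct them from the visible remainder.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(patches, mask_ratio=0.75):
    """Randomly hide a fraction of patches, as in masked autoencoding.

    Returns the visible patches plus the index sets needed to restore order
    and to score reconstruction of the hidden patches.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = perm[:n_keep]   # visible patches fed to the encoder
    mask_idx = perm[n_keep:]   # hidden patches the decoder must predict
    return patches[keep_idx], keep_idx, mask_idx

# A toy "image": 16 patches, each flattened to 48 values.
patches = rng.normal(size=(16, 48))
visible, keep_idx, mask_idx = mask_patches(patches)
print(visible.shape)  # only 25% of the patches reach the encoder
```

Because the encoder only processes the small visible subset, this style of pre-training is comparatively efficient, which is the property the abstract highlights.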

    A combinatorial characterization of the annihilator varieties of highest weight modules for classical Lie algebras

    Let $\mathfrak{g}$ be a classical Lie algebra. Let $L(\lambda)$ be a highest weight module of $\mathfrak{g}$ with highest weight $\lambda-\rho$, where $\rho$ is half the sum of positive roots. In 1985, Joseph proved that the associated variety of a primitive ideal is the Zariski closure of a nilpotent orbit in $\mathfrak{g}^*$. In this paper, we will give some combinatorial characterizations of the annihilator varieties of highest weight modules for classical Lie algebras. In fact, we will give two algorithms, i.e., a bipartition algorithm and a partition algorithm.
    Comment: 40 pages
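Joseph's theorem cited in the abstract can be stated compactly. This is a restatement of the known 1985 result, not the paper's new contribution:

```latex
\mathcal{V}\bigl(\mathrm{Ann}\, L(\lambda)\bigr) \;=\; \overline{\mathcal{O}_\lambda} \;\subseteq\; \mathfrak{g}^*,
\qquad \mathcal{O}_\lambda \ \text{a nilpotent coadjoint orbit.}
```

The paper's bipartition and partition algorithms then characterize combinatorially which orbit $\mathcal{O}_\lambda$ arises for a given highest weight.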

    Intriguing Properties of Text-guided Diffusion Models

    Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables generating vastly different, even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the failure modes of TDMs in more detail. To achieve this, we propose SAGE, an adversarial attack on TDMs that uses image classifiers as surrogate loss functions to search over the discrete prompt space and the high-dimensional latent space of TDMs, automatically discovering unexpected behaviors and failure cases in image generation. We make several technical contributions to ensure that SAGE finds failure cases of the diffusion model rather than of the classifier, and verify this in a human study. Our study reveals four intriguing properties of TDMs that have not been systematically studied before: (1) We find a variety of natural text prompts producing images that fail to capture the semantics of the input texts, and we categorize these failures into ten distinct types based on their underlying causes. (2) We find samples in the latent space (which are not outliers) that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured. (3) We also find latent samples that lead to natural-looking images unrelated to the text prompt, implying a potential misalignment between the latent and prompt spaces. (4) By appending a single adversarial token embedding to an input prompt, we can generate a variety of specified target objects while only minimally affecting the CLIP score. This demonstrates the fragility of language representations and raises potential safety concerns.
    Comment: Code will be available at: https://github.com/qihao067/SAG
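The core mechanism described above — optimizing over the latent space with a classifier as a surrogate loss — can be sketched in miniature. The linear "generator" and squared-error "classifier score" below are toy stand-ins of my own, not SAGE's actual diffusion model or classifier; the point is only the shape of the search loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a generator mapping a 4-d latent to an 8-d "image".
W = rng.normal(size=(8, 4))

def generate(z):
    return W @ z

def classifier_score(img, target):
    # Surrogate objective: how strongly the image matches a target (higher is better).
    return -np.sum((img - target) ** 2)

def latent_search(target, z0, steps=200, lr=0.01):
    """Gradient ascent in latent space guided by the classifier score,
    mirroring the idea of searching the latent space with a surrogate loss."""
    z = z0.copy()
    for _ in range(steps):
        # Analytic gradient of the surrogate score with respect to z.
        grad = -2 * W.T @ (generate(z) - target)
        z = z + lr * grad
    return z

target = rng.normal(size=8)
z0 = rng.normal(size=4)
z_adv = latent_search(target, z0)
```

In SAGE the same loop runs through a real diffusion model and image classifier, plus a discrete search over prompt tokens; the extra technical work in the paper goes into ensuring the failures found belong to the generator, not the classifier.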

    CoKe: Localized Contrastive Learning for Robust Keypoint Detection

    Today's most popular approaches to keypoint detection involve very complex network architectures that aim to learn holistic representations of all keypoints. In this work, we take a step back and ask: can we simply learn a local keypoint representation from the output of a standard backbone architecture? This would make the network simpler and more robust, particularly when large parts of the object are occluded. We demonstrate that this is possible by looking at the problem from the perspective of representation learning. Specifically, the keypoint kernels need to be chosen to optimize three types of distances in the feature space: features of the same keypoint should be similar to each other, while differing from those of other keypoints, and also being distinct from features of the background clutter. We formulate this optimization process within a framework, which we call CoKe, that includes supervised contrastive learning. CoKe makes several approximations to enable the representation learning process on large datasets. In particular, we introduce a clutter bank to approximate non-keypoint features, and a momentum update to compute the keypoint representation while training the feature extractor. Our experiments show that CoKe achieves state-of-the-art results compared to approaches that jointly represent all keypoints holistically (Stacked Hourglass Networks, MSS-Net) as well as to approaches supervised by detailed 3D object geometry (StarMap). Moreover, CoKe is robust, performs exceptionally well when objects are partially occluded, and significantly outperforms related work on a range of diverse datasets (PASCAL3D+, MPII, ObjectNet3D).
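The three distance constraints in the abstract — close to the own keypoint kernel, far from other keypoints, far from clutter — fit the standard contrastive (InfoNCE-style) form. The sketch below uses hypothetical shapes and a made-up temperature, and omits the paper's momentum update; it is not CoKe's exact formulation, only the loss structure the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def coke_style_loss(feat, kernel, other_kernels, clutter_bank, tau=0.07):
    """Contrastive loss for one keypoint feature: pull it toward its own
    keypoint kernel (positive), push it away from other keypoints and
    from clutter-bank features (negatives)."""
    pos = np.exp(feat @ kernel / tau)
    negs = np.concatenate([other_kernels, clutter_bank]) @ feat / tau
    return -np.log(pos / (pos + np.exp(negs).sum()))

dim = 16
kernel = l2_normalize(rng.normal(size=dim))            # this keypoint's kernel
other_kernels = l2_normalize(rng.normal(size=(9, dim)))  # other keypoints
clutter_bank = l2_normalize(rng.normal(size=(64, dim)))  # background features

aligned = coke_style_loss(kernel, kernel, other_kernels, clutter_bank)
misaligned = coke_style_loss(l2_normalize(rng.normal(size=dim)),
                             kernel, other_kernels, clutter_bank)
```

A feature matching its own kernel incurs a much smaller loss than a random feature, which is exactly the pressure that shapes the local keypoint representations.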

    Giant Nonreciprocity of Surface Acoustic Waves induced by a positive-negative magnetostrictive heterostructure

    Lack of nonreciprocity is one of the major drawbacks of solid-state acoustic devices, and it has hindered the development of microwave-frequency acoustic isolators and circulators. Here we report giant nonreciprocal transmission of shear-horizontal surface acoustic waves (SH-SAWs) on a LiTaO3 substrate coated with a negative-positive magnetostrictive bilayer of Ni/Ti/FeCoSiB. Although the static magnetic moments of the two layers are parallel, SH-SAWs can excite optical-mode spin waves much more strongly than acoustic-mode ones at relatively low frequencies via magnetoelastic coupling. The measured magnitude nonreciprocity exceeds 40 dB (or 80 dB/mm) at 2.333 GHz. In addition, the maximum nonreciprocal phase accumulation reaches 188° (376°/mm), which is desirable for an effective SAW circulator. Our theoretical model and calculations provide insight into the observed phenomena and demonstrate a pathway for further improvement of nonreciprocal acoustic devices.
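The paired absolute and per-millimeter figures quoted above imply an interaction length of about 0.5 mm (the length is inferred from the ratios, not stated in the abstract). A quick consistency check:

```python
# Values taken directly from the abstract.
isolation_db, isolation_db_per_mm = 40, 80    # magnitude nonreciprocity
phase_deg, phase_deg_per_mm = 188, 376        # nonreciprocal phase accumulation

# Implied interaction lengths (mm) from each pair of figures.
length_from_isolation = isolation_db / isolation_db_per_mm
length_from_phase = phase_deg / phase_deg_per_mm
print(length_from_isolation, length_from_phase)  # 0.5 0.5
```

Both pairs give the same 0.5 mm, so the two headline numbers are internally consistent.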

    Supply chain finance: what are the challenges in the adoption of blockchain technology?

    As an emerging information technology, blockchain has aroused extensive discussion around the world and has been suggested as a solution to current issues in supply chain finance (SCF). The Chinese government also attaches great importance to this technology, and many Chinese state-owned enterprises have invested in establishing their own blockchain research and development centres. However, there is a lack of studies identifying the challenges of deploying this technology, and theoretical frameworks and conceptual expositions are also scarce. Therefore, the aim of this study is to investigate the challenges and obstacles in the adoption of blockchain technology in SCF. An exploratory case study of a Chinese state-owned enterprise was conducted to build an initial conceptual framework. Semi-structured interviews were used to collect data from the case firm's employees, top management, and technical specialists. The results of the analysis indicate that the adoption of blockchain technology faces technological, operational, and other challenges. From a technological perspective, framework identification, cross-chain interoperability, and data governance are major barriers, whereas from an operational perspective, the new business process and the transformation of the entire supply chain are identified as challenges. In addition, other obstacles, such as the elimination of jobs and regulatory issues, are not negligible. This study contributes to research on blockchain and supply chains by shedding light on the challenges of blockchain adoption through an exploratory case study of a Chinese state-owned enterprise. A conceptual framework was generated as a basis for future research, and the findings also provide insights for companies that are planning to adopt blockchain technology.