136 research outputs found

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
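
    As an illustration of the first category (hash functions designed without looking at the data), the following is a minimal sketch of random hyperplane hashing, one classical locality sensitive family for cosine similarity. It is not taken from the survey; the function names, bit count, and seed are assumptions chosen for the example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(code_a, code_b):
    """Number of differing bits; a smaller value suggests a smaller angle."""
    return sum(a != b for a, b in zip(code_a, code_b))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=3, seed=42)
    query = [1.0, 2.0, 3.0]
    print(hash_code(query, planes))
```

    Vectors separated by a small angle tend to agree on most bits, so the Hamming distance between codes can serve as a cheap proxy for angular distance during search.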

    Optimized Cartesian K-Means

    Product quantization-based approaches are effective for encoding high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes using precomputed lookup tables. Traditionally, to encode a subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named Optimized Cartesian K-Means (OCKM), to better encode the data points for more accurate approximate nearest neighbor search. In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword comes from a different sub codebook in each subspace, and these sub codebooks are optimally generated with regard to the minimization of the distortion errors. The high-dimensional data point is then encoded as the concatenation of the indices of multiple sub codewords from all the subspaces. This can provide more flexibility and lower distortion errors than traditional methods. Experimental results on standard real-life datasets demonstrate its superiority over state-of-the-art approaches for approximate nearest neighbor search. Comment: to appear in IEEE TKDE, accepted in Apr. 201
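
    The traditional scheme described at the start of this abstract (one sub codeword per subspace, distances read from precomputed lookup tables) can be sketched as follows. This is the baseline product quantization idea, not OCKM itself; the tiny hand-written `CODEBOOKS` are hypothetical and stand in for codebooks that would normally be learned with k-means.

```python
def split(vector, num_subspaces):
    """Cut a vector into equally sized, contiguous subvectors."""
    size = len(vector) // num_subspaces
    return [vector[i * size:(i + 1) * size] for i in range(num_subspaces)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebooks):
    """Per subspace, store the index of the nearest sub codeword."""
    code = []
    for sub, book in zip(split(vector, len(codebooks)), codebooks):
        code.append(min(range(len(book)), key=lambda k: sq_dist(sub, book[k])))
    return code


def lookup_tables(query, codebooks):
    """Per subspace, distances from the query subvector to every sub codeword."""
    return [
        [sq_dist(sub, word) for word in book]
        for sub, book in zip(split(query, len(codebooks)), codebooks)
    ]


def approx_sq_dist(tables, code):
    """Approximate query-to-point distance as a sum of table lookups."""
    return sum(table[idx] for table, idx in zip(tables, code))


# Hypothetical codebooks: 2 subspaces, 2 sub codewords each, 2 dimensions each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

if __name__ == "__main__":
    code = encode([0.9, 1.1, 0.1, 0.8], CODEBOOKS)
    tables = lookup_tables([0.0, 0.0, 0.0, 1.0], CODEBOOKS)
    print(code, approx_sq_dist(tables, code))
```

    The tables are built once per query, after which each database point costs only one lookup and one addition per subspace.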

    What Can Simple Arithmetic Operations Do for Temporal Modeling?

    Temporal modeling plays a crucial role in understanding video content. To tackle this problem, previous studies built complicated temporal relations through time sequence thanks to the development of computationally powerful devices. In this work, we explore the potential of four simple arithmetic operations for temporal modeling. Specifically, we first capture auxiliary temporal cues by computing addition, subtraction, multiplication, and division between pairs of extracted frame features. Then, we extract corresponding features from these cues to benefit the original temporal-irrespective domain. We term such a simple pipeline an Arithmetic Temporal Module (ATM), which operates on the stem of a visual backbone in a plug-and-play style. We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost. Moreover, the ATM is compatible with both CNN- and ViT-based architectures. Our results show that ATM achieves superior performance on several popular video benchmarks. Specifically, on Something-Something V1, V2 and Kinetics-400, we reach top-1 accuracy of 65.6%, 74.6%, and 89.4% respectively. The code is available at https://github.com/whwu95/ATM. Comment: Accepted by ICCV 202
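
    The first step described above, computing arithmetic cues between pairs of frame features, might look roughly like the sketch below. It is not the authors' implementation (see their repository for that). Pairing adjacent frames, the `EPS` guard in the division, and the assumption of non-negative features are all choices made only for this example.

```python
import operator

EPS = 1e-6  # assumed guard against division by zero; not from the paper


def safe_div(a, b):
    """Division with a small offset, assuming non-negative features."""
    return a / (b + EPS)


OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": safe_div,
}


def temporal_cues(frames, op_name):
    """Apply one element-wise operation to each pair of adjacent frames.

    frames: list of per-frame feature vectors, all of the same length.
    Returns len(frames) - 1 cue vectors.
    """
    op = OPS[op_name]
    return [
        [op(a, b) for a, b in zip(cur, nxt)]
        for cur, nxt in zip(frames, frames[1:])
    ]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]]
    for name in OPS:
        print(name, temporal_cues(feats, name))
```

    In the pipeline the abstract describes, cues like these would then be passed through a further feature extractor; that stage is omitted here.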

    GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

    This paper does not present a novel method. Instead, it delves into an essential, must-know baseline in light of the latest advancements in Generative Artificial Intelligence (GenAI): the utilization of GPT-4 for visual understanding. Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. First, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training. Second, we evaluate GPT-4's visual proficiency in directly recognizing diverse visual content. We conducted extensive experiments to systematically evaluate GPT-4's performance across images, videos, and point clouds, using 16 benchmark datasets to measure top-1 and top-5 accuracy. Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition, offering an average top-1 accuracy increase of 7% across all datasets. GPT-4 excels in visual recognition, outshining OpenAI-CLIP's ViT-L and rivaling EVA-CLIP's ViT-E, particularly on the video datasets HMDB-51 and UCF-101, where it leads by 22% and 9%, respectively. We hope this research contributes valuable data points and experience for future studies. We release our code at https://github.com/whwu95/GPT4Vis. Comment: Technical report. Retest GPT-4V and update result

    UATVR: Uncertainty-Adaptive Text-Video Retrieval

    With the explosive growth of web videos and emerging large-scale vision-language pre-training models, e.g., CLIP, retrieving videos of interest with text instructions has attracted increasing attention. A common practice is to transfer text-video pairs to the same embedding space and craft cross-modal interactions with certain entities in specific granularities for semantic correspondence. Unfortunately, the intrinsic uncertainties of optimal entity combinations in appropriate granularities for cross-modal queries are understudied, which is especially critical for modalities with hierarchical semantics, e.g., video and text. In this paper, we propose an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each look-up as a distribution matching procedure. Concretely, we add learnable tokens in the encoders to adaptively aggregate multi-grained semantics for flexible high-level reasoning. In the refined embedding space, we represent text-video pairs as probabilistic distributions where prototypes are sampled for matching evaluation. Comprehensive experiments on four benchmarks justify the superiority of our UATVR, which achieves new state-of-the-art results on MSR-VTT (50.8%), VATEX (64.5%), MSVD (49.7%), and DiDeMo (45.8%). The code is available at https://github.com/bofang98/UATVR. Comment: To appear at ICCV202

    Heritability enrichment of immunoglobulin G N-glycosylation in specific tissues

    Genome-wide association studies (GWAS) have identified over 60 genetic loci associated with immunoglobulin G (IgG) N-glycosylation; however, the causal genes and their abundance in relevant tissues are uncertain. Leveraging GWAS summary statistics for 8,090 Europeans and large-scale expression quantitative trait loci (eQTL) data from the Genotype-Tissue Expression project for 53 tissue types (GTEx v7), we derived a linkage disequilibrium score for the specific expression of genes (LDSC-SEG) and conducted a transcriptome-wide association study (TWAS). We identified 55 gene associations whose predicted levels of expression were significantly associated with IgG N-glycosylation in 14 tissues. Three working scenarios, i.e., tissue-specific, pleiotropic, and coassociated, were observed for candidate genetic predisposition affecting IgG N-glycosylation traits. Furthermore, pathway enrichment showed several IgG N-glycosylation-related pathways, such as asparagine N-linked glycosylation, N-glycan biosynthesis, and transport to the Golgi and subsequent modification. Through phenome-wide association studies (PheWAS), most genetic variants underlying TWAS hits were found to be correlated with health measures (height, waist-hip ratio, systolic blood pressure) and diseases, such as systemic lupus erythematosus, inflammatory bowel disease, and Parkinson’s disease, which are related to IgG N-glycosylation. Our study provides an atlas of genetic regulatory loci and their target genes within functionally relevant tissues, for further studies on the mechanisms of IgG N-glycosylation and its related diseases.

    Transposable elements cause the loss of self-incompatibility in citrus

    Self-incompatibility (SI) is a widespread prezygotic mechanism for flowering plants to avoid inbreeding depression and promote genetic diversity. Citrus has an S-RNase-based SI system, which was frequently lost during evolution. We previously identified a single nucleotide mutation in Sm-RNase, which is responsible for the loss of SI in mandarin and its hybrids. However, little is known about other mechanisms responsible for the conversion of SI to self-compatibility (SC). Here, we identify a completely different mechanism widely utilized by citrus: we found a 786-bp miniature inverted-repeat transposable element (MITE) insertion in the promoter region of FhiS2-RNase in Fortunella hindsii Swingle (a model plant for citrus gene function), which does not contain the Sm-RNase allele but is still SC. We demonstrate that this MITE plays a pivotal role in the loss of SI in citrus, providing evidence that this MITE insertion prevents expression of the S-RNase; moreover, transgenic experiments show that deletion of this 786-bp MITE insertion recovers the expression of FhiS2-RNase and restores SI. This study provides the first evidence for a role for MITEs at the S-locus affecting the SI phenotype. A family-wide survey of the S-locus revealed that MITE insertions occur frequently adjacent to S-RNase alleles in different citrus genera, but only certain MITEs appear to be responsible for the loss of SI. Our study provides evidence that insertion of MITEs into a promoter region can alter a breeding strategy and suggests that this phenomenon may be broadly responsible for SC in species with the S-RNase system.

    Characterization of Neuraminidases from the Highly Pathogenic Avian H5N1 and 2009 Pandemic H1N1 Influenza A Viruses

    To study the precise role of the neuraminidase (NA), and its stalk region in particular, in the assembly, release, and entry of influenza virus, we deleted the 20-aa stalk segment from 2009 pandemic H1N1 NA (09N1) and inserted this segment, now designated 09s60, into the stalk region of a highly pathogenic avian influenza (HPAI) virus H5N1 NA (AH N1). The biological characteristics of these wild-type and mutant NAs were analyzed using a pseudotyped particle (pseudoparticle) system. Compared with the wild-type AH N1, the wild-type 09N1 exhibited higher NA activity and released more pseudoparticles. Deletion/insertion of the 09s60 segment did not alter this relationship. The infectivity of pseudoparticles harboring NA in combination with the hemagglutinin from HPAI H5N1 (AH H5) was decreased by insertion of 09s60 into AH N1 and was increased by deletion of 09s60 from 09N1. When isolated from the wild-type 2009H1N1 virus, 09N1 existed in the forms (in order of abundance) dimer>>tetramer>monomer, but when isolated from pseudoparticles, 09N1 existed in the forms dimer>monomer>>>tetramer. After deletion of 09s60, 09N1 existed in the forms monomer>>>dimer. AH N1 from pseudoparticles existed in the forms monomer>>dimer, but after insertion of 09s60, it existed in the forms dimer>>monomer. Deletion/insertion of 09s60 did not alter the NA glycosylation pattern of 09N1 or AH N1. The 09N1 was more sensitive than the AH N1 to the NA inhibitor oseltamivir, suggesting that the infectivity-enhancing effect of oseltamivir correlates with robust NA activity.