
84 research outputs found

Hashing for Similarity Search: A Survey

Author: Ji Jianqiu
Shen Heng Tao
Song Jingkuan
Wang Jingdong
Publication date: 13/08/2014

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
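
As an illustration of the data-independent category described in the abstract above, here is a minimal sketch of one well-known locality sensitive hashing family: random hyperplane (sign random projection) hashing for cosine similarity. It is not taken from the survey itself; the function names `make_hyperplanes`, `hash_code`, and `hamming`, the seed, and the toy vectors are all assumptions chosen for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals, independent of any data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(code_a, code_b):
    """Number of differing bits; a proxy for the angle between the inputs."""
    return sum(a != b for a, b in zip(code_a, code_b))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=4, seed=1)  # hypothetical sizes
    query = [0.5, -1.0, 2.0, 0.25]
    near = [0.6, -0.9, 2.1, 0.2]    # small angle to the query
    far = [-0.5, 1.0, -2.0, -0.25]  # opposite direction
    q_code = hash_code(query, planes)
    print("near:", hamming(q_code, hash_code(near, planes)))
    print("far: ", hamming(q_code, hash_code(far, planes)))
```

Because the hyperplanes are drawn without looking at the data, this falls on the locality sensitive hashing side of the survey's split; a learning-to-hash method would instead fit the projections to the data distribution.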

arXiv.org e-Print Archive

CiteSeerX

Optimized Cartesian $K$-Means

Author: Li Shipeng
Shen Heng Tao
Song Jingkuan
Wang Jianfeng
Wang Jingdong
Xu Xin-Shun
Publication date: 15/05/2014

Product quantization-based approaches are effective for encoding high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes by the precomputed lookup tables. Traditionally, to encode a subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named Optimized Cartesian $K$-Means (OCKM), to better encode the data points for more accurate approximate nearest neighbor search. In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword stems from different sub codebooks in each subspace, which are optimally generated with regard to the minimization of the distortion errors. The high-dimensional data point is then encoded as the concatenation of the indices of multiple sub codewords from all the subspaces. This can provide more flexibility and lower distortion errors than traditional methods. Experimental results on the standard real-life datasets demonstrate the superiority over state-of-the-art approaches for approximate nearest neighbor search. Comment: to appear in IEEE TKDE, accepted in Apr. 201
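
The traditional product quantization scheme that the abstract contrasts OCKM with (one sub codeword per subspace, distances read from precomputed lookup tables) can be sketched as follows. This is a hedged illustration of plain product quantization, not of OCKM; the hand-picked `CODEBOOKS` stand in for codebooks that would normally be learned with k-means, and all names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector, num_subspaces):
    """Cut a vector into equal-width, contiguous subvectors."""
    size = len(vector) // num_subspaces
    return [vector[i * size:(i + 1) * size] for i in range(num_subspaces)]


def encode(vector, codebooks):
    """Per subspace, store the index of the nearest sub codeword."""
    code = []
    for sub, book in zip(split(vector, len(codebooks)), codebooks):
        code.append(min(range(len(book)), key=lambda k: sq_dist(sub, book[k])))
    return code


def build_tables(query, codebooks):
    """Precompute query-to-codeword squared distances for every subspace."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, word) for word in book]
            for sub, book in zip(subs, codebooks)]


def approx_dist(tables, code):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(table[k] for table, k in zip(tables, code))


# Assumed toy codebooks: 2 subspaces of dimension 2, 2 sub codewords each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]

if __name__ == "__main__":
    point = [0.9, 1.1, 0.1, -0.1]
    code = encode(point, CODEBOOKS)
    tables = build_tables([0.0, 0.0, 0.0, 0.0], CODEBOOKS)
    print(code, approx_dist(tables, code))
```

The lookup tables are built once per query, so comparing the query against each database code costs only one table read per subspace rather than a full distance computation.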

arXiv.org e-Print Archive

University of Queensland eSpace

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

Author: Gao Lianli
Hong Richang
Li Xiangpeng
Song Jingkuan
Wang Meng
Zhang Hanwang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed Self-Supervised Video Hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos; and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world datasets (FCVID and YFCC) show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the currently best performance on the task of unsupervised video retrieval.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Macro- and microplastic accumulation in soil after 32 years of plastic film mulching

Author: Ding Fan
Flury Markus
Jones Davey L.
Li Shitong
Li Shuangyi
Wang Jingkuan
Wang Zhan
Xu Li
Publication date: 01/01/2022

Plastic film mulch (PFM) is a double-edged-sword agricultural technology, which greatly improves global agricultural production but can also cause severe plastic pollution of the environment. Here, we characterized and quantified the amount of macro- and microplastics accumulated after 32 years of continuous plastic mulch film use in an agricultural field. An interactive field trial was established in 1987, where the effect of plastic mulching and N fertilization on maize yield was investigated. We assessed the abundance and type of macroplastics (>5 mm) at 0–20 cm soil depth and microplastics (<5 mm) at 0–100 cm depth. In the PFM plot, we found about 10 times more macroplastic particles in the fertilized plots than in the non-fertilized plots (6796 vs 653 pieces/m²), and film microplastics were about twice as abundant in the fertilized plots as in the non-fertilized plots (3.7 × 10⁶ vs 2.2 × 10⁶ particles/kg soil). These differences can be explained by entanglement of plastics with plant roots and stems, which made it more difficult to remove plastic film after harvest. Macroplastics consisted mainly of films, while microplastics consisted of films, fibers, and granules, with the films being identified as polyethylene originating from the plastic mulch films. Plastic mulch films contributed 33%–56% to the total microplastics in 0–100 cm depth. The total number of microplastics in the topsoil (0–10 cm) ranged from 7183 to 10,586 particles/kg, with an average of 8885 particles/kg. In the deep subsoil (80–100 cm) the plastic concentration ranged from 2268 to 3529 particles/kg, with an average of 2899 particles/kg. Long-term use of plastic mulch films caused considerable pollution of not only surface, but also subsurface soil. Migration of plastic to deeper soil layers makes removal and remediation more difficult, implying that the plastic pollution legacy will remain in soil for centuries.

Research Repository

Bangor University Research Portal

Differential long-term fertilization alters residue-derived labile organic carbon fractions and microbial community during straw residue decomposition

Author: An Tingting
Bol Roland
Cheng Na
Ge Zhuang
Li Shuangyi
Li Tingyu
Liu Xu
Peng Chang
Wang Jingkuan
Xu Zhiqiang
Zhu Ping
Publication venue: 'Elsevier BV'
Publication date: 01/09/2021

Bangor University Research Portal

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

Author: Chai Huacan
Du Kounianhua
Fan Longteng
Fang Yuchen
Fu Lingyue
Lei Jiayi
Lin Jianghao
Liu Yifan
Luo Shuang
Qi Siyuan
Rui Renting
Wang Jingkuan
Yu Yong
Zhang Kangning
Zhang Weiming
Zhang Weinan
Publication date: 06/09/2023

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. We propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension and code generation abilities of LLMs. CodeApex comprises three types of multiple-choice questions: conceptual understanding, commonsense reasoning, and multi-hop reasoning, designed to evaluate LLMs on programming comprehension tasks. Additionally, CodeApex utilizes algorithmic questions and corresponding test cases to assess the code quality generated by LLMs. We evaluate 14 state-of-the-art LLMs, including both general-purpose and specialized models. GPT exhibits the best programming capabilities, achieving approximate accuracies of 50% and 56% on the two tasks, respectively. There is still significant room for improvement in programming tasks. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development and growth. Datasets are released at https://github.com/APEXLAB/CodeApex.git. CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/. Comment: 21 page

arXiv.org e-Print Archive

Modeling Rett Syndrome Using TALEN-Edited MECP2 Mutant Cynomolgus Monkeys

Author: Bai Raoxian
Bao Xinhua
Chen Xiaoying
Chen Yongchang
Chen Zhenzhen
Geng Rui
He Jing
Hu Xintian
Hu Yingzhou
Huang Shaoyong
Ji Weizhi
Jiang Tianzi
Jiang Yong
Kang Yu
Li Fuxing
Li Gang
Li Siguang
Liang Aibin
Liu Hailiang
Liu Jie
Liu Xiaojing
Lu Yi
Luo Yuping
Ma Yuanye
Niu Yuyu
Qin Dongdong
Shen Dinggang
Si Chenyang
Sun Yi Eve
Wang Jiaojian
Wang Junbang
Wang Shuang
Wei Jingkuan
Wu Kunhua
Yu Juehua
Zhang Kunshan
Zhang Qingping
Publication date: 01/01/2017

Gene-editing technologies have made it feasible to create nonhuman primate models for human genetic disorders. Here, we report detailed genotypes and phenotypes of TALEN-edited MECP2 mutant cynomolgus monkeys serving as a model for a neurodevelopmental disorder, Rett syndrome (RTT), which is caused by loss-of-function mutations in the human MECP2 gene. Male mutant monkeys were embryonic lethal, reiterating that RTT is a disease of females. Through a battery of behavioral analyses, including primate-unique eye-tracking tests, in combination with brain imaging via MRI, we found a series of physiological, behavioral, and structural abnormalities resembling clinical manifestations of RTT. Moreover, blood transcriptome profiling revealed that mutant monkeys resembled RTT patients in immune gene dysregulation. Taken together, the stark similarity in phenotype and/or endophenotype between monkeys and patients suggested that gene-edited RTT founder monkeys would be of value for disease mechanistic studies as well as development of potential therapeutic interventions for RTT.

Carolina Digital Repository

Surface‐modified quantity of Fe 3

Author: Jingkuan Duan
Lihui Yao
Ya Li
Yajuan Wang
Publication venue: 'Institution of Engineering and Technology (IET)'

Crossref