41 research outputs found
Identification and comprehensive analyses of the CBL and CIPK gene families in wheat (Triticum aestivum L.)
The interaction analysis of wheat TaCBL and TaCIPK proteins was performed by the Y2H method. (PDF 191 kb)
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Video Question Answering (VideoQA) has been significantly advanced from the
scaling of recent Large Language Models (LLMs). The key idea is to convert the
visual information into the language feature space so that the capacity of LLMs
can be fully exploited. Existing VideoQA methods typically take two paradigms:
(1) learning cross-modal alignment, and (2) using an off-the-shelf captioning
model to describe the visual data. However, the first design requires costly
training on large amounts of extra multi-modal data, whilst the second suffers
from poor domain generalization. To address these limitations, a simple yet
effective Retrieving-to-Answer (R2A) framework is proposed. Given an input
video, R2A first retrieves a set of semantically similar texts from a generic
text corpus using a pre-trained multi-modal model (e.g., CLIP). With both the
question and the retrieved texts, an LLM (e.g., DeBERTa) can be directly used to
yield a desired answer. Without the need for cross-modal fine-tuning, R2A
allows for all the key components (e.g., LLM, retrieval model, and text corpus)
to plug-and-play. Extensive experiments on several VideoQA benchmarks show that,
despite having only 1.3B parameters and requiring no fine-tuning, R2A can
outperform the 61-times-larger Flamingo-80B model, even though the latter was
additionally trained on nearly 2.1B multi-modal samples.
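The retrieval step described above can be sketched in a few lines. This is a minimal, hypothetical illustration: a bag-of-words encoder stands in for CLIP's text/image towers, and the three-sentence corpus, the query caption, and the question are all invented for the example. In the actual R2A system, the video frames would be embedded by a pre-trained multi-modal model and the retrieved texts concatenated with the question for a frozen LLM; only the retrieval logic is shown here.

```python
import numpy as np

def bow_embed(text, vocab):
    """Toy stand-in for a CLIP-style encoder: normalised bag-of-words vector."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query_embedding, corpus, vocab, k=2):
    """R2A-style retrieval: rank corpus texts by cosine similarity to the query."""
    sims = [(float(query_embedding @ bow_embed(t, vocab)), t) for t in corpus]
    sims.sort(key=lambda p: -p[0])
    return [t for _, t in sims[:k]]

# Hypothetical text corpus; a real system would use a large generic corpus.
corpus = [
    "a dog catches a frisbee in the park",
    "a chef chops onions in a kitchen",
    "children play football on a field",
]
vocab = {w: i for i, w in enumerate(sorted({w for t in corpus for w in t.lower().split()}))}

# The "video" is represented here by a caption-like query; in R2A the query
# embedding would come from the video frames via the multi-modal model.
video_emb = bow_embed("a dog runs in the park", vocab)
context = retrieve(video_emb, corpus, vocab, k=1)

# The retrieved texts plus the question would then be fed to a frozen LLM.
prompt = f"Context: {context[0]} Question: What animal is in the video?"
```

Because every component (encoder, corpus, answering LLM) only touches the others through plain text and similarity scores, each can be swapped independently, which is the plug-and-play property the abstract highlights.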
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Most recent semantic segmentation methods adopt a fully-convolutional network
(FCN) with an encoder-decoder architecture. The encoder progressively reduces
the spatial resolution and learns more abstract/semantic visual concepts with
larger receptive fields. Since context modeling is critical for segmentation,
the latest efforts have been focused on increasing the receptive field, through
either dilated/atrous convolutions or inserting attention modules. However, the
encoder-decoder based FCN architecture remains unchanged. In this paper, we aim
to provide an alternative perspective by treating semantic segmentation as a
sequence-to-sequence prediction task. Specifically, we deploy a pure
transformer (i.e., without convolution and resolution reduction) to encode an
image as a sequence of patches. With the global context modeled in every layer
of the transformer, this encoder can be combined with a simple decoder to
provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
Extensive experiments show that SETR achieves new state of the art on ADE20K
(50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on
Cityscapes. Particularly, we achieve the first position in the highly
competitive ADE20K test server leaderboard on the day of submission. Comment: CVPR 2021. Project page at https://fudan-zvg.github.io/SETR
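The core idea above, treating an image as a patch sequence, modeling global context with self-attention at full sequence length, then decoding back to a spatial map, can be sketched in numpy. This is a toy illustration under invented sizes (32x32 RGB image, 8x8 patches, 16-dim embeddings, 3 classes) with random, untrained weights and a single attention layer; the actual SETR stacks many transformer layers and offers several learned decoder variants, so treat this only as a shape-level sketch of the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    """Split an HxWxC image into (H/p * W/p) flattened patches (the input sequence)."""
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: every patch attends to every patch, so
    global context is modeled in every layer, with no resolution reduction."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

H = Wd = 32; p = 8; d = 16; n_cls = 3            # hypothetical sizes
img = rng.random((H, Wd, 3))
tokens = patchify(img, p)                        # (16, 192) patch sequence
E = rng.standard_normal((p * p * 3, d)) * 0.1    # linear patch embedding
x = tokens @ E                                   # (16, 16) embedded sequence
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
x = x + self_attention(x, Wq, Wk, Wv)            # one encoder layer (residual)

# "Naive" decoder: project each token to class logits, reshape the sequence
# back to a spatial grid, and upsample to the input resolution.
logits = x @ (rng.standard_normal((d, n_cls)) * 0.1)  # (16, 3)
grid = logits.reshape(H // p, Wd // p, n_cls)         # (4, 4, 3)
seg = grid.repeat(p, axis=0).repeat(p, axis=1)        # (32, 32, 3) class map
```

The key contrast with an FCN is visible in the shapes: the sequence length (and hence spatial resolution at the patch level) never shrinks through the encoder, so the decoder only has to reshape and upsample rather than recover lost resolution.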
A miniature multi-functional photoacoustic probe
Photoacoustic technology is a promising tool for providing morphological and functional information in biomedical research. To enhance imaging efficiency, previously reported photoacoustic probes have been designed coaxially, using complicated optical/acoustic prisms to bypass the opaque piezoelectric layer of ultrasound transducers; this has led to bulky probes and hindered applications in confined spaces. Although the emergence of transparent piezoelectric materials avoids the need for such coaxial workarounds, the reported transparent ultrasound transducers have still been bulky. In this work, a miniature photoacoustic probe with an outer diameter of 4 mm was developed, in which an acoustic stack was made from transparent piezoelectric material with a gradient-index lens as a backing layer. The transparent ultrasound transducer exhibited a high center frequency of ~47 MHz and a −6 dB bandwidth of 29.4%, and could be easily assembled with the pigtailed ferrule of a single-mode fiber. The multi-functional capability of the probe was successfully validated through fluid flow sensing and photoacoustic imaging experiments.
Miniature intravascular photoacoustic endoscopy with coaxial excitation and detection
Recent research has pointed out that the degree of inflammation in the adventitia may correlate with the severity of atherosclerotic plaques. Intravascular photoacoustic endoscopy can provide information on arterial morphology and plaque composition, and can even detect inflammation. However, most reported work used a non-coaxial configuration for the photoacoustic catheter design, which forms a limited light-sound overlap area for imaging and thus misses information from the adventitia. Here we developed a novel 0.9 mm-diameter intravascular photoacoustic catheter with coaxial excitation and detection to resolve this issue. A miniature hollow ultrasound transducer with a 0.18 mm-diameter orifice in its center was successfully fabricated. To show the significance and merits of our design, phantom and ex vivo imaging experiments were conducted with both coaxial and non-coaxial catheters for comparison. The results demonstrated that the coaxial catheter exhibits much better photoacoustic/ultrasound imaging performance from the intima to the adventitia.
Unsupervised Person Re-identification by Deep Learning Tracklet Association
© 2018, Springer Nature Switzerland AG. Most existing person re-identification (re-id) methods rely on supervised model learning from per-camera-pair, manually labelled pairwise training data. This leads to poor scalability in practical re-id deployment, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we address this problem by proposing an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data from videos, in an end-to-end model optimisation. We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL) framework characterised by jointly learning per-camera (within-camera) tracklet association (labelling) and cross-camera tracklet correlation by maximising the discovery of the most likely tracklet relationships across camera views. Extensive experiments on six person re-id benchmarking datasets demonstrate the superiority of the proposed TAUDL model over state-of-the-art unsupervised and domain adaptation re-id methods.
Discovering visual concept structure with sparse and incomplete tags
This work was partially supported by the China Scholarship Council, Vision Semantics Limited, and Royal Society Newton Advanced Fellowship Programme (NA150459)
Person re-identification by unsupervised video matching
This work was partially supported by the National Basic Research Program of China (973 Project, 2012CB725405), the National Science and Technology Support Program (2014BAG03B01), the National Natural Science Foundation of China (61273238), the Beijing Municipal Science and Technology Project (D15110900280000), and Tsinghua University Project (20131089307). Xiatian Zhu and Xiaolong Ma contributed equally to this work.