
    Identification and comprehensive analyses of the CBL and CIPK gene families in wheat (Triticum aestivum L.)

    The interaction analysis of wheat TaCBL and TaCIPK proteins was performed by the yeast two-hybrid (Y2H) method.

    Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

    Video Question Answering (VideoQA) has advanced significantly with the scaling of recent Large Language Models (LLMs). The key idea is to convert the visual information into the language feature space so that the capacity of LLMs can be fully exploited. Existing VideoQA methods typically take one of two paradigms: (1) learning cross-modal alignment, or (2) using an off-the-shelf captioning model to describe the visual data. However, the first design requires costly training on much extra multi-modal data, whilst the second suffers from limited domain generalization. To address these limitations, a simple yet effective Retrieving-to-Answer (R2A) framework is proposed. Given an input video, R2A first retrieves a set of semantically similar texts from a generic text corpus using a pre-trained multi-modal model (e.g., CLIP). With both the question and the retrieved texts, an LLM (e.g., DeBERTa) can be directly used to yield the desired answer. Without the need for cross-modal fine-tuning, R2A allows all the key components (e.g., the LLM, retrieval model, and text corpus) to be plug-and-play. Extensive experiments on several VideoQA benchmarks show that, despite having only 1.3B parameters and no fine-tuning, R2A can outperform the 61-times-larger Flamingo-80B model, even though the latter was additionally trained on nearly 2.1B multi-modal data.
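    The R2A pipeline reduces to three steps: embed the video with a frozen CLIP model, retrieve the nearest corpus texts in the shared embedding space, and let a frozen language model answer from the question plus the retrieved texts. A minimal sketch follows, assuming the open_clip and transformers packages; the tiny in-line corpus, the frame-sampling interface, and the extractive QA head (standing in for the paper's DeBERTa answer scorer) are illustrative choices, not the paper's exact setup.

```python
# A minimal sketch of the Retrieving-to-Answer (R2A) idea described above.
# Assumptions: open_clip + transformers are installed; the in-line corpus,
# frame list, and QA model are illustrative stand-ins.
import torch
import open_clip
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) A frozen, pre-trained CLIP provides the shared video/text space.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
clip_model = clip_model.to(device).eval()
tokenize = open_clip.get_tokenizer("ViT-B-32")

# Illustrative stand-in for a large generic text corpus.
corpus = [
    "a man is slicing a tomato on a cutting board",
    "a dog jumps to catch a frisbee in the park",
    "two people are playing table tennis indoors",
]

@torch.no_grad()
def embed_texts(texts):
    feats = clip_model.encode_text(tokenize(texts).to(device))
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def embed_video(frames):
    # frames: a list of PIL images sampled from the input video.
    batch = torch.stack([preprocess(f) for f in frames]).to(device)
    feats = clip_model.encode_image(batch).mean(dim=0, keepdim=True)
    return feats / feats.norm(dim=-1, keepdim=True)

corpus_feats = embed_texts(corpus)

# 3) A frozen LM answers from the question plus the retrieved texts
#    (an extractive QA head is used here instead of an answer scorer).
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def retrieve_to_answer(frames, question, k=2):
    # 2) Retrieve the k corpus texts most similar to the video.
    sims = (embed_video(frames) @ corpus_feats.T).squeeze(0)
    context = " ".join(corpus[i] for i in sims.topk(k).indices.tolist())
    return qa(question=question, context=context)["answer"]
```

    Because every component is frozen, swapping the retrieval model, the corpus, or the answering LM requires no retraining, which is the plug-and-play property the abstract emphasises.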

    Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

    Most recent semantic segmentation methods adopt a fully convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on enlarging the receptive field, either through dilated/atrous convolutions or by inserting attention modules. However, the encoder-decoder-based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieved first place on the highly competitive ADE20K test server leaderboard on the day of submission. (Comment: CVPR 2021. Project page at https://fudan-zvg.github.io/SETR)
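    Since the abstract specifies the architecture at a high level (patch tokenization, a convolution-free transformer encoder, a simple decoder), a minimal PyTorch sketch of that sequence-to-sequence formulation is given below; the dimensions, depth, and naive 1x1-conv decoder are illustrative choices, not SETR's exact configuration.

```python
# A minimal sketch of segmentation as sequence-to-sequence prediction:
# patches become tokens, a pure transformer encoder models global context
# at every layer, and a simple decoder upsamples tokens to dense logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSETR(nn.Module):
    def __init__(self, num_classes, img_size=224, patch=16,
                 dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img_size // patch
        # Patch embedding: flatten each patch into a single token.
        self.to_tokens = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # "Naive" decoder: per-token classification, then upsampling.
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.to_tokens(x).flatten(2).transpose(1, 2)  # (B, N, C)
        tokens = self.encoder(tokens + self.pos)
        feat = tokens.transpose(1, 2).reshape(b, -1, self.grid, self.grid)
        logits = self.head(feat)
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)

# Example: 21-class segmentation of a 224x224 image.
model = SimpleSETR(num_classes=21)
out = model(torch.randn(1, 3, 224, 224))  # -> (1, 21, 224, 224)
```

    Note that, unlike an FCN encoder, no layer here ever reduces the token count, so every encoder layer operates on the full-resolution patch sequence with a global receptive field.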

    A miniature multi-functional photoacoustic probe

    Photoacoustic technology is a promising tool for providing morphological and functional information in biomedical research. To enhance imaging efficiency, previously reported photoacoustic probes have adopted coaxial designs involving complicated optical/acoustic prisms to bypass the opaque piezoelectric layer of ultrasound transducers; this, however, has led to bulky probes and hindered applications in confined spaces. Although the emergence of transparent piezoelectric materials helps to simplify the coaxial design, the reported transparent ultrasound transducers have still been bulky. In this work, a miniature photoacoustic probe with an outer diameter of 4 mm was developed, in which the acoustic stack combines a transparent piezoelectric material with a gradient-index lens serving as the backing layer. The transparent ultrasound transducer exhibited a high center frequency of ~47 MHz and a −6 dB bandwidth of 29.4%, and could be easily assembled with the pigtailed ferrule of a single-mode fiber. The multi-functional capability of the probe was successfully validated through fluid flow sensing and photoacoustic imaging experiments.
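    For context, the reported figures pin down the absolute −6 dB band; a minimal computation using the standard fractional-bandwidth definition (bandwidth divided by center frequency) is sketched below.

```python
# Convert the reported transducer specs into an absolute -6 dB band,
# using fractional BW = (f_high - f_low) / f_center.
f_center = 47.0   # MHz, reported center frequency (~47 MHz)
frac_bw = 0.294   # reported -6 dB fractional bandwidth (29.4%)

abs_bw = frac_bw * f_center        # absolute bandwidth in MHz
f_low = f_center - abs_bw / 2
f_high = f_center + abs_bw / 2
print(f"-6 dB bandwidth ≈ {abs_bw:.1f} MHz ({f_low:.1f}-{f_high:.1f} MHz)")
# -> -6 dB bandwidth ≈ 13.8 MHz (40.1-53.9 MHz)
```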

    Miniature intravascular photoacoustic endoscopy with coaxial excitation and detection

    Recent research has indicated that the degree of inflammation in the adventitia may correlate with the severity of atherosclerotic plaques. Intravascular photoacoustic endoscopy can provide information on arterial morphology and plaque composition, and can even detect inflammation. However, most reported work has used a non-coaxial configuration for the photoacoustic catheter, which yields a limited light-sound overlap area for imaging and thus misses information from the adventitia. Here we developed a novel 0.9 mm-diameter intravascular photoacoustic catheter with coaxial excitation and detection to resolve this issue. A miniature hollow ultrasound transducer with a 0.18 mm-diameter central orifice was successfully fabricated. To show the significance and merits of our design, phantom and ex vivo imaging experiments were conducted on both coaxial and non-coaxial catheters for comparison. The results demonstrated that the coaxial catheter exhibits much better photoacoustic/ultrasound imaging performance from the intima through to the adventitia.

    Unsupervised Person Re-identification by Deep Learning Tracklet Association

    Most existing person re-identification (re-id) methods rely on supervised model learning from per-camera-pair, manually labelled pairwise training data. This leads to poor scalability in practical re-id deployments, owing to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we address this problem by proposing an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data extracted from videos, in an end-to-end model optimisation. We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL) framework characterised by jointly learning per-camera (within-camera) tracklet association (labelling) and cross-camera tracklet correlation, maximising the discovery of the most likely tracklet relationships across camera views. Extensive experiments on six person re-id benchmarking datasets demonstrate the superiority of the proposed TAUDL model over state-of-the-art unsupervised and domain-adaptation re-id methods.
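    As described, TAUDL couples two learning signals: within-camera tracklet discrimination and cross-camera tracklet correlation. The sketch below illustrates one plausible simplified form of each signal in PyTorch; the loss formulations are illustrative stand-ins, not the paper's exact objectives.

```python
# A simplified illustration of the two TAUDL learning signals, assuming
# tracklet features produced by a shared CNN backbone. Both loss forms
# are hypothetical simplifications for exposition.
import torch
import torch.nn.functional as F

def per_camera_tracklet_loss(feats, tracklet_ids, classifier):
    """Within one camera, treat each auto-generated tracklet as its own
    class and learn to discriminate between tracklets (labelling)."""
    logits = classifier(feats)            # (N, num_tracklets_in_camera)
    return F.cross_entropy(logits, tracklet_ids)

def cross_camera_association_loss(feats_cam_a, feats_cam_b, tau=0.1):
    """Across two cameras, encourage each tracklet to form a confident
    soft match with its most similar counterpart, promoting discovery
    of likely cross-view tracklet correlations."""
    a = F.normalize(feats_cam_a, dim=1)
    b = F.normalize(feats_cam_b, dim=1)
    probs = (a @ b.t() / tau).softmax(dim=1)   # (Na, Nb) soft matches
    # Minimising match entropy rewards confident associations.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    return entropy.mean()
```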

    Person re-identification by unsupervised video matching

    This work was partially supported by the National Basic Research Program of China (973 Project, 2012CB725405), the National Science and Technology Support Program (2014BAG03B01), the National Natural Science Foundation of China (61273238), the Beijing Municipal Science and Technology Project (D15110900280000), and a Tsinghua University Project (20131089307). Xiatian Zhu and Xiaolong Ma contributed equally to this work.