127 research outputs found

    Understanding and Improving In-Context Learning on Vision-language Models

    Recently, in-context learning (ICL) on large language models (LLMs) has received great attention, and this technique can also be applied to vision-language models (VLMs) built upon LLMs. These VLMs can respond to queries by conditioning responses on a series of multimodal demonstrations, which comprise images, queries, and answers. Though ICL has been extensively studied on LLMs, its research on VLMs remains limited. The inclusion of additional visual information in the demonstrations motivates the following research questions: which of the two modalities in the demonstration is more significant? How can we select effective multimodal demonstrations to enhance ICL performance? This study investigates the significance of both visual and language information. Our findings indicate that ICL in VLMs is predominantly driven by the textual information in the demonstrations whereas the visual information in the demonstrations barely affects the ICL performance. Subsequently, we provide an understanding of the findings by analyzing the model information flow and comparing model inner states given different ICL settings. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demonstrations and shows better ICL performance. Extensive experiments are conducted to support our findings, understanding, and improvement of the ICL performance of VLMs. Comment: 8 pages, 10 figures

    A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

    Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering makes it possible to perform predictions based solely on prompts, without updating model parameters, and eases the application of large pre-trained models to real-world tasks. In past years, prompt engineering has been well studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, there is currently no systematic overview of prompt engineering on pre-trained vision-language models. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation models (e.g. Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are also discussed. The challenges, future directions, and research opportunities are summarized to foster future research on this topic.

    Analysis of miRNAs and their target genes associated with lipid metabolism in duck liver

    Citation: He, J. et al. Analysis of miRNAs and their target genes associated with lipid metabolism in duck liver. Sci. Rep. 6, 27418; doi: 10.1038/srep27418 (2016). Fat character is an important index in duck farming that is linked to local flavor, feed cost, and fat intake for consumers. Since the regulatory networks in duck lipid metabolism have not been clearly reported, we aimed to explore the potential miRNA-mRNA pairs and their regulatory roles in duck lipid metabolism. Here, Cherry-Valley ducks were selected and fed with or without 5% added oil for 2 weeks, and then fat content was determined. The data showed that the fat contents and the fatty acid ratios of C17:1 and C18:2 were up-regulated in the livers of oil-added ducks, while the C12:0 ratio was down-regulated. Then 21 differentially expressed miRNAs, including 10 novel miRNAs, were obtained from the livers by sequencing, and 73 target genes of these miRNAs involved in lipid metabolic processes were found, which constituted 316 miRNA-mRNA pairs. Two miRNA-mRNA pairs, one involving a novel miRNA and one a known miRNA (N-miR-16020-FASN and gga-miR-144-ELOVL6), were selected to validate the negative miRNA-mRNA relation. The results showed that N-miR-16020 and gga-miR-144 could bind the 3′-UTRs of FASN and ELOVL6, respectively, to control their expression. This study provides new insights and useful information for future research on the regulatory network in duck lipid metabolism.

    Construction and validation of a nomogram of risk factors for new-onset atrial fibrillation in advanced lung cancer patients after non-surgical therapy

    Objective: Risk factors of new-onset atrial fibrillation (NOAF) in advanced lung cancer patients are not well defined. We aim to construct and validate a nomogram model for NOAF in advanced lung cancer patients. Methods: We retrospectively enrolled 19484 patients with Stage III-IV lung cancer undergoing first-line antitumor therapy in Shanghai Chest Hospital between January 2016 and December 2020 (15837 in training set, and 3647 in testing set). Patients with pre-existing AF, valvular heart disease, or cardiomyopathy were excluded. Logistic regression analysis and propensity score matching (PSM) were performed to identify predictors of NOAF, and a nomogram model was constructed and validated. Results: A total of 1089 patients were included in this study (807 in the training set, and 282 in the testing set). Multivariate logistic regression analysis showed that age, C-reactive protein, centric pulmonary carcinoma, and pericardial effusion were independent risk factors, the last two of which were important independent risk factors as confirmed by PSM analysis. The nomogram included the independent risk factors of age, C-reactive protein, centric pulmonary carcinoma, and pericardial effusion. The AUC was 0.716 (95% CI 0.661–0.770) and further evaluation of this model showed that the C-index was 0.716, while the bias-corrected C-index after internal validation was 0.748 in the training set. The calibration curves presented good concordance between the predicted and actual outcomes. Conclusion: Centric pulmonary carcinoma and pericardial effusion were important independent risk factors for NOAF besides common ones in advanced lung cancer patients. Furthermore, the new nomogram model contributed to the prediction of NOAF.

    An Attack Threat Effect Analysis Method Based on K-Means Evaluation

    To take full advantage of the specific features of the attack dataset in network attack effect evaluation, and to maximize the efficiency of evaluation without losing accuracy, this paper proposes a K-Means evaluation technique using dimensional entropy components, derived from changes in network entropy over the attack period and the advantages of clustering algorithms in data mining. The method pre-processes the attack dataset on the basis of network entropy, maps it to a two-dimensional plane, and uses the output of the pre-processing as the input of clustering. An improved K-Means algorithm then establishes a relation between the attack dataset and the effect category, achieving an explicit division of the attack effect set and providing an efficient evaluation result. The experimental results show that the method can process attack datasets with high efficiency and provide a visualized evaluation result in the form of a cluster tree.
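
    As a rough illustration of the clustering step described above, the sketch below runs plain Lloyd-style K-Means on two-dimensional points. It is not the paper's improved K-Means variant; the function names, the interpretation of each point as a pair of entropy-change components, and the sample values are all assumptions made for the example.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans_2d(points, init_centroids, max_iters=50):
    """Plain K-Means on 2-D points with caller-supplied initial centroids.

    Returns (centroids, labels). Standard Lloyd iteration only; the
    "improved" variant mentioned in the abstract is not reproduced here.
    """
    centroids = list(init_centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean_x = sum(p[0] for p in members) / len(members)
                mean_y = sum(p[1] for p in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical pre-processed samples: (entropy component 1, entropy component 2).
samples = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
centroids, labels = kmeans_2d(samples, [(0.0, 0.0), (1.0, 1.0)])
```

    Passing the initial centroids explicitly keeps the example deterministic; a real evaluation would also need a principled initialization and a choice of the number of effect categories.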

    Layered hidden Markov models for real-time daily activity monitoring using body sensor networks

    This paper presents an inference and training architecture for long-term and continuous daily activity monitoring using a wearable body sensor network. Energy efficiency and system adaptivity to wearers are two of the most important requirements of a body sensor network. This paper discusses a two-layered hidden Markov model (HMM) architecture for in-network data processing to achieve energy efficiency and model individualization. The bottom-layer HMM is used to process sensory data locally at each wireless sensor node to significantly reduce data transmissions. The top-layer HMM is utilized to find the activity sequence from the result of the local processing. This approach is energy efficient in that only the results of the decoding procedure in each node need to be transmitted rather than raw sensing data. Therefore, the volume of data is significantly reduced. When the algorithm is applied in online monitoring systems, the results of local processing are transmitted only upon hidden state changes. The top-layer processing uses old data of one sensor node when it does not receive a new result sequence of the local processing from that sensor node. The adaptation to various wearers is also discussed, and the robustness of this classification system is illustrated. Experiments with 19 activity sequences to be classified, performed by 5 subjects, are used to evaluate the performance of this system. © 2011 Springer-Verlag London Limited
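
    The bottom-layer idea in this abstract (decode locally, transmit only when the hidden state changes) can be sketched as follows. This is a minimal illustration assuming a discrete-observation HMM decoded with the standard Viterbi algorithm; the abstract does not specify the decoder, and the function names and the toy two-state model are hypothetical.

```python
import math


def _log(p):
    """Log of a probability, mapping 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-observation HMM."""
    n_states = len(start_p)
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        prev = score
        step, score = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this step.
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            step.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        backpointers.append(step)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return path[::-1]


def changes_only(path):
    """Keep (time, state) pairs only where the decoded state changes."""
    return [(t, s) for t, s in enumerate(path) if t == 0 or s != path[t - 1]]


# Toy sensor-node model (assumed values): two hidden states, two symbols.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]
decoded = viterbi([0, 0, 0, 1, 1, 1], start, trans, emit)
to_send = changes_only(decoded)
```

    In this toy run six raw observations reduce to two transmitted messages, which is the kind of saving the abstract attributes to sending decoded results rather than raw sensing data.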