
    DIC-Transformer: interpretation of plant disease classification results using image caption generation technology

    Disease image classification systems play a crucial role in identifying disease categories in the field of agricultural diseases. However, current plant disease image classification methods can only predict the disease category and do not offer explanations of the characteristics of the predicted disease images. To address this limitation, this paper employs image caption generation technology to produce distinct descriptions for different plant disease categories. A two-stage model called DIC-Transformer, which encompasses three tasks (detection, interpretation, and classification), is proposed. In the first stage, Faster R-CNN with a Swin Transformer backbone is used to detect the diseased area and generate a feature vector for the diseased image. In the second stage, the model uses a Transformer to generate image captions and then produces an image feature vector weighted by text features to improve the performance of the subsequent classification decoder. Additionally, a dataset containing text and images for agricultural diseases (ADCG-18) was compiled; it covers 18 diseases together with descriptive information about their characteristics. Using ADCG-18, the DIC-Transformer was compared with 11 existing classical caption generation methods and 10 image classification models. The caption evaluation metrics include BLEU-1 to BLEU-4, CIDEr-D, and ROUGE; the BLEU-1, CIDEr-D, and ROUGE scores were 0.756, 450.51, and 0.721, exceeding those of the best-performing comparison model, Fc, by 0.01, 29.55, and 0.014, respectively. The classification evaluation metrics include accuracy, recall, and F1 score, with accuracy at 0.854, recall at 0.854, and F1 score at 0.853, which are 0.024, 0.078, and 0.075 higher than those of the best-performing comparison model, MobileNetV2. The results indicate that the DIC-Transformer outperforms the comparison models in both classification and caption generation.
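    As a rough illustration of the two-stage idea (detect a region, caption it, then classify using text-weighted image features), here is a minimal PyTorch-style sketch; the module names, shapes, and the weighting scheme are my assumptions and merely stand in for the paper's Faster R-CNN/Swin and Transformer components.

```python
# Minimal sketch of a two-stage "detect -> caption -> classify" pipeline in PyTorch.
# All module names (region_encoder, caption_decoder, classifier) and shapes are
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn

class DICStyleModel(nn.Module):
    def __init__(self, vocab_size=5000, d_model=256, num_classes=18):
        super().__init__()
        # Stage 1 stand-in: in the paper this is Faster R-CNN with a Swin backbone;
        # here a small CNN produces one feature vector per diseased-region crop.
        self.region_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        # Stage 2: a Transformer decoder generates caption tokens from region features.
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.caption_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.caption_head = nn.Linear(d_model, vocab_size)
        # Classification head consumes image features re-weighted by text features.
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, images, caption_tokens):
        img_feat = self.region_encoder(images)                  # (B, d_model)
        memory = img_feat.unsqueeze(1)                          # (B, 1, d_model)
        tgt = self.token_embed(caption_tokens)                  # (B, T, d_model)
        text_feat = self.caption_decoder(tgt, memory)           # (B, T, d_model)
        caption_logits = self.caption_head(text_feat)
        # Weight the image feature by attention over the generated text features.
        weights = torch.softmax(text_feat @ img_feat.unsqueeze(-1), dim=1)  # (B, T, 1)
        fused = img_feat + (weights * text_feat).sum(dim=1)     # text-weighted image feature
        class_logits = self.classifier(fused)
        return caption_logits, class_logits

model = DICStyleModel()
caps, cls = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 12)))
print(caps.shape, cls.shape)  # torch.Size([2, 12, 5000]) torch.Size([2, 18])
```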

    Linking language and emotion: how emotion is understood in language comprehension, production and prediction using psycholinguistic methods

    Emotions are an integral part of why and how we use language in everyday life. We communicate our concerns, express our woes, and share our joy through the use of non-verbal and verbal language. Yet there is a limited understanding of when and how emotional language is processed differently from neutral language, or of how emotional information facilitates or inhibits language processing. Indeed, various efforts have been made over the last decade to bring emotion back into the discipline of psycholinguistics. This can be seen in many interdisciplinary models focusing on the role played by emotion in each aspect of linguistic experience. In this thesis, I answer this call and pursue questions that remain unanswered in psycholinguistics regarding its interaction with emotion. The general approach I use to bring emotion into psycholinguistic research is straightforward: where applicable and relevant, I use well-established tasks or paradigms to investigate the effects of emotional content on language processing. I focus on three main areas of language processing: comprehension, production, and prediction. The first experimental chapter includes a series of experiments using the Modality Switching Paradigm to investigate whether sentences describing emotional states are processed differently from sentences describing cognitive states. No consistent switching effects were found across my three experiments. My results suggest that these distinct classes of interoceptive concepts, such as ‘thinking’ or ‘being happy’, are not processed differently from each other, suggesting that people do not switch attention between different interoceptive systems when comprehending emotional or cognitive sentences. I discuss the implications for grounded cognition theory in the embodiment literature. In my second experimental chapter, I used the Cumulative Semantic Interference Paradigm to investigate two questions: (1) whether emotion concepts interfere with one another when repeatedly retrieved (emotion label objects), and (2) whether similar interference occurs for concrete objects that share a similar valence association (emotion-laden objects). Such interference could indicate that people use information such as valence and arousal to group objects in semantic memory. I found that interference occurs when people repeatedly retrieve direct emotion labels (e.g., “happy” and “sad”) but not when they retrieve the names of concrete objects that have similar emotional connotations (e.g., “puppy” and “rainbow”). I discuss my findings in terms of the different types of information that support the representation of abstract vs. concrete concepts. In my final experimental chapter, I used the Visual World Paradigm to investigate whether the emotional state of an agent is used to inform predictions during sentence processing. I found that people do use the description of an agent’s emotional state (e.g., “The boy is happy”) to predict the cause of that affective state during sentence processing (e.g., “because he was given an ice-cream”). A key result is that people were more likely to fixate on emotionally congruent objects (e.g., ice-cream) than on incongruent objects (e.g., broccoli). This suggests that people rapidly and automatically form predictions about upcoming sentence information based on the emotional state of the agent. I discuss these findings as a novel contribution to the Visual World literature.
I conducted a diverse set of experiments using a range of established psycholinguistic methods to investigate the roles of emotional information in language processing. I found clear results in the eye-tracking study but inconsistent effects in both the switching and interference studies. I interpret these mixed findings as follows: emotional content does not always affect language processing, and effects are most likely in tasks that explicitly require participants to simulate emotional states in some way. Regardless, not only did I find some novel results by extending previous tasks, but I was also able to show that this is an avenue that can be explored further to advance the field of affective psycholinguistics.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Scalable Exploration of Complex Objects and Environments Beyond Plain Visual Replication

    Digital multimedia content and presentation means are rapidly increasing in sophistication and are now capable of describing detailed representations of the physical world. 3D exploration experiences allow people to appreciate, understand, and interact with intrinsically virtual objects. Communicating information about objects requires the ability to explore them from different angles, as well as to mix highly photorealistic or illustrative presentations of the objects themselves with additional data that provides further insight, typically represented in the form of annotations. Effectively providing these capabilities requires solving important problems in visualization and user interaction. In this thesis, I studied these problems in the cultural heritage computing domain, focusing on the very common and important special case of mostly planar, but visually, geometrically, and semantically rich objects. These are generally roughly flat objects with a standard frontal viewing direction (e.g., paintings, inscriptions, bas-reliefs), as well as visualizations of fully 3D objects from a particular point of view (e.g., canonical views of buildings or statues). Selecting a precise application domain and a specific presentation mode allowed me to concentrate on the well-defined use case of exploring annotated relightable stratigraphic models (in particular, for local and remote museum presentation). My main results and contributions to the state of the art have been a novel technique for interactively controlling visualization lenses while automatically maintaining good focus-and-context parameters, a novel approach for avoiding clutter in an annotated model and for guiding users towards interesting areas, and a method for structuring audio-visual object annotations into a graph and for using that graph to improve guidance and support storytelling and automated tours. We demonstrated the effectiveness and potential of our techniques by performing interactive exploration sessions on various screen sizes and types, ranging from desktop devices to large-screen displays for a walk-up-and-use museum installation. KEYWORDS: Computer Graphics, Human-Computer Interaction, Interactive Lenses, Focus-and-Context, Annotated Models, Cultural Heritage Computing
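    As a rough illustration of the annotation-graph idea described above, the sketch below (my assumption, not the thesis's implementation) stores annotations as graph nodes with narrative edges and derives an automated tour by following those edges.

```python
# Minimal sketch of structuring object annotations into a graph and deriving an
# automated tour as a path over it. Names, fields, and data are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    label: str
    region: tuple          # e.g., normalized (x, y, w, h) on the mostly planar object
    media: str             # path/URL of the associated audio-visual content
    next_ids: list = field(default_factory=list)   # edges suggesting narrative order

def automated_tour(annotations: dict, start_id: str) -> list:
    """Follow 'next' edges from a starting annotation to build a guided tour."""
    tour, current, seen = [], start_id, set()
    while current and current not in seen:
        seen.add(current)
        tour.append(annotations[current].label)
        nxt = annotations[current].next_ids
        current = nxt[0] if nxt else None
    return tour

notes = {
    "a1": Annotation("Underdrawing detail", (0.1, 0.2, 0.2, 0.2), "a1.mp4", ["a2"]),
    "a2": Annotation("Pigment analysis", (0.5, 0.3, 0.2, 0.2), "a2.mp4", ["a3"]),
    "a3": Annotation("Restoration history", (0.4, 0.7, 0.3, 0.2), "a3.mp4"),
}
print(automated_tour(notes, "a1"))
```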

    AENet: attention efficient network for cross-view image geo-localization

    To address the problem that task-irrelevant objects, such as cars, pedestrians, and sky, interfere with the extracted feature descriptors in cross-view image geo-localization, this paper proposes a novel method for cross-view image geo-localization, named AENet. The method includes two main parts: an attention efficient network fusing channel and spatial attention mechanisms, and a triplet loss function based on a multiple-hard-samples weighting strategy. In the first part, the EfficientNetV2 network is used to extract features from the images and preliminarily filter irrelevant features along the channel dimension; the Triplet Attention layer is then applied to further filter irrelevant features along the spatial dimension. In the second part, a multiple-hard-samples weighting strategy is proposed to enhance the learning of hard samples. Experimental results show that our proposed method significantly outperforms the state-of-the-art method on two existing benchmark datasets.
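    As a rough illustration of the second part, here is a minimal sketch of a triplet loss that weights several hard negatives per anchor; the softmax-over-violations weighting is my assumption, since the abstract does not detail the exact weighting strategy used by AENet.

```python
# Illustrative sketch of a triplet loss that weights multiple hard negatives per anchor.
# The weighting scheme (softmax over margin violations) is one plausible choice,
# not necessarily the one used in the paper.
import torch
import torch.nn.functional as F

def weighted_hard_triplet_loss(anchor, positive, negatives, margin=0.3, temperature=0.1):
    """anchor, positive: (B, D); negatives: (B, N, D) candidate negatives per anchor."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_dist = (anchor - positive).pow(2).sum(dim=-1)                    # (B,)
    neg_dist = (anchor.unsqueeze(1) - negatives).pow(2).sum(dim=-1)      # (B, N)

    # Per-negative triplet violations; a larger violation means a harder negative.
    violations = F.relu(pos_dist.unsqueeze(1) - neg_dist + margin)       # (B, N)

    # Weight harder negatives more heavily instead of using only the single hardest one.
    weights = torch.softmax(violations / temperature, dim=1)
    return (weights * violations).sum(dim=1).mean()

loss = weighted_hard_triplet_loss(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 8, 128))
print(loss)
```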

    Less is More: Restricted Representations for Better Interpretability and Generalizability

    Deep neural networks are prevalent in supervised learning for a wide range of tasks such as image classification, machine translation, and even scientific discovery. Their success often comes at the expense of interpretability and generalizability. The increasing complexity of models and the involvement of pre-training make the lack of explainability more pressing, and outstanding performance when labeled data are abundant, coupled with a tendency to overfit when labeled data are limited, demonstrates how difficult it is for deep neural networks to generalize to different datasets. This thesis aims to improve interpretability and generalizability by restricting representations. We approach interpretability through attribution analysis, to understand which features contribute to BERT's predictions, and generalizability through effective methods in the low-data regime. We consider two strategies for restricting representations: (1) adding a bottleneck, and (2) introducing compression. Given input x, suppose we want to learn y via a latent representation z (i.e., x→z→y); adding a bottleneck means adding a function R such that L(R(z)) < L(z), and introducing compression means adding a function R such that L(R(y)) < L(y), where L denotes the number of bits. In other words, the restriction is added either in the middle of the pipeline or at its end. We first introduce how adding an information bottleneck can help attribution analysis and apply it to investigate BERT's behavior on text classification in Chapter 3. We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis of cross-layer and cross-passage behavior. Adding a bottleneck can not only provide insight into deep neural networks but can also be used to increase generalizability. In Chapter 5, we demonstrate the equivalence between adding a bottleneck and performing neural compression. We then leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV) and show how optimized neural compressors can be used for non-parametric image classification with few labeled data. To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with gzip, a universal compressor, on text classification in Chapter 6. In Chapter 7, we elucidate methods that adopt the perspective of compression without the actual process of compression, using T5. Using experimental results on passage reranking, we show that our method is highly effective in the low-data regime when only one thousand query-passage pairs are available. In addition to the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision, in one-shot and zero-shot settings. The experiments show that without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking, outperforming strong baselines; this work is presented in Chapter 8. The thesis thus tackles two major challenges in machine learning, "interpretability" and "generalizability", by restricting representations. We provide both theoretical derivations and empirical results to show the effectiveness of information-theoretic approaches. We not only design new algorithms but also provide numerous insights into why and how "compression" is so important for understanding deep neural networks and improving generalizability.
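    As a rough illustration of compressor-based non-parametric classification (the NPC idea referenced for Chapter 6), here is a minimal sketch that classifies text by Normalized Compression Distance with gzip and a k-nearest-neighbour vote; the helper names and toy data are my assumptions, not code from the thesis.

```python
# Minimal sketch of compressor-based non-parametric classification (NPC) with gzip:
# classify a text by its Normalized Compression Distance (NCD) to labeled examples
# and a k-nearest-neighbour vote. The data below is a toy illustration.
import gzip
from collections import Counter

def clen(s: str) -> int:
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_classify(query: str, labeled: list, k: int = 3) -> str:
    # Sort labeled examples by compression distance to the query, then vote.
    dists = sorted((ncd(query, text), label) for text, label in labeled)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [
    ("the match ended with a late goal and a penalty shootout", "sports"),
    ("the striker scored twice before the final whistle", "sports"),
    ("the central bank raised interest rates to curb inflation", "finance"),
    ("markets rallied after the quarterly earnings report", "finance"),
]
print(knn_classify("fans cheered as the team scored in extra time", train, k=3))
```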

    Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5

    This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of application and in mathematics, and is available in open access. The collected contributions of this volume have either been published or presented after the dissemination of the fourth volume in 2015 in international conferences, seminars, workshops, and journals, or they are new. The contributions in each part of this volume are ordered chronologically. The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution (PCR) rules of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of (quasi-)vacuous belief assignments in the fusion of sources of evidence, with their Matlab codes. Because more applications of DSmT have emerged since the appearance of the fourth book in 2015, the second part of this volume covers selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and networks for ship classification. Finally, the third part presents contributions related to belief functions in general that have been published or presented since 2015. These contributions concern decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes' theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, negators of belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
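    As a concrete illustration of the proportional conflict redistribution idea central to DSmT, here is a small sketch of the standard two-source PCR5 rule; the data structures and the toy example are my assumptions, not code from the volume.

```python
# Hedged sketch of the PCR5 combination rule (two sources): conjunctive consensus
# plus proportional redistribution of each partial conflict back to the two focal
# elements that produced it. Focal elements are frozensets over the frame.
from collections import defaultdict

def pcr5(m1: dict, m2: dict) -> dict:
    """m1, m2: basic belief assignments mapping frozenset -> mass (each sums to 1)."""
    out = defaultdict(float)
    for A, a in m1.items():
        for B, b in m2.items():
            inter = A & B
            if inter:
                out[inter] += a * b                      # conjunctive (consensus) part
            elif a + b > 0:
                # Redistribute the conflicting mass a*b to A and B proportionally.
                out[A] += a * a * b / (a + b)
                out[B] += b * b * a / (a + b)
    return dict(out)

# Toy example on the frame {theta1, theta2}.
t1, t2 = frozenset({"theta1"}), frozenset({"theta2"})
fused = pcr5({t1: 0.6, t2: 0.4}, {t1: 0.3, t2: 0.7})
print(fused, "sum =", round(sum(fused.values()), 6))  # masses still sum to 1
```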

    A corpus-based CDA study of ideological mediation through translation shifts: an analysis of the official Chinese-English translation of The Governance of China

    This study aims to explore the extent to which President Xi’s ideological message is mediated in the official Chinese-English translation of The Governance of China via various translation shifts, and to analyze the possible ideological reasons behind them. Unlike previous studies whose interpretation of translation shifts has been restricted to either the linguistic level or the speech situation, this research project focuses on exploring the ideological significance of translation shifts within the broader sociopolitical context. It adopts a mixed-methods approach, merging critical discourse analysis (CDA) and corpus-based translation studies. A parallel corpus based on the source and target texts of President Xi’s domestic speeches to officials and Party members, published in The Governance of China, was built to enable both quantitative and qualitative analysis. It is also noteworthy that this study concentrates on the key Chinese modality markers, transitivity processes, metaphorical expressions, and referring terms that stand out in the present research corpus compared to general Chinese discourse, rather than on all existing or the most frequent ones. The overall results suggest that translation shifts in modality, transitivity, metaphor, and reference have slightly increased the ideological emphasis on strengthening the government and the Party’s self-discipline compared to other national issues, and have exhibited a tendency to contextualize the message in light of the foreign audience’s ideological positions. Such shifts may be related to the translation agency’s commitment and the state’s current foreign policy. Ultimately, this study reveals subtle ideological translation shifts that would be buried if researchers treated source and target texts separately. It calls for translators to raise awareness of the ideological potential of textual features and encourages audiences to pay attention to the institutional and sociopolitical background of translated texts.

    Information Forensics and Security: A quarter-century-long journey

    Information forensics and security (IFS) is an active R&D area whose goal is to ensure that people use devices, data, and intellectual property for authorized purposes and to facilitate the gathering of solid evidence to hold perpetrators accountable. For over a quarter century, since the 1990s, the IFS research area has grown tremendously to address the societal needs of the digital information era. The IEEE Signal Processing Society (SPS) has emerged as an important hub and leader in this area, and this article celebrates some landmark technical contributions. In particular, we highlight the major technological advances made by the research community in selected focus areas during the past 25 years and present future trends.

    Focused Decoding Enables 3D Anatomical Detection by Transformers

    Detection Transformers represent end-to-end object detection approaches based on a Transformer encoder-decoder architecture, exploiting the attention mechanism for global relation modeling. Although Detection Transformers deliver results on par with or even superior to their highly optimized CNN-based counterparts operating on 2D natural images, their success is closely coupled to access to a vast amount of training data. This, however, restricts the feasibility of employing Detection Transformers in the medical domain, as access to annotated data is typically limited. To tackle this issue and facilitate the advent of medical Detection Transformers, we propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder. Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view to regions of interest, which allows for a precise focus on relevant anatomical structures. We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results, and thus alleviates the need for a vast amount of annotated data, but also exhibits exceptional and highly intuitive explainability of results via attention weights. Our code is available at https://github.com/bwittmann/transoar. Comment: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA), https://melba-journal.org/2023:00
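    To illustrate the core mechanism, here is a minimal sketch of restricting a decoder query's cross-attention to an atlas-derived region of interest; all shapes, function names, and the masking scheme are my assumptions, not the released transoar implementation.

```python
# Hedged sketch of "focused" cross-attention: each anatomical query may only attend
# to feature tokens whose positions fall inside an atlas-derived region of interest.
import torch

def focused_cross_attention(queries, feat_tokens, token_coords, roi_boxes):
    """
    queries:      (Q, D)  one query per anatomical structure (from atlas anchors)
    feat_tokens:  (T, D)  flattened encoder feature tokens
    token_coords: (T, 3)  normalized (x, y, z) position of each token
    roi_boxes:    (Q, 6)  per-query atlas ROI as (x0, y0, z0, x1, y1, z1)
    """
    scores = queries @ feat_tokens.T / queries.shape[-1] ** 0.5        # (Q, T)

    lo, hi = roi_boxes[:, None, :3], roi_boxes[:, None, 3:]            # (Q, 1, 3) each
    inside = ((token_coords[None] >= lo) & (token_coords[None] <= hi)).all(-1)  # (Q, T)

    # Mask out tokens outside each query's region of interest before the softmax.
    scores = scores.masked_fill(~inside, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return attn @ feat_tokens                                          # (Q, D) focused features

Q, T, D = 4, 512, 64
out = focused_cross_attention(
    torch.randn(Q, D), torch.randn(T, D), torch.rand(T, 3),
    torch.tensor([[0.0, 0.0, 0.0, 0.5, 0.5, 0.5]] * Q),
)
print(out.shape)  # torch.Size([4, 64])
```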