
    Early Detection of Coffee Leaf Rust Through Convolutional Neural Networks Trained on Low-Resolution Images

    Coffee leaf rust, a foliar disease caused by the fungus Hemileia vastatrix, poses a major threat to coffee production, especially in Central America. Climate change further aggravates this issue, as it shortens the latency period between initial infection and the emergence of visible symptoms in diseases like leaf rust. Shortened latency periods can lead to more severe plant epidemics and faster disease spread, so there is an urgent need for effective disease management strategies. To address these challenges, we explore the potential of deep learning models for enhancing early disease detection. However, deep learning models require extensive processing power and large amounts of training data, resources that are typically scarce. To overcome these barriers, we propose a preprocessing technique that convolves training images with a high-pass filter to enhance lesion-leaf contrast, significantly improving model efficacy in resource-limited environments. With this method, our model achieved over 90% on all evaluation metrics, including precision, recall, F1-score, and the Dice coefficient. Our experiments show that this approach outperforms alternatives, including two other image preprocessing techniques and the use of unaltered, full-color images.
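    The preprocessing idea above lends itself to a brief illustration. The following is a minimal sketch assuming OpenCV and a 3x3 Laplacian-style kernel; the kernel values, grayscale conversion, and rescaling are illustrative assumptions rather than the authors' exact settings.

        import numpy as np
        import cv2

        def high_pass_preprocess(image_bgr: np.ndarray) -> np.ndarray:
            """Convolve an image with a 3x3 high-pass kernel to emphasize lesion edges."""
            kernel = np.array([[-1, -1, -1],
                               [-1,  8, -1],
                               [-1, -1, -1]], dtype=np.float32)
            gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
            filtered = cv2.filter2D(gray, -1, kernel)  # high-pass convolution
            filtered -= filtered.min()                 # rescale to [0, 255] for training
            if filtered.max() > 0:
                filtered = filtered / filtered.max() * 255.0
            return filtered.astype(np.uint8)

        # Usage (hypothetical file name):
        # preprocessed = high_pass_preprocess(cv2.imread("leaf.jpg"))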

    Emotion-Aligned Contrastive Learning Between Images and Music

    Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. Most approaches aim to match general music semantics to the input queries, while only a few focus on affective qualities. We address the task of retrieving emotionally relevant music from image queries by proposing a framework for learning an affective alignment between images and music audio. Our approach focuses on learning an emotion-aligned joint embedding space between images and music. This joint embedding space is learned via emotion-supervised contrastive learning, using an adapted cross-modal version of the SupCon loss. We directly evaluate the joint embeddings with cross-modal retrieval tasks (image-to-music and music-to-image) based on emotion labels. In addition, we investigate the generalizability of the learned music embeddings with automatic music tagging as a downstream task. Our experiments show that our approach successfully aligns images and music, and that the learned embedding space is effective for cross-modal retrieval applications.
    Comment: Under review
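    As a rough illustration of what an emotion-supervised, cross-modal SupCon-style objective can look like, the sketch below treats image-music pairs that share an emotion label as positives; the temperature, symmetric formulation, and function names are assumptions, not the paper's exact loss.

        import torch
        import torch.nn.functional as F

        def cross_modal_supcon(img_emb, mus_emb, labels, temperature=0.1):
            """img_emb, mus_emb: (N, D) embeddings; labels: (N,) emotion class ids."""
            img = F.normalize(img_emb, dim=1)
            mus = F.normalize(mus_emb, dim=1)
            logits = img @ mus.t() / temperature                       # image-to-music similarities
            pos = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()  # same-emotion pairs are positives
            log_p_i2m = logits - torch.logsumexp(logits, dim=1, keepdim=True)
            loss_i2m = -(pos * log_p_i2m).sum(1) / pos.sum(1).clamp(min=1)
            log_p_m2i = logits.t() - torch.logsumexp(logits.t(), dim=1, keepdim=True)
            loss_m2i = -(pos * log_p_m2i).sum(1) / pos.sum(1).clamp(min=1)
            return (loss_i2m.mean() + loss_m2i.mean()) / 2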

    Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients

    This paper presents the approach and results of USC SAIL's submission to the Signal Processing Grand Challenge 2023 - e-Prevention (Task 2) on detecting relapses in psychotic patients. Relapse prediction has proven to be challenging, primarily due to the heterogeneity of symptoms and responses to treatment between individuals. We address these challenges by investigating the use of sleep behavior features to estimate relapse days as outliers in an unsupervised machine learning setting. We extract informative features from human activity and heart rate data collected in the wild, and evaluate various combinations of feature types and time resolutions. We found that short-time sleep behavior features outperformed both their awake counterparts and features computed over longer time intervals. Our submission was ranked 3rd on the task's official leaderboard, demonstrating the potential of such features as an objective and non-invasive predictor of psychotic relapses.
    Comment: 2 pages, 1 table, ICASSP 2023, Grand Challenges Track
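    The unsupervised setup described above can be sketched as scoring daily sleep-feature vectors with an outlier detector and flagging the most anomalous days as candidate relapse days; the detector choice, feature dimensionality, and placeholder data below are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import IsolationForest

        # One row per day; columns are sleep behavior features (placeholder values).
        X_days = np.random.default_rng(0).normal(size=(60, 8))

        detector = IsolationForest(n_estimators=200, random_state=0).fit(X_days)
        outlier_score = -detector.score_samples(X_days)          # higher = more anomalous
        candidate_relapse_days = np.argsort(outlier_score)[-5:]  # most anomalous days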

    Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment

    A wide range of neurological and cognitive disorders exhibit distinct behavioral markers aside from their clinical manifestations. Cortical Visual Impairment (CVI) is a prime example of such conditions, resulting from damage to visual pathways in the brain and adversely impacting low- and high-level visual function. The characteristics impacted by CVI are primarily described qualitatively, which complicates the establishment of an objective, evidence-based measure of CVI severity. To study those characteristics, we propose to create visual saliency maps by adequately prompting deep vision models with attributes of clinical interest. After extracting saliency maps for a curated set of stimuli, we use eye tracking to evaluate the fixation traces of children with CVI on those stimuli. Our experiments reveal significant gaze markers that verify clinical knowledge and yield nuanced discriminability when compared to those of age-matched control subjects. Using deep learning to unveil atypical visual saliency is an important step toward establishing an eye-tracking signature for severe neurodevelopmental disorders like CVI.
    Comment: 5 pages, 4 figures, submitted to IEEE EMBC 202
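    One simple way to turn a model-derived saliency map and a recorded fixation trace into a gaze marker is to score each fixation by the standardized saliency at its location, similar in spirit to normalized scanpath saliency; the function below is a hedged sketch, not the paper's exact metric.

        import numpy as np

        def gaze_marker(saliency_map: np.ndarray, fixations_xy: np.ndarray) -> float:
            """saliency_map: (H, W) array; fixations_xy: (N, 2) pixel coordinates (x, y)."""
            z = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
            xs = np.clip(fixations_xy[:, 0].astype(int), 0, saliency_map.shape[1] - 1)
            ys = np.clip(fixations_xy[:, 1].astype(int), 0, saliency_map.shape[0] - 1)
            return float(z[ys, xs].mean())  # higher = fixations land on salient regions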

    Method for assessing visual saliency in children with cerebral/cortical visual impairment using generative artificial intelligence

    Cerebral/cortical visual impairment (CVI) is a leading cause of pediatric visual impairment in the United States and other developed countries, and is increasingly diagnosed in developing nations due to improved care and survival of children who are born premature or have other risk factors for CVI. Despite this, there is currently no objective, standardized method to quantify the diverse visual impairments seen in children with CVI who are young and developmentally delayed. We propose a method that combines eye tracking and an image-based generative artificial intelligence (AI) model (SegCLIP) to assess higher- and lower-level visual characteristics in children with CVI. We will recruit 40 CVI participants (aged 12 months to 12 years) and 40 age-matched controls, who will watch a series of images on a monitor while eye gaze position is recorded using eye tracking. SegCLIP will be prompted to generate saliency maps for each of the images in the experimental protocol. The saliency maps (12 total) will highlight areas of interest that pertain to specific visual features, allowing for analysis of a range of individual visual characteristics. Eye tracking fixation maps will then be compared to the saliency maps to calculate fixation saliency values, which will be assigned based on the intensity of the pixel corresponding to the location of the fixation in the saliency map. Fixation saliency values will be compared between CVI and control participants. Fixation saliency values will also be correlated with corresponding scores on a functional vision assessment, the CVI Range-CR. We expect that fixation saliency values on visual characteristics that require higher-level processing will be significantly lower in CVI participants compared to controls, whereas fixation saliency values on lower-level visual characteristics will be similar or higher in CVI participants. Furthermore, we anticipate that fixation saliency values will be significantly correlated with scores on corresponding items on the CVI Range-CR. Together, these findings would suggest that AI-enabled saliency analysis using eye tracking can objectively quantify abnormalities of lower- and higher-order visual processing in children with CVI. This novel technique has the potential to guide individualized interventions and serve as an outcome measure in future clinical trials.
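    Once per-participant fixation saliency values are computed (the intensity of the saliency-map pixel under each fixation, averaged per participant), the planned group comparison and correlation can be sketched as below; the statistical tests and placeholder arrays are illustrative assumptions, not the study's prespecified analysis.

        import numpy as np
        from scipy.stats import mannwhitneyu, spearmanr

        rng = np.random.default_rng(0)
        cvi_vals = rng.normal(0.35, 0.10, size=40)   # placeholder per-participant values, CVI group
        ctrl_vals = rng.normal(0.55, 0.10, size=40)  # placeholder per-participant values, controls
        range_cr = rng.uniform(0.0, 10.0, size=40)   # placeholder CVI Range-CR scores

        u_stat, p_group = mannwhitneyu(cvi_vals, ctrl_vals, alternative="less")  # CVI lower than controls?
        rho, p_corr = spearmanr(cvi_vals, range_cr)                              # association with Range-CR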

    Knowledge-guided EEG Representation Learning

    Self-supervised learning has produced impressive results in the multimedia domains of audio, vision, and speech. This paradigm is equally, if not more, relevant for the domain of biosignals, owing to the scarcity of labelled data in such scenarios. The ability to leverage large-scale unlabelled data to learn robust representations could help improve the performance of numerous inference tasks on biosignals. Given the inherent domain differences between multimedia modalities and biosignals, the established objectives for self-supervised learning may not translate well to this domain. Hence, there is an unmet need to adapt these methods to biosignal analysis. In this work, we propose a self-supervised model for EEG that provides robust performance and remarkable parameter efficiency by using a state space-based deep learning architecture. We also propose a novel knowledge-guided pre-training objective that accounts for the idiosyncrasies of the EEG signal. The results indicate improved embedding representation learning and downstream performance compared to prior works on exemplary tasks. The proposed objective also significantly reduces the amount of pre-training data required to obtain performance equivalent to prior works.
    Comment: 6 pages, 5 figures, submitted to EMBC 202
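    Since the abstract does not detail the state-space encoder or the knowledge-guided objective, the sketch below only illustrates a generic self-supervised pretraining step for EEG (masked time-segment reconstruction); the encoder, masking scheme, and loss are assumptions for illustration, not the paper's method.

        import torch
        import torch.nn as nn

        class TinyEEGEncoder(nn.Module):
            """Small 1-D convolutional stand-in for an EEG encoder."""
            def __init__(self, n_channels=19, hidden=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(n_channels, hidden, kernel_size=7, padding=3),
                    nn.GELU(),
                    nn.Conv1d(hidden, n_channels, kernel_size=7, padding=3),
                )

            def forward(self, x):  # x: (batch, channels, time)
                return self.net(x)

        def masked_reconstruction_step(model, x, mask_ratio=0.2):
            """Zero out random time steps and penalize reconstruction error on them."""
            mask = (torch.rand(x.shape[0], 1, x.shape[2]) < mask_ratio).float()
            recon = model(x * (1 - mask))
            return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

        # Usage: loss = masked_reconstruction_step(TinyEEGEncoder(), torch.randn(8, 19, 1000))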