16 research outputs found
Early Detection of Coffee Leaf Rust Through Convolutional Neural Networks Trained on Low-Resolution Images
Coffee leaf rust, a foliar disease caused by the fungus Hemileia vastatrix,
poses a major threat to coffee production, especially in Central America.
Climate change further aggravates this issue, as it shortens the latency period
between initial infection and the emergence of visible symptoms in diseases
like leaf rust. Shortened latency periods can lead to more severe plant
epidemics and faster spread of disease. There is therefore an urgent need for
effective disease management strategies. To address these challenges, we
explore the potential of deep learning models for enhancing early disease
detection. However, deep learning models require extensive processing power and
large amounts of data for model training, resources that are typically scarce.
To overcome these barriers, we propose a preprocessing technique that involves
convolving training images with a high-pass filter to enhance lesion-leaf
contrast, significantly improving model efficacy in resource-limited
environments. This method and our model demonstrated strong performance,
achieving over 90% on all evaluation metrics, including precision, recall,
F1-score, and the Dice coefficient. Our experiments show that this approach
outperforms alternative methods, including two other image preprocessing
techniques and the use of unaltered, full-color images.
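
As a rough illustration of the preprocessing idea described above, the sketch
below convolves a grayscale leaf image with a standard 3x3 high-pass kernel to
enhance lesion-leaf contrast before CNN training. The kernel, the use of
OpenCV, and the grayscale conversion are illustrative assumptions, not details
taken from the abstract.

    # Hypothetical sketch: high-pass filtering a low-resolution leaf image
    # before CNN training. Kernel choice and normalization are assumptions.
    import cv2
    import numpy as np

    HIGH_PASS_KERNEL = np.array([[-1, -1, -1],
                                 [-1,  8, -1],
                                 [-1, -1, -1]], dtype=np.float32)

    def preprocess_leaf_image(path):
        """Return a high-pass filtered version of a leaf image for training."""
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)          # low-resolution input
        filtered = cv2.filter2D(img, -1, HIGH_PASS_KERNEL)    # convolve with high-pass kernel
        # Rescale to [0, 255] so all training images share a consistent range.
        return cv2.normalize(filtered, None, 0, 255, cv2.NORM_MINMAX)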
Emotion-Aligned Contrastive Learning Between Images and Music
Traditional music search engines rely on retrieval methods that match natural
language queries with music metadata. There have been increasing efforts to
expand retrieval methods to consider the audio characteristics of music itself,
using queries of various modalities including text, video, and speech. Most
approaches aim to match general music semantics to the input queries, while
only a few focus on affective qualities. We address the task of retrieving
emotionally-relevant music from image queries by proposing a framework for
learning an affective alignment between images and music audio. Our approach
focuses on learning an emotion-aligned joint embedding space between images and
music. This joint embedding space is learned via emotion-supervised contrastive
learning, using an adapted cross-modal version of the SupCon loss. We directly
evaluate the joint embeddings with cross-modal retrieval tasks (image-to-music
and music-to-image) based on emotion labels. In addition, we investigate the
generalizability of the learned music embeddings with automatic music tagging
as a downstream task. Our experiments show that our approach successfully
aligns images and music, and that the learned embedding space is effective for
cross-modal retrieval applications.
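
Since the abstract describes an emotion-supervised, cross-modal adaptation of
the SupCon loss, a minimal PyTorch sketch of one plausible form is given below.
The temperature value, the symmetric two-direction formulation, and the
treatment of positives are assumptions rather than the paper's exact
implementation.

    # Hedged sketch: cross-modal supervised contrastive loss where positives
    # are image-music pairs sharing the same emotion label.
    import torch
    import torch.nn.functional as F

    def cross_modal_supcon(img_emb, mus_emb, labels, temperature=0.07):
        """img_emb, mus_emb: (N, d) paired embeddings; labels: (N,) emotion classes."""
        img = F.normalize(img_emb, dim=1)
        mus = F.normalize(mus_emb, dim=1)
        sim = img @ mus.t() / temperature          # (N, N) image-to-music similarities
        # Positive mask: cross-modal pairs with the same emotion label.
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
        log_p_i2m = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        log_p_m2i = sim.t() - torch.logsumexp(sim.t(), dim=1, keepdim=True)
        loss_i2m = -(pos * log_p_i2m).sum(1) / pos.sum(1).clamp(min=1)
        loss_m2i = -(pos * log_p_m2i).sum(1) / pos.sum(1).clamp(min=1)
        return (loss_i2m.mean() + loss_m2i.mean()) / 2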
Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients
This paper presents the approach and results of USC SAIL's submission to the
Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting
relapses in psychotic patients. Relapse prediction has proven to be
challenging, primarily due to the heterogeneity of symptoms and responses to
treatment between individuals. We address these challenges by investigating the
use of sleep behavior features to estimate relapse days as outliers in an
unsupervised machine learning setting. We extract informative features from
human activity and heart rate data collected in the wild, and evaluate various
combinations of feature types and time resolutions. We found that short-time
sleep behavior features outperformed both their awake counterparts and features
computed over longer time intervals. Our submission ranked 3rd on the task's official leaderboard,
demonstrating the potential of such features as an objective and non-invasive
predictor of psychotic relapses.
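
The abstract frames relapse days as outliers in an unsupervised setting but
does not name a specific detector. The sketch below scores per-day
sleep-behavior feature vectors with scikit-learn's EllipticEnvelope as one
possible choice; the detector, its parameters, and the example features are
assumptions.

    # Hedged sketch: per-patient unsupervised outlier scoring over daily
    # sleep-behavior features.
    from sklearn.covariance import EllipticEnvelope

    def score_relapse_days(daily_features):
        """daily_features: (n_days, n_feats) array of per-day sleep-behavior
        features (e.g., nocturnal heart-rate statistics, sleep-interval
        activity counts). Returns one anomaly score per day; higher values
        mark days that look more outlier-like (candidate relapse days)."""
        detector = EllipticEnvelope(contamination=0.1, support_fraction=0.9)
        detector.fit(daily_features)
        # decision_function is positive for inliers; negate so outliers score high.
        return -detector.decision_function(daily_features)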
Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment
A wide range of neurological and cognitive disorders exhibit distinct
behavioral markers aside from their clinical manifestations. Cortical Visual
Impairment (CVI) is a prime example of such conditions, resulting from damage
to visual pathways in the brain, and adversely impacting low- and high-level
visual function. The characteristics impacted by CVI are primarily described
qualitatively, challenging the establishment of an objective, evidence-based
measure of CVI severity. To study those characteristics, we propose to create
visual saliency maps by adequately prompting deep vision models with attributes
of clinical interest. After extracting saliency maps for a curated set of
stimuli, we use eye tracking technology to evaluate fixation traces from
children with CVI against those maps. Our experiments reveal significant gaze markers that
verify clinical knowledge and yield nuanced discriminability when compared to
those of age-matched control subjects. Using deep learning to unveil atypical
visual saliency is an important step toward establishing an eye-tracking
signature for severe neurodevelopmental disorders, like CVI.
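
The abstract describes prompting deep vision models with attributes of clinical
interest to obtain saliency maps. As a hedged illustration, the sketch below
uses the publicly available CLIPSeg model as a stand-in for whichever prompted
model the authors used; the model choice and the example prompts are
assumptions.

    # Hedged sketch: attribute-conditioned saliency maps from a text-prompted
    # segmentation model (CLIPSeg used here purely as an illustrative stand-in).
    import torch
    from PIL import Image
    from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

    processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
    model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

    def attribute_saliency(image: Image.Image, prompts):
        """Return one saliency map per clinical-attribute prompt (e.g. 'a human face')."""
        inputs = processor(text=prompts, images=[image] * len(prompts),
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(**inputs).logits        # (n_prompts, 352, 352)
        return torch.sigmoid(logits)               # squash to [0, 1] saliency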
Method for assessing visual saliency in children with cerebral/cortical visual impairment using generative artificial intelligence
Cerebral/cortical visual impairment (CVI) is a leading cause of pediatric visual impairment in the United States and other developed countries, and is increasingly diagnosed in developing nations due to improved care and survival of children who are born prematurely or have other risk factors for CVI. Despite this, there is currently no objective, standardized method to quantify the diverse visual impairments seen in children with CVI who are young and developmentally delayed. We propose a method that combines eye tracking and an image-based generative artificial intelligence (AI) model (SegCLIP) to assess higher- and lower-level visual characteristics in children with CVI. We will recruit 40 CVI participants (aged 12 months to 12 years) and 40 age-matched controls, who will watch a series of images on a monitor while eye gaze position is recorded using eye tracking. SegCLIP will be prompted to generate saliency maps for each of the images in the experimental protocol. The saliency maps (12 total) will highlight areas of interest that pertain to specific visual features, allowing for analysis of a range of individual visual characteristics. Eye tracking fixation maps will then be compared to the saliency maps to calculate fixation saliency values, which will be assigned based on the intensity of the pixel corresponding to the location of the fixation in the saliency map. Fixation saliency values will be compared between CVI and control participants. Fixation saliency values will also be correlated with corresponding scores on a functional vision assessment, the CVI Range-CR. We expect that fixation saliency values on visual characteristics that require higher-level processing will be significantly lower in CVI participants compared to controls, whereas fixation saliency values on lower-level visual characteristics will be similar or higher in CVI participants. Furthermore, we anticipate that fixation saliency values will be significantly correlated with scores on corresponding items on the CVI Range-CR. Together, these findings would suggest that AI-enabled saliency analysis using eye tracking can objectively quantify abnormalities of lower- and higher-order visual processing in children with CVI. This novel technique has the potential to guide individualized interventions and serve as an outcome measure in future clinical trials.
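
A minimal sketch of the fixation saliency value described above, in which each
eye-tracking fixation is scored by the intensity of the saliency-map pixel at
its screen location. The coordinate scaling and the per-participant averaging
step are assumptions.

    # Hedged sketch: score each fixation by the saliency intensity at its location.
    import numpy as np

    def fixation_saliency_values(fixations_xy, saliency_map, screen_size):
        """fixations_xy: (n, 2) array of gaze positions in screen pixels (x, y);
        saliency_map: (H, W) array scaled to [0, 1];
        screen_size: (width, height) of the display in pixels."""
        h, w = saliency_map.shape
        xs = np.clip((fixations_xy[:, 0] / screen_size[0] * w).astype(int), 0, w - 1)
        ys = np.clip((fixations_xy[:, 1] / screen_size[1] * h).astype(int), 0, h - 1)
        return saliency_map[ys, xs]                 # one saliency value per fixation

Per-participant means of these values could then be compared between the CVI
and control groups, or correlated with CVI Range-CR scores, using a
nonparametric test if the distributions are skewed.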
Knowledge-guided EEG Representation Learning
Self-supervised learning has produced impressive results in multimedia
domains of audio, vision and speech. This paradigm is equally, if not more,
relevant for the domain of biosignals, owing to the scarcity of labelled data
in such scenarios. The ability to leverage large-scale unlabelled data to learn
robust representations could help improve the performance of numerous inference
tasks on biosignals. Given the inherent domain differences between multimedia
modalities and biosignals, the established objectives for self-supervised
learning may not translate well to this domain. Hence, there is an unmet need
to adapt these methods to biosignal analysis. In this work, we propose a
self-supervised model for EEG that provides robust performance and remarkable
parameter efficiency by using a state space-based deep learning architecture. We
also propose a novel knowledge-guided pre-training objective that accounts for
the idiosyncrasies of the EEG signal. The results indicate improved embedding
representation learning and downstream performance compared to prior works on
exemplary tasks. Also, the proposed objective significantly reduces the amount
of pre-training data required to obtain performance equivalent to prior works.
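
The abstract does not spell out the knowledge-guided objective, so the sketch
below shows only one plausible way domain knowledge could enter pre-training:
regressing canonical EEG band powers computed from the raw window. The band
definitions, the Welch spectral estimate, and the regression head are all
illustrative assumptions, not the paper's actual design.

    # Hedged sketch: an auxiliary "knowledge-guided" target built from EEG band powers.
    import numpy as np
    import torch
    import torch.nn as nn
    from scipy.signal import welch

    # Canonical EEG frequency bands in Hz (illustrative).
    BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

    def band_power_targets(eeg, fs=256):
        """eeg: (batch, channels, time) array -> (batch, channels, n_bands) targets."""
        freqs, psd = welch(eeg, fs=fs, axis=-1)
        powers = [psd[..., (freqs >= lo) & (freqs < hi)].mean(axis=-1)
                  for lo, hi in BANDS.values()]
        return torch.tensor(np.stack(powers, axis=-1), dtype=torch.float32)

    class BandPowerHead(nn.Module):
        """Predicts per-channel band powers from pooled encoder features."""
        def __init__(self, feat_dim, n_channels, n_bands=len(BANDS)):
            super().__init__()
            self.proj = nn.Linear(feat_dim, n_channels * n_bands)
            self.out_shape = (n_channels, n_bands)

        def forward(self, features):                # features: (batch, feat_dim)
            return self.proj(features).view(-1, *self.out_shape)

    # Pre-training step (sketch): add mse_loss(head(encoder(x)), band_power_targets(x))
    # to the usual self-supervised loss so the targets inject EEG-specific knowledge.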
