Ensuring Access to Safe and Nutritious Food for All Through the Transformation of Food Systems
ADS_UNet: A Nested UNet for Histopathology Image Segmentation
The UNet model consists of fully convolutional network (FCN) layers arranged
as contracting encoder and upsampling decoder maps. Nested arrangements of
these encoder and decoder maps give rise to extensions of the UNet model, such
as UNet^e and UNet++. Other refinements include constraining the outputs of the
convolutional layers to discriminate between segment labels when trained end to
end, a property called deep supervision. This reduces feature diversity in
these nested UNet models despite their large parameter space. Furthermore, for
texture segmentation, pixel correlations at multiple scales contribute to the
classification task; hence, explicit deep supervision of shallower layers is
likely to enhance performance. In this paper, we propose ADS_UNet, a stage-wise
additive training algorithm that incorporates resource-efficient deep
supervision in shallower layers and takes performance-weighted combinations of
the sub-UNets to create the segmentation model. We provide empirical evidence
on three histopathology datasets to support the claim that the proposed
ADS_UNet reduces correlations between constituent features and improves
performance while being more resource efficient. We demonstrate that ADS_UNet
outperforms state-of-the-art Transformer-based models by 1.08 and 0.6 points on
the CRAG and BCSS datasets, respectively, while requiring only 37% of the GPU
consumption and 34% of the training time required by Transformers.
Comment: To be published in Expert Systems With Applications
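The performance-weighted combination of sub-UNets mentioned in the abstract above can be pictured with a small sketch. The abstract does not give the weighting rule, so this assumes each sub-model's weight is its validation score divided by the sum of all scores. The names `combine_predictions`, `argmax_labels`, and the sample numbers are hypothetical and not taken from the paper.

```python
def combine_predictions(prob_maps, scores):
    """Blend per-pixel class probabilities from several sub-models.

    prob_maps: one entry per sub-model; each entry is a list of pixels,
               and each pixel is a list of class probabilities.
    scores:    one non-negative validation score per sub-model
               (assumed weighting rule: score / sum of scores).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    weights = [s / total for s in scores]

    n_pixels = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    blended = [[0.0] * n_classes for _ in range(n_pixels)]
    for w, model_probs in zip(weights, prob_maps):
        for p in range(n_pixels):
            for c in range(n_classes):
                blended[p][c] += w * model_probs[p][c]
    return blended


def argmax_labels(blended):
    """Pick the most probable class index for every pixel."""
    return [max(range(len(px)), key=px.__getitem__) for px in blended]


# Two hypothetical sub-models, two pixels, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.2, 0.8], [0.3, 0.7]]
blended = combine_predictions([model_a, model_b], scores=[3.0, 1.0])
labels = argmax_labels(blended)
```

With these scores the first sub-model contributes three quarters of the blended probability, so a stronger sub-model dominates without silencing the weaker one.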
BotMoE: Twitter Bot Detection with Community-Aware Mixtures of Modal-Specific Experts
Twitter bot detection has become a crucial task in efforts to combat online
misinformation, mitigate election interference, and curb malicious propaganda.
However, advanced Twitter bots often attempt to mimic the characteristics of
genuine users through feature manipulation and disguise themselves to fit in
diverse user communities, posing challenges for existing Twitter bot detection
models. To this end, we propose BotMoE, a Twitter bot detection framework that
jointly utilizes multiple user information modalities (metadata, textual
content, network structure) to improve the detection of deceptive bots.
Furthermore, BotMoE incorporates a community-aware Mixture-of-Experts (MoE)
layer to improve domain generalization and adapt to different Twitter
communities. Specifically, BotMoE constructs modal-specific encoders for
metadata features, textual content, and graphical structure, which jointly
model Twitter users from three modal-specific perspectives. We then employ a
community-aware MoE layer to automatically assign users to different
communities and leverage the corresponding expert networks. Finally, user
representations from metadata, text, and graph perspectives are fused with an
expert fusion layer, combining all three modalities while measuring the
consistency of user information. Extensive experiments demonstrate that BotMoE
significantly advances the state-of-the-art on three Twitter bot detection
benchmarks. Studies also confirm that BotMoE captures advanced and evasive
bots, alleviates the reliance on training data, and better generalizes to new
and previously unseen user communities.
Comment: Accepted at SIGIR 202
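As a rough illustration of the Mixture-of-Experts idea in the BotMoE abstract above, the sketch below routes one user feature vector through a softmax gate and mixes the outputs of several linear experts. It is a generic soft-gated MoE layer, not the BotMoE implementation; the function names, the linear experts, and the toy weights are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_layer(x, gate_weights, expert_weights):
    """Soft-gated mixture of linear experts (hypothetical sketch).

    x:              input feature vector of length d
    gate_weights:   n_experts x d matrix producing one logit per expert
    expert_weights: list of d_out x d matrices, one per expert
    Returns the mixed output and the gate probabilities.
    """
    gate = softmax(matvec(gate_weights, x))
    outputs = [matvec(w, x) for w in expert_weights]
    d_out = len(outputs[0])
    mixed = [sum(g * out[j] for g, out in zip(gate, outputs))
             for j in range(d_out)]
    return mixed, gate


# Toy example: an uninformative gate splits the input evenly
# between an identity expert and a doubling expert.
x = [1.0, 0.0]
gate_w = [[0.0, 0.0], [0.0, 0.0]]
experts = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
mixed, gate = moe_layer(x, gate_w, experts)
```

In a trained model the gate probabilities would act as a soft assignment of each user to a community, with each expert specializing in one of them.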
TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion Odometry Estimation
Multi-modal fusion of sensors is a commonly used approach to enhance the
performance of odometry estimation, which is also a fundamental module for
mobile robots. However, the question of how to perform fusion among
different modalities in a supervised sensor fusion odometry estimation task
remains one of the challenging issues. Simple operations, such as
element-wise summation and concatenation, cannot assign adaptive
attentional weights to incorporate different modalities efficiently, which makes
it difficult to achieve competitive odometry results. Recently, the Transformer
architecture has shown potential for multi-modal fusion tasks, particularly in
the domains of vision with language. In this work, we propose an end-to-end
supervised Transformer-based LiDAR-Inertial fusion framework (namely
TransFusionOdom) for odometry estimation. The multi-attention fusion module
demonstrates different fusion approaches for homogeneous and heterogeneous
modalities to address the overfitting problem that can arise from blindly
increasing the complexity of the model. Additionally, to interpret the learning
process of the Transformer-based multi-modal interactions, a general
visualization approach is introduced to illustrate the interactions between
modalities. Moreover, exhaustive ablation studies evaluate different
multi-modal fusion strategies to verify the performance of the proposed fusion
strategy. A synthetic multi-modal dataset is made public to validate the
generalization ability of the proposed fusion strategy, which also works for
other combinations of different modalities. Quantitative and qualitative
odometry evaluations on the KITTI dataset verify that the proposed
TransFusionOdom achieves superior performance compared with other related works.
Comment: Submitted to IEEE Sensors Journal with some modifications. This work
has been submitted to the IEEE for possible publication. Copyright may be
transferred without notice, after which this version may no longer be
accessible.
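The contrast the TransFusionOdom abstract draws between fixed fusion (summation, concatenation) and adaptive attentional weights can be sketched as follows. This is plain scaled dot-product attention over one feature vector per modality, offered only as an illustration; the function `attention_fuse` and the toy LiDAR and IMU vectors are hypothetical and not taken from the paper.

```python
import math


def attention_fuse(query, modality_feats):
    """Fuse modality features with scaled dot-product attention weights.

    query:          vector of length d (for example a learned fusion token)
    modality_feats: list of vectors of length d, one per modality;
                    each vector serves as both key and value here.
    Returns the fused vector and the per-modality weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
              for feat in modality_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * feat[j] for w, feat in zip(weights, modality_feats))
             for j in range(d)]
    return fused, weights


lidar_feat = [1.0, 0.0]  # hypothetical LiDAR embedding
imu_feat = [0.0, 1.0]    # hypothetical IMU embedding

# A query aligned with the LiDAR embedding shifts weight toward LiDAR.
fused, weights = attention_fuse([4.0, 0.0], [lidar_feat, imu_feat])

# A zero query gives equal weights, which reduces to a plain average.
avg_fused, avg_weights = attention_fuse([0.0, 0.0], [lidar_feat, imu_feat])
```

Element-wise summation corresponds to fixed, input-independent weights, whereas here the weights change with the query.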
Comedians without a Cause: The Politics and Aesthetics of Humour in Dutch Cabaret (1966-2020)
Comedians play an important role in society and public debate. While comedians have been considered important cultural critics for quite some time, comedy has acquired a new social and political significance in recent years, with humour taking centre stage in political and social debates around issues of identity, social justice, and freedom of speech. To understand the shifting meanings and political implications of humour within a Dutch context, this PhD thesis examines the political and aesthetic workings of humour in the highly popular Dutch cabaret genre, focusing on cabaret performances from the 1960s to the present. The central questions of the thesis are: how do comedians use humour to deliver social critique, and how does their humour resonate with political ideologies? These questions are answered by adopting a cultural studies approach to humour, which is used to analyse Dutch cabaret performances, and by studying related materials such as reviews and media interviews with comedians. This thesis shows that, from the 1960s onwards, Dutch comedians have been considered ‘progressive rebels’ – politically engaged, subversive, and carrying a left-wing political agenda – but that this image is in need of correction. While we tend to look for progressive political messages in the work of comedians who present themselves as being anti-establishment rebels – such as Youp van ‘t Hek, Hans Teeuwen, and Theo Maassen – this thesis demonstrates that their transgressive and provocative humour tends to protect social hierarchies and relationships of power. Moreover, it shows that, paradoxically, both the deliberately moderate and nuanced humour of Wim Kan and Claudia de Breij, and the seemingly past-oriented nostalgia of Alex Klaasen, are more radical and progressive than the transgressive humour of van ‘t Hek, Teeuwen and Maassen. 
Finally, comedians who present absurdist or deconstructionist forms of humour, such as the early student cabarets, Freek de Jonge, and Micha Wertheim, tend to disassociate themselves from an explicit political engagement. By challenging the dominant image of the Dutch comedian as a ‘progressive rebel,’ this thesis contributes to a better understanding of humour in the present cultural moment, in which humour is often either not taken seriously, or one-sidedly celebrated as being merely pleasurable, innocent, or progressively liberating. In so doing, this thesis concludes, the ‘dark’ and more conservative sides of humour tend to get obscured.
Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction
Saliency prediction aims to predict the attention distribution of human eyes
given an RGB image. Most recent state-of-the-art methods are based on
deep image feature representations from traditional CNNs. However,
traditional convolution cannot capture the global features of an image well
because of its small kernel size. Moreover, high-level factors that closely
correlate with human visual perception, e.g., objects, color, and light, are
not considered. Motivated by these limitations, we propose a Transformer-based
method with semantic segmentation as an additional learning objective. The
Transformer captures more global cues of the image. In addition, simultaneously
learning object segmentation simulates human visual perception, which we
verify in our investigation of human gaze control in cognitive science. We
build an extra decoder for the subtask, and the multiple tasks share the same
Transformer encoder, forcing it to learn from multiple feature spaces. We find
that, in practice, simply adding the subtask can confuse main-task learning;
hence, we propose a Multi-task Attention Module to handle the feature
interaction between the multiple learning targets. Our method achieves
competitive performance compared to other state-of-the-art methods.
LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
EEG-based recognition of activities and states involves the use of prior
neuroscience knowledge to generate quantitative EEG features, which may limit
BCI performance. Although neural network-based methods can effectively extract
features, they often encounter issues such as poor generalization across
datasets, high prediction volatility, and low model interpretability. Hence, we
propose a novel lightweight multi-dimensional attention network, called
LMDA-Net. By incorporating two novel attention modules designed specifically
for EEG signals, the channel attention module and the depth attention module,
LMDA-Net can effectively integrate features from multiple dimensions, resulting
in improved classification performance across various BCI tasks. LMDA-Net was
evaluated on four high-impact public datasets, including motor imagery (MI) and
P300-Speller paradigms, and was compared with other representative models. The
experimental results demonstrate that LMDA-Net outperforms other representative
methods in terms of classification accuracy and prediction volatility,
achieving the highest accuracy on all datasets within 300 training epochs.
Ablation experiments further confirm the effectiveness of the channel attention
module and the depth attention module. To facilitate an in-depth understanding
of the features extracted by LMDA-Net, we propose class-specific neural network
feature interpretability algorithms that are suitable for event-related
potentials (ERPs) and event-related desynchronization/synchronization
(ERD/ERS). By mapping the output of the specific layer of LMDA-Net to the time
or spatial domain through class activation maps, the resulting feature
visualizations can provide interpretable analysis and establish connections
with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows
great potential as a general online decoding model for various EEG tasks.
Comment: 20 pages, 7 figures
mSPD-NN: A Geometrically Aware Neural Framework for Biomarker Discovery from Functional Connectomics Manifolds
Connectomics has emerged as a powerful tool in neuroimaging and has spurred
recent advancements in statistical and machine learning methods for
connectivity data. Despite connectomes inhabiting a matrix manifold, most
analytical frameworks ignore the underlying data geometry. This is largely
because simple operations, such as mean estimation, do not have easily
computable closed-form solutions. We propose a geometrically aware neural
framework for connectomes, i.e., the mSPD-NN, designed to estimate the geodesic
mean of a collection of symmetric positive definite (SPD) matrices. The
mSPD-NN comprises bilinear fully connected layers with tied weights and
utilizes a novel loss function to optimize the matrix-normal equation arising
from Fréchet mean estimation. Via experiments on synthetic data, we
demonstrate the efficacy of our mSPD-NN against common alternatives for SPD
mean estimation, providing competitive performance in terms of scalability and
robustness to noise. We illustrate the real-world flexibility of the mSPD-NN in
multiple experiments on rs-fMRI data and demonstrate that it uncovers stable
biomarkers associated with subtle network differences among patients with
ADHD-ASD comorbidities and healthy controls.
Comment: Accepted into IPMI 202
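For readers unfamiliar with the geodesic (Fréchet) mean mentioned in the mSPD-NN abstract above, the standard definition can be written out as below. The abstract does not name the metric, so this assumes the affine-invariant Riemannian distance on SPD matrices; the symbols $X_i$, $N$, and $\hat{M}$ are introduced here for illustration only.

```latex
% Fréchet mean of SPD matrices X_1, ..., X_N
% (assumed metric: affine-invariant Riemannian distance)
\hat{M} = \operatorname*{arg\,min}_{M \succ 0} \sum_{i=1}^{N} d^{2}(M, X_i),
\qquad
d(M, X) = \left\lVert \log\!\left( M^{-1/2} X M^{-1/2} \right) \right\rVert_F

% First-order optimality condition under this metric
\sum_{i=1}^{N} \log\!\left( \hat{M}^{-1/2} X_i \hat{M}^{-1/2} \right) = 0
```

Under this assumption the optimality condition has no closed-form solution for $N > 2$ in general, which is consistent with the abstract's remark that mean estimation is not easily computable in closed form.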
MaPLe: Multi-modal Prompt Learning
Pre-trained vision-language (V-L) models such as CLIP have shown excellent
generalization ability to downstream tasks. However, they are sensitive to the
choice of input text prompts and require careful selection of prompt templates
to perform well. Inspired by the Natural Language Processing (NLP) literature,
recent CLIP adaptation approaches learn prompts as the textual inputs to
fine-tune CLIP for downstream tasks. We note that using prompting to adapt
representations in a single branch of CLIP (language or vision) is sub-optimal
since it does not allow the flexibility to dynamically adjust both
representation spaces on a downstream task. In this work, we propose
Multi-modal Prompt Learning (MaPLe) for both vision and language branches to
improve alignment between the vision and language representations. Our design
promotes strong coupling between the vision-language prompts to ensure mutual
synergy and discourages learning independent uni-modal solutions. Further, we
learn separate prompts across different early stages to progressively model the
stage-wise feature relationships to allow rich context learning. We evaluate
the effectiveness of our approach on three representative tasks of
generalization to novel classes, new target datasets and unseen domain shifts.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable
performance and achieves an absolute gain of 3.45% on novel classes and 2.72%
on overall harmonic-mean, averaged over 11 diverse image recognition datasets.
Our code and pre-trained models are available at
https://github.com/muzairkhattak/multimodal-prompt-learning
Comment: Accepted at CVPR202
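The "overall harmonic-mean" reported in the MaPLe abstract above is, in the usual base-to-novel generalization setting, the harmonic mean of base-class and novel-class accuracy. A minimal sketch follows, assuming that definition; the function name and the sample accuracies are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base-class and novel-class accuracy (in percent)."""
    if base_acc + novel_acc == 0:
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical accuracies: the harmonic mean sits closer to the lower
# value, so a model cannot score well by excelling on base classes only.
hm = harmonic_mean(80.0, 60.0)
```

Because the harmonic mean penalizes imbalance, it rewards methods that improve novel-class accuracy without sacrificing base-class accuracy.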
Examples of works to practice staccato technique in clarinet instrument
The stages of strengthening the clarinet's staccato technique were put into practice through work on
musical pieces. Rhythm and nuance exercises that speed up staccato passages are included. The most
important aim of the study is not only staccato practice but also attention to the precision of
simultaneous finger-tongue coordination. To make staccato practice more efficient, étude work is
also included within the work on pieces. Careful attention to the exercises, together with the
inspiring effect of staccato practice, has added a new dimension to musical identity. Every stage
of the eight original pieces is described. Each stage is designed to strengthen the performance and
technique of the next. The study covers the areas in which the staccato technique is used and the
results obtained. It plans how the notes are shaped through finger and tongue coordination and the
practice discipline within which this takes place. Reed, note, diaphragm, finger, tongue, nuance,
and discipline were found to form an inseparable whole in staccato technique. A literature review
was conducted to survey studies on staccato. The review found that few pieces exist for practicing
the staccato used in clarinet technique. A survey of method books found that études are more
common. Accordingly, exercises for speeding up and strengthening the clarinet's staccato technique
are presented. It was observed that interspersing work on pieces among staccato études relaxes the
mind and increases motivation. The choice of the right reed for staccato practice is also
discussed. It was found that the right reed increases tonguing speed when the staccato technique is
practiced correctly. Choosing the right reed depends on the reed producing sound easily. If the reed
does not provide the power needed for tonguing, the need to select a more suitable reed is
emphasized. Interpreting a piece from start to finish in staccato practice can be difficult. In
this respect, the study shows that following the given musical nuances eases tonguing performance.
Passing on the acquired knowledge and experience to future generations, and developing it further,
is encouraged. The study explains how new pieces can be worked out and how the staccato technique
can be mastered. The aim is to master the staccato technique in a shorter time. Committing the
exercises to memory is as important as teaching the fingers their positions. The work that emerges
from the determination and patience shown will raise success to even higher levels.