103 research outputs found
Sentiment analysis with adaptive multi-head attention in Transformer
We propose a novel framework based on the attention mechanism to identify the
sentiment of a movie review document. Previous efforts on deep neural networks
with attention mechanisms focus on encoder and decoder with fixed numbers of
multi-head attention. Therefore, we need a mechanism to stop the attention
process automatically if no more useful information can be read from the
memory. In this paper, we propose an adaptive multi-head attention architecture
(AdaptAttn) which varies the number of attention heads based on the length of
sentences. AdaptAttn has a data preprocessing step in which each document is
assigned to one of three bins (small, medium, or large) based on
sentence length. Documents in the small bin pass through two heads
in each layer, those in the medium bin through four heads, and those in the
large bin through eight heads. We examine the merit of our model on the Stanford
large movie review dataset. The experimental results show that the F1 score
from our model is on par with the baseline model.
Comment: Accepted by the 4th International Conference on Signal Processing and
Machine Learnin
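
The length-based routing described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `heads_for_length` and the token thresholds `small_max` and `medium_max` are assumptions, since the abstract does not state the bin boundaries. Only the head counts (two, four, eight) come from the abstract.

```python
def heads_for_length(num_tokens, small_max=128, medium_max=256):
    """Return the number of attention heads for a document.

    The thresholds are hypothetical; the abstract only says that documents
    are binned as small, medium, or large by length.
    """
    if num_tokens <= small_max:
        return 2  # small bin
    if num_tokens <= medium_max:
        return 4  # medium bin
    return 8      # large bin


# Three toy documents of 3, 200, and 400 whitespace-separated tokens.
docs = ["a short review", "word " * 200, "word " * 400]
heads = [heads_for_length(len(d.split())) for d in docs]
print(heads)
```

In practice the thresholds would presumably be chosen from the length distribution of the training corpus.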
Long time behaviors for the inhomogeneous NLS with a potential in
In this article, we aim to study the scattering of the solution to the
focusing inhomogeneous nonlinear Schr\"odinger equation with a potential of
the form
\begin{align*}
i\partial_t u+\Delta u- Vu=-|x|^{-b}|u|^{p-1}u
\end{align*}
in the energy space. We prove a scattering criterion, and then we use it together
with a Morawetz estimate to establish the scattering theory, which generalizes the
results of Dinh \cite{DD} to the non-radial symmetric case.
Comment: In this version, we correct some mistakes and change the titl
Evolution and Efficiency in Neural Architecture Search: Bridging the Gap Between Expert Design and Automated Optimization
The paper provides a comprehensive overview of Neural Architecture Search
(NAS), emphasizing its evolution from manual design to automated,
computationally-driven approaches. It covers the inception and growth of NAS,
highlighting its application across various domains, including medical imaging
and natural language processing. The document details the shift from
expert-driven design to algorithm-driven processes, exploring initial
methodologies like reinforcement learning and evolutionary algorithms. It also
discusses the challenges of computational demands and the emergence of
efficient NAS methodologies, such as Differentiable Architecture Search and
hardware-aware NAS. The paper further elaborates on NAS's application in
computer vision, NLP, and beyond, demonstrating its versatility and potential
for optimizing neural network architectures across different tasks. Future
directions and challenges, including computational efficiency and the
integration with emerging AI domains, are addressed, showcasing NAS's dynamic
nature and its continued evolution towards more sophisticated and efficient
architecture search methods.
Comment: 7 Pages, Double Colum
Joint Detection Algorithm for Multiple Cognitive Users in Spectrum Sensing
Spectrum sensing technology is a crucial aspect of modern communication
technology, serving as one of the essential techniques for efficiently
utilizing scarce information resources in tight frequency bands. This paper
first introduces three common logical circuit decision criteria in hard
decisions and analyzes their decision rigor. Building upon hard decisions, the
paper further introduces a method for multi-user spectrum sensing based on soft
decisions. Then the paper simulates the false alarm probability and detection
probability curves corresponding to the three criteria. The simulated results
of multi-user collaborative sensing indicate that the simulation process
significantly reduces false alarm probability and enhances detection
probability. This approach effectively detects spectrum resources unoccupied
during idle periods, leveraging the concept of time-division multiplexing and
rationalizing the redistribution of information resources. The entire
computation process relies on the calculation principles of power spectral
density in communication theory, involving threshold decision detection for
noise power and the sum of noise and signal power. It provides a secondary
decision detection, reflecting the perceptual decision performance of logical
detection methods with relative accuracy.
Comment: https://aei.ewapublishing.org/article.html?pk=e24c40d220434209ae2fe2e984bcf2c
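
As a rough illustration of hard-decision fusion, the sketch below computes the cooperative detection and false alarm probabilities under the OR, AND, and majority rules. The abstract does not name its three criteria, so these three rules are an assumption, as are the function names and the simplification that all users are independent and share the same per-user probability.

```python
from math import comb


def k_out_of_n(p, n, k):
    """Probability that at least k of n independent users report 'occupied',
    when each user does so with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fuse(p, n):
    """Apply three common hard-decision fusion rules to a per-user probability."""
    return {
        "OR": k_out_of_n(p, n, 1),             # any single user suffices
        "AND": k_out_of_n(p, n, n),            # all users must agree
        "MAJORITY": k_out_of_n(p, n, n // 2 + 1),
    }


# Hypothetical per-user detection (0.8) and false alarm (0.1) probabilities.
pd = fuse(0.8, 5)
pfa = fuse(0.1, 5)
for rule in pd:
    print(rule, round(pd[rule], 5), round(pfa[rule], 5))
```

The OR rule gives the highest detection probability but also the highest false alarm probability, while the AND rule is the strictest on both counts; the majority rule sits between them.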
Optimizing the Passenger Flow for Airport Security Check
Because of airport and flight security requirements, passengers must undergo a
strict security check before boarding. However, there are frequent complaints
about the large amount of time spent waiting for the
security check. This paper presents a potential solution aimed at optimizing
gate setup procedures specifically tailored for Chicago O'Hare International
Airport. Using queueing theory and Monte Carlo
simulations, we propose an approach that significantly reduces the average
waiting time to a more manageable level. Additionally, our study
examines and identifies the influential factors contributing to this
optimization, providing a comprehensive understanding of their impact
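
The combination of queueing theory and Monte Carlo simulation can be illustrated with a small sketch. It assumes a single first-come-first-served queue feeding several identical lanes, with exponential inter-arrival and service times (an M/M/c model). The abstract does not specify the model, so the function `mean_wait`, its parameters, and the rates used are all hypothetical.

```python
import heapq
import random


def mean_wait(arrival_rate, service_rate, lanes, passengers=20000, seed=0):
    """Estimate the mean time a passenger waits before screening starts."""
    rng = random.Random(seed)
    free_at = [0.0] * lanes  # min-heap of times at which each lane becomes free
    heapq.heapify(free_at)
    now = 0.0
    total_wait = 0.0
    for _ in range(passengers):
        now += rng.expovariate(arrival_rate)   # next passenger arrives
        lane_free = heapq.heappop(free_at)     # earliest available lane
        start = max(now, lane_free)            # wait if every lane is busy
        total_wait += start - now
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / passengers


# Hypothetical load: 4 passengers per minute, 1 passenger per minute per lane.
for lanes in (5, 6, 7):
    print(lanes, "lanes:", round(mean_wait(4.0, 1.0, lanes), 3), "min")
```

For a single lane the estimate can be checked against the M/M/1 formula for the mean queueing delay, W_q = λ / (μ(μ − λ)), which gives a quick sanity test of the simulator before it is applied to multi-lane setups.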
Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network
Federated learning is an emerging paradigm for decentralized training of
machine learning models on distributed clients, without revealing the data to
the central server. Most existing works have focused on horizontal or vertical
data distributions, where each client possesses different samples with shared
features, or each client fully shares only sample indices, respectively.
However, the hybrid scheme is much less studied, even though it is much more
common in the real world. Therefore, in this paper, we propose a generalized
algorithm, FedGraph, that introduces a graph convolutional neural network to
capture feature-sharing information while learning features from a subset of
clients. We also develop a simple but effective clustering algorithm that
aggregates features produced by the deep neural networks of each client while
preserving data privacy
New construction of mutually orthogonal complementary sequence sets
To address the limitations in design methods and the scarcity of construction parameters for mutually orthogonal complementary sequence sets (MOCSS), a construction method for MOCSS based on paraunitary (PU) matrices was proposed. The new concept of coefficient paraunitary (CPU) matrices was defined, and by employing matrix multiplication, the Kronecker product, and matrix iteration techniques, three types of PU matrices of varying sizes were constructed. Utilizing the equivalence between PU matrices and MOCSS, a series of multi-phase MOCSS with flexible parameter selection were developed, filling the parameter gap in the existing literature. To suppress the peak-to-average power ratio (PAPR) in multi-carrier code division multiple access (MC-CDMA) systems, a class of CPU matrices with low column vector PAPR was designed using Boolean functions. Experimental results demonstrate that the MOCSS constructed from such CPU matrices keep the column sequence PAPR below two while maintaining the flexibility of code capacity and length, providing a variety of signal selection options for such systems
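
The defining property of a complementary sequence set can be checked numerically. The sketch below covers only the simplest case, a binary Golay complementary pair, and does not reproduce the PU-matrix construction from the abstract. The helper names are hypothetical. A set is complementary when the aperiodic autocorrelations of its sequences sum to zero at every nonzero shift.

```python
def aperiodic_autocorr(seq, shift):
    """Aperiodic autocorrelation of a sequence at a non-negative shift."""
    return sum(
        seq[i + shift] * complex(seq[i]).conjugate()
        for i in range(len(seq) - shift)
    )


def is_complementary(seqs, tol=1e-9):
    """True if the autocorrelations of all sequences cancel at every nonzero shift."""
    length = len(seqs[0])
    return all(
        abs(sum(aperiodic_autocorr(s, shift) for s in seqs)) < tol
        for shift in range(1, length)
    )


# A well-known length-4 binary Golay pair.
a = [1, 1, 1, -1]
b = [1, 1, -1, 1]
print(is_complementary([a, b]))
print(is_complementary([a, a]))
```

Because the correlation uses complex conjugation, the same check also applies to multi-phase sequences whose entries are complex roots of unity.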
SAMPLE-BASED DYNAMIC HIERARCHICAL TRANSFORMER WITH LAYER AND HEAD FLEXIBILITY VIA CONTEXTUAL BANDIT
Transformers require a fixed number of layers and heads, which makes them inflexible to the complexity of individual samples and expensive in training and inference. To address this, we propose a sample-based Dynamic Hierarchical Transformer (DHT) model whose layers and heads can be dynamically configured for single data samples by solving contextual bandit problems. To determine the number of layers and heads, we use the Uniform Confidence Bound algorithm, while we deploy combinatorial Thompson Sampling to select specific head combinations given their number. Unlike previous work that focuses on compressing trained networks for inference only, DHT not only adaptively optimizes the underlying network architecture during training but also provides a flexible network for efficient inference. To the best of our knowledge, this is the first comprehensive data-driven dynamic transformer that implements the dynamic system without any additional auxiliary neural networks. According to the experimental results, we achieve up to 74% computational savings for both training and inference with a minimal loss of accuracy
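
To give a feel for bandit-based selection of a head count, the sketch below runs the standard UCB1 (Upper Confidence Bound) rule over three candidate head counts. It is a non-contextual simplification and may differ from the confidence bound algorithm used in the paper. The arm values, the Bernoulli reward model, and the success probabilities are all hypothetical.

```python
import math
import random


def ucb1_select(counts, rewards, step, c=2.0):
    """Pick the arm with the highest mean reward plus exploration bonus."""
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm  # try every arm once before using the bound
    scores = [
        rewards[arm] / counts[arm] + math.sqrt(c * math.log(step) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run(arm_means, steps=2000, seed=0):
    """Simulate UCB1 with Bernoulli rewards; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    rewards = [0.0] * len(arm_means)
    for step in range(1, steps + 1):
        arm = ucb1_select(counts, rewards, step)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts


head_options = [2, 4, 8]        # hypothetical candidate head counts
counts = run([0.2, 0.8, 0.4])   # hypothetical success rate per option
best = head_options[counts.index(max(counts))]
print(dict(zip(head_options, counts)), "most selected:", best)
```

A contextual version would condition the choice on features of each input sample rather than keeping one global estimate per arm.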
FEDEMB: A VERTICAL AND HYBRID FEDERATED LEARNING ALGORITHM USING NETWORK AND FEATURE EMBEDDING AGGREGATION
Federated learning (FL) is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. The learning scheme may be horizontal, vertical, or hybrid (both vertical and horizontal). Most existing research on deep neural network (DNN) modeling focuses on horizontal data distributions, while vertical and hybrid schemes are much less studied. In this paper, we propose a generalized algorithm, FedEmb, for modeling vertical and hybrid DNN-based learning. Our algorithm is characterized by higher inference accuracy, stronger privacy-preserving properties, and lower client-server communication bandwidth demands compared with existing work. The experimental results show that FedEmb is an effective method for tackling both split feature & subject space decentralized problems. Specifically, there is a 0.3% to 4.2% improvement in inference accuracy and an 88.9% reduction in time complexity over the baseline method
