237 research outputs found
Penalized Clustering of Large Scale Functional Data with Multiple Covariates
In this article, we propose a penalized clustering method for large scale
data with multiple covariates through a functional data approach. In the
proposed method, responses and covariates are linked together through
nonparametric multivariate functions (fixed effects), which have great
flexibility in modeling a variety of function features, such as jump points,
branching, and periodicity. Functional ANOVA is employed to further decompose
multivariate functions in a reproducing kernel Hilbert space and provide
associated notions of main effect and interaction. Parsimonious random effects
are used to capture various correlation structures. The mixed-effect models are
nested under a general mixture model, in which the heterogeneity of functional
data is characterized. We propose a penalized Henderson's likelihood approach
for model-fitting and design a rejection-controlled EM algorithm for the
estimation. Our method selects smoothing parameters through generalized
cross-validation. Furthermore, Bayesian confidence intervals are used to
measure the clustering uncertainty. Simulation studies and real-data examples
are presented to investigate the empirical performance of the proposed method.
Open-source code is available in the R package MFDA.
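A minimal, illustrative sketch of the general recipe (basis expansion with a roughness penalty, then EM over a mixture of the smoothed coefficients) is given below. It is not the MFDA implementation: the fixed penalty weight lam stands in for GCV-selected smoothing, and sklearn's standard EM replaces the rejection-controlled variant.

    # Minimal sketch: penalized basis-coefficient clustering of curves.
    # Not the MFDA package; `lam` is a fixed stand-in for GCV-chosen smoothing,
    # and sklearn's standard EM replaces the rejection-controlled EM.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 50)
    curves = np.vstack(
        [np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(50) for _ in range(30)]
        + [np.cos(2 * np.pi * t) + 0.3 * rng.standard_normal(50) for _ in range(30)])

    # Penalized least squares: Gaussian bases with a second-difference penalty.
    centers = np.linspace(0, 1, 15)
    B = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.08) ** 2)  # (50, 15)
    D = np.diff(np.eye(15), n=2, axis=0)                              # roughness penalty
    lam = 1e-2
    P = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)                 # smoother matrix
    coefs = curves @ P.T                                              # (60, 15)

    # A mixture model over the smoothed coefficients captures heterogeneity.
    labels = GaussianMixture(n_components=2, random_state=0).fit_predict(coefs)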
Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning
To improve uncertainty quantification of variance networks, we propose a
novel tree-structured local neural network model that partitions the feature
space into multiple regions based on uncertainty heterogeneity. A tree is built
on the training data; its leaf nodes represent different regions
where region-specific neural networks are trained to predict both the mean and
the variance for quantifying uncertainty. The proposed Uncertainty-Splitting
Neural Regression Tree (USNRT) employs novel splitting criteria. At each node,
a neural network is trained on the full data first, and a statistical test for
the residuals is conducted to find the best split, corresponding to the two
sub-regions with the most significant uncertainty heterogeneity. USNRT is
computationally friendly because very few leaf nodes are sufficient and pruning
is unnecessary. On extensive UCI datasets, in terms of both calibration and
sharpness, USNRT shows superior performance compared to some recent popular
methods for variance prediction, including vanilla variance network, deep
ensemble, dropout-based methods, tree-based models, etc. Through comprehensive
visualization and analysis, we uncover how USNRT works and show its merits.
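A minimal sketch of the splitting step follows, with simple stand-ins for the paper's components: a linear model plays the node network and Levene's test plays the significance test for residual-variance heterogeneity. Recursing while the best p-value stays significant yields the few leaves on which separate mean/variance networks are then trained.

    # Sketch of an uncertainty-based split: a linear model stands in for the
    # node's neural network, and Levene's test stands in for the paper's test.
    import numpy as np
    from scipy import stats

    def best_uncertainty_split(X, y, min_leaf=20):
        Z = np.c_[np.ones(len(X)), X]
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        best = (None, None, 1.0)                   # (feature, threshold, p-value)
        for j in range(X.shape[1]):
            for q in np.linspace(0.1, 0.9, 9):
                thr = np.quantile(X[:, j], q)
                left, right = resid[X[:, j] <= thr], resid[X[:, j] > thr]
                if min(len(left), len(right)) < min_leaf:
                    continue
                _, p = stats.levene(left, right)   # variance-heterogeneity test
                if p < best[2]:
                    best = (j, thr, p)
        return best                                # split only if p is significant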
Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification
We propose a novel, succinct, and effective approach to quantify uncertainty
in machine learning. It incorporates adaptively flexible distribution
prediction of the conditional distribution P(y | X = x) in regression tasks. To
predict this conditional distribution, its quantiles at probability levels
spread over the interval (0, 1) are boosted by additive models that we design
with intuition and interpretability in mind. We seek an adaptive balance
between structural integrity and flexibility for P(y | X = x): the Gaussian
assumption lacks flexibility for real data, while highly flexible approaches
(e.g., estimating the quantiles separately without a distribution structure)
inevitably have drawbacks and may not generalize well. Our ensemble
multi-quantiles approach, called EMQ, is fully data-driven and can gradually
depart from the Gaussian and discover the optimal conditional distribution
during boosting. On extensive regression tasks from
UCI datasets, we show that EMQ achieves state-of-the-art performance compared
to many recent uncertainty quantification methods. Visualization results
further illustrate the necessity and the merits of such an ensemble model.
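As a rough, simplified illustration of the multi-quantile idea (not EMQ itself, whose Gaussian-anchored additive models are more elaborate), one can boost one pinball-loss model per probability level and sort the outputs per sample so the predicted quantiles do not cross:

    # Sketch: boosted models for several quantile levels, sorted per sample
    # to keep the predicted conditional quantiles monotone (non-crossing).
    # This simplifies EMQ; its Gaussian-anchored additive design is omitted.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def fit_multi_quantiles(X, y, levels=(0.05, 0.25, 0.5, 0.75, 0.95)):
        return [GradientBoostingRegressor(loss="quantile", alpha=q,
                                          n_estimators=200, max_depth=3).fit(X, y)
                for q in levels]

    def predict_quantiles(models, X):
        Q = np.column_stack([m.predict(X) for m in models])
        return np.sort(Q, axis=1)   # enforce monotone quantiles per sample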
Language Semantic Graph Guided Data-Efficient Learning
Developing generalizable models that can effectively learn from limited data
and with minimal reliance on human supervision is a significant objective
within the machine learning community, particularly in the era of deep neural
networks. Therefore, to achieve data-efficient learning, researchers typically
explore approaches that can leverage more related or unlabeled data without
necessitating additional manual labeling efforts, such as Semi-Supervised
Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA). SSL
leverages unlabeled data in the training process, while TL enables the transfer
of expertise from related data distributions. DA broadens the dataset by
synthesizing new data from existing examples. However, the significance of
additional knowledge contained within labels has been largely overlooked in
research. In this paper, we propose a novel perspective on data efficiency that
involves exploiting the semantic information contained in the labels of the
available data. Specifically, we introduce a Language Semantic Graph (LSG)
which is constructed from labels expressed as natural language descriptions.
Upon this graph, an auxiliary graph neural network is trained to extract
high-level semantic relations and then used to guide the training of the
primary model, enabling more adequate utilization of label knowledge. Across
image, video, and audio modalities, we utilize the LSG method in both TL and
SSL scenarios and illustrate its versatility in significantly enhancing
performance compared to other data-efficient learning approaches. Additionally,
our in-depth analysis shows that the LSG method also expedites the training
process.
Comment: Accepted by NeurIPS 2023.
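A minimal sketch of the guidance idea, under our own simplifications rather than the paper's architecture: embed the label descriptions with any off-the-shelf sentence encoder, link similar labels in a kNN graph, smooth the embeddings with one normalized graph-convolution step, and pull each sample's (projected) feature toward its smoothed label vector.

    # Sketch of label-graph guidance; `label_emb` is assumed to come from an
    # off-the-shelf sentence encoder, and `features` are assumed projected to
    # the same dimension. This is our simplification, not the paper's GNN.
    import torch
    import torch.nn.functional as F

    def label_semantic_targets(label_emb, k=3):
        # label_emb: (C, d) embeddings of the C label descriptions.
        z = F.normalize(label_emb, dim=1)
        sim = z @ z.T
        nbrs = sim.topk(k + 1, dim=1).indices              # self + k neighbours
        A = torch.zeros_like(sim).scatter_(1, nbrs, 1.0)   # kNN adjacency
        A = (A + A.T).clamp(max=1.0)
        d = A.sum(1)
        A_hat = A / torch.sqrt(d[:, None] * d[None, :])    # sym. normalization
        return A_hat @ label_emb                           # one GCN-style step

    def lsg_guidance_loss(features, labels, targets):
        # labels: (B,) class indices; pull each feature toward its label vector.
        f = F.normalize(features, dim=1)
        t = F.normalize(targets[labels], dim=1)
        return (1.0 - (f * t).sum(1)).mean()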
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
Multilingual speech recognition for both monolingual and code-switching
speech is a challenging task. Recently, based on the Mixture of Experts (MoE),
many works have made good progress in multilingual and code-switching ASR, but
their computational complexity grows rapidly with the number of supported
languages.
In this work, we propose a computation-efficient network named Language-Routing
Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE
extracts language-specific representations through the Mixture of Language
Experts (MLE), which is guided to learn by a frame-wise language routing
mechanism. The weight-shared frame-level language identification (LID) network
is jointly trained as the shared pre-router of each MoE layer. Experiments show
that the proposed method significantly improves multilingual and code-switching
speech recognition performance over the baseline with comparable computational
efficiency.
Comment: To appear in Proc. INTERSPEECH 2023, August 20-24, 2023, Dublin,
Ireland.
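The routing idea admits a compact sketch: a shared frame-level LID classifier produces per-frame language posteriors, and each frame is processed by the expert of its most probable language. The sketch below shows a single layer with hard routing; the paper's weight sharing of the LID router across MoE layers and its joint training are omitted, and all sizes are illustrative.

    # Sketch of frame-wise language routing; hyperparameters are illustrative.
    import torch
    import torch.nn as nn

    class LanguageRoutedMoE(nn.Module):
        def __init__(self, d_model=256, n_langs=2):
            super().__init__()
            self.lid = nn.Linear(d_model, n_langs)   # frame-level LID router
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_langs))

        def forward(self, x):                        # x: (B, T, d_model)
            logits = self.lid(x)                     # (B, T, n_langs)
            route = logits.argmax(-1)                # hard frame-wise routing
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = route == i                    # frames routed to expert i
                if mask.any():
                    out[mask] = expert(x[mask])
            return out, logits                       # logits also feed a LID loss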
Vector Approximate Message Passing based Channel Estimation for MIMO-OFDM Underwater Acoustic Communications
Accurate channel estimation is critical to the performance of orthogonal
frequency-division multiplexing (OFDM) underwater acoustic (UWA)
communications, especially under multiple-input multiple-output (MIMO)
scenarios. In this paper, we explore Vector Approximate Message Passing (VAMP)
coupled with Expectation-Maximization (EM) to perform channel estimation (CE) for MIMO
OFDM UWA communications. The EM-VAMP-CE scheme is developed by employing a
Bernoulli-Gaussian (BG) prior distribution for the channel impulse response,
and hyperparameters of the BG prior distribution are learned via the EM
algorithm. Performance of the EM-VAMP-CE is evaluated through both synthesized
data and real data collected in two at-sea UWA communication experiments. It is
shown that EM-VAMP-CE achieves a better performance-complexity tradeoff than
existing channel estimation methods.
Comment: Journal: IEEE Journal of Oceanic Engineering (date of submission:
2022-06-25).
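The denoiser at the core of such a scheme admits a compact sketch: the MMSE estimate of a channel tap under a Bernoulli-Gaussian prior and a Gaussian pseudo-measurement, plus EM re-estimates of the prior's sparsity rate and variance. The full VAMP loop and complex-valued details are omitted, and variable names are ours.

    # Sketch of the Bernoulli-Gaussian MMSE denoiser used inside VAMP, with
    # EM updates of the prior hyperparameters (real-valued, per-tap version).
    import numpy as np

    def bg_denoise(r, v, rho, sx):
        # r: pseudo-measurement x + N(0, v); prior: (1-rho)*delta0 + rho*N(0, sx).
        def npdf(x, var):
            return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)
        pi = rho * npdf(r, sx + v) / (rho * npdf(r, sx + v)
                                      + (1 - rho) * npdf(r, v))
        g = sx / (sx + v)                    # slab shrinkage factor
        xhat = pi * g * r                    # posterior mean
        m2 = pi * (g * v + (g * r) ** 2)     # posterior second moment
        var = m2 - xhat**2                   # posterior variance per tap
        return xhat, var, pi, m2

    def em_update(pi, m2):
        # M-step: refit sparsity rate and slab variance from posterior stats.
        rho = pi.mean()
        sx = m2.sum() / max(pi.sum(), 1e-12)
        return rho, sx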
Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Document-level relation extraction (RE) poses new challenges compared to its
sentence-level counterpart. One document commonly contains multiple entity
pairs, and one entity pair occurs multiple times in the document associated
with multiple possible relations. In this paper, we propose two novel
techniques, adaptive thresholding and localized context pooling, to solve the
multi-label and multi-entity problems. The adaptive thresholding replaces the
global threshold for multi-label classification in the prior work with a
learnable entities-dependent threshold. The localized context pooling directly
transfers attention from pre-trained language models to locate relevant context
that is useful to decide the relation. We experiment on three document-level RE
benchmark datasets: DocRED, a recently released large-scale RE dataset, and two
datasets, CDR and GDA, in the biomedical domain. Our ATLOP (Adaptive Thresholding
and Localized cOntext Pooling) model achieves an F1 score of 63.4, and also
significantly outperforms existing models on both CDR and GDA.
Comment: Accepted by AAAI 2021. Code available at
https://github.com/wzhouad/ATLOP
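The adaptive-thresholding idea can be sketched directly: reserve one class as a learnable threshold TH, rank every gold relation above TH, and rank TH above every non-relation; at test time a class is predicted when its logit exceeds the TH logit. The indexing conventions below are ours; see the linked repository for the authors' implementation.

    # Sketch of an adaptive-thresholding loss: column 0 is the threshold (TH)
    # class. Positives are ranked above TH; TH is ranked above all negatives.
    import torch
    import torch.nn.functional as F

    def adaptive_threshold_loss(logits, labels):
        # logits: (B, C); labels: (B, C) multi-hot float, labels[:, 0] == 0.
        th = torch.zeros_like(labels); th[:, 0] = 1.0
        pos, neg = labels, 1.0 - labels - th
        # Rank each gold relation above TH (negatives masked out).
        logit1 = logits.masked_fill(neg.bool(), float('-inf'))
        loss1 = -(F.log_softmax(logit1, dim=-1) * pos).sum(1)
        # Rank TH above every non-relation (positives masked out).
        logit2 = logits.masked_fill(pos.bool(), float('-inf'))
        loss2 = -(F.log_softmax(logit2, dim=-1) * th).sum(1)
        return (loss1 + loss2).mean()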