EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding
Embedding learning transforms discrete data entities into continuous
numerical representations, encoding features/properties of the entities.
Despite the outstanding performance reported from different embedding learning
algorithms, few efforts were devoted to structurally interpreting how features
are encoded in the learned embedding space. This work proposes EmbeddingTree, a
hierarchical embedding exploration algorithm that relates the semantics of
entity features with the less-interpretable embedding vectors. An interactive
visualization tool is also developed based on EmbeddingTree to explore
high-dimensional embeddings. The tool helps users discover nuanced features of
data entities, perform feature denoising/injecting in embedding training, and
generate embeddings for unseen entities. We demonstrate the efficacy of
EmbeddingTree and our visualization tool through embeddings generated for
industry-scale merchant data and the public 30Music listening/playlists
dataset. Comment: 5 pages, 3 figures, accepted by PacificVis 202
Sharpness-Aware Graph Collaborative Filtering
Graph Neural Networks (GNNs) have achieved impressive performance in
collaborative filtering. However, GNNs tend to yield inferior performance when
the distributions of training and test data are not aligned well. Also,
training GNNs requires optimizing non-convex neural networks with an abundance
of local and global minima, which may differ widely in their performance at
test time. Thus, it is essential to choose the minima carefully. Here we
propose an effective training schema, called gSAM, under the principle that
flatter minima have better generalization ability than sharper ones. To
achieve this goal, gSAM regularizes the flatness of
the weight loss landscape by forming a bi-level optimization: the outer problem
conducts the standard model training while the inner problem helps the model
jump out of the sharp minima. Experimental results show the superiority of our
gSAM
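The bi-level idea described above can be illustrated with a generic sharpness-aware minimization (SAM) step. This is a minimal sketch, not the gSAM method itself: the abstract does not give gSAM's update rule, so the function `sam_step`, the parameters `lr` and `rho`, and the toy quadratic loss are all assumptions made for illustration. The inner step moves the weights toward the locally worst-case point within a small radius, and the outer step applies an ordinary gradient update using the gradient measured there.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update (illustrative; not the gSAM update from the paper)."""
    g = grad_fn(w)
    # Inner problem: approximate the worst-case perturbation within radius rho
    # by stepping along the normalized gradient direction.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Outer problem: standard gradient step, using the gradient at the perturbed point.
    g_perturbed = grad_fn(w + eps)
    return w - lr * g_perturbed

def toy_grad(w):
    # Gradient of the hypothetical toy loss f(w) = 0.5 * ||w||^2.
    return w

w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, toy_grad)
```

In a real collaborative filtering model, `grad_fn` would be replaced by the gradient of the training loss with respect to the model weights, which typically means two forward and backward passes per step.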
Multitask Learning for Time Series Data with 2D Convolution
Multitask learning (MTL) aims to develop a unified model that can handle a
set of closely related tasks simultaneously. By optimizing the model across
multiple tasks, MTL generally surpasses its non-MTL counterparts in terms of
generalizability. Although MTL has been extensively researched in various
domains such as computer vision, natural language processing, and
recommendation systems, its application to time series data has received
limited attention. In this paper, we investigate the application of MTL to the
time series classification (TSC) problem. However, when we integrate the
state-of-the-art 1D convolution-based TSC model with MTL, the performance of
the TSC model actually deteriorates. By comparing the 1D convolution-based
models with the Dynamic Time Warping (DTW) distance function, it appears that
the underwhelming results stem from the limited expressive power of the 1D
convolutional layers. To overcome this challenge, we propose a novel design for
a 2D convolution-based model that enhances the model's expressiveness.
Leveraging this advantage, our proposed method outperforms competing approaches
on both the UCR Archive and an industrial transaction TSC dataset
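For readers unfamiliar with the Dynamic Time Warping (DTW) distance mentioned above, the following is a minimal sketch of the classic dynamic-programming formulation for two univariate series. It assumes an absolute-difference local cost and no warping-window constraint; the abstract does not say which DTW variant the authors used, and the function name `dtw_distance` is chosen here for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Because DTW may repeat elements of either series, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the two series have different lengths.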
Toward a Foundation Model for Time Series Data
A foundation model is a machine learning model trained on a large and diverse
set of data, typically using self-supervised learning-based pre-training
techniques, that can be adapted to various downstream tasks. However, current
research on time series pre-training has predominantly focused on models
trained exclusively on data from a single domain. As a result, these models
possess domain-specific knowledge that may not be easily transferable to time
series from other domains. In this paper, we aim to develop an effective time
series foundation
model by leveraging unlabeled samples from multiple domains. To achieve this,
we repurposed the publicly available UCR Archive and evaluated four existing
self-supervised learning-based pre-training methods, along with a novel method,
on the datasets. We tested these methods using four popular neural network
architectures for time series to understand how the pre-training methods
interact with different network designs. Our experimental results show that
pre-training improves downstream classification tasks by enhancing the
convergence of the fine-tuning process. Furthermore, we found that the proposed
pre-training method, when combined with the Transformer model, outperforms the
alternatives
An Efficient Content-based Time Series Retrieval System
A Content-based Time Series Retrieval (CTSR) system is an information
retrieval system that lets users interact with time series originating from multiple
domains, such as finance, healthcare, and manufacturing. For example, users
seeking to learn more about the source of a time series can submit the time
series as a query to the CTSR system and retrieve a list of relevant time
series with associated metadata. By analyzing the retrieved metadata, users can
gather more information about the source of the time series. Because the CTSR
system is required to work with time series data from diverse domains, it needs
a high-capacity model to effectively measure the similarity between different
time series. On top of that, the model within the CTSR system has to compute
the similarity scores efficiently, because users interact with the
system in real time. In this paper, we propose an effective and efficient CTSR
model that outperforms alternative models, while still providing reasonable
inference runtimes. To demonstrate the capability of the proposed method in
solving business problems, we compare it against alternative models using our
in-house transaction data. Our findings reveal that the proposed model is the
most suitable of the compared solutions for our transaction data problem
Design, synthesis and in vitro anti-Zika virus evaluation of novel Sinefungin derivatives
We report herein the design and synthesis of a series of novel Sinefungin (SIN) derivatives, based on the structures of SIN and its analogue EPZ004777. Our results reveal that target compounds 1ad-af, 1ba-bb and 1bf-bh show better activity (IC50 = 4.56–20.16 μM) than EPZ004777 (IC50 = 35.19 μM). Surprisingly, SIN was found to be less active (IC50 > 50 μM) than we and other research groups predicted. Interestingly, the intermediates 9a-b and 11b display potent anti-ZIKV activity (IC50 = 6.33–29.98 μM), and compound 9a also exhibits acceptable cytotoxicity (CC50 > 200 μM), suggesting their promising potential to be leads for further development
C5aR1 shapes a non-inflammatory tumor microenvironment and mediates immune evasion in gastric cancer
C5a receptor 1 (C5aR1) is associated with various inflammatory processes, the pathogenesis of immune diseases, and tumor growth. However, its role in the tumor microenvironment of gastric cancer (GC) remains unclear. In this study, the expression of C5aR1 in GC and normal gastric mucosa tissues was compared using data retrieved from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, and the results were validated by in vitro qRT-PCR and immunohistochemical analyses. The relationship between C5aR1 expression and the overall survival of patients with GC was analyzed using the Kaplan–Meier method. Subsequently, enrichment analysis was performed, and the signaling pathways were screened. C5aR1 expression was also correlated with genes related to the immune checkpoint and immune cell infiltration. The results revealed that C5aR1 expression was enhanced in GC tissues compared to normal gastric tissues, and that patients with high expression of C5aR1 had a worse 10-year overall survival compared to those showing low expression of C5aR1. Functional analysis revealed that C5aR1 is a gene related to the immune system and may play a crucial role in inflammatory and tumor immune responses. Additionally, C5aR1 showed a positive correlation with most immune checkpoint-related genes and a negative correlation with natural killer cells, dendritic cells, and CD8+ T cells. Immune evasion risk was observed to be significantly greater in patients with higher expression of C5aR1 than in those with lower expression. The results of this study reveal that C5aR1 shapes a non-inflammatory tumor microenvironment in GC and mediates immune evasion
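The Kaplan–Meier method named above estimates a survival curve as a running product of (1 - deaths / patients at risk) over the observed event times. The sketch below is a minimal pure-Python version of that product-limit estimator; the function name `kaplan_meier` and the small input used in the usage note are hypothetical and are not data from the study.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of survival.

    durations: follow-up time for each patient.
    events: 1 if the event (death) was observed at that time, 0 if censored.
    Returns a list of (time, estimated survival probability) pairs.
    """
    # Only times at which at least one event occurred change the estimate.
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

With hypothetical follow-up times `[1, 2, 3, 4]` and event flags `[1, 0, 1, 1]`, the censored patient at time 2 leaves the risk set without lowering the curve, so the estimate steps to 0.75, then 0.375, then 0.0. Comparing high- and low-expression groups, as in the study, would mean computing one such curve per group.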
Coexistence of Gravitationally Bound and Radiation Driven CIV Emission Line Regions in Active Galactic Nuclei
There are mutually contradictory views in the literature of the kinematics
and structure of high-ionization line (e.g. CIV) emitting regions in active
galactic nuclei (AGNs). Two kinds of broad emission line region (BELR) models
have been proposed, outflow and gravitationally bound BELR, which are supported
respectively by blueshift of the CIV line and reverberation mapping
observations. To reconcile these two apparently different models, we present a
detailed comparison study between the CIV and MgII lines using a sample of AGNs
selected from the Sloan Digital Sky Survey. We find that the kinematics of the
CIV region is different from that of MgII, which is thought to be controlled by
gravity. A strong correlation is found between the blueshift and asymmetry of
the CIV profile and the Eddington ratio. This provides strong observational
support for the postulation that the outflow is driven by radiation pressure.
In particular, we find robust evidence that the CIV line region is largely
dominated by outflow at high Eddington ratios, while it is primarily
gravitationally bound at low Eddington ratios. Our results indicate that
these two emitting regions coexist in most AGNs. The emission strength from
these two gases varies smoothly with Eddington ratio in opposite ways. This
explanation naturally reconciles the apparently contradictory views proposed in
previous studies. Finally, we discuss candidate models that can account for
both the enhancement of outflow emission and the suppression of normal BEL in
AGNs with high Eddington ratios. Comment: 34 pages, 9 figures, accepted for publication in Ap