81 research outputs found

    EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding

    Embedding learning transforms discrete data entities into continuous numerical representations, encoding features/properties of the entities. Despite the outstanding performance reported for different embedding learning algorithms, little effort has been devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features to the less-interpretable embedding vectors. An interactive visualization tool is also developed based on EmbeddingTree to explore high-dimensional embeddings. The tool helps users discover nuanced features of data entities, perform feature denoising/injecting in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset. Comment: 5 pages, 3 figures, accepted by PacificVis 202

    Sharpness-Aware Graph Collaborative Filtering

    Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not well aligned. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to choose the minima carefully. Here we propose an effective training schema, called gSAM, under the principle that flatter minima have better generalization ability than sharper ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of sharp minima. Experimental results show the superiority of our gSAM.
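The inner/outer structure described in this abstract can be illustrated with a generic sharpness-aware minimization (SAM) step. This is a minimal sketch of plain SAM on a toy quadratic loss, not the gSAM method itself: the function names `sam_step` and `quadratic_grad`, the radius `rho`, the learning rate `lr`, and the toy loss are all assumptions made for illustration.

```python
import math


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM update on a list of floats (illustrative, not gSAM)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # gradient is zero, nothing to perturb or update
    # Inner step: move to the approximate worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Outer step: ordinary gradient descent, but using the gradient
    # evaluated at the perturbed weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def quadratic_loss(w):
    # Hypothetical toy loss: L(w) = 0.5 * (10 * w0^2 + w1^2)
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def quadratic_grad(w):
    # Gradient of the toy loss above.
    return [10.0 * w[0], w[1]]


w = [1.0, 1.0]
initial_loss = quadratic_loss(w)
for _ in range(100):
    w = sam_step(w, quadratic_grad)
final_loss = quadratic_loss(w)
```

Because each update uses the gradient at a perturbed point, the iterate settles in a small neighbourhood of the minimum (on the order of `rho`) rather than converging to it exactly.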

    Multitask Learning for Time Series Data with 2D Convolution

    Multitask learning (MTL) aims to develop a unified model that can handle a set of closely related tasks simultaneously. By optimizing the model across multiple tasks, MTL generally surpasses its non-MTL counterparts in terms of generalizability. Although MTL has been extensively researched in various domains such as computer vision, natural language processing, and recommendation systems, its application to time series data has received limited attention. In this paper, we investigate the application of MTL to the time series classification (TSC) problem. However, when we integrate the state-of-the-art 1D convolution-based TSC model with MTL, the performance of the TSC model actually deteriorates. By comparing the 1D convolution-based models with the Dynamic Time Warping (DTW) distance function, it appears that the underwhelming results stem from the limited expressive power of the 1D convolutional layers. To overcome this challenge, we propose a novel design for a 2D convolution-based model that enhances the model's expressiveness. Leveraging this advantage, our proposed method outperforms competing approaches on both the UCR Archive and an industrial transaction TSC dataset.
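For readers unfamiliar with the DTW distance mentioned in this abstract, the following is a minimal sketch of the classic dynamic-programming formulation for univariate series. It assumes an absolute-difference local cost and no warping-window constraint; the function name `dtw_distance` is hypothetical, and the paper's own DTW configuration is not stated in the abstract.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]
```

Unlike a pointwise Euclidean distance, this alignment tolerates local stretching, so `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance zero.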

    Toward a Foundation Model for Time Series Data

    A foundation model is a machine learning model trained on a large and diverse set of data, typically using self-supervised learning-based pre-training techniques, that can be adapted to various downstream tasks. However, current research on time series pre-training has predominantly focused on models trained exclusively on data from a single domain. As a result, these models possess domain-specific knowledge that may not be easily transferable to time series from other domains. In this paper, we aim to develop an effective time series foundation model by leveraging unlabeled samples from multiple domains. To achieve this, we repurposed the publicly available UCR Archive and evaluated four existing self-supervised learning-based pre-training methods, along with a novel method, on the datasets. We tested these methods using four popular neural network architectures for time series to understand how the pre-training methods interact with different network designs. Our experimental results show that pre-training improves downstream classification tasks by enhancing the convergence of the fine-tuning process. Furthermore, we found that the proposed pre-training method, when combined with the Transformer model, outperforms the alternatives.

    An Efficient Content-based Time Series Retrieval System

    A Content-based Time Series Retrieval (CTSR) system is an information retrieval system that lets users interact with time series originating from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because the CTSR system is required to work with time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. On top of that, the model within the CTSR system has to compute the similarity scores efficiently, as users interact with the system in real time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable of the compared solutions for our transaction data problem.

    Design, synthesis and in vitro anti-Zika virus evaluation of novel Sinefungin derivatives

    We report herein the design and synthesis of a series of novel Sinefungin (SIN) derivatives, based on the structures of SIN and its analogue EPZ004777. Our results reveal that target compounds 1ad-af, 1ba-bb and 1bf-bh show better activity (IC50 = 4.56–20.16 μM) than EPZ004777 (IC50 = 35.19 μM). Surprisingly, SIN was found to be not as active (IC50 > 50 μM) as we and other research groups predicted. Interestingly, the intermediates 9a-b and 11b display potent anti-ZIKV activity (IC50 = 6.33–29.98 μM), and compound 9a also exhibits acceptable cytotoxicity (CC50 > 200 μM), suggesting their promising potential as leads for further development.

    C5aR1 shapes a non-inflammatory tumor microenvironment and mediates immune evasion in gastric cancer

    C5a receptor 1 (C5aR1) is associated with various inflammatory processes, the pathogenesis of immune diseases, and tumor growth. However, its role in the tumor microenvironment of gastric cancer (GC) remains unclear. In this study, the expression of C5aR1 in GC and normal gastric mucosa tissues was compared using data retrieved from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, and the results were validated by in vitro qRT-PCR and immunohistochemical analyses. The relationship between C5aR1 expression and the overall survival of patients with GC was analyzed using the Kaplan–Meier method. Subsequently, enrichment analysis was performed, and the signaling pathways were screened. C5aR1 expression was also correlated with genes related to the immune checkpoint and immune cell infiltration. The results revealed that C5aR1 expression was enhanced in GC tissues compared to normal gastric tissues, and that patients with high expression of C5aR1 had a worse 10-year overall survival compared to those showing low expression of C5aR1. Functional analysis revealed that C5aR1 is a gene related to the immune system and may play a crucial role in inflammatory and tumor immune responses. Additionally, C5aR1 showed a positive correlation with most immune checkpoint-related genes and a negative correlation with natural killer cells, dendritic cells, and CD8+ T cells. Immune evasion risk was observed to be significantly greater in patients with higher expression of C5aR1 than in those with lower expression. The results of this study reveal that C5aR1 shapes a non-inflammatory tumor microenvironment in GC and mediates immune evasion.

    Coexistence of Gravitationally Bound and Radiation Driven CIV Emission Line Regions in Active Galactic Nuclei

    There are mutually contradictory views in the literature on the kinematics and structure of high-ionization line (e.g. CIV) emitting regions in active galactic nuclei (AGNs). Two kinds of broad emission line region (BELR) models have been proposed, outflow and gravitationally bound BELR, which are supported respectively by the blueshift of the CIV line and by reverberation mapping observations. To reconcile these two apparently different models, we present a detailed comparison study between the CIV and MgII lines using a sample of AGNs selected from the Sloan Digital Sky Survey. We find that the kinematics of the CIV region is different from that of MgII, which is thought to be controlled by gravity. A strong correlation is found between the blueshift and asymmetry of the CIV profile and the Eddington ratio. This provides strong observational support for the postulation that the outflow is driven by radiation pressure. In particular, we find robust evidence that the CIV line region is largely dominated by outflow at high Eddington ratios, while it is primarily gravitationally bound at low Eddington ratios. Our results indicate that these two emitting regions coexist in most AGNs. The emission strength from these two gases varies smoothly with Eddington ratio in opposite ways. This explanation naturally reconciles the apparently contradictory views proposed in previous studies. Finally, candidate models are discussed which can account for both the enhancement of outflow emission and the suppression of normal BEL in AGNs with high Eddington ratios. Comment: 34 pages, 9 figures, accepted for publication in Ap