123 research outputs found

    Learning Topology-Specific Experts for Molecular Property Prediction

    Recently, graph neural networks (GNNs) have been successfully applied to predicting molecular properties, which is one of the most classical cheminformatics tasks with various applications. Despite their effectiveness, we empirically observe that training a single GNN model for diverse molecules with distinct structural patterns limits its prediction performance. In this paper, motivated by this observation, we propose TopExpert to leverage topology-specific prediction models (referred to as experts), each of which is responsible for a molecular group sharing similar topological semantics. That is, each expert learns topology-specific discriminative features while being trained with its corresponding topological group. To tackle the key challenge of grouping molecules by their topological patterns, we introduce a clustering-based gating module that assigns an input molecule to one of the clusters, and we further optimize the gating module with two different types of self-supervision: topological semantics induced by GNNs and molecular scaffolds, respectively. Extensive experiments demonstrate that TopExpert boosts the performance of molecular property prediction and also generalizes better than baselines to new molecules with unseen scaffolds. The code is available at https://github.com/kimsu55/ToxExpert. Comment: 11 pages with 8 figures

    Pohang Canal Dataset: A Multimodal Maritime Dataset for Autonomous Navigation in Restricted Waters

    This paper presents a multimodal maritime dataset and the data collection procedure used to gather it, which aims to facilitate autonomous navigation in restricted water environments. The dataset comprises measurements obtained using various perception and navigation sensors, including a stereo camera, an infrared camera, an omnidirectional camera, three LiDARs, a marine radar, a global positioning system, and an attitude heading reference system. The data were collected along a 7.5-km-long route that includes a narrow canal, inner and outer ports, and near-coastal areas in Pohang, South Korea. The collection was conducted under diverse weather and visual conditions. The dataset and its detailed description are available for free download at https://sites.google.com/view/pohang-canal-dataset. Comment: Submitted to IJRR as a data paper for review

    Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding

    Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations. A common approach in existing studies of unsupervised online story discovery is to represent news articles with symbolic- or graph-based embedding and incrementally cluster them into stories. Recent large language models are expected to improve the embedding further, but a straightforward adoption of the models that indiscriminately encodes all information in articles is ineffective for dealing with text-rich and evolving news streams. In this work, we propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories by considering their shared temporal themes. To realize the idea for unsupervised online story discovery, a scalable framework USTORY is introduced with two main techniques, theme- and time-aware dynamic embedding and novelty-aware adaptive clustering, fueled by lightweight story summaries. A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performance than baselines while being robust and scalable to various streaming settings. Comment: Accepted by SIGIR'2

    SCStory: Self-supervised and Continual Online Story Discovery

    We present SCStory, a framework for online story discovery that helps people digest rapidly published news article streams in real-time without human annotations. To organize news article streams into stories, existing approaches directly encode the articles and cluster them based on representation similarity. However, these methods yield noisy and inaccurate story discovery results because the generic article embeddings do not effectively reflect the story-indicative semantics in an article and cannot adapt to the rapidly evolving news article streams. SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams. With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information in news articles and uses it to discover stories. The embedding module is continuously updated to adapt to evolving news streams with a contrastive learning objective, backed up by two unique techniques, confidence-aware memory replay and prioritized-augmentation, employed for label absence and data scarcity problems. Thorough experiments on real and the latest news data sets demonstrate that SCStory outperforms existing state-of-the-art algorithms for unsupervised online story discovery. Comment: Presented at WWW'2

    RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization

    In this paper, we present RTSUM, an unsupervised summarization framework that utilizes relation triples as the basic unit for summarization. Given an input document, RTSUM first selects salient relation triples via multi-level salience scoring and then generates a concise summary from the selected relation triples by using a text-to-text language model. On the basis of RTSUM, we also develop a web demo for an interpretable summarizing tool, providing fine-grained interpretations with the output summary. With support for customization options, our tool visualizes the salience for textual units at three distinct levels: sentences, relation triples, and phrases. The code is publicly available. Comment: 8 pages, 2 figures

    Persistent metallic Sn-doped In2O3 epitaxial ultrathin films with enhanced infrared transmittance

    Infrared transparent electrodes (IR-TEs) have recently attracted much attention for industrial and military applications. The simplest method to obtain high IR transmittance is to reduce the electrode film thickness. However, for films several tens of nanometres thick, this approach unintentionally suppresses conduction due to surface electron scattering. Here, we demonstrate low sheet resistance (<400 Ω □−1 at room temperature) and high IR transmittance (>65% at the 2.5-μm wavelength) in Sn-doped In2O3 (ITO) epitaxial films for the thickness range of 17−80 nm. A combination of X-ray spectroscopy and ellipsometry measurements reveals a persistent electronic bandstructure in the 17-nm-thick film compared to much thicker films. This indicates that the metallicity of the film is preserved, despite the ultrathin film configuration. The high carrier mobility in the ITO epitaxial films further confirms the film’s metallicity as a result of the improved crystallinity of the film and the resulting reduction in the scattering defect concentration. Thus, ITO shows great potential for IR-TE applications of transparent photovoltaic and optoelectronic devices. © 2020, The Author(s).

    Evidentiality-aware Retrieval for Overcoming Abstractiveness in Open-Domain Question Answering

    The long-standing goal of dense retrievers in abstractive open-domain question answering (ODQA) tasks is to learn to capture evidence passages among relevant passages for any given query, such that the reader produces factually correct outputs from evidence passages. One of the key challenges is the insufficient amount of training data with supervision of the answerability of the passages. Recent studies rely on iterative pipelines to annotate answerability using signals from the reader, but their high computational costs hamper practical applications. In this paper, we instead focus on a data-centric approach and propose Evidentiality-Aware Dense Passage Retrieval (EADPR), which leverages synthetic distractor samples to learn to discriminate evidence passages from distractors. We conduct extensive experiments to validate the effectiveness of our proposed method on multiple abstractive ODQA tasks. Comment: Findings of EACL 202

    COCOA: CBT-based Conversational Counseling Agent using Memory Specialized in Cognitive Distortions and Dynamic Prompt

    The demand for conversational agents that provide mental health care is consistently increasing. In this work, we develop a psychological counseling agent, referred to as CoCoA, that applies Cognitive Behavioral Therapy (CBT) techniques to identify and address cognitive distortions inherent in the client's statements. Specifically, we construct a memory system to efficiently manage information necessary for counseling while extracting high-level insights about the client from their utterances. Additionally, to ensure that the counseling agent generates appropriate responses, we introduce dynamic prompting to flexibly apply CBT techniques and facilitate the appropriate retrieval of information. We conducted dialogues between CoCoA and characters from Character.ai, creating a dataset for evaluation. Then, we asked GPT to evaluate the constructed counseling dataset, and our model demonstrated a statistically significant difference from other models. Comment: 4 pages, 2 figures

    Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

    Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challenge. This complexity arises because such evidence is scattered across multiple turns in a conversation, thus necessitating integration over multiple hops. Hence, our focus is to facilitate such multi-hop reasoning over a dialogue context, namely dialogue chain-of-thought (CoT) reasoning. To this end, we propose a knowledge distillation framework that leverages LLMs as unreliable teachers and selectively distills consistent and helpful rationales via alignment filters. We further present DOCTOR, a DialOgue Chain-of-ThOught Reasoner that provides reliable CoT rationales for response generation. We conduct extensive experiments to show that enhancing dialogue agents with high-quality rationales from DOCTOR significantly improves the quality of their responses. Comment: 25 pages, 8 figures, Accepted to EMNLP 202

    Development of Soil Compaction Analysis Software (SCAN) Integrating a Low Cost GPS Receiver and Compactometer

    Soil compaction analysis software (SCAN) has been developed for evaluating compaction states using data from a GPS receiver and a compactometer attached to the roller. SCAN is distinguished from previous software for intelligent compaction (IC) in that it can use the results from various types of GPS positioning methods, and it also has an optimal structure for remotely managing the large amounts of data gathered from numerous rollers. For this, several methods were developed: (1) improving the accuracy of a low-cost GPS receiver’s positioning results; (2) modeling the trajectory of a moving roller using a GPS receiver’s results and linking it with the data from the compactometer; and (3) extracting information about the compaction states of the ground from the modeled trajectory, using spatial analysis methods. SCAN was verified through various field compaction tests, and it has been confirmed to be a very effective tool for evaluating field compaction states.