421 research outputs found

    Generative temporal link prediction via self-tokenized sequence modeling

    © 2020, Springer Science+Business Media, LLC, part of Springer Nature. We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links for temporal networks. GLSM captures the temporal link formation patterns from the observed links with a sequence modeling framework and can generate emerging links by inferring from the probability distribution over potential future links. To avoid overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism to transform each raw link in the network to an abstract aggregation token automatically. The self-tokenization is seamlessly integrated into the sequence modeling framework, which allows the proposed GLSM model to generalize and discover link formation patterns beyond raw link sequences. We compare GLSM with existing state-of-the-art methods on five real-world datasets. The experimental results demonstrate that GLSM obtains future positive links effectively in a generative fashion while achieving the best performance among the alternatives (2-10% improvement in AUC).

    Entity matching with transformer architectures - a step forward in data integration

    Transformer architectures have proven to be very effective and provide state-of-the-art results in many natural language tasks. The attention-based architecture in combination with pre-training on large amounts of text led to the recent breakthrough and a variety of slightly different implementations. In this paper we analyze how well four of the most recent attention-based transformer architectures (BERT, XLNet, RoBERTa and DistilBERT) perform on the task of entity matching - a crucial part of data integration. Entity matching (EM) is the task of finding data instances that refer to the same real-world entity. It is a challenging task if the data instances consist of long textual data or if the data instances are "dirty" due to misplaced values. To evaluate the capability of transformer architectures and transfer learning on the task of EM, we empirically compare the four approaches on inherently difficult data sets. We show that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%.

    ClimaX: A foundation model for weather and climate

    Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. These approaches aim to model the non-linear dynamics and complex interactions between multiple variables, which are challenging to approximate. Additionally, many such numerical models are computationally intensive, especially when modeling the atmospheric phenomenon at a fine-grained spatial and temporal resolution. Recent data-driven approaches based on machine learning instead aim to directly solve a downstream forecasting or projection task by learning a data-driven functional mapping using deep neural networks. However, these networks are trained using curated and homogeneous climate datasets for specific spatiotemporal tasks, and thus lack the generality of numerical models. We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings. ClimaX extends the Transformer architecture with novel encoding and aggregation blocks that allow effective use of available compute while maintaining general utility. ClimaX is pre-trained with a self-supervised learning objective on climate datasets derived from CMIP6. The pre-trained ClimaX can then be fine-tuned to address a breadth of climate and weather tasks, including those that involve atmospheric variables and spatio-temporal scales unseen during pretraining. Compared to existing data-driven baselines, we show that this generality in ClimaX results in superior performance on benchmarks for weather forecasting and climate projections, even when pretrained at lower resolutions and compute budgets. The source code is available at https://github.com/microsoft/ClimaX. Comment: International Conference on Machine Learning 202

    Transformer-Based Visual Segmentation: A Survey

    Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also continually monitor developments in this rapidly evolving field. Comment: Work in progress. Github: https://github.com/lxtGH/Awesome-Segmenation-With-Transforme

    On Generative Models and Joint Architectures for Document-level Relation Extraction

    Biomedical text is being generated at a high rate in scientific literature publications and electronic health records. Within these documents lies a wealth of potentially useful information in biomedicine. Relation extraction (RE), the process of automating the identification of structured relationships between entities within text, represents a highly sought-after goal in biomedical informatics, offering the potential to unlock deeper insights and connections from this vast corpus of data. In this dissertation, we tackle this problem with a variety of approaches. We review the recent history of the field of document-level RE. Several themes emerge. First, graph neural networks dominate the methods for constructing entity and relation representations. Second, clever uses of attention allow these constructions to focus on particularly relevant token and object (such as mention and entity) representations. Third, aggregation of signal across mentions in entity-level RE is a key focus of research. Fourth, the injection of additional signal by adding tokens to the text prior to encoding via language model (LM) or through additional learning tasks boosts performance. Last, we explore an assortment of strategies for the challenging task of end-to-end entity-level RE. Of particular note are sequence-to-sequence (seq2seq) methods that have become particularly popular in the past few years. With the success of general-domain generative LMs, biomedical NLP researchers have trained a variety of these models on biomedical text under the assumption that they would be superior for biomedical tasks. As training such models is computationally expensive, we investigate whether they outperform generic models.
We test this assumption rigorously by comparing performance of all major biomedical generative language models to the performances of their generic counterparts across multiple biomedical RE datasets, in the traditional finetuning setting as well as in the few-shot setting. Surprisingly, we found that biomedical models tended to underperform compared to their generic counterparts. However, we found that small-scale biomedical instruction finetuning improved performance to a similar degree as larger-scale generic instruction finetuning. Zero-shot natural language processing (NLP) offers savings on the expenses associated with annotating datasets and the specialized knowledge required for applying NLP methods. Large, generative LMs trained to align with human objectives have demonstrated impressive zero-shot capabilities over a broad range of tasks. However, the effectiveness of these models in biomedical RE remains uncertain. To bridge this gap in understanding, we investigate how GPT-4 performs across several RE datasets. We experiment with the recent JSON generation features to generate structured output, which we use alternately by defining an explicit schema describing the relation structure, and inferring the structure from the prompt itself. Our work is the first to study zero-shot biomedical RE across a variety of datasets. Overall, performance was lower than that of fully-finetuned methods. Recall suffered in examples with more than a few relations. Entity mention boundaries were a major source of error, which future work could fruitfully address. In our previous work with generative LMs, we noted that RE performance decreased with the number of gold relations in an example. This observation aligns with the general pattern that recurrent neural network and transformer-based model performance tends to decrease with sequence length. 
Generative LMs also do not identify textual mentions or group them into entities, which are valuable information extraction tasks unto themselves. Therefore, in this age of generative methods, we revisit non-seq2seq methodology for biomedical RE. We adopt a sequential framework of named entity recognition (NER), clustering mentions into entities, followed by relation classification (RC). As errors early in the pipeline necessarily cause downstream errors, and NER performance is near its ceiling, we focus on improving clustering. We match state-of-the-art (SOTA) performance in NER, and substantially improve mention clustering performance by incorporating dependency parsing and gating string dissimilarity embeddings. Overall, we advance the field of biomedical RE in a few ways. In our experiments with finetuned LMs, we show that biomedicine-specific models are unnecessary, freeing researchers to make use of SOTA generic LMs. The relatively high few-shot performance in these experiments also suggests that biomedical RE can be reasonably accessible, as it is not so difficult to construct small datasets. Our investigation into zero-shot RE shows that SOTA LMs can compete with fully finetuned smaller LMs. Together these studies also demonstrate weaknesses of generative RE. Last, we show that non-generative RE methods still outperform generative methods in the fully-finetuned setting.

    Geometric Deep Learning for Computer-Aided Design: A Survey

    Geometric Deep Learning techniques have become a transformative force in the field of Computer-Aided Design (CAD), and have the potential to revolutionize how designers and engineers approach and enhance the design process. By harnessing the power of machine learning-based methods, CAD designers can optimize their workflows, save time and effort while making better informed decisions, and create designs that are both innovative and practical. The ability to process the CAD designs represented by geometric data and to analyze their encoded features enables the identification of similarities among diverse CAD models, the proposition of alternative designs and enhancements, and even the generation of novel design alternatives. This survey offers a comprehensive overview of learning-based methods in computer-aided design across various categories, including similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds. Additionally, it provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain. The final discussion delves into the challenges prevalent in this field, followed by potential future research directions in this rapidly evolving field. Comment: 26 pages, 14 figures, journal article

    Graph Meets LLMs: Towards Large Graph Models

    Large models have emerged as the most recent groundbreaking achievements in artificial intelligence, and particularly machine learning. However, when it comes to graphs, large models have not achieved the same level of success as in other fields, such as natural language processing and computer vision. To advance the application of large models to graphs, we present a perspective paper to discuss the challenges and opportunities associated with developing large graph models. First, we discuss the desired characteristics of large graph models. Then, we present detailed discussions from three key perspectives: representation basis, graph data, and graph models. In each category, we provide a brief overview of recent advances and highlight the remaining challenges together with our visions. Finally, we discuss valuable applications of large graph models. We believe this perspective can encourage further investigations into large graph models, ultimately pushing us one step closer towards artificial general intelligence (AGI). We are the first to comprehensively study large graph models, to the best of our knowledge. Comment: Accepted by NeurIPS 2023 New Frontiers in Graph Learning Workshop. Comments are welcome

    ML-Based User Authentication Through Mouse Dynamics

    Increasing reliance on digital services and the limitations of traditional authentication methods have necessitated the development of more advanced and secure user authentication methods. For user authentication and intrusion detection, mouse dynamics, a form of behavioral biometrics, offers a promising and non-invasive method. This paper presents a comprehensive study on ML-based user authentication through mouse dynamics. It proposes a novel framework that combines embedding extraction using Transformer models with machine learning algorithms such as Recurrent Neural Networks (RNN). The project aims to accurately identify users based on their distinct mouse behavior and detect unauthorized access by utilizing the hybrid models. Using a mouse dynamics dataset, the proposed framework's performance is evaluated, demonstrating its efficacy in accurately identifying users and detecting intrusions. In addition, a comparative analysis with existing methodologies is provided, highlighting the enhancements made by the proposed framework. This paper contributes to the development of more secure, reliable, and user-friendly authentication systems that leverage the power of machine learning and behavioral biometrics, ultimately augmenting the privacy and security of digital services and resources.

    A Unified, Scalable Framework for Neural Population Decoding

    Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale. Comment: Accepted at NeurIPS 202

    Source Code Generation from Descriptions in a Natural Language

    This work introduces CodeFormer, a Python source code generator pre-trained on a massive GitHub crawl consisting of 230M Python functions. The released model, built on the BART architecture, generates Python functions based on descriptions in English. On the CodeSearchNet dataset, CodeFormer sets a new state of the art with 46.12 BLEU, representing an improvement of 13.86 BLEU. We also release a new parallel corpus for code generation called Stack Overflow Code Generation Dataset (SOCGD), on which our model sets a baseline of 47.68 BLEU. The resulting model is ready to be integrated into a source code suggestion system in an IDE, where it can improve software developers' productivity. During our research, we discovered a better way of training BART for machine translation. However, the applicability of our approach to other domains must be verified in subsequent work. Thesis status: defended.