Generative temporal link prediction via self-tokenized sequence modeling
We formalize networks with evolving structures as temporal networks and propose a generative link prediction model, Generative Link Sequence Modeling (GLSM), to predict future links in temporal networks. GLSM captures temporal link-formation patterns from the observed links with a sequence modeling framework and can generate emerging links by inferring from the probability distribution over potential future links. To avoid the overfitting caused by treating each link as a unique token, we propose a self-tokenization mechanism that automatically transforms each raw link in the network into an abstract aggregation token. The self-tokenization is seamlessly integrated into the sequence modeling framework, which gives GLSM the generalization capability to discover link-formation patterns beyond raw link sequences. We compare GLSM with existing state-of-the-art methods on five real-world datasets. The experimental results demonstrate that GLSM generates future positive links effectively while achieving the best performance (2-10% AUC improvements) among the alternatives.
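As a hedged illustration of the self-tokenization idea, the sketch below maps raw links to abstract tokens by bucketing endpoint degrees; the bucketing rule and function names are illustrative stand-ins for the paper's learned aggregation, not the authors' code. A standard sequence model (an RNN or Transformer language model) would then be trained over such token sequences and sampled to generate future links.

```python
# Illustrative self-tokenization: raw links (u, v) become abstract tokens so
# the sequence model generalizes beyond link identities. Not the GLSM code.
from collections import Counter

def self_tokenize(link_sequence):
    """Map each raw link (u, v) to an abstract aggregation token."""
    degree = Counter()
    for u, v in link_sequence:
        degree[u] += 1
        degree[v] += 1

    def bucket(node):  # coarse degree bucket stands in for learned aggregation
        d = degree[node]
        return "high" if d >= 5 else "mid" if d >= 2 else "low"

    return [f"{bucket(u)}->{bucket(v)}" for u, v in link_sequence]

links = [(1, 2), (1, 3), (2, 3), (1, 4)]
print(self_tokenize(links))  # ['mid->mid', 'mid->mid', 'mid->mid', 'mid->low']
```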
Entity matching with transformer architectures - a step forward in data integration
Transformer architectures have proven to be very effective and provide state-of-the-art results in many natural language tasks. The attention-based architecture, in combination with pre-training on large amounts of text, led to the recent breakthrough and to a variety of slightly different implementations.
In this paper, we analyze how well four of the most recent attention-based transformer architectures (BERT, XLNet, RoBERTa, and DistilBERT) perform on the task of entity matching - a crucial part of data integration. Entity matching (EM) is the task of finding data instances that refer to the same real-world entity. It is challenging if the data instances consist of long textual data or are "dirty" due to misplaced values.
To evaluate the capability of transformer architectures and transfer learning on the task of EM, we empirically compare the four approaches on inherently difficult data sets. We show that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%.
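To make the recipe concrete, here is a minimal sketch of entity matching framed as transformer-based sequence-pair classification, assuming a Hugging Face-style setup; the serialization format and model choice are illustrative, not the paper's exact configuration.

```python
# Sketch of EM as sequence-pair classification with a pre-trained transformer.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Serialize the two data instances; the model decides match vs. no match.
a = "title: iPhone 12 Pro 128GB graphite; price: 999"
b = "title: Apple iPhone 12 Pro (128 GB) - graphite; price: 989.00"
inputs = tok(a, b, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1)
print(probs)  # [P(no-match), P(match)] - meaningful only after fine-tuning
```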
ClimaX: A foundation model for weather and climate
Most state-of-the-art approaches for weather and climate modeling are based
on physics-informed numerical models of the atmosphere. These approaches aim to
model the non-linear dynamics and complex interactions between multiple
variables, which are challenging to approximate. Additionally, many such
numerical models are computationally intensive, especially when modeling
atmospheric phenomena at a fine-grained spatial and temporal resolution.
Recent data-driven approaches based on machine learning instead aim to directly
solve a downstream forecasting or projection task by learning a data-driven
functional mapping using deep neural networks. However, these networks are
trained using curated and homogeneous climate datasets for specific
spatiotemporal tasks, and thus lack the generality of numerical models. We
develop and demonstrate ClimaX, a flexible and generalizable deep learning
model for weather and climate science that can be trained using heterogeneous
datasets spanning different variables, spatio-temporal coverage, and physical
groundings. ClimaX extends the Transformer architecture with novel encoding and
aggregation blocks that allow effective use of available compute while
maintaining general utility. ClimaX is pre-trained with a self-supervised
learning objective on climate datasets derived from CMIP6. The pre-trained
ClimaX can then be fine-tuned to address a breadth of climate and weather
tasks, including those that involve atmospheric variables and spatio-temporal
scales unseen during pretraining. Compared to existing data-driven baselines,
we show that this generality in ClimaX results in superior performance on
benchmarks for weather forecasting and climate projections, even when
pretrained at lower resolutions and compute budgets. The source code is
available at https://github.com/microsoft/ClimaX.
Comment: International Conference on Machine Learning 2023
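The sketch below illustrates the variable tokenization and aggregation idea in schematic form: each climate variable is embedded separately, and a learned query cross-attends over the per-variable tokens so inputs with different variable sets can share one backbone. Module names and dimensions are assumptions, not the released ClimaX code.

```python
# Schematic of variable aggregation via cross-attention over per-variable tokens.
import torch
import torch.nn as nn

class VariableAggregation(nn.Module):
    def __init__(self, dim=128, n_heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, var_tokens):           # (batch*patches, n_vars, dim)
        q = self.query.expand(var_tokens.size(0), -1, -1)
        out, _ = self.attn(q, var_tokens, var_tokens)
        return out.squeeze(1)                # one token per patch, any #vars

agg = VariableAggregation()
tokens = torch.randn(64 * 16, 5, 128)        # 5 variables, 16 patches, batch 64
fused = agg(tokens)                          # -> (1024, 128), fed to the backbone
```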
Transformer-Based Visual Segmentation: A Survey
Visual segmentation seeks to partition images, video frames, or point clouds
into multiple segments or groups. This technique has numerous real-world
applications, such as autonomous driving, image editing, robot sensing, and
medical analysis. Over the past decade, deep learning-based methods have made
remarkable strides in this area. Recently, transformers, a type of neural
network based on self-attention originally designed for natural language
processing, have considerably surpassed previous convolutional or recurrent
approaches in various vision processing tasks. Specifically, vision
transformers offer robust, unified, and even simpler solutions for various
segmentation tasks. This survey provides a thorough overview of
transformer-based visual segmentation, summarizing recent advancements. We
first review the background, encompassing problem definitions, datasets, and
prior convolutional methods. Next, we summarize a meta-architecture that
unifies all recent transformer-based approaches. Based on this
meta-architecture, we examine various method designs, including modifications
to the meta-architecture and associated applications. We also present several
closely related settings, including 3D point cloud segmentation, foundation
model tuning, domain-aware segmentation, efficient segmentation, and medical
segmentation. Additionally, we compile and re-evaluate the reviewed methods on
several well-established datasets. Finally, we identify open challenges in this
field and propose directions for future research. The project page can be found
at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also
continually monitor developments in this rapidly evolving field.
Comment: Work in progress. Github: https://github.com/lxtGH/Awesome-Segmenation-With-Transformer
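As a rough illustration of such a query-based meta-architecture (in the spirit of DETR/MaskFormer-style mask classification), the sketch below decodes learned object queries against per-pixel features, with each query emitting a class score and a mask; all sizes are illustrative assumptions.

```python
# Minimal sketch of a query-based segmentation meta-architecture.
import torch
import torch.nn as nn

dim, n_queries, n_classes = 256, 100, 21
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(dim, 8, batch_first=True), num_layers=3)
queries = nn.Parameter(torch.randn(1, n_queries, dim))
cls_head = nn.Linear(dim, n_classes + 1)       # +1 for "no object"

pixel_feats = torch.randn(1, 64 * 64, dim)     # flattened per-pixel embeddings
q = decoder(queries, pixel_feats)              # queries attend to image features
class_logits = cls_head(q)                     # (1, 100, n_classes+1)
mask_logits = q @ pixel_feats.transpose(1, 2)  # (1, 100, 4096): one mask per query
```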
On Generative Models and Joint Architectures for Document-level Relation Extraction
Biomedical text is being generated at a high rate in scientific literature publications and electronic health records. Within these documents lies a wealth of potentially useful information in biomedicine. Relation extraction (RE), the process of automating the identification of structured relationships between entities within text, represents a highly sought-after goal in biomedical informatics, offering the potential to unlock deeper insights and connections from this vast corpus of data. In this dissertation, we tackle this problem with a variety of approaches.
We review the recent history of the field of document-level RE. Several themes emerge. First, graph neural networks dominate the methods for constructing entity and relation representations. Second, clever uses of attention allow these constructions to focus on particularly relevant token and object representations (such as mentions and entities). Third, aggregation of signal across mentions in entity-level RE is a key focus of research. Fourth, injecting additional signal, by adding tokens to the text prior to encoding via a language model (LM) or through additional learning tasks, boosts performance. Last, we explore an assortment of strategies for the challenging task of end-to-end entity-level RE.
Of particular note are sequence-to-sequence (seq2seq) methods, which have become particularly popular in the past few years. With the success of general-domain generative LMs, biomedical NLP researchers have trained a variety of these models on biomedical text under the assumption that they would be superior for biomedical tasks. As training such models is computationally expensive, we investigate whether they outperform generic models. We test this assumption rigorously by comparing the performance of all major biomedical generative language models to that of their generic counterparts across multiple biomedical RE datasets, in the traditional finetuning setting as well as in the few-shot setting. Surprisingly, we found that biomedical models tended to underperform compared to their generic counterparts. However, we found that small-scale biomedical instruction finetuning improved performance to a similar degree as larger-scale generic instruction finetuning.
Zero-shot natural language processing (NLP) offers savings on the expenses associated with annotating datasets and the specialized knowledge required for applying NLP methods. Large generative LMs trained to align with human objectives have demonstrated impressive zero-shot capabilities over a broad range of tasks. However, the effectiveness of these models in biomedical RE remains uncertain. To bridge this gap in understanding, we investigate how GPT-4 performs across several RE datasets. We experiment with the recent JSON generation features to generate structured output, which we use in two ways: by defining an explicit schema describing the relation structure, and by inferring the structure from the prompt itself. Our work is the first to study zero-shot biomedical RE across a variety of datasets. Overall, performance was lower than that of fully finetuned methods. Recall suffered in examples with more than a few relations. Entity mention boundaries were a major source of error, which future work could fruitfully address.
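A hedged sketch of this schema-guided, zero-shot setup follows; the model identifier, client usage, and relation types are assumptions for illustration, not the dissertation's exact prompts or code.

```python
# Schema-guided zero-shot biomedical RE with JSON-constrained output (sketch).
import json
from openai import OpenAI

client = OpenAI()
schema_hint = ('Return JSON: {"relations": [{"head": str, "tail": str, '
               '"type": "inhibits|activates|binds"}]}')
text = "Imatinib inhibits the BCR-ABL tyrosine kinase."

resp = client.chat.completions.create(
    model="gpt-4-turbo",                          # illustrative model choice
    response_format={"type": "json_object"},      # constrain output to JSON
    messages=[{"role": "system", "content": "Extract biomedical relations. " + schema_hint},
              {"role": "user", "content": text}],
)
relations = json.loads(resp.choices[0].message.content)["relations"]
```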
In our previous work with generative LMs, we noted that RE performance decreased with the number of gold relations in an example. This observation aligns with the general pattern that the performance of recurrent neural networks and transformer-based models tends to decrease with sequence length. Generative LMs also do not identify textual mentions or group them into entities, which are valuable information extraction tasks unto themselves. Therefore, in this age of generative methods, we revisit non-seq2seq methodology for biomedical RE. We adopt a sequential framework of named entity recognition (NER), clustering of mentions into entities, and relation classification (RC). As errors early in the pipeline necessarily cause downstream errors, and NER performance is near its ceiling, we focus on improving clustering. We match state-of-the-art (SOTA) performance in NER, and substantially improve mention clustering performance by incorporating dependency parsing and gating string dissimilarity embeddings.
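The pipeline can be summarized by the skeleton below, where ner, cluster, and classify_relation are hypothetical placeholders for the trained components; the comment marks the clustering step that the dissertation improves.

```python
# Skeleton of the sequential NER -> clustering -> RC pipeline (illustrative).
def extract_relations(doc, ner, cluster, classify_relation):
    mentions = ner(doc)                    # typed mention spans
    entities = cluster(mentions, doc)      # coreferent mentions grouped; the
                                           # thesis improves this step with
                                           # dependency parses and gated string
                                           # dissimilarity embeddings
    relations = []
    for i, e1 in enumerate(entities):
        for e2 in entities[i + 1:]:
            label = classify_relation(doc, e1, e2)
            if label != "no_relation":
                relations.append((e1, label, e2))
    return relations
```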
Overall, we advance the field of biomedical RE in a few ways. In our experiments with finetuned LMs, we show that biomedicine-specific models are unnecessary, freeing researchers to make use of SOTA generic LMs. The relatively high few-shot performance in these experiments also suggests that biomedical RE can be reasonably accessible, as it is not so difficult to construct small datasets. Our investigation into zero-shot RE shows that SOTA LMs can compete with fully finetuned smaller LMs. Together, these studies also demonstrate weaknesses of generative RE. Last, we show that non-generative RE methods still outperform generative methods in the fully finetuned setting.
Geometric Deep Learning for Computer-Aided Design: A Survey
Geometric Deep Learning techniques have become a transformative force in the
field of Computer-Aided Design (CAD), and have the potential to revolutionize
how designers and engineers approach and enhance the design process. By
harnessing the power of machine learning-based methods, CAD designers can
optimize their workflows, save time and effort while making better-informed
decisions, and create designs that are both innovative and practical. The
ability to process the CAD designs represented by geometric data and to analyze
their encoded features enables the identification of similarities among diverse
CAD models, the proposition of alternative designs and enhancements, and even
the generation of novel design alternatives. This survey offers a comprehensive
overview of learning-based methods in computer-aided design across various
categories, including similarity analysis and retrieval, 2D and 3D CAD model
synthesis, and CAD generation from point clouds. Additionally, it provides a
complete list of benchmark datasets and their characteristics, along with
open-source codes that have propelled research in this domain. The final
discussion delves into the challenges prevalent in this field, followed by
potential directions for future research.
Comment: 26 pages, 14 figures, journal article
Graph Meets LLMs: Towards Large Graph Models
Large models have emerged as the most recent groundbreaking achievements in
artificial intelligence, and particularly in machine learning. However, when it
comes to graphs, large models have not achieved the same level of success as in
other fields such as natural language processing and computer vision. To promote
the application of large models to graphs, we present a perspective
paper to discuss the challenges and opportunities associated with developing
large graph models. First, we discuss the desired characteristics of large
graph models. Then, we present detailed discussions from three key
perspectives: representation basis, graph data, and graph models. In each
category, we provide a brief overview of recent advances and highlight the
remaining challenges together with our visions. Finally, we discuss valuable
applications of large graph models. We believe this perspective can encourage
further investigations into large graph models, ultimately pushing us one step
closer towards artificial general intelligence (AGI). We are the first to
comprehensively study large graph models, to the best of our knowledge.
Comment: Accepted by NeurIPS 2023 New Frontiers in Graph Learning Workshop. Comments are welcome.
ML-Based User Authentication Through Mouse Dynamics
Increasing reliance on digital services and the limitations of traditional authentication methods have necessitated the development of more advanced and secure user authentication methods. For user authentication and intrusion detection, mouse dynamics, a form of behavioral biometrics, offers a promising and non-invasive approach. This paper presents a comprehensive study of machine learning-based user authentication through mouse dynamics.
This project proposes a novel framework integrating embedding extraction using Transformer models with machine learning algorithms such as Recurrent Neural Networks (RNNs). The project aims to accurately identify users based on their distinct mouse behavior and to detect unauthorized access by utilizing the hybrid models. Using a mouse dynamics dataset, the proposed framework's performance is evaluated, demonstrating its efficacy in accurately identifying users and detecting intrusions.
In addition, a comparative analysis with existing methodologies is provided, highlighting the enhancements made by the proposed framework. This paper contributes to the development of more secure, reliable, and user-friendly authentication systems that leverage the power of machine learning and behavioral biometrics, ultimately augmenting the privacy and security of digital services and resources.
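The sketch below gives one plausible reading of such a hybrid: a Transformer encoder embeds fixed-length windows of mouse events, and a GRU over the window embeddings scores which enrolled user produced the session. The architecture sizes and feature set are assumptions, not the paper's configuration.

```python
# Illustrative Transformer + RNN hybrid for mouse-dynamics authentication.
import torch
import torch.nn as nn

class MouseAuthModel(nn.Module):
    def __init__(self, n_users, feat=4, dim=64):   # feat: e.g. dx, dy, dt, button
        super().__init__()
        self.proj = nn.Linear(feat, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_users)

    def forward(self, x):                  # x: (batch, windows, events, feat)
        b, w, e, f = x.shape
        tokens = self.encoder(self.proj(x.view(b * w, e, f)))
        window_emb = tokens.mean(1).view(b, w, -1)   # one embedding per window
        _, h = self.rnn(window_emb)
        return self.head(h[-1])            # logits over enrolled users

model = MouseAuthModel(n_users=10)
logits = model(torch.randn(2, 8, 32, 4))   # 2 sessions, 8 windows of 32 events
```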
A Unified, Scalable Framework for Neural Population Decoding
Our ability to use deep learning approaches to decipher neural activity would
likely benefit from greater scale, in terms of both model size and datasets.
However, the integration of many neural recordings into one unified model is
challenging, as each recording contains the activity of different neurons from
different individual animals. In this paper, we introduce a training framework
and architecture designed to model the population dynamics of neural activity
across diverse, large-scale neural recordings. Our method first tokenizes
individual spikes within the dataset to build an efficient representation of
neural events that captures the fine temporal structure of neural activity. We
then employ cross-attention and a PerceiverIO backbone to further construct a
latent tokenization of neural population activities. Utilizing this
architecture and training framework, we construct a large-scale multi-session
model trained on large datasets from seven nonhuman primates, spanning over 158
different sessions of recording from over 27,373 neural units and over 100
hours of recordings. In a number of different tasks, we demonstrate that our
pretrained model can be rapidly adapted to new, unseen sessions with
unspecified neuron correspondence, enabling few-shot performance with minimal
labels. This work presents a powerful new approach for building deep learning
tools to analyze neural data and stakes out a clear path to training at scale.
Comment: Accepted at NeurIPS 2023
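The sketch below illustrates the spike tokenization and PerceiverIO-style latent cross-attention in schematic form: individual spikes become tokens (unit embedding plus a spike-time encoding), and a fixed set of latent tokens cross-attends over them, so sessions with different neuron counts map into one latent space. Embeddings, time encoding, and sizes are illustrative assumptions, not the released implementation.

```python
# Schematic spike tokenization + latent cross-attention (PerceiverIO-style).
import torch
import torch.nn as nn

dim, n_latents, n_units = 128, 64, 500
unit_emb = nn.Embedding(n_units, dim)          # learned per-unit identity
latents = nn.Parameter(torch.randn(1, n_latents, dim))
cross_attn = nn.MultiheadAttention(dim, 4, batch_first=True)

# One token per spike: unit identity plus a simple sinusoidal time encoding.
unit_ids = torch.randint(0, n_units, (1, 2000))    # 2000 spikes in a window
times = torch.rand(1, 2000, 1)
spike_tokens = unit_emb(unit_ids) + torch.sin(times * torch.arange(dim))

latent_repr, _ = cross_attn(latents, spike_tokens, spike_tokens)
# latent_repr: (1, 64, 128) - a fixed-size summary decoded into behavior.
```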