Towards Exact Molecular Dynamics Simulations with Machine-Learned Force Fields
Molecular dynamics (MD) simulations employing classical force fields
constitute the cornerstone of contemporary atomistic modeling in chemistry,
biology, and materials science. However, the predictive power of these
simulations is only as good as the underlying interatomic potential. Classical
potentials often fail to faithfully capture key quantum effects in molecules
and materials. Here we enable the direct construction of flexible molecular
force fields from high-level ab initio calculations by incorporating spatial
and temporal physical symmetries into a gradient-domain machine learning
(sGDML) model in an automatic data-driven way. The developed sGDML approach
faithfully reproduces global force fields at quantum-chemical CCSD(T) level of
accuracy and allows converged molecular dynamics simulations with fully
quantized electrons and nuclei. We present MD simulations, for flexible
molecules with up to a few dozen atoms and provide insights into the dynamical
behavior of these molecules. Our approach provides the key missing ingredient
for achieving spectroscopic accuracy in molecular simulations
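The gradient-domain idea can be illustrated in one dimension: model the energy with kernel derivatives so the force field is conservative by construction, and fit the coefficients directly to reference forces. This is a hedged toy sketch, not the actual sGDML model (which operates on full molecular geometries and exploits permutational symmetries); the harmonic potential, RBF kernel, and length scale are assumptions for illustration.

```python
import numpy as np

# Toy 1-D system: true potential E(x) = x^2/2, so the true force is F(x) = -x.
# Gradient-domain sketch: E(x) = -sum_i a_i * dk(x, x_i)/dx_i, hence the force
# F(x) = -E'(x) is a linear combination of kernel second derivatives, and the
# coefficients a_i are fit directly to force samples.

L = 0.8  # kernel length scale (assumption)

def k(x, y):                 # RBF kernel
    return np.exp(-(x - y) ** 2 / (2 * L ** 2))

def dk_dy(x, y):             # dk/dy
    return ((x - y) / L ** 2) * k(x, y)

def d2k_dxdy(x, y):          # d^2 k / dx dy ("Hessian" kernel)
    d = x - y
    return (1.0 / L ** 2 - d ** 2 / L ** 4) * k(x, y)

x_train = np.linspace(-2.0, 2.0, 15)
f_train = -x_train           # reference forces (stand-in for ab initio data)

# Fit: solve the regularized linear system K a = f on the force targets.
K = d2k_dxdy(x_train[:, None], x_train[None, :])
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(x_train)), f_train)

def predict_force(x):
    return d2k_dxdy(x[:, None], x_train[None, :]) @ alpha

def predict_energy(x):       # recovered up to an additive constant
    return -dk_dy(x[:, None], x_train[None, :]) @ alpha

x_test = np.linspace(-1.5, 1.5, 7)
err = np.max(np.abs(predict_force(x_test) - (-x_test)))
print(f"max force error on held-out points: {err:.1e}")
```

Because the force is the exact derivative of the modeled energy, energy conservation holds analytically rather than approximately, which is what makes long, converged MD trajectories feasible.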
A Survey on Text Classification Algorithms: From Text to Predictions
In recent years, the exponential growth of digital documents has been met by rapid progress in text classification techniques. Newly proposed machine learning algorithms leverage the latest advancements in deep learning methods, allowing for the automatic extraction of expressive features. The swift development of these methods has led to a plethora of strategies to encode natural language into machine-interpretable data. The latest language modelling algorithms are used in conjunction with ad hoc preprocessing procedures, whose description is often omitted in favour of a more detailed explanation of the classification step. This paper offers a concise review of recent text classification models, with emphasis on the flow of data, from raw text to output labels. We highlight the differences between earlier methods and more recent, deep learning-based methods, both in how they function and in how they transform input data. To give a better perspective on the text classification landscape, we provide an overview of datasets for the English language, as well as instructions for the synthesis of two new multilabel datasets, which we found to be particularly scarce in this setting. Finally, we present new experimental results and discuss the open research challenges posed by deep learning-based language models.
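The "from text to predictions" flow the survey emphasizes can be made concrete with a minimal pipeline: raw text, then tokenization, then bag-of-words counts, then a classifier, then a label. This sketch uses a multinomial Naive Bayes classifier over a tiny toy corpus; the data and the model choice are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math

# Toy labeled corpus (assumption, for illustration only).
train = [
    ("the printer will not connect to the network", "hardware"),
    ("replace the broken keyboard on my laptop", "hardware"),
    ("cannot log in to my email account", "software"),
    ("the application crashes on startup", "software"),
]

def tokenize(text):
    # Preprocessing step: lowercase + whitespace split (deliberately simple).
    return text.lower().split()

# "Training": per-class word counts and class priors.
labels = {y for _, y in train}
word_counts = {y: Counter() for y in labels}
doc_counts = Counter(y for _, y in train)
for text, y in train:
    word_counts[y].update(tokenize(text))
vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    # Multinomial Naive Bayes with Laplace smoothing, in log space.
    scores = {}
    for y in labels:
        total = sum(word_counts[y].values())
        score = math.log(doc_counts[y] / len(train))
        for w in tokenize(text):
            score += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

print(predict("email application will not start"))  # -> "software"
```

Deep learning-based methods replace the count-based features with learned dense representations, but the overall flow of data from raw text to output label is the same.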
Ticket Automation: an Insight into Current Research with Applications to Multi-level Classification Scenarios
Modern service providers often have to deal with large volumes of customer requests, which
they need to act upon in a swift and effective manner to ensure adequate support is provided.
In this context, machine learning algorithms are fundamental in streamlining support ticket
processing workflows. However, many current approaches still rely on traditional
Natural Language Processing techniques, without fully exploiting the latest advancements in this
field. In this work, we aim to provide an overview of support Ticket Automation, of the recent
proposals made in this field, and of how well some of these methods generalize to new scenarios
and datasets. We list the most recent proposals for these tasks and examine
in detail the ones related to Ticket Classification, the most prevalent of them. We analyze
commonly utilized datasets and experiment on two of them, both characterized by a two-level
hierarchy of labels, which are descriptive of the ticket’s topic at different levels of granularity.
The first is a collection of 20,000 customer complaints, and the second comprises 35,000 issues
crawled from a bug reporting website. Using this data, we focus on topically classifying tickets
using a pre-trained BERT language model. The experimental section of this work has two
objectives. First, we demonstrate the impact of different document representation strategies
on classification performance. Secondly, we showcase an effective way to boost classification
by injecting information from the hierarchical structure of the labels into the classifier. Our
findings show that the choice of the embedding strategy for ticket embeddings considerably
impacts classification metrics on our datasets: the best method improves by more than 28% in F1-
score over the standard strategy. We also showcase the effectiveness of hierarchical information
injection, which further improves the results. In the bugs dataset, one of our multi-level models
(ML-BERT) outperforms the best baseline by up to 5.7% in F1-score and 5.4% in accuracy.
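One simple way to inject hierarchical label information, loosely following the idea described above, is to append the predicted coarse (level-1) label to the document representation before classifying at the fine (level-2) granularity. The sketch below is a hedged illustration: random vectors stand in for BERT ticket embeddings, and a nearest-centroid classifier stands in for the paper's ML-BERT model; all names and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
LEVEL1 = ["hardware", "software"]  # coarse labels (assumption)

def one_hot(label, label_list):
    v = np.zeros(len(label_list))
    v[label_list.index(label)] = 1.0
    return v

class NearestCentroid:
    # Minimal stand-in classifier: predict the label of the closest centroid.
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = np.stack(
            [X[[i for i, t in enumerate(y) if t == c]].mean(0)
             for c in self.labels])
        return self
    def predict(self, x):
        return self.labels[np.argmin(
            np.linalg.norm(self.centroids - x, axis=1))]

# Toy "ticket embeddings" and a two-level label hierarchy (coarse, fine).
X = rng.normal(size=(8, 4))
y1 = ["hardware"] * 4 + ["software"] * 4
y2 = ["printer", "printer", "laptop", "laptop",
      "email", "email", "crash", "crash"]

coarse = NearestCentroid().fit(X, y1)

# Hierarchy injection: level-2 features = embedding ++ one-hot coarse prediction.
X2 = np.stack([np.concatenate([x, one_hot(coarse.predict(x), LEVEL1)])
               for x in X])
fine = NearestCentroid().fit(X2, y2)

x_new = X[0]
x2_new = np.concatenate([x_new, one_hot(coarse.predict(x_new), LEVEL1)])
print(coarse.predict(x_new), fine.predict(x2_new))
```

The appended one-hot block lets the fine-grained classifier condition on the coarse decision, which is one plausible reading of "injecting information from the hierarchical structure of the labels into the classifier".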
Transitive Assignment Kernels for Structural Classification
Kernel methods provide a convenient way to apply a wide range of learning techniques to complex and structured data by shifting the representational problem from one of finding an embedding of the data to that of defining a positive semi-definite kernel. One problem with the most widely used kernels is that they neglect the locational information within the structures, resulting in lower discriminative power. Correspondence-based kernels, on the other hand, are in general more discriminating, at the cost of sacrificing positive-definiteness due to their inability to guarantee transitivity of the correspondences between multiple graphs. In this paper we adopt a general framework for the projection of (relaxed) correspondences onto the space of transitive correspondences, thus transforming any given matching algorithm into a transitive multi-graph matching approach. The resulting transitive correspondences can then be used to provide a kernel that both maintains locational information and is guaranteed to be positive-definite. Experimental evaluation validates the effectiveness of the kernel on several structural classification tasks.
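The core mechanism can be sketched numerically: if every graph is aligned to a common reference ordering, then the induced pairwise correspondences are transitive by construction, and the resulting alignment kernel is positive semi-definite. This is a hedged numpy illustration only; how the reference alignment is actually obtained is the paper's contribution and is replaced here by random permutations and random node features.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 5, 3, 4  # nodes per graph, feature dim, number of graphs (assumptions)

# P_g[u, r] = 1 iff node u of graph g is aligned to reference slot r.
perms = [np.eye(n)[rng.permutation(n)] for _ in range(m)]
feats = [rng.normal(size=(n, d)) for _ in range(m)]  # toy node features

def corr(a, b):
    # Correspondence between graphs a and b, induced via the common reference.
    return perms[a] @ perms[b].T

# Transitivity: composing the a->b and b->c correspondences gives a->c.
assert np.allclose(corr(0, 1) @ corr(1, 2), corr(0, 2))

def kernel(a, b):
    # Alignment kernel: sum of feature dot products over corresponding nodes.
    # Equals <P_a^T F_a, P_b^T F_b>_F, an inner product, hence PSD.
    return np.sum(corr(a, b) * (feats[a] @ feats[b].T))

K = np.array([[kernel(a, b) for b in range(m)] for a in range(m)])
print("min eigenvalue of Gram matrix:", np.linalg.eigvalsh(K).min())  # >= 0 up to rounding
```

Without the shared reference, independently computed pairwise matchings generally fail the composition check above, which is exactly the failure of transitivity that breaks positive-definiteness for ordinary correspondence-based kernels.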