Labelled tree graphs, Feynman diagrams and disk integrals
In this note, we introduce and study a new class of "half integrands" in the
Cachazo-He-Yuan (CHY) formula, which naturally generalize the so-called
Parke-Taylor factors; these are dubbed Cayley functions, as each of them
corresponds to a labelled tree graph. The CHY formula with a Cayley function
squared gives a sum of Feynman diagrams, and we represent it by a combinatorial
polytope whose vertices correspond to Feynman diagrams. We provide a simple
graphical rule to derive the polytope from a labelled tree graph, and classify
such polytopes, which range from the associahedron to the permutohedron.
Furthermore, we study the linear space of such half integrands and find (1) a
simple formula reducing any Cayley function to a sum of Parke-Taylor factors in
the Kleiss-Kuijf basis, and (2) a set of Cayley functions forming a new basis of the
space; each element has the remarkable property that its CHY formula with a
given Parke-Taylor factor gives either a single Feynman diagram or zero. We
also briefly discuss applications of Cayley functions and the new basis to
certain disk integrals of superstring theory. Comment: 30+8 pages, many figures; typos fixed
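For readers new to the objects named in this abstract, the following Python sketch shows two standard ingredients: the Parke-Taylor factor PT(a_1, ..., a_n) = 1 / ((σ_{a_1} - σ_{a_2})(σ_{a_2} - σ_{a_3}) ... (σ_{a_n} - σ_{a_1})), evaluated at numerical punctures σ, and an enumeration of labelled trees through Prüfer sequences, which reproduces Cayley's formula n^(n-2) for their number. It does not construct the paper's Cayley functions themselves, whose exact definition the abstract does not give; the function names are illustrative assumptions.

```python
import heapq
from fractions import Fraction
from itertools import product


def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence over vertices 1..n into the edges of a labelled tree."""
    degree = [1] * (n + 1)  # degree[v] = 1 + number of occurrences of v in seq
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # attach the smallest current leaf to v
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:  # v has no remaining occurrences, so it is now a leaf
            heapq.heappush(leaves, v)
    # exactly two vertices remain; they form the last edge
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def labelled_trees(n):
    """Yield every labelled tree on vertices 1..n (n >= 2) as a frozenset of edges."""
    for seq in product(range(1, n + 1), repeat=n - 2):
        yield frozenset(tuple(sorted(e)) for e in prufer_to_edges(seq, n))


def parke_taylor(order, sigma):
    """PT(a_1,...,a_n) = 1 / prod_i (sigma[a_i] - sigma[a_{i+1}]), with a_{n+1} = a_1."""
    denom = Fraction(1)
    for a, b in zip(order, list(order[1:]) + [order[0]]):
        denom *= sigma[a] - sigma[b]
    return 1 / denom


# Cayley's formula: n**(n - 2) labelled trees on n vertices
print([len(set(labelled_trees(n))) for n in range(2, 7)])  # [1, 3, 16, 125, 1296]
print(parke_taylor([1, 2, 3], {1: 0, 2: 1, 3: 3}))  # 1/6
```

Exact rational arithmetic via `Fraction` avoids rounding when punctures are integers or rationals; coincident punctures make the factor singular and raise `ZeroDivisionError`.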
LiSum: Open Source Software License Summarization with Multi-Task Learning
Open source software (OSS) licenses regulate the conditions under which users
can legally reuse, modify, and distribute the software. However, the community
uses many different OSS licenses, written in formal language, which are
typically long and hard to understand. In this paper, we conducted a
661-participant online survey to investigate developers' perspectives on and
practices regarding OSS licenses. The survey revealed a genuine need for an
automated tool to facilitate license understanding. Motivated by the survey
and the rapid growth of licenses in the community, we present the first study
of automated license summarization. Specifically, we released the first
high-quality license text summarization dataset and designed two tasks: license
text summarization (LTS), which aims to generate a relatively short summary for
an arbitrary license, and license term classification (LTC), which focuses on
inferring the license's attitude towards a predefined set of key license terms (e.g.,
Distribute). For these two tasks, we present LiSum, a multi-task learning
method to help developers overcome the obstacles of understanding OSS licenses.
Comprehensive experiments demonstrated that the proposed joint training
objective boosted performance on both tasks, surpassing state-of-the-art
baselines by at least 5 points in the F1 scores of four
summarization metrics while achieving a 95.13% micro-average F1 score for
classification. We released all the datasets, the replication
package, and the questionnaires for the community.
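The LTC result above is reported as a micro-average F1 score. As a hedged illustration of that metric only (not LiSum's evaluation code), this sketch pools true positives, false positives, and false negatives over all examples before computing F1 once. Encoding labels as (term, attitude) pairs, and the attitude values "can" and "cannot", are hypothetical assumptions.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all examples, then compute F1 once.

    y_true, y_pred: lists of label sets, one set per example.
    """
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        tp += len(gold & pred)  # labels predicted and correct
        fp += len(pred - gold)  # labels predicted but wrong
        fn += len(gold - pred)  # correct labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: (license term, attitude) pairs
gold = [{("Distribute", "can"), ("Sublicense", "cannot")}]
pred = [{("Distribute", "can"), ("Sublicense", "can")}]
print(micro_f1(gold, pred))  # 0.5
```

Pooling counts before dividing means frequent labels weigh more than rare ones, which is the usual contrast with macro-averaging.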
AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs
Machine Learning-as-a-Service (MLaaS) systems have been extensively developed for
cybersecurity-critical applications, such as detecting network intrusions and
fake news campaigns. Despite their effectiveness, their robustness against
adversarial attacks is one of the key trust concerns for MLaaS deployment. We
are thus motivated to assess the adversarial robustness of the Machine Learning
models at the core of these security-critical applications with
categorical inputs. Previous efforts to assess model robustness
against manipulation of categorical inputs are specific to particular use cases and
depend heavily on domain knowledge, or require white-box access to the target
ML model. Such limitations prevent robustness assessment from being offered as a
domain-agnostic service to various real-world applications. We propose
a provably optimal yet computationally highly efficient adversarial robustness
assessment protocol for a wide range of ML-driven cybersecurity-critical
applications. We demonstrate the use of this domain-agnostic robustness
assessment method with a substantial experimental study on fake news detection
and intrusion detection problems. Comment: IEEE BigData 202
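To make the setting concrete, the sketch below states one common notion of "optimal" robustness assessment for categorical inputs: the smallest number of feature substitutions that flips a black-box classifier's prediction. It is a brute-force reference with cost exponential in the budget, not AdvCat's efficient protocol, which the abstract does not detail; the function name, the per-feature `domains`, and the toy classifier are hypothetical.

```python
from itertools import combinations, product


def min_flip_perturbation(x, domains, predict, budget):
    """Exhaustively find the fewest categorical substitutions (at most `budget`)
    that change a black-box classifier's output.

    x: tuple of categorical values; domains[i]: allowed values for feature i;
    predict: black-box function mapping a tuple to a label (queries only).
    Returns (number_changed, adversarial_input), or None if robust within budget.
    """
    original = predict(x)
    for k in range(1, budget + 1):  # try smaller k first, so the result is minimal
        for features in combinations(range(len(x)), k):
            # each changed feature must take a value different from the original
            choices = [[v for v in domains[i] if v != x[i]] for i in features]
            for values in product(*choices):
                candidate = list(x)
                for i, v in zip(features, values):
                    candidate[i] = v
                candidate = tuple(candidate)
                if predict(candidate) != original:
                    return k, candidate
    return None


def predict(z):
    # toy black-box classifier: flags inputs containing two or more "bad" values
    return int(z.count("bad") >= 2)


x = ("bad", "ok", "ok")
domains = [["ok", "bad"]] * 3
print(min_flip_perturbation(x, domains, predict, budget=2))  # (1, ('bad', 'bad', 'ok'))
```

Only query access to `predict` is used, matching the black-box setting the abstract contrasts with white-box approaches.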
StarGraph: A Coarse-to-Fine Representation Method for Large-Scale Knowledge Graph
Conventional representation learning algorithms for knowledge graphs (KGs) map
each entity to a unique embedding vector, ignoring the rich information
contained in neighboring entities. We propose a method named StarGraph, which
offers a novel way to utilize neighborhood information in large-scale
knowledge graphs to obtain better entity representations. The core idea is to
divide the neighborhood information into different levels for sampling and
processing, where generalized coarse-grained information and unique
fine-grained information are combined to generate an efficient subgraph for
each node. In addition, a self-attention network is proposed to process the
subgraphs and produce the entity representations, which replace the
entity embeddings used in conventional methods. The proposed method achieves the
best results on the ogbl-wikikg2 dataset, validating its effectiveness.
The code is now available at https://github.com/hzli-ucas/StarGrap
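The abstract describes combining coarse-grained and fine-grained neighborhood information into a per-node subgraph. The sketch below is one hedged reading of that idea, not the released StarGraph code: it assumes fine-grained information means sampled 1-hop neighbors, and coarse-grained information means "anchor" nodes shared across the graph and found within two hops. Choosing anchors by degree, the helper names, and the sample sizes are hypothetical; relation types, which a real KG model would use, are ignored.

```python
import random


def pick_anchors(adj, m):
    """Hypothetical anchor choice: the m highest-degree nodes (ties broken by id)."""
    return set(sorted(adj, key=lambda v: (-len(adj[v]), v))[:m])


def build_star_subgraph(node, adj, anchors, k_fine=4, k_coarse=4, seed=0):
    """Return a token list for `node`: the node itself, up to k_fine sampled
    1-hop neighbors (fine-grained, node-specific) and up to k_coarse anchors
    from its 2-hop neighborhood (coarse-grained, shared by many nodes)."""
    rng = random.Random(seed)
    one_hop = sorted(adj[node])
    fine = rng.sample(one_hop, min(k_fine, len(one_hop)))
    two_hop = set(one_hop)
    for nb in one_hop:
        two_hop.update(adj[nb])
    two_hop.discard(node)
    coarse_pool = sorted(two_hop & anchors)
    coarse = rng.sample(coarse_pool, min(k_coarse, len(coarse_pool)))
    return [node] + fine + coarse


# Toy undirected graph as an adjacency dict (relation types ignored)
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
anchors = pick_anchors(adj, 1)  # {3}
print(build_star_subgraph(0, adj, anchors))
```

In the described method, a self-attention network would then encode such a token list into the entity representation; that encoder is not shown here.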