    Labelled tree graphs, Feynman diagrams and disk integrals

    In this note, we introduce and study a new class of "half integrands" in the Cachazo-He-Yuan (CHY) formula that naturally generalize the so-called Parke-Taylor factors; we dub them Cayley functions, since each of them corresponds to a labelled tree graph. The CHY formula with a Cayley function squared gives a sum of Feynman diagrams, which we represent by a combinatorial polytope whose vertices correspond to Feynman diagrams. We provide a simple graphical rule to derive the polytope from a labelled tree graph, and classify such polytopes, ranging from the associahedron to the permutohedron. Furthermore, we study the linear space of such half integrands and find (1) a nice formula reducing any Cayley function to a sum of Parke-Taylor factors in the Kleiss-Kuijf basis, and (2) a set of Cayley functions forming a new basis of the space; each element has the remarkable property that its CHY formula with a given Parke-Taylor factor gives either a single Feynman diagram or zero. We also briefly discuss applications of Cayley functions and the new basis to certain disk integrals of superstring theory.
    Comment: 30+8 pages, many figures; typos fixed
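The abstract ties each Cayley function to a labelled tree graph. As general background rather than a result of the paper, the labelled trees on vertices 1..n can be enumerated through Prüfer sequences, and there are n^(n-2) of them (Cayley's formula). A minimal Python sketch; the function names are illustrative only:

```python
from itertools import product


def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence over labels 1..n into the edge list of a labelled tree."""
    degree = [1] * (n + 1)  # index 0 unused; degree = 1 + occurrences in seq
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # attach the smallest current leaf to v, then remove that leaf
        leaf = min(u for u in range(1, n + 1) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    # exactly two vertices of degree 1 remain; join them
    u, w = [x for x in range(1, n + 1) if degree[x] == 1]
    edges.append((u, w))
    return edges


def labelled_trees(n):
    """Yield every labelled tree on vertices 1..n (n >= 2) as a frozenset of edges."""
    for seq in product(range(1, n + 1), repeat=n - 2):
        yield frozenset(frozenset(e) for e in prufer_to_edges(seq, n))
```

Because decoding is a bijection from sequences of length n-2 to trees, `len(set(labelled_trees(5)))` gives 125 = 5^3.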

    LiSum: Open Source Software License Summarization with Multi-Task Learning

    Open source software (OSS) licenses regulate the conditions under which users can legally reuse, modify, and distribute software. However, the community uses many different OSS licenses, written in formal language, which are typically long and complicated to understand. In this paper, we conducted an online survey of 661 participants to investigate developers' perspectives on and practices with OSS licenses. The user study revealed a genuine need for an automated tool to facilitate license understanding. Motivated by the user study and the fast growth of licenses in the community, we propose the first study towards automated license summarization. Specifically, we release the first high-quality text summarization dataset for OSS licenses and design two tasks: license text summarization (LTS), which aims to generate a relatively short summary for an arbitrary license, and license term classification (LTC), which infers the attitude toward each term in a predefined set of key license terms (e.g., Distribute). For these two tasks, we present LiSum, a multi-task learning method that helps developers overcome the obstacles to understanding OSS licenses. Comprehensive experiments demonstrate that the proposed joint training objective improves performance on both tasks, surpassing state-of-the-art baselines by at least 5 points in F1 score across four summarization metrics while simultaneously achieving a 95.13% micro-averaged F1 score for classification. We have released all datasets, the replication package, and the questionnaires to the community.
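The abstract says LiSum trains LTS and LTC with a joint objective but does not give its form. A common way to build such an objective, assumed here rather than taken from the paper, is a weighted sum of a token-level summarization loss and a per-term classification loss. The weight `lam` and all function names are hypothetical:

```python
import math


def token_nll(token_probs):
    """Summarization (LTS) loss: mean negative log-likelihood of the reference tokens.
    token_probs[i] is the probability the model assigns to the i-th reference token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def term_ce(term_dists, term_labels):
    """Classification (LTC) loss: mean cross-entropy over the predefined license terms.
    term_dists[k] is a predicted distribution over attitude classes for term k;
    term_labels[k] is the index of the gold attitude for that term."""
    return -sum(math.log(d[y]) for d, y in zip(term_dists, term_labels)) / len(term_labels)


def joint_loss(token_probs, term_dists, term_labels, lam=1.0):
    """Hypothetical multi-task objective L = L_LTS + lam * L_LTC (lam is an assumed weight)."""
    return token_nll(token_probs) + lam * term_ce(term_dists, term_labels)
```

Sharing one encoder between both losses is what lets each task's gradients shape the representation used by the other; how LiSum actually shares parameters is not stated in the abstract.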

    AdvCat: Domain-Agnostic Robustness Assessment for Cybersecurity-Critical Applications with Categorical Inputs

    Machine Learning-as-a-Service (MLaaS) systems have been widely developed for cybersecurity-critical applications, such as detecting network intrusions and fake news campaigns. Despite their effectiveness, their robustness against adversarial attacks is one of the key trust concerns for MLaaS deployment. We are thus motivated to assess the adversarial robustness of the machine learning models at the core of these security-critical applications with categorical inputs. Previous efforts at assessing model robustness against manipulation of categorical inputs are specific to particular use cases and depend heavily on domain knowledge, or require white-box access to the target ML model. Such limitations prevent robustness assessment from being offered as a domain-agnostic service to diverse real-world applications. We propose a provably optimal yet computationally highly efficient adversarial robustness assessment protocol for a wide range of ML-driven cybersecurity-critical applications. We demonstrate the domain-agnostic robustness assessment method through a substantial experimental study on fake news detection and intrusion detection problems.
    Comment: IEEE BigData 202
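The abstract does not describe the AdvCat algorithm itself. To make the problem setting concrete (black-box query access only, and a budget on how many categorical features an attacker may change), here is a generic greedy substitution baseline in Python. It is a swapped-in illustration, not the provably optimal protocol the paper proposes, and every name is hypothetical:

```python
from typing import Callable, List, Sequence


def greedy_categorical_attack(
    x: List[str],
    domains: Sequence[Sequence[str]],
    score: Callable[[List[str]], float],
    budget: int,
) -> List[str]:
    """Greedy black-box attack on a categorical input (illustrative baseline, not AdvCat).

    x       -- original input, one category per feature
    domains -- allowed categories for each feature
    score   -- black-box query returning the model's confidence in the true label
    budget  -- maximum number of features that may be changed
    """
    current = list(x)
    changed = set()
    for _ in range(budget):
        best_score, best_move = score(current), None
        # try every single-feature substitution on a feature not yet changed
        for i, dom in enumerate(domains):
            if i in changed:
                continue
            for c in dom:
                if c == current[i]:
                    continue
                cand = current[:i] + [c] + current[i + 1:]
                s = score(cand)
                if s < best_score:
                    best_score, best_move = s, (i, c)
        if best_move is None:  # no substitution lowers the score any further
            break
        i, c = best_move
        current[i] = c
        changed.add(i)
    return current
```

Greedy search is not guaranteed to find the strongest attack within the budget; exhaustively trying every subset of at most `budget` features would be, but its cost grows combinatorially with the number of features and categories.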

    StarGraph: A Coarse-to-Fine Representation Method for Large-Scale Knowledge Graph

    Conventional representation learning algorithms for knowledge graphs (KGs) map each entity to a unique embedding vector, ignoring the rich information contained in neighboring entities. We propose a method named StarGraph, which offers a novel way to use neighborhood information in large-scale knowledge graphs to obtain better entity representations. The core idea is to divide the neighborhood information into different levels for sampling and processing, combining generalized coarse-grained information with unique fine-grained information to generate an efficient subgraph for each node. In addition, a self-attention network is proposed to process the subgraphs and produce entity representations, which replace the entity embeddings used in conventional methods. The proposed method achieves the best results on the ogbl-wikikg2 dataset, validating its effectiveness. The code is available at https://github.com/hzli-ucas/StarGrap
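The abstract describes combining coarse-grained information shared across entities with fine-grained information unique to each node. A minimal Python sketch of one way to build such a per-node subgraph, assuming (not taken from the paper) that the coarse level is a small set of high-degree "anchor" nodes found within two hops and the fine level is a sample of direct neighbors. Relation labels and the self-attention encoder are omitted, and all names are hypothetical:

```python
from typing import Dict, List, Set, Tuple


def pick_anchors(adj: Dict[str, Set[str]], num_anchors: int) -> List[str]:
    """Coarse level (assumed): a few high-degree nodes shared by all entities."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:num_anchors]


def star_subgraph(
    adj: Dict[str, Set[str]], node: str, anchors: List[str], max_neighbors: int
) -> Tuple[List[str], List[str]]:
    """Return (anchor tokens, neighbor tokens) describing `node`.

    Coarse: anchors that lie within two hops of `node`.
    Fine:   up to `max_neighbors` direct neighbors of `node`.
    """
    one_hop = adj.get(node, set())
    two_hop = set(one_hop)
    for u in one_hop:
        two_hop |= adj.get(u, set())
    two_hop.discard(node)
    near_anchors = [a for a in anchors if a in two_hop]
    # deterministic stand-in for random neighbor sampling
    neighbors = sorted(one_hop)[:max_neighbors]
    return near_anchors, neighbors
```

The resulting token lists stay small and bounded regardless of a node's true degree, which is what makes this kind of sampling practical on graphs the size of ogbl-wikikg2.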