3,399 research outputs found
Supersymmetry versus Gauge Symmetry on the Heterotic Landscape
One of the goals of the landscape program in string theory is to extract
information about the space of string vacua in the form of statistical
correlations between phenomenological features that are otherwise uncorrelated
in field theory. Such correlations would thus represent predictions of string
theory that hold independently of a vacuum-selection principle. In this paper,
we study statistical correlations between two features which are likely to be
central to any potential description of nature at high energy scales: gauge
symmetries and spacetime supersymmetry. We analyze correlations between these
two kinds of symmetry within the context of perturbative heterotic string
vacua, and find a number of striking features. We find, for example, that the
degree of spacetime supersymmetry is strongly correlated with the probabilities
of realizing certain gauge groups, with unbroken supersymmetry at the string
scale tending to favor gauge-group factors with larger rank. We also find that
nearly half of the heterotic landscape is non-supersymmetric and yet
tachyon-free at tree level; indeed, less than a quarter of the tree-level
heterotic landscape exhibits any supersymmetry at all at the string scale.
Comment: 29 pages, LaTeX, 4 figures, 7 tables
Effective Knowledge Graph Aggregation for Malware-Related Cybersecurity Text
With the rate at which malware spreads in the modern age, it is extremely important that cybersecurity analysts are able to extract relevant information pertaining to new and active threats in a timely and effective manner. Manually reading through articles and blog posts on the internet is time-consuming and usually involves sifting through much repeated information. Knowledge graphs, a structured representation of relationship information, are an effective way to visually condense information presented in large amounts of unstructured text for human readers. They are therefore useful for sifting through the abundance of cybersecurity information released through web-based security articles and blogs. This paper presents a pipeline for extracting these relationships using supervised deep learning with recent state-of-the-art transformer-based neural architectures for sequence processing tasks. To this end, a corpus of text from a range of prominent cybersecurity-focused media outlets was manually annotated. An algorithm is also presented that keeps potentially redundant relationships from being added to an existing knowledge graph, using a cosine-similarity metric on pre-trained word embeddings.
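The redundancy check mentioned at the end of this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the triple representation, the averaging of word vectors, the function names `embed_triple` and `add_if_novel`, the 0.9 threshold, and the toy 4-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a pre-trained embedding model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def embed_triple(triple, embeddings):
    """Average the word vectors of every known token in (subject, relation, object)."""
    tokens = [tok for part in triple for tok in part.lower().split()]
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return None  # no known words, so similarity cannot be judged
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

def add_if_novel(graph, triple, embeddings, threshold=0.9):
    """Append the triple unless it is too similar to one already in the graph."""
    candidate = embed_triple(triple, embeddings)
    if candidate is not None:
        for existing in graph:
            existing_vec = embed_triple(existing, embeddings)
            if (existing_vec is not None
                    and cosine_similarity(candidate, existing_vec) >= threshold):
                return False  # treated as redundant, graph left unchanged
    graph.append(triple)
    return True

# Hypothetical toy embeddings standing in for pre-trained word vectors.
TOY_EMBEDDINGS = {
    "emotet":   [1.0, 0.0, 0.0, 0.0],
    "drops":    [0.0, 1.0, 0.0, 0.0],
    "installs": [0.0, 0.9, 0.1, 0.0],
    "trickbot": [0.0, 0.0, 1.0, 0.0],
    "wannacry": [0.0, 0.0, 0.0, 1.0],
    "encrypts": [0.0, 0.0, 0.0, 1.0],
    "files":    [0.0, 0.0, 0.0, 1.0],
}

graph = []
add_if_novel(graph, ("emotet", "drops", "trickbot"), TOY_EMBEDDINGS)
add_if_novel(graph, ("emotet", "installs", "trickbot"), TOY_EMBEDDINGS)  # near-duplicate
add_if_novel(graph, ("wannacry", "encrypts", "files"), TOY_EMBEDDINGS)
```

Averaging word vectors is only one possible way to compare relationships; comparing subject, relation, and object separately would be an equally plausible design.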
Visualizing the semantic content of large text databases using text maps
A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. A set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles is described. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS
Many distributed machine learning frameworks have recently been built to
speed up the large-scale data learning process. However, most distributed
machine learning in these frameworks still relies on an offline algorithm model
that cannot cope with data stream problems. In fact, large-scale data are
mostly generated by non-stationary data streams whose patterns evolve
over time. To address this problem, we propose a novel Evolving Large-scale
Data Stream Analytics framework based on a Scalable Parsimonious Network based
on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving
algorithm is distributed over the worker nodes in the cloud to learn
large-scale data streams. The Scalable PANFIS framework incorporates an active
learning (AL) strategy and two model fusion methods. The AL strategy accelerates the
distributed learning process to generate an initial evolving large-scale data
stream model (initial model), whereas the two model fusion methods aggregate an
initial model to generate the final model. The final model represents the
update of current large-scale data knowledge which can be used to infer future
data. The framework is evaluated in extensive experiments that measure the
accuracy and running time of four combinations of Scalable PANFIS and of other
built-in Spark algorithms. The results indicate that Scalable PANFIS with
AL trains almost twice as fast as Scalable PANFIS without AL. The results also
show that the rule merging and voting mechanisms yield similar accuracy across
the Scalable PANFIS variants, and that both generally outperform the
Spark-based algorithms. In terms of running time, Scalable PANFIS trains faster
than all Spark-based algorithms when classifying numerous benchmark datasets.
Comment: 20 pages, 5 figures
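The voting-based fusion named in this abstract can be sketched generically as a majority vote over the predictions of per-worker models. This is only an illustration of plain majority voting, not the Scalable PANFIS implementation: the names `majority_vote` and `fuse_by_voting`, the tie-breaking rule, and the toy threshold classifiers are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def fuse_by_voting(worker_models, sample):
    """Ask every worker model for a label and combine the answers by majority vote."""
    return majority_vote([model(sample) for model in worker_models])

# Hypothetical worker models: simple threshold classifiers on one feature.
workers = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

label_high = fuse_by_voting(workers, 0.6)  # two of three workers predict 1
label_low = fuse_by_voting(workers, 0.4)   # two of three workers predict 0
```

Rule merging, the other fusion method the abstract names, combines the models themselves rather than their outputs and is not shown here.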