TRIDEnT: Building Decentralized Incentives for Collaborative Security
Sophisticated mass attacks, especially when exploiting zero-day
vulnerabilities, have the potential to cause destructive damage to
organizations and critical infrastructure. To detect and contain such
attacks promptly, collaboration among the defenders is critical. By correlating
real-time detection information (alerts) from multiple sources (collaborative
intrusion detection), defenders can detect attacks and take the appropriate
defensive measures in time. However, although the technical tools to facilitate
collaboration exist, real-world adoption of such collaborative security
mechanisms is still underwhelming. This is largely due to a lack of trust and
participation incentives for companies and organizations. This paper proposes
TRIDEnT, a novel collaborative platform that aims to enable and incentivize
parties to exchange network alert data, thus increasing their overall detection
capabilities. TRIDEnT allows parties that may be in a competitive relationship
to selectively advertise, sell, and acquire security alerts in the form of
(near) real-time peer-to-peer streams. To validate the basic principles behind
TRIDEnT, we present an intuitive game-theoretic model of alert sharing that is
of independent interest, and show that collaboration is bound to take place
infinitely often. Furthermore, to demonstrate the feasibility of our approach,
we instantiate our design in a decentralized manner using Ethereum smart
contracts and provide a fully functional prototype.
Comment: 28 pages
OnionBots: Subverting Privacy Infrastructure for Cyber Attacks
Over the last decade, botnets have survived by adopting a sequence of increasingly
sophisticated strategies to evade detection and takeovers, and to monetize
their infrastructure. At the same time, the success of privacy infrastructures
such as Tor opened the door to illegal activities, including botnets,
ransomware, and a marketplace for drugs and contraband. We contend that the
next waves of botnets will extensively subvert privacy infrastructure and
cryptographic mechanisms. In this work we propose to preemptively investigate
the design and mitigation of such botnets. We first introduce OnionBots, which
we believe will be the next generation of resilient, stealthy botnets.
OnionBots use privacy infrastructures for cyber attacks by completely
decoupling their operation from the infected host IP address and by carrying
traffic that does not leak information about its source, destination, and
nature. Such bots live symbiotically within the privacy infrastructures to
evade detection, measurement, scale estimation, observation, and in general all
current IP-based mitigation techniques. Furthermore, we show that with an
adequate self-healing network maintenance scheme that is simple to implement,
OnionBots achieve a low diameter and a low degree and are robust to
partitioning under node deletions. We developed a mitigation technique, called
SOAP, that neutralizes the nodes of the basic OnionBots. We also outline and
discuss a set of techniques that can enable subsequent waves of Super
OnionBots. In light of the potential of such botnets, we believe that the
research community should proactively develop detection and mitigation methods
to thwart OnionBots, potentially making adjustments to privacy infrastructure.
Comment: 12 pages, 8 figures
Exploring Linguistic Constraints in NLP Applications
The key argument of this dissertation is that the success of a Natural Language Processing (NLP) application depends on a proper representation of the corresponding linguistic problem. This theme is raised in the context that the recent progress made in our field is widely credited to the effective use of strong engineering techniques. However, the intriguing power of highly lexicalized models shown in many NLP applications is not only an achievement of developments in machine learning, but would also have been impossible without the extensive hand-annotated data resources made available, which were originally built with very deep linguistic considerations.
More specifically, we explore three linguistic aspects in this dissertation: the distinction between closed-class and open-class words, long-tail distributions in vocabulary study, and determinism in language models. The first two aspects are studied in unsupervised tasks, unsupervised part-of-speech (POS) tagging and morphology learning, and the last one is studied in supervised tasks, English POS tagging and Chinese word segmentation. Each linguistic aspect under study manifests itself in a (different) way to help improve performance or efficiency in some NLP application.
Integrating Personality Psychology into Economics
This paper reviews the problems and potential benefits of integrating personality psychology into economics. Economists have much to learn from and contribute to personality psychology.
Keywords: personality psychology, behavioral economics, identification, causality
PLAZA 4.0 : an integrative resource for functional, evolutionary and comparative plant genomics
PLAZA (https://bioinformatics.psb.ugent.be/plaza) is a plant-oriented online resource for comparative, evolutionary and functional genomics. The PLAZA platform consists of multiple independent instances focusing on different plant clades, while also providing access to a consistent set of reference species. Each PLAZA instance contains structural and functional gene annotations, gene family data and phylogenetic trees, and detailed gene colinearity information. A user-friendly web interface makes the necessary tools and visualizations accessible, specific for each data type. Here we present PLAZA 4.0, the latest iteration of the PLAZA framework. This version consists of two new instances (Dicots 4.0 and Monocots 4.0) providing a large increase in newly available species, and offers access to updated and newly implemented tools and visualizations, helping users with the ever-increasing demands for complex and in-depth analyses. The total number of species across both instances nearly doubles from 37 species in PLAZA 3.0 to 71 species in PLAZA 4.0, with a much broader coverage of crop species (e.g. wheat, palm oil) and species of evolutionary interest (e.g. spruce, Marchantia). The new PLAZA instances can also be accessed by a programming interface through a RESTful web service, thus allowing bioinformaticians to optimally leverage the power of the PLAZA platform.
Text Generation with Efficient (Soft) Q-Learning
Maximum likelihood estimation (MLE) is the predominant algorithm for training
text generation models. This paradigm relies on direct supervision examples,
which is not applicable to many applications, such as generating adversarial
attacks or generating prompts to control language models. Reinforcement
learning (RL), on the other hand, offers a more flexible solution by allowing
users to plug in arbitrary task metrics as reward. Yet previous RL algorithms
for text generation, such as policy gradient (on-policy RL) and Q-learning
(off-policy RL), are often notoriously inefficient or unstable to train due to
the large sequence space and the sparse reward received only at the end of
sequences. In this paper, we introduce a new RL formulation for text generation
from the soft Q-learning perspective. It further enables us to draw from the
latest RL advances, such as path consistency learning, to combine the best of
on-/off-policy updates, and learn effectively from sparse reward. We apply the
approach to a wide range of tasks, including learning from noisy/negative
examples, adversarial attacks, and prompt generation. Experiments show our
approach consistently outperforms both task-specialized algorithms and the
previous RL methods. On standard supervised tasks where MLE prevails, our
approach also achieves competitive performance and stability by training text
generation from scratch.
Comment: Code available at
https://github.com/HanGuo97/soft-Q-learning-for-text-generatio
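As a rough illustration of the soft Q-learning view named in the abstract above, the sketch below computes the soft state value and the induced policy for a single decoding step. It uses the textbook entropy-regularized formulation with temperature 1, where V = log sum_a exp Q(a) and pi(a) = exp(Q(a) - V). It is not the authors' implementation; the function names and the Q-values are hypothetical.

```python
import math

def soft_value(q_values):
    """Soft state value V = log sum_a exp(Q(a)), computed stably."""
    m = max(q_values)  # subtract the max to avoid overflow in exp
    return m + math.log(sum(math.exp(q - m) for q in q_values))

def soft_policy(q_values):
    """Induced policy pi(a) = exp(Q(a) - V), i.e. a softmax over Q-values."""
    v = soft_value(q_values)
    return [math.exp(q - v) for q in q_values]

# Hypothetical Q-values for a three-token vocabulary at one decoding step.
q = [2.0, 1.0, 0.0]
probs = soft_policy(q)
print(probs)
```

Under this view the Q-values play the role of the logits of a text generation model, which is what lets a generation policy be read off directly as a softmax.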
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.