224 research outputs found
PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts
Public disclosure of important security information, such as knowledge of
vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and
other online sources months before proper classification into structured
databases. In order to facilitate timely discovery of such knowledge, we
propose a novel semi-supervised learning algorithm, PACE, for identifying and
classifying relevant entities in text sources. The main contribution of this
paper is an enhancement of the traditional bootstrapping method for entity
extraction by employing a time-memory trade-off that simultaneously circumvents
a costly corpus search while strengthening pattern nomination, which should
increase accuracy. An implementation in the cyber-security domain is discussed
as well as challenges to Natural Language Processing imposed by the security
domain. Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on
Machine Learning and Applications 201
Balancing Interactive Data Management of Massive Data with Situational Awareness through Smart Aggregation
Designing a visualization system capable of processing, managing, and presenting massive data sets while maximizing the user’s situational awareness (SA) is a challenging, but important, research question in visual analytics. Traditional data management and interactive retrieval approaches have often focused on solving the data overload problem at the expense of the user’s SA. This paper discusses various data management strategies and the strengths and limitations of each approach in providing the user with SA. A new data management strategy, coined Smart Aggregation, is presented as a powerful approach to overcome the challenges of both handling massive data sets and maintaining SA. By combining automatic data aggregation with user-defined controls on what, how, and when data should be aggregated, we present a visualization system that can handle massive amounts of data while affording the user the best possible SA. This approach ensures that a system is always usable in terms of both system resources and human perceptual resources. We have implemented our Smart Aggregation approach in a visual analytics system called VIAssist (Visual Assistant for Information Assurance Analysis) to facilitate exploration, discovery, and SA in th
The Influence of Visual Provenance Representations on Strategies in a Collaborative Hand-off Data Analysis Scenario
Data analysis tasks rarely occur in isolation. Especially in
intelligence analysis scenarios where different experts contribute knowledge to
a shared understanding, members must communicate how insights develop to
establish common ground among collaborators. The use of provenance to
communicate analytic sensemaking carries promise by describing the interactions
and summarizing the steps taken to reach insights. Yet, no universal guidelines
exist for communicating provenance in different settings. Our work focuses on
the presentation of provenance information and the resulting conclusions
reached and strategies used by new analysts. In an open-ended, 30-minute,
textual exploration scenario, we qualitatively compare how adding different
types of provenance information (specifically data coverage and interaction
history) affects analysts' confidence in conclusions developed, propensity to
repeat work, filtering of data, identification of relevant information, and
typical investigation strategies. We see that data coverage (i.e., what was
interacted with) provides provenance information without limiting individual
investigation freedom. On the other hand, while interaction history (i.e., when
something was interacted with) does not significantly encourage more mimicry,
it does take more time to comfortably understand, as represented by less
confident conclusions and less relevant information-gathering behaviors. Our
results contribute empirical data towards understanding how provenance
summarizations can influence analysis behaviors. Comment: to be published in IEEE Vis 202
Global analysis of the mammalian RNA degradome reveals widespread miRNA-dependent and miRNA-independent endonucleolytic cleavage
The Ago2 component of the RNA-induced silencing complex (RISC) is an endonuclease that cleaves mRNAs that base pair with high complementarity to RISC-bound microRNAs. Many examples of such direct cleavage have been identified in plants, but not in vertebrates, despite the conservation of catalytic capacity in vertebrate Ago2. We performed parallel analysis of RNA ends (PARE), a deep sequencing approach that identifies 5′-phosphorylated, polyadenylated RNAs, to detect potential microRNA-directed mRNA cleavages in mouse embryo and adult tissues. We found that numerous mRNAs are potentially targeted for cleavage by endogenous microRNAs, but at very low levels relative to the mRNA abundance, apart from miR-151-5p-guided cleavage of the N4BP1 mRNA. We also found numerous examples of non-miRNA-directed cleavage, including cleavage of a group of mRNAs within a CA-repeat consensus sequence. The PARE analysis also identified many examples of adenylated small non-coding RNAs, including microRNAs, tRNA processing intermediates and various other small RNAs, consistent with adenylation being part of a widespread proof-reading and/or degradation pathway for small RNAs
Identification of extracellular glycerophosphodiesterases in Pseudomonas and their role in soil organic phosphorus remineralisation
In soils, phosphorus (P) exists in numerous organic and inorganic forms. However, plants can only acquire inorganic orthophosphate (Pi), meaning global crop production is frequently limited by P availability. To overcome this problem, rock phosphate fertilisers are heavily applied, often with negative environmental and socio-economic consequences. The organic P fraction of soil contains phospholipids that are rapidly degraded resulting in the release of bioavailable Pi. However, the mechanisms behind this process remain unknown. We identified and experimentally confirmed the function of two secreted glycerolphosphodiesterases, GlpQI and GlpQII, found in Pseudomonas stutzeri DSM4166 and Pseudomonas fluorescens SBW25, respectively. A series of co-cultivation experiments revealed that in these Pseudomonas strains, cleavage of glycerolphosphorylcholine and its breakdown product G3P occurs extracellularly allowing other bacteria to benefit from this metabolism. Analyses of metagenomic and metatranscriptomic datasets revealed that this trait is widespread among soil bacteria with Actinobacteria and Proteobacteria, specifically Betaproteobacteria and Gammaproteobacteria, the likely major players