Efficient Correlated Topic Modeling with Topic Embedding
Correlated topic modeling has been limited to small model and problem sizes
due to their high computational cost and poor scaling. In this paper, we
propose a new model which learns compact topic embeddings and captures topic
correlations through the closeness between the topic vectors. Our method
enables efficient inference in the low-dimensional embedding space, reducing
previous cubic or quadratic time complexity to linear with respect to the number of topics. We
further speed up variational inference with a fast sampler that exploits the sparsity
of topic occurrence. Extensive experiments show that our approach can
handle model and data scales several orders of magnitude larger
than those handled by existing correlated topic models, without sacrificing modeling quality, as shown by
competitive or superior performance in document classification and
retrieval.
Comment: KDD 2017 oral. The first two authors contributed equally.
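The abstract says topic correlations are captured through the closeness between topic vectors, and that working in the low-dimensional embedding space makes inference linear in the number of topics. The sketch below illustrates why a low-dimensional embedding can help. It assumes each of K topics has an M-dimensional vector and that the correlation between two topics is their inner product. This is an assumption for illustration, not necessarily the paper's definition. Under that assumption, a quantity such as sum_j sim(i, j) * theta_j can be computed in two ways. One builds the full K x K similarity matrix, at O(K^2 M) cost. The other first projects theta into the embedding space, at only O(K M) cost. The function names are hypothetical.

```python
import math
import random

def dense_correlated_scores(emb, theta):
    """O(K^2 * M): build the full K x K topic-similarity matrix, then multiply by theta."""
    K = len(emb)
    sim = [[sum(a * b for a, b in zip(emb[i], emb[j])) for j in range(K)] for i in range(K)]
    return [sum(sim[i][j] * theta[j] for j in range(K)) for i in range(K)]

def embedded_correlated_scores(emb, theta):
    """O(K * M): never form the K x K matrix; work in the M-dim embedding space instead."""
    M = len(emb[0])
    # z = sum_j theta_j * u_j, an M-dimensional summary of the topic mixture
    z = [sum(theta[j] * emb[j][m] for j in range(len(emb))) for m in range(M)]
    # score_i = <u_i, z> = sum_j <u_i, u_j> * theta_j
    return [sum(u[m] * z[m] for m in range(M)) for u in emb]

# Toy example with made-up embeddings: K = 50 topics, M = 4 embedding dimensions.
random.seed(0)
K, M = 50, 4
emb = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(K)]
theta = [random.random() for _ in range(K)]
dense = dense_correlated_scores(emb, theta)
fast = embedded_correlated_scores(emb, theta)
```

Both functions return the same scores. Only the second one avoids the quadratic dependence on K.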
Classifying Web Exploits with Topic Modeling
This short empirical paper investigates how well topic modeling and database
meta-data characteristics can classify web and other proof-of-concept (PoC)
exploits for publicly disclosed software vulnerabilities. Using a dataset
of over 36 thousand PoC exploits, the empirical experiment obtains an accuracy of
nearly 0.9. Text mining and topic modeling are a
significant factor behind this classification performance. In addition to
these empirical results, the paper contributes to the research tradition of
enhancing software vulnerability information with text mining, while also providing a
few scholarly observations about the potential for semi-automatic
classification of exploits in the existing tracking infrastructures.
Comment: Proceedings of the 2017 28th International Workshop on Database and
Expert Systems Applications (DEXA).
http://ieeexplore.ieee.org/abstract/document/8049693
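The abstract does not describe the classifier, so what follows is only a hypothetical sketch of the general idea of combining topic-model output with database metadata. Each exploit's topic proportions are concatenated with a one-hot encoded metadata field. A simple nearest-centroid classifier is then trained and scored by accuracy. This classifier is a stand-in and not the paper's method. The field name `platform`, the class labels, and the toy data are all invented for illustration.

```python
from collections import defaultdict

def make_features(topic_props, platform, platforms):
    """Concatenate topic proportions with a one-hot encoding of a (hypothetical) metadata field."""
    one_hot = [1.0 if platform == p else 0.0 for p in platforms]
    return list(topic_props) + one_hot

def fit_centroids(X, y):
    """Nearest-centroid training: average the feature vectors belonging to each class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up toy data: two topics, one metadata field with two values.
platforms = ["php", "windows"]
X = [
    make_features([0.8, 0.2], "php", platforms),
    make_features([0.7, 0.3], "php", platforms),
    make_features([0.1, 0.9], "windows", platforms),
    make_features([0.2, 0.8], "windows", platforms),
]
y = ["web", "web", "other", "other"]
centroids = fit_centroids(X, y)
preds = [predict(centroids, x) for x in X]
```

On this toy data every training point lies closest to its own class centroid, so the accuracy is 1.0. Real exploit data would of course need a held-out test split.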
Integrative biological simulation praxis: Considerations from physics, philosophy, and data/model curation practices
Integrative biological simulations have a varied and controversial history in
the biological sciences. From computational models of organelles, cells, and
simple organisms, to physiological models of tissues, organ systems, and
ecosystems, a diverse array of biological systems has been the target of
large-scale computational modeling efforts. Nonetheless, these research agendas
have yet to prove decisively their value among the broader community of
theoretical and experimental biologists. In this commentary, we examine a range
of philosophical and practical issues relevant to understanding the potential
of integrative simulations. We discuss the role of theory and modeling in
different areas of physics and suggest that certain sub-disciplines of physics
provide useful cultural analogies for imagining the future role of simulations
in biological research. We examine philosophical issues related to modeling
which consistently arise in discussions about integrative simulations and
suggest a pragmatic viewpoint that balances a belief in philosophy with the
recognition of the relative infancy of our state of philosophical
understanding. Finally, we discuss community workflow and publication practices
to allow research to be readily discoverable and amenable to incorporation into
simulations. We argue that there are aligned incentives in widespread adoption
of practices which will both advance the needs of integrative simulation
efforts as well as other contemporary trends in the biological sciences,
ranging from open science and data sharing to improving reproducibility.
Comment: 10 pages