Sampled Weighted Min-Hashing for Large-Scale Topic Mining
We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to
automatically mine topics from large-scale corpora. SWMH generates multiple
random partitions of the corpus vocabulary based on term co-occurrence and
agglomerates highly overlapping inter-partition cells to produce the mined
topics. While other approaches define a topic as a probability distribution
over a vocabulary, SWMH topics are ordered subsets of that vocabulary.
Interestingly, the topics mined by SWMH underlie themes from the corpus at
different levels of granularity. We extensively evaluate the meaningfulness of
the mined topics both qualitatively and quantitatively on the NIPS (1.7 K
documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora.
Additionally, we compare the quality of SWMH with Online LDA topics for
document representation in classification.
Comment: 10 pages, Proceedings of the Mexican Conference on Pattern
Recognition 201
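The partitioning idea in this abstract can be illustrated with plain (unweighted, unsampled) min-hashing over term occurrence sets. This is only a minimal sketch and not SWMH itself: the function names, the toy corpus, and the tuple size are assumptions made for illustration, and the agglomeration of overlapping cells is omitted.

```python
import random
from collections import defaultdict

def term_occurrences(docs):
    """Map each term to the set of document ids it appears in."""
    occ = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            occ[term].add(doc_id)
    return dict(occ)

def minhash_partition(occ, n_docs, tuple_size, rng):
    """Build one random partition of the vocabulary.

    Each term is keyed by a tuple of min-hash values of its occurrence set;
    terms with equal keys share a cell. Terms that co-occur in many of the
    same documents are more likely to collide.
    """
    # Each random permutation of document ids acts as one hash function.
    perms = []
    for _ in range(tuple_size):
        perm = list(range(n_docs))
        rng.shuffle(perm)
        perms.append(perm)
    cells = defaultdict(set)
    for term, doc_ids in occ.items():
        key = tuple(min(perm[d] for d in doc_ids) for perm in perms)
        cells[key].add(term)
    # Keep only cells that group at least two terms.
    return [cell for cell in cells.values() if len(cell) > 1]

# Hypothetical toy corpus, not taken from the paper.
docs = [
    "neural network training",
    "neural network inference",
    "bond market trading",
    "bond market liquidity",
]
occ = term_occurrences(docs)
rng = random.Random(0)
partitions = [minhash_partition(occ, len(docs), 2, rng) for _ in range(5)]
for cells in partitions:
    print(cells)
```

Terms with identical occurrence sets, such as "neural" and "network" here, always share a cell, while terms with disjoint occurrence sets never do.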
Science Concierge: A fast content-based recommendation system for scientific publications
Finding relevant publications is important for scientists who have to cope
with an exponentially increasing volume of scholarly material. Algorithms can
help with this task, as they do with music, movie, and product recommendations.
However, we know little about the performance of these algorithms with
scholarly material. Here, we develop an algorithm, and an accompanying Python
library, that implements a recommendation system based on the content of
articles. Design principles are to adapt to new content, provide near-real-time
suggestions, and be open source. We tested the library on 15K posters from the
Society for Neuroscience Conference 2015. Human-curated topics are used to
cross-validate parameters in the algorithm and produce a similarity metric that
maximally correlates with human judgments. We show that our algorithm
significantly outperformed suggestions based on keywords. The work presented
here promises to make the exploration of scholarly material faster and more
accurate.
Comment: 12 pages, 5 figures
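A content-based recommender of the kind described in this abstract can be sketched as follows. The abstract does not spell out the text representation, so this example assumes TF-IDF vectors with cosine similarity as a generic stand-in. It does not use the accompanying library's API, and all function names and sample abstracts are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency keeps every weight positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(liked, vectors, k=1):
    """Indices of the k documents most similar to the liked document."""
    others = [j for j in range(len(vectors)) if j != liked]
    others.sort(key=lambda j: cosine(vectors[liked], vectors[j]), reverse=True)
    return others[:k]

# Hypothetical poster abstracts, for illustration only.
abstracts = [
    "spike sorting for cortical neurons",
    "cortical neurons and spike timing",
    "protein folding in yeast cells",
    "yeast cells under heat stress",
]
vectors = tfidf_vectors(abstracts)
print(recommend(0, vectors, k=1))
```

A brute-force scan like this is fine for a handful of documents. Near-real-time suggestions over thousands of items would typically need an index for nearest-neighbor search, which is left out here.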
Optimal client recommendation for market makers in illiquid financial products
The process of liquidity provision in financial markets can result in
prolonged exposure to illiquid instruments for market makers. In this case,
where a proprietary position is not desired, proactively targeting the right
client who is likely to be interested can be an effective means to offset this
position, rather than relying on commensurate interest arising through natural
demand. In this paper, we consider the inference of a client profile for the
purpose of corporate bond recommendation, based on typical recorded information
available to the market maker. Given a historical record of corporate bond
transactions and bond meta-data, we use a topic-modelling analogy to develop a
probabilistic technique for compiling a curated list of client recommendations
for a particular bond that needs to be traded, ranked by probability of
interest. We show that a model based on Latent Dirichlet Allocation offers
promising performance in delivering relevant recommendations to sales traders.
Comment: 12 pages, 3 figures, 1 table
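The ranking step implied by the topic-modelling analogy can be sketched as below. The mapping is an assumption, since the abstract does not state it: clients play the role of documents and bonds the role of words, so a fitted model would give each client a topic mixture and each topic a distribution over bonds. The numbers are made up for illustration and are not fitted or taken from the paper.

```python
def rank_clients(bond, theta, phi):
    """Rank clients by P(bond | client) = sum_k theta[client][k] * phi[k][bond]."""
    scores = {
        client: sum(weight * topic.get(bond, 0.0) for weight, topic in zip(mix, phi))
        for client, mix in theta.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical client-topic mixtures (each row sums to 1).
theta = {
    "client_a": [0.9, 0.1],
    "client_b": [0.2, 0.8],
    "client_c": [0.5, 0.5],
}
# Hypothetical topic-bond distributions (each topic sums to 1).
phi = [
    {"bond_x": 0.7, "bond_y": 0.3},
    {"bond_x": 0.1, "bond_y": 0.9},
]

ranking = rank_clients("bond_x", theta, phi)
for client, score in ranking:
    print(client, round(score, 2))
```

With these values the ranking for bond_x is client_a, then client_c, then client_b, because client_a puts the most weight on the topic that favors that bond.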