1,781 research outputs found

Sampled Weighted Min-Hashing for Large-Scale Topic Mining

Authors: AZ Broder, DM Blei, G Fuentes Pineda, G Salton, GE Hinton, O Chum, YW Teh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2015

We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to automatically mine topics from large-scale corpora. SWMH generates multiple random partitions of the corpus vocabulary based on term co-occurrence and agglomerates highly overlapping inter-partition cells to produce the mined topics. While other approaches define a topic as a probabilistic distribution over a vocabulary, SWMH topics are ordered subsets of such vocabulary. Interestingly, the topics mined by SWMH underlie themes from the corpus at different levels of granularity. We extensively evaluate the meaningfulness of the mined topics both qualitatively and quantitatively on the NIPS (1.7 K documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora. Additionally, we compare the quality of SWMH with Online LDA topics for document representation in classification.

Comment: 10 pages, Proceedings of the Mexican Conference on Pattern Recognition 201

arXiv.org e-Print Archive

Crossref

Science Concierge: A fast content-based recommendation system for scientific publications

Authors: Achakulvisut Titipat, Acuna Daniel E., Kording Konrad, Ruangrong Tulakan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Finding relevant publications is important for scientists who have to cope with an exponentially increasing volume of scholarly material. Algorithms can help with this task, as they do for music, movie, and product recommendations. However, we know little about the performance of these algorithms with scholarly material. Here, we develop an algorithm, and an accompanying Python library, that implements a recommendation system based on the content of articles. The design principles are to adapt to new content, provide near-real-time suggestions, and be open source. We tested the library on 15K posters from the Society of Neuroscience Conference 2015. Human-curated topics are used to cross-validate parameters in the algorithm and produce a similarity metric that maximally correlates with human judgments. We show that our algorithm significantly outperformed suggestions based on keywords. The work presented here promises to make the exploration of scholarly material faster and more accurate.

Comment: 12 pages, 5 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Optimal client recommendation for market makers in illiquid financial products

Authors: DD Lee, DJC MacKay, DM Blei, EJ Elton, F Pedregosa, G Shani, GE Batista, I Kim, KS Jones, L Bolelli, M Avellaneda, M Hoffman, MI Jordan, S Robertson, Y Amihud
Publication date: 27/04/2017

The process of liquidity provision in financial markets can result in prolonged exposure to illiquid instruments for market makers. In this case, where a proprietary position is not desired, proactively targeting the right client who is likely to be interested can be an effective means to offset this position, rather than relying on commensurate interest arising through natural demand. In this paper, we consider the inference of a client profile for the purpose of corporate bond recommendation, based on typical recorded information available to the market maker. Given a historical record of corporate bond transactions and bond meta-data, we use a topic-modelling analogy to develop a probabilistic technique for compiling a curated list of client recommendations for a particular bond that needs to be traded, ranked by probability of interest. We show that a model based on Latent Dirichlet Allocation shows promising performance in delivering relevant recommendations for sales traders.

Comment: 12 pages, 3 figures, 1 table

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Simplifying Text Mining Activities: Scalable and Self-Tuning Methodology for Topic Detection and Characterization

Authors: Bartolomeo Vacchetti, Evelina Di Corso, Paolo Bethaz, Stefano Proto, Tania Cerquitelli
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)