
635 research outputs found

Exploratory Analysis of Highly Heterogeneous Document Collections

Author: Blei D. M.
Bun K. K.
Maiya A. S.
Manning C. D.
Mihalcea R.
Pecina P.
Ranganathan S. R.
Wagstaff K.
Publication date: 01/01/2013

We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Minin

arXiv.org e-Print Archive

CiteSeerX

Crossref

On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities

Author: A Blake
D Blei
DG Lowe
J Lambert
K Koshikawa
M Saito
S Hinterstoisser
TL Heath
W Chiu
Publication date: 03/06/2016

Current object recognition methods fail on object sets that include diffuse, reflective and transparent materials, although such objects are very common in domestic scenarios. We show that a combination of cues from multiple sensor modalities, including specular reflectance and unavailable depth information, allows us to capture a larger subset of household objects by extending a state-of-the-art object recognition method. This leads to a significant increase in robustness of recognition over a larger set of commonly used objects. Comment: 12 page

arXiv.org e-Print Archive

Crossref

pub H-BRS - Publikationsserver der Hochschule Bonn-Rhein-Sieg

Statistical Mechanics of the Chinese Restaurant Process: lack of self-averaging, anomalous finite-size effects and condensation

Author: A. E. Scheidegger
Bruno Bassetti
D. M. Blei
G. K. Zipf
Ginestra Bianconi
H. A. Simon
H. Yamato
J. Pitman
Marco Cosentino Lagomarsino
Mina Zarei
S. N. Dorogovtsev
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2009

The Pitman-Yor, or Chinese Restaurant Process, is a stochastic process that generates distributions following a power law with exponents lower than two, as found in numerous physical, biological, technological and social systems. We discuss its rich behavior with the tools and viewpoint of statistical mechanics. We show that this process invariably gives rise to a condensation, i.e. a distribution dominated by a finite number of classes. We also evaluate thoroughly the finite-size effects, finding that the lack of stationary state and self-averaging of the process creates realization-dependent cutoffs and behavior of the distributions with no equivalent in other statistical mechanical models. Comment: (5pages, 1 figure

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

Infinite factorization of multiple non-parametric views

Author: A. Gelman
A. Klami
A. Rodriguez
A. Vinokourov
Arto Klami
C. Archambeau
C. Rasmussen
D. Blackwell
D. Blei
D. Cohn
D. Lee
D. M. Blei
D. M. Roy
G. Englebienne
I. Rivals
I. S. Dhillon
Janne Sinkkonen
K. Barnard
M. Welling
Mark Girolami
N. Friedman
N. L. Johnson
R. M. Neal
S. Becker
S. Rogers
Samuel Kaski
Simon Rogers
T. Hofmann
Y. W. Teh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative and non-parametric clustering setting, by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. Cluster analysis of co-expression is a standard simple way of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation

CUED - Cambridge University Engineering Department

Meaning-focused and Quantum-inspired Information Retrieval

Author: AY Khrennikov
D Aerts
D Osherson
D Widdows
D. Aerts
DM Blei
EM Pothos
G Zuccon
JA Hampton
JR Busemeyer
K Lund
K Rijsbergen Van
M Melucci
S Deerwester
S Dumais
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/03/2013

In recent years, quantum-based methods have promisingly integrated the traditional procedures in information retrieval (IR) and natural language processing (NLP). Inspired by our research on the identification and application of quantum structures in cognition, more specifically our work on the representation of concepts and their combinations, we put forward a 'quantum meaning based' framework for structured query retrieval in text corpora and standardized testing corpora. This scheme for IR rests on considering as basic notions (i) 'entities of meaning', e.g., concepts and their combinations, and (ii) traces of such entities of meaning, which is how documents are considered in this approach. The meaning content of these 'entities of meaning' is reconstructed by solving an 'inverse problem' in the quantum formalism, consisting of reconstructing the full states of the entities of meaning from their collapsed states identified as traces in relevant documents. The advantages with respect to traditional approaches, such as Latent Semantic Analysis (LSA), are discussed by means of concrete examples. Comment: 11 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Using Social Media to Promote STEM Education: Matching College Students with Role Models

Author: A Bandura
AB Heldman
C Merrill
CJ Owen
D Karunanayake
DM Blei
EA Ensher
G Salton
HV Emmerik
JE Lydon
K Weber
L Tsui
M Hall
P Jaccard
P Lockwood
S Metz
TA Judge
VI Levenshtein
Publication date: 01/07/2016

STEM (Science, Technology, Engineering, and Mathematics) fields have become increasingly central to U.S. economic competitiveness and growth. The shortage in the STEM workforce has brought the promotion of STEM education to the forefront. The rapid growth of social media usage provides a unique opportunity to predict users' real-life identities and interests from online texts and photos. In this paper, we propose an innovative approach that leverages social media to promote STEM education: matching Twitter college student users with diverse LinkedIn STEM professionals using a ranking algorithm based on the similarities of their demographics and interests. We share the belief that increasing STEM presence in the form of introducing career role models who share similar interests and demographics will inspire students to develop interests in STEM-related fields and emulate their models. Our evaluation on 2,000 real college students demonstrated the accuracy of our ranking algorithm. We also design a novel implementation that recommends matched role models to the students. Comment: 16 pages, 8 figures, accepted by ECML/PKDD 2016, Industrial Trac

arXiv.org e-Print Archive

Crossref

Quantum Aspects of Semantic Analysis and Symbolic Artificial Intelligence

Author: Aerts D
Aerts D Czachor M
Bell J S
Bennett C H Brassard G
Bettelli S
Blei D M
Bush P
Deerwester S
Diederik Aerts
Griffiths T L
Hampton J
Hofmann T
Landauer T K
Lund K
Marek Czachor
Oemer B
Penrose R
Plate T A
Selinger P
Steyvers M Shiffrin R M Nelson D L
Widdows D Peters S R T Oehrle J Rogers
Publication venue: 'IOP Publishing'
Publication date: 19/02/2004

Modern approaches to semantic analysis, if reformulated as Hilbert-space problems, reveal formal structures known from quantum mechanics. A similar situation is found in distributed representations of cognitive structures developed for the purposes of neural networks. We take a closer look at similarities and differences between the above two fields and quantum information theory. Comment: version accepted in J. Phys. A (Letter to the Editor

arXiv.org e-Print Archive

Crossref

Seeing Tree Structure from Vibration

Author: A French
AD Jepson
B Bascle
B Zhou
DC Knill
DJ Fleet
DM Blei
E Türetken
ES Spelke
EW Dijkstra
HY Wu
J Canny
JF Henriques
JR Moore
K-K Maninis
KD Murphy
KR James
LA Miller
MM Fraz
MN Davies
N Wiener
O Braddick
R Moreno-Bote
S Geman
S Hare
SJ Farlow
SJ Gershman
T Furoh
T Gautama
T Xue
TC Lee
TS Lee
Publication date: 13/09/2018

Humans recognize object structure from both their appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe object appearance only. There are particular scenarios, however, where neither appearance nor spatial-temporal motion signals are informative: occluding twigs may look connected and have almost identical movements, though they belong to different, possibly disconnected branches. We propose to tackle this problem through spectrum analysis of motion signals, because vibrations of disconnected branches, though visually similar, often have distinctive natural frequencies. We propose a novel formulation of tree structure based on a physics-based link model, and validate its effectiveness by theoretical analysis, numerical simulation, and empirical experiments. With this formulation, we use nonparametric Bayesian inference to reconstruct tree structure from both spectral vibration signals and appearance cues. Our model performs well in recognizing hierarchical tree structure from real-world videos of trees and vessels. Comment: ECCV 2018. The first two authors contributed equally to this work. Project page: http://tree.csail.mit.edu

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Topic modeling applied to business research: A latent dirichlet allocation (LDA)-based classification for organization studies

Author: Carlo Schwarz
D Carnerud
D Voss
DM Blei
JA Mazanec
JJ Ryoo
JW Dean
K Martell
NT Bendle
P Puranam
S Al-Augby
T Dyer
Tobias Brandt
VM Papadakis
Y Bao
Publication venue: 'Baishideng Publishing Group Inc.'
Publication date: 01/01/2019

More than 1.5 million academic documents are published each year, and this number is expected to keep growing in the coming years. One of the main challenges for the academic community is how to organize this huge volume of documentation to get a sense of the knowledge frontier. In this study we applied Latent Dirichlet Allocation (LDA) techniques to identify primary topics in organization studies, and analyzed the relationships between academic impact and belonging to the topics detected by LDA.

Crossref

Repositorio Institucional Ulima

Does ‘bigger’ mean ‘better’? Pitfalls and shortcuts associated with big data for social research

Author: A Abbott
A McAfee
A Pickering
B Franks
B Pang
B Wellman
C Fuchs
C Snijders
D Beer
D Lazer
DL Morgan
DM Blei
ES Lieberman
F Neuhaus
G Lotan
H Geser
J Ginsberg
JW Crampton
K Hampton
M Castells
M Hilbert
N Baym
N Lin
NB Ellison
O Schwarz
P Giardullo
P Magaudda
P Zikopoulos
Paolo Giardullo
R Rogers
S Crabu
S Elwood
S González-Bailón
T Venturini
WE Bijker
WS Bainbridge
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

‘Big data is here to stay.’ This key statement has a double value: it is an assumption as well as the reason why a theoretical reflection is needed. Furthermore, Big data is gaining visibility and success in the social sciences, even overcoming the division between humanities and computer sciences. In this contribution some considerations on the presence and the certain persistence of Big data as a socio-technical assemblage will be outlined. Then, the intriguing opportunities for social research linked to such interaction between practices and technological development will be developed. However, despite a promissory rhetoric, fostered by several scholars since the birth of Big data as a labelled concept, some risks are just around the corner. The claims for the methodological power of bigger and bigger datasets, as well as increasing speed in analysis and data collection, are creating a real hype in social research. Particular attention is needed in order to avoid some pitfalls. These risks will be analysed with regard to the validity of the research results obtained through Big data. After a pars destruens, this contribution will conclude with a pars construens; assuming the previous critiques, a mixed methods research design approach will be described as a general proposal with the objective of stimulating a debate on the integration of Big data in complex research projecting

Crossref

Archivio istituzionale della ricerca - Università di Padova