983 research outputs found

Some considerations in the selection of aircraft for earth resource observations

Author: Arno R. D.
Deerwester J. M.

Comparison of logistics problems and cost aspects in selection of aircraft for earth resources survey

NASA Technical Reports Server

Inference and Evaluation of the Multinomial Mixture Model for Text Clustering

Author: Banerjee
Church
Deerwester
François Yvon
Halkidi
Hofmann
Jain
Katz
Kuhn
Lange
Loïs Rigouste
Mosimann
Nigam
Olivier Cappé
Robert
Sebastiani
Shahnaz
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

In this article, we investigate the use of a probabilistic model for unsupervised clustering in text collections. Unsupervised clustering has become a basic module for many intelligent text processing applications, such as information retrieval, text classification or information extraction. The model considered in this contribution consists of a mixture of multinomial distributions over the word counts, each component corresponding to a different theme. We present and contrast various estimation procedures, which apply both in supervised and unsupervised contexts. In supervised learning, this work suggests a criterion for evaluating the posterior odds of new documents which is more statistically sound than the "naive Bayes" approach. In an unsupervised context, we propose measures to set up a systematic evaluation framework and start by examining the Expectation-Maximization (EM) algorithm as the basic tool for inference. We discuss the importance of initialization and the influence of other features such as the smoothing strategy or the size of the vocabulary, thereby illustrating the difficulties incurred by the high dimensionality of the parameter space. We also propose a heuristic algorithm based on iterative EM with vocabulary reduction to solve this problem. Using the fact that the latent variables can be analytically integrated out, we finally show that the Gibbs sampling algorithm is tractable and compares favorably to the basic expectation-maximization approach.
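The model in this abstract (a mixture of multinomials over word counts, fitted by EM) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `em_multinomial_mixture`, the random soft initialization, and the additive smoothing constant `alpha` are all assumptions chosen for illustration, and the vocabulary-reduction heuristic and Gibbs sampler mentioned in the abstract are not shown.

```python
import math
import random


def em_multinomial_mixture(counts, k, n_iter=50, alpha=0.1, seed=0):
    """Fit a k-component mixture of multinomials to word-count rows by EM.

    counts: list of documents, each a list of word counts over a fixed vocabulary.
    alpha:  additive smoothing used in the M-step (an illustrative assumption).
    Returns (weights, word_probs, responsibilities).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random soft initialization of the responsibilities (hypothetical choice;
    # the abstract stresses that results depend on initialization).
    resp = []
    for _ in range(n_docs):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    weights, word_probs = [], []
    for _ in range(n_iter):
        # M-step: mixture weights (floored to avoid log(0)) and smoothed
        # per-component word probabilities.
        weights = [max(sum(resp[d][c] for d in range(n_docs)) / n_docs, 1e-12)
                   for c in range(k)]
        word_probs = []
        for c in range(k):
            raw = [alpha + sum(resp[d][c] * counts[d][w] for d in range(n_docs))
                   for w in range(n_words)]
            total = sum(raw)
            word_probs.append([x / total for x in raw])

        # E-step: posterior probability of each component for each document,
        # computed in log space for numerical stability.
        for d in range(n_docs):
            logp = [math.log(weights[c])
                    + sum(counts[d][w] * math.log(word_probs[c][w])
                          for w in range(n_words))
                    for c in range(k)]
            top = max(logp)
            exps = [math.exp(lp - top) for lp in logp]
            total = sum(exps)
            resp[d] = [e / total for e in exps]

    return weights, word_probs, resp
```

Calling `em_multinomial_mixture(docs, k=2)` on a small list of count vectors returns the mixture weights, the per-theme word distributions, and each document's soft assignment. Changing `seed` or `alpha` is a simple way to observe the sensitivity to initialization and smoothing that the abstract discusses.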

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL Descartes

Detecting Large Concept Extensions for Conceptual Analysis

Author: C Dutilh Novaes
DJ Chalmers
DM Blei
F Jackson
KL Gwet
S Deerwester
S Haslanger
S Laurence
TL Griffiths
U Fayyad
Publication date: 18/06/2017

When performing a conceptual analysis of a concept, philosophers are interested in all forms of expression of a concept in a text, be it direct or indirect, explicit or implicit. In this paper, we experiment with topic-based methods of automating the detection of concept expressions in order to facilitate philosophical conceptual analysis. We propose six methods based on LDA, and evaluate them on a new corpus of court decisions that we had annotated by experts and non-experts. Our results indicate that these methods can yield important improvements over the keyword heuristic, which is often used as a concept detection heuristic in many contexts. While more work remains to be done, this indicates that detecting concepts through topics can serve as a general-purpose method for at least some forms of concept expression that are not captured using naive keyword approaches.

arXiv.org e-Print Archive

Crossref

Launch Window Analysis for Round Trip Mars Missions

Author: Deerwester Jerry M.
Manning Larry A.
Swenson Byron L.
Publication venue: Scholarly Commons
Publication date: 01/04/1968

Round trip missions to Mars have been investigated to define representative launch windows and associated ΔV requirements. The 1982 inbound and the 1986 outbound Venus swingby missions were selected for analysis and serve to demonstrate the influence of the characteristics of the heliocentric trajectories on the launch window velocity requirements. This report presents results indicating the effects on the launch windows of velocity capability, transfer technique, and of the inclination, eccentricity, and insertion direction of the orbit. The analysis assumes a circular parking orbit at Earth and considers both circular and elliptical parking orbits at Mars. One-, two-, and three-impulse transfers were investigated. The three-impulse transfer employs an intermediate elliptic orbit of 0.9 eccentricity. For all cases, insertion at planet arrival was into an orbit coplanar with the arrival asymptote, and any required plane change was performed during the planet departure phase. The minimum ΔV requirement to transfer from a circular parking orbit to a hyperbolic asymptote occurs when the orbits are coplanar and the maneuver is performed at periapsis of the hyperbola. The study indicates that, using a three-impulse transfer, the ΔV penalty for non-coplanar departures is no more than 5-10% above the minimum coplanar requirements. Therefore, use of the coplanar ΔV requirements in mission analysis would not result in large errors if three-impulse transfers are acceptable. Use of fewer impulses significantly increases the error. Similar characteristics occur for elliptical parking orbits. However, due to the low coplanar ΔV's, they provide a longer launch window for a given total ΔV capability.
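The coplanar minimum mentioned in this abstract, a single impulse at periapsis of the departure hyperbola from a circular parking orbit, follows from the standard vis-viva relation. The sketch below is not taken from the report: the function name `coplanar_departure_dv` and the sample orbit radius and hyperbolic excess speed are assumptions for illustration, and it does not model the non-coplanar or multi-impulse cases the study analyzes.

```python
import math

# Earth's gravitational parameter in km^3/s^2 (standard reference value).
MU_EARTH_KM3_S2 = 398600.4418


def coplanar_departure_dv(v_inf_km_s, orbit_radius_km, mu=MU_EARTH_KM3_S2):
    """Single-impulse velocity change from a circular orbit to a coplanar hyperbola.

    v_inf_km_s:      hyperbolic excess speed on the departure asymptote (km/s).
    orbit_radius_km: radius of the circular parking orbit, which is also the
                     periapsis radius of the hyperbola (km).
    """
    v_circular = math.sqrt(mu / orbit_radius_km)
    # Vis-viva at periapsis of the hyperbola: v^2 = v_inf^2 + 2*mu/r.
    v_periapsis = math.sqrt(v_inf_km_s ** 2 + 2.0 * mu / orbit_radius_km)
    return v_periapsis - v_circular


# Hypothetical example values: a 300 km altitude parking orbit and 3 km/s excess speed.
example_dv = coplanar_departure_dv(v_inf_km_s=3.0, orbit_radius_km=6678.0)
```

With zero excess speed the expression reduces to the familiar escape case, (√2 − 1) times the circular speed, which is a quick sanity check on the formula.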

NASA Technical Reports Server

Embry-Riddle Aeronautical University

Spoken query processing for interactive information retrieval

Author: Barnett
Crestani
Deerwester
Fabio Crestani
Garofolo
Harman
Markowitz
Porter
Silipo
Singhal
Tombros
van Rijsbergen
Voorhees
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002

It has long been recognised that interactivity improves the effectiveness of information retrieval systems. Speech is the most natural and interactive medium of communication and recent progress in speech recognition is making it possible to build systems that interact with the user via speech. However, given the typical length of queries submitted to information retrieval systems, it is easy to imagine that the effects of word recognition errors in spoken queries must be severely destructive on the system's effectiveness. The experimental work reported in this paper shows that the use of classical information retrieval techniques for spoken query processing is robust to considerably high levels of word recognition errors, in particular for long queries. Moreover, in the case of short queries, both standard relevance feedback and pseudo relevance feedback can be effectively employed to improve the effectiveness of spoken query processing.

Crossref

University of Strathclyde Institutional Repository

Evaluation of Co-occurring Terms in Clinical Documents Using Latent Semantic Indexing

Author: Blei
Deerwester
Dumais
Hofmann
Landauer
Manning
Steyvers
Wang
Publication venue: Korean Society of Medical Informatics
Publication date: 01/01/2011

Crossref

PubMed Central

Community detection based on links and node features in social networks

Author: A. Pothen
D.M. Blei
J. Xie
J.M. Kleinberg
M. Girvan
S. Fortunato
S.C. Deerwester
Publication date: 01/01/2015

Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem. However, most of them are mainly based on either the topological structure or node attributes. In this paper, based on SPAEM [1], we propose a joint probabilistic model for community detection which combines node attributes and topological structure. In our model, we create a novel feature-based weighted network, within which each edge weight is given by the feature similarity between the two nodes at the ends of the edge. Then we fuse the original network and the created network with a parameter and employ the expectation-maximization (EM) algorithm to identify communities. Experiments on a diverse set of data, collected from Facebook and Twitter, demonstrate that our algorithm achieves promising results compared with other algorithms.
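The network-fusion step in this abstract can be pictured with a small sketch. The abstract does not say which similarity measure or fusion rule the authors use, so the cosine similarity, the convex combination controlled by `lam`, and the names `cosine` and `fuse_networks` below are all assumptions for illustration. The EM-based community assignment itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse_networks(edges, features, lam=0.5):
    """Combine an unweighted network with a feature-similarity network.

    edges:    iterable of (i, j) node pairs from the original network.
    features: mapping from node id to feature vector.
    lam:      hypothetical fusion parameter in [0, 1]; 1.0 keeps only the
              topology, 0.0 keeps only the feature similarity.
    Returns a dict mapping each edge to its fused weight.
    """
    fused = {}
    for i, j in edges:
        similarity = cosine(features[i], features[j])
        # Original edges carry weight 1.0; blend with the feature-based weight.
        fused[(i, j)] = lam * 1.0 + (1.0 - lam) * similarity
    return fused
```

For example, with `lam=0.5` an edge between two nodes with identical features keeps weight 1.0, while an edge between nodes with orthogonal features drops to 0.5.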

Crossref

OPUS - University of Technology Sydney

Word Embeddings for Entity-annotated Texts

Author: A Das
A Spitz
CD Manning
D Nadeau
E Bruni
F Hill
H Abdi
H Rubenstein
J Mitchell
J Strötgen
JG Moreno
L Maaten
P Bojanowski
P Goyal
S Deerwester
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/02/2020

Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance. Comment: This paper is accepted in the 41st European Conference on Information Retrieval.

arXiv.org e-Print Archive

Crossref

Indexing by latent semantic analysis

Author: George W. Furnas
Richard Harshman
Scott Deerwester
Susan T. Dumais
Thomas K. Landauer
Publication venue: 'Wiley'
Publication date: 01/01/2004

Crossref

Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles

Author: AP Dawid
C Genest
DH Wolpert
ED Sontag
F Pedregosa
GB Giannakis
I Zezula
J Kittler
L Breiman
L Xu
LK Hansen
M Wozniak
OP Faugeras
S Deerwester
TK Ho
V Tresp
Y Freund
Y Koren
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/03/2019

We examine a network of learners which address the same classification task but must learn from different data sets. The learners cannot share data but instead share their models. Models are shared only once so as to limit the network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach for aggregating the predictions of the classifiers trained by each learner. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regressor ensembles demonstrate competitive accuracy and increased robustness in case of dependent classifiers. A companion Python implementation can be downloaded at https://github.com/john-klein/DELC

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

UCL Discovery

Hal-Diderot