Some considerations in the selection of aircraft for earth resource observations
Comparison of logistics problems and cost aspects in selection of aircraft for earth resources survey
Inference and Evaluation of the Multinomial Mixture Model for Text Clustering
In this article, we investigate the use of a probabilistic model for
unsupervised clustering in text collections. Unsupervised clustering has become
a basic module for many intelligent text processing applications, such as
information retrieval, text classification or information extraction. The model
considered in this contribution consists of a mixture of multinomial
distributions over the word counts, each component corresponding to a different
theme. We present and contrast various estimation procedures, which apply both
in supervised and unsupervised contexts. In supervised learning, this work
suggests a criterion for evaluating the posterior odds of new documents which
is more statistically sound than the "naive Bayes" approach. In an unsupervised
context, we propose measures to set up a systematic evaluation framework and
start with examining the Expectation-Maximization (EM) algorithm as the basic
tool for inference. We discuss the importance of initialization and the
influence of other features such as the smoothing strategy or the size of the
vocabulary, thereby illustrating the difficulties incurred by the high
dimensionality of the parameter space. We also propose a heuristic algorithm
based on iterative EM with vocabulary reduction to solve this problem. Using
the fact that the latent variables can be analytically integrated out, we
finally show that a Gibbs sampling algorithm is tractable and compares favorably
to the basic expectation-maximization approach.
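
As a rough illustration of the EM procedure this abstract describes, the following is a minimal sketch of EM for a mixture of multinomials over word counts. It is not the authors' implementation: the function name, the additive `smoothing` constant, the random soft initialization, and the fixed iteration count are all assumptions made for this example.

```python
import numpy as np

def em_multinomial_mixture(counts, n_components, n_iter=50, smoothing=0.1, seed=0):
    """Fit a mixture of multinomials to a (documents x vocabulary) count matrix.

    Hypothetical helper for illustration; hyperparameters are assumed values.
    """
    rng = np.random.default_rng(seed)
    n_docs, _ = counts.shape
    # Random soft assignments as the starting point; results depend on this choice.
    resp = rng.dirichlet(np.ones(n_components), size=n_docs)
    for _ in range(n_iter):
        # M-step: smoothed mixture weights and per-theme word probabilities.
        weights = (resp.sum(axis=0) + smoothing) / (n_docs + n_components * smoothing)
        word_counts = resp.T @ counts + smoothing
        word_probs = word_counts / word_counts.sum(axis=1, keepdims=True)
        # E-step: posterior over themes for each document, computed in log space.
        log_post = np.log(weights) + counts @ np.log(word_probs).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, word_probs, resp
```

Calling `em_multinomial_mixture(counts, n_components=2)` on a small count matrix returns the mixture weights, the per-theme word distributions, and the soft document assignments. Additive smoothing is used here only to keep the logarithms finite; the abstract notes that the smoothing strategy itself influences the results.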
Detecting Large Concept Extensions for Conceptual Analysis
When performing a conceptual analysis of a concept, philosophers are
interested in all forms of expression of a concept in a text---be it direct or
indirect, explicit or implicit. In this paper, we experiment with topic-based
methods of automating the detection of concept expressions in order to
facilitate philosophical conceptual analysis. We propose six methods based on
LDA, and evaluate them on a new corpus of court decisions that we had annotated
by experts and non-experts. Our results indicate that these methods can yield
important improvements over the keyword heuristic, which is often used as a
concept detection heuristic in many contexts. While more work remains to be
done, this indicates that detecting concepts through topics can serve as a
general-purpose method for at least some forms of concept expression that are
not captured using naive keyword approaches.
Launch Window Analysis for Round Trip Mars Missions
Round trip missions to Mars have been investigated to define representative launch windows and associated ΔV requirements. The 1982 inbound and the 1986 outbound Venus swingby missions were selected for analysis and serve to demonstrate the influence of the characteristics of the heliocentric trajectories on the launch window velocity requirements. This report presents results indicating the effects on the launch windows of velocity capability, transfer technique, and of the inclination, eccentricity, and insertion direction of the orbit. The analysis assumes a circular parking orbit at Earth and considers both circular and elliptical parking orbits at Mars. The use of one-, two-, and three-impulse transfers was investigated. The three-impulse transfer employs an intermediate elliptic orbit of 0.9 eccentricity. For all cases, insertion at planet arrival was into an orbit coplanar with the arrival asymptote, and any required plane change was performed during the planet departure phase.
The minimum ΔV requirement to transfer from a circular parking orbit to a hyperbolic asymptote occurs when the orbits are coplanar and the maneuver is performed at periapsis of the hyperbola. The study indicates that, using a three-impulse transfer, the ΔV penalty for non-coplanar departures is no more than 5-10% above the minimum coplanar requirements. Therefore, use in mission analysis of the coplanar ΔV requirements would not result in large errors if three-impulse transfers are acceptable. Use of fewer impulses significantly increases the error. Similar characteristics occur for elliptical parking orbits. However, due to the low coplanar ΔV's, they provide a longer launch window for a given total ΔV capability.
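
The minimum coplanar case mentioned above can be sketched with the standard two-body relations: the velocity change ΔV is the hyperbolic periapsis speed minus the circular orbit speed at the same radius. The function name, the Earth gravitational parameter, and the sample altitude and hyperbolic excess speed below are assumptions for illustration and are not values from the report.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter in km^3/s^2

def coplanar_departure_delta_v(v_inf_km_s, orbit_radius_km, mu=MU_EARTH_KM3_S2):
    """Single-impulse velocity change (km/s) from a circular parking orbit
    onto a coplanar escape hyperbola, applied at the hyperbola's periapsis."""
    # Speed on the circular parking orbit.
    v_circular = math.sqrt(mu / orbit_radius_km)
    # Speed at periapsis of a hyperbola with excess speed v_inf (energy equation).
    v_periapsis = math.sqrt(v_inf_km_s ** 2 + 2.0 * mu / orbit_radius_km)
    return v_periapsis - v_circular

# Hypothetical example: 300 km altitude parking orbit, 3 km/s hyperbolic excess speed.
example_delta_v = coplanar_departure_delta_v(3.0, 6378.0 + 300.0)
```

For these assumed inputs the result is roughly 3.6 km/s. Non-coplanar departures and multi-impulse transfers, which are the subject of the report, are not modeled in this sketch.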
Spoken query processing for interactive information retrieval
It has long been recognised that interactivity improves the effectiveness of information retrieval systems. Speech is the most natural and interactive medium of communication and recent progress in speech recognition is making it possible to build systems that interact with the user via speech. However, given the typical length of queries submitted to information retrieval systems, it is easy to imagine that the effects of word recognition errors in spoken queries must be severely destructive on the system's effectiveness. The experimental work reported in this paper shows that the use of classical information retrieval techniques for spoken query processing is robust to considerably high levels of word recognition errors, in particular for long queries. Moreover, in the case of short queries, both standard relevance feedback and pseudo relevance feedback can be effectively employed to improve the effectiveness of spoken query processing
Community detection based on links and node features in social networks
© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem. However, most of them are based mainly on either topological structure or node attributes. In this paper, based on SPAEM [1], we propose a joint probabilistic model for community detection that combines node attributes and topological structure. In our model, we create a novel feature-based weighted network in which each edge weight is the node feature similarity between the two nodes at the ends of the edge. We then fuse the original network and the created network with a parameter and employ the expectation-maximization (EM) algorithm to identify communities. Experiments on a diverse set of data, collected from Facebook and Twitter, demonstrate that our algorithm achieves promising results compared with other algorithms.
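
The network fusion step described in this abstract might look like the sketch below. The abstract does not specify the similarity measure or the fusion rule, so cosine similarity and a convex combination controlled by a parameter `alpha` are assumptions made here, as are the function and variable names. The EM-based community identification is not shown.

```python
import numpy as np

def fuse_networks(adjacency, features, alpha=0.5):
    """Blend a 0/1 adjacency matrix with a feature-similarity network
    defined on the same edges. Hypothetical helper for illustration."""
    # Normalize feature vectors, leaving all-zero rows unchanged.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)
    # Cosine similarity for every pair of nodes (assumed similarity measure).
    similarity = unit @ unit.T
    # Keep similarity weights only where an edge exists in the original network.
    feature_net = adjacency * similarity
    # Convex combination of the two networks (assumed fusion rule).
    return alpha * adjacency + (1.0 - alpha) * feature_net
```

With `alpha=1.0` the result is the original topology, and with `alpha=0.0` it is the feature-weighted network alone, so the parameter trades off the two sources of information.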
Word Embeddings for Entity-annotated Texts
Learned vector representations of words are useful tools for many information
retrieval and natural language processing tasks due to their ability to capture
lexical semantics. However, while many such tasks involve or even rely on named
entities as central components, popular word embedding models have so far
failed to include entities as first-class citizens. While it seems intuitive
that annotating named entities in the training corpus should result in more
intelligent word features for downstream tasks, performance issues arise when
popular embedding approaches are naively applied to entity annotated corpora.
Not only are the resulting entity embeddings less useful than expected, but one
also finds that the performance of the non-entity word embeddings degrades in
comparison to those trained on the raw, unannotated corpus. In this paper, we
investigate approaches to jointly train word and entity embeddings on a large
corpus with automatically annotated and linked entities. We discuss two
distinct approaches to the generation of such embeddings, namely the training
of state-of-the-art embeddings on raw-text and annotated versions of the
corpus, as well as node embeddings of a co-occurrence graph representation of
the annotated corpus. We compare the performance of annotated embeddings and
classical word embeddings on a variety of word similarity, analogy, and
clustering evaluation tasks, and investigate their performance in
entity-specific tasks. Our findings show that it takes more than training
popular word embedding models on an annotated corpus to create entity
embeddings with acceptable performance on common test cases. Based on these
results, we discuss how and when node embeddings of the co-occurrence graph
representation of the text can restore the performance.
Comment: This paper is accepted in 41st European Conference on Information
Retrieval
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
We examine a network of learners which address the same classification task
but must learn from different data sets. The learners cannot share data but
instead share their models. Models are shared only one time so as to preserve
the network load. We introduce DELCO (standing for Decentralized Ensemble
Learning with COpulas), a new approach that aggregates the predictions of
the classifiers trained by each learner. The proposed method aggregates the
base classifiers using a probabilistic model relying on Gaussian copulas.
Experiments on logistic regressor ensembles demonstrate competitive accuracy and
increased robustness in case of dependent classifiers. A companion Python
implementation can be downloaded at https://github.com/john-klein/DELC