Parametric t-Distributed Stochastic Exemplar-centered Embedding
Parametric embedding methods such as parametric t-SNE (pt-SNE) have been
widely adopted for data visualization and out-of-sample data embedding without
further computationally expensive optimization or approximation. However, the
pt-SNE is highly sensitive to the batch-size hyper-parameter due
to conflicting optimization goals, and it often produces dramatically different
embeddings with different choices of user-defined perplexities. To effectively
solve these issues, we present parametric t-distributed stochastic
exemplar-centered embedding methods. Our strategy learns embedding parameters
by comparing given data only with precomputed exemplars, resulting in a cost
function with linear computational and memory complexity, which is further
reduced by noise contrastive samples. Moreover, we propose a shallow embedding
network with high-order feature interactions for data visualization, which is
much easier to tune than, yet performs comparably to, the deep
neural network employed by pt-SNE. We empirically demonstrate, using several
benchmark datasets, that our proposed methods significantly outperform pt-SNE
in terms of robustness, visual effects, and quantitative evaluations.
A concept design for an ultra-long-range survey class AUV
Gliders and flight-style Autonomous Underwater Vehicles (AUVs) are used to perform autonomous surveys of large areas of open ocean. Glider missions are characterized by their profiling flight pattern, slow speed, long range (1000s of km) and many-month mission duration. Flight-style AUV missions are faster, of shorter range (100s of km) and multi-day duration. An AUV combining many aspects of both vehicle classes would be of considerable value. This paper investigates the factors that affect the range of traditional flight-style AUVs. A generic range model is outlined which factors in the effects of buoyancy on the range. The model shows that to create a very long range AUV it is necessary to reduce the hotel load on the AUV to the order of 1 W and to add wings to overcome the vehicle's positive buoyancy whilst travelling at the reduced speed required for long range. Using this model, a concept long range AUV is outlined that is capable of travelling up to 5000 km. The practical issues associated with achieving this range are also discussed.
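The link between hotel load, speed and range can be illustrated with a deliberately simplified energy budget. This is a sketch under stated assumptions, not the range model from the paper: it assumes propulsion power grows with the cube of speed and it leaves out the buoyancy and wing effects the abstract mentions. The function names, the coefficient `prop_coeff` and all numeric values are hypothetical.

```python
def range_m(energy_j: float, speed_ms: float, hotel_w: float, prop_coeff: float) -> float:
    """Distance covered on a fixed energy budget (simplified model).

    Assumes total power = hotel load + prop_coeff * speed**3, where the cubic
    term stands in for drag power. Buoyancy effects are NOT modelled here.
    """
    total_power_w = hotel_w + prop_coeff * speed_ms ** 3
    endurance_s = energy_j / total_power_w
    return speed_ms * endurance_s


def best_speed_ms(hotel_w: float, prop_coeff: float) -> float:
    """Speed that maximises range_m under the same assumptions.

    Setting d/dv [v / (H + k v^3)] = 0 gives H = 2 k v^3.
    """
    return (hotel_w / (2.0 * prop_coeff)) ** (1.0 / 3.0)


if __name__ == "__main__":
    energy_j = 1.0e8      # hypothetical battery energy
    prop_coeff = 10.0     # hypothetical, in W per (m/s)^3
    for hotel_w in (20.0, 1.0):
        v = best_speed_ms(hotel_w, prop_coeff)
        r_km = range_m(energy_j, v, hotel_w, prop_coeff) / 1000.0
        print(f"hotel load {hotel_w:5.1f} W: best speed {v:.2f} m/s, range {r_km:.0f} km")
```

In this simplified picture, a lower hotel load both lengthens the achievable range and lowers the speed at which that range is reached, which is consistent with the abstract's point that long range requires a small hotel load and reduced speed.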
Exploring Student Check-In Behavior for Improved Point-of-Interest Prediction
With the availability of vast amounts of user visitation history on
location-based social networks (LBSN), the problem of Point-of-Interest (POI)
prediction has been extensively studied. However, much of the research has been
conducted solely on voluntary checkin datasets collected from social apps such
as Foursquare or Yelp. While these data contain rich information about
recreational activities (e.g., restaurants, nightlife, and entertainment),
information about more prosaic aspects of people's lives is sparse. This not
only limits our understanding of users' daily routines, but more importantly
the modeling assumptions developed based on characteristics of recreation-based
data may not be suitable for richer check-in data. In this work, we present an
analysis of education "check-in" data using WiFi access logs collected at
Purdue University. We propose a heterogeneous graph-based method to encode the
correlations between users, POIs, and activities, and then jointly learn
embeddings for the vertices. We evaluate our method against previous
state-of-the-art POI prediction methods, and show that the assumptions made by
previous methods significantly degrade performance on our data with dense(r)
activity signals. We also show how our learned embeddings could be used to
identify similar students (e.g., for friend suggestions). Comment: published in KDD'1
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Providing systems the ability to relate linguistic and visual content is one
of the hallmarks of computer vision. Tasks such as text-based image retrieval
and image captioning were designed to test this ability but come with
evaluation measures that have a high variance or are difficult to interpret. We
study an alternative task for systems that match text and images: given a text
query, the system is asked to select the image that best matches the query from
a pair of semantically similar images. The system's accuracy on this Binary
Image SelectiON (BISON) task is interpretable, eliminates the reliability
problems of retrieval evaluations, and focuses on the system's ability to
understand fine-grained visual structure. We gather a BISON dataset that
complements the COCO dataset and use it to evaluate modern text-based image
retrieval and image captioning systems. Our results provide novel insights into
the performance of these systems. The COCO-BISON dataset and corresponding
evaluation code are publicly available from http://hexianghu.com/bison/
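The binary selection protocol described above can be sketched in a few lines. This is a minimal illustration, not the released COCO-BISON evaluation code: the tuple layout, the `bison_accuracy` function and the word-overlap scorer are all hypothetical, and short text strings stand in for images so the example stays self-contained.

```python
from typing import Callable, Sequence, Tuple

# (text query, candidate 0, candidate 1, index of the correct candidate)
Example = Tuple[str, str, str, int]


def bison_accuracy(examples: Sequence[Example],
                   score: Callable[[str, str], float]) -> float:
    """Fraction of queries for which the higher-scoring candidate is the correct one."""
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for query, cand_0, cand_1, target in examples:
        # Ties go to the first candidate.
        choice = 0 if score(query, cand_0) >= score(query, cand_1) else 1
        correct += int(choice == target)
    return correct / len(examples)


def toy_score(query: str, image_description: str) -> float:
    """Hypothetical matcher: counts words shared by the query and a text stand-in for the image."""
    return float(len(set(query.split()) & set(image_description.split())))


examples = [
    ("a dog on a beach", "dog running on beach", "dog sleeping on sofa", 0),
    ("two cats on a sofa", "one cat on a chair", "two cats on a sofa", 1),
]
print(bison_accuracy(examples, toy_score))
```

Because each query has exactly two candidates, a scorer that cannot tell them apart lands near 50% accuracy, which is what makes the number easy to interpret.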
Classifying document types to enhance search and recommendations in digital libraries
In this paper, we address the problem of classifying documents available from
the global network of (open access) repositories according to their type. We
show that the metadata provided by repositories that would let us distinguish
research papers, theses and slides are missing in over 60% of cases. While
these metadata describing document types are useful in a variety of scenarios
ranging from research analytics to improving search and recommender (SR)
systems, this problem has not yet been sufficiently addressed in the context of
the repositories infrastructure. We have developed a new approach for
classifying document types using supervised machine learning based exclusively
on text-specific features. We achieve an F1-score of 0.96 using the random forest and
AdaBoost classifiers, which are the best performing models on our data. By
analysing the SR system logs of the CORE [1] digital library aggregator, we
show that users are an order of magnitude more likely to click on research
papers and theses than on slides. This suggests that using document types as a
feature for ranking/filtering SR results in digital libraries has the potential
to improve user experience. Comment: 12 pages, 21st International Conference on Theory and Practice of
Digital Libraries (TPDL), 2017, Thessaloniki, Greec
A Concede-and-Divide Rule for Bankruptcy Problems
The concede-and-divide rule is a basic solution for bankruptcy problems with two claimants. An extension of the concede-and-divide rule to bankruptcy problems with more than two claimants is provided. This extension not only uses the concede-and-divide principle in its procedural definition, but also preserves the main properties of the concede-and-divide rule. Keywords: Bankruptcy problems; concede-and-divide rule
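For readers unfamiliar with the two-claimant rule, the following sketch shows the standard concede-and-divide computation: each claimant first receives whatever the other concedes (the part of the estate the other does not claim), and the contested remainder is split equally. It covers only the classical two-claimant case; the paper's extension to more claimants is not reproduced here, and the function name is hypothetical.

```python
from typing import Tuple


def concede_and_divide(estate: float, claim_1: float, claim_2: float) -> Tuple[float, float]:
    """Two-claimant concede-and-divide awards for a bankruptcy problem."""
    if claim_1 < 0 or claim_2 < 0:
        raise ValueError("claims must be non-negative")
    if not 0 <= estate <= claim_1 + claim_2:
        raise ValueError("estate must lie between 0 and the sum of the claims")
    # Amount each claimant is conceded: the part of the estate the other does not claim.
    conceded_to_1 = max(estate - claim_2, 0.0)
    conceded_to_2 = max(estate - claim_1, 0.0)
    # What remains is claimed by both and is divided equally.
    contested = estate - conceded_to_1 - conceded_to_2
    return conceded_to_1 + contested / 2, conceded_to_2 + contested / 2


print(concede_and_divide(100, 60, 80))  # (40.0, 60.0)
```

With an estate of 100 and claims of 60 and 80, the first claimant is conceded 20, the second 40, and the contested 40 is split evenly, giving awards of 40 and 60.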
Enhancing Domain Word Embedding via Latent Semantic Imputation
We present a novel method named Latent Semantic Imputation (LSI) to transfer
external knowledge into semantic space for enhancing word embedding. The method
integrates graph theory to extract the latent manifold structure of the
entities in the affinity space and leverages non-negative least squares with
standard simplex constraints and the power iteration method to derive spectral
embeddings. It provides an effective and efficient approach to combining entity
representations defined in different Euclidean spaces. Specifically, our
approach generates and imputes reliable embedding vectors for low-frequency
words in the semantic space and benefits downstream language tasks that depend
on word embedding. We conduct comprehensive experiments on a carefully designed
classification problem and language modeling and demonstrate the superiority of
the enhanced embedding via LSI over several well-known benchmark embeddings. We
also confirm the consistency of the results under different parameter settings
of our method. Comment: ACM SIGKDD 201