Parametric t-Distributed Stochastic Exemplar-centered Embedding
Parametric embedding methods such as parametric t-SNE (pt-SNE) have been
widely adopted for data visualization and out-of-sample data embedding without
further computationally expensive optimization or approximation. However,
pt-SNE is highly sensitive to the batch-size hyper-parameter due to
conflicting optimization goals, and it often produces dramatically different
embeddings for different choices of user-defined perplexity. To effectively
solve these issues, we present parametric t-distributed stochastic
exemplar-centered embedding methods. Our strategy learns embedding parameters
by comparing given data only with precomputed exemplars, resulting in a cost
function with linear computational and memory complexity, which is further
reduced by noise contrastive samples. Moreover, we propose a shallow embedding
network with high-order feature interactions for data visualization, which is
much easier to tune yet performs comparably to the deep
neural network employed by pt-SNE. We empirically demonstrate, using several
benchmark datasets, that our proposed methods significantly outperform pt-SNE
in terms of robustness, visual effects, and quantitative evaluations.
Comment: fixed typo
Found in Alberta: Environmental Themes for the Anthropocene edited by Robert Boschman and Mario Trono
Review of Robert Boschman and Mario Trono’s edited collection Found in Alberta: Environmental Themes for the Anthropocene.
The Horseshoe Estimator: Posterior Concentration around Nearly Black Vectors
We consider the horseshoe estimator due to Carvalho, Polson and Scott (2010)
for the multivariate normal mean model in the situation that the mean vector is
sparse in the nearly black sense. We assume the frequentist framework where the
data is generated according to a fixed mean vector. We show that if the number
of nonzero parameters of the mean vector is known, the horseshoe estimator
attains the minimax risk, possibly up to a multiplicative constant. We
provide conditions under which the horseshoe estimator combined with an
empirical Bayes estimate of the number of nonzero means still yields the
minimax risk. We furthermore prove an upper bound on the rate of contraction of
the posterior distribution around the horseshoe estimator, and a lower bound on
the posterior variance. These bounds indicate that the posterior distribution
of the horseshoe prior may be more informative than that of other one-component
priors, including the Lasso.
Comment: This version differs from the final published version in pagination
and typographical detail; available at
http://projecteuclid.org/euclid.ejs/141813426
Relations between some invariants of algebraic varieties in positive characteristic
We discuss relations between certain invariants of varieties in positive
characteristic, like the a-number and the height of the Artin-Mazur formal
group. We calculate the a-number for Fermat surfaces.
Comment: 13 pages
On Exploring Temporal Graphs of Small Pathwidth
We show that the Temporal Graph Exploration Problem is NP-complete, even when
the underlying graph has pathwidth 2 and at each time step, the current graph
is connected.
Calculating the global contribution of coralline algae to carbon burial
The ongoing increase in anthropogenic carbon dioxide (CO2) emissions is changing the global marine environment and is causing warming and acidification of the oceans. Reduction of CO2 to a sustainable level is required to avoid further marine change. Many studies investigate the potential of marine carbon sinks (e.g. seagrass) to mitigate anthropogenic emissions; however, information on storage by coralline algae and the beds they create is scant. Calcifying photosynthetic organisms, including coralline algae, can act as a CO2 sink via photosynthesis and CaCO3 dissolution and act as a CO2 source during respiration and CaCO3 production on short-term time scales. Long-term carbon storage potential might come from the accumulation of coralline algae deposits over geological time scales. Here, the carbon storage potential of coralline algae is assessed using meta-analysis of their global organic and inorganic carbon production and the processes involved in this metabolism. Organic and inorganic production were estimated at 330 g C m⁻² yr⁻¹ and 880 g CaCO3 m⁻² yr⁻¹, respectively, giving global organic/inorganic C production of 0.7/1.8 × 10⁹ t C yr⁻¹. Calcium carbonate production by free-living/crustose coralline algae (CCA) corresponded to a sediment accretion of 70/450 mm kyr⁻¹. Using this potential carbon storage by coralline algae, the global production of free-living algae/CCA was 0.4/1.2 × 10⁹ t C yr⁻¹, suggesting a total potential carbon sink of 1.6 × 10⁹ t C yr⁻¹. Coralline algae therefore have production rates similar to mangroves, saltmarshes and seagrasses, representing an as yet unquantified but significant carbon store; however, further empirical investigations are needed to determine the dynamics and stability of that store.
Classifying document types to enhance search and recommendations in digital libraries
In this paper, we address the problem of classifying documents available from
the global network of (open access) repositories according to their type. We
show that the metadata provided by repositories enabling us to distinguish
research papers, theses and slides are missing in over 60% of cases. While
these metadata describing document types are useful in a variety of scenarios
ranging from research analytics to improving search and recommender (SR)
systems, this problem has not yet been sufficiently addressed in the context of
the repositories infrastructure. We have developed a new approach for
classifying document types using supervised machine learning based exclusively
on text-specific features. We achieve an F1-score of 0.96 using the random forest and
AdaBoost classifiers, which are the best performing models on our data. By
analysing the SR system logs of the CORE [1] digital library aggregator, we
show that users are an order of magnitude more likely to click on research
papers and theses than on slides. This suggests that using document types as a
feature for ranking/filtering SR results in digital libraries has the potential
to improve user experience.
Comment: 12 pages, 21st International Conference on Theory and Practice of
Digital Libraries (TPDL), 2017, Thessaloniki, Greece