Parametric t-Distributed Stochastic Exemplar-centered Embedding

Author: A Gisbrecht
B Bahmani
L Greengard
L Maaten van der
PP Kuksa
Publication date: 20/04/2018

Parametric embedding methods such as parametric t-SNE (pt-SNE) have been widely adopted for data visualization and out-of-sample data embedding without further computationally expensive optimization or approximation. However, the performance of pt-SNE is highly sensitive to the hyper-parameter batch size due to conflicting optimization goals, and often produces dramatically different embeddings with different choices of user-defined perplexities. To effectively solve these issues, we present parametric t-distributed stochastic exemplar-centered embedding methods. Our strategy learns embedding parameters by comparing given data only with precomputed exemplars, resulting in a cost function with linear computational and memory complexity, which is further reduced by noise contrastive samples. Moreover, we propose a shallow embedding network with high-order feature interactions for data visualization, which is much easier to tune but produces comparable performance in contrast to a deep neural network employed by pt-SNE. We empirically demonstrate, using several benchmark datasets, that our proposed methods significantly outperform pt-SNE in terms of robustness, visual effects, and quantitative evaluations.
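
To illustrate why comparing data only with exemplars gives a cost that grows linearly in the number of points, here is a minimal sketch. It is not the paper's cost function: the function name `exemplar_affinities`, the row-wise normalization, and the toy coordinates are all assumptions made for illustration. It only shows that n embedded points and m exemplars need an n-by-m table of Student-t similarities rather than an n-by-n one.

```python
from typing import List, Sequence


def exemplar_affinities(
    points: Sequence[Sequence[float]],
    exemplars: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Student-t (one degree of freedom) similarity of each point to each exemplar.

    Assumption for this sketch: each row is normalized to sum to 1.
    The result has len(points) rows and len(exemplars) columns, so the
    work is linear in the number of points for a fixed exemplar set.
    """
    rows = []
    for y in points:
        # Heavy-tailed kernel 1 / (1 + squared Euclidean distance), as in t-SNE.
        kernel = [
            1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(y, e)))
            for e in exemplars
        ]
        total = sum(kernel)
        rows.append([k / total for k in kernel])
    return rows


# Hypothetical 2-D embedding: four points, two exemplars.
points = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0), (0.5, -0.5)]
exemplars = [(0.0, 0.0), (4.0, 0.0)]
Q = exemplar_affinities(points, exemplars)
```

Each row of `Q` describes how strongly one point is attached to each exemplar; a point that coincides with an exemplar puts most of its mass on that exemplar.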

arXiv.org e-Print Archive

NRC Publications Archive

Crossref

Evaluating Text-to-Image Matching using Binary Image Selection (BISON)

Author: Hu Hexiang
Misra Ishan
van der Maaten Laurens
Publication date: 05/04/2019

Providing systems the ability to relate linguistic and visual content is one of the hallmarks of computer vision. Tasks such as text-based image retrieval and image captioning were designed to test this ability but come with evaluation measures that have a high variance or are difficult to interpret. We study an alternative task for systems that match text and images: given a text query, the system is asked to select the image that best matches the query from a pair of semantically similar images. The system's accuracy on this Binary Image SelectiON (BISON) task is interpretable, eliminates the reliability problems of retrieval evaluations, and focuses on the system's ability to understand fine-grained visual structure. We gather a BISON dataset that complements the COCO dataset and use it to evaluate modern text-based image retrieval and image captioning systems. Our results provide novel insights into the performance of these systems. The COCO-BISON dataset and corresponding evaluation code are publicly available from http://hexianghu.com/bison/.
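
The binary selection accuracy described in this abstract can be sketched in a few lines. This is a hedged illustration, not the authors' published evaluation code: the function name `bison_accuracy`, the input layout (one pair of matching scores per query), and the example scores are assumptions.

```python
from typing import Sequence, Tuple


def bison_accuracy(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of queries for which the correct image outscores the distractor.

    Each element is (score_of_correct_image, score_of_distractor_image),
    both produced by some text-image matching system for the same query.
    Ties count as errors in this sketch (an assumption).
    """
    if not score_pairs:
        raise ValueError("need at least one scored pair")
    correct = sum(1 for pos, neg in score_pairs if pos > neg)
    return correct / len(score_pairs)


# Hypothetical scores for four queries; the system picks wrongly on the second.
example_scores = [(0.9, 0.4), (0.2, 0.7), (0.8, 0.1), (0.6, 0.5)]
accuracy = bison_accuracy(example_scores)
```

Because each query has exactly two candidates, chance level is 0.5, which is part of what makes the number easy to interpret.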

arXiv.org e-Print Archive

Crossref

Classifying document types to enhance search and recommendations in digital libraries

Author: F Sebastiani
L Maaten van der
Y Aphinyanaphongs
Publication date: 13/07/2017

In this paper, we address the problem of classifying documents available from the global network of (open access) repositories according to their type. We show that the metadata provided by repositories enabling us to distinguish research papers, theses and slides are missing in over 60% of cases. While these metadata describing document types are useful in a variety of scenarios ranging from research analytics to improving search and recommender (SR) systems, this problem has not yet been sufficiently addressed in the context of the repositories infrastructure. We have developed a new approach for classifying document types using supervised machine learning based exclusively on text-specific features. We achieve 0.96 F1-score using the random forest and AdaBoost classifiers, which are the best performing models on our data. By analysing the SR system logs of the CORE [1] digital library aggregator, we show that users are an order of magnitude more likely to click on research papers and theses than on slides. This suggests that using document types as a feature for ranking/filtering SR results in digital libraries has the potential to improve user experience.
Comment: 12 pages, 21st International Conference on Theory and Practise of Digital Libraries (TPDL), 2017, Thessaloniki, Greece

arXiv.org e-Print Archive

Crossref

Intestinal Obstruction in a Dog

Author: Van Der Maaten Martin
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1955

On August 3, 1954, a 6-year-old female Collie was admitted to the Stange Memorial Clinic with a history of having an upset stomach for the past several days. Penicillin had been administered, but no improvement was noticed. The animal was examined and found to be extremely depressed and in a toxic condition. The conjunctiva appeared injected and the temperature was 103°F. A hard mass could be detected upon palpation of the lower abdomen on the left side.

Digital Repository @ Iowa State University (ISU)

High Phenotypic Plasticity, but Low Signals of Local Adaptation to Climate in a Large-Scale Transplant Experiment of Picea abies (L.) Karst. in Europe

Author: Liepe Katharina Julie
Liesebach Mirko
van der Maaten Ernst
van der Maaten-Theunissen Marieke
Publication venue: Frontiers Media
Publication date: 30/05/2024

The most common tools to predict future changes in species range are species distribution models. These models do, however, often underestimate potential future habitat, as they do not account for phenotypic plasticity and local adaptation, although these are the most important processes in the response of tree populations to rapid climate change. Here, we quantify the difference in the predictions of future range for Norway spruce, by (i) deriving a classic, occurrence-based species distribution model (OccurrenceSDM), and (ii) analysing the variation in juvenile tree height and translating this to species occurrence (TraitSDM). Making use of 32 site locations of the most comprehensive European trial series, which includes 1,100 provenances of Norway spruce originating from its natural range and from its largely extended, artificial distribution beyond it, we fit a universal response function to quantify growth as a function of site and provenance climate. Both the OccurrenceSDM and TraitSDM show a substantial retreat towards the northern latitudes and higher elevations (−55 and −43%, respectively, by the 2080s). However, thanks to the species’ particularly high phenotypic plasticity in juvenile height growth, the decline is delayed. The TraitSDM identifies increasing summer heat paired with decreasing water availability as the main climatic variable that restricts growth, while a prolonged frost-free period enables a longer period of active growth and therefore increasing growth potential within the restricted, remaining area. Clear signals of local adaptation to climatic clines spanning the entire range are barely detectable, as they are disguised by a latitudinal cline. This cline strongly reflects population differentiation for the Baltic domain, but fails to capture the high phenotypic variation associated with the geographic heterogeneity in the Central European mountain ranges paired with the species’ history of postglacial migration.
Still, the model is used to provide recommendations of optimal provenance choice for future climate conditions. In essence, assisted migration may not decrease the predicted range decline of Norway spruce, but may help to capitalize on potential opportunities for increased growth associated with warmer climates.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Enhancing Domain Word Embedding via Latent Semantic Imputation

Author: Lai Siwei
Lin Frank
Mikolov Tomas
van der Maaten Laurens
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/05/2019

We present a novel method named Latent Semantic Imputation (LSI) to transfer external knowledge into semantic space for enhancing word embedding. The method integrates graph theory to extract the latent manifold structure of the entities in the affinity space and leverages non-negative least squares with standard simplex constraints and power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and language modeling and demonstrate the superiority of the enhanced embedding via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method.
Comment: ACM SIGKDD 201

arXiv.org e-Print Archive

Crossref

Modeling dominant height growth using permanent plot data for Pinus brutia stands in the Eastern Mediterranean region

Author: Ali Wael
Berger Uta
Suliman Tammam
van der Maaten Ernst
van der Maaten-Theunissen Marieke
Publication venue: Instituto Nacional de Investigacion y Tecnologia Agraria y Alimentaria (INIA)
Publication date: 28/04/2021

Aim of the study: At present, forest management in the Eastern Mediterranean region is largely based on experience rather than on management plans. To support the development of such plans, this study develops and compares site index equations for pure even-aged Pinus brutia stands in Syria using base-age invariant techniques that realistically describe dominant height growth.
Materials and methods: Data on top height and stand age were obtained in 2008 and 2016 from 80 permanent plots capturing the whole range of variation in site conditions, stand age and stand density. Both the Algebraic Difference Approach (ADA) and the Generalized Algebraic Difference Approach (GADA) were used to fit eight generalized algebraic difference equations in order to identify the one which describes the data best. For this, 61 permanent plots were used for model calibration and 19 plots for validation.
Main results: According to both biological plausibility and model accuracy, the so-called Sloboda equation based on the GADA approach showed the best performance.
Research highlights: The study provides a solid classification and comparison of Pinus brutia stands growing in the Eastern Mediterranean region and can thus be used to support sustainable forest management planning.
Keywords: site index; Generalized Algebraic Difference Approach (GADA); Sloboda equation

Scientific Journals of INIA (Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria)