Kernel functions based on triplet comparisons
Given only information in the form of similarity triplets "Object A is more
similar to object B than to object C" about a data set, we propose two ways of
defining a kernel function on the data set. While previous approaches construct
a low-dimensional Euclidean embedding of the data set that reflects the given
similarity triplets, we aim at defining kernel functions that correspond to
high-dimensional embeddings. These kernel functions can subsequently be used to
apply any kernel method to the data set.
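
The abstract does not say how the two kernels are defined, so the following is only a minimal sketch of one plausible way to turn triplets into a kernel; it should not be read as the authors' construction. It assumes each object `a` is represented by a vector indexed by unordered pairs `{b, c}`, with entry +1 if "a is more similar to b than to c" was observed, -1 if the reverse was observed, and 0 if nothing is known. The kernel value is the cosine similarity of two such vectors. The function names `triplet_features` and `triplet_kernel` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def triplet_features(n, triplets):
    """Build one feature vector per object from similarity triplets.

    triplets: iterable of (a, b, c), read as
    "object a is more similar to object b than to object c".
    Objects are assumed to be numbered 0..n-1.
    """
    # One coordinate per unordered pair (b, c) with b < c.
    pair_index = {pair: i for i, pair in enumerate(combinations(range(n), 2))}
    feats = [[0.0] * len(pair_index) for _ in range(n)]
    for a, b, c in triplets:
        if b == c:
            continue  # a degenerate triplet carries no information
        if b < c:
            feats[a][pair_index[(b, c)]] = 1.0
        else:
            feats[a][pair_index[(c, b)]] = -1.0
    return feats


def triplet_kernel(n, triplets):
    """Return the n x n matrix of cosine similarities between feature vectors.

    Objects that never appear as the anchor of a triplet have a zero
    vector; their kernel values are set to 0.
    """
    feats = triplet_features(n, triplets)
    norms = [sqrt(sum(v * v for v in f)) for f in feats]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(x * y for x, y in zip(feats[i], feats[j]))
                K[i][j] = dot / (norms[i] * norms[j])
    return K


# Objects 0 and 3 give opposite answers about the pair (1, 2).
K = triplet_kernel(4, [(0, 1, 2), (3, 2, 1)])
```

Because the matrix is a Gram matrix of (normalized) feature vectors, it is positive semi-definite and can be passed to any method that accepts a precomputed kernel.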
Validation of Purdue Engineering Shape Benchmark clusters by crowdsourcing
The effective organization of CAD data archives is central to PLM, and consequently content-based retrieval of 2D drawings and 3D models is often seen as a "holy grail" for the industry. Given this context, it is not surprising that the vision of a "Google for shape", which enables engineers to search databases of 3D models for components similar in shape to a query part, has motivated numerous researchers to investigate algorithms for computing geometric similarity. Measuring the effectiveness of the many approaches proposed has in turn led to the creation of benchmark datasets against which researchers can compare the performance of their search engines. However, to be useful, the datasets used to measure the effectiveness of 3D retrieval algorithms must not only define a collection of models, but also provide a canonical specification of their relative similarity. Because the objective of shape retrieval algorithms is (typically) to retrieve groups of objects that humans perceive as "similar", these benchmark similarity relationships have (by definition) to be manually determined through inspection.
Patterns of Oral Microbiota Diversity in Adults and Children: A Crowdsourced Population Study.
Oral microbiome dysbiosis has been associated with various local and systemic human diseases such as dental caries, periodontal disease, obesity, and cardiovascular disease. Bacterial composition may be affected by age, oral health, diet, and geography, although information about the natural variation found in the general public is still lacking. In this study, citizen-scientists used a crowdsourcing model to obtain oral bacterial composition data from guests at the Denver Museum of Nature & Science to determine if previously suspected oral microbiome associations with an individual's demographics, lifestyle, and/or genetics are robust and generalizable enough to be detected within a general population. Consistent with past research, we found bacterial composition to be more diverse in youth microbiomes than in adult microbiomes. Adult oral microbiomes were predominantly impacted by oral health habits, while youth microbiomes were impacted by biological sex and weight status. The oral pathogen Treponema was detected more commonly in adults without recent dentist visits and in obese youth. Additionally, oral microbiomes from participants of the same family were more similar to each other than to oral microbiomes from non-related individuals. These results suggest that previously reported oral microbiome associations are observable in a human population containing the natural variation commonly found in the general public. Furthermore, these results support the use of crowdsourced data as a valid methodology to obtain community-based microbiome data.
Context Embedding Networks
Low dimensional embeddings that capture the main variations of interest in
collections of data are important for many applications. One way to construct
these embeddings is to acquire estimates of similarity from the crowd. However,
similarity is a multi-dimensional concept that varies from individual to
individual. Existing models for learning embeddings from the crowd typically
make simplifying assumptions, such as that all individuals estimate similarity
using the same criteria, that the list of criteria is known in advance, or that
the crowd workers are not influenced by the data that they see. To overcome these
limitations we introduce Context Embedding Networks (CENs). In addition to
learning interpretable embeddings from images, CENs also model worker biases
for different attributes along with the visual context i.e. the visual
attributes highlighted by a set of images. Experiments on two noisy crowd
annotated datasets show that modeling both worker bias and visual context
results in more interpretable embeddings compared to existing approaches.
Comment: CVPR 2018 spotlight
A Similarity Measure for Material Appearance
We present a model to measure the similarity in appearance between different
materials, which correlates with human similarity judgments. We first create a
database of 9,000 rendered images depicting objects with varying materials,
shape and illumination. We then gather data on perceived similarity from
crowdsourced experiments; our analysis of over 114,840 answers suggests that
indeed a shared perception of appearance similarity exists. We feed this data
to a deep learning architecture with a novel loss function, which learns a
feature space for materials that correlates with such perceived appearance
similarity. Our evaluation shows that our model outperforms existing metrics.
Last, we demonstrate several applications enabled by our metric, including
appearance-based search for material suggestions, database visualization,
clustering and summarization, and gamut mapping.
Comment: 12 pages, 17 figures
Improving Entity Retrieval on Structured Data
The increasing amount of data on the Web, in particular of Linked Data, has
led to a diverse landscape of datasets, which make entity retrieval a
challenging task. Explicit cross-dataset links, for instance to indicate
co-references or related entities can significantly improve entity retrieval.
However, only a small fraction of entities are interlinked through explicit
statements. In this paper, we propose a two-fold entity retrieval approach. In
a first, offline preprocessing step, we cluster entities based on the
x-means and spectral clustering algorithms. In the second step,
we propose an optimized retrieval model which takes advantage of our
precomputed clusters. For a given set of entities retrieved by the BM25F
retrieval approach and a given user query, we further expand the result set
with relevant entities by considering features of the queries, entities and the
precomputed clusters. Finally, we re-rank the expanded result set with respect
to the relevance to the query. We perform a thorough experimental evaluation on
the Billions Triple Challenge (BTC12) dataset. The proposed approach shows
significant improvements compared to the baseline and state-of-the-art
approaches.
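
The expand-then-re-rank idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' retrieval model: the abstract does not specify the features or the ranking function, so the sketch assumes a ranked list of entity ids from a first-stage retriever (such as BM25F), a precomputed mapping from entity to cluster id, and a caller-supplied relevance function. The names `expand_and_rerank`, `cluster_of`, and `score` are hypothetical.

```python
def expand_and_rerank(baseline, cluster_of, score, top_k=10):
    """Expand a first-stage result list with cluster neighbours, then re-rank.

    baseline:   ranked list of entity ids from the first-stage retriever.
    cluster_of: dict mapping entity id -> cluster id (computed offline).
    score:      callable mapping entity id -> relevance to the query
                (a stand-in for whatever ranking model is used).
    """
    # Invert the mapping once: cluster id -> list of member entities.
    members = {}
    for entity, cluster in cluster_of.items():
        members.setdefault(cluster, []).append(entity)

    # Start from the baseline and add unseen entities sharing a cluster.
    candidates = list(baseline)
    seen = set(baseline)
    for entity in baseline:
        for other in members.get(cluster_of.get(entity), []):
            if other not in seen:
                seen.add(other)
                candidates.append(other)

    # Re-rank the expanded candidate set by relevance to the query.
    return sorted(candidates, key=score, reverse=True)[:top_k]


# Toy data: "c" shares a cluster with "a"; "d" is in an unrelated cluster.
clusters = {"a": 0, "c": 0, "b": 1, "d": 2}
relevance = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 1.0}
ranked = expand_and_rerank(["a", "b"], clusters, relevance.get)
```

In this toy run, "c" enters the result set only because it shares a cluster with a retrieved entity, while "d" is never considered despite its high score, which is the trade-off of restricting expansion to precomputed clusters.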