VISIR : visual and semantic image label refinement
The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness through advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses these limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of state-of-the-art visual labeling tools such as LSDA and YOLO.
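The label-assignment step described above (select a coherent subset of candidate labels by jointly rewarding detector confidence and pairwise semantic relatedness) can be sketched as a tiny exhaustive search standing in for the paper's integer linear program. All labels, confidences, and relatedness scores below are hypothetical illustrations, not VISIR's actual data or solver:

```python
from itertools import combinations

# Hypothetical detector confidences for candidate labels of one image.
confidence = {"dog": 0.9, "cat": 0.2, "leash": 0.7, "piano": 0.6}

# Hypothetical pairwise semantic relatedness (e.g., from lexical knowledge).
relatedness = {
    frozenset({"dog", "leash"}): 0.8,
    frozenset({"dog", "cat"}): 0.5,
    frozenset({"dog", "piano"}): 0.0,
    frozenset({"cat", "piano"}): 0.0,
    frozenset({"cat", "leash"}): 0.1,
    frozenset({"leash", "piano"}): 0.0,
}

def score(labels):
    """Objective: detector confidence plus pairwise semantic coherence."""
    conf = sum(confidence[l] for l in labels)
    coher = sum(relatedness[frozenset(p)]
                for p in combinations(sorted(labels), 2))
    return conf + coher

def refine(candidates, k=2):
    """Pick the subset of at most k labels maximizing the joint score
    (exhaustive search as a stand-in for the ILP formulation)."""
    best, best_score = (), float("-inf")
    for r in range(1, k + 1):
        for subset in combinations(candidates, r):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return set(best)
```

With these toy numbers, `refine(confidence, k=2)` keeps `{"dog", "leash"}`: the semantically coherent pair beats the higher-confidence but incoherent alternatives.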
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems is presented: image tag assignment, refinement, and tag-based image retrieval. While existing works vary in their targeted tasks and methodology, they all rely on the key functionality of tag relevance, i.e., estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how that information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison among state-of-the-art methods, a new experimental protocol is presented, with training sets containing 10k, 100k, and 1M images and an evaluation on three test sets contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.
Comment: to appear in ACM Computing Surveys
The detection of high-qualified indels in exomes and their effect on cognition
Genetic insertions/deletions (indels) have been linked to many neurodevelopmental
disorders (NDDs) such as autism spectrum disorder (ASD) and intellectual disability (ID).
However, although they are the second most common type of genetic variant, they remain to this
day difficult to identify and verify, presenting a high number of false positives. We sought to find
a method that would appropriately identify high-quality indels that are likely to be true positives.
We built an indel “truth set” using indels from two diagnosis-based family cohorts that
were filtered according to a set of threshold values and called by several variant calling tools in
order to train three machine learning models to identify the highest quality indels. The two best
performing models were then used to identify high quality indels in a general population cohort
that was called using only one variant calling technology.
The machine learning models were able to identify higher-quality indels that showed an association with IQ, although the effect size was small. The indels predicted by the models also affected a much smaller number of genes per individual than those retained by fixed minimum thresholds alone. The models tend to show an overall improvement in the quality of the indels, but further work is required to determine whether they could predict indels with a noticeable and significant effect on IQ.
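The training step above (fit a classifier on features of truth-set indels to score candidate calls) can be sketched with a plain logistic regression. The features, values, and model below are illustrative assumptions; the thesis's actual features, cohorts, and three model architectures are not specified here:

```python
import math

# Hypothetical per-indel features, pre-scaled to [0, 1]:
# (read depth / 50, allele balance, fraction of callers agreeing).
# Label 1 = indel confirmed in the "truth set", 0 = likely false positive.
TRAIN = [
    ((0.80, 0.48, 1.000), 1), ((0.70, 0.52, 1.000), 1), ((1.00, 0.45, 0.667), 1),
    ((0.16, 0.10, 0.333), 0), ((0.24, 0.90, 0.333), 0), ((0.12, 0.30, 0.333), 0),
]

def predict(w, b, x):
    """Probability that an indel is high quality under a logistic model."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Logistic regression fitted by per-sample gradient descent."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            g = predict(w, b, x) - y   # gradient of the log loss w.r.t. z
            b -= lr * g
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w, b

w, b = train(TRAIN)
```

A well-supported candidate (deep coverage, balanced alleles, multiple callers agreeing) then scores high, while a thinly supported singleton call scores low, mirroring the thesis's goal of replacing fixed rejection thresholds with a learned quality score.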
Less is MORE: a MultimOdal system for tag REfinement
With the proliferation of image-based social media, an extremely large amount of multimodal data is being produced. Very often image contents are published together with a set of user-defined metadata such as tags and textual descriptions. Despite being very useful to enhance traditional image retrieval, user-defined tags on social media have proven to be ineffective for indexing images, because they are influenced by the personal experiences of the owners as well as their desire to promote the published contents. To be analyzed and indexed, multimodal data require algorithms able to jointly deal with textual and visual data. This research presents a multimodal approach to the problem of tag refinement, which consists in separating the relevant descriptors (tags) of images from noisy ones. The proposed method exploits both Natural Language Processing (NLP) and Computer Vision (CV) techniques based on deep learning to find a match between the textual information and the visual content of social media posts. Textual semantic features are represented with (multilingual) word embeddings, while visual ones are obtained with image classification. The proposed system is evaluated on a manually annotated Italian dataset extracted from Instagram, achieving a weighted F1-score of 68%.
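The matching step described here (keep a user tag only when it lies close in embedding space to a label predicted by an image classifier) can be sketched with toy vectors. The embeddings, labels, and threshold below are illustrative stand-ins, not the paper's actual models:

```python
import math

# Toy word vectors standing in for multilingual word embeddings.
EMB = {
    "beach":  [0.9, 0.1, 0.0],
    "sea":    [0.8, 0.2, 0.1],
    "sunset": [0.6, 0.4, 0.2],
    "follow4follow": [0.0, 0.1, 0.9],  # typical promotional noise tag
}

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def refine_tags(user_tags, visual_labels, threshold=0.7):
    """Keep a user tag only if it is close (in embedding space) to at
    least one label predicted by the image classifier."""
    return [tag for tag in user_tags
            if tag in EMB and any(cos(EMB[tag], EMB[l]) >= threshold
                                  for l in visual_labels if l in EMB)]
```

For an image classified as "sea", the content tags "beach" and "sunset" survive while the promotional tag is discarded, which is exactly the relevant-vs-noisy separation the abstract describes.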
- …