17 research outputs found

    Multi-modal joint embedding for fashion product retrieval

    Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem, akin to finding a needle in a haystack. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to perform retrieval in this latent space both efficiently and accurately. We train this embedding on large-scale real-world e-commerce data by minimizing the distance between related products in the latent space and by using auxiliary classification networks that encourage the embedding to have semantic meaning. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset. We also provide an analysis of the different types of metadata.
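
    The training objective described in this abstract can be pictured with a minimal sketch. The PyTorch-style code below is an illustrative reconstruction, not the authors' implementation: the class name JointEmbedding, the feature dimensions, the margin-free cosine matching loss and the loss weight alpha are all assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class JointEmbedding(nn.Module):
            """Maps image and text features into a shared latent space (illustrative sketch)."""
            def __init__(self, img_dim=2048, txt_dim=300, latent_dim=128, n_classes=50):
                super().__init__()
                self.img_proj = nn.Linear(img_dim, latent_dim)   # image branch
                self.txt_proj = nn.Linear(txt_dim, latent_dim)   # text branch
                # auxiliary classifier that encourages semantic structure in the latent space
                self.classifier = nn.Linear(latent_dim, n_classes)

            def forward(self, img_feat, txt_feat):
                z_img = F.normalize(self.img_proj(img_feat), dim=1)
                z_txt = F.normalize(self.txt_proj(txt_feat), dim=1)
                return z_img, z_txt

        def loss_fn(z_img, z_txt, labels, model, alpha=0.5):
            # pull embeddings of the same product together (distance in latent space ~ similarity)
            match_loss = (1 - F.cosine_similarity(z_img, z_txt)).mean()
            # auxiliary classification losses on both modalities
            cls_loss = F.cross_entropy(model.classifier(z_img), labels) + \
                       F.cross_entropy(model.classifier(z_txt), labels)
            return match_loss + alpha * cls_loss

    At retrieval time, a query from either modality would be projected into the latent space and neighbours returned by cosine distance, which is what makes the latent-space search both efficient and accurate.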

    Accurate fashion and accessories detection for mobile application based on deep learning

    Detection and classification play an essential role in e-commerce applications. The commonly used recommendation method is based on the text information attached to a product, which leads to recommendation errors when that text information is invalid. In this study, we propose a fashion category (FC-YOLOv4) model that provides category recommendations to sellers based on fashion and accessory images. The resulting model was then compared to YOLOv3 and YOLOv4 on mobile devices. The dataset we use is a collection of 13,689 images covering five fashion categories and five accessory categories. Accuracy and speed were analyzed in terms of mean average precision (mAP), intersection over union (IoU), model size, loading time, average RAM usage, and maximum RAM usage. From the experimental results, the proposed model obtained an mAP of 99.84% and an IoU of 88.49 when compared to YOLOv3 and YOLOv4. Based on these results, the proposed model can accurately identify fashion and accessory categories. The main contributions of this paper are i) a model with a high level of accuracy and ii) experimental results obtained on a smartphone.
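
    Since the evaluation above relies on mAP and IoU, a short reminder of how IoU is computed may help. The sketch below is generic and assumes axis-aligned boxes in (x1, y1, x2, y2) format; it is not taken from the paper's code.

        def iou(box_a, box_b):
            """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
            ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            union = area_a + area_b - inter
            return inter / union if union > 0 else 0.0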

    Classifying Garments from Fashion-MNIST Dataset Through CNNs

    N/A
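
    No abstract is available for this record, but the title names a standard task. Purely as an illustrative sketch of a CNN classifier on Fashion-MNIST (unrelated to whatever architecture the cited work actually uses), a minimal Keras example could look like this; the layer sizes and training settings are assumptions.

        import tensorflow as tf

        # Fashion-MNIST: 60k training / 10k test grayscale 28x28 images, 10 garment classes
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
        x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(64, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))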

    Towards Decrypting Attractiveness via Multi-Modality Cue

    Decrypting the secret of beauty or attractiveness has been the pursuit of artists and philosophers for centuries. To date, the computational model for attractiveness estimation has been actively explored in the computer vision and multimedia community, yet with the focus mainly on facial features. In this article, we conduct a comprehensive study on female attractiveness conveyed by single/multiple modalities of cues, that is, face, dressing and/or voice; the aim is to discover how different modalities individually and collectively affect the human sense of beauty. To extensively investigate the problem, we collect the Multi-Modality Beauty (M2B) dataset, which is annotated with attractiveness levels converted from manual k-wise ratings and semantic attributes of different modalities. Inspired by the common consensus that middle-level attribute prediction can assist higher-level computer vision tasks, we manually labeled many attributes for each modality. Next, a tri-layer Dual-supervised Feature-Attribute-Task (DFAT) network is proposed to jointly learn the attribute model and attractiveness model of single/multiple modalities. To remedy possible loss of information caused by incomplete manual attributes, we also propose a novel Latent Dual-supervised Feature-Attribute-Task (LDFAT) network, where latent attributes are combined with manual attributes to contribute to the final attractiveness estimation. The extensive experimental evaluations on the collected M2B dataset well demonstrate the effectiveness of the proposed DFAT and LDFAT networks for female attractiveness prediction.
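
    The dual-supervised idea above (predicting mid-level attributes and an attractiveness score jointly from fused multi-modal features) can be sketched as a simple multi-task network. The code below is an assumed, simplified illustration, not the DFAT or LDFAT specification; the layer sizes, the sigmoid coupling and the loss weight beta are guesses.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DualSupervisedNet(nn.Module):
            """Illustrative feature -> attribute -> task network with dual supervision."""
            def __init__(self, feat_dim=1024, n_attrs=50, hidden=256):
                super().__init__()
                self.attr_head = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                               nn.Linear(hidden, n_attrs))      # mid-level attributes
                self.task_head = nn.Linear(n_attrs, 1)                          # attractiveness score

            def forward(self, fused_features):
                attr_logits = self.attr_head(fused_features)         # supervised by manual attribute labels
                score = self.task_head(torch.sigmoid(attr_logits))   # supervised by attractiveness ratings
                return attr_logits, score

        def dual_loss(attr_logits, score, attr_targets, rating, beta=1.0):
            attr_loss = F.binary_cross_entropy_with_logits(attr_logits, attr_targets)
            task_loss = F.mse_loss(score.squeeze(1), rating)
            return task_loss + beta * attr_loss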

    Large Scale Visual Recognition of Clothing, People and Styles

    Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, body shape and pose. In this dissertation, we propose new computational vision approaches that learn to represent and recognize clothing items in images. First, we present an effective method for parsing clothing in fashion photographs, where we label the regions of an image with their clothing categories. We then extend our approach to tackle the clothing parsing problem using a data-driven methodology: for a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Along with our novel large fashion dataset, we also present intriguing initial results on using clothing estimates to improve human pose identification. Second, we examine questions related to fashion styles and identifying the clothing elements associated with each style. We first design an online competitive style rating game called Hipster Wars to crowdsource reliable human judgments of clothing styles. We use this game to collect a new dataset of clothing outfits with associated style ratings for different clothing styles. Next, we build visual style descriptors and train models that are able to classify clothing styles and identify the clothing elements that are most discriminative for each style. Finally, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same exact garment in an online shop. This is an extremely challenging task due to visual differences between street photos, which are taken of people wearing clothing in everyday uncontrolled settings, and online shop photos, which are captured by professionals in highly controlled settings. We introduce a novel large dataset for this application, collected from the web, and present a deep learning based similarity network that can compare clothing items across visual domains.
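
    As a rough illustration of the street-to-shop matching task described above, the sketch below embeds a street query and a gallery of shop images with a shared network and ranks the gallery by cosine similarity. The ImageNet-pretrained ResNet-50 backbone and the absence of domain-specific training are assumptions; this is not the dissertation's similarity network.

        import torch
        import torch.nn.functional as F
        import torchvision.models as models

        # shared backbone for both domains (assumption: an ImageNet-pretrained ResNet-50)
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = torch.nn.Identity()    # use pooled features as the embedding
        backbone.eval()

        @torch.no_grad()
        def embed(batch):                    # batch: (N, 3, 224, 224) preprocessed images
            return F.normalize(backbone(batch), dim=1)

        def rank_shop_items(street_img, shop_imgs):
            """Return shop indices sorted from most to least similar to the street photo."""
            q = embed(street_img.unsqueeze(0))          # (1, D)
            gallery = embed(shop_imgs)                  # (M, D)
            sims = (gallery @ q.T).squeeze(1)           # cosine similarity: embeddings are unit-norm
            return torch.argsort(sims, descending=True)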

    Semantic Attributes for Transfer Learning in Visual Recognition

    Driven by the success of deep learning methods, considerable progress has been made towards artificial intelligence in the area of machine understanding. However, thousands of manually annotated training examples are strictly necessary to ensure the generalization ability of such models. Moreover, a model must be completely retrained whenever it is applied to a new problem class, which means that the very costly process of collecting and annotating training data must be repeated, severely limiting the scalability of such models. We humans, on the other hand, do not tackle new tasks in isolation; we have the remarkable ability to draw on previously acquired knowledge when solving new problems. This ability is called transfer learning. It allows us to learn new things faster, better, and from only very few examples. There is therefore great interest in imitating this ability algorithmically, especially in settings where training data is very scarce or even unavailable. In this thesis we study transfer learning in the context of computer vision. In particular, we investigate how visual recognition (e.g., object or action classification) can be performed when only few or no training examples exist. A promising solution in this direction is the framework of semantic attributes, in which visual categories are described in terms of attributes such as color, pattern and shape. These attributes can be learned from a disjoint set of training examples. Since the attributes have a dual interpretation, both visual and semantic, language can be used effectively to guide the transfer process. This means that models for a new visual category can be built from its linguistic description alone, by selecting relevant attributes and transferring them to the new category, eliminating the need for training images entirely. In this thesis we present new solutions for modeling semantic attributes, transferring them, automatically associating them with visual categories, and recognizing them from linguistic descriptions. To this end, we examine attribute-based recognition from the following four viewpoints: 1) Unlike the common model, in which attributes must be learned globally, we present a hierarchical approach that allows attributes to be learned at different levels of abstraction. We also show how the structure among categories can be exploited effectively to guide the learning and transfer process and thus build discriminative models for new categories. With a thorough experimental analysis, we demonstrate a clear improvement of our model over the global approach, particularly for the recognition of fine-grained categories. 2) In prevailing attribute-based transfer approaches, the user supervises the association between attributes and categories. In this work we propose to establish the link between the two automatically and without user intervention. Our model captures the semantic relations that couple attributes with objects in order to predict their associations and to select, in an unsupervised manner, which attributes should be transferred. 3) We circumvent the need for a predefined vocabulary of attributes. Instead, we propose to use encyclopedia articles that describe object categories in free text to automatically discover a set of discriminative, salient and diverse attributes. Removing the need for a user-defined vocabulary allows us to fully exploit the potential of attribute-based models on very large datasets. 4) We present a novel real-world application of semantic attributes. We propose the first method that automatically learns fashion styles and predicts how their popularity will evolve in the near future. We show that semantic attributes yield interpretable fashion styles and lead to better predictions of the popularity of visual styles compared to other representations.
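
    The attribute-based transfer idea, building a classifier for an unseen category purely from its attribute description, can be sketched as follows. This is a generic direct-attribute-prediction style example with assumed array shapes, not the specific models proposed in the thesis.

        import numpy as np

        def zero_shot_predict(attr_scores, class_attr_matrix):
            """
            attr_scores:       (N, A) predicted attribute probabilities for N test images,
                               produced by attribute classifiers trained on disjoint seen classes.
            class_attr_matrix: (C, A) binary/continuous attribute description of C unseen classes,
                               e.g. derived from linguistic descriptions.
            Returns the index of the best-matching unseen class for every image.
            """
            # score each unseen class by how well its attribute signature matches the predictions
            a = attr_scores / (np.linalg.norm(attr_scores, axis=1, keepdims=True) + 1e-8)
            c = class_attr_matrix / (np.linalg.norm(class_attr_matrix, axis=1, keepdims=True) + 1e-8)
            compatibility = a @ c.T          # (N, C) cosine compatibility
            return compatibility.argmax(axis=1)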

    The relationship between types of needs and consumer choices in apparel industry of Bangladesh

    Consumer behavior describes how people purchase and consume products and services. It is an active area of research, closely tied to human psychology, and is essential for companies trying to sell their products or services to as many consumers as possible. Since many facets of consumers' lives affect what they purchase and why, research on consumer behavior addresses how individuals respond to advertising and marketing, whether the purchase process can reveal consumers' personalities and social status, and how consumers make decisions. Such research also helps determine how best to sell products or services by appealing to consumers' fears, habits and tendencies. This study examines the relationship between consumer needs and consumer purchase behavior, in terms of consumer choices, for Bangladeshi customers in the apparel market. Types of consumer choices and types of consumer needs are associated with each other: types of needs are the underlying determinant of the types of choices made to satisfy consumers' apparel necessities, and the three categories of needs recognised to satisfy apparel necessities (functional needs, social needs and experiential needs) are fulfilled by consumers through three recognised categories of choices (choice freedom, choice difficulty and choice confidence). Different categories of consumers have different types of needs and behave differently while purchasing clothes; this study therefore investigates how consumers in Bangladesh make choices according to their needs in apparel purchases. It also examines how income level works as a moderator when consumers make decisions. The purpose of this thesis is to shed light on the relationship between consumers' needs and choices. The key objective is to investigate how the correlation between consumers' needs and choices influences consumers' perceptions and their decision-making process. The study also inspected the association between needs and choices for the population with respect to certain key questions. Consequently, it provides a better understanding of the links between needs and choices, with special consideration of how choices interact with various situations, which is important for selling apparel products effectively. The study employed a quantitative survey research design. A quantitative survey quantifies the problem by generating numerical data that can be converted into usable statistics, and it is mostly used to measure attitudes, opinions, behaviors and other defined variables (DeFranzo, 2011). The data was collected through a structured Likert-scale questionnaire; the questions related to the types-of-needs and types-of-choices theories, with a focus on consumers' purchase behavior. Respondents completed the questionnaire, which was administered online through SurveyMonkey. After data collection, ordinal regression models were run on the dependent and independent variables from the questionnaire, providing a clear indication of consumers' actual preference structure.
    The study found that types of needs are positively associated with types of choices, and that low and high income levels respectively weaken and strengthen this positive relationship between needs and choices. The findings offer valuable considerations for related theories, especially on the topic of consumers' purchase behavior, by highlighting the contextual differences between needs and choices and the other influencing factors associated with them. The study contributes to building new concepts of consumer purchase behavior theory in terms of branding theory and consumers' needs and selection processes, and it demonstrates how types of needs influence consumers' purchase decisions through types of choices. Moreover, the resulting concepts, strategies and psychological explanations of consumer behavior help managers sell their products more appropriately.
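
    As a hedged illustration of the analysis described above (ordinal regression on Likert-scale responses with income level as a moderator), the sketch below uses statsmodels' OrderedModel. The file name, column names and the single interaction term are assumptions about the questionnaire, not the study's actual variables.

        import pandas as pd
        from statsmodels.miscmodels.ordinal_model import OrderedModel

        # hypothetical survey frame: Likert items coded 1-5, income coded 0 = low, 1 = high
        df = pd.read_csv("apparel_survey.csv")                        # assumed file name
        df["need_x_income"] = df["need_score"] * df["high_income"]    # moderation (interaction) term

        model = OrderedModel(
            df["choice_confidence"],                                  # ordinal outcome (1-5 Likert)
            df[["need_score", "high_income", "need_x_income"]],
            distr="logit",
        )
        result = model.fit(method="bfgs", disp=False)
        print(result.summary())

    In such a setup, a positive coefficient on the interaction term would indicate that higher income strengthens the association between needs and choice confidence, which is the kind of moderation effect the study reports.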