3,654 research outputs found
Multimodal non-linear latent semantic method for information retrieval
La búsqueda y recuperación de datos multimodales es una importante tarea dentro del campo de búsqueda y recuperación de información, donde las consultas y los elementos de la base de datos objetivo están representados por un conjunto de modalidades, donde cada una de ellas captura un aspecto de un fenómeno de interés. Cada modalidad contiene información complementaria y común a otras modalidades. Con el fin de tomar ventaja de
la información adicional distribuida a través de las distintas modalidades han sido desarrollados muchos algoritmos y métodos que utilizan las propiedades estadísticas en los datos multimodales para encontrar correlaciones implícitas, otros aprenden a calcular distancias heterogéneas, otros métodos aprenden a proyectar los datos desde el espacio de entrada hasta un espacio semántico común, donde las diferentes modalidades son comparables y se puede construir un ranking a partir de ellas. En esta tesis se presenta el diseño de un sistema para la búsqueda y recuperación de información multimodal que aprende varias proyecciones no lineales a espacios semánticos
latentes donde las distintas modalidades son representadas en conjunto y es posible realizar comparaciones y medidas de similitud para construir rankings multimodales. Adicionalmente se propone un método kernelizado para la proyección de datos a un espacio semántico latente usando la información de las etiquetas como método de supervisión para construir
índice multimodal que integra los datos multimodales y la información de las etiquetas; este método puede proyectar los datos a tres diferentes espacios semánticos donde varias configuraciones de búsqueda y recuperación de información pueden ser aplicadas. El sistema y el método propuestos fueron evaluados en un conjunto de datos compuesto por casos médicos, donde cada caso consta de una imagen de tejido prostático, un reporte de
texto del patólogo y un valor de Gleason score como etiqueta de supervisión. Combinando la información multimodal y la información en las etiquetas se generó un índice multimodal
que se utilizó para realizar la tarea de búsqueda y recuperación de información por contenido obteniendo resultados sobresalientes. Las proyecciones no-lineales permiten al modelo una mayor flexibilidad y capacidad de representación. Sin embargo calcular estas proyecciones no-lineales en un conjunto de datos enorme es computacionalmente costoso, para reducir este costo y habilitar el modelo para procesar datos a gran escala, la técnica del budget fue utilizada, mostrando un buen compromiso entre efectividad y velocidad.Multimodal information retrieval is an information retrieval sub-task where queries and database target elements are composed of several modalities or views. A modality is a representation of complex phenomena, captured and measured by different sensors or information sources, each one encodes some information about it. Each modality representation contains complementary and shared information about the phenomenon of interest,
this additional information can be used to improve the information retrieval process. Several methods have been developed to take advantage of additional information distributed across different modalities. Some of them exploit statistical properties in multimodal data to find correlations and implicit relationships, others learn heterogeneous distance functions, and others learn linear and non-linear projections that transform data from the original input space to a common latent semantic space where different modalities are comparable. In spite of the attention dedicated to this issue, multimodal information retrieval is still an open problem. This thesis presents a multimodal information retrieval system designed to learn several mapping functions to transform multimodal data to a latent semantic space, where different modalities are combined and can be compared to build a multimodal ranking and perform a multimodal information retrieval task. Additionally, a multimodal kernelized latent semantic embedding method is proposed to construct a supervised multimodal index, integrating
multimodal data and label supervision. This method can perform mappings to three different spaces where some information retrieval task setups can be performed.
The proposed system and method were evaluated in a multimodal medical case-based retrieval task where data is composed of whole-slide images of prostate tissue samples, pathologist’s text report and Gleason score as a supervised label. Multimodal data and labels were combined to produce a multimodal index. This index was used to retrieve multimodal information and achieves outstanding results compared with previous works on this topic. Non-linear mappings provide more flexibility and representation capacity to the proposed model. However, constructing the non-linear mapping in a large dataset using kernel methods can be computationally costly. To reduce the cost and allow large scale applications, the budget technique was introduced, showing good performance between speed and effectiveness.COLCIENCIASJóvenes investigadores 761/2016Línea de investigación: Ciencias de la computaciónMaestrí
Short-Video Marketing in E-commerce: Analyzing and Predicting Consumer Response
This study analyzes and predicts consumer viewing response to e-commerce short-videos (ESVs). We first construct a large-scale ESV dataset that contains 23,001 ESVs across 40 product categories. The dataset consists of the consumer response label in terms of average viewing durations and human-annotated ESV content attributes. Using the constructed dataset and mixed-effects model, we find that product description, product demonstration, pleasure, and aesthetics are four key determinants of ESV viewing duration. Furthermore, we design a content-based multimodal-multitask framework to predict consumer viewing response to ESVs. We propose the information distillation module to extract the shared, special, and conflicted information from ESV multimodal features. Additionally, we employ a hierarchical multitask classification module to capture feature-level and label-level dependencies. We conduct extensive experiments to evaluate the prediction performance of our proposed framework. Taken together, our paper provides theoretical and methodological contributions to the IS and relevant literature
Deep tree-ensembles for multi-output prediction
Recently, deep neural networks have expanded the state-of-art in various
scientific fields and provided solutions to long standing problems across
multiple application domains. Nevertheless, they also suffer from weaknesses
since their optimal performance depends on massive amounts of training data and
the tuning of an extended number of parameters. As a countermeasure, some
deep-forest methods have been recently proposed, as efficient and low-scale
solutions. Despite that, these approaches simply employ label classification
probabilities as induced features and primarily focus on traditional
classification and regression tasks, leaving multi-output prediction
under-explored. Moreover, recent work has demonstrated that tree-embeddings are
highly representative, especially in structured output prediction. In this
direction, we propose a novel deep tree-ensemble (DTE) model, where every layer
enriches the original feature set with a representation learning component
based on tree-embeddings. In this paper, we specifically focus on two
structured output prediction tasks, namely multi-label classification and
multi-target regression. We conducted experiments using multiple benchmark
datasets and the obtained results confirm that our method provides superior
results to state-of-the-art methods in both tasks
Towards Fairer and More Efficient Federated Learning via Multidimensional Personalized Edge Models
Federated learning (FL) is an emerging technique that trains massive and
geographically distributed edge data while maintaining privacy. However, FL has
inherent challenges in terms of fairness and computational efficiency due to
the rising heterogeneity of edges, and thus usually results in sub-optimal
performance in recent state-of-the-art (SOTA) solutions. In this paper, we
propose a Customized Federated Learning (CFL) system to eliminate FL
heterogeneity from multiple dimensions. Specifically, CFL tailors personalized
models from the specially designed global model for each client jointly guided
by an online trained model-search helper and a novel aggregation algorithm.
Extensive experiments demonstrate that CFL has full-stack advantages for both
FL training and edge reasoning and significantly improves the SOTA performance
w.r.t. model accuracy (up to 7.2% in the non-heterogeneous environment and up
to 21.8% in the heterogeneous environment), efficiency, and FL fairness.Comment: 8 pages, 7 figure
Real-Time Purchase Prediction Using Retail Video Analytics
The proliferation of video data in retail marketing brings opportunities for researchers to study customer behavior using rich video information. Our study demonstrates how to understand customer behavior of multiple dimensions using video analytics on a scalable basis. We obtained a unique video footage data collected from in-store cameras, resulting in approximately 20,000 customers involved and over 6,000 payments recorded. We extracted features on the demographics, appearance, emotion, and contextual dimensions of customer behavior from the video with state-of-the-art computer vision techniques and proposed a novel framework using machine learning and deep learning models to predict consumer purchase decision. Results showed that our framework makes accurate predictions which indicate the importance of incorporating emotional response into prediction. Our findings reveal multi-dimensional drivers of purchase decision and provide an implementable video analytics tool for marketers. It shows possibility of involving personalized recommendations that would potentially integrate our framework into omnichannel landscape
- …