CKNet: A Convolutional Neural Network Based on Koopman Operator for Modeling Latent Dynamics from Pixels
With the development of end-to-end control based on deep learning, it is
important to study new system-modeling techniques that realize dynamics
modeling with high-dimensional inputs. In this paper, a novel Koopman-based
deep convolutional network, called CKNet, is proposed to identify latent
dynamics from raw pixels. CKNet learns an encoder and a decoder that play the
roles of the Koopman eigenfunctions and modes, respectively, and the Koopman
eigenvalues can be approximated by the eigenvalues of the learned
state-transition matrix. Two variants, the deterministic convolutional Koopman
network (DCKNet) and the variational convolutional Koopman network (VCKNet),
are proposed to span a subspace for approximating the Koopman operator.
Because CKNet is trained under the constraints of Koopman theory, the
identified latent dynamics are linear and therefore interpretable. In
addition, the state-transition and control matrices are learned as trainable
tensors, so the identified dynamics are also time-invariant. We further design
an auxiliary weight term that reduces the multi-step linearity and prediction
losses. Experiments were conducted on two offline-trained and four
online-trained nonlinear forced dynamical systems with continuous action
spaces in the Gym and MuJoCo environments, respectively, and the results show
that the identified dynamics are adequate for approximating the latent
dynamics and generating clear images. For the offline-trained cases in
particular, we validate CKNet from a novel perspective: we visualize the
evolution of the latent states and the Koopman eigenfunctions with DCKNet and
VCKNet on the same episode for each task, and the results demonstrate that the
two approaches learn features with similar shapes.
Comment: 8 pages, 7 figures
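The core object the abstract describes is a linear, time-invariant latent model z_{t+1} = A z_t + B u_t, whose eigenvalues approximate the Koopman eigenvalues. A minimal numpy sketch of that rollout (with random stand-in matrices, not the learned CKNet tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, action_dim = 4, 1

# Hypothetical stand-ins for the learned state-transition and control
# matrices (trainable tensors in CKNet).
A = rng.normal(scale=0.3, size=(latent_dim, latent_dim))
B = rng.normal(scale=0.3, size=(latent_dim, action_dim))

def rollout(z0, actions):
    """Multi-step prediction with the identified linear, time-invariant
    dynamics z_{t+1} = A z_t + B u_t."""
    z, traj = z0, [z0]
    for u in actions:
        z = A @ z + B @ u
        traj.append(z)
    return np.stack(traj)

z0 = rng.normal(size=latent_dim)
traj = rollout(z0, [np.array([0.1])] * 5)

# The Koopman eigenvalues are approximated by the eigenvalues of A.
eigvals = np.linalg.eigvals(A)
```

Because A and B are fixed tensors rather than functions of time, the same matrices serve every prediction step, which is what makes the identified dynamics time-invariant.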
Analyzing Customer Needs of Product Ecosystems Using Online Product Reviews
Analyzing the customer needs of a product ecosystem is necessary to increase customer satisfaction and improve user experience, which in turn enhances business strategy and profits. However, identifying and analyzing the customer needs of a product ecosystem with traditional methods is often time-consuming and challenging because of the many products and services involved and their interdependence within the ecosystem. In this paper, we analyze the customer needs of a product ecosystem by applying machine learning techniques to online product reviews of multiple products and services in the Amazon product ecosystem. First, we filter the noise in the reviews with a fastText method that categorizes reviews as informative or uninformative with respect to customer needs. Second, we extract customer-needs-related topics with a latent Dirichlet allocation technique. Third, we conduct sentiment analysis with a valence aware dictionary and sentiment reasoner method, which predicts not only the sentiment of a review but also its intensity. Based on these three steps, we classify customer needs dynamically with an analytical Kano model. A case study of the Amazon product ecosystem shows the potential of the proposed method.
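The pipeline above (filter, topic tagging, sentiment scoring, Kano-style categorization) can be sketched with trivial stand-ins. The keyword lists and thresholds below are hypothetical placeholders for the paper's fastText classifier, LDA topics, VADER scores, and analytical Kano model:

```python
# Toy stand-ins; none of these are the paper's actual models or parameters.
TOPIC_KEYWORDS = {"battery": "battery life", "alexa": "voice assistant"}
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"poor", "terrible", "short"}

def informative(review):          # stand-in for the fastText filter
    return any(k in review for k in TOPIC_KEYWORDS)

def topic(review):                # stand-in for LDA topic assignment
    for keyword, name in TOPIC_KEYWORDS.items():
        if keyword in review:
            return name
    return None

def sentiment(review):            # stand-in for VADER: mean word valence
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / max(len(words), 1)

def kano(score):                  # toy thresholds, not the analytical Kano model
    if score > 0.05:
        return "attractive"
    if score < -0.05:
        return "must-be"
    return "indifferent"

reviews = ["battery life is terrible and short", "love the alexa integration"]
labels = [(topic(r), kano(sentiment(r))) for r in reviews if informative(r)]
```

The point of the sketch is the data flow: only reviews that pass the informativeness filter are topic-tagged and scored, and the signed intensity of the sentiment drives the Kano category.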
Team I2R-VI-FF Technical Report on EPIC-KITCHENS VISOR Hand Object Segmentation Challenge 2023
In this report, we present our approach to the EPIC-KITCHENS VISOR Hand
Object Segmentation Challenge, which focuses on the estimation of the relation
between the hands and the objects given a single frame as input. The
EPIC-KITCHENS VISOR dataset provides pixel-wise annotations and serves as a
benchmark for hand and active object segmentation in egocentric video. Our
approach combines the baseline method, Point-based Rendering (PointRend), with
the Segment Anything Model (SAM) to enhance the accuracy of hand and object
segmentation while reducing missed detections. We leverage the accurate hand
segmentation maps obtained from the
baseline method to extract more precise hand and in-contact object segments. We
utilize the class-agnostic segmentation provided by SAM and apply specific
hand-crafted constraints to enhance the results. In cases where the baseline
model misses the detection of hands or objects, we re-train an object detector
on the training set to enhance the detection accuracy. The detected hand and
in-contact object bounding boxes are then used as prompts to extract their
respective segments from the output of SAM. By effectively combining the
strengths of existing methods and applying our refinements, our submission
achieved 1st place on the evaluation criteria of the VISOR HOS Challenge.
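One step the report describes is using detected bounding boxes to pull the corresponding segments out of a set of class-agnostic masks (such as SAM outputs). A minimal numpy sketch of that selection, with a toy IoU-based matching rule standing in for the actual prompting mechanism:

```python
import numpy as np

def box_to_mask(box, shape):
    """Rasterize an (x0, y0, x1, y1) box into a boolean mask."""
    m = np.zeros(shape, dtype=bool)
    x0, y0, x1, y1 = box
    m[y0:y1, x0:x1] = True
    return m

def pick_segment(candidate_masks, box, shape):
    """Hypothetical selection step: among class-agnostic candidate masks
    (e.g., SAM outputs), keep the one that overlaps the detected box best,
    measured by IoU against the rasterized box."""
    box_mask = box_to_mask(box, shape)
    def iou(m):
        inter = np.logical_and(m, box_mask).sum()
        union = np.logical_or(m, box_mask).sum()
        return inter / union if union else 0.0
    return max(candidate_masks, key=iou)

shape = (8, 8)
hand = box_to_mask((1, 1, 4, 4), shape)   # toy candidate segments
mug = box_to_mask((5, 5, 8, 8), shape)
best = pick_segment([hand, mug], (0, 0, 4, 4), shape)
```

In the actual submission the boxes are fed to SAM as prompts; the sketch only illustrates why a detector box is enough to disambiguate which segment belongs to the hand or the in-contact object.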
SIGIFSDP: A Service Id Guided Intelligent Forwarding Service Discovery Protocol in Pervasive Computing Environments
Service discovery builds a bridge between service providers and service consumers and is a key component of pervasive computing environments. In group-based service discovery protocols, selectively forwarding service requests based only on the service group may lead to unnecessary forwarding, which produces large packet redundancy. This paper proposes an efficient service discovery protocol, SIGIFSDP (Service Id Guided Intelligent Forwarding Service Discovery Protocol). Building on GSD, SIGIFSDP introduces SIGIF (Service Id Guided Intelligent Forwarding) to select the exact forwarding nodes based on the service id. Theoretical analysis and simulation results using GloMoSim verify that SIGIFSDP shortens the response time, reduces the number of service request packets, and improves the efficiency of service discovery.
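The contrast between group-based forwarding and service-id-guided forwarding can be made concrete with a toy neighbor table. The node names, groups, and service ids below are invented for illustration, not taken from the protocol specification:

```python
# Each neighbor advertises the service groups and concrete service ids it
# has seen; all names here are illustrative.
neighbors = {
    "n1": {"groups": {"printing"}, "ids": {"printer-42"}},
    "n2": {"groups": {"printing"}, "ids": {"scanner-7"}},
    "n3": {"groups": {"storage"},  "ids": {"disk-1"}},
}

def forward_by_group(group):
    """Group-based selection (GSD-style): every neighbor in the matching
    group receives the forwarded request."""
    return {n for n, adv in neighbors.items() if group in adv["groups"]}

def forward_by_service_id(group, service_id):
    """SIGIF-style selection: only neighbors that advertised the exact
    service id receive the request, cutting redundant packets."""
    return {n for n, adv in neighbors.items()
            if group in adv["groups"] and service_id in adv["ids"]}
```

A request for "printer-42" in the "printing" group goes to two neighbors under group-based forwarding but to only one under service-id-guided forwarding, which is the packet-redundancy saving the abstract claims.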
Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes
Deep Gaussian Process (DGP) models offer a powerful nonparametric approach
for Bayesian inference, but exact inference is typically intractable,
motivating the use of various approximations. However, existing approaches,
such as mean-field Gaussian assumptions, limit the expressiveness and efficacy
of DGP models, while stochastic approximation can be computationally expensive.
To tackle these challenges, we introduce Neural Operator Variational Inference
(NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a
sampler and minimizes the Regularized Stein Discrepancy in L2 space between the
generated distribution and true posterior. We solve the minimax problem using
Monte Carlo estimation and subsampling stochastic optimization techniques. We
demonstrate that the bias introduced by our method can be controlled by
multiplying the Fisher divergence with a constant, which leads to robust error
control and ensures the stability and precision of the algorithm. Our
experiments on datasets ranging from hundreds to tens of thousands of samples
demonstrate the effectiveness and faster convergence of the proposed method. We
achieve a classification accuracy of 93.56% on the CIFAR-10 dataset,
outperforming SOTA Gaussian process methods. Furthermore, our method guarantees
theoretically controlled prediction error for DGP models and demonstrates
remarkable performance on various datasets. We are optimistic that NOVI has the
potential to enhance the performance of deep Bayesian nonparametric models and
could have significant implications for various practical applications.
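NOVI trains a neural generator to minimize a regularized Stein discrepancy between generated samples and the true posterior. As a much simpler related quantity, a kernelized Stein discrepancy for a 1-D standard-normal target can be estimated in a few lines of numpy; this is only a toy analogue of the objective, not the paper's method:

```python
import numpy as np

def ksd(samples, score, h=1.0):
    """V-statistic estimate of the kernelized Stein discrepancy between the
    empirical distribution of 1-D `samples` and a target with score function
    `score`, using an RBF kernel with bandwidth h."""
    x, y = samples[:, None], samples[None, :]
    d = x - y
    k = np.exp(-d**2 / (2 * h**2))
    dkdx = -d / h**2 * k                      # d k / d x
    dkdy = d / h**2 * k                       # d k / d y
    d2k = (1 / h**2 - d**2 / h**4) * k        # d^2 k / (d x d y)
    sx, sy = score(x), score(y)
    # Stein kernel: u(x, y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k
    u = sx * sy * k + sx * dkdy + sy * dkdx + d2k
    return u.mean()

rng = np.random.default_rng(0)
target_score = lambda x: -x                   # score of a standard normal
good = rng.normal(0.0, 1.0, size=300)         # matches the target
bad = rng.normal(2.0, 1.0, size=300)          # shifted away from the target
```

Samples that match the target yield a near-zero discrepancy, while mismatched samples yield a larger one, which is the signal a generator can be trained to drive down.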
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
A key challenge with procedure planning in instructional videos lies in how
to handle a large decision space consisting of a multitude of action types that
belong to various tasks. To understand real-world video content, an AI agent
must proficiently discern these action types (e.g., pour milk, pour water, open
lid, close lid, etc.) based on brief visual observation. Moreover, it must
adeptly capture the intricate semantic relation of the action types and task
goals, along with the variable action sequences. Recently, notable progress has
been made via the integration of diffusion models and visual representation
learning to address the challenge. However, existing models employ rudimentary
mechanisms to utilize task information to manage the decision space. To
overcome this limitation, we introduce a simple yet effective enhancement: a
masked diffusion model. The introduced mask acts akin to a task-oriented
attention filter, enabling the diffusion/denoising process to concentrate on a
subset of action types. Furthermore, to bolster the accuracy of task
classification, we harness more potent visual representation learning
techniques. In particular, we learn a joint visual-text embedding, where a text
embedding is generated by prompting a pre-trained vision-language model to
focus on human actions. We evaluate the method on three public datasets and
achieve state-of-the-art performance on multiple metrics. Code is available at
https://github.com/ffzzy840304/Masked-PDPP.
Comment: 7 pages (main text excluding references), 3 figures, 7 tables
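The "task-oriented attention filter" idea is simply that action types outside the predicted task's subset are suppressed before normalization, so the denoising process concentrates on a smaller decision space. A toy numpy illustration, with an invented task-to-action mapping (not the datasets' actual label spaces):

```python
import numpy as np

# Invented action vocabulary and task-to-action mapping, for illustration only.
ACTIONS = ["pour milk", "pour water", "open lid", "close lid"]
TASK_ACTIONS = {"make tea": {"pour water", "open lid", "close lid"}}

def masked_softmax(logits, task):
    """Suppress logits of off-task action types before softmax, so all
    probability mass lands on the task's action subset."""
    mask = np.array([a in TASK_ACTIONS[task] for a in ACTIONS])
    masked = np.where(mask, logits, -np.inf)
    e = np.exp(masked - masked[mask].max())   # exp(-inf) -> 0 for masked slots
    return e / e.sum()

probs = masked_softmax(np.array([2.0, 1.0, 0.5, 0.1]), "make tea")
```

Even though "pour milk" has the highest raw logit, it receives zero probability because it does not belong to the "make tea" action subset.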
SocioGlass: Social interaction assistance with face recognition on Google Glass
We present SocioGlass, a system built on Google Glass paired with a mobile phone that provides the user with in-situ information about an acquaintance during face-to-face communication. The system recognizes faces in the live feed of visual input and retrieves relevant information about the person whose face matches an entry in the database. To provide interaction assistance, the various aspects of personal information are categorized by their relevance to the interaction scenario or context, so the assistance can be adapted to the social context. The system can be used to help acquaintances build relationships, or to assist people with memory problems.
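The recognize-then-retrieve step can be sketched as matching a face embedding against a small database and returning only the attributes relevant to the current context. The embeddings, similarity threshold, and context categories below are hypothetical stand-ins, not SocioGlass's actual data model:

```python
import math

# Hypothetical database: per-person embedding plus context-tagged attributes.
PEOPLE = {
    "alice": {"embedding": [0.9, 0.1, 0.2],
              "work": "project partner", "personal": "enjoys hiking"},
    "bob": {"embedding": [0.1, 0.8, 0.5],
            "work": "sales contact", "personal": "two kids"},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def lookup(query, context, threshold=0.9):
    """Match a query embedding against the database; return the best match's
    context-relevant attribute, or None when no face is similar enough."""
    name, best = max(((n, cosine(query, p["embedding"]))
                      for n, p in PEOPLE.items()), key=lambda t: t[1])
    return (name, PEOPLE[name][context]) if best >= threshold else None
```

Keying the returned attribute on the context is the adaptation step: the same recognized face yields work-related details in a meeting and personal details in a casual encounter.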