Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
In the context of pose-invariant object recognition and retrieval, we
demonstrate that it is possible to achieve significant improvements in
performance if both the category-based and the object-identity-based embeddings
are learned simultaneously during training. In hindsight, that sounds intuitive
because learning about the categories is more fundamental than learning about
the individual objects that correspond to those categories. However, to the
best of our knowledge, no prior work in pose-invariant learning has demonstrated
this effect. This paper presents an attention-based dual-encoder architecture
with specially designed loss functions that optimize the inter- and intra-class
distances simultaneously in two different embedding spaces, one for the
category embeddings and the other for the object-level embeddings. The loss
functions we have proposed are pose-invariant ranking losses that are designed
to minimize the intra-class distances and maximize the inter-class distances in
the dual representation spaces. We demonstrate the power of our approach with
three challenging multi-view datasets, ModelNet40, ObjectPI, and FG3D. With
our dual approach, for single-view object recognition, we outperform the
previous best by 20.0% on ModelNet40, 2.0% on ObjectPI, and 46.5% on FG3D. On
the other hand, for single-view object retrieval, we outperform the previous
best by 33.7% on ModelNet40, 18.8% on ObjectPI, and 56.9% on FG3D.
Comment: Accepted by IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR 2024)
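As a rough illustration of optimizing intra- and inter-class distances in two embedding spaces at once, the sketch below applies a standard triplet margin loss separately in a category space and an object-identity space and combines the results. This is not the paper's pose-invariant ranking loss. The function names, the `margin` and `weight` parameters, and the weighted-sum combination are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss (a stand-in, not the paper's loss).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def dual_space_loss(cat_triplet, obj_triplet, margin=1.0, weight=0.5):
    """Hypothetical combination: weighted sum of one triplet loss per space.

    cat_triplet: (anchor, positive, negative) embeddings in the category space,
                 where the positive shares the anchor's category.
    obj_triplet: (anchor, positive, negative) embeddings in the object space,
                 where the positive is another view of the same object.
    """
    cat_loss = triplet_margin_loss(*cat_triplet, margin=margin)
    obj_loss = triplet_margin_loss(*obj_triplet, margin=margin)
    return weight * cat_loss + (1.0 - weight) * obj_loss


# Toy 2-D embeddings, invented purely for illustration.
cat_triplet = ([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # already well separated
obj_triplet = ([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # positive is too far away
loss = dual_space_loss(cat_triplet, obj_triplet)
```

In this toy case the category triplet contributes no loss while the object triplet does, so a gradient step would only need to reshape the object-level space.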
Dimension-independent functional inequalities by tensorization and projection arguments
We study stability under tensorization and projection-type operations of
gradient-type estimates and other functional inequalities for Markov semigroups
on metric spaces. Using transportation-type inequalities obtained by F. Baudoin
and N. Eldredge in 2021, we prove that constants in the gradient estimates can
be chosen to be independent of the dimension. Our results are applicable to
hypoelliptic diffusions on sub-Riemannian manifolds and some hypocoercive
diffusions. As a byproduct, we obtain dimension-independent reverse
Poincaré, reverse logarithmic Sobolev, and gradient bounds for Lie groups
with a transverse symmetry and for non-isotropic Heisenberg groups.
Comment: 28 pages
Health Risk from Toxic Metals in Wild Rice Grown in Copper Mining-Impacted Sediments
Northern wild rice is of great dietary and cultural importance to the Native American population in the Upper Peninsula of Michigan. Millions of tons of mine tailings were discharged into Lake Superior and other inland lakes during the copper mining boom in the early 20th century in this area. This includes L’Anse Bay, located within the Keweenaw Bay Indian Community (KBIC) reservation. Since wild rice restoration is being encouraged by the KBIC, we investigated the distribution of toxic metals in sediments, water, and wild rice and their potential impact on human health from two locations. Sand Point sloughs on L’Anse Bay and a nearby inland lake, Lake Plumbago, were sampled for sediment, water, and wild rice, and the potential human health risk from dietary exposure to toxic metals in wild rice was assessed. Arsenic stood out as the element that had the highest bioaccumulation at both locations. Risk calculations showed that the hazard index (HI) value for wild rice seeds from both sites was high. Data indicate both carcinogenic and noncarcinogenic risks for As from wild rice in Sand Point sloughs and Lake Plumbago, and carcinogenic risks for Cd and Cr at Lake Plumbago
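The abstract does not state how the hazard index was computed. A common screening convention defines a hazard quotient (HQ) for each metal as the estimated daily intake divided by a reference dose, and the hazard index as the sum of the HQs. The sketch below assumes that convention. The function names and the placeholder metals and numbers are hypothetical and are not taken from the study.

```python
def hazard_quotient(daily_intake, reference_dose):
    """HQ = estimated daily intake / reference dose (same units for both)."""
    if reference_dose <= 0:
        raise ValueError("reference_dose must be positive")
    return daily_intake / reference_dose


def hazard_index(intakes, reference_doses):
    """HI = sum of per-metal hazard quotients (assumed screening convention)."""
    return sum(
        hazard_quotient(intakes[metal], reference_doses[metal])
        for metal in intakes
    )


# Placeholder values for illustration only; NOT measurements from the study.
example_intakes = {"metal_a": 0.5, "metal_b": 0.25}
example_reference_doses = {"metal_a": 1.0, "metal_b": 0.5}
example_hi = hazard_index(example_intakes, example_reference_doses)
```

Under this convention, an HI above 1 is usually read as a potential noncarcinogenic concern that warrants closer assessment.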
Deep Learning to Predict the Hydration and Performance of Fly Ash-Containing Cementitious Binders
Fly ash (FA) – an industrial byproduct – is used to partially substitute Portland cement (PC) in concrete to mitigate concrete's environmental impact. Chemical composition and structure of FAs significantly impact hydration kinetics and compressive strength of concrete. Due to the substantial diversity in these physicochemical attributes of FAs, it has been challenging to develop a generic theoretical framework – and, therefore, theory-based analytical models – that could produce reliable, a priori predictions of properties of [PC + FA] binders. In recent years, machine learning (ML) – which is purely data-driven, as opposed to being derived from theoretical underpinnings – has emerged as a promising tool to predict and optimize properties of complex, heterogeneous materials, including the aforesaid binders. That said, there are two issues that stand in the way of widespread use of ML models: (1) ML models require thousands of data-records to learn input-output correlations and developing such a large, yet consistent database is impractical; and (2) ML models – while good at producing predictions – are unable to reveal the underlying correlation between composition/structure of material and its properties. This study employs a deep forest (DF) model to predict composition- and time-dependent hydration kinetics and compressive strength of [PC + FA] binders. Data dimensionality-reduction and segmentation techniques – premised on theoretical understanding of composition-structure correlations in FAs, and hydration mechanism of PC – are used to boost the DF model's prediction performance. And, finally, through inference of the intermediate and final outputs of the DF model, a simple, closed-form analytical model is developed to predict compressive strength, and reveal the correlations between mixture design and compressive strength of [PC + FA] binders.
Agent AI: Surveying the Horizons of Multimodal Interaction
Multi-modal AI systems will likely become a ubiquitous presence in our
everyday lives. A promising approach to making these systems more interactive
is to embody them as agents within physical and virtual environments. At
present, systems leverage existing foundation models as the basic building
blocks for the creation of embodied agents. Embedding agents within such
environments facilitates the ability of models to process and interpret visual
and contextual data, which is critical for the creation of more sophisticated
and context-aware AI systems. For example, a system that can perceive user
actions, human behavior, environmental objects, audio expressions, and the
collective sentiment of a scene can be used to inform and direct agent
responses within the given environment. To accelerate research on agent-based
multimodal intelligence, we define "Agent AI" as a class of interactive systems
that can perceive visual stimuli, language inputs, and other
environmentally-grounded data, and can produce meaningful embodied actions. In
particular, we explore systems that aim to improve agents based on
next-embodied action prediction by incorporating external knowledge,
multi-sensory inputs, and human feedback. We argue that by developing agentic
AI systems in grounded environments, one can also mitigate the hallucinations
of large foundation models and their tendency to generate environmentally
incorrect outputs. The emerging field of Agent AI subsumes the broader embodied
and agentic aspects of multimodal interactions. Beyond agents acting and
interacting in the physical world, we envision a future where people can easily
create any virtual reality or simulated scene and interact with agents embodied
within the virtual environment