4,934 research outputs found
Aerospace medicine and biology: A continuing bibliography with indexes (supplement 368)
This bibliography lists 305 reports, articles, and other documents introduced into the NASA Scientific and Technical Information System during Sep. 1992. The subject coverage concentrates on the biological, physiological, psychological, and environmental effects to which humans are subjected during and following simulated or actual flight in the Earth's atmosphere or in interplanetary space. References describing similar effects on biological organisms of lower order are also included. Such related topics as sanitary problems, pharmacology, toxicology, safety and survival, life support systems, exobiology, and personnel factors receive appropriate attention. Applied research receives the most emphasis, but references to fundamental studies and theoretical principles related to experimental development also qualify for inclusion.
Holistic Temporal Situation Interpretation for Traffic Participant Prediction
For a profound understanding of traffic situations, including a prediction of traffic participants' future motion, behaviors, and routes, it is crucial to incorporate all available environmental observations. The presence of sensor noise and dependency uncertainties, the variety of available sensor data, the complexity of large traffic scenes, and the large number of different estimation tasks with diverging requirements call for a general method that provides a robust foundation for the development of estimation applications.
In this work, a general description language called the Object-Oriented Factor Graph Modeling Language (OOFGML) is proposed that unifies the formulation of estimation tasks, from the application-oriented problem description, via the choice of variable and probability-distribution representation, through to the definition of the inference method in the implementation. The different language properties are discussed theoretically using abstract examples.
The derivation of explicit application examples is shown for the automated driving domain. A domain-specific ontology is defined which forms the basis for four exemplary applications covering the broad spectrum of estimation tasks in this domain: basic temporal filtering, ego-vehicle localization using advanced interpretations of perceived objects, road-layout perception utilizing inter-object dependencies, and finally highly integrated route, behavior, and motion estimation to predict traffic participants' future actions. All applications are evaluated as proofs of concept and provide an example of how their class of estimation tasks can be represented using the proposed language. The language serves as a common basis and opens a new field for further research towards holistic solutions for automated driving.
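The first of the four example applications, basic temporal filtering, can be illustrated as sum-product message passing on a chain-structured factor graph. The sketch below is a hypothetical toy (hand-picked two-state model and probabilities, not OOFGML or the paper's ontology) showing how a belief over a traffic participant's state is propagated through transition and observation factors:

```python
import numpy as np

# Toy state space for a traffic participant: 0 = "cruising", 1 = "braking".
transition = np.array([[0.9, 0.1],   # factor linking x_t and x_{t+1}
                       [0.3, 0.7]])
emission = np.array([[0.8, 0.2],     # factor linking x_t and observation z_t
                     [0.25, 0.75]])

def filter_chain(observations, prior=np.array([0.5, 0.5])):
    """Forward message passing on the chain: each step multiplies the
    incoming message by the transition factor, then by the observation
    factor, and renormalizes - the sum-product rule on a chain graph."""
    msg = prior * emission[:, observations[0]]
    msg /= msg.sum()
    beliefs = [msg]
    for z in observations[1:]:
        msg = (transition.T @ msg) * emission[:, z]
        msg /= msg.sum()
        beliefs.append(msg)
    return beliefs

beliefs = filter_chain([1, 1, 1])  # three consecutive "braking-like" readings
print(beliefs[-1])                 # posterior over the state after 3 steps
```

After three braking-like observations the belief concentrates on the "braking" state; the same message-passing machinery generalizes to the richer factor graphs the language is meant to describe.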
Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions
Automatic music generation is an interdisciplinary research topic that combines computational creativity and semantic analysis of music to create automatic machine improvisations. An important property of such a system is allowing the user to specify conditions and desired properties of the generated music. In this paper, we design a model for composing melodies given a user-specified symbolic scenario combined with a previous music context. We add manually labeled vectors denoting external music quality in terms of chord function, which provide a low-dimensional representation of harmonic tension and resolution. Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and it shares a consistent rhythm-pattern structure with another specific song. The model contains two stages that require separate training: the first stage adopts a Conditional Variational Autoencoder (C-VAE) to build a bijection between note sequences and their latent representations, and the second stage adopts long short-term memory networks (LSTMs) with structural conditions to continue writing future melodies. We further exploit the disentanglement technique via the C-VAE to allow melody generation based on pitch-contour information separately from conditioning on rhythm patterns. Finally, we evaluate the proposed model using quantitative analysis of rhythm and a subjective listening study. Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns. The ability to generate longer and more structural phrases from disentangled representations combined with semantic scenario-specification conditions shows the broad applicability of our model.
Comment: 9 pages, 12 figures, 4 tables. In the 14th International Conference on Semantic Computing, ICSC 202
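The two-stage pipeline can be sketched structurally. Everything below is a stand-in (the real model uses a trained C-VAE and LSTM; these function bodies are placeholders chosen only to show the data flow: encode 8-beat units into latents, predict the next latent from context plus a condition vector, decode back to notes):

```python
import numpy as np

UNIT_LEN = 8  # the paper treats 8-beat note sequences as basic units

def encode_unit(notes):
    """Stage-1 stub (C-VAE role): map one 8-beat unit to a latent code."""
    return np.tanh(np.asarray(notes, dtype=float) / 64.0)

def decode_unit(latent):
    """Stage-1 stub: invert the encoder - the paper's latent bijection."""
    return np.round(np.arctanh(latent) * 64.0).astype(int)

def predict_next_latent(context_latents, chord_condition):
    """Stage-2 stub (LSTM role): continue the sequence from the running
    context plus a structural condition vector. Here: average the context
    and nudge it by the (stand-in) chord-function label."""
    return np.clip(np.mean(context_latents, axis=0) + 0.01 * chord_condition,
                   -0.99, 0.99)

melody = [[60, 62, 64, 65, 67, 65, 64, 62],
          [60, 60, 62, 64, 62, 60, 59, 60]]   # MIDI pitches, two 8-beat units
latents = [encode_unit(u) for u in melody]
chord = np.zeros(UNIT_LEN)                    # stand-in harmonic-tension label
next_unit = decode_unit(predict_next_latent(latents, chord))
print(next_unit)
```

The invertible stage-1 mapping is what lets stage 2 work entirely in latent space; in the paper the latent is additionally disentangled so pitch contour and rhythm can be conditioned separately.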
NExT-GPT: Any-to-Any Multimodal LLM
While Multimodal Large Language Models (MM-LLMs) have recently made exciting strides, they mostly fall prey to the limitation of input-side multimodal understanding only, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes essential to human-level AI. To fill the gap, we present NExT-GPT, an end-to-end, general-purpose, any-to-any MM-LLM system. We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging existing well-trained, highly performing encoders and decoders, NExT-GPT is tuned with only a small number of parameters (1%) in certain projection layers, which not only enables low-cost training but also facilitates convenient expansion to more potential modalities. Moreover, we introduce modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation. Overall, our research showcases the promising possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community. Project page: https://next-gpt.github.io/
Comment: work in progress
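The "tune only the projection layers" idea can be illustrated with stand-in shapes (all dimensions and the 7B backbone size below are hypothetical, not NExT-GPT's real components): a frozen encoder produces features, and a small trainable linear projection aligns them with the LLM's embedding space, so the trainable share of parameters stays tiny:

```python
import numpy as np

rng = np.random.default_rng(0)

ENC_DIM, LLM_DIM = 1024, 4096
encoder_W = rng.standard_normal((ENC_DIM, ENC_DIM))     # frozen (stand-in encoder)
projection_W = rng.standard_normal((ENC_DIM, LLM_DIM))  # the only trainable part

def encode_image(pixels):
    """Frozen pretrained encoder stub: pixels -> feature vector."""
    return np.tanh(pixels @ encoder_W)

def project_to_llm_space(features):
    """Trainable adaptor: align encoder features with LLM embeddings."""
    return features @ projection_W

image = rng.standard_normal(ENC_DIM)   # stand-in for a preprocessed image
tokens = project_to_llm_space(encode_image(image))

# Parameter accounting: the projection's share is well under 1% next to a
# frozen backbone (stand-in 7B-parameter LLM plus the encoder weights).
trainable = projection_W.size
frozen = encoder_W.size + 7_000_000_000
print(tokens.shape, trainable / (trainable + frozen))
```

Because the backbone gradients are never needed, training cost stays low, and adding a new modality only means adding another small projection pair.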
A Strong Transfer Baseline for RGB-D Fusion in Vision Transformers
The Vision Transformer (ViT) architecture has recently established its place in the computer vision literature, with multiple architectures for recognition of image data or other visual modalities. However, training ViTs for RGB-D object recognition remains an understudied topic, viewed in recent literature only through the lens of multi-task pretraining in multiple modalities. Such approaches are often computationally intensive and have not yet been applied to challenging object-level classification tasks. In this work, we propose a simple yet strong recipe for transferring pretrained ViTs to RGB-D domains for single-view 3D object recognition, focusing on fusing RGB and depth representations encoded jointly by the ViT. Compared to previous works on multimodal Transformers, the key challenge here is to use the attested flexibility of ViTs to capture cross-modal interactions at the downstream rather than the pretraining stage. We explore which depth representation is better in terms of resulting accuracy and compare two methods for injecting RGB-D fusion within the ViT architecture (i.e., early vs. late fusion). Our results on the Washington RGB-D Objects dataset demonstrate that in such RGB → RGB-D scenarios, late fusion techniques work better than the more popularly employed early fusion. With our transfer baseline, adapted ViTs score up to 95.1% top-1 accuracy in Washington, achieving new state-of-the-art results in this benchmark. We additionally evaluate our approach with an open-ended lifelong learning protocol, where we show that our adapted RGB-D encoder leads to features that outperform unimodal encoders, even without explicit fine-tuning. We further integrate our method with a robot framework and demonstrate how it can serve as a perception utility in an interactive robot learning scenario, both in simulation and with a real robot.
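The early-vs-late fusion distinction the abstract compares can be sketched with stand-in token tensors (a mean-pooling stub plays the ViT's role here; shapes and the merge-by-concatenation choice are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

N_TOKENS, DIM = 16, 64
rgb_tokens = rng.standard_normal((N_TOKENS, DIM))    # RGB patch tokens
depth_tokens = rng.standard_normal((N_TOKENS, DIM))  # depth patch tokens

def vit_stub(tokens):
    """Stand-in for a ViT encoder: mean-pool tokens into one embedding."""
    return tokens.mean(axis=0)

# Early fusion: concatenate the two modalities' token sequences and run a
# single shared encoder over the joint sequence.
early = vit_stub(np.concatenate([rgb_tokens, depth_tokens], axis=0))

# Late fusion: encode each modality separately, then merge the resulting
# embeddings (here by concatenation) for a downstream classifier head.
late = np.concatenate([vit_stub(rgb_tokens), vit_stub(depth_tokens)])

print(early.shape, late.shape)  # (64,) vs (128,)
```

The fusion point changes where cross-modal interaction happens: inside the shared encoder (early) or only at the classifier head (late), which is the design axis the paper evaluates.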
System integration report
Several areas that arise from the system integration issue were examined. Intersystem analysis is discussed as it relates to software development, shared databases and interfaces between TEMPUS and PLAID, shaded graphics rendering systems, object design (BUILD), the TEMPUS animation system, anthropometric lab integration, ongoing TEMPUS support and maintenance, and the impact of UNIX and local workstations on the OSDS environment.