ImageSpirit: Verbal Guided Image Parsing
Humans describe images in terms of nouns and adjectives while algorithms
operate on images represented as sets of pixels. Bridging this gap between how
humans would like to access images versus their typical representation is the
goal of image parsing, which involves assigning object and attribute labels to
each pixel. In this paper, we propose treating nouns as object labels and
adjectives
as visual attribute labels. This allows us to formulate the image parsing
problem as one of jointly estimating per-pixel object and attribute labels from
a set of training images. We propose an efficient (interactive time) solution.
Using the extracted labels as handles, our system empowers a user to verbally
refine the results. This enables hands-free parsing of an image into pixel-wise
object/attribute labels that correspond to human semantics. Verbally selecting
objects of interest enables a novel and natural interaction modality that
could be used to interact with a new generation of devices (e.g., smartphones,
Google Glass, living room devices). We demonstrate our system on a large number
of real-world images with varying complexity. To help understand the
trade-offs compared to traditional mouse-based interactions, results are
reported for both a large-scale quantitative evaluation and a user study.

Comment: http://mmcheng.net/imagespirit
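The joint per-pixel object/attribute estimation can be illustrated with a minimal sketch. This is a deliberate simplification that scores each pixel independently (the paper's actual formulation is a joint model over all pixels), and all class names and scores below are hypothetical:

```python
# Hypothetical per-pixel scores for a tiny two-pixel "image".
# Objects are mutually exclusive per pixel (nouns); attributes are
# multi-label (adjectives), so each is thresholded independently.
OBJECTS = ["bed", "cabinet", "wall"]
ATTRIBUTES = ["wooden", "white", "glossy", "textured"]

pixels = [
    {"object_scores": [0.1, 0.7, 0.2], "attr_scores": [0.8, 0.3, 0.6, 0.1]},
    {"object_scores": [0.5, 0.2, 0.3], "attr_scores": [0.2, 0.9, 0.1, 0.4]},
]

def parse_pixel(pixel, threshold=0.5):
    """Pick one object label (argmax) plus every attribute above threshold."""
    scores = pixel["object_scores"]
    obj = OBJECTS[scores.index(max(scores))]
    attrs = [a for a, s in zip(ATTRIBUTES, pixel["attr_scores"]) if s > threshold]
    return obj, attrs

for px in pixels:
    print(parse_pixel(px))
# → ('cabinet', ['wooden', 'glossy'])
# → ('bed', ['white'])
```

The asymmetry between the two label types is the key point: each pixel gets exactly one object label but any number of attribute labels, which is what lets verbal commands like "the wooden cabinet" select a region.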
AviationGPT: A Large Language Model for the Aviation Domain
The advent of ChatGPT and GPT-4 has captivated the world with large language
models (LLMs), demonstrating exceptional performance in question-answering,
summarization, and content generation. The aviation industry is characterized
by an abundance of complex, unstructured text data, replete with technical
jargon and specialized terminology. Moreover, labeled data for model building
are scarce in this domain, resulting in low usage of aviation text data. The
emergence of LLMs presents an opportunity to transform this situation, but
there is a lack of LLMs specifically designed for the aviation domain. To
address this gap, we propose AviationGPT, which is built on open-source LLaMA-2
and Mistral architectures and continuously trained on a wealth of carefully
curated aviation datasets. Experimental results reveal that AviationGPT offers
users multiple advantages, including the versatility to tackle diverse natural
language processing (NLP) problems (e.g., question-answering, summarization,
document writing, information extraction, report querying, data cleaning, and
interactive data exploration). It also provides accurate and contextually
relevant responses within the aviation domain and significantly improves
performance (e.g., over a 40% performance gain in tested cases). With
AviationGPT, the aviation industry is better equipped to address more complex
research problems and enhance the efficiency and safety of National Airspace
System (NAS) operations.
An analysis of the application of AI to the development of intelligent aids for flight crew tasks
This report presents the results of a study aimed at developing a basis for applying artificial intelligence to the flight deck environment of commercial transport aircraft. In particular, the study comprised four tasks: (1) analysis of flight crew tasks, (2) survey of the state of the art in relevant artificial intelligence areas, (3) identification of human factors issues relevant to intelligent cockpit aids, and (4) identification of artificial intelligence areas requiring further research.
Development of an NLP-driven computer-based test guide for visually impaired students
In recent years, advancements in Natural Language Processing (NLP) techniques
have revolutionized the accessibility and inclusivity of testing, particularly
for visually impaired students (VIS). Computer-Based Testing (CBT) has long
demonstrated its relevance for administering exams electronically, making the
test process easier, providing quicker and more accurate results, and offering
greater flexibility and accessibility for candidates. Yet these benefits have
not reached visually impaired students, who cannot access printed documents.
Hence, in this paper, we present an NLP-driven Computer-Based Test guide for
visually impaired students. It employs pre-trained speech-technology models
to provide real-time assistance and support to visually impaired students.
impaired students. The system utilizes NLP technologies to convert the
text-based questions and the associated options in a machine-readable format.
Subsequently, the speech technology pre-trained model processes the converted
text enabling the VIS to comprehend and analyze the content. Furthermore, we
validated that this pre-trained model is not perverse by testing for accuracy
using sample audio datasets labels (A, B, C, D, E, F, G) to compare with the
voice recordings obtained from 20 VIS which is been predicted by the system to
attain values for precision, recall, and F1-scores. These metrics are used to
assess the performance of the pre-trained model and have indicated that it is
proficient enough to give its better performance to the evaluated system. The
methodology adopted for this system is Object Oriented Analysis and Design
Methodology (OOADM) where Objects are discussed and built by modeling
real-world instances.Comment: 10 pages, 6 figure
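The per-label precision/recall/F1 evaluation described above can be sketched as follows. The label lists here are made up for illustration, not the paper's actual recordings:

```python
def precision_recall_f1(true_labels, predicted_labels, label):
    """Per-label precision, recall, and F1 from paired true/predicted lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if p == label and t == label)  # correct hits
    fp = sum(1 for t, p in pairs if p == label and t != label)  # false alarms
    fn = sum(1 for t, p in pairs if p != label and t == label)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical ground-truth option labels vs. system predictions
# for a handful of voice recordings.
true = ["A", "B", "A", "C", "B", "A"]
pred = ["A", "B", "C", "C", "B", "A"]

for label in ["A", "B", "C"]:
    p, r, f = precision_recall_f1(true, pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging these per-label scores across all seven option labels (A through G) would give the overall figures used to judge whether the speech model is reliable enough for the test guide.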