Text-to-picture tools, systems, and approaches: a survey
Text-to-picture systems attempt to facilitate high-level, user-friendly communication between humans and computers while promoting understanding of natural language. These systems interpret a natural language text and transform it into a visual format as pictures or images that are either static or dynamic. In this paper, we aim to identify current difficulties and the main problems faced by prior systems, and in particular, we seek to investigate the feasibility of automatic visualization of Arabic story text through multimedia. Hence, we analyzed a number of well-known text-to-picture systems, tools, and approaches. We showed their constituent steps, such as knowledge extraction, mapping, and image layout, as well as their performance and limitations. We also compared these systems based on a set of criteria, mainly natural language processing, natural language understanding, and input/output modalities. Our survey showed that currently emerging techniques in natural language processing tools and computer vision have made promising advances in analyzing general text and understanding images and videos. Furthermore, important remarks and findings have been deduced from these prior works, which would help in developing an effective text-to-picture system for learning and educational purposes. © 2019, The Author(s). This work was made possible by NPRP grant #10-0205-170346 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
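The constituent steps named in this abstract (knowledge extraction, mapping, image layout) can be illustrated with a minimal sketch. All names here (the `ASSETS` index, file names, canvas size) are hypothetical placeholders, not part of any surveyed system:

```python
import re

# Hypothetical word-to-image index; a real system would use a large
# curated asset library or a generative model.
ASSETS = {
    "cat": "cat.png",
    "tree": "tree.png",
    "house": "house.png",
}

def extract_keywords(text):
    """Knowledge extraction: keep only words with a known visual asset."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in ASSETS]

def layout(keywords, canvas_width=300):
    """Image layout: place mapped images left-to-right on a canvas,
    returning (image_file, x, y) placements."""
    step = canvas_width // max(len(keywords), 1)
    return [(ASSETS[w], i * step, 0) for i, w in enumerate(keywords)]

placements = layout(extract_keywords("The cat sat near the tree."))
```

A real pipeline would replace the keyword lookup with semantic-role labeling and the fixed grid with spatial-relation reasoning, but the three-stage structure is the same.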
Sign language lexical recognition with Propositional Dynamic Logic
This paper explores the use of Propositional Dynamic Logic (PDL) as a suitable formal framework for describing Sign Language (SL), the language of deaf people, in the context of natural language processing. SLs are visual, complete, standalone languages which are just as expressive as oral languages. Signs in SL usually correspond to sequences of highly specific body postures interleaved with movements, which make reference to real-world objects, characters, or situations. Here we propose a formal representation of SL signs that will help us analyze automatically collected hand-tracking data from French Sign Language (FSL) video corpora. We further show how such a representation could help us with the design of computer-aided SL verification tools, which in turn would bring us closer to the development of an automatic recognition system for these languages.
Towards a Generation of Artificially Intelligent Strategy Tools: The SWOT Bot
Strategy tools are widely used to inform the complex and unstructured decision-making of firms. Although software has evolved to support strategy analysis, such digital strategy tools still require heavy manual work, especially at the data input and processing levels, making their use time-intensive, costly, and susceptible to biases. This design research presents the "SWOT Bot", a digital strategy tool that exploits recent advances in natural language processing (NLP) to perform a SWOT (strengths, weaknesses, opportunities, threats) analysis. Our artifact uses a feed reader, an NLP pipeline, and a visual interface to automatically extract information from a text corpus (e.g., analyst reports) and present it to the user. We argue that the SWOT Bot reduces time and adds objectivity to strategy analyses while allowing the human-in-the-loop to focus on value-adding tasks. Besides providing a functioning prototype, our work provides three general design principles for the development of next-generation digital strategy tools.
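The extraction stage of such a tool can be sketched in miniature. This is not the SWOT Bot's actual NLP pipeline; the cue lexicon and category names below are hypothetical, and a real system would use trained models rather than keyword matching:

```python
# Hypothetical cue lexicon standing in for a learned classifier.
CUES = {
    "strength": ["market leader", "strong brand", "patent"],
    "weakness": ["high cost", "debt", "turnover"],
    "opportunity": ["growing market", "new segment", "expansion"],
    "threat": ["competitor", "regulation", "lawsuit"],
}

def swot_bucket(sentence):
    """Assign a sentence to the first SWOT category whose cue it contains."""
    s = sentence.lower()
    for category, cues in CUES.items():
        if any(cue in s for cue in cues):
            return category
    return None

def analyze(corpus):
    """Group sentences from a text corpus (e.g., analyst reports) into
    the four SWOT buckets, leaving unmatched sentences out."""
    report = {k: [] for k in CUES}
    for sentence in corpus:
        bucket = swot_bucket(sentence)
        if bucket:
            report[bucket].append(sentence)
    return report
```

The human-in-the-loop role described in the abstract would then be reviewing and curating these automatically proposed buckets rather than reading the full corpus.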
DisKnow: a social-driven disaster support knowledge extraction system
This research is aimed at creating and presenting DisKnow, a data extraction system with the capability of filtering and abstracting tweets, to improve community resilience and decision-making in disaster scenarios. Nowadays most people act as human sensors, exposing detailed information regarding occurring disasters on social media. Through a pipeline of natural language processing (NLP) tools for text processing, convolutional neural networks (CNNs) for classifying and extracting disasters, and knowledge graphs (KGs) for presenting connected insights, it is possible to generate real-time visual information about such disasters and affected stakeholders, and to improve the crisis management process by disseminating that information to relevant authorities and the population alike. DisKnow has proved to be on par with state-of-the-art disaster extraction systems, and it contributes a way to easily manage and present such happenings.
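The three-stage pipeline described (text processing, classification, knowledge graph) can be sketched as follows. This is an illustrative stand-in, not DisKnow's implementation: the keyword classifier below replaces the CNN stage, and all names are hypothetical:

```python
import re

def preprocess(tweet):
    """Text-processing stage: lowercase and strip URLs and @-mentions."""
    tweet = re.sub(r"https?://\S+|@\w+", "", tweet.lower())
    return tweet.strip()

# Toy keyword classifier standing in for the CNN classification stage.
DISASTER_TERMS = {"flood", "earthquake", "wildfire", "hurricane"}

def classify(text):
    """Return the disaster type mentioned in the text, if any."""
    return next((t for t in DISASTER_TERMS if t in text), None)

def update_graph(graph, tweet, disaster, location):
    """Knowledge-graph stage: link disaster -> location -> reports."""
    graph.setdefault(disaster, {}).setdefault(location, []).append(tweet)
    return graph

graph = {}
raw = "@user Flood on Main Street, water rising fast https://t.co/x"
text = preprocess(raw)
kind = classify(text)
if kind:
    update_graph(graph, text, kind, "Main Street")
```

The nested-dict graph here is a simplification; a real KG would use typed nodes and edges so that downstream queries ("all reports near this location") stay efficient.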
Towards Collaborative Generative AI for Vision-and-Language Studies
In recent years, the field of vision-and-language studies has witnessed significant advancements, aiming to bridge the gap between visual perception and linguistic understanding. These studies have explored various approaches to enhance the capabilities of AI systems in generating natural language or visual content, understanding multimodal scenarios, and conducting commonsense reasoning. Despite these advancements, there remains a crucial need for further progress to enable more collaborative and comprehensive interactions between vision and language modalities. This dissertation addresses this need through three primary contributions. First, I introduce the concept of machine imagination for natural language processing studies. Specifically, I present the use of visual information generated by machines for the automatic evaluation of natural language generation and natural language understanding. Second, I explore the utilization of large language models (LLMs) to enhance the performance of vision and multimodal tasks. In particular, I examine the effectiveness of applying LLMs for prompt editing in text-to-image generation, compositional layout planning and generation, and vision-and-language navigation. Third, I outline my contributions to publicly available open-source vision-and-language research. Specifically, we introduce Multimodal C4, a large-scale multimodal dataset containing interleaved images and text, which we used to train the large-scale multimodal model OpenFlamingo. Additionally, we introduce VisIT-Bench, a public benchmark for evaluating instruction-following vision-language models in real-world applications. This dissertation aims to push the boundaries of vision-and-language integration, providing new insights and tools for developing more sophisticated AI systems capable of seamless multimodal interactions.
New Methods and Tools for the World Wide Web Search
Explosive growth of the World Wide Web, as well as its heterogeneity, calls for powerful and easy-to-use search tools capable of providing the user with a moderate number of relevant answers. This paper presents an analysis of key aspects of recently developed Web search methods and tools: visual representation of subject trees, interactive user interfaces, linguistic approaches, image search, ranking and grouping of search results, database search, and scientific information retrieval. Current trends in Web search include topics such as exploiting the Web's hyperlinking structure, natural language processing, software agents, the influence of the XML markup language on search efficiency, and WAP search engines.
Explainability of Vision Transformers: A Comprehensive Review and New Perspectives
Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision, showing promising results over convolutional neural networks in fundamental computer vision tasks. However, the scientific community has not fully grasped the inner workings of vision transformers, nor the basis for their decision-making, which underscores the importance of explainability methods. Understanding how these models arrive at their decisions not only improves their performance but also builds trust in AI systems. This study explores different explainability methods proposed for vision transformers and presents a taxonomy for organizing them according to their motivations, structures, and application scenarios. In addition, it provides a comprehensive review of evaluation criteria that can be used for comparing explanation results, as well as explainability tools and frameworks. Finally, the paper highlights essential but unexplored aspects that can enhance the explainability of vision transformers, and promising directions are suggested for future investigation.
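One widely cited attention-based explainability technique for transformers, attention rollout, gives a flavor of the methods such a taxonomy covers. This is a generic sketch of the technique, not an implementation from the surveyed work: it folds residual connections into each layer's attention map and multiplies the maps across layers to estimate how much each input token influences each output position.

```python
def rollout(attentions):
    """Attention rollout: for each layer's attention matrix A, account
    for the residual connection with 0.5*A + 0.5*I, renormalize rows to
    keep them distributions, and multiply the results across layers."""
    n = len(attentions[0])
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for layer in attentions:
        mixed = [[0.5 * layer[i][j] + 0.5 * (i == j) for j in range(n)]
                 for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # result = mixed @ result (plain-Python matrix product)
        result = [[sum(mixed[i][k] * result[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result
```

For a vision transformer, the resulting row for the [CLS] token can be reshaped over the image patches to produce a relevance heatmap.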
From ChatGPT-3 to GPT-4: A Significant Advancement in AI-Driven NLP Tools
Recent improvements in Natural Language Processing (NLP) have led to the creation of powerful language models such as the Chat Generative Pre-training Transformer (ChatGPT), Google's BARD, and Ernie, which have proven highly capable at many different language tasks. But as language tasks get more complicated, even more advanced NLP tools are essential. In this study, researchers look at how the latest versions of the GPT language model (GPT-4 and GPT-5) can help with these advancements. The research method for this paper is based on a narrative analysis of the literature, which makes use of secondary data gathered from previously published studies, including articles, websites, blogs, and visual and numerical facts. Findings of this study revealed that GPT-4 improves on the model's training data, computation speed, the quality of the answers it provides, and its overall performance. This study also shows that GPT-4 does much better than GPT-3.5 at translating languages, answering questions, and sentiment analysis. The study provides a solid basis for building even more advanced NLP tools and programmes such as GPT-5, and it will help AI and LLM researchers, NLP developers, and academicians explore this particular field of study further. As this is the first research of its kind comparing two NLP tools, the researchers suggest quantitative research in the near future to validate these findings.
GeoAnnotator: A Collaborative Semi-Automatic Platform for Constructing Geo-Annotated Text Corpora
Ground-truth datasets are essential for the training and evaluation of any automated algorithm. As such, gold-standard annotated corpora underlie most advances in natural language processing (NLP). However, only a few relatively small (geo-)annotated datasets are available for geoparsing, i.e., the automatic recognition and geolocation of place references in unstructured text. The creation of geoparsing corpora that include both the recognition of place names in text and the matching of those names to toponyms in a geographic gazetteer (a process we call geo-annotation) is a laborious, time-consuming, and expensive task. The field lacks efficient geo-annotation tools to support corpus building and lacks design guidelines for the development of such tools. Here, we present the iterative design of GeoAnnotator, a web-based, semi-automatic, and collaborative visual analytics platform for geo-annotation. GeoAnnotator facilitates collaborative, multi-annotator creation of large corpora of geo-annotated text by providing computationally generated pre-annotations that can be improved by human-annotator users. The resulting corpora can be used for improving and benchmarking geoparsing algorithms as well as various other spatial language-related methods. Further, the iterative design process and the resulting design decisions can be used in annotation platforms tailored for other application domains of NLP.
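The pre-annotation idea at the heart of this workflow can be sketched minimally. This is not GeoAnnotator's code; the mini-gazetteer and field names below are hypothetical, and a real system would query a service such as GeoNames and handle ambiguous toponyms:

```python
# Hypothetical mini-gazetteer: place name -> (lat, lon).
GAZETTEER = {
    "paris": (48.8566, 2.3522),
    "springfield": (39.7817, -89.6501),
}

def pre_annotate(text):
    """Propose (span, toponym, coordinates) pre-annotations for a text,
    flagged unconfirmed so a human annotator can accept or correct them."""
    annotations = []
    lowered = text.lower()
    for name, coords in GAZETTEER.items():
        start = lowered.find(name)
        if start != -1:
            annotations.append({"span": (start, start + len(name)),
                                "name": name, "coords": coords,
                                "confirmed": False})  # awaits review
    return annotations
```

The `confirmed` flag captures the semi-automatic division of labor: the machine proposes, the annotator disposes, and only confirmed annotations enter the gold-standard corpus.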