Natural Language Modeling Issue
In Uzbek linguistics, a number of studies have been carried out on automatic translation, the linguistic foundations of author corpora, the processing of lexicographic texts, and linguistic-statistical analysis. However, no monograph has yet studied the processing of Uzbek as a language of the Internet: spelling, automatic processing and translation programs, search programs for various characters, text generation, the linguistic basis of text corpora and a national corpus, and the supporting software technology. This article discusses the transformation of language into a language of the Internet, the contribution of computer technology and mathematical linguistics, and the subsequent formation and development of computational linguistics, in particular the question of modeling natural languages for artificial intelligence. The Uzbek National Corpus plays an important role in enhancing the international status of the Uzbek language.
Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction
In this dissertation, we focus on extracting and understanding semantically meaningful relationships between data items of various modalities; especially relations between images and natural language. We explore the ideas and techniques to integrate such cross-media semantic relations for machine understanding of large heterogeneous datasets, made available through the expansion of the World Wide Web. The datasets collected from social media websites, news media outlets and blogging platforms usually contain multiple modalities of data. Intelligent systems are needed to automatically make sense out of these datasets and present them in such a way that humans can find the relevant pieces of information or get a summary of the available material. Such systems have to process multiple modalities of data such as images, text, linguistic features, and structured data in reference to each other. For example, image and video search and retrieval engines are required to understand the relations between visual and textual data so that they can provide relevant answers in the form of images and videos to the users' queries presented in the form of text. We emphasize the automatic extraction of semantic topics or concepts from the data available in any form such as images, free-flowing text or metadata. These semantic concepts/topics become the basis of semantic relations across heterogeneous data types, e.g., visual and textual data. A classic problem involving image-text relations is the automatic generation of textual descriptions of images. This problem is the main focus of our work. In many cases, a large amount of text is associated with images. Deep exploration of linguistic features of such text is required to fully utilize the semantic information encoded in it. A news dataset involving images and news articles is an example of this scenario.
We devise frameworks for automatic news image description generation based on the semantic relations of images, as well as semantic understanding of linguistic features of the news articles.
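The core idea of relating data across modalities via shared semantic concepts can be illustrated with a toy example. The following sketch is not the dissertation's method; it merely shows the simplest possible cross-media relation, scoring hypothetical image captions against an article by bag-of-words cosine similarity (all texts invented):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Toy cross-media relation: rank candidate image captions against an article.
article = "the president visited the flooded town and promised federal aid"
captions = [
    "president shakes hands in flooded town",
    "a cat sleeps on a sofa",
]
best = max(captions, key=lambda c: cosine_similarity(article, c))
```

Real systems replace the word-overlap vectors with learned visual and textual concept representations, but the relation-scoring structure is the same.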
OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs
Text-to-image generation intends to automatically produce a photo-realistic
image, conditioned on a textual description. It can be potentially employed in
the field of art creation, data augmentation, photo-editing, etc. Although many
efforts have been dedicated to this task, it remains particularly challenging
to generate believable, natural scenes. To facilitate the real-world
applications of text-to-image synthesis, we focus on studying the following
three issues: 1) How to ensure that generated samples are believable, realistic
or natural? 2) How to exploit the latent space of the generator to edit a
synthesized image? 3) How to improve the explainability of a text-to-image
generation framework? In this work, we constructed two novel data sets (i.e.,
the Good & Bad bird and face data sets) consisting of successful as well as
unsuccessful generated samples, according to strict criteria. To effectively
and efficiently acquire high-quality images by increasing the probability of
generating Good latent codes, we use a dedicated Good/Bad classifier for
generated images. It is based on a pre-trained front end and fine-tuned on the
basis of the proposed Good & Bad data set. After that, we present a novel
algorithm which identifies semantically-understandable directions in the latent
space of a conditional text-to-image GAN architecture by performing independent
component analysis on the pre-trained weight values of the generator.
Furthermore, we develop a background-flattening loss (BFL), to improve the
background appearance in the edited image. Subsequently, we introduce linear
interpolation analysis between pairs of keywords. This is extended into a
similar triangular 'linguistic' interpolation in order to take a deep look into
what a text-to-image synthesis model has learned within the linguistic
embeddings. Our data set is available at
https://zenodo.org/record/6283798#.YhkN_ujMI2w. Comment: 18 pages.
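The latent-space operations the abstract alludes to, interpolating between keyword embeddings and moving a latent code along a semantic direction, can be sketched in a few lines. This is an illustrative sketch only, with hypothetical three-dimensional embeddings standing in for a real text encoder's output; it is not the paper's implementation:

```python
import numpy as np

def lerp(a, b, t):
    """Linearly interpolate between two embedding or latent vectors."""
    return (1.0 - t) * a + t * b

def edit_latent(z, direction, alpha):
    """Shift a latent code along a unit-normalized semantic direction."""
    return z + alpha * direction / np.linalg.norm(direction)

# Hypothetical keyword embeddings standing in for a text encoder's output.
emb_red = np.array([1.0, 0.0, 0.0])
emb_yellow = np.array([0.0, 1.0, 0.0])

# Sweep t in [0, 1] to trace the path the generator would be conditioned on.
path = [lerp(emb_red, emb_yellow, t) for t in np.linspace(0.0, 1.0, 5)]
midpoint = lerp(emb_red, emb_yellow, 0.5)  # an even blend of both keywords
```

In the paper's setting, the semantic directions themselves are obtained by independent component analysis of the pre-trained generator's weights; here `edit_latent` only shows how such a direction would be applied once found.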
Language-based multimedia information retrieval
This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image- or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE built on subtitles or captions as the prime language key for disclosing video fragments, OLIVE makes use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.
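The key mechanism, text retrieval over time-coded transcript segments that point back into the video, can be sketched minimally. The segment structure, matching rule, and sample data below are assumptions for illustration, not the projects' actual implementations:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds into the video
    end: float
    text: str     # subtitle, caption, or ASR transcription

def search(segments, query):
    """Return time-coded segments whose text contains every query word."""
    words = query.lower().split()
    return [s for s in segments if all(w in s.text.lower() for w in words)]

segments = [
    Segment(0.0, 4.2, "Welcome to the evening news"),
    Segment(4.2, 9.8, "Heavy rain caused flooding in the north"),
    Segment(9.8, 15.0, "Sports results after the break"),
]
hits = search(segments, "flooding rain")
# Each hit's start time is the jump-in point for the disclosed video fragment.
```

A production system would use a proper inverted index and linguistic normalization (stemming, named entities) rather than substring matching, but the time-code linkage is the essential idea.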
Improvising Linguistic Style: Social and Affective Bases for Agent Personality
This paper introduces Linguistic Style Improvisation, a theory and set of
algorithms for improvisation of spoken utterances by artificial agents, with
applications to interactive story and dialogue systems. We argue that
linguistic style is a key aspect of character, and show how speech act
representations common in AI can provide abstract representations from which
computer characters can improvise. We show that the mechanisms proposed
introduce the possibility of socially oriented agents, meet the requirements
that lifelike characters be believable, and satisfy particular criteria for
improvisation proposed by Hayes-Roth. Comment: 10 pages, uses aaai.sty, lingmacros.sty, psfig.sty.
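The idea that an abstract speech-act representation can be realized in different linguistic styles can be shown with a toy lookup table. The acts, styles, and surface forms below are invented for illustration and are not the paper's algorithms:

```python
# Map an abstract speech act plus a style setting to a surface utterance.
REALIZATIONS = {
    ("request", "direct"):  "Give me the {obj}.",
    ("request", "polite"):  "Could you possibly hand me the {obj}?",
    ("greeting", "direct"): "Hi.",
    ("greeting", "polite"): "Good evening, how do you do?",
}

def improvise(act, style, **slots):
    """Pick a surface form for (speech act, style) and fill its slots."""
    return REALIZATIONS[(act, style)].format(**slots)

line = improvise("request", "polite", obj="map")
```

In the paper's terms, the style dimension would be driven by social and affective parameters of the character rather than a fixed label, and the surface forms would be generated rather than enumerated.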
A Flexible Shallow Approach to Text Generation
In order to support the efficient development of NL generation systems, two
orthogonal methods are currently pursued with emphasis: (1) reusable, general,
and linguistically motivated surface realization components, and (2) simple,
task-oriented template-based techniques. In this paper we argue that, from an
application-oriented perspective, the benefits of both are still limited. In
order to improve this situation, we suggest and evaluate shallow generation
methods associated with increased flexibility. We advocate a close connection
between domain-motivated and linguistic ontologies that supports the quick
adaptation to new tasks and domains, rather than the reuse of general
resources. Our method is especially designed for generating reports with
limited linguistic variation. Comment: LaTeX, 10 pages.
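The template-based end of the spectrum the abstract contrasts against can be illustrated in a few lines. The template, slot names, and record below are hypothetical and are not the paper's system; they only show why pure templates are simple but inflexible:

```python
# One report template with named slots; values come from a domain record.
TEMPLATE = "On {date}, unit {unit} produced {count} items ({delta:+d} vs. plan)."

def generate_report(record):
    """Fill the report template from a domain record (shallow generation)."""
    return TEMPLATE.format(**record)

report = generate_report(
    {"date": "2024-05-01", "unit": "A3", "count": 412, "delta": -8}
)
```

The shallow methods the paper proposes sit between this and full linguistically motivated realization: templates gain limited linguistic variation (e.g. aggregation, agreement) through a mapping between domain and linguistic ontologies.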