310 research outputs found
Figure mining for biomedical research
Motivation: Figures from biomedical articles contain valuable information that is difficult to reach without specialized tools. Currently, there is no search engine that can retrieve specific figure types. Results: This study describes a retrieval method that takes advantage of principles in image understanding, text mining and optical character recognition (OCR) to retrieve figure types defined conceptually. A search engine was developed to retrieve tables and figure types to aid computational and experimental research. Availability: http://iossifovlab.cshl.edu/figurome
Image and information management system
A system and methods through which pictorial views of an object's configuration, arranged in a hierarchical fashion, are navigated by a person to establish a visual context within the configuration. The system automatically translates the visual context into a set of search parameters that drive retrieval of structured data and content (images, documents, multimedia, etc.) associated with the specific context. The system places "hot spots", or actionable regions, on various portions of the pictorials representing the object. When a user interacts with an actionable region, a more detailed pictorial from the hierarchy is presented representing that portion of the object, along with real-time feedback in the form of a popup pane containing information about that region and counts-by-type reflecting the number of items available within the system for the specific context and search filters established at that point in time.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table.
Fine Art Pattern Extraction and Recognition
This is a reprint of articles from the Special Issue published online in the open access journal Journal of Imaging (ISSN 2313-433X) (available at: https://www.mdpi.com/journal/jimaging/special_issues/faper2020).
An Exploratory Study of Word-Scale Graphics in Data-Rich Text Documents
We contribute an investigation of the design and function of word-scale graphics and visualizations embedded in text documents. Word-scale graphics include both data-driven representations, such as word-scale visualizations and sparklines, and non-data-driven visual marks. Their design, function, and use have so far received little research attention. We present the results of an open-ended exploratory study with 9 graphic designers. The study resulted in a rich collection of different types of graphics, data provenance, and relationships between text, graphics, and data. Based on this corpus, we present a systematic overview of word-scale graphic designs and examine how designers used them. We also discuss the designers' goals in creating their graphics, and characterize how they used word-scale graphics to visualize data, add emphasis, and create alternative narratives. Building on these examples, we discuss implications for the design of authoring tools for word-scale graphics and visualizations, and explore how new authoring environments could make it easier for designers to integrate them into documents.
Learning visual representations with neural networks for video captioning and image generation
The past decade has been a golden era of neural network research. Not only have neural networks been successfully applied to solve more and more challenging real-world problems, but they have also become the dominant approach in many of the places where they have been tested. These places include, for instance, language understanding, game playing, and computer vision, thanks to neural networks' superiority in computational efficiency and statistical capacity. This thesis applies neural networks to problems in computer vision where high-level and semantically meaningful representations play a fundamental role. It demonstrates, both in theory and in experiment, the ability to learn such representations from data with and without supervision. The main content of the thesis is divided into two parts. The first part studies neural networks in the context of learning visual representations for the task of video captioning. Models are developed to dynamically focus on different frames while generating a natural language description of a short video. Such a model is further improved by recurrent convolutional operations. The end of this part identifies fundamental challenges in video captioning and proposes a new type of evaluation metric that may be used experimentally as an oracle to benchmark performance. The second part studies the family of models that generate images. While the first part is supervised, this part is unsupervised. Its focus is the popular family of Neural Autoregressive Density Estimators (NADEs), a tractable probabilistic model for natural images. This work first makes a connection between NADEs and Generative Stochastic Networks (GSNs). The standard NADE is then improved by introducing multiple iterations in its inference without increasing the number of parameters; the result is dubbed iterative NADE. Beginning with a historical view, this work ends with a summary of recent developments related to the contributions of the two main parts, centred on learning visual representations for images and videos. Promising directions for future research are outlined at the end.
Deep Neural Networks for Visual Reasoning, Program Induction, and Text-to-Image Synthesis.
Deep neural networks excel at pattern recognition, especially in the setting of large scale supervised learning. A combination of better hardware, more data, and algorithmic improvements has yielded breakthroughs in image classification, speech recognition and other perception problems. The research frontier has shifted towards the weak side of neural networks: reasoning, planning, and (like all machine learning algorithms) creativity. How can we advance along this frontier using the same generic techniques so effective in pattern recognition, i.e. gradient descent with backpropagation? In this thesis I develop neural architectures with new capabilities in visual reasoning, program induction and text-to-image synthesis. I propose two models that disentangle the latent visual factors of variation that give rise to images, and enable analogical reasoning in the latent space. I show how to augment a recurrent network with a memory of programs that enables the learning of compositional structure for more data-efficient and generalizable program induction. Finally, I develop a generative neural network that translates descriptions of birds, flowers and other categories into compelling natural images. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135763/1/reedscot_1.pd
Neural Language Models for Data-Driven Programming Support
Programming can be hard to learn and master. Search engines and social Q&A websites offer tremendous help to programmers, but great expertise (e.g., "Google-fu") is required to use these resources efficiently and solve complex problems successfully. An integrated system that can recognize a programmer's tasks and provide contextualized solutions is thus desirable, and ideally programmers can interact with the system using natural input channels, in a way similar to how they communicate with a human expert. To enable such an integrated system, neural language models constitute a promising solution. These models encode programming language in the same high-dimensional space as data of other modalities, and can be trained in an end-to-end fashion. By leveraging the massive data about programming knowledge that are available online, including social Q&A websites, tutorials, blogs, and open-source code repositories, we can train neural language models to support a variety of user intentions, including the long-tail ones. We propose three studies related to using neural language models to solve programming problems in practice. First, we introduce CodeMend, an intelligent programming assistant that supports interactive programming. The system employs a bimodal embedding model to encode programming language and natural language in the same vector space. We demonstrate that this model can effectively understand the code context and associate it with user input to suggest relevant code modifications. We also develop a novel user interface to render search results in a way that makes the problem-solving process more efficient. Second, we propose a deep learning pipeline that converts data visualization images to source code. The pipeline is built using computer vision techniques and recurrent neural networks, and it lets the user obtain source code generated from visual examples. We develop novel techniques that augment an existing limited set of training samples via code parameterization and random variation. We also propose strategies that can adapt the general-purpose neural language model to fit the task of predicting source code. Third, we introduce LAMVI, a set of visualization tools for diagnosing issues with neural language models. It tracks the ranks of individual candidate outputs for user-selected queries, and supports the exploration of the corresponding hidden-layer activations. It also tracks influential training instances, and provides guidance on actions for tuning the model. The system, evaluated on simulated datasets, helps the user efficiently adapt mature neural language models to new datasets or new tasks. Collectively, these three components form an integral solution to computer-assisted problem solving for programmers driven by big data, and may have impact on various domains, including natural language processing, machine learning, software engineering, and interactive data visualization. PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138509/1/ronxin_1.pd