Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling
This dissertation addresses the problem of employing 3D depth information to solve a number of traditionally challenging computer vision/graphics problems. Humans can perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects, and understand the geometric structure and semantic meaning of the visual world. It is therefore worthwhile to explore how computer vision systems can use 3D depth information to mimic these abilities. This dissertation employs 3D depth information to solve vision/graphics problems in three areas: scene understanding, image enhancement, and 3D reconstruction and modeling.
To address the scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used to segment and recognize object classes in street-scene images. We demonstrate a scene parsing algorithm that, using only dense 3D depth information, outperforms approaches based on sparse 3D or 2D appearance features.
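The five features themselves are not enumerated in this abstract. As an illustration, here is a minimal sketch of the kind of view-independent geometry one can derive from a dense depth map, assuming a simple pinhole back-projection; the focal lengths and principal-point defaults are invented for the example and are not taken from the dissertation:

```python
import numpy as np

def backproject(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Back-project a dense depth map (H, W) into camera-space 3D points (H, W, 3)."""
    h, w = depth.shape
    cx = (w - 1) / 2 if cx is None else cx
    cy = (h - 1) / 2 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def surface_normals(points):
    """Estimate per-pixel surface normals from finite differences of the point map."""
    dx = np.gradient(points, axis=1)
    dy = np.gradient(points, axis=0)
    n = np.cross(dx, dy)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)
```

Per-pixel 3D positions and surface normals of this sort are typical building blocks for depth-only features such as surface orientation and height above ground.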
To address the image enhancement problem, we present a framework that overcomes the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we tackle a number of traditionally challenging image enhancement tasks and achieve high-quality results with simple and robust algorithms.
To address the 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions, and wide variation make reconstructing their 3D models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model constructed from individually scanned 3D exemplar petals. Novel constraints based on botanical studies are incorporated into the fitting process to realistically reconstruct occluded regions and maintain correct 3D spatial relations.
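Fitting a morphable shape model of this kind typically reduces to solving for blend coefficients over a basis learned from exemplars. The sketch below assumes a PCA-style linear model with a ridge term, a common formulation and not necessarily the dissertation's exact fitting procedure; the botanical constraints are omitted:

```python
import numpy as np

def fit_morphable_model(observed, mean_shape, basis, reg=1e-2):
    """Solve for blend coefficients a minimizing
    ||mean_shape + basis @ a - observed||^2 + reg * ||a||^2.

    observed, mean_shape: (3N,) flattened point coordinates
    basis: (3N, K) PCA basis built from exemplar petal scans
    """
    A = basis.T @ basis + reg * np.eye(basis.shape[1])
    b = basis.T @ (observed - mean_shape)
    coeffs = np.linalg.solve(A, b)
    return coeffs, mean_shape + basis @ coeffs
```

In practice the observed petal points would first be aligned and scale-normalized, and occlusion constraints would enter as additional terms in the objective.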
The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision/graphics problems. By developing advanced algorithms that run automatically or with minimal user interaction, this dissertation demonstrates that the 3D depth computed behind multiple images carries rich information about the visual world and can therefore be intelligently exploited to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models.
Metabolizing Capital: Writing, Information, and the Biophysical World
While the discipline of rhetoric and composition has looked at a variety of topics related to the materiality of writing, the majority of materialist approaches limit their scope to local, situated writing practices. However, with the spread of digital media and the establishment of a global, networked infrastructure for communication and inscription, the abundant textuality that has emerged in the early 21st century demands that we develop more rigorous materialist approaches to the study and teaching of writing.
This growing textual environment has been called, in popular and academic discourse, Web 2.0: a more “social Web” than its early form in the late 1990s, one that encourages more interaction and collaboration between users. The ethos of sharing that defines Web 2.0 has been celebrated by writing scholars as a qualitatively new public sphere where we are writing and participating more than ever. Yet underlying our exuberance over Web 2.0 is the problematic assumption that more writing is an intrinsic good. As more writing is produced, the logic goes, the richer the opportunities for human agency. In a world of infinite resources, such a productivist ethos makes sense; but in a world of finite resources, one whose health is intertwined with our global network of writing technologies, unrestrained textual production has become a threat to other human and nonhuman systems.
In this dissertation, I analyze current materialist approaches to writing to theorize how the usefulness of Web 2.0 technologies, and the writing labor they harness, have become necessary agents in the production of capitalist, consumer culture. Drawing on ecological models of writing and supplementing them with Marxian concepts of value, metabolism, and capital circulation, I explore the historical and dialectical relations that have given rise to a new phase of digital culture, one called Web 3.0, where the celebrated use value of Web 2.0 writing is eclipsed by the ascendant exchange value of Big Data, the massive substratum of consumer data produced as a by-product of our writing. Because the economic value of user data depends on two critical resources, the labor of our writing and the finite natural resources of the planet, our celebration of the productivity of Web 2.0 is in direct antagonism with other natural systems, including the organic system of the writing body. I conclude with a sequence of writing activities designed to help students foster critical, ecological literacies that will prepare them to grapple with the social and ecological problems emerging in Web 3.0.
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning, vision
and language. The problem of choice is image caption generation:
automatically constructing natural language descriptions of image
content. Previous research into image caption generation has
focused on generating purely descriptive captions; I focus on
generating visually relevant captions with a distinct linguistic
style. Captions with style have the potential to ease
communication and add a new layer of personalisation.
First, I consider naming variations in image captions, and
propose a method for predicting context-dependent names that
takes into account visual and linguistic information. This method
makes use of a large-scale image caption dataset, which I also
use to explore naming conventions and report naming conventions
for hundreds of animal classes. Next I propose the SentiCap
model, which relies on recent advances in artificial neural
networks to generate visually relevant image captions with
positive or negative sentiment. To balance descriptiveness and
sentiment, the SentiCap model dynamically switches between two
recurrent neural networks, one tuned for descriptive words and
one for sentiment words. As the first published model for
generating captions with sentiment, SentiCap has influenced a
number of subsequent works. I then investigate the sub-task of
modelling styled sentences without images. The specific task
chosen is sentence simplification: rewriting news article
sentences to make them easier to understand.
For this task I design a neural sequence-to-sequence model that can work with limited training data, using novel adaptations for word copying and sharing word embeddings. Finally, I present SemStyle, a system for generating visually relevant image captions in the style of an arbitrary text corpus. A shared term space allows a neural network for vision and content planning to communicate with a network for styled language generation. SemStyle achieves competitive results in human and automatic evaluations of descriptiveness and style.
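The dynamic switching in SentiCap can be pictured, very roughly, as a learned gate mixing the next-word distributions of two decoders. This toy numpy sketch shows only the mixing mechanism; all shapes, parameters, and the sigmoid gate are invented for illustration rather than taken from the thesis:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def switched_word_distribution(h, W_desc, W_sent, w_gate):
    """Mix a descriptive and a sentiment decoder's next-word distributions.

    h: shared hidden state (D,)
    W_desc, W_sent: (V, D) output projections of the two streams
    w_gate: (D,) parameters of a scalar switch (sigmoid)
    """
    gate = 1.0 / (1.0 + np.exp(-w_gate @ h))   # prob. of using the sentiment stream
    p_desc = softmax(W_desc @ h)
    p_sent = softmax(W_sent @ h)
    return (1 - gate) * p_desc + gate * p_sent
```

At each decoding step the gate decides, softly, whether the next word should come from the stream tuned for descriptive words or the one tuned for sentiment words.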
As a whole, this thesis presents two complete systems for styled caption generation that are the first of their kind and demonstrate, for the first time, that automatic style transfer for image captions is achievable. Contributions also include novel ideas for object naming and sentence simplification. This thesis opens up inquiries into highly personalised image captions; large-scale visually grounded concept naming; and, more generally, styled text generation with content control.
Analyzing Twitter Feeds to Facilitate Crises Informatics and Disaster Response During Mass Emergencies
It is common practice these days for the general public to use various micro-blogging platforms, predominantly Twitter, to share ideas, opinions, and information. Twitter is also increasingly used as a source of information sharing during natural disasters and mass emergencies: to report the extent of the geographic phenomena, the affected population, and casualties; to request or offer volunteer services; and to share the status of the disaster recovery process initiated by humanitarian-aid and disaster-management organizations. Recent research in this area has affirmed the potential of such social media data for various disaster response tasks. Even though social media data is massive, open, and free, making sense of it is significantly limited by its high volume, variety, velocity, value, variability, and veracity. The current work provides a comprehensive framework of text processing and analysis performed on several thousand tweets shared on Twitter during natural disaster events. Specifically, this work employs state-of-the-art machine learning techniques from natural language processing on tweet content to process the enormous data generated at the time of disasters. This study serves as a basis for providing useful, actionable information to crisis management and mitigation teams in planning and preparing effective disaster response, and for facilitating the development of future automated systems for handling crisis situations.
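As a concrete, deliberately simplified illustration of the kind of tweet categorization such a framework performs, here is a self-contained multinomial Naive Bayes classifier over a toy set of labeled tweets. The labels and examples are invented for the sketch; the actual work uses state-of-the-art NLP models rather than this bag-of-words baseline:

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and strip common punctuation and @/# prefixes."""
    return [t.strip("#@.,!?").lower() for t in text.split() if t.strip("#@.,!?")]

class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-one smoothing over tweet tokens."""

    def fit(self, tweets, labels):
        self.classes = set(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        for text, label in zip(tweets, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        def log_score(c):
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            return score
        return max(self.classes, key=log_score)
```

A real pipeline would add tweet-specific normalization (URLs, retweet markers, emoji) and far richer features, but the structure of mapping raw tweet text to an actionable category is the same.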
From rituals to magic: Interactive art and HCI of the past, present, and future
The connection between art and technology is much tighter than is commonly recognized. The emergence of aesthetic computing in the early 2000s has brought renewed focus on this relationship. In this article, we articulate how art and Human–Computer Interaction (HCI) are compatible with each other and in fact essential to advancing each other in this era, by briefly addressing interconnected components in both areas: interaction, creativity, embodiment, affect, and presence. After briefly introducing the history of interactive art, we discuss how art and HCI can contribute to one another by illustrating contemporary examples of art in immersive environments, robotic art, and machine intelligence in art. Then, we identify challenges and opportunities for collaborative efforts between art and HCI. Finally, we reiterate important implications and pose future directions. This article is intended as a catalyst to facilitate discussions on the mutual benefits of working together in the art and HCI communities. It also aims to provide artists and researchers in this domain with suggestions about where to go next.
TALK COMMONSENSE TO ME! ENRICHING LANGUAGE MODELS WITH COMMONSENSE KNOWLEDGE
Human cognition is fascinating: it is a mesh of several neural phenomena that drive our ability to constantly reason and infer about the surrounding world. In computational cognitive science, Commonsense Reasoning is the term given to our ability to infer uncertain events and reason about cognitive knowledge. Introducing commonsense into intelligent systems has long been desired, but the mechanism for this introduction remains a scientific jigsaw. Some implicitly believe that language understanding alone is enough to achieve some level of commonsense [90]. On less common ground, others think that enriching language with Knowledge Graphs might suffice for human-like reasoning [63], while still others believe that human-like reasoning can only truly be captured with symbolic rules and logical deduction powered by Knowledge Bases, such as taxonomies and ontologies [50]. We focus on integrating Commonsense Knowledge into Language Models, because we believe this integration is a step towards a beneficial embedding of Commonsense Reasoning in interactive Intelligent Systems, such as conversational assistants.
Conversational assistants, such as Amazon's Alexa, are user-driven systems. A more human-like interaction is therefore strongly desired, to truly capture the user's attention and empathy. We believe such humanistic characteristics can be achieved through the introduction of stronger Commonsense Knowledge and Reasoning, enabling the system to engage fruitfully with users.
To this end, we introduce a new family of models, Relation-Aware BART (RA-BART), which leverages the language generation abilities of BART [51] together with explicit Commonsense Knowledge extracted from Commonsense Knowledge Graphs to further extend the human-like capabilities of these models.
We evaluate our model on three different tasks: Abstractive Question Answering, Text Generation conditioned on certain concepts, and a Multi-Choice Question Answering task. We find that, on generation tasks, RA-BART outperforms non-knowledge-enriched models; however, it underperforms on the multi-choice question answering task.
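The abstract does not specify how the extracted knowledge reaches the model. One common scheme for knowledge-enriched seq2seq models, assumed here purely for illustration and not necessarily RA-BART's mechanism, is to linearize retrieved knowledge-graph triples into the encoder input with separator tokens:

```python
def serialize_with_knowledge(question, triples, sep="<kg>", rel_sep="<rel>"):
    """Append linearized knowledge-graph triples to the model input.

    triples: iterable of (head, relation, tail), e.g. retrieved from a
    commonsense knowledge graph. The separator tokens are illustrative;
    in practice they would be added to the tokenizer's vocabulary before
    fine-tuning a seq2seq model such as BART.
    """
    kg = " ".join(f"{sep} {h} {rel_sep} {r} {rel_sep} {t}" for h, r, t in triples)
    return f"{question} {kg}".strip()
```

The augmented string is then tokenized and fed to the encoder like any other input, letting the decoder attend to the explicit relations during generation.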
Our project can be consulted in our open-source, public GitHub repository (Explicit Commonsense).
Practical deep learning
Deep learning is experiencing a revolution, with tremendous progress driven by the availability of large datasets and computing resources. The development of deeper and larger neural network models has recently boosted the accuracy of many applications, such as image classification, image captioning, object detection, and language translation. However, despite the opportunities they offer, existing deep learning approaches are impractical for many applications due to the following challenges. Many applications have only limited amounts of annotated training data, or collecting labelled training data is too expensive. Such scenarios are a significant drawback for deep learning methods, which are not designed for limited data and suffer from performance decay. This is especially true for generative tasks, because the data for many generative tasks is difficult to obtain from the real world and the results they generate are difficult to control. As deep learning algorithms become more complicated, increasing the workload for researchers to train neural network models and to manage the life cycle of deep learning workflows, including the model, dataset, and training pipeline, the demand for efficient deep learning development is rising.
Practical deep learning should achieve adequate performance from limited training data and rest on efficient deep learning development processes. In this thesis, we propose several novel methods to improve the practicability of deep generative models and development processes, leading to four contributions. First, we improve the visual quality of synthesising images conditioned on text descriptions without requiring more manually labelled data, providing controllable generated results using object attribute information from text descriptions. Second, we achieve unsupervised image-to-image translation that synthesises images conditioned on input images without requiring paired images to supervise the training, providing controllable generated results using semantic visual information from input images. Third, we deliver semantic image synthesis that synthesises images conditioned on both images and text descriptions without requiring ground truth images to supervise the training, providing controllable generated results using both semantic visual and object attribute information. Fourth, we develop a research-oriented deep learning library called TensorLayer to reduce the workload of researchers in defining models, implementing new layers, and managing the deep learning workflow comprising the dataset, model, and training pipeline. In 2017, this library won the best open source software award issued by ACM Multimedia (MM).
A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to
combine various modalities in a single joint representation. Especially in the
area of visiolinguistic (VL) learning multiple models and techniques have been
developed, targeting a variety of tasks that involve images and text. VL models
have reached unprecedented performance by extending the idea of Transformers,
so that both modalities can learn from each other. Massive pre-training
procedures enable VL models to acquire a certain level of real-world
understanding, although many gaps can be identified: the limited comprehension
of commonsense, factual, temporal and other everyday knowledge aspects
questions the extendability of VL tasks. Knowledge graphs and other knowledge
sources can fill those gaps by explicitly providing missing information,
unlocking novel capabilities of VL models. At the same time, knowledge graphs
enhance the explainability, fairness and validity of decision making, issues of
utmost importance for such complex implementations. The current survey aims
to unify the fields of VL representation learning and knowledge graphs, and
provides a taxonomy and analysis of knowledge-enhanced VL models.
NMC Horizon Report: 2017 Higher Education Edition
The NMC Horizon Report > 2017 Higher Education Edition is a collaborative effort between the NMC and the EDUCAUSE Learning Initiative (ELI). This 14th edition describes annual findings from the NMC Horizon Project, an ongoing research project designed to identify and describe emerging technologies likely to have an impact on learning, teaching, and creative inquiry in education. Six key trends, six significant challenges, and six important developments in educational technology are placed directly in the context of their likely impact on the core missions of universities and colleges. The three key sections of this report constitute a reference and straightforward technology-planning guide for educators, higher education leaders, administrators, policymakers, and technologists. It is our hope that this research will help to inform the choices that institutions are making about technology to improve, support, or extend teaching, learning, and creative inquiry in higher education across the globe. All of the topics were selected by an expert panel that represented a range of backgrounds and perspectives.