    Visually-Enabled Active Deep Learning for (Geo) Text and Image Classification: A Review

    This paper investigates recent research on active learning for (geo) text and image classification, with an emphasis on methods that combine visual analytics and/or deep learning. Deep learning has attracted substantial attention across many domains of science and practice because it can find intricate patterns in big data, but successful application of the methods requires a large set of labeled data. Active learning, which has the potential to address the data labeling challenge, has already had success in geospatial applications such as trajectory classification from movement data and (geo) text and image classification. This review is intended to be particularly relevant for extension of these methods to GIScience, to support work in domains such as geographic information retrieval from text and image repositories, interpretation of spatial language, and related geo-semantics challenges. Specifically, to provide a structure for leveraging recent advances, we group the relevant work into five categories: active learning, visual analytics, active learning with visual analytics, active deep learning, and GIScience and Remote Sensing (RS) using active learning and active deep learning. Each category is exemplified by recent influential work. Based on this framing and our systematic review of key research, we then discuss some of the main challenges of integrating active learning with visual analytics and deep learning, and point out research opportunities from technical and application perspectives; for application-based opportunities, we emphasize those that address big data with geospatial components.

    Annotating speaker stance in discourse: the Brexit Blog Corpus

    The aim of this study is to explore the possibility of identifying speaker stance in discourse, to provide an analytical resource for it, and to evaluate the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts was compiled, the Brexit Blog Corpus (BBC). An analytical protocol and interface (Active Learning and Visual Analytics) for the annotations was set up and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process has returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC.

    Detection of Stance-Related Characteristics in Social Media Text

    In this paper, we present a study for the identification of stance-related features in text data from social media. Based on our previous work on stance and our findings on stance patterns, we detected stance-related characteristics in a data set from Twitter and Facebook. We extracted various corpus-, quantitative- and computational-based features that proved to be significant for six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge, and uncertainty), and we tested them in our data set. The results of a preliminary clustering method are presented and discussed as a starting point for future contributions in the field. The results of our experiments showed a strong correlation between different characteristics and stance constructions, which can lead us to a methodology for automatic stance annotation of these data.

    Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data

    Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even as intangible heritage such as biographies of people, performing arts, cultural customs and rites. The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) do on a daily basis. These rich metadata collections are used to categorize, structure, and study collections, but can also be used to apply computational methods. Such computational methods are the focus of Computational and Digital Humanities projects and research. For the longest time, the digital humanities community has focused on textual corpora, including text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations. In recent years, the digital humanities community has started to shift the focus to include other modalities, such as audio-visual data. In turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora. Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand. This includes both methods and systems that support classical Close Reading of the material and Distant Reading methods that give an overview of larger collections, as well as methods in between, such as Meso Reading. Furthermore, a wider application of machine learning methods can be observed on cultural heritage collections, but these methods are rarely applied together with visualizations to allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting. Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest.
However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth. One form of ground truth is class labels, e.g., of entities depicted in an image collection, assigned to the individual images. Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of different objects with plenty of details. A problem that arises with these collections, which are curated in different institutions, is that a specific standard is not always followed, so the vocabularies used can drift apart from one another, making it difficult to combine the data from these institutions for large-scale analysis. This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data. First, we define cultural data with regard to heritage and contemporary data, then we look at the state of the art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections. After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse. Next, we engage in a more complex case of text reuse, the collation of medieval vernacular text editions. For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages supported by labeling of the relations between lines and the relations between words. We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks.
With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts and so combine some of the machine learning methods and visualizations that were used for textual data with computer vision methods. Finally, we give reflections on the interdisciplinary projects and the lessons learned, before we discuss existing challenges when working with cultural heritage data from the computer science perspective to outline potential research directions for machine learning and visual analytics of cultural heritage data.

    Evaluating stance-annotated sentences from political blogs regarding the Brexit: a quantitative analysis

    This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features, such as punctuation, words and grammatical categories that occur in the sentences, in order to describe the specific characteristics of each category, and secondly, it compares characteristics in the entire data set in order to determine stance similarities in the data set. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.

    Bayesian Quadrature with Prior Information: Modeling and Policies

    Quadrature is the problem of estimating intractable integrals. Such integrals regularly arise in engineering and the natural sciences, especially when Bayesian methods are applied; examples include model evidences, normalizing constants and marginal distributions. This dissertation explores Bayesian quadrature, a probabilistic, model-based quadrature method. Specifically, we study different ways in which Bayesian quadrature can be adapted to account for different kinds of prior information one may have about the task. We demonstrate that by taking into account prior knowledge, Bayesian quadrature can outperform commonly used numerical methods that are agnostic to prior knowledge, such as Monte Carlo based integration. We focus on two types of information that (a) are frequently available when faced with an intractable integral and (b) can be (approximately) incorporated into Bayesian quadrature:
    • Natural bounds on the possible values that the integrand can take, e.g., when the integrand is a probability density function, it must be nonnegative everywhere.
    • Knowledge about how the integral estimate will be used, i.e., for settings where quadrature is a subroutine, different downstream inference tasks can result in different priorities or desiderata for the estimate.
These types of prior information are used to inform two aspects of the Bayesian quadrature inference routine:
    • Modeling: how the belief on the integrand can be tailored to account for the additional information.
    • Policies: where the integrand will be observed given a constrained budget of observations.
This second aspect of Bayesian quadrature, policies for deciding where to observe the integrand, can be framed as an experimental design problem, where an agent must choose locations to evaluate a function of interest so as to maximize some notion of value.
We will study the broader area of sequential experimental design, applying ideas from Bayesian decision theory to develop an efficient and nonmyopic policy for general sequential experimental design problems. We consider other sequential experimental design tasks such as Bayesian optimization and active search; in the latter, we focus on facilitating human–computer partnerships with the goal of aiding human agents engaged in data foraging through the use of active search based suggestions and an interactive visual interface. Finally, this dissertation will return to Bayesian quadrature and discuss the batch setting for experimental design, where multiple observations of the function in question are made simultaneously.

    Art Museum attendance and the public realm: The agency of visitor information in Tate's organisational practices of making the art museum's audiences

    This study presents an original contribution to knowledge in its investigation of Tate’s strategic practices of audience, via materially-traced networks of action. In recent years, museological literature has examined issues of access and evaluation, their relation to cultural policy, and the wider framework of value delivery within the public realm. The present study employs ethnographic observation over a fifteen-month period, combined with a theoretical approach, to trace and describe the social construction of Tate’s understandings of its audiences. The study provides insights into how visitor information is generated, distributed, mediated, valued and applied across the various departments of the museum, and in what forms it exerts agency upon the daily practices of the art museum. This study advances understandings of audiences within museological discourse by moving beyond the customary calls for the generation of more data, or improved data-collection methods, to consider the effects of the application of visitor information in the formation of audiences, and the significance of this agency in terms of structures of power.

    Research and innovation 2019

    Research and innovation are two pillars that come together when universities are at stake. The expansion of the frontiers of human knowledge, in all areas and disciplines, is an irrefutable commitment of higher education institutions. Together with public and private entities, they are also committed to promoting knowledge transfer to society and the economy, in the form of new ideas, new products and new processes. Universities are supposed to transform ideas into value for society. To achieve these goals, higher education institutions have to ensure that their human resources are highly qualified, that they have an adequate atmosphere, that research is of high quality, and finally that adequate interactions take place. At UMinho we have a clear strategy to be an open and permanent space for knowledge production and furtherance of nationally and internationally relevant innovation across different social and economic sectors. For many years, UMinho has adopted the principles of open access and open science. We aim at carrying out our scientific activity and the dissemination of the corresponding results transparently and collaboratively; this implies that researchers, citizens, policymakers, state agencies, companies, and third sector organizations work in close cooperation in research and innovation processes. We believe this is the shortest way to trigger smart and sustainable growth and qualified job creation. At UMinho, we encourage the coupling between research and education. Our goal is to expand research opportunities and to give our students occasions to experience vibrant research environments, ensuring that learning goes beyond the “common” routines. Joining research and learning processes provides both undergraduate and postgraduate students with opportunities to own their learning process. We believe that research experience has a role to play in improving students’ motivation for learning, in the pursuit of their interests.
Doing better science occurs when we make it both more sensitive to the needs of society and more efficient in its use of the allocated resources. It is also a question of accountability. This is fundamental for reinforcing society's awareness of our contributions to human and social development. Following the 2018 publication, we present here the 2019 edition of Research and Innovation, a series that draws on the outcomes of the activity of the UMinho research and innovation ecosystem. This comprehensive volume gives particular emphasis to the Research Units' outcomes, namely in terms of funding, research projects, papers, and the most important achievements; the activity of the Interface Units and Collaborative Laboratories in which UMinho participates is also reported, through their activities and institutional projects, making evident their importance for the continuous growth of our Institution, our region, and our country. Rui Vieira de Castro, Rector