205 research outputs found
Data driven approaches for investigating molecular heterogeneity of the brain
It has been proposed that one of the clearest organizing principles for most sensory systems is the existence of parallel subcircuits and processing streams that form orderly and systematic mappings from stimulus space to neurons. Although the spatial heterogeneity of the early olfactory circuitry has long been recognized, we know comparatively little about the circuits that propagate sensory signals downstream. Investigating the potential modularity of the bulb's intrinsic circuits proves to be a difficult task as termination patterns of converging projections, as with the bulb's inputs, are not feasibly realized. Thus, if such circuit motifs exist, their detection essentially relies on identifying differential gene expression, or "molecular signatures," that may demarcate functional subregions. With the arrival of comprehensive (whole genome, cellular resolution) datasets in biology and neuroscience, it is now possible to carry out large-scale, systematic investigations of the molecular topography of the olfactory bulb's intrinsic circuits, making particular use of the densely catalogued, whole genome expression maps of the Allen Brain Atlas. To address the challenges associated with high-throughput and high-dimensional datasets, a deep learning approach will form the backbone of our informatic pipeline. In the proposed work, we test the hypothesis that the bulb's intrinsic circuits are parceled into distinct, parallel modules that can be defined by genome-wide patterns of expression. In pursuit of this aim, our deep learning framework will facilitate the group-registration of the mitral cell layers of ~50,000 in-situ olfactory bulb circuits
Towards Content-based Pixel Retrieval in Revisited Oxford and Paris
This paper introduces the first two pixel retrieval benchmarks. Pixel
retrieval is segmented instance retrieval. Just as semantic segmentation extends
classification to the pixel level, pixel retrieval is an extension of image
retrieval and offers information about which pixels are related to the query
object. In addition to retrieving images for the given query, it helps users
quickly identify the query object in true positive images and exclude false
positive images by denoting the correlated pixels. Our user study results show
pixel-level annotation can significantly improve the user experience.
Compared with semantic and instance segmentation, pixel retrieval requires a
fine-grained recognition capability for variable-granularity targets. To this
end, we propose pixel retrieval benchmarks named PROxford and PRParis, which
are based on the widely used image retrieval datasets, ROxford and RParis.
Three professional annotators label 5,942 images with two rounds of
double-checking and refinement. Furthermore, we conduct extensive experiments
and analysis on the SOTA methods in image search, image matching, detection,
segmentation, and dense matching using our pixel retrieval benchmarks. Results
show that the pixel retrieval task is challenging for these approaches and
distinctive from existing problems, suggesting that further research can
advance content-based pixel retrieval and thus the user search experience. The
datasets can be downloaded from
https://github.com/anguoyuan/Pixel_retrieval-Segmented_instance_retrieval
Complex Ecologies: Micro-Evidence for Storage Landscapes in Early Bronze Age Lebanon
This dissertation presents the results of an archaeological investigation into the environmental strategies of emergent aggregated societies in coastal Lebanon over the course of the Early Bronze Age (c. 3200-2400 BCE). The Early Bronze Age not only marked the rise of large-scale urbanized polities in neighboring regions of Mesopotamia and, to a lesser extent, the Southern Levant, but also took place during the dramatic climate variability of the Middle Holocene. This dissertation uses the analysis of microbotanical and ground stone tool data to assess agricultural strategies, land use, and plant processing technologies at two settlements along the Lebanese littoral during this time of political and climatic upheaval. By comparing phytolith data, stone tool use-wear and microbotanical residues from grinding tools from the sites of Sidon and Tell Fadous-Kfarabida, this project reconstructs local plant and stone environments and the choices that populations were making about those resources over time. It concludes that selectivity between conservative and innovative plant management technologies allowed these settlements to maintain small-scale local networks built into the landscape and to participate in, while resisting incorporation into, growing urban and state economies nearby
'Doing Food-Knowing Food: An Exploration of Allotment Practices and the Production of Knowledge Through Visceral Engagement'
The original contribution of this thesis lies in its conceptualisations of human more-than-human encounters on the allotment that break down the boundaries of subjectivities. This work extends knowledge of cultural food geography by investigating how people engage with the matter of the plot and learn to grow food. The conceptual tool by which this occurs is set out as processes of visceral learning within a framework of mattering. Therefore, this work follows the material transformations of matter across production-consumption cycles of allotment produce. This is examined through processes of bodily adaptation to the matter of the plot. Growing your own food affords an opportunity to focus on the processes of doing and becoming, allowing the how of food growing to take centre stage (Crouch 2003, Ingold 2010, Grosz 1999). Procuring and producing food for consumption is enacted through the human more-than-human interface of bodily engagement, which disrupts dualisms and reveals their complex inter-relationships, as well as the potential of visceral research (Roe 2006, Whatmore 2006, Hayes-Conroy 2008). Therefore, this is an immersive account of the procurement of food and the development of food knowledge through material, sensory and visceral becomings, which occur within a contextual frame of everyday food experiences.
This study is contextualised in the complexities of contemporary food issues, where matters of access, foodism and sustainability shape the enquiry. However, the research is carried out through a micro-geographies lens of bodily engagements with food matter through grow-your-own practices on allotments. Growing food on new allotments is the locus of procurement, reflecting a resurgence in such activities that follows the recent rise in interest in local food, alternative food networks (AFNs) and food as a conduit for celebrity in the media (Dupuis & Goodman 2005, Lockie & Kitto 2000, Winter 2003). Moreover, the current spread of the allotment is examined as transgressing urban/rural divides and disrupting traditional perceptions of plot users. This allows investigations into spaces where community processes can unfold, providing a richly observed insight into the broadened demographics of recent allotment life
Unsupervised quantification of entity consistency between photos and text in real-world news
In today's information age, the World Wide Web and social media are important sources for news and information. Different modalities (in the sense of information encoding) such as photos and text are typically used to communicate news more effectively or to attract attention. Communication scientists, linguists, and semioticians have studied the complex interplay between modalities for decades and investigated, e.g., how their combination can carry additional information or add a new level of meaning.
The number of shared concepts or entities (e.g., persons, locations, and events) between photos and text is an important aspect to evaluate the overall message and meaning of an article. Computational models for the quantification of image-text relations can enable many applications. For example, they allow for more efficient exploration of news, facilitate semantic search and multimedia retrieval in large (web) archives, or assist human assessors in evaluating news for credibility. To date, only a few approaches have been suggested that quantify relations between photos and text. However, they either do not explicitly consider the cross-modal relations of entities, which are important in the news, or rely on supervised deep learning approaches that can only detect the cross-modal presence of entities covered in the labeled training data. To address this research gap, this thesis proposes an unsupervised approach that can quantify entity consistency between photos and text in multimodal real-world news articles.
The first part of this thesis presents novel approaches based on deep learning for information extraction from photos to recognize events, locations, dates, and persons. These approaches are an important prerequisite to measure the cross-modal presence of entities in text and photos. First, an ontology-driven event classification approach that leverages new loss functions and weighting schemes is presented. It is trained on a novel dataset of 570,540 photos and an ontology with 148 event types. The proposed system outperforms approaches that do not use structured ontology information. Second, a novel deep learning approach for geolocation estimation is proposed that uses additional contextual information on the environmental setting (indoor, urban, natural) and from earth partitions of different granularity. The proposed solution outperforms state-of-the-art approaches, which are trained with significantly more photos. Third, we introduce the first large-scale dataset for date estimation with more than one million photos taken between 1930 and 1999, along with two deep learning approaches that treat date estimation as a classification and regression problem. Both approaches achieve very good results that are superior to human annotations. Finally, a novel approach is presented that identifies public persons and their co-occurrences in news photos extracted from the Internet Archive, which collects time-versioned snapshots of web pages that are rarely enriched with metadata relevant to multimedia retrieval. Experimental results confirm the effectiveness of the deep learning approach for person identification.
The second part of this thesis introduces an unsupervised approach capable of quantifying image-text relations in real-world news. Unlike related work, the proposed solution automatically provides novel measures of cross-modal consistency for different entity types (persons, locations, and events) as well as the overall context. The approach does not rely on any predefined datasets, so that it can cope with the large amount and diversity of entities and topics covered in the news. State-of-the-art tools for natural language processing are applied to extract named entities from the text. Example photos for these entities are automatically crawled from the Web. The proposed methods for information extraction from photos are applied to both news images and example photos to quantify the cross-modal consistency of entities. Two tasks are introduced to assess the quality of the proposed approach in real-world applications. Experimental results for document verification and retrieval of news with either low (potential misinformation) or high cross-modal similarities demonstrate the feasibility of the approach and its potential to support human assessors in studying news
The development and application of the use of encased voids within the body of glass artefacts as a means of drawing and expression
This practice-led thesis is based on a study of the use of encased voids or bubbles in
glass. The study is grounded in practice and draws out through antecedents in
philosophy, psychology and epistemology, a methodology called Reflective Risk. It
shows that through a rigorous analysis of practice, using video and personal
reflection, new insights emerge. The study is framed by craft practice (the word
craft here used as a collection of 'genre' of which glass is part). The thesis uses
experiential learning as a tool and a means of understanding the practice of creating
and controlling encased voids in glass in the context of contemporary applied arts
practice. The framework, Reflective Risk, is constructivist in approach. It is based on
Experiential Learning Theory (ELT), but it also draws on epistemological theories of
tacit knowledge. The thesis shows that through an understanding of technique and
material qualities, process can be deconstructed to reveal new insights. The thesis
documents how an understanding of ELT and a range of self-regulatory antecedents
can influence the cognitive process of craft practice through praxis. The results of
this study are directed, on the one hand, to glass practitioners and, on the other, to
reflective practitioners working in other media, for whom they provide a theoretical
approach that can be adopted through a parallel method of enquiry
Text-to-Video: Image Semantics and NLP
When aiming to automatically translate an arbitrary text into a visual story, the main challenge consists in finding a semantically close visual representation whose displayed meaning remains the same as in the given text. Besides, the appearance of an image itself largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation.
In recent years, social networking has attracted great interest, leading to an enormous and still increasing amount of data available online. Photo sharing sites like Flickr allow users to associate textual information with their uploaded imagery. Thus, this thesis exploits this huge knowledge source of user-generated data, which provides initial links between images and words, and other meaningful data.
In order to approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings for ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and make use of deep learning to directly learn aesthetic rankings from a broad diversity of user reactions in social online behavior. To gain even deeper insights into the influence of visual appearance on an observer, we explore how simple image processing is capable of actually changing the emotional perception and derive a simple but effective image filter.
To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations to texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that is capable of generating picture stories in different styles that are not only semantically relevant but also visually coherent.
The integrated sound, space and movement environment: The uses of analogue and digital technologies to correlate topographical and gestural movement with sound
This thesis investigates correlations between auditory parameters and parameters associated with movement in a sensitised space. The research examines those aspects of sound that form correspondences with movement, force or position of a body or bodies in a space sensitised by devices for acquiring gestural or topographical data. A wide range of digital technologies is scrutinised to establish which are most effective for obtaining detailed and accurate information about movement in a given space, together with the methods and procedures for analysis, transposition and synthesis into sound. The thesis describes pertinent work in the field from the last 20 years, the issues that have been raised in those works and issues raised by my work in the area. The thesis draws conclusions that point to further development of an integrated model of a space that is sensitised to movement, and responds in sound in such a way that it can be appreciated by performers and audiences. The artistic and research practices that are cited are principally from the areas of dance-and-technology, sound installation and alternative gestural controllers for musical applications