Search CORE

146 research outputs found

Contributions in image and video coding

Author: Testoni Vanessa
Publication venue: [s.n.]
Publication date: 19/08/2018

Supervisor: Max Henrique Machado Costa. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.

Abstract: The image and video coding community has also been working on innovations that go beyond traditional image and video coding techniques. This work is a set of contributions to several topics that have received increasing attention from researchers in the community, namely scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low-complexity video codec, named the Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. The whole implementation uses 16-bit integer arithmetic. Only additions and bit shifts are needed, which lowers computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved, and encoding times are significantly shorter (by a factor of around 160) than with the H.264/AVC standard. The second contribution is the optimization of a recently proposed approach to multiview video coding for videoconferencing and similar unicast-like applications. The target scenario of this approach is to provide realistic 3-D video with free viewpoint at good compression rates. To achieve this, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source, and an optimum mapping is derived for typical video sources. The third contribution explores several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order of JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders require no changes to the other coding and decoding stages or to the bitstream definition. The fourth and last contribution proposes a hierarchical signal-dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that also exploits these structural similarities across resolution levels. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed to measure the performance of the new transform, and the results obtained are discussed in this work.

Degree: Doctorate (Doctor in Electrical Engineering), Telecommunications and Telematics.
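
The idea of an embedded bitstream built bit-plane by bit-plane can be illustrated with a small sketch. It is not the FEVC coder: the function names `encode_bit_planes` and `decode_bit_planes` are hypothetical, the entropy coding stage is omitted, and the sign-magnitude layout is an assumption made for the example. It only shows why a stream ordered from the most significant plane downwards can be cut at any plane and still decoded to a coarser approximation, using integer shifts and masks only.

```python
from typing import List, Tuple


def encode_bit_planes(coeffs: List[int], num_planes: int) -> Tuple[List[int], List[List[int]]]:
    """Split integer coefficients into sign bits and magnitude bit planes.

    Planes are ordered from most significant to least significant, so a
    prefix of the plane list is a coarser description of the same data.
    (Illustrative layout only; a real codec would entropy-code each plane.)
    """
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = []
    for p in range(num_planes - 1, -1, -1):  # most significant plane first
        planes.append([(m >> p) & 1 for m in mags])
    return signs, planes


def decode_bit_planes(signs: List[int], planes: List[List[int]], num_planes: int) -> List[int]:
    """Rebuild coefficients from however many planes were received."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i  # weight of this plane
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [-m if s else m for s, m in zip(signs, mags)]


signs, planes = encode_bit_planes([13, -6, 3, 0], num_planes=4)
full = decode_bit_planes(signs, planes, num_planes=4)        # all four planes
coarse = decode_bit_planes(signs, planes[:2], num_planes=4)  # truncated stream
```

Decoding only the first two planes keeps the two most significant bits of every magnitude, which is the sense in which a truncated embedded stream trades rate for distortion.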

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Key-frame Analysis and Extraction for Automatic Summarization of Real-time Videos

Author: Kannappan Sivapriyaa
Publication date: 01/01/2019

Aberystwyth Research Portal

Holistic methods for visual navigation of mobile robots in outdoor environments

Author: Differt Dario
Publication venue: Universität Bielefeld
Publication date: 01/01/2017

Differt D. Holistic methods for visual navigation of mobile robots in outdoor environments. Bielefeld: Universität Bielefeld; 2017

Publications at Bielefeld University

Complex queries and complex data

Author: Niedermayer Johannes
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 30/10/2015

With the widespread availability of wearable computers equipped with sensors such as GPS or cameras, and with the ubiquitous presence of micro-blogging platforms, social media sites and digital marketplaces such as Amazon and eBay, data can be collected and shared on a massive scale. A necessary building block for taking advantage of this vast amount of information is efficient and effective similarity search: algorithms that find the objects in a database that are similar to a query object. Because similarity search applies across many data types and applications, formalizing the concept and developing strategies for evaluating similarity queries has evolved into an important field of research in the database community, the spatio-temporal database community, and others such as information retrieval and computer vision. This thesis concentrates on a special instance of similarity queries, namely k-Nearest Neighbor (kNN) queries and their close relative, Reverse k-Nearest Neighbor (RkNN) queries. As a first contribution we provide an in-depth analysis of the RkNN join. While reverse nearest neighbor queries have received a vast amount of research interest, the problem of performing such queries in bulk has not been analyzed in depth so far. We first formalize the RkNN join, identifying its monochromatic and bichromatic versions and their self-join variants. After pinpointing the monochromatic RkNN join as an important and interesting instance, we develop solutions for this class, including a self-pruning and a mutual pruning algorithm. We then evaluate these algorithms extensively on a variety of synthetic and real datasets. From this starting point of similarity queries on certain data we shift our focus to uncertain data, addressing nearest neighbor queries in uncertain spatio-temporal databases. Starting from the traditional definition of nearest neighbor queries and a data model for uncertain spatio-temporal data, we develop efficient query mechanisms that consider temporal dependencies during query evaluation. We define intuitive query semantics that return not only the objects closest to the query but also their probability of being a nearest neighbor. After evaluating these query predicates theoretically, we develop efficient querying algorithms for them that keep query processing within acceptable run times. Building on these findings for nearest neighbor queries, we extend the results to reverse nearest neighbor queries. Finally we address the problem of querying large datasets containing set-based objects, namely image databases, where images are represented by (multi-)sets of feature vectors and additional metadata describing the position of features in the image. We aim to reduce the number of kNN queries performed during query processing and evaluate a modified pipeline that optimizes query accuracy for a small number of kNN queries. Additionally, as feature representations in object recognition are moving more and more from the real-valued domain to the binary domain, where bit vectors need less storage and run faster, we evaluate efficient indexing techniques for binary feature vectors.
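
To make the RkNN definition concrete, here is a minimal brute-force sketch of a monochromatic RkNN query and a naive join over several query points. It is not one of the thesis's pruning algorithms; the names `rknn` and `rknn_join`, the Euclidean distance, and the rule that ties count in favor of the query are all assumptions made for illustration. A data point p belongs to the result when fewer than k other data points are strictly closer to p than the query is.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def rknn(query: Point, data: Sequence[Point], k: int) -> List[Point]:
    """Brute-force reverse k-nearest-neighbor query (quadratic in len(data)).

    Returns every data point that would have `query` among its k nearest
    neighbors. Ties are resolved in favor of the query (an assumption).
    """
    result = []
    for i, p in enumerate(data):
        d_query = dist(p, query)
        # Count the other data points that are strictly closer to p than the query.
        closer = sum(1 for j, other in enumerate(data)
                     if j != i and dist(p, other) < d_query)
        if closer < k:
            result.append(p)
    return result


def rknn_join(queries: Sequence[Point], data: Sequence[Point], k: int) -> Dict[Point, List[Point]]:
    """Naive join: run one independent RkNN query per query point."""
    return {q: rknn(q, data, k) for q in queries}


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
one = rknn((2.0, 0.0), data, k=1)
two = rknn((2.0, 0.0), data, k=2)
joined = rknn_join([(2.0, 0.0)], data, k=1)
```

The join here simply repeats the single query, which is the baseline that bulk evaluation strategies aim to improve on.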

A Voxel-Based Approach for Imaging Voids in Three-Dimensional Point Clouds

Author: Salvaggio Katie N
Publication venue: RIT Scholar Works
Publication date: 21/05/2015

Geographically accurate scene models have enormous potential beyond that of just simple visualizations in regard to automated scene generation. In recent years, thanks to ever increasing computational efficiencies, there has been significant growth in both the computer vision and photogrammetry communities pertaining to automatic scene reconstruction from multiple-view imagery. The result of these algorithms is a three-dimensional (3D) point cloud which can be used to derive a final model using surface reconstruction techniques. However, the fidelity of these point clouds has not been well studied, and voids often exist within the point cloud. Voids exist in texturally difficult areas, as well as areas where multiple views were not obtained during collection, constant occlusion existed due to collection angles or overlapping scene geometry, or in regions that failed to triangulate accurately. It may be possible to fill in small voids in the scene using surface reconstruction or hole-filling techniques, but this is not the case with larger more complex voids, and attempting to reconstruct them using only the knowledge of the incomplete point cloud is neither accurate nor aesthetically pleasing. A method is presented for identifying voids in point clouds by using a voxel-based approach to partition the 3D space. By using collection geometry and information derived from the point cloud, it is possible to detect unsampled voxels such that voids can be identified. This analysis takes into account the location of the camera and the 3D points themselves to capitalize on the idea of free space, such that voxels that lie on the ray between the camera and point are devoid of obstruction, as a clear line of sight is a necessary requirement for reconstruction. 
Using this approach, voxels are classified into three categories: occupied (contains points from the point cloud), free (rays from the camera to a point passed through the voxel), and unsampled (contains no points and no rays passed through it). Voids in the voxel space are manifested as unsampled voxels. A similar line-of-sight analysis can then be used to pinpoint locations at aircraft altitude from which the voids in the point cloud could theoretically be imaged. This work is based on the assumption that including more images of the void areas in the 3D reconstruction process will reduce the number of voids in the point cloud that resulted from lack of coverage. Voids resulting from texturally difficult areas will not benefit from more imagery in the reconstruction process, and are therefore identified and removed before future potential imaging locations are determined.
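
The three-way voxel classification described above can be sketched as follows. This is an illustrative approximation, not the author's implementation: the function names, the axis-aligned grid starting at the origin, and the fixed half-voxel sampling step along each ray are assumptions (an exact grid traversal would visit every voxel a ray touches, whereas sampling can miss voxels that a ray only clips at a corner).

```python
import math
from typing import Dict, Iterable, Tuple

Vec = Tuple[float, float, float]
Voxel = Tuple[int, int, int]

UNSAMPLED, FREE, OCCUPIED = "unsampled", "free", "occupied"


def voxel_of(p: Vec, size: float) -> Voxel:
    """Index of the voxel containing point p in a grid anchored at the origin."""
    return tuple(math.floor(c / size) for c in p)


def classify_voxels(observations: Iterable[Tuple[Vec, Vec]],
                    grid_shape: Tuple[int, int, int],
                    size: float) -> Dict[Voxel, str]:
    """Label every voxel as occupied, free, or unsampled.

    `observations` pairs each reconstructed 3D point with the camera
    position that observed it: (camera, point).
    """
    nx, ny, nz = grid_shape
    labels = {(i, j, k): UNSAMPLED
              for i in range(nx) for j in range(ny) for k in range(nz)}
    obs = list(observations)

    # Free space: voxels crossed by the line of sight from camera to point.
    for camera, point in obs:
        length = math.dist(camera, point)
        steps = max(1, int(length / (0.5 * size)))  # half-voxel sampling step
        for s in range(steps):
            t = s / steps
            pos = tuple(c + t * (p - c) for c, p in zip(camera, point))
            v = voxel_of(pos, size)
            if labels.get(v) == UNSAMPLED:
                labels[v] = FREE

    # Occupied voxels take precedence over free ones.
    for _, point in obs:
        v = voxel_of(point, size)
        if v in labels:
            labels[v] = OCCUPIED
    return labels


labels = classify_voxels([((0.5, 0.5, 0.5), (2.5, 0.5, 0.5))],
                         grid_shape=(4, 1, 1), size=1.0)
```

In this toy grid the voxel behind the observed point is never crossed by a ray and holds no points, so it stays unsampled, which is how a void would show up.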

RIT Scholar Works

Pattern Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

A wealth of advanced pattern recognition algorithms is emerging at the intersection of effective visual-feature technologies and the study of the human-brain cognition process. Effective visual features are made possible by rapid developments in sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. This book collects representative research from around the globe on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. Its 27 chapters present recent advances and new ideas in the techniques, technology and applications of pattern recognition.

Directory of Open Access Books (DOAB)

Classification of Animal Sound Using Convolutional Neural Network

Author: Singh Neha
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020

Labeling of acoustic events has recently emerged as an active topic covering a wide range of applications. High-level semantic inference can be conducted based on the main audio effects to facilitate various content-based applications for analysis, efficient recovery and content management. This paper proposes a flexible convolutional neural network-based framework for animal audio classification. The work takes inspiration from various deep neural networks recently developed for multimedia classification. The model identifies the animal sound in an audio file by forcing the network to pay attention to the core audio effect present in the audio to generate a Mel-spectrogram. The designed framework achieves an accuracy of 98% when classifying animal audio on weakly labelled datasets. The aim of this research is a framework that can run even on a basic machine and does not require high-end devices to perform the classification.

Arrow@TUDublin

Assessing the perceived environment through crowdsourced spatial photo content for application to the fields of landscape and urban planning

Author: Dunkel Alexander
Publication date: 23/06/2016

Assessing information on aspects of identification, perception, emotion, and social interaction with respect to the environment is of particular importance to the field of natural resource management. Our ability to visualize this type of information has improved rapidly with the proliferation of social media sites throughout the Internet in recent years. While many methods for extracting information on human behavior from crowdsourced geodata already exist, this work focuses on visualizing landscape perception for application to the fields of landscape and urban planning. Visualization of people's perceptual responses to landscape is demonstrated with crowdsourced photo geodata from Flickr, a popular photo sharing community. A basic, general method to map, visualize and evaluate perception and perceptual values is proposed. The approach uses common tools for spatial knowledge discovery and builds on existing research, but is specifically designed for landscape perception analysis and is particularly suited as a base for further evaluation in multiple scenarios. To demonstrate the process in application, three novel types of visualization are presented: the mapping of lines of sight in Yosemite Valley, the assessment of landscape change in the area surrounding the High Line in Manhattan, and individual location analysis for Coit Tower in San Francisco. The results suggest that analyzing crowdsourced data may contribute to a more balanced assessment of the perceived landscape, which provides a basis for better integration of public values into planning processes.

Contents:
1 Introduction
  1.1 Motivation
  1.2 Literature review and conceptual scope
  1.3 Terminology
  1.4 Related research
  1.5 Objectives
  1.6 Methodology
  1.7 Formal conventions
I. Part I: Conceptual framework
  1.1 Visual perception
  1.2 Theory and practice in landscape perception assessment
    1.2.1 Expert valuation versus participation
    1.2.2 Photography-based landscape perception assessment
      1.2.2.1 Photo-based surveys
      1.2.2.2 Photo-based Internet surveys
      1.2.2.3 Photo-interviewing and participant photography
    1.2.3 Conclusions
  1.3 Conceptual approach
    1.3.1 A framing theory: Distributed cognition
    1.3.2 Description of the approach
    1.3.3 Choosing the right data source
      1.3.3.1 Availability of crowdsourced and georeferenced photo data
      1.3.3.2 Suitability for analyzing human behavior and perception
    1.3.4 Relations between data and the phenomenon under observation
      1.3.4.1 Photo taking and landscape perception
      1.3.4.2 User motivation in the context of photo sharing in communities
      1.3.4.3 Describing and tagging photos: Forms of attributing meaning
    1.3.5 Considerations for measuring and weighting data
    1.3.6 Conclusions
II. Part II: Application example – Flickr photo analysis and evaluation of results
  2.1 Software architecture
  2.2 Materials and methods
    2.2.1 Data retrieval, initial data structure and overall quantification
    2.2.2 Global data bias
    2.2.3 Basic techniques for filtering and classifying data
      2.2.3.1 Where: photo locations
      2.2.3.2 Who: user origin
      2.2.3.3 When: time of photo taking
      2.2.3.4 What: tag frequency
    2.2.4 Methods for aggregating data
      2.2.4.1 Clustering of photo locations
      2.2.4.2 Clustering of tag locations
  2.3 Application to planning: techniques for visualizing data
    2.3.1 Introduction
    2.3.2 Tag maps
      2.3.2.1 Description of technique
      2.3.2.2 Results: San Francisco and Berkeley waterfront
      2.3.2.3 Results: Berkeley downtown and university campus
      2.3.2.4 Results: Dresden and the Elbe Valley
      2.3.2.5 Results: Greater Toronto Area and City of Toronto
      2.3.2.6 Results: Baden-Württemberg
      2.3.2.7 Summary
    2.3.3 Temporal comparison for assessing landscape change
      2.3.3.1 Description of technique
      2.3.3.2 Results: The High Line, NY
      2.3.3.3 Summary
    2.3.4 Determining lines of sight and important visual connections
      2.3.4.1 Description of technique
      2.3.4.2 Results: Yosemite Valley
      2.3.4.3 Results: Golden Gate and Bay Bridge
      2.3.4.4 Results: CN Tower, Toronto
      2.3.4.5 Summary
    2.3.5 Individual location analysis
      2.3.5.1 Description of technique
      2.3.5.2 Results: Coit Tower, San Francisco
      2.3.5.3 Results: CN Tower, Toronto
      2.3.5.4 Summary
  2.4 Quality and accuracy of results
    2.4.1 Methodology
    2.4.2 Accuracy of data
    2.4.3 Validity and reliability of visualizations
      2.4.3.1 Reliability
      2.4.3.2 Validity
  2.5 Implementation example: the London View Framework
    2.5.1 Description
    2.5.2 Evaluation methodology
    2.5.3 Analysis
      2.5.3.1 Landmarks
      2.5.3.2 Views
    2.5.4 Summary
III. Discussion
  3.1 Application of the framework from a wider perspective
  3.2 Significance of results
  3.3 Further research
  3.4 Discussion of workshop results and further feedback
    3.4.1 Workshops at University of Waterloo and University of Toronto, Canada
    3.4.2 Workshop at University of Technology Dresden, Germany
    3.4.3 Feedback from presentations, discussions, exhibitions: second thoughts
IV. Conclusions
V. References
  5.1 Literature
  5.2 List of web references
  5.3 List of figures
  5.4 List of tables
  5.5 List of maps
  5.6 List of appendices
VI. Appendices

Perception is the conscious process of subjectively understanding the environment. It is based on gathering information through the senses, that is, from visual, olfactory, acoustic and other stimuli. Perception is also substantially influenced by internal processes.
The human brain is continuously occupied, both consciously and unconsciously, with matching sensory perceptions against memories and with simplifying, associating, predicting and comparing them. For this reason it is difficult to take the perception of places and landscapes into account in planning processes. Yet exactly this is required by the European Landscape Convention, which defines landscape as a particular area as it is perceived by visitors and residents (“as a zone or area as perceived by local people or visitors”, ELC Art. 1, para. 38). While many advances and findings, for example from the cognitive sciences, now help in understanding the perception of individual people, urban and landscape planning has hardly benefited. Knowledge about how the perceptions of many people interact is lacking. The urban planner Kevin Lynch was already concerned with this shared, collective 'image' of the human environment ("generalized mental picture", Lynch, 1960, p. 4). Since then, hardly any notable progress has been made in capturing the general, public perception of city and landscape. This was the occasion and motivation for the present work. Crowdsourced data (also 'big data'), that is, large amounts of data compiled by many people on the Internet, offer a source of information on the perception of many people that has so far gone unused in planning. Compared with conventional data, for example data collected by experts and made available by public bodies, crowdsourced data open up a previously unavailable source of information for understanding the complex relationships between space, identity and subjective perception. Crowdsourced data contain only traces of human decisions.
Because of their volume, however, it is possible to gain substantial information about the perception of those who compiled them. This enables planners to understand how people perceive and interact with their immediate surroundings. In addition, it is becoming increasingly important to consider the views of many in planning processes (Lynam, De Jong, Sheil, Kusumanto, & Evans, 2007; Brody, 2004). The desire for public participation and the number of stakeholders involved are steadily increasing. Using this new source of information offers an alternative to conventional approaches such as surveys, which are used to measure, for example, the opinions, positions, values, norms or preferences of particular social groups. Because crowdsourced data make it easier to determine such sociocultural values, the results can help above all in the difficult weighting of opposing interests and views. The work shares the view that the use of crowdsourced data, by complementing expert assessments, can ultimately lead to a fairer, more balanced consideration of the general public in decision-making processes (Erickson, 2011, p. 1). A large number of methods are already available for extracting important landscape-related information from this data source. Examples are assessing the attractiveness of landscapes, determining the significance of sights or landmarks, and estimating the travel preferences of user groups. Many of the existing methods, however, were found insufficient to address the specific needs and the broad range of questions about landscape perception in urban and landscape planning. The aim of the present work is to convey practice-relevant knowledge that allows planners to explore, visualize and interpret data independently.
The key to successful implementation is seen in the synthesis of knowledge from three categories: theoretical foundations (1), technical knowledge of data processing (2), and knowledge of graphic visualization (3). The theoretical foundations are presented in the first part of the work (Part I). This part first discusses weaknesses of current methods and then proposes a new conceptual and technical approach aimed specifically at complementing existing methods. The second part (Part II) demonstrates the application of the approach using an example dataset. The questions addressed range from data retrieval, processing, analysis and visualization to the interpretation of graphics in planning processes. The basis is a dataset of 147 million georeferenced photos and 882 million tags from the photo-sharing platform Flickr, contributed by 1.3 million users between 2007 and 2015. Using these data, the development of new visualization techniques is presented by example. Examples include spatio-temporal tag clouds (an experimental technique for generating perception-weighted maps), the visualization of perceived landscape change, the mapping of perception-weighted lines of sight, and the evaluation of individual perception of and at particular places. The application of these techniques is tested and discussed at all scale levels using various test regions in the USA, Canada and Germany.
This includes, for example, the recording and assessment of lines of sight and visual relationships in Yosemite Valley, the monitoring of perceived changes around the High Line in New York, the evaluation of individual perception for Coit Tower in San Francisco, and the assessment of regionally perceived, identity-forming landscape values for Baden-Württemberg and the Greater Toronto Area (GTA). Approaches for assessing the quality and validity of visualizations are then presented. Finally, a specific implementation of the approach and the visualizations is briefly shown and discussed using a concrete planning example, the London View Management Framework (LVMF). The work emphasizes above all the broad potential that crowdsourced data hold for assessing landscape perception in urban and landscape planning. Crowdsourced photo data in particular are seen as an important additional source of information, since they provide a previously unavailable perspective on the general, public perception of the environment. While broader application still faces some limits, the experimental methods and techniques presented can already yield important insights into a whole range of perceived landscape values. At the conceptual level, the work provides a first foundation for further research. Before broad application in practice is possible, however, decisive questions must be resolved, for example regarding copyright, the definition of ethical standards within the profession, and the protection of the privacy of those involved.
In the longer term, not only the use of the data is considered important, but also tapping the essential opportunities this development offers for better communication with clients, stakeholders, and the public in planning and decision-making processes.

Contents
1 Introduction
1.1 Motivation
1.2 Literature review and conceptual scope
1.3 Terminology
1.4 Related research
1.5 Objectives
1.6 Methodology
1.7 Formal conventions
I. Part I: Conceptual framework
1.1 Visual perception
1.2 Theory and practice in landscape perception assessment
1.2.1 Expert valuation versus participation
1.2.2 Photography-based landscape perception assessment
1.2.2.1. Photo-based surveys
1.2.2.2. Photo-based Internet surveys
1.2.2.3. Photo-interviewing and participant photography
1.2.3 Conclusions
1.3 Conceptual approach
1.3.1 A framing theory: Distributed cognition
1.3.2 Description of the approach
1.3.3 Choosing the right data source
1.3.3.1. Availability of crowdsourced and georeferenced photo data
1.3.3.2. Suitability for analyzing human behavior and perception
1.3.4 Relations between data and the phenomenon under observation
1.3.4.1. Photo taking and landscape perception
1.3.4.2. User motivation in the context of photo sharing in communities
1.3.4.3. Describing and tagging photos: Forms of attributing meaning
1.3.5 Considerations for measuring and weighting data
1.3.6 Conclusions
II. Part II: Application example – Flickr photo analysis and evaluation of results
2.1 Software architecture
2.2 Materials and methods
2.2.1 Data retrieval, initial data structure and overall quantification
2.2.2 Global data bias
2.2.3 Basic techniques for filtering and classifying data
2.2.3.1. Where: photo locations
2.2.3.2. Who: user origin
2.2.3.3. When: time of photo taking
2.2.3.4. What: tag frequency
2.2.4 Methods for aggregating data
2.2.4.1. Clustering of photo locations
2.2.4.2. Clustering of tag locations
2.3 Application to planning: techniques for visualizing data
2.3.1 Introduction
2.3.2 Tag maps
2.3.2.1. Description of technique
2.3.2.2. Results: San Francisco and Berkeley waterfront
2.3.2.3. Results: Berkeley downtown and university campus
2.3.2.4. Results: Dresden and the Elbe Valley
2.3.2.5. Results: Greater Toronto Area and City of Toronto
2.3.2.6. Results: Baden-Württemberg
2.3.2.7. Summary
2.3.3 Temporal comparison for assessing landscape change
2.3.3.1. Description of technique
2.3.3.2. Results: The High Line, NY
2.3.3.3. Summary
2.3.4 Determining lines of sight and important visual connections
2.3.4.1. Description of technique
2.3.4.2. Results: Yosemite Valley
2.3.4.3. Results: Golden Gate and Bay Bridge
2.3.4.4. Results: CN Tower, Toronto
2.3.4.5. Summary
2.3.5 Individual location analysis
2.3.5.1. Description of technique
2.3.5.2. Results: Coit Tower, San Francisco
2.3.5.3. Results: CN Tower, Toronto
2.3.5.4. Summary
2.4 Quality and accuracy of results
2.4.1 Methodology
2.4.2 Accuracy of data
2.4.3 Validity and reliability of visualizations
2.4.3.1. Reliability
2.4.3.2. Validity
2.5 Implementation example: the London View Framework
2.5.1 Description
2.5.2 Evaluation methodology
2.5.3 Analysis
2.5.3.1. Landmarks
2.5.3.2. Views
2.5.4 Summary
III. Discussion
3.1 Application of the framework from a wider perspective
3.2 Significance of results
3.3 Further research
3.4 Discussion of workshop results and further feedback
3.4.1 Workshops at University of Waterloo and University of Toronto, Canada
3.4.2 Workshop at University of Technology Dresden, Germany
3.4.3 Feedback from presentations, discussions, exhibitions: second thoughts
IV. Conclusions
V. References
5.1 Literature
5.2 List of web references
5.3 List of figures
5.4 List of tables
5.5 List of maps
5.6 List of appendices
VI. Appendices

Technische Universität Dresden: Qucosa

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Author
Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016
Field of study

Digitale Bibliothek Thüringen

Music similarity analysis using the big data framework spark

Author: Schoder Johannes
Publication venue
Publication date: 01/01/2019
Field of study

A parameterizable recommender system based on the Big Data processing framework Spark is introduced, which takes multiple tonal properties of music into account and is capable of recommending music based on a user's personal preferences. The implemented system is fully scalable: more songs can be added to the dataset, the cluster size can be increased, and different kinds of audio features and more state-of-the-art similarity measurements can be added. This thesis also deals with the extraction of the required audio features in parallel on a computer cluster. The extracted features are then processed by the Spark-based recommender system, and song recommendations for a dataset consisting of approximately 114,000 songs are retrieved in less than 12 seconds on a 16-node Spark cluster, combining eight different audio feature types and similarity measurements.

Digitale Bibliothek Thüringen