Forward Error Correction applied to JPEG-XS codestreams
JPEG-XS offers low-complexity image compression for applications with
constrained but reasonable bit-rates and low latency. Our paper explores the
deployment of JPEG-XS on lossy packet networks. To preserve low latency,
Forward Error Correction (FEC) is envisioned as the protection mechanism of
interest. Although the JPEG-XS codestream is not inherently scalable, we
observe that the loss of a codestream fraction affects the decoded image
quality differently, depending on whether that fraction corresponds to
codestream headers, to coefficient significance information, or to
low/high-frequency data. Hence, we propose a rate-distortion-optimal unequal
error protection scheme that adapts the redundancy level of Reed-Solomon
codes to the channel loss rate and the type of information protected by each
code. Our experiments demonstrate that, at a 5% loss rate, it reduces the
Mean Squared Error by up to 92% and 65% compared to transmission without
protection and with optimal but equal protection, respectively.
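As a sketch of the trade-off such a scheme navigates (the RS parameters and class-to-redundancy assignments below are illustrative, not the paper's actual allocation), the residual loss probability of an RS(n, k) erasure code under an i.i.d. packet loss rate p is a binomial tail:

```python
from math import comb

def rs_failure_prob(n: int, k: int, p: float) -> float:
    """Probability that an RS(n, k) erasure code cannot recover, i.e. that
    more than n - k of the n packets are lost at i.i.d. loss rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n - k + 1, n + 1))

# Unequal protection: spend more redundancy on headers than on
# high-frequency data (class names and k values are hypothetical).
n, p = 20, 0.05
for label, k in [("headers", 14), ("significance", 16), ("high-freq", 19)]:
    print(f"{label:12s} RS({n},{k}) residual loss: "
          f"{rs_failure_prob(n, k, p):.2e}")
```

Classes whose loss hurts quality most (headers) get a smaller k, hence more parity symbols and a far smaller residual loss probability, at the cost of rate.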
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields of view, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but has a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets a new state of the art on the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU on the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online.
Comment: Accepted by TPAMI.
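To illustrate the first contribution, here is a minimal 1-D NumPy sketch of atrous (dilated) convolution, not the authors' code: the same three weights cover a wider field of view as the sampling rate grows, with no extra parameters.

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D atrous convolution: the taps of w are spaced `rate` samples
    apart, enlarging the field of view without adding weights."""
    k = len(w)
    span = (k - 1) * rate + 1              # effective field of view
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])             # 3 taps regardless of rate
print(atrous_conv1d(x, w, rate=1))         # field of view 3 -> [-2. ...]
print(atrous_conv1d(x, w, rate=2))         # field of view 5 -> [-4. ...]
```

On this ramp signal the rate-2 filter differences samples 4 apart instead of 2, doubling its receptive field while keeping the same three weights, which is exactly the property the abstract highlights for dense prediction.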
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in
object category classification and detection on hundreds of object categories
and millions of images. The challenge has been run annually from 2010 to
present, attracting participation from more than fifty institutions.
This paper describes the creation of this benchmark dataset and the advances
in object recognition that have been possible as a result. We discuss the
challenges of collecting large-scale ground truth annotation, highlight key
breakthroughs in categorical object recognition, provide a detailed analysis of
the current state of the field of large-scale image classification and object
detection, and compare the state-of-the-art computer vision accuracy with human
accuracy. We conclude with lessons learned in the five years of the challenge,
and propose future directions and improvements.
Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL
VOC (per-category comparisons in Table 3, distribution of localization
difficulty in Fig 16), a list of queries used for obtaining object detection
images (Appendix C), and some additional references.
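For readers unfamiliar with the headline metric, the sketch below (not official challenge code; the toy scores are invented) computes top-k classification error, whose top-5 variant is the standard ILSVRC figure:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k
    highest-scoring classes -- the headline ILSVRC metric for k=5."""
    errors = 0
    for row, y in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        errors += y not in topk
    return errors / len(labels)

# Toy scores over 6 classes for 2 images (values are illustrative).
scores = [[0.1, 0.5, 0.2, 0.05, 0.1, 0.05],
          [0.6, 0.1, 0.1, 0.1, 0.05, 0.05]]
print(top_k_error(scores, labels=[3, 0], k=5))  # -> 0.0
print(top_k_error(scores, labels=[3, 0], k=1))  # -> 0.5
```

Top-5 rather than top-1 error is reported because many ImageNet images legitimately contain several objects, so penalizing only a complete miss is a fairer measure.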
Matchmakers or tastemakers? Platformization of cultural intermediation & social media's engines for 'making up taste'
There are long-standing practices and processes that have traditionally mediated between the production and consumption of cultural content. Prominent instances include: curating content by identifying and selecting cultural content in order to promote it to a particular set of audiences; measuring audience behaviours to construct knowledge about their tastes; and guiding audiences through recommendations from cultural experts. These cultural intermediation processes are currently being transformed, and social media platforms play important roles in this transformation. However, their role is often attributed to the work of users and/or recommendation algorithms, so the processes through which data about users' tastes are aggregated and made ready for algorithmic processing are largely neglected. This study takes this problematic as an important gap in our understanding of social media platforms' role in the transformation of cultural intermediation. To address this gap, the notion of platformization is used as a theoretical lens to examine the role of users and algorithms as part of social media's distinct data-based sociotechnical configuration, which is built on the so-called 'platform logic'. Based on a set of conceptual ideas and the findings of a single case study on a music discovery platform, this thesis develops a framework to explain the 'platformization of cultural intermediation'. This framework outlines how curation, guidance, and measurement processes are 'plat-formed' in the course of the development and optimisation of a social media platform. This is the main contribution of the thesis. The study also contributes to the literature by developing the concept of social media's engines for 'making up taste'.
This concept illuminates how social media operate as sociotechnical cultural intermediaries and participate in tastemaking in ways that acquire legitimacy from the long-standing trust in the objectivity of classification, quantification, and measurement processes.
A comparative investigation on the application and performance of Femtocell against Wi-Fi networks in an indoor environment
Due to the strenuous demands on the available spectrum and bandwidth, alongside the ever-increasing rate at which data traffic is growing and the poor quality of experience (QoE) faced with indoor communications, cellular service providers have to reform their marketing and service delivery strategies together with their overall network architecture if cellular networks are to remain dominant in voice and data services. To accomplish this leap forward in performance, cellular service operators need to employ a network topology that makes use of a mix of macrocells and small cells, effectively evolving the network and bringing it closer to the end-user. This investigation explores the use of small cell technology, specifically Femtocell technology, in comparison to the already deployed Wi-Fi technology as a viable solution to poor indoor communications. The performance evaluation is done by comparing key areas in the
everyday use of Internet communications. These include HTTP testing, RTP testing and VoIP testing. Results are explained and the modes of operation of both technologies are compared.
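As a concrete example of one quantity such RTP testing typically reports, the sketch below estimates interarrival jitter following RFC 3550, section 6.4.1 (the transit-time samples are illustrative, not measurements from this study):

```python
def rtp_jitter(transit_times):
    """Interarrival jitter estimate per RFC 3550, section 6.4.1:
    J += (|D| - J) / 16, where D is the change in packet transit time
    (arrival time minus RTP timestamp) between consecutive packets."""
    j = 0.0
    for prev, cur in zip(transit_times, transit_times[1:]):
        d = abs(cur - prev)
        j += (d - j) / 16.0
    return j

# Transit times in milliseconds for five packets (values are made up).
print(round(rtp_jitter([20.0, 25.0, 22.0, 30.0, 21.0]), 3))
```

The 1/16 gain makes the estimate a smoothed running average, so a single late packet moves the reported jitter only slightly, which matches how RTP receivers report it in RTCP receiver reports.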
Beyond Multi-Target Tracking: Statistical Pattern Analysis of People and Groups
Every day millions and millions of surveillance cameras monitor the world, recording and collecting huge amounts of data. The collected data can be extremely useful: from behaviour analysis to prevent unpleasant events, to the analysis of traffic. However, these valuable data are seldom used, because of the amount of information that a human operator would have to manually attend to and examine. It would be like looking for a needle in a haystack.
The automatic analysis of data is becoming mandatory for extracting summarized high-level information (e.g., John, Sam and Anne are walking together in group at the playground near the station) from the available redundant low-level data (e.g., an image sequence).
The main goal of this thesis is to propose solutions and automatic algorithms that perform high-level analysis of a camera-monitored environment. In this way, the data are summarized in a high-level representation for a better understanding.
In particular, this work is focused on the analysis of moving people and their collective behaviors.
The title of the thesis, beyond multi-target tracking, mirrors the purpose of the work: we will propose methods that have the target tracking as common denominator, and go beyond the standard techniques in order to provide a high-level description of the data.
First, we investigate the target tracking problem, as it is the basis of all the following work. Target tracking estimates the position of each target in the image and its trajectory over time. We analyze the problem from two complementary perspectives: 1) the engineering point of view, where we deal with the problem in order to obtain the best results in terms of accuracy and performance; 2) the neuroscience point of view, where we propose an attentional model for tracking and recognition of objects and people, motivated by theories of the human perceptual system.
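As a minimal illustration of the engineering side of target tracking, here is a generic constant-velocity Kalman filter step (a textbook building block, not the attentional model proposed in the thesis; all numeric values are illustrative):

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of a constant-velocity Kalman filter.
    State x = [position, velocity]; z is a noisy position measurement."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
    H = np.array([[1.0, 0.0]])              # we only observe position
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    # Predict the state forward one time step.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measurement.
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0]), np.eye(2)
for z in [1.0, 2.1, 2.9, 4.2]:              # noisy positions of one target
    x, P = kalman_step(x, P, np.array([z]))
print(x)                                    # estimated [position, velocity]
```

After four roughly linear measurements the velocity estimate settles near 1 unit per step, showing how the filter turns raw detections into a smooth trajectory.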
Second, target tracking is extended to the camera network case, where the goal is to keep a unique identifier for each person in the whole network, i.e., to perform person re-identification. The goal is to recognize individuals across diverse locations over different non-overlapping camera views, or within the same camera, considering a large set of candidates.
In this context, we propose a pipeline and appearance-based descriptors that enable us to define the problem properly and to reach state-of-the-art results.
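A toy sketch of appearance-based matching for re-identification (a plain color-histogram descriptor with a Bhattacharyya distance; the thesis' actual descriptors are richer, and the image patches below are synthetic):

```python
import numpy as np

def color_hist(patch, bins=8):
    """Per-channel color histogram, a simple appearance descriptor."""
    h = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
         for c in range(patch.shape[-1])]
    h = np.concatenate(h).astype(float)
    return h / h.sum()

def bhattacharyya(p, q):
    """Distance between two normalized histograms (0 = identical)."""
    return float(np.sqrt(max(0.0, 1.0 - np.sqrt(p * q).sum())))

rng = np.random.default_rng(0)
probe = rng.integers(0, 256, (64, 32, 3))                 # camera A sighting
same = np.clip(probe + rng.integers(-5, 6, probe.shape),  # same person,
               0, 255)                                    # camera B, noisy
other = rng.integers(0, 128, (64, 32, 3))                 # darker clothing
gallery = [same, other]
d = [bhattacharyya(color_hist(probe), color_hist(g)) for g in gallery]
print(int(np.argmin(d)))  # -> 0: the same person is the best match
```

Re-identification then amounts to ranking the gallery by descriptor distance; the challenge the thesis addresses is designing descriptors robust to pose, illumination, and viewpoint changes between cameras, which a raw color histogram is not.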
Finally, the highest level of description investigated in this thesis is the analysis (discovery and tracking) of social interactions between people. In particular, we focus on finding small groups of people. We introduce methods that embed notions of social psychology into computer vision algorithms. Then, we extend the detection of social interactions over time, proposing novel probabilistic models that deal with (joint) individual-group tracking.
Land Use and Spatial Planning from a Sustainability Perspective: Designing the One-Minute City
This PhD thesis by publication, comprising four journal papers and a book chapter, addresses the overarching research question of how sustainability features can increase the value of land in urban development. Using two case studies (in China and Australia), it offers insights from a sustainability perspective into land use and spatial planning within the wider notion of value creation. The key findings are the concept of the Minute City and the spatial logic behind it.