Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. Thanks to intensive training, impressive efficiency and accuracy improvements have been achieved across the transmission pipeline. For example, the high model capacity of learning-based architectures makes it possible to model image and video behavior accurately enough that substantial compression gains can be achieved. Similarly, error concealment, streaming strategies, and even user perception modeling have benefited widely from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
In-Datacenter Performance Analysis of a Tensor Processing Unit
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X-30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X-80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
Comment: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017.
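As a quick sanity check, the quoted 92 TOPS peak follows directly from the size of the matrix unit. A minimal back-of-the-envelope sketch in Python, assuming the 700 MHz clock reported in the full paper (the clock rate is not stated in this abstract):

```python
# Back-of-the-envelope check of the TPU's quoted peak throughput,
# counting each MAC as 2 ops per cycle (one multiply + one accumulate).
MACS = 256 * 256     # 65,536 8-bit MACs in the matrix multiply unit
OPS_PER_MAC = 2      # multiply and accumulate both count as ops
CLOCK_HZ = 700e6     # assumed TPU clock frequency (from the full paper)

peak_tops = MACS * OPS_PER_MAC * CLOCK_HZ / 1e12
print(f"peak throughput ~ {peak_tops:.1f} TOPS")  # ~91.8, i.e. the quoted 92 TOPS
```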
Application of machine learning techniques to the management and optimization of tile caches for accelerating map services in spatial data infrastructures
The massive growth in the use of Web map services has created a need for ever more scalable services. In response, tiled map services have emerged as a scalable alternative to traditional map services, enabling caching mechanisms or even serving the maps from a collection of pre-generated images. However, the storage requirements and deployment time of these services are often prohibitive when the cartography to be served covers a large geographic area at a large number of scales.
For this reason, these services are usually offered through partial caches that contain only a subset of the cartography. Guaranteeing an acceptable Quality of Service (QoS) requires suitable policies for maintaining and managing these map caches: 1) initial population (seeding) strategies for the cache; 2) dynamic loading algorithms driven by user requests; 3) cache replacement policies.
However, few such strategies are specific to map services. Most strategies applied to these services are borrowed from other domains, such as traditional Web proxies, and do not take into account the spatial component of the map objects they manage.
This thesis addresses this gap by designing new algorithms, specific to this application domain, that optimize the performance of map services. Given the large number of objects managed by these caches and their heterogeneity in terms of layers, representation scales, and so on, an effort has been made to make the designed strategies automatic or semi-automatic, requiring as little human intervention as possible.
Two novel strategies are proposed for the initial population of a map cache. One uses a descriptive model built from the service's past request logs. The other is based on a predictive model that identifies the geographic phenomena driving user requests, parameterized either through an OLS (Ordinary Least Squares) regression analysis or through an intelligent system based on neural networks.
Important contributions have also been made regarding the replacement strategies of these caches. On the one hand, an intelligent system based on neural networks is proposed that estimates future access popularity from certain properties of the objects it manages: recency of reference, frequency of reference, and the size of the referenced tile. On the other hand, a strategy named Spatial-LFU is proposed, a variant of Perfect-LFU simplified by exploiting the spatial correlation between requests.
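The abstract does not detail how Spatial-LFU exploits spatial correlation. As one plausible illustrative reading, sketched below in Python under standard XYZ tile addressing, requests could be counted at the parent tile one zoom level up, so that siblings of a popular tile also rank highly; all names here are hypothetical, not the thesis's actual code.

```python
# Hypothetical sketch of an LFU-style tile cache that scores tiles by the
# request frequency of their parent tile one zoom level up, exploiting the
# spatial correlation between neighboring requests. Illustrative only.
from collections import defaultdict

class SpatialLFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}                      # (z, x, y) -> tile image bytes
        self.parent_freq = defaultdict(int)  # coarse-level frequency counters

    @staticmethod
    def parent(z, x, y):
        # Parent tile at zoom z-1 in standard XYZ addressing.
        return (z - 1, x // 2, y // 2)

    def get(self, z, x, y, fetch):
        # Count the request at the parent level, so one counter covers
        # the four sibling tiles of a spatially correlated region.
        self.parent_freq[self.parent(z, x, y)] += 1
        key = (z, x, y)
        if key not in self.cache:
            if len(self.cache) >= self.capacity:
                # Evict the tile whose parent region is least requested.
                victim = min(self.cache,
                             key=lambda k: self.parent_freq[self.parent(*k)])
                del self.cache[victim]
            self.cache[key] = fetch(z, x, y)  # cache miss: render/fetch
        return self.cache[key]
```

Keeping counters per parent region rather than per tile is also cheaper than Perfect-LFU, which must retain a frequency counter for every object ever seen, including evicted ones.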
Evaluating the Performance of Three Popular Web Mapping Libraries: A Case Study Using Argentina’s Life Quality Index
Recent Web technologies such as HTML5, JavaScript, and WebGL have enabled powerful and highly dynamic Web mapping applications that execute in standard Web browsers. Although Web mapping libraries have greatly reduced the complexity of developing such applications, developers still face many choices when aiming for optimal performance and network usage. This scenario is even more complex when considering the different representations of geographical data (raster, raw data, or vector) and the variety of devices (tablets, smartphones, and personal computers). This paper compares the performance and network usage of three popular JavaScript Web mapping libraries implementing a Web map with different representations of geodata and executing on different devices. In the experiments, Mapbox GL JS achieved the best overall performance on mid- and high-end devices for displaying raster or vector maps, while OpenLayers was the best for raster maps on all devices. Vector-based maps are a safe bet for new Web maps, since their performance is on par with raster maps on mid-end smartphones while requiring significantly less network bandwidth.
Navigating Diverse Datasets in the Face of Uncertainty
When exploring big volumes of data, one of the challenging aspects is their diversity of origin. Multiple files that have not yet been ingested into a database system may contain information of interest to a researcher, who must curate, understand, and sieve their content before being able to extract knowledge.
Performance is one of the greatest difficulties in exploring these datasets. On the one hand, examining non-indexed, unprocessed files can be inefficient. On the other hand, any processing done before the data is understood introduces latency and potentially unnecessary work if the chosen schema matches the data poorly. We have surveyed the state-of-the-art and, fortunately, there are multiple proposed solutions for handling data in situ efficiently.
Another major difficulty is matching files from multiple origins, since their schemas and layouts may not be compatible or properly documented. Most surveyed solutions overlook this problem, especially for numeric, uncertain data, as is typical in fields like astronomy.
The main objective of our research is to assist data scientists during the exploration of unprocessed, numerical, raw data distributed across multiple files, based solely on its intrinsic distribution.
In this thesis, we first introduce the concept of Equally-Distributed Dependencies (EDDs), which provides the foundations for matching this kind of dataset. We propose PresQ, a novel algorithm that finds quasi-cliques on hypergraphs based on their expected statistical properties. The probabilistic approach of PresQ can be successfully exploited to mine EDDs between diverse datasets when the underlying populations can be assumed to be the same.
Finally, we propose a two-sample statistical test based on Self-Organizing Maps (SOM). In terms of power, this method can outperform other classifier-based two-sample tests, and in some cases it is comparable to kernel-based methods, with the advantage of being interpretable.
Both PresQ and the SOM-based statistical test can provide insights that drive serendipitous discoveries.
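The abstract does not spell out how the SOM becomes a test statistic. A minimal sketch of one standard construction, assuming the SOM is used as a quantizer: train it on the pooled data, map each sample to best-matching units (BMUs), and compare the two occupancy histograms with a chi-squared test. All function names below are illustrative, not the thesis's actual code.

```python
# Illustrative SOM-based two-sample test: train a small SOM on pooled data,
# then chi-square the two samples' BMU occupancy histograms.
import numpy as np
from scipy.stats import chi2_contingency

def train_som(data, grid=(6, 6), iters=2000, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(grid[0] * grid[1], data.shape[1]))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        lr = lr0 * np.exp(-t / iters)             # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)       # shrinking neighborhood
        dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2 * sigma ** 2))     # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights

def som_two_sample_test(a, b):
    weights = train_som(np.vstack([a, b]))
    bmu = lambda s: np.argmin(((weights[None] - s[:, None]) ** 2).sum(-1), axis=1)
    units = len(weights)
    counts = np.array([np.bincount(bmu(a), minlength=units),
                       np.bincount(bmu(b), minlength=units)])
    keep = counts.sum(axis=0) > 0                 # drop units no point maps to
    _, p_value, _, _ = chi2_contingency(counts[:, keep])
    return p_value
```

Under the null hypothesis that both samples come from the same population, the p-value is roughly uniform; a small p-value signals a distributional difference. Interpretability comes from inspecting which SOM units the two samples occupy differently.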