Video Summarization Using Unsupervised Deep Learning
In this thesis, we address the task of video summarization using unsupervised deep-learning architectures. Video summarization aims to generate a short summary by selecting the most informative and important frames (key-frames) or fragments (key-fragments) of the full-length video and presenting them in a temporally-ordered fashion. Our objective is to overcome observed weaknesses of existing video summarization approaches that utilize RNNs to model the temporal dependence of frames, namely: i) the small influence of the estimated frame-level importance scores on the created video summary, ii) the insufficiency of RNNs for modeling long-range dependencies among frames, and iii) the small amount of parallelizable operations during the training of RNNs. To address the first weakness, we propose a new unsupervised network architecture, called AC-SUM-GAN, which formulates the selection of important video fragments as a sequence generation task and learns this task by embedding an Actor-Critic model in a Generative Adversarial Network. The feedback of a trainable Discriminator is used as a reward by the Actor-Critic model in order to explore a space of actions and learn a value function (Critic) and a policy (Actor) for video fragment selection. To tackle the remaining weaknesses, we investigate the use of attention mechanisms for video summarization and propose a new supervised network architecture, called PGL-SUM, which combines global and local multi-head attention mechanisms that take into account the temporal position of the video frames, in order to model the frames' dependencies at different levels of granularity. Based on the acquired experience, we then propose a new unsupervised network architecture, called CA-SUM, which estimates the frames' importance using a novel concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix and takes into account the attentive uniqueness and diversity of the associated frames of the video. All the proposed architectures have been extensively evaluated on the most commonly-used benchmark datasets, demonstrating their competitiveness against other approaches and documenting the contribution of our proposals to advancing the current state of the art in video summarization. Finally, we make a first attempt at producing explanations for the video summarization results. Inspired by relevant works in the Natural Language Processing domain, we propose an attention-based method for explainable video summarization and evaluate the performance of various explanation signals using our CA-SUM architecture and two benchmark datasets for video summarization. The experimental results indicate the advanced performance of explanation signals formed using the inherent attention weights, and demonstrate the ability of the proposed method to explain the video summarization results using clues about the focus of the attention mechanism.
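To illustrate the core idea of the concentrated attention mechanism, the sketch below restricts frame-to-frame attention to non-overlapping blocks on the main diagonal of the attention matrix. It is a minimal NumPy sketch, not the CA-SUM implementation: the block size, the dot-product scoring, and the mean-based importance read-out are assumptions, and the attentive uniqueness and diversity terms are omitted.

```python
import numpy as np

def concentrated_attention_scores(frame_feats, block_size=30):
    """Toy sketch of a block-diagonal ('concentrated') attention pattern.

    frame_feats: (T, D) array of frame embeddings.
    Returns a (T, T) attention matrix in which each frame attends only to
    frames inside its own non-overlapping temporal block on the main diagonal.
    """
    T, D = frame_feats.shape
    scores = frame_feats @ frame_feats.T / np.sqrt(D)   # raw dot-product scores

    # Mask out everything outside the non-overlapping diagonal blocks.
    mask = np.full((T, T), -np.inf)
    for start in range(0, T, block_size):
        end = min(start + block_size, T)
        mask[start:end, start:end] = 0.0
    scores = scores + mask

    # Row-wise softmax; each row is a distribution over the frames in its block.
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn

# Frame-level importance could then be read off, e.g., as the average attention
# a frame receives (one of several possible read-outs).
attn = concentrated_attention_scores(np.random.randn(120, 64).astype(np.float32))
importance = attn.mean(axis=0)
```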
SCALE: Scaling up the Complexity for Advanced Language Model Evaluation
Recent strides in Large Language Models (LLMs) have saturated many NLP benchmarks (even professional domain-specific ones), emphasizing the need for novel, more challenging ones to properly assess LLM capabilities. In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain-specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document-to-document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks). Our benchmark comprises diverse legal NLP datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual, federal legal system. Despite recent advances, efficiently processing long documents for intense review/analysis tasks remains an open challenge for language models. Also, comprehensive, domain-specific benchmarks requiring high expertise to develop are rare, as are multilingual benchmarks. This scarcity underscores our contribution’s value, considering most public models are trained predominantly on English corpora, while other languages remain understudied, particularly for practical domain-specific NLP tasks. Our benchmark allows for testing and advancing state-of-the-art LLMs. As part of our study, we evaluate several pre-trained multilingual language models on our benchmark to establish strong baselines as a point of reference. Despite the large size of our datasets (tens to hundreds of thousands of examples), existing publicly available models struggle with most tasks, even after in-domain pretraining. We publish all resources (benchmark suite, pre-trained models, code) under a fully permissive open CC BY-SA license.
Explainable temporal data mining techniques to support the prediction task in Medicine
In the last decades, the increasing amount of data available in all fields raises the necessity to discover new knowledge and explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the necessity to explain the results of an artificial intelligence system is legitimized by the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data. Their analysis through data mining techniques provides the possibility to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains. We analyze the "temporal component" of explainability, focusing on different perspectives such as: the use of temporal data, the temporal task, the temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. For the first one, based on trend abstractions, starting from the concept of Trend-Event Pattern and moving through the concept of prediction, we propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). The framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. For the second one, based on functional dependencies, we propose a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
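To give a flavour of the trend-abstraction step on which such patterns build, the following minimal sketch turns a numeric series into Increasing/Decreasing/Steady intervals; the threshold and labels are assumptions, and the actual PTE-P framework adds prediction windows and pattern mining on top, which are not reproduced here.

```python
def trend_abstraction(values, times, eps=0.1):
    """Toy sketch of trend-based temporal abstraction.

    Maps a numeric series onto intervals labelled I (increasing),
    D (decreasing) or S (steady) -- the kind of symbolic building block
    from which trend-event patterns can be assembled.
    """
    labels = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        labels.append("I" if delta > eps else "D" if delta < -eps else "S")

    # Merge consecutive identical labels into intervals (start, end, label).
    intervals, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            intervals.append((times[start], times[i], labels[start]))
            start = i
    return intervals

# Example: a short vital-sign series sampled at hours 0..5.
print(trend_abstraction([90, 95, 101, 101, 96, 90], times=list(range(6))))
# -> [(0, 2, 'I'), (2, 3, 'S'), (3, 5, 'D')]
```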
Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches
Traditional networking devices support only fixed features and limited configurability.
Network softwarization leverages programmable software and hardware platforms to remove those limitations.
In this context, the concept of programmable data planes allows the packet processing pipeline of networking devices to be programmed directly and custom control plane algorithms to be created.
This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks like 5G, the Internet of Things, cloud computing, and Industry 4.0.
P4 is the most popular technology to implement programmable data planes.
However, programmable data planes, and in particular, the P4 technology, emerged only recently.
Thus, P4 support for some well-established networking concepts is still lacking and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking.
The research of this thesis focuses on two open issues of programmable data planes.
First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as there are no satisfying state-of-the-art best practices yet.
Second, it enables BIER in high-performance P4 data planes.
BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic which so far has only very limited support on high-performance forwarding platforms.
The main results of this thesis are published as eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study.
Two more peer-reviewed papers contain additional content that is not directly related to the main results.
They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.
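For intuition on the BIER mechanism mentioned above, the following minimal Python sketch reproduces the bit-indexed forwarding decision in the spirit of RFC 8279; the table layout and the example topology are assumptions, and this is not the P4 implementation developed in the thesis.

```python
def bier_forward(bitstring, bift):
    """Toy sketch of BIER's bit-indexed forwarding decision.

    bitstring: int whose set bits mark the egress routers the packet must reach.
    bift:      list of (forwarding_bitmask, next_hop) entries, one per bit
               position, as a stand-in for a Bit Index Forwarding Table.
    Returns the per-neighbor copies to send as (next_hop, new_bitstring) pairs.
    """
    copies = []
    remaining = bitstring
    while remaining:
        lowest_bit = remaining & -remaining          # pick one still-unhandled destination
        f_bm, next_hop = bift[lowest_bit.bit_length() - 1]
        copies.append((next_hop, bitstring & f_bm))  # one copy covers all destinations via that neighbor
        remaining &= ~f_bm                           # those destinations are now handled
    return copies

# Example: destinations 1 and 3 reachable via neighbor A, destination 2 via B.
bift = [(0b101, "A"), (0b010, "B"), (0b101, "A")]
print(bier_forward(0b111, bift))   # -> [('A', 5), ('B', 2)], i.e. 0b101 via A and 0b010 via B
```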
Deep Attentive Time Series Modelling for Quantitative Finance
Time series modelling and forecasting is a persistent problem with extensive implications in scientific, business, industrial, and economic areas. This thesis's contribution is twofold. Firstly, we propose a novel probabilistic time series forecasting methodology that introduces the use of Fourier domain-based attention models, merging classic signal processing spectral filtering techniques with machine learning architectures. Secondly, we take advantage of the abundance of financial intraday high-frequency data to develop deep learning-based solutions for modelling financial time series. Machine learning methods can potentially enhance the performance of traditional methodologies used by practitioners. This potential stems largely from the feature extraction capabilities of deep neural networks, which can benefit from the rising accessibility of high-frequency data, and from attention mechanisms, which help to model temporal patterns.
Concerning our first major contribution, this thesis empirically demonstrates that spectral domain-based machine learning models can learn the properties of time series datasets and integrate this information to improve the forecasting accuracy. Simultaneously, Fourier domain-based models alleviate some of the inconveniences commonly associated with deep autoregressive models. These architectures, prone to prioritising recent past data, often ignore critical global information not contained in previous time steps. Additionally, they are susceptible to error accumulation and propagation and may not yield illustrative results. The proposed model, the Spectral Attention Autoregressive Model (SAAM), mitigates these problems by combining deep autoregressive models with a Spectral Attention (SA) module. This module uses two attention models operating over the Fourier domain representation of the time series' embedding. Through spectral filtering, SAAM differentiates between the components of the frequency domain that should be considered noise and subsequently filtered out, and the global patterns that are relevant and should be incorporated into the predictions. Empirical evaluation shows how the proposed Spectral Attention module can be integrated into various deep autoregressive models, consistently improving the results of these base architectures and achieving state-of-the-art performance.
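To make the spectral-filtering intuition concrete, the following minimal NumPy sketch maps a sequence embedding to the Fourier domain, re-weights frequency components, and maps the result back. The hard thresholding and the keep_ratio parameter are illustrative assumptions; SAAM instead learns the frequency weighting with two attention models inside a deep autoregressive architecture.

```python
import numpy as np

def spectral_filter(embedding, keep_ratio=0.2):
    """Toy sketch of spectral filtering of a sequence embedding.

    embedding: (T, D) sequence of hidden states.  Each feature dimension is
    mapped to the Fourier domain, frequency components are re-weighted (here
    by simply keeping the strongest ones), and the filtered signal is mapped
    back to the time domain.
    """
    spectrum = np.fft.rfft(embedding, axis=0)                 # (T//2 + 1, D)
    magnitude = np.abs(spectrum)

    # Keep only the strongest frequency components in each feature dimension.
    k = max(1, int(keep_ratio * spectrum.shape[0]))
    threshold = np.sort(magnitude, axis=0)[-k, :]             # per-dimension cutoff
    weights = (magnitude >= threshold).astype(float)          # 0/1 "attention" over frequencies

    return np.fft.irfft(spectrum * weights, n=embedding.shape[0], axis=0)

smoothed = spectral_filter(np.random.randn(128, 16))
```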
Afterwards, this thesis shifts toward showcasing the benefits of machine learning solutions in two different quantitative finance scenarios, proving how attention-based deep learning approaches compare favourably to classic parametric-based models and providing solutions for various algorithmic and high-frequency trading problems. In the context of volatility forecasting, which plays a central role among equity risk measures, we show that Dilated Causal Convolutional-based neural networks offer significant performance gains compared to well-established volatility-oriented parametric models. The proposed model, called DeepVol, showcases how data-driven models can avoid the limitations of classical methods by taking advantage of the abundance of high-frequency data. DeepVol outperforms baseline methods while exhibiting robustness in the presence of volatility shocks, showing its ability to extract universal features and transfer learning to out-of-distribution data. Consequently, data-driven approaches should be carefully considered in the context of volatility forecasting, as they can be instrumental in the valuation of financial derivatives, risk management, and the formation of investment portfolios.
Finally, this thesis presents a survival analysis model for estimating the distribution of fill times for limit orders posted in the Limit Order Book (LOB). The proposed model, which does not make assumptions about the underlying stochastic processes, employs a convolutional-Transformer encoder and a monotonic neural network decoder to relate the time-varying features of the LOB to the distribution of fill times. It grants practitioners the capability of making informed decisions between market orders and limit orders, which in practice entails a trade-off between immediate execution and price premium. We offer an exhaustive comparison of the survival functions resulting from different order placement strategies, offering insight into the fill probability of orders placed within the spread. Empirical evaluation reveals the superior performance of the monotonic encoder-decoder convolutional-Transformer compared to state-of-the-art benchmarks, leading to more accurate predictions and improved economic value.
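For context on how high-frequency data typically enters such volatility models, the sketch below computes daily realized volatility from intraday prices. This is a standard estimator; the exact sampling frequency and target definition used by DeepVol are assumptions here rather than details stated in the abstract.

```python
import numpy as np

def realized_volatility(intraday_prices):
    """Daily realized volatility: the square root of the sum of squared
    intraday log-returns computed from high-frequency prices."""
    log_returns = np.diff(np.log(np.asarray(intraday_prices, dtype=float)))
    return np.sqrt(np.sum(log_returns ** 2))

# Example: 5-minute prices over one (simulated) trading day.
prices = 100.0 * np.exp(np.cumsum(0.001 * np.random.randn(78)))
print(realized_volatility(prices))
```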
Knowledge and pre-trained language models inside and out: a deep-dive into datasets and external knowledge
Pre-trained Language Models (PLMs) have greatly advanced the performance of various NLP tasks and have undoubtedly been serving as foundation models for this field. These pre-trained models are able to capture rich semantic patterns from large-scale text corpora and learn high-quality representations of texts. However, such models still have shortcomings - they underperform when faced with tasks that require implicit external knowledge to be understood, which is difficult to learn with commonly employed pre-training objectives. Moreover, a comprehensive understanding of PLMs’ behaviour in learning knowledge during the fine-tuning phase is lacking. Therefore, in order to address the aforementioned challenges, we propose a set of approaches to inject external knowledge into PLMs and present experiments investigating their behaviour in learning knowledge during the fine-tuning phase, primarily focusing on Sentiment Analysis, Question Answering and Video Question Answering.
Specifically, we introduce novel approaches explicitly using textual historical reviews of users and products for improving sentiment analysis. To overcome the problem of context-question lexical overlap and data scarcity for question generation, we propose a novel method making use of linguistic and semantic knowledge with heuristics. Additionally, we explore how to utilise multimodal (visual and acoustic) information/knowledge to improve Video Question Answering.
Experiments conducted on benchmark datasets show that our proposed approaches achieve superior performance compared to state-of-the-art models, demonstrating the effectiveness of our methods for injecting external knowledge. Furthermore, we conduct a set of experiments investigating how PLMs learn knowledge for question answering under various scenarios. The results reveal that the internal characteristics of QA datasets can introduce strong biases when PLMs learn from downstream task datasets. Finally, we present an in-depth discussion of future directions for improving PLMs with external knowledge.
Parallel and Flow-Based High Quality Hypergraph Partitioning
Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits.
Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks of bounded size, while minimizing an objective function on the hyperedges that span multiple blocks.
In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge.
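Written out in the standard formulation (where ω(e) and c(v) denote hyperedge and vertex weights and ε the allowed imbalance), the problem reads:

```latex
\min_{\Pi = \{V_1,\dots,V_k\}} \; \sum_{e \in E} \omega(e)\,\bigl(\lambda(e) - 1\bigr),
\qquad
\lambda(e) = \bigl|\{\, V_i \in \Pi : V_i \cap e \neq \emptyset \,\}\bigr|,

\text{subject to} \quad
c(V_i) \;\le\; (1 + \varepsilon)\,\Bigl\lceil \tfrac{c(V)}{k} \Bigr\rceil
\quad \text{for all } i = 1,\dots,k.
```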
The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases.
In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs.
Once sufficiently small, an initial partition is computed.
Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level.
An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time.
The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem.
Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality.
While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible.
We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways.
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines.
In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof.
We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation.
For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements.
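As a small illustration of the gain computation that underlies such refinement for the connectivity metric, the following generic sketch assumes a simple pin-count bookkeeping and ignores the concurrency issues that the parallel refiners in the thesis actually have to address:

```python
def move_gain(v, target_block, block_of, pin_counts, hyperedges_of, weight):
    """Toy sketch of the connectivity-metric gain of moving vertex v.

    pin_counts[e][b] holds how many pins of hyperedge e currently lie in
    block b.  Moving v reduces the objective by weight[e] for every incident
    hyperedge e in which v is the last pin of its block, and increases it by
    weight[e] for every incident e that does not yet touch the target block.
    """
    source_block = block_of[v]
    gain = 0
    for e in hyperedges_of[v]:
        if pin_counts[e][source_block] == 1:            # v is the last pin in its block
            gain += weight[e]
        if pin_counts[e].get(target_block, 0) == 0:     # target block not yet connected to e
            gain -= weight[e]
    return gain

# Example: hyperedge 0 = {v0, v1}, v0 in block 0, v1 in block 1.
block_of = {"v0": 0, "v1": 1}
pin_counts = {0: {0: 1, 1: 1}}
hyperedges_of = {"v0": [0], "v1": [0]}
weight = {0: 1}
print(move_gain("v0", 1, block_of, pin_counts, hyperedges_of, weight))  # -> 1
```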
For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly.
Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework.
It is the fastest partitioner known and achieves medium-high solution quality, surpassing all other parallel partitioners and coming close to the highest-quality sequential partitioner.
Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level.
This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening.
In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio.
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening.
The last ingredient for high quality is an iterative improvement algorithm based on maximum flows.
In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts.
Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel.
Beyond striving for the highest quality, we present a deterministically parallel partitioning framework.
We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement.
Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small.
All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets.
To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar.
While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain.
With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense.
Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory and as local building blocks in the distributed algorithm.