30 research outputs found

    Analyzing Granger causality in climate data with time series classification methods

    Get PDF
    Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested

    AdaCC: Cumulative Cost-Sensitive Boosting for Imbalanced Classification

    Get PDF
    Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge and moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to model's performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches exhibiting consistent improvements in different measures, for instance, in the range of [0.3%-28.56%] for AUC, [3.4%-21.4%] for balanced accuracy, [4.8%-45%] for gmean and [7.4%-85.5%] for recall.Comment: 30 page

    EZ-CLIP: Efficient Zeroshot Video Action Recognition

    Full text link
    Recent advancements in large-scale pre-training of visual-language models on paired image-text data have demonstrated impressive generalization capabilities for zero-shot tasks. Building on this success, efforts have been made to adapt these image-based visual-language models, such as CLIP, for videos extending their zero-shot capabilities to the video domain. While these adaptations have shown promising results, they come at a significant computational cost and struggle with effectively modeling the crucial temporal aspects inherent to the video domain. In this study, we present EZ-CLIP, a simple and efficient adaptation of CLIP that addresses these challenges. EZ-CLIP leverages temporal visual prompting for seamless temporal adaptation, requiring no fundamental alterations to the core CLIP architecture while preserving its remarkable generalization abilities. Moreover, we introduce a novel learning objective that guides the temporal visual prompts to focus on capturing motion, thereby enhancing its learning capabilities from video data. We conducted extensive experiments on five different benchmark datasets, thoroughly evaluating EZ-CLIP for zero-shot learning and base-to-novel video action recognition, and also demonstrating its potential for few-shot generalization.Impressively, with a mere 5.2 million learnable parameters (as opposed to the 71.1 million in the prior best model), EZ-CLIP can be efficiently trained on a single GPU, outperforming existing approaches in several evaluations

    Temporal graph mining and distributed processing

    Get PDF
    With the recent growth of social media platforms and the human desire to interact with the digital world a lot of human-human and human-device interaction data is getting generated every second. With the boom of the Internet of Things (IoT) devices, a lot of device-device interactions are also now on the rise. All these interactions are nothing but a representation of how the underlying network is connecting different entities over time. These interactions when modeled as an interaction network presents a lot of unique opportunities to uncover interesting patterns and to understand the dynamics of the network. Understanding the dynamics of the network is very important because it encapsulates the way we communicate, socialize, consume information and get influenced. To this end, in this PhD thesis, we focus on analyzing an interaction network to understand how the underlying network is being used. We define interaction network as a sequence of time-stamped interactions E over edges of a static graph G=(V, E). Interaction networks can be used to model many real-world networks for example, in a social network or a communication network, each interaction over an edge represents an interaction between two users, e.g., emailing, making a call, re-tweeting, or in case of the financial network an interaction between two accounts to represent a transaction. We analyze interaction network under two settings. In the first setting, we study interaction network under a sliding window model. We assume a node could pass information to other nodes if they are connected to them using edges present in a time window. In this model, we study how the importance or centrality of a node evolves over time. In the second setting, we put additional constraints on how information flows between nodes. We assume a node could pass information to other nodes only if there is a temporal path between them. To restrict the length of the temporal paths we consider a time window in this approach as well. We apply this model to solve the time-constrained influence maximization problem. By analyzing the interaction network data under our model we find the top-k most influential nodes. We test our model both on human-human interaction using social network data as well as on location-location interaction using location-based social network(LBSNs) data. In the same setting, we also mine temporal cyclic paths to understand the communication patterns in a network. Temporal cycles have many applications and appear naturally in communication networks where one person posts a message and after a while reacts to a thread of reactions from peers on the post. In financial networks, on the other hand, the presence of a temporal cycle could be indicative of certain types of fraud. We provide efficient algorithms for all our analysis and test their efficiency and effectiveness on real-world data. Finally, given that many of the algorithms we study have huge computational demands, we also studied distributed graph processing algorithms. An important aspect of distributed graph processing is to correctly partition the graph data between different machine. A lot of research has been done on efficient graph partitioning strategies but there is no one good partitioning strategy for all kind of graphs and algorithms. Choosing the best partitioning strategy is nontrivial and is mostly a trial and error exercise. To address this problem we provide a cost model based approach to give a better understanding of how a given partitioning strategy is performing for a given graph and algorithm.Con el reciente crecimiento de las redes sociales y el deseo humano de interactuar con el mundo digital, una gran cantidad de datos de interacción humano-a-humano o humano-a-dispositivo se generan cada segundo. Con el auge de los dispositivos IoT, las interacciones dispositivo-a-dispositivo también están en alza. Todas estas interacciones no son más que una representación de como la red subyacente conecta distintas entidades en el tiempo. Modelar estas interacciones en forma de red de interacciones presenta una gran cantidad de oportunidades únicas para descubrir patrones interesantes y entender la dinamicidad de la red. Entender la dinamicidad de la red es clave ya que encapsula la forma en la que nos comunicamos, socializamos, consumimos información y somos influenciados. Para ello, en esta tesis doctoral, nos centramos en analizar una red de interacciones para entender como la red subyacente es usada. Definimos una red de interacciones como una sequencia de interacciones grabadas en el tiempo E sobre aristas de un grafo estático G=(V, E). Las redes de interacción se pueden usar para modelar gran cantidad de aplicaciones reales, por ejemplo en una red social o de comunicaciones cada interacción sobre una arista representa una interacción entre dos usuarios (correo electrónico, llamada, retweet), o en el caso de una red financiera una interacción entre dos cuentas para representar una transacción. Analizamos las redes de interacción bajo múltiples escenarios. En el primero, estudiamos las redes de interacción bajo un modelo de ventana deslizante. Asumimos que un nodo puede mandar información a otros nodos si estan conectados utilizando aristas presentes en una ventana temporal. En este modelo, estudiamos como la importancia o centralidad de un nodo evoluciona en el tiempo. En el segundo escenario añadimos restricciones adicionales respecto como la información fluye entre nodos. Asumimos que un nodo puede mandar información a otros nodos solo si existe un camino temporal entre ellos. Para restringir la longitud de los caminos temporales también asumimos una ventana temporal. Aplicamos este modelo para resolver este problema de maximización de influencia restringido temporalmente. Analizando los datos de la red de interacción bajo nuestro modelo intentamos descubrir los k nodos más influyentes. Examinamos nuestro modelo en interacciones humano-a-humano, usando datos de redes sociales, como en ubicación-a-ubicación usando datos de redes sociales basades en localización (LBSNs). En el mismo escenario también minamos camínos cíclicos temporales para entender los patrones de comunicación en una red. Existen múltiples aplicaciones para cíclos temporales y aparecen naturalmente en redes de comunicación donde una persona envía un mensaje y después de un tiempo reacciona a una cadena de reacciones de compañeros en el mensaje. En redes financieras, por otro lado, la presencia de un ciclo temporal puede indicar ciertos tipos de fraude. Proponemos algoritmos eficientes para todos nuestros análisis y evaluamos su eficiencia y efectividad en datos reales. Finalmente, dado que muchos de los algoritmos estudiados tienen una gran demanda computacional, también estudiamos los algoritmos de procesado distribuido de grafos. Un aspecto importante de procesado distribuido de grafos es el de correctamente particionar los datos del grafo entre distintas máquinas. Gran cantidad de investigación se ha realizado en estrategias para particionar eficientemente un grafo, pero no existe un particionamento bueno para todos los tipos de grafos y algoritmos. Escoger la mejor estrategia de partición no es trivial y es mayoritariamente un ejercicio de prueba y error. Con tal de abordar este problema, proporcionamos un modelo de costes para dar un mejor entendimiento en como una estrategia de particionamiento actúa dado un grafo y un algoritmo

    Temporal graph mining and distributed processing

    Get PDF
    Cotutela Universitat Politècnica de Catalunya i Université Libre de BruxellesWith the recent growth of social media platforms and the human desire to interact with the digital world a lot of human-human and human-device interaction data is getting generated every second. With the boom of the Internet of Things (IoT) devices, a lot of device-device interactions are also now on the rise. All these interactions are nothing but a representation of how the underlying network is connecting different entities over time. These interactions when modeled as an interaction network presents a lot of unique opportunities to uncover interesting patterns and to understand the dynamics of the network. Understanding the dynamics of the network is very important because it encapsulates the way we communicate, socialize, consume information and get influenced. To this end, in this PhD thesis, we focus on analyzing an interaction network to understand how the underlying network is being used. We define interaction network as a sequence of time-stamped interactions E over edges of a static graph G=(V, E). Interaction networks can be used to model many real-world networks for example, in a social network or a communication network, each interaction over an edge represents an interaction between two users, e.g., emailing, making a call, re-tweeting, or in case of the financial network an interaction between two accounts to represent a transaction. We analyze interaction network under two settings. In the first setting, we study interaction network under a sliding window model. We assume a node could pass information to other nodes if they are connected to them using edges present in a time window. In this model, we study how the importance or centrality of a node evolves over time. In the second setting, we put additional constraints on how information flows between nodes. We assume a node could pass information to other nodes only if there is a temporal path between them. To restrict the length of the temporal paths we consider a time window in this approach as well. We apply this model to solve the time-constrained influence maximization problem. By analyzing the interaction network data under our model we find the top-k most influential nodes. We test our model both on human-human interaction using social network data as well as on location-location interaction using location-based social network(LBSNs) data. In the same setting, we also mine temporal cyclic paths to understand the communication patterns in a network. Temporal cycles have many applications and appear naturally in communication networks where one person posts a message and after a while reacts to a thread of reactions from peers on the post. In financial networks, on the other hand, the presence of a temporal cycle could be indicative of certain types of fraud. We provide efficient algorithms for all our analysis and test their efficiency and effectiveness on real-world data. Finally, given that many of the algorithms we study have huge computational demands, we also studied distributed graph processing algorithms. An important aspect of distributed graph processing is to correctly partition the graph data between different machine. A lot of research has been done on efficient graph partitioning strategies but there is no one good partitioning strategy for all kind of graphs and algorithms. Choosing the best partitioning strategy is nontrivial and is mostly a trial and error exercise. To address this problem we provide a cost model based approach to give a better understanding of how a given partitioning strategy is performing for a given graph and algorithm.Con el reciente crecimiento de las redes sociales y el deseo humano de interactuar con el mundo digital, una gran cantidad de datos de interacción humano-a-humano o humano-a-dispositivo se generan cada segundo. Con el auge de los dispositivos IoT, las interacciones dispositivo-a-dispositivo también están en alza. Todas estas interacciones no son más que una representación de como la red subyacente conecta distintas entidades en el tiempo. Modelar estas interacciones en forma de red de interacciones presenta una gran cantidad de oportunidades únicas para descubrir patrones interesantes y entender la dinamicidad de la red. Entender la dinamicidad de la red es clave ya que encapsula la forma en la que nos comunicamos, socializamos, consumimos información y somos influenciados. Para ello, en esta tesis doctoral, nos centramos en analizar una red de interacciones para entender como la red subyacente es usada. Definimos una red de interacciones como una sequencia de interacciones grabadas en el tiempo E sobre aristas de un grafo estático G=(V, E). Las redes de interacción se pueden usar para modelar gran cantidad de aplicaciones reales, por ejemplo en una red social o de comunicaciones cada interacción sobre una arista representa una interacción entre dos usuarios (correo electrónico, llamada, retweet), o en el caso de una red financiera una interacción entre dos cuentas para representar una transacción. Analizamos las redes de interacción bajo múltiples escenarios. En el primero, estudiamos las redes de interacción bajo un modelo de ventana deslizante. Asumimos que un nodo puede mandar información a otros nodos si estan conectados utilizando aristas presentes en una ventana temporal. En este modelo, estudiamos como la importancia o centralidad de un nodo evoluciona en el tiempo. En el segundo escenario añadimos restricciones adicionales respecto como la información fluye entre nodos. Asumimos que un nodo puede mandar información a otros nodos solo si existe un camino temporal entre ellos. Para restringir la longitud de los caminos temporales también asumimos una ventana temporal. Aplicamos este modelo para resolver este problema de maximización de influencia restringido temporalmente. Analizando los datos de la red de interacción bajo nuestro modelo intentamos descubrir los k nodos más influyentes. Examinamos nuestro modelo en interacciones humano-a-humano, usando datos de redes sociales, como en ubicación-a-ubicación usando datos de redes sociales basades en localización (LBSNs). En el mismo escenario también minamos camínos cíclicos temporales para entender los patrones de comunicación en una red. Existen múltiples aplicaciones para cíclos temporales y aparecen naturalmente en redes de comunicación donde una persona envía un mensaje y después de un tiempo reacciona a una cadena de reacciones de compañeros en el mensaje. En redes financieras, por otro lado, la presencia de un ciclo temporal puede indicar ciertos tipos de fraude. Proponemos algoritmos eficientes para todos nuestros análisis y evaluamos su eficiencia y efectividad en datos reales. Finalmente, dado que muchos de los algoritmos estudiados tienen una gran demanda computacional, también estudiamos los algoritmos de procesado distribuido de grafos. Un aspecto importante de procesado distribuido de grafos es el de correctamente particionar los datos del grafo entre distintas máquinas. Gran cantidad de investigación se ha realizado en estrategias para particionar eficientemente un grafo, pero no existe un particionamento bueno para todos los tipos de grafos y algoritmos. Escoger la mejor estrategia de partición no es trivial y es mayoritariamente un ejercicio de prueba y error. Con tal de abordar este problema, proporcionamos un modelo de costes para dar un mejor entendimiento en como una estrategia de particionamiento actúa dado un grafo y un algoritmo.Postprint (published version

    MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

    Full text link
    Large scale Vision-Language (VL) models have shown tremendous success in aligning representations between visual and text modalities. This enables remarkable progress in zero-shot recognition, image generation & editing, and many other exciting tasks. However, VL models tend to over-represent objects while paying much less attention to verbs, and require additional tuning on video data for best zero-shot action recognition performance. While previous work relied on large-scale, fully-annotated data, in this work we propose an unsupervised approach. We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary. Based on that, we leverage Large Language Models and VL models to build a text bag for each unlabeled video via matching, text expansion and captioning. We use those bags in a Multiple Instance Learning setup to adapt an image-text backbone to video data. Although finetuned on unlabeled video data, our resulting models demonstrate high transferability to numerous unseen zero-shot downstream tasks, improving the base VL model performance by up to 14\%, and even comparing favorably to fully-supervised baselines in both zero-shot and few-shot video recognition transfer. The code will be released later at \url{https://github.com/wlin-at/MAXI}.Comment: Accepted at ICCV 202

    Ensemble learning with discrete classifiers on small devices

    Get PDF
    Machine learning has become an integral part of everyday life ranging from applications in AI-powered search queries to (partial) autonomous driving. Many of the advances in machine learning and its application have been possible due to increases in computation power, i.e., by reducing manufacturing sizes while maintaining or even increasing energy consumption. However, 2-3 nm manufacturing is within reach, making further miniaturization increasingly difficult while thermal design power limits are simultaneously reached, rendering entire parts of the chip useless for certain computational loads. In this thesis, we investigate discrete classifier ensembles as a resource-efficient alternative that can be deployed to small devices that only require small amounts of energy. Discrete classifiers are classifiers that can be applied -- and oftentimes also trained -- without the need for costly floating-point operations. Hence, they are ideally suited for deployment to small devices with limited resources. The disadvantage of discrete classifiers is that their predictive performance often lacks behind their floating-point siblings. Here, the combination of multiple discrete classifiers into an ensemble can help to improve the predictive performance while still having a manageable resource consumption. This thesis studies discrete classifier ensembles from a theoretical point of view, an algorithmic point of view, and a practical point of view. In the theoretical investigation, the bias-variance decomposition and the double-descent phenomenon are examined. The bias-variance decomposition of the mean-squared error is re-visited and generalized to an arbitrary twice-differentiable loss function, which serves as a guiding tool throughout the thesis. Similarly, the double-descent phenomenon is -- for the first time -- studied comprehensively in the context of tree ensembles and specifically random forests. Contrary to established literature, the experiments in this thesis indicate that there is no double-descent in random forests. While the training of ensembles is well-studied in literature, the deployment to small devices is often neglected. Additionally, the training of ensembles on small devices has not been considered much so far. Hence, the algorithmic part of this thesis focuses on the deployment of discrete classifiers and the training of ensembles on small devices. First, a novel combination of ensemble pruning (i.e., removing classifiers from the ensemble) and ensemble refinement (i.e., re-training of classifiers in the ensemble) is presented, which uses a novel proximal gradient descent algorithm to minimize a combined loss function. The resulting algorithm removes unnecessary classifiers from an already trained ensemble while improving the performance of the remaining classifiers at the same time. Second, this algorithm is extended to the more challenging setting of online learning in which the algorithm receives training examples one by one. The resulting shrub ensembles algorithm allows the training of ensembles in an online fashion while maintaining a strictly bounded memory consumption. It outperforms existing state-of-the-art algorithms under resource constraints and offers competitive performance in the general case. Last, this thesis studies the deployment of decision tree ensembles to small devices by optimizing their memory layout. The key insight here is that decision trees have a probabilistic inference time because different observations can take different paths from the root to a leaf. By estimating the probability of visiting a particular node in the tree, one can place it favorably in the memory to maximize the caching behavior and, thus, increase its performance without changing the model. Last, several real-world applications of tree ensembles and Binarized Neural Networks are presented

    Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance

    Full text link
    We investigate the performance of LLM-based zero-shot stance detection on tweets. Using FlanT5-XXL, an instruction-tuned open-source LLM, with the SemEval 2016 Tasks 6A, 6B, and P-Stance datasets, we study the performance and its variations under different prompts and decoding strategies, as well as the potential biases of the model. We show that the zero-shot approach can match or outperform state-of-the-art benchmarks, including fine-tuned models. We provide various insights into its performance including the sensitivity to instructions and prompts, the decoding strategies, the perplexity of the prompts, and to negations and oppositions present in prompts. Finally, we ensure that the LLM has not been trained on test datasets, and identify a positivity bias which may partially explain the performance differences across decoding strategi

    Äriprotsessi tulemuste ennustav ja korralduslik seire

    Get PDF
    Viimastel aastatel on erinevates valdkondades tegutsevad ettevõtted üles näidanud kasvavat huvi masinõppel põhinevate rakenduste kasutusele võtmiseks. Muuhulgas otsitakse võimalusi oma äriprotsesside efektiivsuse tõstmiseks, kasutades ennustusmudeleid protsesside jooksvaks seireks. Sellised ennustava protsessiseire meetodid võtavad sisendiks sündmuslogi, mis koosneb hulgast lõpetatud äriprotsessi juhtumite sündmusjadadest, ning kasutavad masinõppe algoritme ennustusmudelite treenimiseks. Saadud mudelid teevad ennustusi lõpetamata (antud ajahetkel aktiivsete) protsessijuhtumite jaoks, võttes sisendiks sündmuste jada, mis selle hetkeni on toimunud ning ennustades kas järgmist sündmust antud juhtumis, juhtumi lõppemiseni jäänud aega või instantsi lõpptulemust. Lõpptulemusele orienteeritud ennustava protsessiseire meetodid keskenduvad ennustamisele, kas protsessijuhtum lõppeb soovitud või ebasoovitava lõpptulemusega. Süsteemi kasutaja saab ennustuste alusel otsustada, kas sekkuda antud protsessijuhtumisse või mitte, eesmärgiga ära hoida ebasoovitavat lõpptulemust või leevendada selle negatiivseid tagajärgi. Erinevalt puhtalt ennustavatest süsteemidest annavad korralduslikud protsessiseire meetodid kasutajale ka soovitusi, kas ja kuidas antud juhtumisse sekkuda, eesmärgiga optimeerida mingit kindlat kasulikkusfunktsiooni. Käesolev doktoritöö uurib, kuidas treenida, hinnata ja kasutada ennustusmudeleid äriprotsesside lõpptulemuste ennustava ja korraldusliku seire raames. Doktoritöö pakub välja taksonoomia olemasolevate meetodite klassifitseerimiseks ja võrdleb neid katseliselt. Lisaks pakub töö välja raamistiku tekstiliste andmete kasutamiseks antud ennustusmudelites. Samuti pakume välja ennustuste ajalise stabiilsuse mõiste ning koostame raamistiku korralduslikuks protsessiseireks, mis annab kasutajatele soovitusi, kas protsessi sekkuda või mitte. Katsed näitavad, et väljapakutud lahendused täiendavad olemasolevaid meetodeid ning aitavad kaasa ennustava protsessiseire süsteemide rakendamisele reaalsetes süsteemides.Recent years have witnessed a growing adoption of machine learning techniques for business improvement across various fields. Among other emerging applications, organizations are exploiting opportunities to improve the performance of their business processes by using predictive models for runtime monitoring. Such predictive process monitoring techniques take an event log (a set of completed business process execution traces) as input and use machine learning techniques to train predictive models. At runtime, these techniques predict either the next event, the remaining time, or the final outcome of an ongoing case, given its incomplete execution trace consisting of the events performed up to the present moment in the given case. In particular, a family of techniques called outcome-oriented predictive process monitoring focuses on predicting whether a case will end with a desired or an undesired outcome. The user of the system can use the predictions to decide whether or not to intervene, with the purpose of preventing an undesired outcome or mitigating its negative effects. Prescriptive process monitoring systems go beyond purely predictive ones, by not only generating predictions but also advising the user if and how to intervene in a running case in order to optimize a given utility function. This thesis addresses the question of how to train, evaluate, and use predictive models for predictive and prescriptive monitoring of business process outcomes. The thesis proposes a taxonomy and performs a comparative experimental evaluation of existing techniques in the field. Moreover, we propose a framework for incorporating textual data to predictive monitoring systems. We introduce the notion of temporal stability to evaluate these systems and propose a prescriptive process monitoring framework for advising users if and how to act upon the predictions. The results suggest that the proposed solutions complement the existing techniques and can be useful for practitioners in implementing predictive process monitoring systems in real life
    corecore