A data-driven framework for dimensionality reduction and causal inference in climate fields
We propose a data-driven framework to simplify the description of
spatiotemporal climate variability into a few entities and their causal linkages.
Given a high-dimensional climate field, the methodology first reduces its
dimensionality into a set of regionally constrained patterns. Time-dependent
causal links are then inferred in the interventional sense through the
fluctuation-response formalism, as shown in Baldovin et al. (2020). These two
steps allow us to explore how regional climate variability can influence remote
locations. To distinguish between true and spurious responses, we propose a
novel analytical null model for the fluctuation-dissipation relation, thereby
allowing for uncertainty estimation at a given confidence level. Finally, we
select a set of metrics to summarize the results, offering a useful and
simplified approach to explore climate dynamics. We showcase the methodology on
the monthly sea surface temperature field at global scale. We demonstrate the
usefulness of the proposed framework by studying a few individual links as well
as "link maps", visualizing the cumulative degree of causation between a given
region and the whole system. Finally, each pattern is ranked in terms of its
"causal strength", quantifying its relative ability to influence the system's
dynamics. We argue that the methodology allows us to explore and characterize
causal relationships in high-dimensional spatiotemporal fields in a rigorous
and interpretable way.
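To make the fluctuation-response step concrete, here is a minimal sketch in Python, assuming the standard linear-response estimator R(τ) = C(τ) C(0)⁻¹ applied to the time series of the reduced patterns; the paper's pattern extraction, analytical null model, and significance testing are not reproduced, and all names and numbers below are illustrative.

```python
import numpy as np

def response_matrix(x, tau):
    """Linear-response estimate R(tau) = C(tau) C(0)^{-1} from the
    fluctuation-dissipation relation, where C(tau) is the lagged
    covariance of the (time x pattern) series x."""
    x = x - x.mean(axis=0)                      # work with anomalies
    n = x.shape[0] - tau
    c_tau = x[tau:].T @ x[:n] / n               # lagged covariance C(tau)
    c_0 = x.T @ x / x.shape[0]                  # equal-time covariance C(0)
    return c_tau @ np.linalg.pinv(c_0)          # R[i, j]: response of pattern i
                                                # to an impulse in pattern j

rng = np.random.default_rng(0)
patterns = rng.standard_normal((600, 10))       # stand-in for 50 years of monthly
                                                # SST projected on 10 patterns
R = response_matrix(patterns, tau=3)            # 3-month interventional response
link_map = np.abs(R).sum(axis=0)                # crude analogue of the cumulative
                                                # "causal strength" of each pattern
```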
Enhancing Usability and Explainability of Data Systems
The recent growth of data science has expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people of different skills and backgrounds alike, leading to the democratization of data systems. Furthermore, a proper understanding of data and data-driven systems is necessary for users to trust the function of systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, to aid all sorts of users, including expert ones, in data and system understanding, and to provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing the usability of data systems for nonexperts; (2) enabling data understanding that can assist users in a variety of tasks, such as achieving trust in data-driven machine learning and cleaning data; and (3) explaining the causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven discovery of user intent. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions, and we develop an explanation framework that explains the causes of such untrustworthy predictions. This new data-profiling primitive also enables interactive data cleaning. Finally, we develop two explanation frameworks tailored to provide explanations when debugging data system components, including the data itself: one explains the root cause of a concurrent application's intermittent failure, and the other exposes issues in the data that cause a data-driven system to malfunction.
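As an illustration of the data-profiling idea, here is a deliberately minimal sketch: a per-feature range profile learned from training data, used to flag tuples on which a model trained on that data is more likely to be untrustworthy. The `learn_profile` and `conformance` helpers are hypothetical stand-ins; the actual primitive developed in the thesis is considerably richer.

```python
import numpy as np

def learn_profile(train, k=3.0):
    """Learn a simple per-feature profile (mean +/- k std) from training
    data; a stand-in for the richer data-profiling primitive in the text."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return mu - k * sigma, mu + k * sigma

def conformance(tuples, profile):
    """Fraction of each tuple's features that fall inside the profile;
    low scores flag tuples likely to yield untrustworthy predictions."""
    lo, hi = profile
    inside = (tuples >= lo) & (tuples <= hi)
    return inside.mean(axis=1)

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(1000, 5))     # data the model was trained on
test = np.vstack([rng.normal(0.0, 1.0, (5, 5)),  # in-distribution tuples
                  rng.normal(8.0, 1.0, (2, 5))]) # drifted tuples
print(conformance(test, learn_profile(train)))   # drifted rows score near 0
```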
A Machine Learning Enhanced Scheme for Intelligent Network Management
Versatile networking services strongly influence daily life, while the number and diversity of services make network systems highly complex. Network scale and complexity grow with the increasing number of infrastructure apparatuses, networking functions, network slices, and the evolution of the underlying architecture. The conventional approach is manual administration of these large and complex platforms, which makes effective and insightful management troublesome. A feasible and promising alternative is to extract insightful information from the large volumes of network data these systems produce. The goal of this thesis is to use learning-based algorithms from the machine learning community to discover valuable knowledge in such data, directly supporting intelligent management and maintenance. The thesis focuses on two schemes: network anomaly detection with root cause localization, and critical traffic resource control and optimization.

First, abundant network data carry informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract, formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed to detect the occurrence of faults and anomalies. It employs representation learning to map unstructured data into numerical features and fuses the extracted features for network anomaly and fault detection; the representation learning uses word2vec-based embedding technologies for semantic expression.

Next, fault and anomaly detection only reveals that an event has occurred and does not identify the root causes needed for useful administration, so fault localization is introduced to narrow down the source of systematic anomalies. The extracted features are formed into an anomaly degree, coupled with an importance-ranking method that highlights the locations of anomalies in the network system. Two ranking modes, instantiated by PageRank and by operation errors, jointly highlight locations with latent issues.

Beyond fault and anomaly detection, network traffic engineering manages communication and computation resources to optimize the efficiency of data transfer. Especially when traffic is constrained by communication conditions, a proactive path-planning scheme enables efficient traffic control. A learning-based traffic-planning algorithm is therefore proposed, based on a sequence-to-sequence model, to discover reasonable hidden paths from abundant traffic history over a Software Defined Network architecture. Finally, traffic engineering based purely on empirical data is likely to yield stale, sub-optimal solutions, or even worse outcomes; a resilient mechanism is required to adapt network flows to a dynamic environment based on context. A reinforcement learning-based scheme is therefore put forward for dynamic data forwarding that accounts for network resource status, and it shows a promising performance improvement.

In summary, the proposed anomaly-processing framework strengthens analysis and diagnosis for network system administrators through combined fault detection and root cause localization, while the learning-based traffic engineering improves network flow management using historical data and points toward flexible traffic adjustment in ever-changing environments.
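A toy sketch of the log-analysis pipeline described above: template extraction by masking variable fields, a fixed-size embedding (a dependency-free hashing stand-in for the word2vec embeddings the thesis uses), and a simple distance-based anomaly degree. All names and log lines here are illustrative, not the thesis's code.

```python
import re
import numpy as np

def template(log_line):
    """Abstract a raw log record into a template by masking variable
    fields (numbers, hex ids), as in template-extraction pipelines."""
    line = re.sub(r"0x[0-9a-f]+|\d+", "<*>", log_line.lower())
    return re.findall(r"[a-z<>*]+", line)

def embed(tokens, dim=16):
    """Map a token sequence to a fixed vector by averaging hashed token
    counts; a stand-in for word2vec. Note: Python's str hash is salted
    per process, so vectors are only consistent within one run."""
    v = np.zeros(dim)
    for t in tokens:
        v[hash(t) % dim] += 1.0
    return v / max(len(tokens), 1)

logs = [
    "connection from 10.0.0.5 established on port 443",
    "connection from 10.0.0.9 established on port 443",
    "kernel panic at 0xdeadbeef: unable to mount root fs",
]
X = np.stack([embed(template(l)) for l in logs])
scores = np.linalg.norm(X - X.mean(axis=0), axis=1)  # distance-based anomaly degree
print(scores.argmax())                               # flags the kernel-panic record
```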
Towards a Learning Theory of Cause-Effect Inference
We pose causal inference as the problem of learning to classify probability
distributions. In particular, we assume access to a collection
{(S_i, l_i)}_{i=1}^n, where each S_i is a sample drawn from the
probability distribution of (X_i, Y_i), and l_i is a binary label
indicating whether "X_i → Y_i" or "X_i ← Y_i". Given these data,
we build a causal inference rule in two steps. First, we featurize each
using the kernel mean embedding associated with some characteristic kernel.
Second, we train a binary classifier on such embeddings to distinguish between
causal directions. We present generalization bounds showing the statistical
consistency and learning rates of the proposed approach, and provide a simple
implementation that achieves state-of-the-art cause-effect inference.
Furthermore, we extend our ideas to infer causal relationships between more
than two variables.
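A minimal sketch of the two-step rule, assuming random cosine features as the finite-dimensional approximation of a Gaussian kernel mean embedding and scikit-learn's logistic regression as the binary classifier. The synthetic `sample_pair` generator and all other names are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
W = rng.normal(0, 2.0, size=(2, 100))            # random Fourier projections
b = rng.uniform(0, 2 * np.pi, 100)               # approximating a Gaussian kernel

def featurize(x, y):
    """Empirical kernel mean embedding of the sample {(x_j, y_j)}:
    average of random cosine features over the joint sample."""
    s = np.column_stack([x, y])
    return np.cos(s @ W + b).mean(axis=0)

def sample_pair(n=300):
    """Synthetic pair with X -> Y: Y is a noisy nonlinear function of X."""
    x = rng.normal(size=n)
    y = np.tanh(2 * x) + 0.3 * rng.normal(size=n)
    return x, y

# Label 1 for "X -> Y", 0 for the flipped (anticausal) direction.
feats, labels = [], []
for _ in range(500):
    x, y = sample_pair()
    feats += [featurize(x, y), featurize(y, x)]
    labels += [1, 0]

clf = LogisticRegression(max_iter=1000).fit(feats, labels)
x, y = sample_pair()
print(clf.predict([featurize(x, y)]))            # expected: [1], i.e. X -> Y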
A Survey on Causal Discovery Methods for Temporal and Non-Temporal Data
Causal Discovery (CD) is the process of identifying the cause-effect
relationships among the variables from data. Over the years, several methods
have been developed primarily based on the statistical properties of data to
uncover the underlying causal mechanism. In this study, we introduce the common
terminologies in causal discovery, and provide a comprehensive discussion of
the approaches designed to identify the causal edges in different settings. We
further discuss some of the benchmark datasets available for evaluating the
performance of the causal discovery algorithms, available tools to perform
causal discovery readily, and the common metrics used to evaluate these
methods. Finally, we conclude by presenting the common challenges involved in
CD and discussing the applications of CD in multiple areas of interest.
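To make the constraint-based family of CD methods concrete, here is a minimal, self-contained PC-style skeleton search, using partial correlation as the conditional-independence test. This is an illustrative sketch under Gaussian-linear assumptions, not one of the surveyed tools.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def partial_corr(data, i, j, cond):
    """Correlation of columns i and j after regressing out the columns
    in cond; returns the (r, p-value) pair from a Pearson test."""
    if cond:
        Z = np.column_stack([np.ones(len(data))] + [data[:, k] for k in cond])
        ri = data[:, i] - Z @ np.linalg.lstsq(Z, data[:, i], rcond=None)[0]
        rj = data[:, j] - Z @ np.linalg.lstsq(Z, data[:, j], rcond=None)[0]
        return stats.pearsonr(ri, rj)
    return stats.pearsonr(data[:, i], data[:, j])

def skeleton(data, alpha=0.01):
    """PC-style adjacency search: drop the edge i-j whenever i and j are
    independent given some subset of the remaining variables."""
    d = data.shape[1]
    edges = set(combinations(range(d), 2))
    for i, j in list(edges):
        others = [k for k in range(d) if k not in (i, j)]
        for size in range(len(others) + 1):
            for cond in combinations(others, size):
                if partial_corr(data, i, j, list(cond))[1] > alpha:
                    edges.discard((i, j))   # independence not rejected
                    break
            else:
                continue
            break
    return edges

rng = np.random.default_rng(3)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)            # chain X -> Y -> Z
z = y + 0.5 * rng.normal(size=2000)
print(skeleton(np.column_stack([x, y, z])))    # expected: {(0, 1), (1, 2)}
```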