3 research outputs found

    Analyzing Granger causality in climate data with time series classification methods

    Get PDF
    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of these studies adopt the concept of Granger causality to infer statistical cause-effect relationships, relying on traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
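    In the standard autoregressive formulation referred to here (a textbook sketch, not specific to this article), a series x is said to Granger-cause a series y when past values of x improve the prediction of y beyond what y's own past provides:

        restricted model:    y_t = a_0 + sum_{i=1..p} a_i * y_{t-i} + e_t
        unrestricted model:  y_t = a_0 + sum_{i=1..p} a_i * y_{t-i} + sum_{i=1..p} b_i * x_{t-i} + e_t

    x Granger-causes y if the unrestricted model yields a significantly better fit, e.g. if an F-test rejects b_1 = … = b_p = 0.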

    Survival Factorization on Diffusion Networks

    No full text
    In this paper we propose a survival factorization framework that models information cascades by tying together social influence patterns, topical structure and temporal dynamics. This is achieved through the introduction of a latent space which encodes: (a) the relevance of an information cascade to a topic; (b) the topical authoritativeness and the susceptibility of each individual involved in the information cascade; and (c) temporal topical patterns. By exploiting the cumulative properties of the survival function and of the likelihood of the model on a given adoption log, which records the observed activation times of users and side information for each cascade, we show that the inference phase is linear in the number of users and in the number of adoptions. The evaluation on both synthetic and real-world data shows the effectiveness of the model in detecting the interplay between topics and social influence patterns, which ultimately provides high accuracy in predicting users' activation times. Code and data related to this chapter are available at: https://doi.org/10.6084/m9.figshare.5411341
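    As a rough illustration of such a factorization (a generic sketch under assumed notation, not necessarily the exact formulation used in the paper): with K latent topics, the hazard of user v being activated at time t by an earlier adopter u within cascade c can be written as

        lambda_{u->v,c}(t) = sum_{k=1..K} phi_{c,k} * A_{u,k} * S_{v,k} * g_k(t - t_u)

    where phi_{c,k} is the relevance of cascade c to topic k, A_{u,k} and S_{v,k} are u's authoritativeness and v's susceptibility on topic k, and g_k encodes the temporal pattern of topic k; the corresponding survival function S(t) = exp(-∫_0^t lambda(s) ds) is what enters the likelihood of the observed activation times.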

    Survival Factorization Framework source code and data

    No full text
This dataset contains the source code for the Survival Factorization Framework published as:

Nicola Barbieri, Giuseppe Manco, Ettore Ritacco: Survival Factorization on Diffusion Networks. The European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases, 2017.

The dataset is distributed as a .zip archive that can be uncompressed with standard, openly accessible zip utilities. Code is stored in .java and .jar files that can be viewed and edited with any standard text editor. Testing and training datasets containing users and cascade timestamps are provided in various text file formats. Figures and tables from the related publication are included in .pdf format.

See the description below for more detail on file formats and for instructions on building the model and performing the Network Reconstruction.

In the related paper we propose a survival factorization framework that models information cascades by tying together social influence patterns, topical structure and temporal dynamics. This is achieved through the introduction of a latent space which encodes: (a) the relevance of an information cascade to a topic; (b) the topical authoritativeness and the susceptibility of each individual involved in the information cascade; and (c) temporal topical patterns. By exploiting the cumulative properties of the survival function and of the likelihood of the model on a given adoption log, which records the observed activation times of users and side information for each cascade, we show that the inference phase is linear in the number of users and in the number of adoptions. The evaluation on both synthetic and real-world data shows the effectiveness of the model in detecting the interplay between topics and social influence patterns, which ultimately provides high accuracy in predicting users' activation times.

###############################

How to Build the Model:

Run survivalFactorizationEM.SurvivalFactorizationEM_Runner providing the path of a configuration file. Given a dataset, this script generates several instances of the class survivalFactorizationEM.SurvivalFactorizationEM_Model.
The configuration file must be written according to the ".properties" syntax. The fields are:

n_factors = <n_factors>
output = <output>
max_iterations = <max_iterations>
assignment_file = <assignment_file>
event_file = <event_file>
[ content_file = <content_file> ]

where

<n_factors>: an integer list separated by ";"
e.g. n_factors = 2;4;8;16;32;64;128
This list sets the number of models to build, one for each number of factors.

<output>: a String
e.g. output = resources/datasets/synth/models/Synth
This string contains two elements:
- the path of the folder where the built models will be stored (in the example, "resources/datasets/synth/models")
- the prefix of the name of the file which will contain the model (in the example, "Synth").
For each value in <n_factors>, a model with the corresponding number of factors will be created and stored in the folder; the name of the model file is the concatenation of the prefix, "_", the number of factors, "f" and ".model" (e.g. "Synth_2f.model").

<max_iterations>: an integer
e.g. max_iterations = 1000
The fixed-point iterations continue until convergence or until this number of iterations (burn-in phase included) is reached.

<assignment_file>: a String
e.g. assignment_file = resources/datasets/synth/models/Synth
This field is structured like <output>. Each assignment file will contain the cascade-topic association for each cascade.

<event_file>: a String
e.g. event_file = resources/datasets/Synth/cascades_training.txt
The name of the file containing the cascades of events (e.g. tweets) used to build the model.

<content_file>: a String
e.g. content_file = resources/datasets/Synth/text_training.txt
The name of the file containing the text information for each cascade. Note: this field is optional.
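For reference, a complete model-building configuration assembled from the example values above could look like this (the field values are exactly the examples listed above; the file name "build.properties" and the jar name below are illustrative assumptions, not part of the released package):

n_factors = 2;4;8;16;32;64;128
output = resources/datasets/synth/models/Synth
max_iterations = 1000
assignment_file = resources/datasets/synth/models/Synth
event_file = resources/datasets/Synth/cascades_training.txt
content_file = resources/datasets/Synth/text_training.txt

Assuming the classes are packaged in a jar, the runner can then be launched with:

java -cp survival-factorization.jar survivalFactorizationEM.SurvivalFactorizationEM_Runner build.properties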
###############################

Cascade file format. A text document containing the following information:

NodeId    CascadeId    TimeStamp
1449      1            1222254982000
6930      1            1222277866000
2466      2            1222281238000
…

Each row is an activation; the separator is the tab character "\t".

###############################

Text file format. A text document containing the following information:

WordId    CascadeId    Frequency
1248      1            1
804       1            5
6788      2            3
8134      2            1
…

Each row is a word in an event; the separator is the tab character "\t".

###############################

How to perform the Network Reconstruction:

Run survivalFactorizationEM.FullTestNetworkReconstruction providing the path of a configuration file. Given a folder containing built models and a test network, this script will generate the network reconstruction for each model. The configuration file must be written according to the ".properties" syntax. The fields are:

model_folder = <model_folder>
model_files = <model_files>
test_file = <test_file>
output_folder = <output_folder>
output_files = <output_files>

where:

<model_folder>: a String
e.g. model_folder = resources/datasets/synth/models
The folder containing the built models.

<model_files>: a String list whose elements are separated by ";"
e.g. model_files = Synth_2f.model;Synth_4f.model;Synth_8f.model;Synth_16f.model;Synth_32f.model;Synth_64f.model;Synth_128f.model
This list contains the file names where the models are stored.

<test_file>: a String
e.g. test_file = resources/datasets/Synth/s2/links_reduced-FF1400.remapped_two_hops
The file containing the test network to reconstruct. Note: the syntax of the test file is the same as the Cascade file format.

<output_folder>: a String
e.g. output_folder = resources/datasets/Synth/preds
The folder where the reconstructed network files (one for each model) will be written.

<output_files>: a String list whose elements are separated by ";"
e.g. output_files = Synth_2f.pred;Synth_4f.pred;Synth_8f.pred;Synth_16f.pred;Synth_32f.pred;Synth_64f.pred;Synth_128f.pred
The file names of the reconstructed networks.
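Putting the fields above together, a complete reconstruction configuration could look like this (the field values are the examples listed above; the file name "reconstruction.properties" and the jar name are illustrative assumptions):

model_folder = resources/datasets/synth/models
model_files = Synth_2f.model;Synth_4f.model;Synth_8f.model;Synth_16f.model;Synth_32f.model;Synth_64f.model;Synth_128f.model
test_file = resources/datasets/Synth/s2/links_reduced-FF1400.remapped_two_hops
output_folder = resources/datasets/Synth/preds
output_files = Synth_2f.pred;Synth_4f.pred;Synth_8f.pred;Synth_16f.pred;Synth_32f.pred;Synth_64f.pred;Synth_128f.pred

java -cp survival-factorization.jar survivalFactorizationEM.FullTestNetworkReconstruction reconstruction.properties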
###############################

Network reconstruction output prediction file format:

Prediction       ActualClass
1.15421803E-6    1
2.27729428E-6    1
1.16779013E-7    2
…

Each row is a prediction; the separator is the tab character "\t".
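As a minimal illustration of how such a file can be consumed (this reader is not part of the released code; the default file path is only a guess based on the example output folder above), the following Java sketch parses a tab-separated prediction file and collects (prediction, actual class) pairs:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PredictionFileReader {

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the actual .pred file produced by FullTestNetworkReconstruction.
        String path = args.length > 0 ? args[0] : "resources/datasets/Synth/preds/Synth_2f.pred";

        List<double[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("Prediction")) {
                    continue; // skip blank lines and an optional header row
                }
                // Columns are tab-separated: prediction score, actual class.
                String[] cols = line.split("\t");
                double prediction = Double.parseDouble(cols[0]);
                int actualClass = Integer.parseInt(cols[1]);
                rows.add(new double[] { prediction, actualClass });
            }
        }
        System.out.println("Read " + rows.size() + " predictions from " + path);
    }
}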