### Entropy, Free Energy, and Work of Restricted Boltzmann Machines

A restricted Boltzmann machine is a generative probabilistic graphical network. The probability of finding the network in a certain configuration is given by the Boltzmann distribution. Given training data, learning is done by optimizing the parameters of the energy function of the network. In this paper, we analyze the training process of the restricted Boltzmann machine in the context of statistical physics. As an illustration, for small bar-and-stripe patterns, we calculate thermodynamic quantities such as entropy, free energy, and internal energy as a function of the training epoch. We demonstrate the growth of the correlation between the visible and hidden layers via the subadditivity of entropies as the training proceeds. Using Monte Carlo simulation of trajectories of the visible and hidden vectors in the configuration space, we also calculate the distribution of the work done on the restricted Boltzmann machine by switching the parameters of the energy function. We discuss the Jarzynski equality, which connects the path average of the exponential function of the work with the difference in free energies before and after training.
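The equilibrium quantities named in this abstract can be illustrated on a toy model. The following is a minimal sketch, not the paper's code: it assumes binary units, temperature 1, and the common energy convention E(v, h) = -b·v - c·h - vᵀWh, and it enumerates every configuration exactly, which is only feasible for very small networks. The parameter values are hypothetical. The gap S_v + S_h - S is the mutual information between the layers, which subadditivity guarantees is non-negative.

```python
import itertools
import math


def rbm_energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v^T W h for binary visible/hidden vectors."""
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(cj * hj for cj, hj in zip(c, h))
    coupling = sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + coupling)


def shannon_entropy(probs):
    """Entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def thermodynamics(W, b, c):
    """Exact free energy, internal energy, and entropies at temperature 1."""
    n_v, n_h = len(b), len(c)
    energy = {}
    for v in itertools.product((0, 1), repeat=n_v):
        for h in itertools.product((0, 1), repeat=n_h):
            energy[(v, h)] = rbm_energy(v, h, W, b, c)
    Z = sum(math.exp(-e) for e in energy.values())  # partition function
    p_joint = {s: math.exp(-e) / Z for s, e in energy.items()}
    # Marginal distributions of the visible and hidden layers.
    p_v, p_h = {}, {}
    for (v, h), p in p_joint.items():
        p_v[v] = p_v.get(v, 0.0) + p
        p_h[h] = p_h.get(h, 0.0) + p
    return {
        "F": -math.log(Z),
        "U": sum(p_joint[s] * energy[s] for s in energy),
        "S": shannon_entropy(p_joint.values()),
        "S_v": shannon_entropy(p_v.values()),
        "S_h": shannon_entropy(p_h.values()),
    }


# Hypothetical toy parameters: 3 visible units, 2 hidden units.
W = [[0.5, -1.0], [1.5, 0.2], [-0.7, 0.9]]
b = [0.1, -0.2, 0.0]
c = [0.3, -0.1]
q = thermodynamics(W, b, c)
mutual_information = q["S_v"] + q["S_h"] - q["S"]
```

Evaluating `thermodynamics` on the parameters saved after each training epoch would give curves of the kind the abstract describes. The work distribution and the Jarzynski equality require simulated trajectories and are not covered by this sketch.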

### Parallel Implementation of the Discontinuous Galerkin Method

This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time-dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

### Operated by Universities Space Research Association

under Contract NAS 1-97046 Available from the following

### Parallelization of an Object-Oriented Unstructured Aeroacoustics Solver

A computational aeroacoustics code based on the discontinuous Galerkin method is ported to several parallel platforms using MPI. The discontinuous Galerkin method is a compact high-order method that retains its accuracy and robustness on non-smooth unstructured meshes. In its semi-discrete form, the discontinuous Galerkin method can be combined with explicit time marching methods, making it well suited to time-accurate computations. The compact nature of the discontinuous Galerkin method also makes it well suited for distributed-memory parallel platforms. The original serial code was written using an object-oriented approach and was previously optimized for cache-based machines. The port to parallel platforms was achieved simply by treating partition boundaries as a type of boundary condition. Code modifications were minimal because boundary conditions were abstractions in the original program. Scalability results are presented for the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. Slightly superlinear speedup is achieved on a fixed-size problem on the Origin, due to cache effects.
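The design idea of treating a partition boundary as one more boundary condition can be sketched as follows. This is an illustration only, not the authors' code: all class and function names are hypothetical, the state is a single scalar, the flux is a plain average standing in for a real numerical flux, and the MPI exchange is replaced by a `receive` call.

```python
from abc import ABC, abstractmethod


class BoundaryCondition(ABC):
    """Supplies the exterior state that a boundary face sees."""

    @abstractmethod
    def exterior_state(self, interior_state):
        ...


class ReflectingWall(BoundaryCondition):
    """Physical boundary: mirror the interior state (toy model)."""

    def exterior_state(self, interior_state):
        return -interior_state


class PartitionBoundary(BoundaryCondition):
    """Boundary between two mesh partitions.

    The exterior state is whatever the neighbouring process sent. In a real
    solver `receive` would be fed by a message-passing call; here the value
    is handed over directly.
    """

    def __init__(self):
        self._received = None

    def receive(self, neighbour_state):
        self._received = neighbour_state

    def exterior_state(self, interior_state):
        if self._received is None:
            raise RuntimeError("no data received from the neighbour yet")
        return self._received


def face_flux(interior_state, boundary):
    """Average of interior and exterior states, a stand-in for a real flux.

    The solver loop calls this the same way for every boundary type, so it
    does not need to know whether a face is physical or a partition cut.
    """
    return 0.5 * (interior_state + boundary.exterior_state(interior_state))


wall = ReflectingWall()
cut = PartitionBoundary()
cut.receive(4.0)  # value that would arrive from the neighbouring partition
wall_flux = face_flux(2.0, wall)
cut_flux = face_flux(2.0, cut)
```

Because `face_flux` depends only on the abstract interface, adding the partition type leaves the solver loop unchanged, which is consistent with the abstract's remark that code modifications were minimal.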

### Resilience analytics: coverage and robustness in multi-modal transportation networks

A multi-modal transportation system of a city can be modeled as a multiplex network with different layers corresponding to different transportation modes. These layers include, but are not limited to, the bus network, metro network, and road network. Formally, a multiplex network is a multilayer graph in which the same set of nodes is connected by different types of relationships. Intra-layer relationships denote the road segments connecting stations of the same transportation mode, whereas inter-layer relationships represent connections between different transportation modes within the same station. Given a multi-modal transportation system of a city, we are interested in assessing its quality or efficiency by estimating the coverage, i.e., the portion of the city that can be covered by a random walker who navigates through it within a given time budget, or number of steps. We are also interested in the robustness of the whole transportation system, which denotes the degree to which the system is able to withstand a random or targeted failure affecting one or more parts of it. Previous approaches proposed a mathematical framework to numerically compute the coverage in multiplex networks. However, solutions are usually based on eigenvalue decomposition, which is known to be time-consuming and hard to obtain for large systems. In this work, we propose MUME, an efficient algorithm for Multi-modal Urban Mobility Estimation, that takes advantage of the special structure of the supra-Laplacian matrix of the transportation multiplex to compute the coverage of the system. We conduct a comprehensive series of experiments to demonstrate the effectiveness and efficiency of MUME on both synthetic and real transportation networks of various cities such as Paris, London, New York, and Chicago. A future goal is to use this experience to make projections for a fast-growing city like Doha.

Other information:

- Published in: EPJ Data Science
- License: https://creativecommons.org/licenses/by/4.0
- See the article on the publisher's website: http://dx.doi.org/10.1140/epjds/s13688-018-0139-7
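The notion of coverage defined in this abstract can be illustrated with a small simulation. The sketch below is not MUME and does not use the supra-Laplacian: it is a plain Monte Carlo estimate under assumed rules, where at each step the walker picks uniformly among its intra-layer neighbours and the inter-layer copies of its current station. The two-layer toy network and all function names are hypothetical.

```python
import random


def build_layer(n_nodes, edges):
    """Undirected adjacency lists for one transportation mode."""
    adjacency = {node: [] for node in range(n_nodes)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def coverage(layers, n_nodes, steps, walks=200, seed=0):
    """Average fraction of distinct stations visited within `steps` moves."""
    names = sorted(layers)
    total = 0.0
    for walk in range(walks):
        rng = random.Random(seed + walk)  # one reproducible stream per walk
        node, layer = rng.randrange(n_nodes), rng.choice(names)
        visited = {node}
        for _ in range(steps):
            # Intra-layer moves: travel to a neighbouring station in this mode.
            moves = [(nbr, layer) for nbr in layers[layer][node]]
            # Inter-layer moves: switch mode while staying at the same station.
            moves += [(node, other) for other in names if other != layer]
            if not moves:
                break
            node, layer = rng.choice(moves)
            visited.add(node)
        total += len(visited) / n_nodes
    return total / walks


# Hypothetical toy city: 6 stations, a bus line and a metro line.
N_STATIONS = 6
layers = {
    "bus": build_layer(N_STATIONS, [(0, 1), (1, 2), (2, 3)]),
    "metro": build_layer(N_STATIONS, [(0, 3), (3, 4), (4, 5)]),
}
short_budget = coverage(layers, N_STATIONS, steps=5)
long_budget = coverage(layers, N_STATIONS, steps=50)
```

Removing edges or stations from a layer and recomputing `coverage` gives a crude way to probe robustness on the same toy network. Sampling of this kind scales poorly, which is part of the motivation for an analytical approach.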

### Datasets and code for ClustMe and ClustML visual quality measures of grouping patterns in monochrome scatterplots

Code and datasets S1 and S2 used in the paper **ClustMe: A Visual Quality Measure for Ranking Monochrome Scatterplots based on Cluster Patterns**, Computer Graphics Forum 38(3): 225-236 (2019), and in **ClustML: A Measure of Cluster Pattern Complexity in Scatterplots Learnt from Human-labeled Groupings**, to appear in SAGE Information Visualization Journal.

The code is written in R 4.3.1. Data are stored in RData, image, and CSV formats.

CONTENT:

- `/_1_TRAINING_MERGER_ON_GMM_PARAMETERS_S1`

  Pipeline used to train all CARET ML models and find the best merger used in ClustML. These functions use data S1. Refer to the `README.txt` file therein.

- `/_2_ClustMe_vs_ClustML_257data_S2`

  Run the script `CompareClustMLvsClustMe_Data257.R` to plot the comparative scatterplot of ClustMe and ClustML scores.

- `/_3_USAGE_SCENARIO_GENOMICS`

  Check the script to set options, then run `run_analysis_of_genomic_data_with_ClustML.R`. The script:

  - processes Thousand Genomes Project data (provided as a PCA of IBD pairs, stored in `PCA_of_genomic_data.RData`);
  - computes the plots for the usage scenario and a summary plot of statistics of all scatterplots based on pairs of PCA components;
  - computes the interactive plot for selecting clusters and highlighting them in another scatterplot.

- `/CLUSTML_VQM`

  Contains the main ClustML function (`ClustML_Pipeline()` in `ClustML_VQM.R`), which fits a GMM to scatterplot (x, y) data and computes the ClustML score. It uses `treebag_up_PP_PCA_BoxCox_SpatialSign.RData`, a CARET classification model, to make pairwise merging decisions. This model is the best one obtained by training on 2-component GMMs that 34 human subjects judged as containing one or more than one cluster.

- `/DATASETS`

  Contains the datasets from studies S1 and S2, with ClustML (CARET model) results and human judgments. Scatterplot stimuli can be plotted using the `plotSP` function from `plotDataXY.R` (see the example in that code).

  - `./DATA_S1_ORIGINAL_PARAMETER_JUDGEMENT_DATA`

    `1000_2gaussians_param_34judgment_ClustMe_EXP1.csv` contains 34 human judgments of each of 1000 2-component GMM scatterplots and the 8 parameters used to generate a sample from these GMM models. Columns:

    - "XYposCSVfilename": name of the file in `../DATA_S1_ORIGINAL_Scatterplots_IMG_ClustMe`
    - "Nsample": sample size generated from the GMM, i.e., the number of points in the scatterplot
    - "MuA1", "MuA2": mean along axes 1 and 2 of component A of the GMM
    - "SigmaA1", "SigmaA2": variance along axes 1 and 2 of component A of the GMM
    - "ThetaA": angle of component A of the GMM
    - "MuB1", "MuB2": mean along axes 1 and 2 of component B of the GMM
    - "SigmaB1", "SigmaB2": variance along axes 1 and 2 of component B of the GMM
    - "ThetaB": angle of component B of the GMM
    - "Tau": proportion of component A
    - "Alpha": rotation from horizontal of the full mixture
    - "Score_1", ..., "Score_34": human judgment (1 = see one cluster, 2 = see more than one cluster)
    - "probMore", "probSingle": proportion of judgments seeing more than one cluster and one cluster, respectively

  - `./DATA_S1_ORIGINAL_Scatterplots_IMG_ClustMe`

    PNG image files of the stimuli shown to the human subjects. Their filenames are used in `1000_2gaussians_param_34judgment_ClustMe_EXP1.csv` in `../DATA_S1_ORIGINAL_PARAMETER_JUDGEMENT_DATA`.

  - `./DATA_S1_ORIGINAL_Scatterplots_XY_ClustMe`

    Each `zzzz.csv` file contains the x and y coordinates of the points displayed in the file `zzzz.png` stored in the folder `../DATA_S1_ORIGINAL_Scatterplots_IMG_ClustMe`.

  - `./DATA_S2`

    Data used in Study S2:

    - `Data_257.RData`: contains the list of filenames and the x, y positions of the points of the scatterplot stimuli.
    - `Data257_435pairwiseRanking_CARETmodels.csv` / `.RData`: rankings given by ClustML using various CARET models as merging classifiers trained on S1 data.
    - `Data257_435pairwiseRanking_31HumanJudgments.csv` / `.RData`: rankings given by 31 human judgments.

    The row name is `filename1@@@@@filename2`, where filename1 and filename2 correspond to names in `Data_257`. Each cell contains the filename that the model or subject in the column header judged as showing the most complex cluster patterns, or BOTH if the two were judged to be of similar complexity.

- `/DEMO`

  Run `Demo_ClustML_VQM.R` to demonstrate how to use the `ClustML_Pipeline` function to compute the ClustML score of a scatterplot.
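The relation between the "Score_k" columns and the "probSingle" and "probMore" columns of the S1 judgment file can be sketched as below. This is an illustration in Python rather than the dataset's own R code. It assumes the CSV layout follows the column descriptions above, and it uses a small synthetic table with four judges instead of the real file with 34. The function name and the sample rows are hypothetical.

```python
import csv
import io


def judgment_proportions(row, n_judges=34):
    """Return (probSingle, probMore) recomputed from Score_1..Score_n.

    A score of 1 means the judge saw one cluster; 2 means more than one.
    """
    scores = [int(row[f"Score_{k}"]) for k in range(1, n_judges + 1)]
    prob_single = sum(score == 1 for score in scores) / n_judges
    prob_more = sum(score == 2 for score in scores) / n_judges
    return prob_single, prob_more


# Synthetic stand-in for the real CSV (4 judges, invented filenames).
SAMPLE_CSV = (
    "XYposCSVfilename,Score_1,Score_2,Score_3,Score_4\n"
    "plot_a.csv,1,2,2,2\n"
    "plot_b.csv,1,1,1,1\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
results = {
    row["XYposCSVfilename"]: judgment_proportions(row, n_judges=4)
    for row in rows
}
```

With the real file, the same function with the default `n_judges=34` could be used to cross-check the stored "probSingle" and "probMore" values, assuming they were derived from the score columns in this way.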