20 research outputs found

    Exact Computation of Influence Spread by Binary Decision Diagrams

    Evaluating influence spread in social networks is a fundamental procedure for estimating the word-of-mouth effect in viral marketing. There are numerous studies on this topic; however, under the standard stochastic cascade models, the exact computation of influence spread is known to be #P-hard. Existing studies have therefore used Monte-Carlo simulation-based approximations to avoid exact computation. We propose the first algorithm to compute influence spread exactly under the independent cascade model. The algorithm first constructs binary decision diagrams (BDDs) for all possible realizations of influence spread, then computes influence spread by dynamic programming on the constructed BDDs. To construct the BDDs efficiently, we designed a new frontier-based search-type procedure. The constructed BDDs can also be used to solve other influence-spread-related problems, such as random sampling without rejection, conditional influence spread evaluation, dynamic probability update, and gradient computation for probability optimization problems. We conducted computational experiments to evaluate the proposed algorithm. The algorithm successfully computed influence spread on real-world networks with a hundred edges in a reasonable time, which is infeasible with the naive algorithm. We also conducted an experiment to evaluate the accuracy of the Monte-Carlo simulation-based approximation by comparing it with the exact influence spread obtained by the proposed algorithm. Comment: WWW'1
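To make the contrast in this abstract concrete, the following is a minimal sketch of the naive exact computation (enumerating every live-edge realization) next to a Monte-Carlo estimate. It is not the BDD-based algorithm the abstract proposes. The directed-edge representation `(u, v, p)` and the function names `reachable`, `exact_influence`, and `monte_carlo_influence` are assumptions made for illustration.

```python
import itertools
import random


def reachable(seeds, live_edges):
    """Return the set of nodes reachable from the seeds along live edges."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def exact_influence(edges, seeds):
    """Naive exact influence spread: sum over all 2^m edge realizations.

    edges is a list of (u, v, p) triples (assumed format), where p is the
    probability that the directed edge u -> v is live.
    """
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(edges)):
        prob = 1.0
        live = []
        for (u, v, p), on in zip(edges, mask):
            prob *= p if on else 1.0 - p
            if on:
                live.append((u, v))
        total += prob * len(reachable(seeds, live))
    return total


def monte_carlo_influence(edges, seeds, trials, rng):
    """Estimate influence spread by sampling live-edge realizations."""
    total = 0
    for _ in range(trials):
        live = [(u, v) for u, v, p in edges if rng.random() < p]
        total += len(reachable(seeds, live))
    return total / trials


if __name__ == "__main__":
    edges = [("a", "b", 0.5), ("b", "c", 0.5)]
    print(exact_influence(edges, {"a"}))
    print(monte_carlo_influence(edges, {"a"}, 20000, random.Random(0)))
```

The enumeration visits 2^m realizations for m edges, which is why it stops being practical well before the hundred-edge networks mentioned in the abstract.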

    Constrained Path Search with Submodular Function Maximization

    In this paper, we study the problem of constrained path search with submodular function maximization (CPS-SM). We aim to find the path with the best submodular function score under a given constraint (e.g., a length limit), where the submodular function score is computed over the set of nodes in this path. This problem arises in many applications. For example, tourists may want to find the most diversified path (e.g., a path passing by the most diverse facilities, such as parks and museums) given that the traveling time is less than 6 hours. We show that the CPS-SM problem is NP-hard. We first propose a concept called “submodular α-dominance” that exploits the properties of submodular functions, and we develop an algorithm with a guaranteed error bound based on this concept. By relaxing the submodular α-dominance conditions, we design another, more efficient algorithm that has the same error bound. We also use bidirectional path search to further improve the efficiency of the algorithms. Finally, we propose a heuristic algorithm that is both efficient and effective in practice. Experiments conducted on several real datasets show that our proposed algorithms achieve high accuracy and are faster than one state-of-the-art method by orders of magnitude.
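As an illustration of the problem statement only, here is a brute-force baseline: it enumerates simple paths from a source to a target within a length budget and scores each by a coverage function (the number of distinct facility categories visited), which is submodular. This is not the α-dominance algorithm from the abstract. The fixed source and target, the adjacency-list format, and the names `best_path` and `coverage` are assumptions for the sketch.

```python
def coverage(path, labels):
    """Submodular score: number of distinct categories over the path's nodes."""
    covered = set()
    for node in path:
        covered |= labels.get(node, set())
    return len(covered)


def best_path(graph, labels, source, target, budget):
    """Exhaustively search simple source-to-target paths within the budget.

    graph maps a node to a list of (neighbour, length) pairs (assumed format).
    Returns (score, path) for the best path, or None if no path fits.
    """
    best = None

    def dfs(node, path, used):
        nonlocal best
        if node == target:
            score = coverage(path, labels)
            if best is None or score > best[0]:
                best = (score, list(path))
            return
        for nxt, length in graph.get(node, []):
            if nxt not in path and used + length <= budget:
                path.append(nxt)
                dfs(nxt, path, used + length)
                path.pop()

    dfs(source, [source], 0)
    return best


if __name__ == "__main__":
    graph = {"s": [("a", 1), ("b", 2), ("t", 1)], "a": [("t", 1)], "b": [("t", 2)]}
    labels = {"s": {"hotel"}, "a": {"park"}, "b": {"park", "museum"}, "t": {"station"}}
    print(best_path(graph, labels, "s", "t", 2))
    print(best_path(graph, labels, "s", "t", 4))
```

The search is exponential in the worst case, consistent with the NP-hardness stated in the abstract.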

    LIPIcs, Volume 274, ESA 2023, Complete Volume


    Discrete Methods in Statistics: Feature Selection and Fairness-Aware Data Mining

    This dissertation is a detailed investigation of issues that arise in models that change discretely. Models are often constructed by either including or excluding features based on some criteria. These discrete changes are challenging to analyze due to correlation between features. Feature selection is the problem of identifying an appropriate set of features to include in a model, while fairness-aware data mining is the problem of removing the influence of protected features from a model. This dissertation provides frameworks for understanding each problem and algorithms for accomplishing the desired goal. The feature selection problem is addressed through the framework of sequential hypothesis testing. We elucidate the statistical challenges of repeatedly using inference in this domain and demonstrate how current methods fail to address them. Our algorithms build on classically motivated multiple testing procedures to control measures of false rejections when using hypothesis testing during forward stepwise regression. Furthermore, these methods have much higher power than recent proposals from the conditional inference literature. The fairness-aware data mining community is grappling with fundamental questions concerning fairness in statistical modeling. Tension exists between identifying explainable differences between groups and discriminatory ones. We provide a framework for understanding the connections between fairness and the use of protected information in modeling. With this discussion in hand, generating fair estimates is straightforward.

    LIPIcs, Volume 244, ESA 2022, Complete Volume


    Discovery in Physics

    Volume 2 covers knowledge discovery in particle and astroparticle physics. Instruments gather petabytes of data, and machine learning is used to process these vast amounts of data and to detect relevant examples efficiently. The physical knowledge is encoded in simulations used to train the machine learning models. The interpretation of the learned models serves to expand the physical knowledge, resulting in a cycle of theory enhancement.

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Learning-based Segmentation for Connectomics

    Recent advances in electron microscopy techniques make it possible to acquire high-resolution, isotropic volume images of neural circuitry. In connectomics, neuroscientists seek to obtain the circuit diagram involving all neurons and synapses in such a volume image. Mapping neuron connectivity requires tracing each and every neural process through terabytes of image data. Due to the size and complexity of these volume images, fully automated analysis methods are desperately needed. In this thesis, I consider automated, machine learning-based neurite segmentation approaches based on a simultaneous merge decision of adjacent supervoxels.
    - Given a learned likelihood of merging adjacent supervoxels, Chapter 4 adapts a probabilistic graphical model which ensures that merge decisions are consistent and the surfaces of final segments are closed. This model can be posed as a multicut optimization problem and is solved with the cutting-plane method. In order to scale to large datasets, a fast search for (and good choice of) violated cycle constraints is crucial. Quantitative experiments show that the proposed closed-surface regularization significantly improves segmentation performance.
    - In Chapter 5, I investigate whether the edge weights of the previous model can be chosen to minimize the loss with respect to non-local segmentation quality measures (e.g. Rand Index). Suitable weights w are obtained from a structured learning approach. In the Structured Support Vector Machine formulation, a novel fast enumeration scheme is used to find the most violated constraint. Quantitative experiments show that structured learning can improve upon unstructured methods. Furthermore, I introduce a new approximate, hierarchical and blockwise optimization approach for large-scale multicut segmentation. Using this method, high-quality approximate solutions for large problem instances are found quickly.
    - Chapter 6 introduces another novel approximate scheme for multicut segmentation (Cut, Glue&Cut), which is based on the move-making paradigm. First, the graph is recursively partitioned into small regions (cut phase). Then, for any two adjacent regions, alternative cuts of these two regions define possible moves (glue&cut phase). The proposed algorithm finds segmentations that are (as measured by a loss function) as close to the ground truth as the global optimum found by exact solvers, while being significantly faster than existing methods.
    - In order to jointly label resulting segments as well as the boundaries between segments, Chapter 7 proposes the Asymmetric Multi-way Cut model, a variant of Multi-way Cut. In this new model, within-class cuts are allowed for some labels, while being forbidden for other labels. Qualitative experiments show when such a formulation can be beneficial. In particular, an application to joint neurite and cell organelle labeling in EM volume images is discussed.
    - Custom software tools that can cope with the large data volumes common in the field of connectomics are a prerequisite for the implementation and evaluation of novel segmentation techniques. Chapter 3 presents version 1.0 of ilastik, a joint effort of multiple researchers. I have co-written its volume viewing component, volumina. ilastik provides an interactive pixel classification workflow on larger-than-RAM datasets as well as a semi-automated segmentation module useful for acquiring gold standard segmentations. Furthermore, I describe new software for dealing with hierarchies of cell complexes as well as for blockwise image processing operations on large datasets.
    The different segmentation methods presented in this thesis provide a promising direction towards reaching the required reliability as well as the required data throughput necessary for connectomics applications.
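The role of cycle constraints in the multicut formulation can be illustrated with a small consistency check. For an integral edge labeling, a cut edge is inconsistent exactly when its endpoints are still connected through uncut edges, since the cycle formed with that connecting path then contains a single cut edge. The sketch below finds such edges with a union-find structure. It is a basic check for integral labelings under assumed inputs, not the fast constraint search described in the thesis, and the name `violated_cut_edges` is hypothetical.

```python
def violated_cut_edges(num_nodes, edges, cut):
    """Return cut edges whose endpoints are joined by a path of uncut edges.

    edges is a list of (u, v) pairs over nodes 0..num_nodes-1 and cut is a
    parallel list of booleans (assumed format). An empty result means the
    labeling describes a valid partition of the graph.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every uncut edge into one component.
    for (u, v), is_cut in zip(edges, cut):
        if not is_cut:
            parent[find(u)] = find(v)

    # A cut edge inside a single component violates a cycle constraint.
    return [
        (u, v)
        for (u, v), is_cut in zip(edges, cut)
        if is_cut and find(u) == find(v)
    ]


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(violated_cut_edges(3, triangle, [False, False, True]))
    print(violated_cut_edges(3, triangle, [True, False, True]))
```

In a cutting-plane loop, each reported edge would give rise to a cycle inequality added to the relaxation before re-solving.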