142 research outputs found

    Pruning based Distance Sketches with Provable Guarantees on Random Graphs

    Measuring the distances between vertices on graphs is one of the most fundamental components in network analysis. Since finding shortest paths requires traversing the graph, it is challenging to obtain distance information on large graphs very quickly. In this work, we present a preprocessing algorithm that is able to create landmark based distance sketches efficiently, with strong theoretical guarantees. When evaluated on a diverse set of social and information networks, our algorithm significantly improves over existing approaches by reducing the number of landmarks stored, preprocessing time, or stretch of the estimated distances. On Erdős–Rényi graphs and random power law graphs with degree distribution exponent $2 < \beta < 3$, our algorithm outputs an exact distance data structure with space between $\Theta(n^{5/4})$ and $\Theta(n^{3/2})$ depending on the value of $\beta$, where $n$ is the number of vertices. We complement the algorithm with tight lower bounds for Erdős–Rényi graphs and the case when $\beta$ is close to two. Comment: Full version for the conference paper to appear in The Web Conference'1
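    The abstract does not spell out the sketch construction, but the landmark idea it builds on can be illustrated with a minimal, hypothetical, unpruned sketch: sample a set of landmark vertices, run a BFS from each, and answer a distance query d(u, v) with the upper bound min over landmarks l of d(u, l) + d(l, v). The paper's contribution is a pruning rule for which landmark distances to store, giving exact answers with the stated space bounds on random graphs; the generic version below only guarantees an upper bound.

    # Minimal landmark-based distance sketch (generic illustration, not the
    # paper's pruned construction; names and parameters are hypothetical).
    import random
    from collections import deque

    def bfs_distances(adj, source):
        """Unweighted single-source shortest-path distances via BFS."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def build_sketch(adj, num_landmarks, seed=0):
        """Store, for every vertex, its BFS distance to each sampled landmark."""
        landmarks = random.Random(seed).sample(list(adj), num_landmarks)
        return [bfs_distances(adj, l) for l in landmarks]

    def estimate_distance(sketch, u, v):
        """Upper bound on d(u, v) via the triangle inequality through landmarks."""
        return min((d[u] + d[v] for d in sketch if u in d and v in d),
                   default=float("inf"))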

    Influence maximization on social graphs: A survey


    Bridging Dense and Sparse Maximum Inner Product Search

    Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-$k$ retrieval in Information Retrieval. This duality exists because sparse and dense vectors serve different end goals, despite the fact that they are manifestations of the same mathematical problem. In this work, we ask if algorithms for dense vectors could be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-$k$ retrieval methods. We study IVF-based retrieval, where vectors are partitioned into clusters and only a fraction of the clusters are searched during retrieval. We conduct a comprehensive analysis of dimensionality reduction for sparse vectors, and examine standard and spherical KMeans for partitioning. Our experiments demonstrate that IVF serves as an efficient solution for sparse MIPS. As byproducts, we identify two research opportunities and demonstrate their potential. First, we cast the IVF paradigm as a dynamic pruning technique and turn that insight into a novel organization of the inverted index for approximate MIPS over general sparse vectors. Second, we offer a unified regime for MIPS over vectors that have dense and sparse subspaces, and show its robustness to query distributions.
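    A minimal sketch of the IVF idea described above, assuming dense numpy vectors and scikit-learn's KMeans (the paper's sparse-vector handling, dimensionality reduction, and spherical-KMeans variants are not shown): vectors are partitioned into clusters, and a query scans only the few clusters whose centroids score highest by inner product.

    # Illustrative IVF index for maximum inner product search over dense vectors.
    import numpy as np
    from sklearn.cluster import KMeans

    class IVFMips:
        def __init__(self, data, n_clusters=64, seed=0):
            self.data = data
            self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                                 random_state=seed).fit(data)
            # Inverted lists: cluster id -> indices of the vectors assigned to it.
            self.lists = [np.where(self.kmeans.labels_ == c)[0]
                          for c in range(n_clusters)]

        def search(self, query, k=10, nprobe=8):
            # Rank clusters by the inner product between query and centroid,
            # then scan only the top-nprobe inverted lists.
            cluster_scores = self.kmeans.cluster_centers_ @ query
            probed = np.argsort(-cluster_scores)[:nprobe]
            cand = np.concatenate([self.lists[c] for c in probed])
            scores = self.data[cand] @ query
            return cand[np.argsort(-scores)[:k]]

    # Usage on synthetic data: IVFMips(np.random.randn(10000, 96)).search(np.random.randn(96))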

    Graph enabled cross-domain knowledge transfer

    The world has never been more connected, led by the information technology revolution of the past decades that has fundamentally changed the way people interact with each other through social networks. Consequently, enormous amounts of human activity data are collected from the business world, and machine learning techniques are widely adopted to aid our decision processes. Despite the success of machine learning in various application scenarios, many questions still need to be answered well, such as how to optimize machine learning outcomes when the desired knowledge cannot be extracted from the available data. This naturally leads us to ask whether one can leverage side information to populate the knowledge domain of interest, so that problems within that domain can be better tackled. In this work, such problems are investigated and practical solutions are proposed. To leverage machine learning in any decision-making process, one must convert the given knowledge (for example, natural language or unstructured text) into representation vectors that machine learning models can understand and process in a compatible language and data format. A frequently encountered difficulty, however, is that the given knowledge is not rich or reliable enough in the first place. In such cases, one seeks to fuse side information from a separate domain to mitigate the gap between good representation learning and the scarce knowledge in the domain of interest. This approach is named Cross-Domain Knowledge Transfer. It is crucial to study this problem because scarce knowledge is common in many scenarios, from online healthcare platform analyses to financial market risk quantification, and it stands as an obstacle to benefiting from automated decision making.
    From the machine learning perspective, the paradigm of semi-supervised learning takes advantage of large amounts of data without ground truth and achieves impressive improvements in learning performance. It is adopted in this dissertation for cross-domain knowledge transfer. Furthermore, graph learning techniques are indispensable, given that networks such as taxonomy networks and scholarly article citation networks commonly exist in the real world. These networks contain additional useful knowledge and ought to be incorporated into the learning process, where they serve as an important lever in solving the problem of cross-domain knowledge transfer. This dissertation proposes graph-based learning solutions and demonstrates their practical usage via empirical studies on real-world applications. Another line of effort in this work lies in leveraging the rich capacity of neural networks to improve learning outcomes in the era of big data. In contrast to many graph neural networks that iterate directly on the graph adjacency matrix to approximate graph convolution filters, this work also proposes an efficient eigenvalue learning method that directly optimizes the graph convolution in the spectral space. The work articulates the importance of the network spectrum and provides detailed analyses of the spectral properties of the proposed EigenLearn method, which aligns well with a series of CNN models that attempt to give graph neural network designs a meaningful spectral interpretation.
    The dissertation also addresses efficiency, in two respects. First, by adopting approximate solutions it mitigates the complexity concerns of graph-related algorithms, which are naturally quadratic in most cases and do not scale to large datasets. Second, it mitigates the storage and computation overhead of deep neural networks, so that they can be deployed on light-weight devices, significantly broadening their applicability. Finally, the dissertation concludes with directions for future work.
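    To make the spectral-space view concrete, here is a generic spectral graph filter in the sense used above (a sketch of standard spectral graph convolution, not the dissertation's EigenLearn method): the signal is transformed into the eigenbasis of the normalized Laplacian, scaled by per-eigenvalue filter coefficients, and transformed back; in a learning setting those coefficients would be the trainable parameters optimized directly in the spectral space.

    # Generic spectral graph convolution: y = U g(Lambda) U^T x  (illustration only).
    import numpy as np

    def normalized_laplacian(A):
        """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
        d = A.sum(axis=1)
        d_inv_sqrt = np.zeros_like(d)
        d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
        return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    def spectral_filter(A, x, coeffs):
        """Filter the graph signal x with one coefficient per Laplacian eigenvalue."""
        eigvals, U = np.linalg.eigh(normalized_laplacian(A))  # L = U diag(eigvals) U^T
        return U @ (coeffs * (U.T @ x))

    # Example: a low-pass filter that damps high-frequency (large-eigenvalue) components.
    A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
    x = np.array([1.0, -1.0, 2.0, 0.5])
    y = spectral_filter(A, x, np.exp(-np.linspace(0.0, 2.0, len(A))))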

    Toward certifiable optimal motion planning for medical steerable needles

    Medical steerable needles can follow 3D curvilinear trajectories to avoid anatomical obstacles and reach clinically significant targets inside the human body. Automating steerable needle procedures can enable physicians and patients to harness the full potential of steerable needles by maximally leveraging their steerability to safely and accurately reach targets for medical procedures such as biopsies. For the automation of medical procedures to be clinically accepted, it is critical from a patient care, safety, and regulatory perspective to certify the correctness and effectiveness of the planning algorithms involved in procedure automation. In this paper, we take an important step toward creating a certifiable optimal planner for steerable needles. We present an efficient, resolution-complete motion planner for steerable needles based on a novel adaptation of multi-resolution planning. This is the first motion planner for steerable needles that is guaranteed to compute, in finite time, an obstacle-avoiding plan (or to notify the user that no such plan exists), under clinically appropriate assumptions. Based on this planner, we then develop the first resolution-optimal motion planner for steerable needles, which further provides theoretical guarantees on the quality of the computed motion plan, that is, global optimality, in finite time. Compared to state-of-the-art steerable needle motion planners, we demonstrate in clinically realistic simulations that our planners not only provide theoretical guarantees but also achieve higher success rates, lower computation times, and higher-quality plans.
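    The planner itself is not described here in enough detail to reproduce, but the resolution-completeness idea can be illustrated generically (a hypothetical 2D grid example; real steerable-needle planning must also respect the needle's curvature constraints, which are not modeled below): plan at a coarse discretization, and if no plan is found, double the resolution and retry until a plan appears or a resolution bound is reached.

    # Generic resolution-refinement planning on a 2D grid (illustration only).
    from collections import deque

    def plan_at_resolution(n, start, goal, blocked):
        """Breadth-first search on an n x n grid over the unit square; returns a cell path or None."""
        to_cell = lambda p: (min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1))
        s, g = to_cell(start), to_cell(goal)
        if blocked(s, n) or blocked(g, n):
            return None
        parent, queue = {s: None}, deque([s])
        while queue:
            c = queue.popleft()
            if c == g:
                path = []
                while c is not None:
                    path.append(c)
                    c = parent[c]
                return path[::-1]
            x, y = c
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in parent and not blocked(nxt, n):
                    parent[nxt] = c
                    queue.append(nxt)
        return None

    def resolution_complete_plan(start, goal, blocked, max_resolution=256):
        """Double the grid resolution until a plan is found or the bound is hit."""
        n = 8
        while n <= max_resolution:
            path = plan_at_resolution(n, start, goal, blocked)
            if path is not None:
                return n, path
            n *= 2
        return None  # no plan up to the finest tested resolution

    # Example obstacle: a vertical wall at x = 0.5 with a gap near the top.
    blocked = lambda c, n: abs((c[0] + 0.5) / n - 0.5) < 0.08 and (c[1] + 0.5) / n < 0.8
    result = resolution_complete_plan((0.1, 0.1), (0.9, 0.1), blocked)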

    Algorithms for learning from spatial and mobility data

    Data from the numerous mobile devices, location-based applications, and collection sensors in use today can provide important insights about human and natural processes. These insights can inform decision making in designing and optimising infrastructure such as transportation or energy. However, extracting patterns related to spatial properties is challenging due to the large quantity of the data produced and the complexity of the processes it describes. We propose scalable, multi-resolution approximation and heuristic algorithms that make use of spatial proximity properties to solve fundamental data mining and optimisation problems with better running time and accuracy. We observe that abstracting away from individual data points and working with units of neighbouring points, grouped by various measures of similarity, improves computational efficiency and diminishes the effects of noise and overfitting. We consider applications in mobility data compression, transit network planning, and solar power output prediction. Firstly, in order to understand transportation needs, it is essential to have efficient ways to represent large amounts of travel data. In analysing spatial trajectories (for example, taxis travelling in a city), one of the main challenges is computing distances between trajectories efficiently; due to their size and complexity this task is computationally expensive. We build data structures and algorithms to sketch trajectory data that make queries such as distance computation, nearest neighbour search and clustering, which are key to finding mobility patterns, more computationally efficient. We use locality sensitive hashing, a technique that maps similar objects to the same hash. Secondly, to build efficient infrastructure it is necessary to satisfy travel demand by placing resources optimally. This is difficult due to external constraints (such as limits on budget) and the complexity of existing road networks, which allow for a large number of candidate locations. For this purpose, we present heuristic algorithms for efficient transit network design, with a case study on cycling lane placement. The heuristic is based on a new type of clustering by projection that is both computationally efficient and gives good results in practice. Lastly, we devise a novel method to forecast solar power output based on numerical weather predictions, clear sky predictions and persistence data. The ensemble of a multivariate linear regression model, a support vector machine model, and an artificial neural network gives more accurate predictions than any of the individual models. Analysing the performance of the models in a suite of frameworks reveals that building separate models for each self-contained area based on weather patterns gives better accuracy than a single model that predicts the total. The ensemble can be further improved by giving performance-based weights to the individual models. This suggests that the models identify different patterns in the data, which motivated the choice of an ensemble architecture.
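    The trajectory sketching schemes are not detailed in the abstract, but the locality sensitive hashing principle they rely on can be shown with the textbook random-hyperplane (cosine) LSH below (a generic example, not the thesis's trajectory-specific construction): similar vectors tend to receive the same hash and therefore land in the same bucket.

    # Random-hyperplane (cosine) LSH: nearby vectors usually share a hash (illustration only).
    import numpy as np

    class CosineLSH:
        def __init__(self, dim, num_bits=16, seed=0):
            # Each random hyperplane contributes one bit of the hash.
            self.planes = np.random.default_rng(seed).standard_normal((num_bits, dim))

        def hash(self, x):
            bits = (self.planes @ x) >= 0
            return "".join("1" if b else "0" for b in bits)

    # Usage: a vector and a slightly perturbed copy typically fall in the same bucket.
    lsh, rng = CosineLSH(dim=64), np.random.default_rng(1)
    x = rng.standard_normal(64)
    y = x + 0.01 * rng.standard_normal(64)
    same_bucket = lsh.hash(x) == lsh.hash(y)   # True with high probability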

    Big Networks: Analysis and Optimal Control

    The study of networks has seen tremendous growth in research, driven by the wide spectrum of practical problems in which networks are the natural point of access. These problems range widely, from detecting functionally correlated proteins in biology to choosing which people to give discounts to in order to maximize the popularity of a product in economics. Thus, understanding, and further being able to manipulate or control, the development and evolution of networks becomes a critical task for network scientists. Despite the vast research effort put towards these studies, the present state of the art largely either lacks high-quality solutions or requires an excessive amount of time under real-world 'Big Data' requirements. This research aims to boost modern algorithmic efficiency to meet practical requirements, that is, to develop a ground-breaking class of algorithms that simultaneously provide provably good solution quality and low time and space complexity. Specifically, I target important yet challenging problems in three main areas. Information Diffusion: analyzing and maximizing influence in networks and extending the results to different variations of the problem. Community Detection: finding communities from multiple sources of information. Security and Privacy: assessing organizational vulnerability under targeted cyber attacks via social networks.
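    For context on the information diffusion line of work, the standard baseline for influence maximization is greedy seed selection with Monte Carlo spread estimates under the independent cascade model; a minimal sketch follows (a textbook baseline with hypothetical parameters, not the dissertation's algorithms, which target much better scalability).

    # Greedy influence maximization under the independent cascade model (textbook baseline).
    import random

    def simulate_ic(adj, seeds, p, rng):
        """One Monte Carlo run of the independent cascade; returns the spread size."""
        active, frontier = set(seeds), list(seeds)
        while frontier:
            newly_active = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        return len(active)

    def greedy_influence_max(adj, k, p=0.1, samples=200, seed=0):
        """Pick k seeds, each maximizing the estimated marginal spread."""
        rng, seeds = random.Random(seed), []
        for _ in range(k):
            best, best_spread = None, -1.0
            for v in adj:
                if v in seeds:
                    continue
                spread = sum(simulate_ic(adj, seeds + [v], p, rng)
                             for _ in range(samples)) / samples
                if spread > best_spread:
                    best, best_spread = v, spread
            seeds.append(best)
        return seeds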

    Worst-Case to Average-Case Reductions for the SIS Problem: Tightness and Security

    We present a framework for evaluating the concrete security assurances of cryptographic constructions given by the worst-case SIVP_γ to average-case SIS_{n,m,q,β} reductions. As part of this analysis, we present the tightness gaps for three worst-case SIVP_γ to average-case SIS_{n,m,q,β} reductions. We also analyze the hardness of worst-case SIVP_γ instances. We apply our methodology to two SIS-based signature schemes, and compute the security guarantees that these systems get from reductions to worst-case SIVP_γ. We find that most of the presented reductions do not apply to the chosen parameter sets for the signature schemes. We propose modifications to the schemes to make the reductions applicable, and find that the worst-case security assurances of the (modified) signature schemes are, for both signature schemes, significantly lower than the amount of security previously claimed
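    For reference, the average-case problem SIS_{n,m,q,β} discussed above is the standard short integer solution problem (the standard definition, stated here for context rather than with this paper's specific parameter choices):
    \[
      \text{Given a uniformly random } A \in \mathbb{Z}_q^{\,n \times m}, \text{ find } z \in \mathbb{Z}^{m} \setminus \{0\}
      \text{ such that } A z \equiv 0 \pmod{q} \ \text{and}\ \lVert z \rVert \le \beta.
    \]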
    • …