2,100 research outputs found

    Outward Influence and Cascade Size Estimation in Billion-scale Networks

    Estimating cascade size and nodes' influence is a fundamental task in social, technological, and biological networks. Yet this task is extremely challenging due to the sheer size and structural heterogeneity of networks. We investigate a new influence measure, termed outward influence (OI), defined as the (expected) number of nodes that a subset of nodes S will activate, excluding the nodes in S. Thus, OI equals the de facto standard measure, the influence spread of S, minus |S|. OI is not only more informative for nodes with small influence, but also critical in designing new effective sampling and statistical estimation methods. Based on OI, we propose SIEA/SOIEA, novel methods to estimate influence spread/outward influence at scale and with rigorous theoretical guarantees. The proposed methods are built on two novel components: 1) IICP, an importance sampling method for outward influence, and 2) RSA, a robust mean estimation method that minimizes the number of samples by analyzing the variance and range of random variables. Compared to the state of the art for influence estimation, SIEA is $\Omega(\log^4 n)$ times faster in theory and up to several orders of magnitude faster in practice. For the first time, the influence of nodes in networks with billions of edges can be estimated with high accuracy within a few minutes. Our comprehensive experiments on real-world networks also give evidence against the popular practice of using a fixed number of samples, e.g. 10K or 20K, to compute the "ground truth" for influence spread. Comment: 16 pages, SIGMETRICS 201
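    The relation between outward influence and influence spread (OI = spread of S minus |S|) can be illustrated with a naive Monte Carlo estimator. This is only a sketch under stated assumptions: it is not SIEA, SOIEA, IICP, or RSA; it assumes an independent cascade diffusion model, which the abstract does not name; and the graph format, function names, and `num_samples` value are hypothetical. The fixed sample count is used purely for illustration, which is the very practice the abstract argues against.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one cascade (independent cascade model assumed).

    graph maps a node to a list of (neighbor, activation_probability) pairs.
    Returns the set of activated nodes, seeds included.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_outward_influence(graph, seeds, num_samples=1000, seed=0):
    """Average number of activated nodes outside the seed set."""
    rng = random.Random(seed)
    seeds = set(seeds)
    total = 0
    for _ in range(num_samples):
        total += len(simulate_cascade(graph, seeds, rng)) - len(seeds)
    return total / num_samples


def estimate_influence_spread(graph, seeds, num_samples=1000, seed=0):
    """Influence spread = outward influence + |S|."""
    return estimate_outward_influence(graph, seeds, num_samples, seed) + len(set(seeds))


if __name__ == "__main__":
    toy_graph = {"a": [("b", 0.5)], "b": [("c", 0.5)]}  # hypothetical toy network
    print(estimate_outward_influence(toy_graph, {"a"}))
```

    For a seed with small influence, the spread estimate is dominated by the constant |S|, whereas the outward influence isolates the part that actually has to be estimated.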

    2D Proactive Uplink Resource Allocation Algorithm for Event Based MTC Applications

    We propose a two-dimensional (2D) proactive uplink resource allocation (2D-PURA) algorithm that aims to reduce the delay/latency in event-based machine-type communications (MTC) applications. Specifically, when an event of interest occurs at a device, it tends to spread to the neighboring devices. Consequently, when a device has data to send to the base station (BS), its neighbors are highly likely to transmit later. Thus, we propose to cluster devices in the neighborhood around the event, also referred to as the disturbance region, into rings based on their distance from the original event. To reduce the uplink latency, we then proactively allocate resources for these rings. To evaluate the proposed algorithm, we analytically derive the mean uplink delay, the proportion of resource conservation due to successful allocations, and the proportion of uplink resource wastage due to unsuccessful allocations for the 2D-PURA algorithm. Numerical results demonstrate that the proposed method reduces the mean uplink delay by over 16.5 and 27 percent compared with the 1D algorithm and the standard method, respectively. Comment: 6 pages, 6 figures, Published in 2018 IEEE Wireless Communications and Networking Conference (WCNC
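    The ring-clustering step described above can be sketched as follows. This is a minimal illustration, not the 2D-PURA algorithm itself: it assumes planar device coordinates and equal-width rings, and the function name, `ring_width` parameter, and device identifiers are hypothetical. The resource allocation step is not modeled.

```python
import math
from collections import defaultdict


def cluster_into_rings(event_xy, devices, ring_width):
    """Group devices into concentric rings around the event location.

    devices maps a device id to its (x, y) position.
    Ring k holds devices whose distance d from the event satisfies
    k * ring_width <= d < (k + 1) * ring_width (equal widths are an assumption).
    """
    rings = defaultdict(list)
    for device_id, (x, y) in devices.items():
        distance = math.hypot(x - event_xy[0], y - event_xy[1])
        rings[int(distance // ring_width)].append(device_id)
    return dict(rings)


if __name__ == "__main__":
    positions = {"d1": (0.5, 0.0), "d2": (1.5, 0.0), "d3": (0.0, 1.2)}
    print(cluster_into_rings((0.0, 0.0), positions, ring_width=1.0))
```

    A scheduler could then grant uplink resources ring by ring, starting from the innermost ring, as the event propagates outward.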

    When can we reconstruct the ancestral state? Beyond Brownian motion

    Reconstructing the ancestral state of a group of species helps answer many important questions in evolutionary biology. Therefore, it is crucial to understand when we can estimate the ancestral state accurately. Previous works provide a necessary and sufficient condition, called the big bang condition, for the existence of an accurate reconstruction method under discrete trait evolution models and the Brownian motion model. In this paper, we extend this result to a wide range of continuous trait evolution models. In particular, we consider a general setting where continuous traits evolve along the tree according to stochastic processes that satisfy some regularity conditions. We verify these conditions for popular continuous trait evolution models including Ornstein-Uhlenbeck, reflected Brownian Motion, and Cox-Ingersoll-Ross

    n-Gram-based text compression

    We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bigram to five-gram to obtain the best encoding stream. Each n-gram is encoded by two to four bytes, depending on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigram to five-gram, obtaining dictionaries with a total size of 12 GB. To evaluate our method, we collected a testing set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods. Web of Science, art. no. 948364
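    The dictionary-based encoding idea can be sketched as below. This is a simplified illustration, not the authors' method: it assumes whitespace tokenization and a greedy longest-match strategy from five-grams down to unigrams, emits symbolic `(n, index)` codes instead of the two-to-four-byte layout, and adds a literal fallback for unknown words. The dictionaries and all names are hypothetical.

```python
def encode(words, dictionaries, max_n=5):
    """Greedy longest-match encoding against per-n dictionaries.

    dictionaries maps n to a dict from an n-gram (tuple of words) to its index.
    Returns a list of (n, index) codes; (0, word) marks a literal fallback.
    """
    codes = []
    i = 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            gram = tuple(words[i:i + n])
            index = dictionaries.get(n, {}).get(gram)
            if index is not None:
                codes.append((n, index))
                i += n
                break
        else:
            # No dictionary contains any n-gram starting here: keep the word as-is.
            codes.append((0, words[i]))
            i += 1
    return codes


def decode(codes, dictionaries):
    """Invert encode() by looking indices up in the reversed dictionaries."""
    reverse = {
        n: {index: gram for gram, index in table.items()}
        for n, table in dictionaries.items()
    }
    words = []
    for n, payload in codes:
        if n == 0:
            words.append(payload)
        else:
            words.extend(reverse[n][payload])
    return words


if __name__ == "__main__":
    toy_dicts = {2: {("xin", "chao"): 0}, 1: {("ban",): 0}}  # hypothetical entries
    text = "xin chao ban nhe".split()
    encoded = encode(text, toy_dicts)
    print(encoded)
    print(decode(encoded, toy_dicts) == text)
```

    Longer matches replace more words with a single code, which is why trying the largest window first tends to shorten the output.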

    VLSP SHARED TASK: SENTIMENT ANALYSIS

    Sentiment analysis is a natural language processing (NLP) task of identifying or extracting the sentiment content of a text unit. This task has been an active research topic since the early 2000s. During the last two editions of the VLSP workshop series, the shared task on Sentiment Analysis (SA) for Vietnamese was organized in order to provide an objective evaluation of the performance (quality) of sentiment analysis tools, to encourage the development of Vietnamese sentiment analysis systems, and to provide benchmark datasets for this task. The first campaign in 2016 focused only on sentiment polarity classification, with a dataset containing reviews of electronic products. The second campaign in 2018 addressed the problem of Aspect Based Sentiment Analysis (ABSA) for Vietnamese, by providing two datasets containing reviews in the restaurant and hotel domains. These data are accessible for research purposes via the VLSP website vlsp.org.vn/resources. This paper describes the built datasets as well as the evaluation results of the systems participating in these campaigns