
    Distributed OSN Crawling System based on Ajax Simulation

    Abstract: In the age of Web 2.0, online social networks (OSNs) such as Facebook, Twitter, and Weibo have become the most popular platforms for spreading information, attracting growing attention from the Information Retrieval (IR) community. However, traditional web crawling systems struggle with the complex structure of OSN web pages, the rapid explosion of messages, and the heavy use of Asynchronous JavaScript and XML (AJAX). We design and implement a distributed system based on Message Oriented Middleware (MOM) and Ajax simulation, which crawled 70 million detailed Twitter items in one month. The data acquisition results show that crawling with Ajax simulation can retrieve items loaded by Ajax without limitation, and that the distributed system based on MOM and Ajax simulation can crawl massive OSN data completely, quickly, frequently, and without restriction.
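    The abstract combines two ideas: a message queue that distributes crawl tasks across workers, and "Ajax simulation," that is, issuing the same paged requests a page's JavaScript would make when the user scrolls. The sketch below illustrates both under loudly stated assumptions: `queue.Queue` stands in for a real MOM broker, and `fake_ajax_page` is a hypothetical stub for an Ajax endpoint that returns one page of items plus a cursor. It is not the authors' implementation.

```python
import queue
import threading

# HYPOTHETICAL stub for an Ajax endpoint: returns (items, next_cursor).
# A real crawler would replay the XHR requests the page's JavaScript issues.
def fake_ajax_page(user_id, cursor, page_size=3, total=7):
    end = min(cursor + page_size, total)
    items = [f"{user_id}:item{i}" for i in range(cursor, end)]
    next_cursor = end if end < total else None  # None means no more pages
    return items, next_cursor

def crawl_user(user_id):
    """Simulate 'scroll to load more' by following cursors until exhausted."""
    items, cursor = [], 0
    while cursor is not None:
        page, cursor = fake_ajax_page(user_id, cursor)
        items.extend(page)
    return items

def worker(tasks, results, lock):
    # Each worker consumes crawl tasks from the shared queue.
    while True:
        user_id = tasks.get()
        if user_id is None:  # poison pill: shut this worker down
            tasks.task_done()
            break
        items = crawl_user(user_id)
        with lock:
            results[user_id] = items
        tasks.task_done()

def run_crawl(user_ids, n_workers=4):
    tasks = queue.Queue()  # in-process stand-in for a MOM queue
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for uid in user_ids:
        tasks.put(uid)
    for _ in threads:
        tasks.put(None)  # one shutdown signal per worker
    tasks.join()
    for t in threads:
        t.join()
    return results
```

    For example, `run_crawl(["alice", "bob"])` collects all seven paged items for each user. Swapping the in-process queue for a networked broker is what would let workers run on separate machines.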

    Towards a collaborative tourist system using serious games

    Integrated Master's thesis. Informatics and Computing Engineering (Engenharia Informática e Computação). Faculty of Engineering, University of Porto (Faculdade de Engenharia, Universidade do Porto). 201

    Topics in Complex and Large-scale Data Analysis

    The past few decades have witnessed rapid development of modern technologies. As a result, the data these technologies collect are evolving toward more complex structures and larger scales, driving traditional data analysis methods to develop and adapt. In this dissertation, we study three statistical issues arising in data with complex structure and/or large scale. In Chapter 2, we propose a Bayesian framework based on exponential random graph models (ERGMs) to estimate model parameters and network structure for networks with measurement errors. In Chapter 3, we design a novel network sampling algorithm for large-scale networks with community structure. In Chapter 4, we introduce a framework for conducting discrete large-scale hypothesis testing based on the local false discovery rate (FDR). The performance of our procedures is evaluated through various simulations and real applications, and the necessary theoretical properties are carefully studied.
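    For readers unfamiliar with the local false discovery rate, the textbook definition for a test statistic $x$ is $\mathrm{lfdr}(x) = \pi_0 f_0(x) / f(x)$, where $f_0$ is the null distribution, $f$ is the marginal (mixture) distribution, and $\pi_0$ is the proportion of true nulls. The sketch below applies this generic definition to discrete statistics, assuming a known null pmf, estimating $f$ by empirical frequencies, and conservatively setting $\pi_0 = 1$. It is not the dissertation's Chapter 4 framework, whose details are not given here.

```python
from collections import Counter
from math import comb

def binom_pmf(k, n, p):
    """Binomial pmf, used here only as an ASSUMED example null distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def local_fdr(observations, null_pmf, pi0=1.0):
    """Generic lfdr(x) = pi0 * f0(x) / f(x) for discrete statistics.

    f(x) is estimated by the empirical frequency of each observed value;
    results are capped at 1 because lfdr is a probability.
    """
    n = len(observations)
    freq = Counter(observations)
    return {x: min(1.0, pi0 * null_pmf(x) / (c / n)) for x, c in freq.items()}

# Example: 16 discrete statistics compared with a Binomial(4, 0.5) null.
obs = [0] + [1] * 4 + [2] * 5 + [3] * 2 + [4] * 4
lfdr = local_fdr(obs, lambda k: binom_pmf(k, 4, 0.5))
```

    Here the value 4 occurs four times as often as the null predicts, so its estimated lfdr is 0.25, while values consistent with the null are capped at 1. Hypotheses whose lfdr falls below a chosen threshold would be flagged as discoveries.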