174 research outputs found

    Characterization of P2P Systems

    Full text link
    Understanding existing systems and devising new P2P techniques relies on having access to representative models derived from empirical observations of existing systems. However, the large and dynamic nature of P2P systems makes capturing accurate measurements challenging. Because there is no central repository, data must b

    Characterizing Popularity Dynamics of User-generated Videos: A Category-based Study of YouTube

    Get PDF
    Understanding the growth pattern of content popularity has become a subject of immense interest to Internet service providers, content makers and on-line advertisers. This understanding is also important for the sustainable development of content distribution systems. As an approach to comprehend the characteristics of this growth pattern, a significant amount of research has been done in analyzing the popularity growth patterns of YouTube videos. Unfortunately, no work has been done that intensively investigates the popularity patterns of YouTube videos based on video object category. In this thesis, an in-depth analysis of the popularity pattern of YouTube videos is performed, considering the categories of videos. Metadata and request patterns were collected by employing category-specific YouTube crawlers. The request patterns were observed for a period of five months. Results confirm that the time varying popularity of di fferent YouTube categories are conspicuously diff erent, in spite of having sets of categories with very similar viewing patterns. In particular, News and Sports exhibit similar growth curves, as do Music and Film. While for some categories views at early ages can be used to predict future popularity, for some others predicting future popularity is a challenging task and require more sophisticated techniques, e.g., time-series clustering. The outcomes of these analyses are instrumental towards designing a reliable workload generator, which can be further used to evaluate diff erent caching policies for YouTube and similar sites. In this thesis, workload generators for four of the YouTube categories are developed. Performance of these workload generators suggest that a complete category-specific workload generator can be developed using time-series clustering. Patterns of users' interaction with YouTube videos are also analyzed from a dataset collected in a local network. This shows the possible ways of improving the performance of Peer-to-Peer video distribution technique along with a new video recommendation method

    Advanced monitoring in P2P botnets

    Get PDF
    Botnets are increasingly being held responsible for most of the cybercrimes that occur nowadays. They are used to carry out malicious activities like banking credential theft and Distributed Denial of Service (DDoS) attacks to generate profit for their owner, the botmaster. Traditional botnets utilized centralized and decentralized Command-and-Control Servers (C2s). However, recent botnets have been observed to prefer P2P-based architectures to overcome some of the drawbacks of the earlier architectures. A P2P architecture allows botnets to become more resilient and robust against random node failures and targeted attacks. However, the distributed nature of such botnets requires the defenders, i.e., researchers and law enforcement agencies, to use specialized tools such as crawlers and sensor nodes to monitor them. In return to such monitoring, botmasters have introduced various countermeasures to impede botnet monitoring, e.g., automated blacklisting mechanisms. The presence of anti-monitoring mechanisms not only render any gathered monitoring data to be inaccurate or incomplete, it may also adversely affect the success rate of botnet takedown attempts that rely upon such data. Most of the existing monitoring mechanisms identified from the related works only attempt to tolerate anti-monitoring mechanisms as much as possible, e.g., crawling bots with lower frequency. However, this might also introduce noise into the gathered data, e.g., due to the longer delay for crawling the botnet. This in turn may also reduce the quality of the data. This dissertation addresses most of the major issues associated with monitoring in P2P botnets as described above. Specifically, it analyzes the anti-monitoring mechanisms of three existing P2P botnets: 1) GameOver Zeus, 2)Sality, and 3) ZeroAccess, and proposes countermeasures to circumvent some of them. In addition, this dissertation also proposes several advanced anti-monitoring mechanisms from the perspective of a botmaster to anticipate future advancement of the botnets. This includes a set of lightweight crawler detection mechanisms as well as several novel mechanisms to detect sensor nodes deployed in P2P botnets. To ensure that the defenders do not loose this arms race, this dissertation also includes countermeasures to circumvent the proposed anti-monitoring mechanisms. Finally, this dissertation also investigates if the presence of third party monitoring mechanisms, e.g., sensors, in botnets influences the overall churn measurements. In addition, churn models for Sality and ZeroAccess are also derived using fine-granularity churn measurements. The works proposed in this dissertation have been evaluated using either real-world botnet datasets, i.e., that were gathered using crawlers and sensor nodes, or simulated datasets. Evaluation results indicate that most of the anti-monitoring mechanisms implemented by existing botnets can either be circumvented or tolerated to obtain monitoring data with a better quality. However, many crawlers and sensor nodes in existing botnets are found vulnerable to the antimonitoring mechanisms that are proposed from the perspective of a botmaster in this dissertation. Analysis of the fine-grained churn measurements for Sality and ZeroAccess indicate that churn in these botnets are similar to that of regular P2P file-sharing networks like Gnutella and Bittorent. In addition, the presence of highly responsive sensor nodes in the botnets are found not influencing the overall churn measurements. This is mainly due to low number of sensor nodes currently deployed in the botnets. Existing and future botnet monitoring mechanisms should apply the findings of this dissertation to ensure high quality monitoring data, and to remain undetected from the bots or the botmasters

    Cloudarmor: Supporting Reputation-Based Trust Management for Cloud Services

    Get PDF
    Cloud services have become predominant in the current technological era. For the rich set of features provided by cloud services, consumers want to access the services while protecting their privacy. In this kind of environment, protection of cloud services will become a significant problem. So, research has started for a system, which lets the users access cloud services without losing the privacy of their data. Trust management and identity model makes sense in this case. The identity model maintains the authentication and authorization of the components involved in the system and trust-based model provides us with a dynamic way of identifying issues and attacks with the system and take appropriate actions. Further, a trust management-based system provides us with a new set of challenges such as reputation-based attacks, availability of components, and misleading trust feedbacks. Collusion attacks and Sybil attacks form a significant part of these challenges. This paper aims to solve the above problems in a trust management-based model by introducing a credibility model on top of a new trust management model, which addresses these use-cases, and also provides reliability and availability

    Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

    Get PDF
    Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video)—much of it expressed in rich and ambiguous natural language. Traditionally, to analyze natural language, one has used qualitative data-analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase how to use probabilistic topic modeling via Latent Dirichlet allocation, an unsupervised text-mining technique, with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides guidance for conducting text-mining studies on their own and for evaluating the quality of others

    Exploiting Context-Aware Event Data for Fault Analysis

    Get PDF
    Fault analysis in communication networks and distributed systems is a difficult process that heavily depends on system administrator’s experience and supporting tools. This process usually requires analytic techniques and several types of event data including log events, debug messages, trace obtained from these systems to investigate the root cause of faults. This paper introduces an approach of exploiting context-aware data and classification technique for improving this process. This approach uses both event data and context-aware data including CPU load, memory, processes, temperature, status to train a decision tree, and then applies the tree to assess suspected events. We have implemented and experimented the approach on the OpenStack cloud computing system with the Hadoop computing service and MELA event collection system. The experimental results reveal that the accuracy score of the approach reaches 85% on average. The paper also includes detailed analysis for the results

    Data-driven Computational Social Science: A Survey

    Get PDF
    Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it the amalgamation of multiple disciplines, such as economics, political science, and sociology, etc. For centuries, scientists have conducted many studies to understand the mechanisms of the society. However, due to the limitations of traditional research methods, there exist many critical social issues to be explored. To solve those issues, computational social science emerges due to the rapid advancements of computation technologies and the profound studies on social science. With the aids of the advanced research techniques, various kinds of data from diverse areas can be acquired nowadays, and they can help us look into social problems with a new eye. As a result, utilizing various data to reveal issues derived from computational social science area has attracted more and more attentions. In this paper, to the best of our knowledge, we present a survey on data-driven computational social science for the first time which primarily focuses on reviewing application domains involving human dynamics. The state-of-the-art research on human dynamics is reviewed from three aspects: individuals, relationships, and collectives. Specifically, the research methodologies used to address research challenges in aforementioned application domains are summarized. In addition, some important open challenges with respect to both emerging research topics and research methods are discussed.Comment: 28 pages, 8 figure
    • …
    corecore