14,430 research outputs found

    Third Party Tracking in the Mobile Ecosystem

    Full text link
    Third party tracking allows companies to identify users and track their behaviour across multiple digital services. This paper presents an empirical study of the prevalence of third-party trackers on 959,000 apps from the US and UK Google Play stores. We find that most apps contain third party tracking, and the distribution of trackers is long-tailed with several highly dominant trackers accounting for a large portion of the coverage. The extent of tracking also differs between categories of apps; in particular, news apps and apps targeted at children appear to be amongst the worst in terms of the number of third party trackers associated with them. Third party tracking is also revealed to be a highly trans-national phenomenon, with many trackers operating in jurisdictions outside the EU. Based on these findings, we draw out some significant legal compliance challenges facing the tracking industry.Comment: Corrected missing company info (Linkedin owned by Microsoft). Figures for Microsoft and Linkedin re-calculated and added to Table

    Summarization of Dynamic Content in Web Collections

    Full text link

    Static Score Bucketing in Inverted Indexes

    Full text link
    Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when the entire inverted lists cannot be processed. This heuristic, however, increases the cost of index generation and requires time-consuming index build algorithms. In this paper, we study a new index organization based on static score bucketing. We show that this new technique significantly improves in index build performance while having minimal impact on the quality of search results. We also provide upper bounds on the quality degradation and verify experimentally the benefits of the proposed approach

    Attack graph approach to dynamic network vulnerability analysis and countermeasures

    Get PDF
    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of PhilosophyIt is widely accepted that modern computer networks (often presented as a heterogeneous collection of functioning organisations, applications, software, and hardware) contain vulnerabilities. This research proposes a new methodology to compute a dynamic severity cost for each state. Here a state refers to the behaviour of a system during an attack; an example of a state is where an attacker could influence the information on an application to alter the credentials. This is performed by utilising a modified variant of the Common Vulnerability Scoring System (CVSS), referred to as a Dynamic Vulnerability Scoring System (DVSS). This calculates scores of intrinsic, time-based, and ecological metrics by combining related sub-scores and modelling the problem’s parameters into a mathematical framework to develop a unique severity cost. The individual static nature of CVSS affects the scoring value, so the author has adapted a novel model to produce a DVSS metric that is more precise and efficient. In this approach, different parameters are used to compute the final scores determined from a number of parameters including network architecture, device setting, and the impact of vulnerability interactions. An attack graph (AG) is a security model representing the chains of vulnerability exploits in a network. A number of researchers have acknowledged the attack graph visual complexity and a lack of in-depth understanding. Current attack graph tools are constrained to only limited attributes or even rely on hand-generated input. The automatic formation of vulnerability information has been troublesome and vulnerability descriptions are frequently created by hand, or based on limited data. The network architectures and configurations along with the interactions between the individual vulnerabilities are considered in the method of computing the Cost using the DVSS and a dynamic cost-centric framework. A new methodology was built up to present an attack graph with a dynamic cost metric based on DVSS and also a novel methodology to estimate and represent the cost-centric approach for each host’ states was followed out. A framework is carried out on a test network, using the Nessus scanner to detect known vulnerabilities, implement these results and to build and represent the dynamic cost centric attack graph using ranking algorithms (in a standardised fashion to Mehta et al. 2006 and Kijsanayothin, 2010). However, instead of using vulnerabilities for each host, a CostRank Markov Model has developed utilising a novel cost-centric approach, thereby reducing the complexity in the attack graph and reducing the problem of visibility. An analogous parallel algorithm is developed to implement CostRank. The reason for developing a parallel CostRank Algorithm is to expedite the states ranking calculations for the increasing number of hosts and/or vulnerabilities. In the same way, the author intends to secure large scale networks that require fast and reliable computing to calculate the ranking of enormous graphs with thousands of vertices (states) and millions of arcs (representing an action to move from one state to another). In this proposed approach, the focus on a parallel CostRank computational architecture to appraise the enhancement in CostRank calculations and scalability of of the algorithm. In particular, a partitioning of input data, graph files and ranking vectors with a load balancing technique can enhance the performance and scalability of CostRank computations in parallel. A practical model of analogous CostRank parallel calculation is undertaken, resulting in a substantial decrease in calculations communication levels and in iteration time. The results are presented in an analytical approach in terms of scalability, efficiency, memory usage, speed up and input/output rates. Finally, a countermeasures model is developed to protect against network attacks by using a Dynamic Countermeasures Attack Tree (DCAT). The following scheme is used to build DCAT tree (i) using scalable parallel CostRank Algorithm to determine the critical asset, that system administrators need to protect; (ii) Track the Nessus scanner to determine the vulnerabilities associated with the asset using the dynamic cost centric framework and DVSS; (iii) Check out all published mitigations for all vulnerabilities. (iv) Assess how well the security solution mitigates those risks; (v) Assess DCAT algorithm in terms of effective security cost, probability and cost/benefit analysis to reduce the total impact of a specific vulnerability

    LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates

    Get PDF
    State-of-the-art item recommendation algorithms, which apply Factorization Machines (FM) as a scoring function and pairwise ranking loss as a trainer (PRFM for short), have been recently investigated for the implicit feedback based context-aware recommendation problem (IFCAR). However, good recommenders particularly emphasize on the accuracy near the top of the ranked list, and typical pairwise loss functions might not match well with such a requirement. In this paper, we demonstrate, both theoretically and empirically, PRFM models usually lead to non-optimal item recommendation results due to such a mismatch. Inspired by the success of LambdaRank, we introduce Lambda Factorization Machines (LambdaFM), which is particularly intended for optimizing ranking performance for IFCAR. We also point out that the original lambda function suffers from the issue of expensive computational complexity in such settings due to a large amount of unobserved feedback. Hence, instead of directly adopting the original lambda strategy, we create three effective lambda surrogates by conducting a theoretical analysis for lambda from the top-N optimization perspective. Further, we prove that the proposed lambda surrogates are generic and applicable to a large set of pairwise ranking loss functions. Experimental results demonstrate LambdaFM significantly outperforms state-of-the-art algorithms on three real-world datasets in terms of four standard ranking measures

    Survey over Existing Query and Transformation Languages

    Get PDF
    A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas

    SparkIR: a Scalable Distributed Information Retrieval Engine over Spark

    Get PDF
    Search engines have to deal with a huge amount of data (e.g., billions of documents in the case of the Web) and find scalable and efficient ways to produce effective search results. In this thesis, we propose to use Spark framework, an in memory distributed big data processing framework, and leverage its powerful capabilities of handling large amount of data to build an efficient and scalable experimental search engine over textual documents. The proposed system, SparkIR, can serve as a research framework for conducting information retrieval (IR) experiments. SparkIR supports two indexing schemes, document-based partitioning and term-based partitioning, to adopt document-at-a-time (DAAT) and term-at-a-time (TAAT) query evaluation methods. Moreover, it offers static and dynamic pruning to improve the retrieval efficiency. For static pruning, it employs champion list and tiering, while for dynamic pruning, it uses MaxScore top k retrieval. We evaluated the performance of SparkIR using ClueWeb12-B13 collection that contains about 50M English Web pages. Experiments over different subsets of the collection and compared the Elasticsearch baseline show that SparkIR exhibits reasonable efficiency and scalability performance overall for both indexing and retrieval. Implemented as an open-source library over Spark, users of SparkIR can also benefit from other Spark libraries (e.g., MLlib and GraphX), which, therefore, eliminates the need of usin

    Dublin City University video track experiments for TREC 2003

    Get PDF
    In this paper, we describe our experiments for both the News Story Segmentation task and Interactive Search task for TRECVID 2003. Our News Story Segmentation task involved the use of a Support Vector Machine (SVM) to combine evidence from audio-visual analysis tools in order to generate a listing of news stories from a given news programme. Our Search task experiment compared a video retrieval system based on text, image and relevance feedback with a text-only video retrieval system in order to identify which was more effective. In order to do so we developed two variations of our FĂ­schlĂĄr video retrieval system and conducted user testing in a controlled lab environment. In this paper we outline our work on both of these two tasks
    • 

    corecore