14 research outputs found

    A parallel algorithm to calculate the costrank of a network

    No full text
    We developed analogous parallel algorithms to implement CostRank for distributed memory parallel computers using multi processors. Our intent is to make CostRank calculations for the growing number of hosts in a fast and a scalable way. In the same way we intent to secure large scale networks that require fast and reliable computing to calculate the ranking of enormous graphs with thousands of vertices (states) and millions or arcs (links). In our proposed approach we focus on a parallel CostRank computational architecture on a cluster of PCs networked via Gigabit Ethernet LAN to evaluate the performance and scalability of our implementation. In particular, a partitioning of input data, graph files, and ranking vectors with load balancing technique can improve the runtime and scalability of large-scale parallel computations. An application case study of analogous Cost Rank computation is presented. Applying parallel environment models for one-dimensional sparse matrix partitioning on a modified research page, results in a significant reduction in communication overhead and in per-iteration runtime. We provide an analytical discussion of analogous algorithms performance in terms of I/O and synchronization cost, as well as of memory usage

    Parallel Page Rank Algorithms: A Survey

    Get PDF
    The PageRank method is an important and basic component in effective web search to compute the rank score of each page. The exponential growth of the Internet makes a crucial challenges for search engines to provide up-to-date and relevant user?s query search results within time period. The PageRank method computed on huge number of web pages and this is computation intensive task. In this paper, we provide the basic concept of PageRank method and discuss some Parallel PageRank methods. We also compare some Parallel algorithmic concepts like load balance, distributed vs. shared memory and data layout on these algorithms

    Exploiting Web Matrix Permutations to Speedup PageRank Computation

    Get PDF
    Recently, the research community has devoted an increased attention to reduce the computational time needed by Web ranking algorithms. In particular, we saw many proposals to speed up the well-known PageRank algorithm used by Google. This interest is motivated by two dominant factors: (1) the Web Graph has huge dimensions and it is subject to dramatic updates in term of nodes and links - therefore PageRank assignment tends to became obsolete very soon; (2) many PageRank vectors need to be computed according to different personalization vectors chosen. In the present paper, we address this problem from a numerical point of view. First, we show how to treat dangling nodes in a way which naturally adapts to the random surfer model and preserves the sparsity of the Web Graph. This result allows to consider the PageRank computation as a sparse linear system in alternative to the commonly adopted eigenpairs interpretation. Second, we exploit the Web Matrix reducibility and compose opportunely some Web matrix permutation to speed up the PageRank computation. We tested our approaches on a Web Graphs crawled from the net. The largest one account about 24 millions nodes and more than 100 million links. Upon this Web Graph, the cost for computing the PageRank is reduced of 58% in terms of Mflops and of 89% in terms of time respect to the Power method commonly used

    Local methods for estimating pagerank values

    Full text link

    Attack graph approach to dynamic network vulnerability analysis and countermeasures

    Get PDF
    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of PhilosophyIt is widely accepted that modern computer networks (often presented as a heterogeneous collection of functioning organisations, applications, software, and hardware) contain vulnerabilities. This research proposes a new methodology to compute a dynamic severity cost for each state. Here a state refers to the behaviour of a system during an attack; an example of a state is where an attacker could influence the information on an application to alter the credentials. This is performed by utilising a modified variant of the Common Vulnerability Scoring System (CVSS), referred to as a Dynamic Vulnerability Scoring System (DVSS). This calculates scores of intrinsic, time-based, and ecological metrics by combining related sub-scores and modelling the problem’s parameters into a mathematical framework to develop a unique severity cost. The individual static nature of CVSS affects the scoring value, so the author has adapted a novel model to produce a DVSS metric that is more precise and efficient. In this approach, different parameters are used to compute the final scores determined from a number of parameters including network architecture, device setting, and the impact of vulnerability interactions. An attack graph (AG) is a security model representing the chains of vulnerability exploits in a network. A number of researchers have acknowledged the attack graph visual complexity and a lack of in-depth understanding. Current attack graph tools are constrained to only limited attributes or even rely on hand-generated input. The automatic formation of vulnerability information has been troublesome and vulnerability descriptions are frequently created by hand, or based on limited data. The network architectures and configurations along with the interactions between the individual vulnerabilities are considered in the method of computing the Cost using the DVSS and a dynamic cost-centric framework. A new methodology was built up to present an attack graph with a dynamic cost metric based on DVSS and also a novel methodology to estimate and represent the cost-centric approach for each host’ states was followed out. A framework is carried out on a test network, using the Nessus scanner to detect known vulnerabilities, implement these results and to build and represent the dynamic cost centric attack graph using ranking algorithms (in a standardised fashion to Mehta et al. 2006 and Kijsanayothin, 2010). However, instead of using vulnerabilities for each host, a CostRank Markov Model has developed utilising a novel cost-centric approach, thereby reducing the complexity in the attack graph and reducing the problem of visibility. An analogous parallel algorithm is developed to implement CostRank. The reason for developing a parallel CostRank Algorithm is to expedite the states ranking calculations for the increasing number of hosts and/or vulnerabilities. In the same way, the author intends to secure large scale networks that require fast and reliable computing to calculate the ranking of enormous graphs with thousands of vertices (states) and millions of arcs (representing an action to move from one state to another). In this proposed approach, the focus on a parallel CostRank computational architecture to appraise the enhancement in CostRank calculations and scalability of of the algorithm. In particular, a partitioning of input data, graph files and ranking vectors with a load balancing technique can enhance the performance and scalability of CostRank computations in parallel. A practical model of analogous CostRank parallel calculation is undertaken, resulting in a substantial decrease in calculations communication levels and in iteration time. The results are presented in an analytical approach in terms of scalability, efficiency, memory usage, speed up and input/output rates. Finally, a countermeasures model is developed to protect against network attacks by using a Dynamic Countermeasures Attack Tree (DCAT). The following scheme is used to build DCAT tree (i) using scalable parallel CostRank Algorithm to determine the critical asset, that system administrators need to protect; (ii) Track the Nessus scanner to determine the vulnerabilities associated with the asset using the dynamic cost centric framework and DVSS; (iii) Check out all published mitigations for all vulnerabilities. (iv) Assess how well the security solution mitigates those risks; (v) Assess DCAT algorithm in terms of effective security cost, probability and cost/benefit analysis to reduce the total impact of a specific vulnerability

    Análisis de los criterios de relevancia documental mediante consultas de información en el entorno web

    Get PDF
    La búsqueda de información no se entiende sin los motores de búsqueda web. Ante una demanda de información los buscadores web ordenan los resultados de forma que las páginas web más relevantes para la consulta aparezcan en las primeras posiciones. Esto genera un alto grado de competitividad entre las páginas web por obtener mejores asignaciones de relevancia por parte de los buscadores. Por norma general, los usuarios suelen consultar sólo los primeros resultados que devuelve un motor de búsqueda, en consecuencia ocupar estos puestos se traduce en mayor prestigio y visibilidad. Por tanto, la percepción de relevancia documental web por parte de los usuarios está intrínsecamente unida a los motores de búsqueda. En este trabajo se propone y desarrolla una metodología para determinar la relevancia documental web de forma automática, que se puede interpretar como: predicción automática de la posición que otorgaría un motor de búsqueda a un documento web entre los resultados de una consulta. La investigación se completa identificando los factores considerados en el posicionamiento web, a partir del estudio de herramientas empleadas en la optimización y promoción de páginas web. También se analiza el peso de cada uno de estos factores en los algoritmos de ordenación de los buscadores. Finalmente, en relación a las capacidades adquiridas para emular el comportamiento de los motores de búsqueda se propone un método de optimización web que estima previamente la rentabilidad del proceso. De esta forma no se invertirá en una campaña de promoción si los pronósticos de mejora del posicionamiento no se juzgan adecuados

    On Two Web IR Boosting Tools: Clustering and Ranking

    Get PDF
    This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and to rank results so that the most ``relevant' come first. The first break-through technique was the exploitation of the link structure of the Web graph in order to rank the result pages, using the well-known Hits and Pagerank algorithms. This link-analysis approaches have been improved and extended, but yet they seem to be insufficient in providing a satisfying search experience. In a number of situations a flat list of search results is not enough, and the users might desire to have search results grouped on-the-fly in folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. In order to address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly. There are also situations where users might desire to access fresh information. In these cases, traditional link analysis could not be suitable. In fact, it is possible that there is not enough time to have many links pointing to a recently produced piece of information. In order to address this need, we will discuss the algorithmic and numerical ideas behind a new ranking algorithm suitable for ranking fresh type of information, such as news articles or blogs. When link analysis suffices to produce good quality search results, the huge amount of Web information asks for fast ranking methodologies. We will discuss numerical methodologies for accelerating the eingenvector-like computation, commonly used by link analysis. An important result of this thesis is that we show how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We will demonstrate that both clustering and ranking have a mutual reinforcement propriety which has not yet been studied intensively. This propriety can be exploited to boost the precision of both the two methodologies

    Effective web crawlers

    Get PDF
    Web crawlers are the component of a search engine that must traverse the Web, gathering documents in a local repository for indexing by a search engine so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents are retrieved by a crawler and have subsequently been altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that consider any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the amount of impact that crawling algorithms have on user results. Most of the existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where there is little or no past change statistics with which predictions can be made. Our work analyses different measures of change and introduces a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results. Other well-known and widely used schemes have to retrieve around twice the data to achieve the same effectiveness as our schemes. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents. We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no one static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both freshness and quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed
    corecore