24 research outputs found

    A Comparison of Techniques for Sampling Web Pages

    As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively, one has to resort to other techniques like sampling to determine the properties of the web. A uniform random sample of the web would be useful to determine the percentage of web pages in a specific language, on a topic or in a top-level domain. Unfortunately, no approach has been shown to sample web pages in an unbiased way. Three promising web sampling algorithms are based on random walks. Each has been evaluated individually, but comparing results obtained on different data sets is not possible. We directly compare these algorithms in this paper. We performed three random walks on the web under the same conditions and analyzed their outcomes in detail. We discuss the strengths and the weaknesses of each algorithm and propose improvements based on experimental results.

    Identification of Y-Box Binding Protein 1 As a Core Regulator of MEK/ERK Pathway-Dependent Gene Signatures in Colorectal Cancer Cells

    Transcriptional signatures are an indispensable source of correlative information on disease-related molecular alterations on a genome-wide level. Numerous candidate genes involved in disease and in factors of predictive, as well as of prognostic, value have been deduced from such molecular portraits, e.g. in cancer. However, mechanistic insights into the regulatory principles governing global transcriptional changes are lagging behind extensive compilations of deregulated genes. To identify regulators of transcriptome alterations, we used an integrated approach combining transcriptional profiling of colorectal cancer cell lines treated with inhibitors targeting the receptor tyrosine kinase (RTK)/RAS/mitogen-activated protein kinase pathway, computational prediction of regulatory elements in promoters of co-regulated genes, and chromatin-based and functional cellular assays. We identified commonly co-regulated, proliferation-associated target genes that respond to the MAPK pathway. We recognized E2F and NFY transcription factor binding sites as prevalent motifs in those pathway-responsive genes and confirmed the predicted regulatory role of Y-box binding protein 1 (YBX1) by reporter gene, gel shift, and chromatin immunoprecipitation assays. We also validated the MAPK-dependent gene signature in colorectal cancers and provided evidence for the association of YBX1 with poor prognosis in colorectal cancer patients. This suggests that MEK/ERK-dependent, YBX1-regulated target genes are involved in executing malignant properties.

    Diminuendo al bottom - Clarifying the semantics of music notation by re-modeling

    One of many aspects of musical notation is that of a graphical language which strives to be totally precise, but falls short because it has been defined by historical evolution, cultural construction and decentralized ramification. This article applies standard techniques for computer languages to reconstruct a precise model for the syntax and semantics of the historically grown notation systems, taking the conventional way of notating musical dynamics as a simple example. It turns out that no single such model is possible, but rather a multitude of incompatible ones: some have fundamentally different evaluation algorithms, others only slightly different parameter settings. Musical practice is allowed to switch between these models without even noticing their existence, but science may need distinctness. This article constructs and demonstrates an extensible mathematical framework for their precise description and proposes an extensible nomenclature system as a basis for their application and discussion.

    Comparing Methods for Near-Uniform URL Sampling

    This diploma thesis investigated the problem of sampling URLs uniformly at random from the web. Such a method for sampling URLs can be used to estimate various properties of web pages. For example, one could estimate:
    • The fraction of web pages written in various languages
    • The coverage of various search engines
    • The distribution of web pages in top-level domains
    In the literature there are two approaches to sampling web pages based on random walks, namely by Henzinger et al. [1] and Bar-Yossef et al. [2]. In their published descriptions, these two methods crawl up to 10 million web pages. The path of the crawl is captured as a directed graph in which pages are nodes and links are edges. Both methods have only been tested individually. The goal of this project was to compare them by performing random walks and by evaluating the generated random samples (i.e. the results). If these two methods provide nearly equal samples, there is strong evidence that both generate good random samples; this means one can compile accurate statistics about the internet in a few days given current hardware.
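
The general idea of walking a directed link graph and tallying a property of the visited pages can be illustrated with a minimal Python sketch. It is not the method of Henzinger et al. or Bar-Yossef et al.: it is a plain random walk with occasional random jumps and contains no correction for the bias toward well-linked pages that those methods aim to remove. The graph `TOY_WEB`, the functions `random_walk` and `tld_fractions`, and the `jump_prob` parameter are all hypothetical names chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical toy web graph (an assumption): page -> outgoing links.
TOY_WEB = {
    "a.com": ["b.com", "c.org"],
    "b.com": ["a.com", "d.de"],
    "c.org": ["a.com"],
    "d.de": ["c.org", "e.de"],
    "e.de": [],  # dead end: the walk must jump from here
}


def random_walk(graph, start, steps, jump_prob=0.15, seed=0):
    """Walk the graph for `steps` steps.

    Returns the list of visited pages and the set of followed links,
    i.e. the crawl path as a directed graph (pages = nodes, links = edges).
    """
    rng = random.Random(seed)
    pages = sorted(graph)
    current = start
    visited = [current]
    edges = set()
    for _ in range(steps):
        links = graph[current]
        if not links or rng.random() < jump_prob:
            # Jump to a random known page (also used to escape dead ends).
            nxt = rng.choice(pages)
        else:
            # Follow one outgoing link chosen uniformly at random.
            nxt = rng.choice(links)
            edges.add((current, nxt))
        visited.append(nxt)
        current = nxt
    return visited, edges


def tld_fractions(sample):
    """Estimate the share of each top-level domain in a sample of pages."""
    counts = Counter(page.rsplit(".", 1)[-1] for page in sample)
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()}


if __name__ == "__main__":
    visited, edges = random_walk(TOY_WEB, "a.com", steps=1000)
    print(tld_fractions(visited))
```

Because the visit frequencies of such a walk depend on how pages are linked, the printed fractions are not a uniform-sample estimate; they only show what raw output a walk-based sampler produces before any bias correction.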