2,814 research outputs found

    The iCrawl Wizard -- Supporting Interactive Focused Crawl Specification

    Collections of Web documents about specific topics are needed in many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection. The choice of seed URLs influences the quality of the resulting collection and requires a lot of expertise. In this demonstration we present the iCrawl Wizard, a tool that assists users in defining focused crawls efficiently and semi-automatically. Our tool uses major search engines and Social Media APIs as well as information extraction techniques to find seed URLs and a semantic description of the crawl intent. Using the iCrawl Wizard, even non-expert users can create semantic specifications for focused crawlers interactively and efficiently. Comment: Published in the Proceedings of the European Conference on Information Retrieval (ECIR) 201

    Aggressive aggregation

    Among the first steps in a compilation pipeline is the construction of an Intermediate Representation (IR), an in-memory representation of the input program. Any attempt at program optimisation, in terms of both size and running time, has to operate on this structure. There may be one or multiple such IRs; however, most compilers use some form of a Control Flow Graph (CFG) internally. This representation clearly aims at general-purpose programming languages, for which it is well suited and allows for many classical program optimisations. On the other hand, a growing structural difference between the input program and the chosen IR can lose or obfuscate information that can be crucial for effective optimisation. With today’s rise of a multitude of different programming languages, Domain-Specific Languages (DSLs), and computing platforms, the classical machine-oriented IR is reaching its limits and a broader variety of IRs is needed. This realisation yielded, e.g., Multi-Level Intermediate Representation (MLIR), a compiler framework that facilitates the creation of a wide range of IRs and encourages their reuse among different programming languages and the corresponding compilers. In this modern spirit, this dissertation explores the potential of Algebraic Decision Diagrams (ADDs) as an IR for (domain-specific) program optimisation. The data structure has remained the state of the art for Boolean function representation for more than thirty years and is well known for its optimality in size and depth, i.e., running time. As such, it is ideally suited to represent the corresponding classes of programs in the role of an IR. We will discuss its application in a variety of different program domains, ranging from DSLs to machine-learned programs and even to general-purpose programming languages. Two representatives for DSLs, a graphical and a textual one, prove the adequacy of ADDs for the program optimisation of modelled decision services. The resulting DSLs facilitate experimentation with ADDs and provide valuable insight into their potential and limitations: input programs can be aggregated in a radical fashion, at the risk of the occasional exponential growth. With the aggregation of large Random Forests into a single aggregated ADD, we bring this potential to a program domain of practical relevance. The results are impressive: both running time and size of the Random Forest program are reduced by multiple orders of magnitude. It turns out that this ADD-based aggregation can be generalised, even to general-purpose programming languages. The resulting method achieves impressive speedups for a seemingly optimal program: the iterative Fibonacci implementation. Altogether, ADDs facilitate effective program optimisation where the input programs allow for a natural transformation to the data structure. In these cases, they have proven to be an extremely powerful tool for the optimisation of a program’s running time and, in some cases, of its size. The exploration of their potential as an IR has only started and deserves attention in future research.

    Modes of TAL effector-mediated repression

    Engineered transcription activator-like effectors, or TALEs, have emerged as a new class of designer DNA-binding proteins. Their DNA recognition sites can be specified with great flexibility. When fused to appropriate transcriptional regulatory domains, they can serve as designer transcription factors, modulating the activity of targeted promoters. We created tet operator (tetO)-specific TALEs (tetTALEs), with the same DNA-binding site as the Tet repressor (TetR) and the TetR-based transcription factors that are extensively used in eukaryotic transcriptional control systems. Different constellations of tetTALEs and tetO-modified chromosomal transcription units were analyzed for their efficacy in mammalian cells. We find that tetTALE-silencers can entirely abrogate expression from the strong human EF1α promoter when binding upstream of the transcriptional control sequence. Remarkably, the DNA-binding domain of tetTALE alone can effectively counteract trans-activation mediated by the potent tet trans-activator and also directly interfere with RNA polymerase II transcription initiation from the strong CMV promoter. Our results demonstrate that TALEs can act as highly versatile tools in genetic engineering, serving as trans-activators, trans-silencers and also competitive repressors.

    The End of the Sharing Economy? Impact of COVID-19 on Airbnb in Germany

    This paper analyzes the effect the COVID-19 pandemic is having on the sharing economy. We focus on hosts’ behavior in the German shared housing market and examine hosts’ adaptation to the pandemic state. Using monthly data from January 2019 until December 2020 for the city of Berlin, we conduct a probit model regression analysis and investigate the influence of several Airbnb-listing-specific factors and unemployment on the probability of renting the Airbnb accommodation. Through this big data analysis, we find that hosts switch from short-term to long-term options and rent relatively more entire apartments than shared ones during the COVID-19 pandemic compared to the pre-pandemic state.
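
    For readers unfamiliar with the term, a probit regression models a binary outcome through the standard normal distribution. The general form is shown below; the symbols are illustrative assumptions and not the paper's own notation: y_{it} indicates whether listing i is rented in month t, x_{it} collects the listing-specific factors and unemployment, \beta is the coefficient vector, and \Phi is the standard normal cumulative distribution function.

```latex
\Pr\left(y_{it} = 1 \mid x_{it}\right) = \Phi\left(x_{it}^{\top} \beta\right)
```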

    Algebraic aggregation of random forests

    Random Forests are one of the most popular classifiers in machine learning. The larger they are, the more precise the outcome of their predictions. However, this comes at a cost: it is increasingly difficult to understand why a Random Forest made a specific choice, and its running time for classification grows linearly with the size (number of trees). In this paper, we propose a method to aggregate large Random Forests into a single, semantically equivalent decision diagram which has the following two effects: (1) minimal, sufficient explanations for Random Forest-based classifications can be obtained by means of a simple three-step reduction, and (2) the running time is radically improved. In fact, our experiments on various popular datasets show speed-ups of several orders of magnitude, while, at the same time, also significantly reducing the size of the required data structure.
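
    The idea of folding a forest into one semantically equivalent decision diagram can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes trees over named Boolean predicates, majority voting with alphabetical tie-breaking, and a fixed predicate order, and it builds the diagram by plain Shannon expansion. All names (`classify_tree`, `aggregate`, `FOREST`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is a class label (str) or (predicate, if_false, if_true).
def classify_tree(tree, sample):
    """Follow one root-to-leaf path; also works on the aggregated diagram."""
    while isinstance(tree, tuple):
        pred, if_false, if_true = tree
        tree = if_true if sample[pred] else if_false
    return tree

def majority(labels):
    """Most frequent label; ties go to the alphabetically first label."""
    votes = Counter(labels)
    return max(sorted(votes), key=votes.get)

def classify_forest(forest, sample):
    """Reference semantics: every tree votes, so cost grows with the forest."""
    return majority(classify_tree(t, sample) for t in forest)

def predicates(tree):
    if not isinstance(tree, tuple):
        return set()
    pred, lo, hi = tree
    return {pred} | predicates(lo) | predicates(hi)

def restrict(tree, pred, value):
    """Simplify a tree under the assumption that `pred` has the given value."""
    if not isinstance(tree, tuple):
        return tree
    p, lo, hi = tree
    if p == pred:
        return restrict(hi if value else lo, pred, value)
    return (p, restrict(lo, pred, value), restrict(hi, pred, value))

def aggregate(forest, order, unique=None):
    """Build one diagram that returns the forest's majority vote."""
    unique = {} if unique is None else unique
    forest = list(forest)
    if not any(isinstance(t, tuple) for t in forest):
        return majority(forest)            # all trees decided: vote once, here
    remaining = set().union(*(predicates(t) for t in forest))
    pred = next(p for p in order if p in remaining)
    lo = aggregate([restrict(t, pred, False) for t in forest], order, unique)
    hi = aggregate([restrict(t, pred, True) for t in forest], order, unique)
    if lo == hi:                           # redundant test: drop the node
        return lo
    return unique.setdefault((pred, lo, hi), (pred, lo, hi))  # share equal nodes

FOREST = [
    ("a", "no", "yes"),
    ("b", "no", "yes"),
    ("a", "no", ("c", "no", "yes")),
]
ORDER = ["a", "b", "c"]
DIAGRAM = aggregate(FOREST, ORDER)

for values in product([False, True], repeat=len(ORDER)):
    sample = dict(zip(ORDER, values))
    assert classify_tree(DIAGRAM, sample) == classify_forest(FOREST, sample)
```

    Classifying with `DIAGRAM` walks a single path instead of one path per tree, which is where the speed-up comes from. The expansion itself can grow exponentially in the worst case, so this sketch is only suitable for small inputs.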

    Generating Optimal Decision Functions from Rule Specifications

    In this paper we sketch an approach and a tool for rapid evaluation of large systems of weighted decision rules. The tool re-implements the patented miAamics approach, originally devised as a fast technique for multicriterial decision support. The weighted rules are used to express performance-critical decision functions. MiAamics optimizes the function and generates its efficient implementation fully automatically. Being declarative, the rules allow experts to define rich sets of complex functions without being familiar with any general-purpose programming language. The approach also lends itself to optimizing existing decision functions that can be expressed in the form of these rules. The proposed approach first transforms the system of rules into an intermediate representation of Algebraic Decision Diagrams. From this data structure, we generate code in a variety of commonly used target programming languages. We illustrate the principle and tools on a small, easily comprehensible example and present results from experiments with large systems of randomly generated rules. The proposed representation is significantly faster to evaluate and often of smaller size than the original representation. Possible miAamics applications to machine learning concern reducing ensembles of classifiers and allowing for a much faster evaluation of these classification functions. It can also naturally be applied to large scale recommender systems where performance is key.
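
    The pipeline described above (rules, then a decision diagram, then generated code) can be sketched roughly as follows. This is not the miAamics implementation: the rule format, the "highest total weight wins" semantics, the default decision, and every name in the block are assumptions made for illustration, and the diagram is built by brute-force expansion over all predicates.

```python
from itertools import product

# Hypothetical rule format: (condition, decision, weight). The condition maps
# predicate names to the truth value the rule requires.
RULES = [
    ({"vip": True}, "approve", 3),
    ({"overdue": True}, "reject", 2),
    ({"vip": False, "new": True}, "review", 1),
]
ORDER = ["vip", "overdue", "new"]
DEFAULT = "review"  # assumed decision when no rule fires

def decide(rules, facts):
    """Reference semantics: scan all rules, highest total weight wins."""
    totals = {}
    for cond, decision, weight in rules:
        if all(facts[p] == v for p, v in cond.items()):
            totals[decision] = totals.get(decision, 0) + weight
    return max(sorted(totals), key=totals.get) if totals else DEFAULT

def build(rules, order, facts=None):
    """Expand over the predicates in order; merge branches that agree."""
    facts = facts or {}
    if len(facts) == len(order):
        return decide(rules, facts)        # terminal: the final decision
    pred = order[len(facts)]
    lo = build(rules, order, {**facts, pred: False})
    hi = build(rules, order, {**facts, pred: True})
    return lo if lo == hi else (pred, lo, hi)

def emit(node, indent=1):
    """Turn a diagram node into nested if/else source text."""
    pad = "    " * indent
    if not isinstance(node, tuple):
        return f"{pad}return {node!r}\n"
    pred, lo, hi = node
    return (f"{pad}if facts[{pred!r}]:\n" + emit(hi, indent + 1)
            + f"{pad}else:\n" + emit(lo, indent + 1))

def generate(rules, order, name="decide_fast"):
    """Generate Python source for a function equivalent to `decide`."""
    return f"def {name}(facts):\n" + emit(build(rules, order))

SOURCE = generate(RULES, ORDER)
namespace = {}
exec(SOURCE, namespace)                    # compile the generated function
decide_fast = namespace["decide_fast"]

for values in product([False, True], repeat=len(ORDER)):
    facts = dict(zip(ORDER, values))
    assert decide_fast(facts) == decide(RULES, facts)
```

    The generated function tests each predicate at most once per call and never touches the rule list, whereas `decide` re-evaluates every rule. Emitting Python here is only a convenience; the same `emit` step could target another language by changing the templates.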