
    Information Extraction in Illicit Domains

    Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have "long tails" and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18% F-Measure on five annotated sets of real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is demonstrably robust to concept drift, and can be efficiently bootstrapped even in a serial computing environment. Comment: 10 pages, ACM WWW 201

    API2MoL: Automating the building of bridges between APIs and Model-Driven Engineering

    Context: A software artefact typically makes its functionality available through a specialized Application Programming Interface (API) describing the set of services offered to client applications. In fact, building any software system usually involves managing a plethora of APIs, which complicates the development process. In Model-Driven Engineering (MDE), where models are the key elements of any software engineering activity, this API management should take place at the model level. Therefore, tools that facilitate the integration of APIs and MDE are clearly needed. Objective: Our goal is to automate the implementation of API-MDE bridges for supporting both the creation of models from API objects and the generation of such API objects from models. In this sense, this paper presents the API2MoL approach, which provides a declarative rule-based language to easily write mapping definitions to link API specifications and the metamodel that represents them. These definitions are then executed to convert API objects into model elements or vice versa. The approach also allows both the metamodel and the mapping to be automatically obtained from the API specification (bootstrap process). Method: After implementing the API2MoL engine, its correctness was validated using several APIs. Since APIs are normally large, we then developed a tool to implement the bootstrap process, which was also validated. Results: We provide a toolkit (language and bootstrap tool) for the creation of bridges between APIs and MDE. The current implementation focuses on Java APIs, although its adaptation to other statically typed object-oriented languages is straightforward. The correctness, expressiveness and completeness of the approach have been validated with the Swing, SWT and JTwitter APIs. Conclusion: API2MoL frees developers from having to manually implement the tasks of obtaining models from API objects and generating such objects from models. This helps to manage API models in MDE-based solutions.

    The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
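
The descriptor-and-clustering idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: it computes only the amino acid composition descriptor, applies a plain k-means step, and omits the Self Organizing Map stage, the physico-chemical descriptors and the Support Vector Machine classifier. The function names (`aa_composition`, `kmeans`) and the four toy sequences are hypothetical and chosen for illustration only; they are not real FAE sequences.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the descriptor layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain k-means; returns the labels of the last assignment step and the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initialise from k distinct data points
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy sequences: two alanine/glycine-rich, two lysine/glutamate-rich.
toy_sequences = ["AAAAAAAAGG", "AAAAAAGGGG", "KKKKKKKKEE", "KKKKKKEEEE"]
descriptors = [aa_composition(s) for s in toy_sequences]
labels, _ = kmeans(descriptors, k=2)
```

On these toy inputs the two compositionally similar pairs end up in separate clusters. Real sequences would need the richer descriptors and the two-stage clustering that the abstract describes.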

    Individual employment effects of job creation schemes in Germany with respect to sectoral heterogeneity

    "Job creation schemes (JCS) have been one important programme of active labour market policy (ALMP) in Germany for a long time. They aim at the re-integration of hard-to-place unemployed into regular employment. A thorough microeconometric evaluation of these programmes was hindered by the fact that the available (survey) datasets have been too small to account for a possible occurrence of effect heterogeneity. However, identifying effect heterogeneity can help to improve the design and implementation of future programmes. Hence, we use an administrative dataset of the Federal Employment Agency, containing over 11,000 participants, to analyse the employment effects of JCS on an individual level. Whereas in a previous paper we analysed these effects with respect to group-specific and regional heterogeneity, we focus here explicitly on effect heterogeneity caused by differences in the implementation of programmes. In particular, we first evaluate the effects with respect to the economic sector in which the JCS are accomplished. Second, we analyse if different types of promotion lead to different effects. Finally, we examine if there are varying effects which can be attributed to different implementing institutions. The results are rather discouraging and show that JCS are in general not able to improve the re-integration chances of participants into regular employment." (Author's abstract, IAB-Doku) Keywords: IAB-Bewerberangebotsdatei, IAB-Maßnahmeteilnehmergrunddatei, job creation scheme - performance evaluation, impact research, economic sectors, implementing institutions, occupational reintegration, participants, employment outcome, gender distribution, West Germany, East Germany, Federal Republic of Germany