1,874 research outputs found

    Model-Driven Engineering in the Large: Refactoring Techniques for Models and Model Transformation Systems

    Get PDF
    Model-Driven Engineering (MDE) is a software engineering paradigm that aims to increase the productivity of developers by raising the abstraction level of software development. It envisions the use of models as key artifacts during design, implementation and deployment. From the recent arrival of MDE in large-scale industrial software development – a trend we refer to as MDE in the large –, a set of challenges emerges: First, models are now developed at distributed locations, by teams of teams. In such highly collaborative settings, the presence of large monolithic models gives rise to certain issues, such as their proneness to editing conflicts. Second, in large-scale system development, models are created using various domain-specific modeling languages. Combining these models in a disciplined manner calls for adequate modularization mechanisms. Third, the development of models is handled systematically by expressing the involved operations using model transformation rules. Such rules are often created by cloning, a practice related to performance and maintainability issues. In this thesis, we contribute three refactoring techniques, each aiming to tackle one of these challenges. First, we propose a technique to split a large monolithic model into a set of sub-models. The aim of this technique is to enable a separation of concerns within models, promoting a concern-based collaboration style: Collaborators operate on the submodels relevant for their task at hand. Second, we suggest a technique to encapsulate model components by introducing modular interfaces in a set of related models. The goal of this technique is to establish modularity in these models. Third, we introduce a refactoring to merge a set of model transformation rules exhibiting a high degree of similarity. The aim of this technique is to improve maintainability and performance by eliminating the drawbacks associated with cloning. The refactoring creates variability-based rules, a novel type of rule allowing to capture variability by using annotations. The refactoring techniques contributed in this work help to reduce the manual effort during the refactoring of models and transformation rules to a large extent. As indicated in a series of realistic case studies, the output produced by the techniques is comparable or, in the case of transformation rules, partly even preferable to the result of manual refactoring, yielding a promising outlook on the applicability in real-world settings

    Topic-dependent sentiment analysis of financial blogs

    Get PDF
    While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information, blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full documentclassification and that word-based approaches perform better than sentence-based or paragraph-based approaches

    TopSig: Topology Preserving Document Signatures

    Get PDF
    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

    A Conceptual Framework for Efficient Web Crawling in Virtual Integration Contexts

    Get PDF
    Virtual Integration systems require a crawling tool able to navigate and reach relevant pages in the Web in an efficient way. Existing proposals in the crawling area are aware of the efficiency problem, but still most of them need to download pages in order to classify them as relevant or not. In this paper, we present a conceptual framework for designing crawlers supported by a web page classifier that relies solely on URLs to determine page relevance. Such a crawler is able to choose in each step only the URLs that lead to relevant pages, and therefore reduces the number of unnecessary pages downloaded, optimising bandwidth and making it efficient and suitable for virtual integration systems. Our preliminary experiments show that such a classifier is able to distinguish between links leading to different kinds of pages, without previous intervention from the user.Ministerio de Educación y Ciencia TIN2007-64119Junta de Andalucía P07-TIC-2602Junta de Andalucía P08- TIC-4100Ministerio de Ciencia e Innovación TIN2008-04718-EMinisterio de Ciencia e Innovación TIN2010-21744Ministerio de Economía, Industria y Competitividad TIN2010-09809-EMinisterio de Ciencia e Innovación TIN2010-10811-EMinisterio de Ciencia e Innovación TIN2010-09988-

    DNA-inspired online behavioral modeling and its application to spambot detection

    Get PDF
    We propose a strikingly novel, simple, and effective approach to model online user behavior: we extract and analyze digital DNA sequences from user online actions and we use Twitter as a benchmark to test our proposal. We obtain an incisive and compact DNA-inspired characterization of user actions. Then, we apply standard DNA analysis techniques to discriminate between genuine and spambot accounts on Twitter. An experimental campaign supports our proposal, showing its effectiveness and viability. To the best of our knowledge, we are the first ones to identify and adapt DNA-inspired techniques to online user behavioral modeling. While Twitter spambot detection is a specific use case on a specific social media, our proposed methodology is platform and technology agnostic, hence paving the way for diverse behavioral characterization tasks

    @Note: a workbench for biomedical text mining

    Get PDF
    Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists’ needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows to correct annotations and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves the interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although it is still on-going, it has already allowed the development of applications that are currently being used.Fundação para a Ciência e a Tecnologia (FCT
    corecore