1,874 research outputs found
Model-Driven Engineering in the Large: Refactoring Techniques for Models and Model Transformation Systems
Model-Driven Engineering (MDE) is a software engineering paradigm that
aims to increase the productivity of developers by raising the
abstraction level of software development. It envisions the use of
models as key artifacts during design, implementation and deployment.
From the recent arrival of MDE in large-scale industrial software
development – a trend we refer to as MDE in the large –, a set of
challenges emerges: First, models are now developed at distributed
locations, by teams of teams. In such highly collaborative settings, the
presence of large monolithic models gives rise to certain issues, such
as their proneness to editing conflicts. Second, in large-scale system
development, models are created using various domain-specific modeling
languages. Combining these models in a disciplined manner calls for
adequate modularization mechanisms. Third, the development of models is
handled systematically by expressing the involved operations using model
transformation rules. Such rules are often created by cloning, a
practice related to performance and maintainability issues.
In this thesis, we contribute three refactoring techniques, each aiming
to tackle one of these challenges. First, we propose a technique to
split a large monolithic model into a set of sub-models. The aim of this
technique is to enable a separation of concerns within models, promoting
a concern-based collaboration style: Collaborators operate on the
submodels relevant for their task at hand. Second, we suggest a
technique to encapsulate model components by introducing modular
interfaces in a set of related models. The goal of this technique is to
establish modularity in these models. Third, we introduce a refactoring
to merge a set of model transformation rules exhibiting a high degree of
similarity. The aim of this technique is to improve maintainability and
performance by eliminating the drawbacks associated with cloning. The
refactoring creates variability-based rules, a novel type of rule
allowing to capture variability by using annotations.
The refactoring techniques contributed in this work help to reduce the
manual effort during the refactoring of models and transformation rules
to a large extent. As indicated in a series of realistic case studies,
the output produced by the techniques is comparable or, in the case of
transformation rules, partly even preferable to the result of manual
refactoring, yielding a promising outlook on the applicability in
real-world settings
Topic-dependent sentiment analysis of financial blogs
While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information, blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full documentclassification and that word-based approaches perform better than sentence-based or paragraph-based approaches
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
A Conceptual Framework for Efficient Web Crawling in Virtual Integration Contexts
Virtual Integration systems require a crawling tool able to
navigate and reach relevant pages in the Web in an efficient way. Existing
proposals in the crawling area are aware of the efficiency problem,
but still most of them need to download pages in order to classify them
as relevant or not. In this paper, we present a conceptual framework for
designing crawlers supported by a web page classifier that relies solely
on URLs to determine page relevance. Such a crawler is able to choose
in each step only the URLs that lead to relevant pages, and therefore
reduces the number of unnecessary pages downloaded, optimising bandwidth
and making it efficient and suitable for virtual integration systems.
Our preliminary experiments show that such a classifier is able to distinguish
between links leading to different kinds of pages, without previous
intervention from the user.Ministerio de Educación y Ciencia TIN2007-64119Junta de Andalucía P07-TIC-2602Junta de Andalucía P08- TIC-4100Ministerio de Ciencia e Innovación TIN2008-04718-EMinisterio de Ciencia e Innovación TIN2010-21744Ministerio de Economía, Industria y Competitividad TIN2010-09809-EMinisterio de Ciencia e Innovación TIN2010-10811-EMinisterio de Ciencia e Innovación TIN2010-09988-
DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are
the first ones to identify and adapt DNA-inspired techniques to online user
behavioral modeling. While Twitter spambot detection is a specific use case on
a specific social media, our proposed methodology is platform and technology
agnostic, hence paving the way for diverse behavioral characterization tasks
@Note: a workbench for biomedical text mining
Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists’ needs is crucial to solve real-world problems and promote further research.
We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows to correct annotations and a Text Mining Module supporting dataset preparation and algorithm evaluation.
@Note improves the interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although it is still on-going, it has already allowed the development of applications that are currently being used.Fundação para a Ciência e a Tecnologia (FCT
- …