58 research outputs found

    Integrating the common variability language with multilanguage annotations for web engineering

    Web application development involves managing a high diversity of files and resources, such as code, pages, or style sheets, implemented in different languages. To deal with the automatic generation of custom-made configurations of web applications, industry usually adopts annotation-based approaches, even though most studies encourage the use of composition-based approaches to implement Software Product Lines. Recent work tries to combine both approaches to obtain their complementary benefits. However, technology companies are reluctant to adopt new development paradigms such as feature-oriented programming or aspect-oriented programming. Moreover, it is extremely difficult, or even impossible, to apply these programming models to web applications, mainly because of their multilingual nature: their development involves multiple types of source code (Java, Groovy, JavaScript), templates (HTML, Markdown, XML), style sheet files (CSS and its variants, such as SCSS), and other files (JSON, YML, shell scripts). We propose to use the Common Variability Language as a composition-based approach and to integrate annotations to manage the fine-grained variability of a Software Product Line for web applications. In this paper, we (i) show that existing composition-based and annotation-based approaches, including some well-known combinations, are not appropriate for modeling and implementing the variability of web applications; and (ii) present a combined approach that effectively integrates annotations into a composition-based approach for web applications. We implement our approach and show its applicability with an industrial real-world system.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
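
    To give an intuition of the annotation-based side of such an approach, the hedged sketch below resolves fine-grained variability across multilanguage web artifacts by keeping or dropping comment-annotated blocks according to a feature configuration. The VP_BEGIN/VP_END markers and the configuration format are invented for illustration and are not the paper's notation.

```python
# Hedged sketch: annotation-based resolution of fine-grained variability in
# multilanguage web artifacts. The VP_BEGIN/VP_END markers and the feature
# set are illustrative assumptions, not the paper's notation.
import re

# Comment syntax per file type; the annotation itself stays language-agnostic.
COMMENT_STYLES = {
    ".java": ("//", ""),
    ".js":   ("//", ""),
    ".css":  ("/*", "*/"),
    ".html": ("<!--", "-->"),
}

def resolve(text, ext, selected_features):
    """Drop annotated blocks whose feature is not selected."""
    open_c, close_c = COMMENT_STYLES[ext]
    begin = re.compile(re.escape(open_c) + r"\s*VP_BEGIN\s+(\w+)\s*" +
                       re.escape(close_c))
    end = re.compile(re.escape(open_c) + r"\s*VP_END\s*" + re.escape(close_c))
    out, keep = [], [True]
    for line in text.splitlines():
        m = begin.search(line)
        if m:                      # entering a variation point (may nest)
            keep.append(keep[-1] and m.group(1) in selected_features)
        elif end.search(line):     # leaving the innermost variation point
            keep.pop()
        elif keep[-1]:
            out.append(line)
    return "\n".join(out)

html = ("<ul>\n<!-- VP_BEGIN premium -->\n<li>Premium dashboard</li>\n"
        "<!-- VP_END -->\n<li>Home</li>\n</ul>")
print(resolve(html, ".html", {"basic"}))  # the premium block is dropped
```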

    Combining Multiple Granularity Variability in a Software Product Line Approach for Web Engineering

    Context: Web engineering involves managing a high diversity of artifacts implemented in different languages and at different levels of granularity. Technology companies usually implement the variable artifacts of Software Product Lines (SPLs) using annotations, and are reluctant to adopt hybrid, often complex, approaches that combine composition and annotations despite their benefits. Objective: This paper proposes a combined approach to support fine-grained and coarse-grained variability for web artifacts. The proposal allows web developers to keep using annotations to handle fine-grained variability for those artifacts whose variability is very difficult to implement with a composition-based approach, while obtaining the advantages of the composition-based approach for the coarse-grained variable artifacts. Methods: A combined approach based on feature modeling that integrates annotations into a generic composition-based approach. We propose the definition of compositional and annotative variation points with custom-defined semantics, which are resolved by a scaffolding-based derivation engine. The approach is evaluated on a real-world web-based SPL by applying a set of variability metrics, as well as by discussing its quality criteria in comparison with annotative, compositional, and existing combined approaches. Results: Our approach effectively handles both fine-grained and coarse-grained variability. The mapping between the feature model and the web artifacts promotes the traceability of the features and the uniformity of the variation points regardless of the granularity of the web artifacts. Conclusions: Using well-known SPL techniques from an architectural point of view, such as feature modeling, can improve the design and maintenance of variable web artifacts without the need to introduce complex approaches for implementing the underlying variability.

    The work of the authors from the Universidad de Málaga is supported by the projects Magic P12-TIC1814 (post-doctoral research grant), MEDEA RTI2018-099213-B-I00 (co-financed by FEDER funds), Rhea P18-FR-1081 (MCI/AEI/FEDER, UE), LEIA UMA18-FEDERIA-157, TASOVA MCIU-AEI TIN2017-90644-REDT, and the European Union's H2020 research and innovation program under grant agreement DAEMON 101017109. The work of the authors from the Universidade da Coruña has been funded by MCIN/AEI/10.13039/501100011033, NextGenerationEU/PRTR, FLATCITY-POC: PDC2021-121239-C31; MCIN/AEI/10.13039/501100011033 EXTRACompact: PID2020-114635RB-I00; GAIN/Xunta de Galicia/ERDF CEDCOVID: COV20/00604; Xunta de Galicia/FEDER-UE GRC: ED431C 2021/53; MICIU/FEDER-UE BIZDEVOPSGLOBAL: RTI-2018-098309-B-C32; MCIN/AEI/10.13039/501100011033 MAGIST: PID2019-105221RB-C41.
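
    A minimal sketch of how the two levels could interact in one derivation step follows: coarse-grained variability is resolved by composing whole artifacts mapped to features, and fine-grained variability by resolving annotations inside the composed files. The directory layout, the feature-to-unit mapping, and the resolve_annotations callback are assumptions for illustration, not the paper's scaffolding-based engine.

```python
# Hedged sketch of a two-level derivation step: coarse-grained variability is
# resolved by composing whole artifacts mapped to features, fine-grained
# variability by resolving annotations inside the composed (text) artifacts.
import shutil
from pathlib import Path

# Illustrative mapping from features to composition units (files or folders).
COMPOSITION_UNITS = {
    "payments": ["src/payments", "templates/checkout.html"],
    "search":   ["src/search"],
}

def derive(template_root: Path, product_root: Path,
           selected: set, resolve_annotations) -> None:
    # 1) Coarse-grained: copy only the units of the selected features.
    for feature in selected:
        for unit in COMPOSITION_UNITS.get(feature, []):
            src, dst = template_root / unit, product_root / unit
            dst.parent.mkdir(parents=True, exist_ok=True)
            if src.is_dir():
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                shutil.copy(src, dst)
    # 2) Fine-grained: rewrite annotated spans inside the copied artifacts
    #    (all derived artifacts are assumed to be text files here).
    for path in product_root.rglob("*"):
        if path.is_file():
            path.write_text(
                resolve_annotations(path.read_text(), path.suffix, selected))
```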

    Towards Erlang-based ABS Microservices Framework for Software Product Line Development

    Many widely used software systems can be categorised as large or very large decentralised control systems with varied requirements and continually changing elements. Managing such systems requires controlling their variability. Software Product Line Engineering (SPLE) is one approach that can manage this variability by developing sets of products; however, development with SPLE needs supporting tools. One language that supports the SPLE process is the Abstract Behavioral Specification (ABS) language, and several research efforts have used ABS to create frameworks that support the process. ABS Microservices is one such effort: a web framework that uses ABS to generate Java-based applications. The research interest in web applications is driven by the fact that they are among the software types most widely used by organisations and serve as the primary support of their business, and microservices are highly interoperable, enabling researchers to integrate technologies from other research. However, the ABS Microservices framework needs renewal. More variants of SPLE-enabled frameworks in more programming languages are needed, since each programming language has its strengths and weaknesses, and the deprecation of the Java backend of ABS opens a new exploration of web frameworks that use other ABS backend languages. We present an ABS microservices web framework based on Erlang/OTP. We choose Erlang because it promises more efficient resource usage and because the Erlang backend is one of the most feature-complete ABS backends. This research aims to create an entry point for ABS Microservices to support more languages. We show that the Erlang variant of ABS Microservices uses fewer resources than the Java variant, which promises more options for developing product lines using ABS Microservices.
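
    The resource-usage claim could be checked with a harness along the following lines, which launches each service variant and samples its resident memory. This is a hedged sketch, not the paper's benchmark setup: the launch scripts are placeholders, and psutil is a third-party dependency (pip install psutil).

```python
# Hedged sketch of a resource-usage comparison between service variants.
import subprocess
import time

import psutil  # third-party; provides cross-platform process metrics

VARIANTS = {
    "erlang": ["./start_erlang_service.sh"],  # hypothetical launch scripts
    "java":   ["./start_java_service.sh"],
}

def mean_rss(cmd, warmup_s=5, samples=10, interval_s=1.0):
    """Start a service and return its average resident memory in bytes."""
    proc = subprocess.Popen(cmd)
    try:
        time.sleep(warmup_s)                  # let the service settle
        ps = psutil.Process(proc.pid)
        readings = []
        for _ in range(samples):
            readings.append(ps.memory_info().rss)
            time.sleep(interval_s)
        return sum(readings) / len(readings)
    finally:
        proc.terminate()

if __name__ == "__main__":
    for name, cmd in VARIANTS.items():
        print(f"{name}: {mean_rss(cmd) / 2**20:.1f} MiB average RSS")
```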

    Software product line engineering: a practical experience

    The lack of mature tool support is one of the main reasons why industry is reluctant to adopt Software Product Line (SPL) approaches. A number of systematic literature reviews identify the main characteristics offered by existing tools and the SPL phases in which they can be applied. However, these reviews do not really help to understand whether those tools offer what is actually needed to apply SPLs to complex projects, since they are mainly based on information extracted from tool documentation or published papers. In this paper, we follow a different approach: we first identify those characteristics that are currently essential for the development of an SPL, and then analyze whether the tools support those characteristics. We focus on tools that satisfy certain selection criteria (e.g., they can be downloaded and are ready to be used). The paper presents the state of practice regarding the availability and usability of existing SPL tools, and defines different roadmaps that allow a complete SPL process to be carried out with the existing tool support.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Magic P12-TIC1814, HADAS TIN2015-64841-R (co-financed by FEDER funds), MEDEA RTI2018-099213-B-I00 (co-financed by FEDER funds), TASOVA MCIU-AEI TIN2017-90644-REDT.
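
    The kind of criteria-driven shortlisting the study applies can be pictured as below: tools are retained only if they can be downloaded and run, and a roadmap then chains shortlisted tools until every SPL phase is covered. The tool records and phase names are invented examples, not the survey's data.

```python
# Illustrative sketch of criteria-driven tool shortlisting and phase coverage.
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    downloadable: bool
    runs_out_of_the_box: bool
    phases: set  # e.g. {"modelling", "configuration", "derivation"}

def selectable(tool):
    """Selection criteria: the tool can be downloaded and is ready to use."""
    return tool.downloadable and tool.runs_out_of_the_box

catalog = [
    Tool("ToolA", True, True, {"modelling", "configuration"}),
    Tool("ToolB", True, False, {"derivation"}),   # filtered out: cannot run
    Tool("ToolC", True, True, {"derivation"}),
]
shortlist = [t for t in catalog if selectable(t)]

# A "roadmap" chains shortlisted tools so every SPL phase is covered.
covered = set().union(*(t.phases for t in shortlist))
print([t.name for t in shortlist], "cover", sorted(covered))
```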

    Advances in Automatic Keyphrase Extraction

    The main purpose of this thesis is to analyze and propose new improvements in the field of Automatic Keyphrase Extraction (AKE), i.e., the field of automatically detecting the key concepts in a document. We discuss, in particular, supervised machine learning algorithms for keyphrase extraction, first identifying their shortcomings and then proposing new techniques which exploit contextual information to overcome them. Keyphrase extraction requires that the key concepts, or keyphrases, appear verbatim in the body of the document. We identify the fact that current algorithms do not use contextual information when detecting keyphrases as one of the main shortcomings of supervised keyphrase extraction. Instead, statistical and positional cues, like the frequency of the candidate keyphrase or its first appearance in the document, are mainly used to determine whether a phrase appearing in a document is a keyphrase. For this reason, we show that a supervised keyphrase extraction algorithm, by using only statistical and positional features, is actually able to extract good keyphrases from documents written in languages that it has never seen. The algorithm is trained on a common dataset for the English language and a purpose-collected dataset for the Arabic language, and evaluated on the Italian, Romanian, and Portuguese languages as well. This result is then used as a starting point to develop new algorithms that use contextual information to increase the performance of automatic keyphrase extraction. The first algorithm that we present uses new linguistic features based on anaphora resolution, a field of natural language processing that exploits the relations between elements of the discourse, e.g., pronouns. We evaluate several supervised AKE pipelines based on these features on the well-known SEMEVAL 2010 dataset, and we show that performance increases when we add such features to a model that employs statistical and positional knowledge only. Finally, we investigate the possibilities offered by the field of Deep Learning, proposing six different deep neural networks that perform automatic keyphrase extraction. These networks are based on bidirectional long short-term memory networks, on convolutional neural networks, or on a combination of both, together with a neural language model which creates a vector representation of each word of the document. These networks are able to learn new features using the whole document when extracting keyphrases, and they have the advantage of not needing a corpus, once trained, to extract keyphrases from new documents. We show that with deep learning based architectures we are able to outperform several other keyphrase extraction algorithms from the literature, both supervised and unsupervised, and that the best performance is obtained when we build an additional neural representation of the input document and append it to the neural language model. Both the anaphora-based and the deep learning based approaches show that using contextual information improves the performance of supervised algorithms for automatic keyphrase extraction.
    In fact, in the methods presented in this thesis, the algorithms which obtain the best performance are the ones receiving the most contextual information, both about the relations of the potential keyphrase with other parts of the document, as in the anaphora-based approach, and in the shape of a neural representation of the input document, as in the deep learning approach. In contrast, the approach of using statistical and positional knowledge only allows the building of language-agnostic keyphrase extraction algorithms, at the cost of decreased precision and recall.
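
    The statistical and positional cues discussed above are simple to compute; a naive sketch follows, with deliberately simplistic tokenisation and candidate generation. A supervised classifier would then be trained on such feature vectors to label candidates as keyphrases or not.

```python
# Naive sketch of the language-agnostic cues contrasted with contextual ones:
# candidate frequency and relative first-occurrence position.
import re
from collections import Counter

def statistical_positional_features(document, max_ngram=3):
    tokens = re.findall(r"\w+", document.lower())
    tf, first = Counter(), {}
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            tf[phrase] += 1                                    # frequency cue
            first.setdefault(phrase, i / max(len(tokens), 1))  # positional cue
    return {p: (tf[p], first[p]) for p in tf}

doc = "Keyphrase extraction finds key concepts. Keyphrase extraction is hard."
print(statistical_positional_features(doc)["keyphrase extraction"])
# -> (2, 0.0): appears twice, first at the very start of the document
```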

    A Citizen Science Approach for Analyzing Social Media With Crowdsourcing

    Social media have the potential to provide timely information about emergency situations and sudden events. However, finding relevant information among the millions of posts added every day can be difficult, and in current approaches developing an automatic data analysis project requires time and technical skills. This work presents a new approach for the analysis of social media posts, based on configurable automatic classification combined with Citizen Science methodologies. The process is facilitated by a set of flexible, automatic, and open-source data processing tools called the Citizen Science Solution Kit. The kit provides a comprehensive set of tools that can be used and personalized in different situations, particularly during natural emergencies, starting from the images and text contained in the posts. The tools can be employed by citizen scientists for filtering, classifying, and geolocating content with a human-in-the-loop approach that supports the data analyst, including feedback and suggestions on how to configure the automated tools, and techniques to gather input from citizens. Using a flooding scenario as a guiding example, this paper illustrates the structure and functioning of the different tools proposed to support citizen scientists in their projects, together with a methodological approach to their use. The process is then validated by discussing three case studies based on the Albania earthquake of 2019, the COVID-19 pandemic, and the Thailand floods of 2021. The results suggest that a flexible approach to tool composition and configuration can support the timely setup of an analysis project by citizen scientists, especially in the case of emergencies in unexpected locations.
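
    Schematically, the filter/classify/geolocate flow with a human-in-the-loop fallback could look like the following skeleton. The keyword filter, the classifier, the confidence threshold, and the volunteer queue are all placeholder assumptions rather than the kit's actual components.

```python
# Skeleton of a filter -> classify -> geolocate flow with a
# human-in-the-loop fallback for low-confidence automatic labels.
def analyse(posts, classifier, ask_volunteer, threshold=0.8):
    results = []
    for post in posts:
        if "flood" not in post["text"].lower():   # cheap keyword pre-filter
            continue
        label, confidence = classifier(post)      # automatic classification
        if confidence < threshold:
            label = ask_volunteer(post)           # crowdsourced decision
        results.append({"id": post["id"],
                        "label": label,
                        "location": post.get("geo")})  # geolocation if any
    return results

# Example wiring with stub components:
posts = [{"id": 1, "text": "Flood on Main St", "geo": (13.7, 100.5)}]
print(analyse(posts,
              classifier=lambda p: ("flooding", 0.6),
              ask_volunteer=lambda p: "flooding (confirmed)"))
```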

    A Generic method for assembling software product line components

    Software product lines (SPLs) facilitate the industrialization of software development. The main goal is to create a set of reusable software components for the rapid production of a family of software systems. Many authors have proposed different approaches to implement and assemble the reusable components of an SPL. However, the construction and assembly of these components remain a complex and time-consuming process. This thesis analyzes the advantages and disadvantages of the current approaches to implementing and assembling the reusable components of an SPL. Building on this analysis, and with the goal of developing a generic method (one that can be applied to software components developed in different languages), we develop Fragment-Oriented Programming (FragOP), a framework to design, implement, and reuse SPL domain components. FragOP is based on: (i) domain components, (ii) domain files, (iii) fragmentation points, (iv) fragments, (v) customization points, and (vi) customization files. FragOP was implemented in an open-source tool called VariaMos, and we carried out three evaluations: (i) we created a clothing-store SPL, derived five different products, and discussed the results; (ii) we compared FragOP with other approaches; and (iii) we designed and executed a usability test of VariaMos's support for the FragOP approach. The results show preliminary evidence that the use of FragOP reduces manual intervention when assembling SPL domain components and that it can be used as a generic method for assembling assets and SPL components developed in different software languages.
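
    To give an intuition of fragmentation points, the hedged sketch below splices feature-specific fragments into a domain file at named markers. The marker syntax is invented for illustration; FragOP's concrete notation and tooling live in VariaMos.

```python
# Hedged sketch of fragment assembly in the spirit of FragOP: a domain file
# declares named fragmentation points, and derivation splices the fragments
# of the selected features into them. The marker syntax is invented here.
MARKER = "@fragmentation-point:"

def assemble(domain_file, fragments):
    out = []
    for line in domain_file.splitlines():
        if MARKER in line:
            point = line.split(MARKER, 1)[1].strip()
            out.extend(fragments.get(point, []))  # splice selected fragments
        else:
            out.append(line)
    return "\n".join(out)

domain = "class Cart {\n  // @fragmentation-point: payment-methods\n}"
print(assemble(domain, {"payment-methods": ["  void payByCard() { }"]}))
```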

    Enriching information extraction pipelines in clinical decision support systems

    Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións. 5032V01

    Multicentre health studies are important to increase the impact of medical research findings due to the number of subjects that they are able to engage. To simplify the execution of these studies, the data-sharing process should be effortless, for instance, through the use of interoperable databases. However, achieving this interoperability is still an ongoing research topic, namely due to data governance and privacy issues. In the first stage of this work, we propose several methodologies to optimise the harmonisation pipelines of health databases. This work was focused on harmonising heterogeneous data sources into a standard data schema, namely the OMOP CDM, which has been developed and promoted by the OHDSI community. We validated our proposal using data sets of Alzheimer's disease patients from distinct institutions. In the following stage, aiming to enrich the information stored in OMOP CDM databases, we investigated solutions to extract clinical concepts from unstructured narratives, using information retrieval and natural language processing techniques. The validation was performed through datasets provided in scientific challenges, namely in the National NLP Clinical Challenges (n2c2). In the final stage, we aimed to simplify the protocol execution of multicentre studies by proposing novel solutions for profiling, publishing, and facilitating the discovery of databases. Some of the developed solutions are currently being used in three European projects aiming to create federated networks of health databases across Europe.
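
    A minimal sketch of the first-stage harmonisation step described above: mapping a site-specific patient record onto OMOP CDM person and condition_occurrence structures. The source field names and the one-entry concept lookup are invented; real pipelines rely on the full OHDSI standardised vocabularies.

```python
# Hedged sketch of source-to-OMOP-CDM harmonisation for one patient record.
LOCAL_TO_OMOP_CONCEPT = {"alzheimer": 378419}  # illustrative concept_id

def to_cdm(source: dict) -> dict:
    return {
        "person": {
            "person_id": source["patient_id"],
            "year_of_birth": source["birth_year"],
            # 8532 = female, 8507 = male in the OMOP standard vocabulary
            "gender_concept_id": 8532 if source["sex"] == "F" else 8507,
        },
        "condition_occurrence": [
            {
                "person_id": source["patient_id"],
                "condition_concept_id": LOCAL_TO_OMOP_CONCEPT[dx.lower()],
                "condition_start_date": onset,
            }
            for dx, onset in source["diagnoses"]
        ],
    }

record = {"patient_id": 1, "birth_year": 1948, "sex": "F",
          "diagnoses": [("Alzheimer", "2015-03-02")]}
print(to_cdm(record))
```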

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can expect that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract such knowledge from the data. Even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to under-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research directions.