2,174 research outputs found
A specification and discovery environment for the reuse of software components in distributed software development
Our work aims to develop an effective solution for the discovery and reuse of software components in existing and commonly used development environments. We propose an ontology for describing and discovering atomic software components. The description covers both functional and non-functional properties, the latter expressed as QoS parameters. Our search process is based on a function that computes the semantic distance between a component's interface signature and the signature of a given query, thus achieving an appropriate comparison. We also use the notion of "subsumption" to compare the inputs/outputs of the query with those of the components.
After selecting the appropriate components, the non-functional properties are used to refine the search result. We propose an approach for discovering composite components when no atomic component is found, based on the shared ontology. To integrate the resulting component into the project under development, we developed an integration ontology and two services, "input/output convertor" and "output Matching".
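The distance-based ranking described above can be sketched as follows. This is a minimal illustration assuming a simple token-overlap (Jaccard) distance over signature tokens; it is not the paper's actual semantic-distance function, and all names and signatures are invented for the example.

```python
import re

def tokenize(signature: str) -> set:
    """Split a signature like 'resize(image, width) -> image' into word tokens."""
    return set(re.findall(r"[A-Za-z]+", signature.lower()))

def semantic_distance(query_sig: str, comp_sig: str) -> float:
    """Jaccard distance between token sets: 0.0 = identical, 1.0 = disjoint."""
    q, c = tokenize(query_sig), tokenize(comp_sig)
    if not q and not c:
        return 0.0
    return 1.0 - len(q & c) / len(q | c)

def rank_components(query_sig, components):
    """Return candidate component signatures sorted by ascending distance."""
    return sorted(components, key=lambda sig: semantic_distance(query_sig, sig))

# Hypothetical query and candidate components.
ranked = rank_components(
    "resizeImage(image, width, height) -> image",
    ["scaleImage(image, factor) -> image",
     "sendEmail(address, body) -> status"],
)
```

In the paper's approach the comparison is semantic (via the ontology and subsumption between input/output concepts) rather than purely lexical, but the overall shape, a distance function driving a ranked selection, is the same.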
Search based software engineering: Trends, techniques and applications
© ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below.
In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semi-automated solutions in situations typified by large, complex problem spaces with multiple competing and conflicting objectives.
This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied, and highlights gaps in the literature and avenues for further research.
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Patching is a common activity in software development. It is generally
performed on a source code base to address bugs or add new functionalities. In
this context, given the recurrence of bugs across projects, the associated
similar patches can be leveraged to extract generic fix actions. While the
literature includes various approaches leveraging similarity among patches to
guide program repair, these approaches often do not yield fix patterns that are
tractable and reusable as actionable input to APR systems. In this paper, we
propose a systematic and automated approach to mining relevant and actionable
fix patterns based on an iterative clustering strategy applied to atomic
changes within patches. The goal of FixMiner is thus to infer separate and
reusable fix patterns that can be leveraged in other patch generation systems.
Our technique, FixMiner, leverages Rich Edit Script, a specialized tree
structure of the edit scripts that captures the AST-level context of the code
changes. FixMiner uses different tree representations of Rich Edit Scripts for
each round of clustering to identify similar changes. These are abstract syntax
trees, edit actions trees, and code context trees. We have evaluated FixMiner
on thousands of software patches collected from open source projects.
Preliminary results show that we are able to mine accurate patterns,
efficiently exploiting change information in Rich Edit Scripts. We further
integrated the mined patterns into an automated program repair prototype,
PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J
benchmark. Beyond this quantitative performance, we show that the mined fix
patterns are sufficiently relevant to produce patches with a high probability
of correctness: 81% of PARFixMiner's generated plausible patches are correct.
Comment: 31 pages, 11 figures
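The iterative clustering idea can be illustrated with a toy sketch: each round groups patches whose representation is identical at the current abstraction level, then refines the groups at the next level. The dictionary-based representations below are purely illustrative stand-ins for Rich Edit Script trees; they are not FixMiner's actual data structures.

```python
from collections import defaultdict

def cluster_by(patches, key):
    """Group patches whose representation under `key` is identical.
    Only groups with more than one member count as a recurring pattern."""
    groups = defaultdict(list)
    for p in patches:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]

# Invented patch descriptors standing in for AST / edit-action / context trees.
patches = [
    {"ast": "IfStmt", "action": "insert-null-check", "ctx": "MethodCall"},
    {"ast": "IfStmt", "action": "insert-null-check", "ctx": "FieldAccess"},
    {"ast": "Assign", "action": "update-operator", "ctx": "BinaryOp"},
]

# Round 1: cluster by AST shape; Round 2: refine each cluster by edit action.
round1 = cluster_by(patches, key=lambda p: p["ast"])
round2 = [sub for g in round1 for sub in cluster_by(g, key=lambda p: p["action"])]
```

A surviving group after the final round corresponds to a candidate fix pattern that recurs across patches and can be handed to a patch generation system.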
Query-Time Data Integration
Today, data is collected in ever increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources precludes up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established.
This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need.
To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state-of-the-art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we studied the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort.
Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
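The top-k augmentation idea (answers that are individually consistent, mutually diverse, and drawn from few sources) can be sketched as a greedy selection. The scoring rule and candidate data below are hypothetical and do not reflect the thesis's actual algorithm.

```python
def top_k_augment(candidates, k=3):
    """candidates: list of (value, sources) pairs.
    Greedy sketch: prefer answers backed by few sources, and skip any
    answer that shares a source with an already chosen one (diversity)."""
    chosen, used_sources = [], set()
    for value, sources in sorted(candidates, key=lambda c: len(c[1])):
        if len(chosen) == k:
            break
        if not set(sources) & used_sources:   # mutually diverse
            chosen.append((value, sources))
            used_sources |= set(sources)
    return chosen

# Invented candidate answers for a missing attribute (e.g. "population").
candidates = [
    ("8.3M", ["web-table-1"]),
    ("8.3M", ["web-table-1", "web-table-2"]),
    ("8.9M", ["open-data-7"]),
]
result = top_k_augment(candidates, k=2)
```

Each entry in the result is one alternative integration, so the user can pick the answer whose provenance best matches their information need.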
Development of a framework for the classification of antibiotics adjuvants
Master's dissertation in Bioinformatics
Throughout the last decades, bacteria have become increasingly resistant to available antibiotics, leading to a growing need for new antibiotics and new drug-development methodologies. In the last 40 years, there have been no records of the development of new antibiotics, which has begun to narrow the possible alternatives. Therefore, finding new antibiotics and bringing them to market is increasingly challenging. One approach is to find compounds that restore or leverage the activity of existing antibiotics against biofilm bacteria. As the information in this field is very limited and there is no database dedicated to this topic, machine learning models were used to predict the relevance of documents concerning adjuvants.
In this project, the BIOFILMad - Catalog of antimicrobial adjuvants to tackle biofilms
application was developed to help researchers save time in their daily research. This
application was constructed using Django and Django REST Framework for the backend
and React for the frontend.
As for the backend, a database needed to be constructed, since no existing database focuses entirely on this topic. To that end, a machine learning model was trained to help classify articles. Three different algorithms were used: Support-Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), each combined with a different number of features, namely 945 and 1890. When analyzing all metrics, model LR-1 performed best at classifying relevant documents, with an accuracy of 0.8461, a recall of 0.6170, an F1-score of 0.6904, and a precision of 0.7837. This model is the best at correctly identifying relevant documents, as shown by its higher recall score compared to the other models. With this model, our database was populated with relevant information.
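As a quick sanity check on the reported metrics: the F1-score is the harmonic mean of precision and recall, and the LR-1 numbers above are mutually consistent under that definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported LR-1 scores from the text.
precision, recall = 0.7837, 0.6170
f1 = f1_score(precision, recall)   # close to the reported 0.6904
```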
Our backend has a distinctive feature: an aggregation feature built with Named Entity Recognition (NER). Its goal is to identify specific entity types; in our case, it identifies CHEMICAL and DISEASE entities. Associations between these entities are then computed and delivered to the user, saving researchers time. For example, thanks to this aggregation feature, a researcher can see which compounds have already been tested against "pseudomonas aeruginosa".
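The aggregation idea can be sketched as follows, assuming the NER step has already produced (text, label) pairs per document. The entity lists below are invented examples, not data from the actual database, and the function name is illustrative.

```python
from collections import defaultdict
from itertools import product

def aggregate(entities_per_doc):
    """Map each DISEASE entity to the set of CHEMICAL entities that
    co-occur with it in at least one document."""
    assoc = defaultdict(set)
    for ents in entities_per_doc:
        chems = [text for text, label in ents if label == "CHEMICAL"]
        diseases = [text for text, label in ents if label == "DISEASE"]
        for chem, disease in product(chems, diseases):
            assoc[disease].add(chem)
    return assoc

# Hypothetical NER output for two abstracts.
docs = [
    [("tobramycin", "CHEMICAL"), ("pseudomonas aeruginosa", "DISEASE")],
    [("colistin", "CHEMICAL"), ("pseudomonas aeruginosa", "DISEASE")],
]
assoc = aggregate(docs)
```

Querying `assoc` for an entity then returns every compound it has been paired with across the corpus, which is exactly the lookup the aggregation feature offers.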
The frontend was implemented so that users can access this aggregation feature, see the articles present in the database, use the machine learning models to classify new documents, and insert them into the database if they are relevant.
Case-based reasoning: concepts, features and soft computing
Here we first describe the concepts, components and features of CBR. The feasibility and merits of using CBR for problem solving are then explained. This is followed by a description of the relevance of soft computing tools to CBR. In particular, some of the tasks in the four REs of the CBR cycle, namely Retrieve, Reuse, Revise and Retain, that are prospective candidates for soft computing applications are explained.
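The four REs can be sketched over a toy case base. The similarity measure and adaptation rule below are placeholders chosen for brevity; a real CBR system would use domain-specific versions of each step.

```python
# Toy case base: each case pairs a problem description with its solution.
case_base = [{"problem": 10, "solution": "A"},
             {"problem": 25, "solution": "B"}]

def retrieve(problem):
    """Retrieve the most similar past case (here: nearest problem value)."""
    return min(case_base, key=lambda c: abs(c["problem"] - problem))

def reuse(case, problem):
    """Reuse the retrieved solution for the new problem (trivial adaptation)."""
    return {"problem": problem, "solution": case["solution"]}

def revise(candidate, correct_solution=None):
    """Revise the proposed solution if external feedback says it was wrong."""
    if correct_solution is not None:
        candidate["solution"] = correct_solution
    return candidate

def retain(case):
    """Retain the confirmed case so future problems can reuse it."""
    case_base.append(case)

new_case = revise(reuse(retrieve(12), 12))
retain(new_case)
```

Soft computing tools slot naturally into these steps, for example fuzzy similarity measures in Retrieve or neural/genetic adaptation in Reuse and Revise, which is the connection the paper develops.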
- …