83 research outputs found

    A Taxonomy of Information Retrieval Models and Tools

    Information retrieval is attracting significant attention due to the exponential growth of the amount of information available in digital format. The proliferation of information retrieval objects, including algorithms, methods, technologies, and tools, makes it difficult to assess their capabilities and features and to understand the relationships that exist among them. In addition, the terminology is often confusing and misleading, as different terms are used to denote the same, or similar, tasks. This paper proposes a taxonomy of information retrieval models and tools and provides precise definitions for the key terms. The taxonomy superimposes two views: a vertical taxonomy, which classifies IR models with respect to a set of basic features, and a horizontal taxonomy, which classifies IR systems and services with respect to the tasks they support. The aim is to provide a framework for classifying existing information retrieval models and tools and a solid reference point for assessing future developments in the field.

    How Clones are Maintained: An Empirical Study

    Despite the conventional wisdom concerning the risks of source code cloning as a software development strategy, several studies in the literature indicate that these risks do not always materialize. In most cases clones are properly maintained and, when this does not happen, it is because the cloned code evolves independently. Building on previous work, this paper combines clone detection and co-change analysis to investigate how clones are maintained when an evolution activity or a bug fix impacts a source code fragment belonging to a clone class. The two case studies reported confirm that, whether for bug fixing or for evolution purposes, most of the cloned code is consistently maintained during the same co-change or during temporally close co-changes.
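    The distinction drawn in the abstract above, between clones changed in the same co-change, in temporally close co-changes, or independently, can be illustrated with a minimal sketch. This is not the authors' method: the function name, the data layout (commits as timestamp and changed-fragment pairs) and the 7-day window used for "temporally close" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def classify_clone_change(commits, clone_class, window=timedelta(days=7)):
    """Classify how a clone class was maintained (illustrative only).

    commits: list of (timestamp, set_of_changed_fragment_ids) tuples.
    clone_class: set of fragment ids that are clones of each other.
    window: assumed meaning of "temporally close" (hypothetical value).
    """
    # Keep only commits touching the class, restricted to its fragments.
    touching = sorted(
        ((ts, changed & clone_class) for ts, changed in commits if changed & clone_class),
        key=lambda item: item[0],
    )
    if not touching:
        return "unchanged"
    # All fragments of the class changed together in a single commit.
    if any(changed == clone_class for _, changed in touching):
        return "same co-change"
    # Otherwise, check whether commits within `window` of some starting
    # commit jointly cover every fragment of the class.
    for i, (start, _) in enumerate(touching):
        covered = set()
        for ts, changed in touching[i:]:
            if ts - start > window:
                break
            covered |= changed
        if covered == clone_class:
            return "close co-changes"
    return "independent"
```

    For example, a class {"a", "b"} whose fragments are changed two days apart would be reported as "close co-changes" under the assumed window.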

    MOViDA: multiomics visible drug activity prediction with a biologically informed neural network model

    Motivation: The process of drug development is inherently complex, marked by extended intervals from the inception of a pharmaceutical agent to its eventual launch in the market. Additionally, each phase in this process is associated with a significant failure rate, amplifying the inherent challenges of this task. Computational virtual screening powered by machine learning algorithms has emerged as a promising approach for predicting therapeutic efficacy. However, the complex relationships between the features learned by these algorithms can be challenging to decipher. Results: We have engineered an artificial neural network model designed specifically for predicting drug sensitivity. This model utilizes a biologically informed visible neural network, thereby enhancing its interpretability. The trained model allows for an in-depth exploration of the biological pathways integral to prediction and the chemical attributes of drugs that impact sensitivity. Our model harnesses multiomics data derived from different tumor tissue sources, as well as molecular descriptors that encapsulate the properties of drugs. We extended the model to predict drug synergy, resulting in favorable outcomes while retaining interpretability. Given the imbalanced nature of publicly available drug screening datasets, our model demonstrated superior performance to state-of-the-art visible machine learning algorithms. Availability and implementation: MOViDA is implemented in Python using the PyTorch library and is freely available for download at https://github.com/Luigi-Ferraro/MOViDA. Training data, RIS score and drug features are archived on Zenodo: https://doi.org/10.5281/zenodo.8180380

    Tracking Your Changes: A Language-Independent Approach

    Detection of statistically significant network changes in complex biological networks

    Table S1. Description of data: GHD and MRA results for all 457 considered transcription factors on the TCGA and Rembrandt datasets. (XLSX 62.7 kb)

    Enhancing Online Discussion Forums with Topic-Driven Content Search and Assisted Posting

    Online forums are today among the most popular and richest repositories of user-generated information on the Internet. Searching for information of interest in an online forum can be substantially improved by a proper organization of the forum content. With this aim, in this paper we propose an approach that enhances an existing forum by introducing a navigation structure that enables searching and navigating the forum content by topics of discussion. Topics and the hierarchical relations between them are semi-automatically extracted from the forum content by applying Information Retrieval techniques, specifically Topic Models and Formal Concept Analysis. Then, forum posts and discussion threads are associated with discussion topics on the basis of a similarity score. Moreover, to support automatic moderation in websites that host several forums, we propose a strategy to assist a user writing a new post in choosing the most appropriate forum in which to place it. An implementation of the topic-driven content search and navigation and of the assisted posting approaches for the Moodle learning management system is also presented in the paper, opening the way to applying these approaches in real distance learning contexts. Finally, we also report on two case studies that we conducted to validate the two approaches and evaluate their benefits. Laboratorio de Investigación y Formación en Informática Avanzad
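    The abstract above says that posts are associated with topics on the basis of a similarity score, without naming the measure. A minimal sketch, assuming cosine similarity between a post's bag-of-words vector and each topic's term weights, could look as follows. The function names, the threshold and the example topics are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_topic(post_text, topics, threshold=0.1):
    """Return the best-matching topic name, or None if no topic
    reaches the (assumed) similarity threshold."""
    if not topics:
        return None
    # Naive whitespace tokenization; a real system would also
    # remove stop words and apply stemming.
    post_vector = Counter(post_text.lower().split())
    scores = {
        name: cosine_similarity(post_vector, terms)
        for name, terms in topics.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical topics with made-up term weights.
topics = {
    "exams": {"exam": 0.6, "grade": 0.3, "date": 0.1},
    "software": {"install": 0.5, "python": 0.4, "error": 0.1},
}
```

    Here `assign_topic("python install fails with error", topics)` selects the "software" topic, while a post sharing no terms with any topic is left unassigned.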

    A negative selection heuristic to predict new transcriptional targets

    Background: Supervised machine learning approaches have recently been adopted to infer transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements over state-of-the-art reverse gene regulatory network methods. Unlike traditional unsupervised techniques, a supervised classifier learns, from known examples, a function that can recognize new relationships in new data. In the context of gene regulatory inference, a supervised classifier is forced to learn from positive and unlabeled examples, because negative examples are unavailable or hard to collect. This condition can limit the performance of the classifier, especially when the number of training examples is low. Results: In this paper we improve the supervised identification of transcriptional targets by selecting reliable negative examples from the unlabeled set. We introduce a heuristic based on the known topology of transcriptional networks that in effect restores the conventional positive/negative training condition and shows a significant improvement in classification performance. We empirically evaluate the proposed heuristic on experimental datasets of Escherichia coli and show an example application in the prediction of BCL6 direct core targets in normal germinal center human B cells, obtaining a precision of 60%. Conclusions: The availability of only positive examples in learning transcriptional relationships negatively affects the performance of supervised classifiers. We show that the selection of reliable negative examples, a practice adopted in text mining approaches, improves the performance of such classifiers, opening new perspectives in the identification of new transcriptional targets.

    Deep learning predicts short non-coding RNA functions from only raw sequence data.

    Small non-coding RNAs (ncRNAs) are short non-coding sequences involved in gene regulation in many biological processes and diseases. The incomplete understanding of their biological functionality, especially in a genome-wide scenario, calls for new computational approaches to annotate their roles. It is widely held that secondary structure is a key determinant of RNA function, and machine learning based approaches have been shown to predict RNA function successfully from secondary structure information. Here we show that RNA function can be predicted with good accuracy from a lightweight representation of sequence information, without the need to compute secondary structure features, which is computationally expensive. This finding appears to go against the dogma of secondary structure being a key determinant of function in RNA. Compared to recent secondary structure based methods, the proposed solution is more robust to sequence boundary noise and drastically reduces the computational cost, allowing for large data volume annotations. Scripts and datasets to reproduce the experiments in this study are available at: https://github.com/bioinformatics-sannio/ncrna-deep