60 research outputs found

    Icing: Large-scale inference of immunoglobulin clonotypes

    Get PDF
    Immunoglobulin (IG) clonotype identification is a fundamental open question in modern immunology. An accurate description of the IG repertoire is crucial to understand the variety within the immune system of an individual, potentially shedding light on the pathogenetic process. Intrinsic IG heterogeneity makes clonotype inference an extremely challenging task, both from a computational and a biological point of view. Here we present icing, a framework that allows to reconstruct clonal families also in case of highly mutated sequences. icing has a modular structure, and it is designed to be used with large next generation sequencing (NGS) datasets, a technology which allows the characterisation of large-scale IG repertoires. We extensively validated the framework with clustering performance metrics on the results in a simulated case. icing is implemented in Python, and it is publicly available under FreeBSD licence at https://github.com/slipguru/icing

    Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement

    Get PDF
    BACKGROUND: Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities and methods allowing the comparison of the performance of two data sources or the determination of how well a given classification agrees with another are frequently needed, especially in the absence of a universally accepted "gold standard" classification. RESULTS: Here, we describe a novel measure – the Ranked Adjusted Rand (RAR) index. RAR differs from existing methods by evaluating the extent of agreement between any two groupings, taking into account the intercluster distances. This characteristic is relevant to evaluate cases of pairs of entities grouped in the same cluster by one method and separated by another. The latter method may assign them to close neighbour clusters or, on the contrary, to clusters that are far apart from each other. RAR is applicable even when intercluster distance information is absent for both or one of the groupings. In the first case, RAR is equal to its predecessor, Adjusted Rand (HA) index. Artificially designed clusterings were used to demonstrate situations in which only RAR was able to detect differences in the grouping patterns. A study with larger simulated clusterings ensured that in realistic conditions, RAR is effectively integrating distance and partition information. The new method was applied to biological examples to compare 1) two microbial typing methods, 2) two gene regulatory network distances and 3) microarray gene expression data with pathway information. In the first application, one of the methods does not provide intercluster distances while the other originated a hierarchical clustering. RAR proved to be more sensitive than HA in the choice of a threshold for defining clusters in the hierarchical method that maximizes agreement between the results of both methods. CONCLUSION: RAR has its major advantage in combining cluster distance and partition information, while the previously available methods used only the latter. RAR should be used in the research problems were HA was previously used, because in the absence of inter cluster distance effects it is an equally effective measure, and in the presence of distance effects it is a more complete one

    Evaluation of Jackknife and Bootstrap for Defining Confidence Intervals for Pairwise Agreement Measures

    Get PDF
    Several research fields frequently deal with the analysis of diverse classification results of the same entities. This should imply an objective detection of overlaps and divergences between the formed clusters. The congruence between classifications can be quantified by clustering agreement measures, including pairwise agreement measures. Several measures have been proposed and the importance of obtaining confidence intervals for the point estimate in the comparison of these measures has been highlighted. A broad range of methods can be used for the estimation of confidence intervals. However, evidence is lacking about what are the appropriate methods for the calculation of confidence intervals for most clustering agreement measures. Here we evaluate the resampling techniques of bootstrap and jackknife for the calculation of the confidence intervals for clustering agreement measures. Contrary to what has been shown for some statistics, simulations showed that the jackknife performs better than the bootstrap at accurately estimating confidence intervals for pairwise agreement measures, especially when the agreement between partitions is low. The coverage of the jackknife confidence interval is robust to changes in cluster number and cluster size distribution

    Lazy Lasso for local regression

    Get PDF
    Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds lasso in an iterative procedure that alternatively computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models for several kinds of scenario

    How to find simple and accurate rules for viral protease cleavage specificities

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.</p> <p>Results</p> <p>A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods.</p> <p>Conclusion</p> <p>A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.</p

    An Overall Index for Comparing Hierarchical Clusterings

    No full text
    In this paper we suggest a new index for measuring thedistance between two hierarchical clusterings. This index can bedecomposed into the contributions pertaining to each stage of thehierarchies. We show the relations of such components with thecurrently used criteria for comparing two partitions. We obtain asimilarity index as the complement to one of the suggesteddistance and we propose its adjustment for agreement due tochance. We consider the extension of the proposed distance andsimilarity measures to more than two dendrograms and their use forthe consensus of classification and variable selection in clusteranalysis

    Clustering Methods

    No full text

    Sjukvårdspersonalens upplevelse i mötet med patienter med fetma

    No full text
    Bakgrund: Då fetma ökar i världen behöver sjukvårdspersonal vara förberedd för att möta dessa patienter ute i vården. I ett möte sker en kontinuerlig kommunikation, vilket i sin tur har en stor betydelse på relationer, då det är en ömsesidig process. Det krävs vidare forskning för att utvärdera hur patientens vikt påverkar mötet med sjukvårdspersonal. Det var därför av intresse att forska vidare inom detta område. Syfte: Att beskriva sjukvårdspersonals upplevelser i mötet med patienter med fetma Metod: Metoden var en litteraturstudie med kvalitativ metod baserad på sju vetenskapliga artiklar. Analysen genomfördes utifrån Graneheim och Lundmans tolkning av kvalitativ innehållsanalys. Resultat: Efter analysprocessen framkom tre kategorier som beskrev studiens resultat. Att känna medkänsla. Att känna frustration. Att känna oro. Slutsats: Sjukvårdspersonals upplevelse i mötet med patienter med fetma bygger till stor del på förutfattade meningar. Då fetma alltmer blir ett växande problem i världen kan detta leda till att sjukvårdspersonal i sitt arbete kommer att möta dessa patienter i större utsträckning. Det behövs därför vidare forskning för att sjukvårdspersonal ska ha en trygg grund att stå på inför mötet med patienter med fetma. Endast då kan fetma bekämpas
    corecore