35 research outputs found

    A comparison of three different methods for classification of breast cancer data

    The classification of breast cancer patients is of great importance in cancer diagnosis. During the last few years, many algorithms have been proposed for this task. In this paper, we review different supervised machine learning techniques for classification of a novel dataset and perform a methodological comparison of these. We used the C4.5 tree classifier, a Multilayer Perceptron and a naïve Bayes classifier over a large set of tumour markers. We found good performance of the Multilayer Perceptron even when we reduced the number of features to be classified. We found naïve Bayes achieved a competitive performance even though the assumption of normality of the data is strongly violated

    Cancer profiles by affinity propagation

    The affinity propagation algorithm is applied to a problem of breast cancer subtyping using traditional biologic markers. The algorithm provides a procedure to determine the number of profiles to be considered. A well-known breast cancer case series was used to compare the results of affinity propagation with the results obtained with standard algorithms and indexes for the optimal choice of the number of clusters. Results from affinity propagation are consistent with the results already obtained, with the advantage of providing an indication of the number of clusters

    A "non-parametric" version of the naive Bayes classifier

    Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed, but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed
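For orientation, the two assumptions named in this abstract (independence of the variables given the class, and a Normal distribution for each variable) correspond to the usual Gaussian naive Bayes decision rule. The notation below is ours, not the authors': $x_j$ is the $j$-th of $p$ variables, $c$ a class, and $\mu_{jc}$, $\sigma_{jc}^2$ the class-specific mean and variance.

```latex
\hat{c} \;=\; \arg\max_{c}\; P(c)\,\prod_{j=1}^{p}
  \frac{1}{\sqrt{2\pi\sigma_{jc}^{2}}}
  \exp\!\left(-\frac{(x_j-\mu_{jc})^{2}}{2\sigma_{jc}^{2}}\right)
```

A variant that drops the Normality requirement would keep the product over variables and replace only the Gaussian factor with some other estimate of $p(x_j \mid c)$. The abstract does not say which estimate the authors use.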

    Cancer profiles by Affinity Propagation

    The Affinity Propagation algorithm is applied to various problems of breast and cutaneous tumour subtyping using traditional biologic markers. The algorithm provides a procedure to determine the number of profiles to be considered. Well-known breast cancer and cutaneous melanoma case series were used to compare the results of Affinity Propagation with the results obtained with standard algorithms and indexes for the optimal choice of the number of clusters. Results from Affinity Propagation are consistent with the results already obtained, with the advantage of providing an indication of the number of clusters

    Clustering breast cancer data by consensus of different validity indices

    Clustering algorithms will, in general, either partition a given data set into a pre-specified number of clusters or will produce a hierarchy of clusters. In this paper we analyse several different clustering techniques and apply them to a particular data set of breast cancer data. When we do not know a priori which is the best number of groups, we use a range of different validity indices to test the quality of clustering results and to determine the best number of clusters. While for the K-means method there is not absolute agreement among the indices as to which is the best number of clusters, for the PAM algorithm all the indices indicate 4 as the best cluster number
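As a rough illustration of how a validity index can rank candidate cluster numbers, the sketch below scores two hand-made labellings of toy one-dimensional data with the mean silhouette width. The silhouette is our choice for the example; the abstract does not list which indices the authors used, and the data, labels and function name are hypothetical.

```python
from math import dist

def mean_silhouette(points, labels):
    """Average silhouette width of a labelling (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:  # guard against coincident points
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (hypothetical): three well-separated pairs on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
candidates = {
    2: [0, 0, 0, 0, 1, 1],  # a 2-cluster labelling
    3: [0, 0, 1, 1, 2, 2],  # a 3-cluster labelling
}
scores = {k: mean_silhouette(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
```

A consensus in the spirit of the abstract would repeat this scoring with several indices and compare the cluster number each one prefers; only one index is shown here.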

    Conditional independence relations among biological markers may improve clinical decision as in the case of triple negative breast cancers

    The associations existing among different biomarkers are important in clinical settings because they contribute to the characterisation of specific pathways related to the natural history of the disease, genetic and environmental determinants. Despite the availability of binary/linear (or at least monotonic) correlation indices, the full exploitation of molecular information depends on the knowledge of direct/indirect conditional independence (and possibly causal) relationships among biomarkers, and with target variables in the population of interest. In other words, it depends on inferences performed on the joint multivariate distribution of markers and target variables. Graphical models, such as Bayesian Networks, are well suited to this purpose. Therefore, we reconsidered a previously published case study on classical biomarkers in breast cancer, namely estrogen receptor (ER), progesterone receptor (PR), a proliferative index (Ki67/MIB-1), and the proteins HER2/neu (NEU) and p53, to infer conditional independence relations existing in the joint distribution by inferring (learning) the structure of graphs entailing those relations of independence. We also examined the conditional distribution of a special molecular phenotype, called triple-negative, in which ER, PR and NEU were absent. We confirmed that ER is a key marker and we found that it was able to define subpopulations of patients characterized by different conditional independence relations among biomarkers. We also found preliminary evidence that, given a triple-negative profile, the distribution of p53 protein is mostly concentrated in the 'zero' and 'high' states, providing useful information in selecting patients that could benefit from an adjuvant anthracycline/alkylating agent-based chemotherapy

    Recurrence dynamics does not depend on the recurrence site

    Introduction: The dynamics of breast cancer recurrence and death, indicating a bimodal hazard rate pattern, has been confirmed in various databases. A few explanations have been suggested to help interpret this finding, assuming that each peak is generated by clustering of similar recurrences and different peaks result from distinct categories of recurrence. Methods: The recurrence dynamics was analysed in a series of 1526 patients undergoing conservative surgery at the National Cancer Institute of Milan, Italy, for whom the site of first recurrence was recorded. The study was focused on the first clinically relevant event occurring during the follow up (i.e., local recurrence, distant metastasis, contralateral breast cancer, second primary tumour), the dynamics of which was studied by estimating the specific hazard rate. Results: The hazard rate for any recurrence (including both local and distant disease relapses) displayed a bimodal pattern with a first surge peaking at about 24 months and a second peak at almost 60 months. The same pattern was observed when the whole recurrence risk was split into the risk of local recurrence and the risk of distant metastasis. However, the hazard rate curves for both contralateral breast tumours and second primary tumours revealed a uniform course at an almost constant level. When patients with distant metastases were grouped by site of recurrence (soft tissue, bone, lung, liver or central nervous system), the corresponding hazard rate curves displayed the typical bimodal pattern with a first peak at about 24 months and a later peak at about 60 months. Conclusions: The bimodal dynamics for early stage breast cancer recurrence is again confirmed, providing support to the proposed tumour-dormancy-based model. The recurrence dynamics does not depend on the site of metastasis, indicating that the timing of recurrences is generated by factors influencing the metastatic development regardless of the seeded organ. This finding supports the view that the disease course after surgical removal of the primary tumour follows a common pathway with well-defined steps and that the recurrence risk pattern results from inherent features of the metastasis development process, which are apparently attributable to tumour cells

    Assessing Agreement between miRNA Microarray Platforms

    Over the last few years, miRNA microarray platforms have provided great insights into the biological mechanisms underlying the onset and development of several diseases. However, only a few studies have evaluated the concordance between different microarray platforms using methods that took into account measurement error in the data. In this work, we propose the use of a modified version of the Bland–Altman plot to assess agreement between microarray platforms. To this aim, two samples, one renal tumor cell line and a pool of 20 different human normal tissues, were profiled using three different miRNA platforms (Affymetrix, Agilent, Illumina) on triplicate arrays. Intra-platform reliability was assessed by calculating pair-wise concordance correlation coefficients (CCC) between technical replicates and overall concordance correlation coefficient (OCCC) with bootstrap percentile confidence intervals, which revealed moderate-to-good repeatability of all platforms for both samples. Modified Bland–Altman analysis revealed good patterns of concordance for Agilent and Illumina, whereas Affymetrix showed poor-to-moderate agreement for both samples considered. The proposed method is useful to assess agreement between array platforms by modifying the original Bland–Altman plot to let it account for measurement error and bias correction, and can be used to assess patterns of concordance between kinds of arrays other than miRNA microarrays
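The two agreement measures named in this abstract can be sketched in a few lines. The block below computes Lin's concordance correlation coefficient for one pair of replicates and the classical (unmodified) Bland-Altman bias with 95% limits of agreement. It does not reproduce the authors' modified plot, the OCCC or the bootstrap intervals, and the measurement values and function names are hypothetical.

```python
from statistics import fmean

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Population (1/n) moments, as in Lin's original definition.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

def bland_altman_limits(x, y):
    """Classical Bland-Altman bias and 95% limits of agreement."""
    d = [a - b for a, b in zip(x, y)]
    bias = fmean(d)
    # Sample standard deviation (n - 1) of the paired differences.
    sd = (sum((v - bias) ** 2 for v in d) / (len(d) - 1)) ** 0.5
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical log-intensities for the same four miRNAs on two arrays.
array_a = [1.0, 2.0, 3.0, 4.0]
array_b = [1.5, 2.5, 3.5, 4.5]

ccc = concordance_correlation(array_a, array_b)
bias, lower, upper = bland_altman_limits(array_a, array_b)
```

In this toy case the two arrays are perfectly correlated but offset by a constant 0.5, so the CCC falls below 1 while the Pearson correlation would not. That sensitivity to systematic shifts is why the CCC is used for agreement rather than association.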