509 research outputs found

    A comparison of three different methods for classification of breast cancer data

    The classification of breast cancer patients is of great importance in cancer diagnosis. During the last few years, many algorithms have been proposed for this task. In this paper, we review different supervised machine learning techniques for the classification of a novel dataset and compare them methodologically. We used the C4.5 tree classifier, a Multilayer Perceptron and a naive Bayes classifier over a large set of tumour markers. The Multilayer Perceptron performed well even when we reduced the number of features to be classified. Naive Bayes achieved competitive performance even though the assumption of normality of the data is strongly violated.

    Cancer profiles by affinity propagation

    The affinity propagation algorithm is applied to a problem of breast cancer subtyping using traditional biologic markers. The algorithm provides a procedure to determine the number of profiles to be considered. A well-known breast cancer case series was used to compare the results of affinity propagation with those obtained with standard algorithms and indexes for the optimal choice of the number of clusters. Results from affinity propagation are consistent with those already obtained, with the added advantage of indicating the number of clusters.

    A "non-parametric" version of the naive Bayes classifier

    Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal, and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found that our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed.
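
The abstract above does not say how the Normality requirement is removed, so the following is only an illustrative sketch of one common way to do it, not the authors' algorithm: each per-class Gaussian likelihood is replaced by a histogram estimate with add-one smoothing, while the independence assumption is kept. The class name `HistogramNaiveBayes` and the `bin_edges` parameter are hypothetical.

```python
import bisect
import math
from collections import Counter


class HistogramNaiveBayes:
    """Naive Bayes with binned (histogram) likelihoods instead of Gaussians.

    Assumption: this is a generic stand-in for a "non-parametric" variant,
    not the method proposed in the abstract.
    """

    def __init__(self, bin_edges):
        # bin_edges[j] is a sorted list of cut points for feature j.
        self.bin_edges = bin_edges

    def _bin(self, j, x):
        # Index of the bin that value x falls into for feature j.
        return bisect.bisect_right(self.bin_edges[j], x)

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.bin_counts = Counter()
        for row, label in zip(X, y):
            for j, x in enumerate(row):
                self.bin_counts[(label, j, self._bin(j, x))] += 1
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            # Log prior plus a sum of per-feature log likelihoods
            # (the features are still treated as independent).
            score = math.log(n_label / total)
            for j, x in enumerate(row):
                n_bins = len(self.bin_edges[j]) + 1
                count = self.bin_counts[(label, j, self._bin(j, x))]
                # Add-one smoothing keeps empty bins from giving log(0).
                score += math.log((count + 1) / (n_label + n_bins))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy usage with made-up, strongly skewed values for a single feature.
X = [[0.1], [0.2], [0.9], [5.0], [6.0], [7.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = HistogramNaiveBayes(bin_edges=[[1.0, 4.0]]).fit(X, y)
```

With these toy values, `model.predict([0.5])` falls into the bin populated only by class `a`, so the class is chosen from bin frequencies alone and no distributional shape is assumed.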

    Cancer profiles by Affinity Propagation

    The Affinity Propagation algorithm is applied to various problems of breast and cutaneous tumour subtyping using traditional biologic markers. The algorithm provides a procedure to determine the number of profiles to be considered. Well-known breast cancer and cutaneous melanoma case series were used to compare the results of Affinity Propagation with those obtained with standard algorithms and indexes for the optimal choice of the number of clusters. Results from Affinity Propagation are consistent with those already obtained, with the added advantage of indicating the number of clusters.

    Clustering breast cancer data by consensus of different validity indices

    Clustering algorithms will, in general, either partition a given data set into a pre-specified number of clusters or produce a hierarchy of clusters. In this paper we analyse several different clustering techniques and apply them to a particular set of breast cancer data. When the best number of groups is not known a priori, we use a range of different validity indices to test the quality of the clustering results and to determine the best number of clusters. While for the K-means method there is no absolute agreement among the indices as to the best number of clusters, for the PAM algorithm all the indices indicate 4 as the best number of clusters.
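
As a rough illustration of using a validity index to choose the number of clusters, the sketch below scores candidate labellings with the mean silhouette width and keeps the best-scoring one. The abstract does not list which indices were used, so silhouette is an assumption here, and a consensus approach would repeat the same comparison for several indices. The function names `silhouette` and `best_k` are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a labelling (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a width of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Return the k whose labelling has the highest mean silhouette width.

    candidates maps each k to the labels produced by some clustering
    algorithm (for example K-means or PAM) run with that k.
    """
    return max(candidates, key=lambda k: silhouette(points, candidates[k]))


# Toy data: two obvious groups on a line, labelled with k=2 and k=3.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
chosen = best_k(points, candidates)
```

Keeping the index separate from the clustering step means the same candidate labellings can be scored by several indices and the votes compared afterwards.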

    Conditional independence relations among biological markers may improve clinical decision as in the case of triple negative breast cancers

    The associations existing among different biomarkers are important in clinical settings because they contribute to the characterisation of specific pathways related to the natural history of the disease and to its genetic and environmental determinants. Despite the availability of binary/linear (or at least monotonic) correlation indices, the full exploitation of molecular information depends on knowledge of the direct/indirect conditional independence (and eventually causal) relationships among biomarkers, and with target variables in the population of interest. In other words, it depends on inferences performed on the joint multivariate distribution of markers and target variables. Graphical models, such as Bayesian Networks, are well suited to this purpose. Therefore, we reconsidered a previously published case study on classical biomarkers in breast cancer, namely estrogen receptor (ER), progesterone receptor (PR), a proliferative index (Ki67/MIB-1) and the proteins HER2/neu (NEU) and p53, to infer the conditional independence relations existing in the joint distribution by learning the structure of graphs entailing those relations. We also examined the conditional distribution of a special molecular phenotype, called triple-negative, in which ER, PR and NEU were absent. We confirmed that ER is a key marker and found that it was able to define subpopulations of patients characterized by different conditional independence relations among biomarkers. We also found preliminary evidence that, given a triple-negative profile, the distribution of p53 protein is mostly supported in the 'zero' and 'high' states, providing useful information for selecting patients who could benefit from adjuvant anthracycline/alkylating agent-based chemotherapy.

    Bimodal mortality dynamics for uveal melanoma: a cue for metastasis development traits?

    Background: The study estimates mortality dynamics (event-specific hazard rates over a follow-up time interval) for uveal melanoma. Methods: Three thousand six hundred seventy-two patients undergoing radical or conservative treatment for unilateral uveal melanoma, whose yearly follow-up data were reported in three published datasets, were analysed. Mortality dynamics were studied by estimating the discrete hazard rate for death with the life-table method. Smoothed curves were obtained by a kernel-like smoothing procedure and a piecewise exponential regression model. The ratio of deaths to patients at risk per year was the main outcome measure. Results: The three explored hazard rate curves display a common bimodal pattern, with a sudden increase peaking at about three years, followed by a reduction until the sixth to seventh year and a second surge peaking at about nine years after treatment. Conclusions: The bimodal pattern of mortality indicates that uveal melanoma metastatic development cannot be explained by a continuous growth model. Similar metastasis dynamics have been reported for other tumours, including early breast cancer, for which they supported a paradigm shift to an interrupted growth model, the implications of which are episodes of 'tumour dormancy'. We propose that the concepts of tumour homeostasis, tumour dormancy and enhancement of metastasis growth related to primary tumour removal, which convincingly explain the clinical behaviour of breast cancer, may be used for uveal melanoma as well. To confirm this proposition, a careful analysis of uveal melanoma metastasis dynamics is strongly warranted. © 2014 Demicheli et al.; licensee BioMed Central Ltd.
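
The main outcome measure above, deaths divided by patients at risk per year, can be sketched as a small life-table computation. This is a minimal illustration and not the study's code: the half-interval adjustment for censored patients and the three-point weighted smoother are assumptions standing in for the unspecified details of the life-table method and the kernel-like procedure, and all names and numbers are hypothetical.

```python
def life_table_hazard(deaths, censored, n_at_start):
    """Yearly discrete hazard: deaths / patients at risk in each interval.

    Assumption: censored patients count as at risk for half the interval
    (the usual actuarial convention).
    """
    hazards = []
    at_risk = n_at_start
    for d, c in zip(deaths, censored):
        effective = at_risk - c / 2.0
        hazards.append(d / effective if effective > 0 else float("nan"))
        at_risk -= d + c  # survivors still under follow-up enter the next year
    return hazards


def smooth(values, weights=(0.25, 0.5, 0.25)):
    """Three-point weighted moving average, renormalised at the edges."""
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for offset, w in zip((-1, 0, 1), weights):
            j = i + offset
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        out.append(num / den)
    return out


# Made-up counts for three follow-up years in a cohort of 100 patients.
raw = life_table_hazard(deaths=[10, 20, 5], censored=[0, 0, 0], n_at_start=100)
smoothed = smooth(raw)
```

Plotting the smoothed hazards against follow-up year is what would reveal a unimodal or bimodal shape. The raw yearly ratios are usually too noisy to read directly when the number at risk becomes small.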
