134 research outputs found

    SMS spam filtering using probabilistic topic modelling and Stacked Denoising Autoencoder.

    In this paper we present a novel approach to spam filtering and demonstrate its applicability to SMS messages. Our approach requires minimal feature engineering and a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation, and a comprehensive data model is then created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data, providing ease of use and high interpretability by visualising the topics using word clouds. Given that SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against state-of-the-art spam detection algorithms, with our proposed approach achieving over 97% accuracy, which compares favourably with the best reported algorithms in the literature.

    Using Model Explanations to Guide Deep Learning Models Towards Consistent Explanations for EHR Data

    It has been shown that identical Deep Learning (DL) architectures will produce distinct explanations when trained with different hyperparameters that are orthogonal to the task (e.g. random seed, training set order). In domains such as healthcare and finance, where transparency and explainability are paramount, this can be a significant barrier to DL adoption. In this study we present a further analysis of explanation (in)consistency on 6 tabular datasets/tasks, with a focus on Electronic Health Records data. We propose a novel deep learning ensemble architecture that trains its sub-models to produce consistent explanations, improving explanation consistency by as much as 315% (e.g. from 0.02433 to 0.1011 on MIMIC-IV), and on average by 124% (e.g. from 0.12282 to 0.4450 on the BCW dataset). We evaluate the effectiveness of our proposed technique and discuss the implications of our results for industrial applications of DL and explainability, as well as for future methodological work.
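
    As a quick check of the headline figure, the relative improvement quoted for MIMIC-IV can be recomputed from the two consistency scores given in the abstract. This is only a sketch of the arithmetic; the helper name `relative_improvement` is an assumption for illustration and does not come from the paper.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100.0


# Explanation-consistency scores quoted in the abstract for MIMIC-IV.
mimic_gain = relative_improvement(0.02433, 0.1011)
print(f"MIMIC-IV relative improvement: {mimic_gain:.1f}%")
```

    The result is roughly 315%, in line with the "as much as 315%" stated above.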

    HIV and HPV infections and ocular surface squamous neoplasia: systematic review and meta-analysis.

    BACKGROUND: The frequency of ocular surface squamous neoplasias (OSSNs) has been increasing in populations with a high prevalence of infection with human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) and infection with human papillomavirus (HPV). We aimed to quantify the association between HIV/AIDS and HPV infection and OSSN, through systematic review and meta-analysis. METHODS: The articles providing data on the association between HIV/AIDS and/or HPV infection and OSSN were identified in MEDLINE, SCOPUS and EMBASE searched up to May 2013, and through backward citation tracking. The DerSimonian and Laird method was used to compute summary relative risk (RR) estimates and 95% confidence intervals (95% CI). Heterogeneity was quantified with the I² statistic. RESULTS: HIV/AIDS was strongly associated with an increased risk of OSSN (summary RR=8.06, 95% CI: 5.29-12.30, I²=56.0%, 12 studies). The summary RR estimate for the infection with mucosal HPV subtypes was 3.13 (95% CI: 1.72-5.71, I²=45.6%, 16 studies). Four studies addressed the association between both cutaneous and mucosal HPV subtypes and OSSN; the summary RR estimates were 3.52 (95% CI: 1.23-10.08, I²=21.8%) and 1.08 (95% CI: 0.57-2.05, I²=0.0%), respectively. CONCLUSION: Human immunodeficiency virus infection increases the risk of OSSN by nearly eight-fold. Regarding HPV infection, only the cutaneous subtypes seem to be a risk factor.
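
    The pooling step named in the methods can be sketched as follows. This is a minimal illustration of the standard DerSimonian and Laird random-effects estimator and the I² statistic, not the authors' analysis code. The function names and the example inputs are assumptions for illustration; the study-level relative risks below are hypothetical and are not taken from the review.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(RR), recovered from a 95% CI on the RR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def dersimonian_laird(log_rr, se, z: float = 1.96):
    """Pool log relative risks with the DerSimonian and Laird method."""
    k = len(log_rr)
    w = [1.0 / s ** 2 for s in se]                     # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical study-level estimates: (RR, CI lower, CI upper).
studies = [(6.0, 3.0, 12.0), (10.0, 4.0, 25.0), (7.5, 5.0, 11.25)]
log_rr = [math.log(rr) for rr, _, _ in studies]
se = [se_from_ci(lo, hi) for _, lo, hi in studies]
summary = dersimonian_laird(log_rr, se)
print(summary)
```

    Working on the log scale keeps the confidence intervals symmetric around the pooled estimate before they are transformed back to the RR scale.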

    Geometric Particle Swarm Optimization for Multi-objective Optimization Using Decomposition

    Multi-objective evolutionary algorithms (MOEAs) based on decomposition are aggregation-based algorithms which transform a multi-objective optimization problem (MOP) into several single-objective subproblems. Being effective, efficient, and easy to implement, Particle Swarm Optimization (PSO) has become one of the most popular single-objective optimizers for continuous problems, and recently it has been successfully extended to the multi-objective domain. However, no investigation on the application of PSO within a multi-objective decomposition framework exists in the context of combinatorial optimization. This is precisely the focus of the paper. More specifically, we study the incorporation of Geometric Particle Swarm Optimization (GPSO), a discrete generalization of PSO that has proven successful on a number of single-objective combinatorial problems, into a decomposition approach. We conduct experiments on many-objective 1/0 knapsack problems, i.e. problems with more than three objective functions, which are substantially harder than multi-objective problems with fewer objectives. The results indicate that the proposed multi-objective GPSO based on decomposition is able to outperform two versions of the well-known MOEA based on decomposition (MOEA/D) and the most recent version of the non-dominated sorting genetic algorithm (NSGA-III), which are state-of-the-art multi-objective evolutionary approaches based on decomposition.
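
    The decomposition idea described above can be illustrated on a toy knapsack. The abstract does not say which aggregation function the authors use, so the sketch below assumes Tchebycheff scalarization, one common choice in MOEA/D. It also solves each subproblem by brute-force enumeration purely for illustration; this stands in for, and is not, the GPSO search. All names and the problem data are hypothetical.

```python
from itertools import product


def tchebycheff(objs, weights, ideal):
    """Scalarize a maximization objective vector; smaller values are better."""
    return max(w * (z - f) for f, w, z in zip(objs, weights, ideal))


def solve_subproblem(profits, item_weights, capacity, weights, ideal):
    """Return the feasible 0/1 selection minimizing the Tchebycheff value."""
    best, best_val = None, float("inf")
    for sel in product((0, 1), repeat=len(item_weights)):
        load = sum(wt for wt, x in zip(item_weights, sel) if x)
        if load > capacity:
            continue  # infeasible: exceeds knapsack capacity
        # One total profit per objective (one row of `profits` per objective).
        objs = [sum(p for p, x in zip(row, sel) if x) for row in profits]
        val = tchebycheff(objs, weights, ideal)
        if val < best_val:
            best, best_val = sel, val
    return best


# Toy 2-objective problem: three items, only one fits in the knapsack.
profits = [[10, 1, 5],   # objective 1 profit per item
           [1, 10, 5]]   # objective 2 profit per item
item_weights = [4, 4, 4]
capacity = 4
ideal = (10, 10)         # best achievable value of each objective

# Each weight vector defines one single-objective subproblem.
for wv in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(wv, solve_subproblem(profits, item_weights, capacity, wv, ideal))
```

    Each weight vector steers its subproblem toward a different trade-off, so solving all subproblems together approximates the Pareto front.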

    Collaborative denoising autoencoder for high glycated haemoglobin prediction.

    A pioneering study is presented demonstrating that the presence of high glycated haemoglobin (HbA1c) levels in a patient’s blood can be reliably predicted from routinely collected clinical data. This paves the way for early detection of Type-2 Diabetes Mellitus (T2DM) and will save healthcare providers a major cost associated with the administration and assessment of clinical tests for HbA1c. A novel collaborative denoising autoencoder framework is used to address this challenge. The framework builds an independent denoising autoencoder model for each of the high and low HbA1c levels, which extracts feature representations in the latent space. A baseline model using just three features (patient age, triglycerides and glucose level) achieves a 76% F1-score with an SVM classifier. The collaborative denoising autoencoder uses 78 features and can predict HbA1c level with an 81% F1-score.

    Large-scale proteomic identification of S100 proteins in breast cancer tissues

    BACKGROUND: Attempts to reduce morbidity and mortality in breast cancer are based on efforts to identify novel biomarkers to support prognosis and therapeutic choices. The present study has focussed on S100 proteins as a potentially promising group of markers in cancer development and progression. One reason for interest in this family of proteins is that the majority of the S100 genes are clustered on a region of human chromosome 1q21 that is prone to genomic rearrangements. Moreover, there is increasing evidence that S100 proteins are often up-regulated in many cancers, including breast, and this is frequently associated with tumour progression. METHODS: Samples of breast cancer tissues were obtained during surgical intervention, according to the bioethical recommendations, and cryo-preserved until used. Tissue extracts were submitted to proteomic preparations for 2D-IPG. Protein identification was performed by N-terminal sequencing and/or peptide mass fingerprinting. RESULTS: The majority of the detected S100 proteins were absent, or present at very low levels, in the non-tumoral tissues adjacent to the primary tumor. This finding strengthens the role of S100 proteins as putative biomarkers. The proteomic screening of 100 cryo-preserved breast cancer tissues showed that some proteins were ubiquitously expressed in almost all patients while others appeared more sporadic. Most, if not all, of the detected S100 members appeared reciprocally correlated. Finally, from the perspective of biomarker establishment, a promising finding was the observation that patients who developed distant metastases after a three-year follow-up showed a general tendency towards higher S100 protein expression, compared to the disease-free group. CONCLUSIONS: This article reports for the first time the comparative proteomic screening of several S100 protein members among a large group of breast cancer patients. The results obtained strongly support the hypothesis that a significant deregulation of multiple S100 protein members is associated with breast cancer progression, and suggest that these proteins might act as potential prognostic factors for patient stratification. We propose that this may offer a significant contribution to the knowledge and clinical applications of the S100 protein family to breast cancer.

    S100A7-Downregulation Inhibits Epidermal Growth Factor-Induced Signaling in Breast Cancer Cells and Blocks Osteoclast Formation

    S100A7 is a small calcium-binding protein, which has been shown to be differentially expressed in psoriatic skin lesions, as well as in squamous cell tumors of the skin, lung and breast. Although its expression has been correlated to HER+ high-grade tumors and to a high risk of progression, the molecular mechanisms of these S100A7-mediated tumorigenic effects are not well known. Here, we showed for the first time that epidermal growth factor (EGF) induces S100A7 expression in both MCF-7 and MDA-MB-468 cell lines. We also observed a decrease in EGF-directed migration in shRNA-downregulated MDA-MB-468 cell lines. Furthermore, our signaling studies revealed that EGF induced simultaneous EGF receptor phosphorylation at Tyr1173 and HER2 phosphorylation at Tyr1248 in S100A7-downregulated cell lines as compared to the vector-transfected controls. In addition, reduced phosphorylation of Src at tyrosine 416 and p-SHP2 at tyrosine 542 was observed in these downregulated cell lines. Further studies revealed that S100A7-downregulated cells had reduced angiogenesis in vivo based on matrigel plug assays. Our results also showed decreased tumor-induced osteoclastic resorption in an intra-tibial bone injection model involving SCID mice. S100A7-downregulated cells had decreased osteoclast number and size as compared to the vector controls, and this decrease was associated with variations in IL-8 expression in in vitro cell cultures. This is a novel report on the role of S100A7 in EGF-induced signaling in breast cancer cells and in osteoclast formation.

    Nuclear S100A7 Is Associated with Poor Prognosis in Head and Neck Cancer

    Tissue proteomic analysis of head and neck squamous cell carcinoma (HNSCC) and normal oral mucosa using iTRAQ (isobaric tag for relative and absolute quantitation) labeling and liquid chromatography-mass spectrometry led to the identification of a panel of biomarkers including S100A7. In the multi-step process of head and neck tumorigenesis, the presence of dysplastic areas in the epithelium is proposed to be associated with a likely progression to cancer; however, there are no established biomarkers to predict their potential for malignant transformation. This study aimed to determine the clinical significance of S100A7 overexpression in HNSCC. Immunohistochemical analysis of S100A7 expression in HNSCC (100 cases), oral lesions (166 cases) and 100 histologically normal tissues was carried out and correlated with clinicopathological parameters and disease prognosis over 7 years for HNSCC patients. Overexpression of S100A7 protein was significant in oral lesions (squamous cell hyperplasia/dysplasia) and sustained in HNSCC in comparison with oral normal mucosa (p(trend)<0.001). A significant increase in nuclear S100A7 was observed in HNSCC as compared to dysplastic lesions (p = 0.005) and was associated with well differentiated squamous cell carcinoma (p = 0.031). Notably, nuclear accumulation of S100A7 also emerged as an independent predictor of reduced disease-free survival (p = 0.006, Hazard ratio (HR = 7.6), 95% CI = 1.3-5.1) in multivariate analysis, underscoring its relevance as a poor prognosticator for HNSCC patients. Our study demonstrated that nuclear accumulation of S100A7 may serve as a predictor of poor prognosis in HNSCC patients. Further, increased nuclear accumulation of S100A7 in HNSCC as compared to dysplastic lesions warrants a large-scale longitudinal study of patients with dysplasia to evaluate its potential as a determinant of increased risk of transformation of oral premalignant lesions.