
    A novel ensemble learning approach to unsupervised record linkage

    © 2017 Record linkage is the process of identifying records that refer to the same real-world entity. Many existing approaches to record linkage apply supervised machine learning techniques to generate a classification model that classifies a pair of records as either match or non-match. The main requirement of such an approach is a labelled training dataset. In many real-world applications no labelled dataset is available, so manual labelling is required to create a sufficiently sized training dataset for a supervised machine learning algorithm. Semi-supervised machine learning techniques, such as self-learning or active learning, which require only a small manually labelled training dataset, have been applied to record linkage. These techniques reduce the amount of manual labelling required. However, they have yet to achieve a level of accuracy similar to that of supervised learning techniques. In this paper we propose a new approach to unsupervised record linkage based on a combination of ensemble learning and enhanced automatic self-learning. In the proposed approach an ensemble of automatic self-learning models is generated with different similarity measure schemes. To further improve the automatic self-learning process we incorporate field weighting into the automatic seed selection for each of the self-learning models. We propose an unsupervised diversity measure to ensure that there is high diversity among the selected self-learning models. Finally, we propose to use the contribution ratios of self-learning models to remove those with poor accuracy from the ensemble. We have evaluated our approach on four publicly available datasets that are commonly used in the record linkage community. Our experimental results show that our proposed approach has advantages over the state-of-the-art semi-supervised and unsupervised record linkage techniques. In three out of four datasets it also achieves results comparable to those of the supervised approaches.
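
    The abstract above mentions automatic seed selection with field weighting but gives no formula. The following is only a minimal sketch of one plausible reading, not the authors' method: each candidate record pair has a vector of per-field similarities, a weighted average gives a single score, and pairs with very high or very low scores become match or non-match seeds. The function names, the weights, and the `upper`/`lower` thresholds are all assumptions made for illustration.

```python
def weighted_score(sim_vector, weights):
    """Weighted average of per-field similarities (each in [0, 1])."""
    total = sum(weights)
    return sum(s * w for s, w in zip(sim_vector, weights)) / total


def select_seeds(sim_vectors, weights, upper=0.9, lower=0.1):
    """Pick indices of confident match and non-match pairs as seeds.

    `upper` and `lower` are hypothetical thresholds; pairs that fall
    between them are left unlabelled for the self-learning step.
    """
    match_seeds, non_match_seeds = [], []
    for idx, vec in enumerate(sim_vectors):
        score = weighted_score(vec, weights)
        if score >= upper:
            match_seeds.append(idx)
        elif score <= lower:
            non_match_seeds.append(idx)
    return match_seeds, non_match_seeds


# Four candidate pairs compared on three fields (e.g. name, address, year).
pairs = [
    [1.0, 1.0, 0.9],  # near-identical records
    [0.0, 0.1, 0.0],  # clearly different records
    [0.5, 0.6, 0.4],  # ambiguous
    [1.0, 0.2, 1.0],  # strong on two fields, weak on one
]
field_weights = [0.5, 0.3, 0.2]  # assumed weights, not taken from the paper
matches, non_matches = select_seeds(pairs, field_weights)
```

    Under these assumed weights and thresholds, only the first pair is selected as a match seed and only the second as a non-match seed; the two ambiguous pairs remain unlabelled.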

    Privacy preserving record linkage in the presence of missing values

    © 2017 The problem of record linkage is to identify records from two datasets that refer to the same entities (e.g. patients). A particular issue in record linkage is the presence of missing values in records, which has not been fully addressed. Another issue is how privacy and confidentiality can be preserved during record linkage. In this paper, we propose an approach for privacy preserving record linkage in the presence of missing values. For any missing value in a record, our approach imputes the similarity measure between the missing value and the value of the corresponding field in any of the possible matching records from the other dataset. For similarity imputation, we use the k nearest neighbours (k-NNs) of the record with the missing value, taken from the same dataset, together with their distances to that record. For privacy preservation, our approach uses the Bloom filter protocol in both settings: standard privacy preserving record linkage without missing values, and privacy preserving record linkage with missing values. We have conducted an experimental evaluation using three pairs of synthetic datasets with different rates of missing values. Our experimental results show the effectiveness and efficiency of our proposed approach.
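
    To make the Bloom filter idea mentioned above more concrete, here is a minimal sketch of the commonly used encoding for privacy preserving record linkage: a field value is split into character bigrams, each bigram is hashed to several bit positions, and two encoded values are compared with the Dice coefficient. This is a generic illustration, not the paper's protocol; the filter size, the number of hash functions, the double-hashing scheme, and all function names are assumptions, and the k-NN similarity imputation for missing values is not covered.

```python
import hashlib


def bigrams(value):
    """Character bigrams of a lower-cased, padded string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=1000, num_hashes=10):
    """Return the set of bit positions set to 1 for `value`.

    Uses double hashing (two base hashes combined) to simulate
    `num_hashes` hash functions; `size` and `num_hashes` are
    illustrative defaults, not values from the paper.
    """
    bits = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice_similarity(bits_a, bits_b):
    """Dice coefficient between two sets of bit positions."""
    if not bits_a and not bits_b:
        return 0.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


same = dice_similarity(bloom_encode("smith"), bloom_encode("smith"))
close = dice_similarity(bloom_encode("smith"), bloom_encode("smyth"))
far = dice_similarity(bloom_encode("smith"), bloom_encode("jones"))
```

    Identical values give a similarity of 1.0, and a near-duplicate such as "smyth" scores well above an unrelated value such as "jones", which is what lets the parties compare encoded fields without exchanging the plain-text values.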

    High expression of Linc00959 predicts poor prognosis in breast cancer

    Background: Accumulating studies have focused on the oncogenic roles of newly identified lncRNAs in human cancers. The aim of this study was to examine the expression pattern of Linc00959 in breast cancer (BC) and to evaluate its biological role and clinical significance in predicting prognosis. Methods: Expression of Linc00959 was detected in 290 BC tissues by quantitative reverse-transcription polymerase chain reaction (qRT-PCR). We analyzed the relationship between Linc00959 expression and the clinicopathological features of BC patients. Correlations were calculated with SPSS software. Results: Linc00959 expression was correlated with ER status (p = 0.005), PR status (p = 0.036), Ki67 (p = 0.025) and HER2 status (p = 0.009). The Kaplan–Meier survival curves indicated that overall survival (OS) (p = 0.022) and relapse-free survival (RFS) (p = 0.002) were significantly poorer in BC patients with high Linc00959 expression (p = 0.023). Furthermore, survival analysis by Cox regression showed that Linc00959 served as an independent prognostic marker in breast cancer (p = 0.004). Conclusion: Our study indicates that high Linc00959 expression is significantly associated with poor prognosis and may represent a new prognostic marker in breast cancer.

    DUSP4 enhances therapeutic sensitivity in HER2-positive breast cancer by inhibiting the G6PD pathway and ROS metabolism by interacting with ALDOB

    Background: Breast cancer (BC) poses a global threat, with HER2-positive BC being a particularly hazardous subtype. Despite the promise shown by neoadjuvant therapy (NAT) in improving prognosis, resistance in HER2-positive BC persists despite emerging targeted therapies. The objective of this study is to identify markers that promote therapeutic sensitivity and unravel the underlying mechanisms. Methods: We conducted an analysis of 86 HER2-positive BC biopsy samples pre-NAT using RNA-seq. Validation was carried out using TCGA, Kaplan‒Meier Plotter, and Oncomine databases. Phenotype verification utilized IC50 assays, and prognostic validation involved IHC on tissue microarrays. RNA-seq was performed on wild-type/DUSP4-KO cells, while RT‒qPCR assessed ROS pathway regulation. Mechanistic insights were obtained through IP and MS assays. Results: Our findings reveal that DUSP4 enhances therapeutic efficacy in HER2-positive BC by inhibiting the ROS pathway. Elevated DUSP4 levels correlate with increased sensitivity to HER2-targeted therapies and improved clinical outcomes. DUSP4 independently predicts disease-free survival (DFS) and overall survival (OS) in HER2-positive BC. Moreover, DUSP4 hinders G6PD activity via ALDOB dephosphorylation, with a noteworthy association with heightened ROS levels. Conclusions: In summary, our study unveils a metabolic reprogramming paradigm in BC, highlighting DUSP4's role in enhancing therapeutic sensitivity in HER2-positive BC cells. DUSP4 interacts with ALDOB, inhibiting G6PD activity and the ROS pathway, establishing it as an independent prognostic predictor for HER2-positive BC patients.