
    PRIVACY PRESERVING DATA MINING TECHNIQUES USING RECENT ALGORITHMS

    Privacy-preserving data mining plays a crucial role as an emerging technology for performing data mining operations on private data and for transmitting data securely so that sensitive information is protected. Many techniques, such as randomization, secure sum algorithms, and k-anonymity, have been proposed for privacy-preserving data mining. This survey presents current research on privacy-preserving data mining techniques based on fuzzy logic, neural network learning, secure sum, and various encryption algorithms. It should help the reader grasp the challenges faced in privacy-preserving data mining and identify the most suitable technique for a given data environment.
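
    The secure sum protocol mentioned above is simple enough to sketch. The following minimal Python example, offered as a generic illustration rather than any surveyed paper's exact method, shows the classic ring-based variant: the initiating party adds a random mask before the partial total is passed around, so no participant sees another party's raw value. The party values and mask range are illustrative assumptions.

    import random

    def secure_sum(private_values, mask_range=10**6):
        # Ring-based secure sum (illustrative sketch): the initiator adds a
        # random mask, each party adds its private value to the running
        # total it receives, and the initiator removes the mask at the end.
        # Only masked partial sums ever travel between parties.
        mask = random.randrange(mask_range)
        running = mask
        for value in private_values:   # simulates passing the total around the ring
            running += value
        return running - mask

    # Three parties jointly compute their total without revealing inputs.
    print(secure_sum([12, 7, 23]))     # -> 42

    A real deployment would run the additions on separate machines, typically modulo a large integer; the single loop here only simulates the ring.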

    Data Mining

    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising, leading-edge technology for mining large volumes of data, uncovering hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more, in fields such as science, medicine, economics, engineering, computing, and business analytics. This book presents the basic concepts, ideas, and research in data mining.

    A Collaborative Framework for Privacy Preserving Fuzzy Co-Clustering of Vertically Distributed Cooccurrence Matrices

    In many real-world data analysis tasks, much more useful knowledge can be obtained by utilizing multiple databases stored in different organizations, such as cooperative groups, state organs, and allied countries. However, such organizations often hesitate to publish their databases because of privacy and security concerns, even though they recognize the advantages of collaborative analysis. This paper proposes a novel collaborative framework for utilizing vertically partitioned cooccurrence matrices in fuzzy co-cluster structure estimation, in which cooccurrence information among objects and items is stored separately at several sites. In order to utilize such distributed data sets without fear of information leaks, a privacy-preserving procedure is introduced into fuzzy clustering for categorical multivariate data (FCCM). Each element of the cooccurrence matrices is withheld; only object memberships are shared among the sites, and their (implicit) joint co-cluster structures are revealed through an iterative clustering process. Several experimental results demonstrate that collaborative analysis can reveal global intrinsic co-cluster structures of the separate matrices that individual site-wise analysis cannot. The novel framework makes it possible for many private and public organizations to share common structural knowledge of their data without fear of information leaks.
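
    To make the collaborative idea concrete, here is a minimal numpy sketch of an FCCM-style iteration over a vertically partitioned cooccurrence matrix: each site keeps its own columns and item memberships private, and only object memberships (plus per-site aggregate scores) cross site boundaries. The entropy-regularized update rules, the parameter values, the per-site normalization of item memberships, and the plain summation standing in for a secure aggregation step are all illustrative assumptions, not the paper's exact protocol.

    import numpy as np

    rng = np.random.default_rng(0)

    def fccm_vertical(site_matrices, n_clusters, lam_u=0.5, lam_w=0.5, n_iter=50):
        # site_matrices: one (n_objects, n_items_k) block of the cooccurrence
        # matrix per site; rows (objects) are aligned across sites.
        u = rng.random((n_clusters, site_matrices[0].shape[0]))
        u /= u.sum(axis=0, keepdims=True)      # shared object memberships
        ws = [np.full((n_clusters, r.shape[1]), 1.0 / r.shape[1])
              for r in site_matrices]          # private item memberships
        for _ in range(n_iter):
            # Local step: each site updates its item memberships from the
            # shared object memberships, never revealing its raw columns.
            for k, r in enumerate(site_matrices):
                w = np.exp((u @ r) / lam_w)
                ws[k] = w / w.sum(axis=1, keepdims=True)
            # Collaborative step: each site contributes only an aggregate
            # score per (cluster, object); in the privacy-preserving setting
            # this summation would use a secure aggregation protocol.
            agg = sum(w @ r.T for w, r in zip(ws, site_matrices))
            u = np.exp(agg / lam_u)
            u /= u.sum(axis=0, keepdims=True)
        return u, ws

    # Toy demo: a 30x12 binary cooccurrence matrix split across two sites.
    full = (rng.random((30, 12)) < 0.3).astype(float)
    u, _ = fccm_vertical([full[:, :7], full[:, 7:]], n_clusters=3)
    print(np.round(u[:, :5], 2))               # memberships of the first 5 objects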

    Advances in knowledge discovery and data mining, Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps, from data collection, summarization, and clustering to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.


    Developing and Applying CAD-generated Image Markers to Assist Disease Diagnosis and Prognosis Prediction

    Developing computer-aided detection and/or diagnosis (CAD) schemes has been an active research topic in medical imaging informatics (MII) over the last two decades, with promising results in assisting clinicians in making better diagnostic and clinical decisions. Building robust CAD schemes requires state-of-the-art image processing and machine learning (ML) algorithms to optimize each step of the CAD pipeline, including detection and segmentation of the region of interest, optimal feature generation, and integration with ML classifiers. In my dissertation, I conducted multiple studies investigating the feasibility of developing several novel CAD schemes for different medical purposes.

    The first study investigates how to optimally develop a CAD scheme for contrast-enhanced digital mammography (CEDM) images to classify breast masses. CEDM includes both low-energy (LE) and dual-energy subtracted (DES) images. A CAD scheme was applied to segment mass regions in LE and DES images separately, and optimal segmentation results generated from DES images were also mapped to LE images, and vice versa. After computing image features, multilayer perceptron-based ML classifiers, integrated with a correlation-based feature subset evaluator and a leave-one-case-out cross-validation method, were built to classify mass regions. The study demonstrated that DES images eliminate the overlapping effect of dense breast tissue, which helps improve mass segmentation accuracy. By mapping mass regions segmented from DES images to LE images, CAD yields significantly improved performance.

    The second study develops a new quantitative image marker computed from pre-intervention computed tomography perfusion (CTP) images and evaluates its feasibility for predicting clinical outcome among acute ischemic stroke (AIS) patients undergoing endovascular mechanical thrombectomy after diagnosis of large vessel occlusion. A CAD scheme is first developed to pre-process the CTP scanning series of each case, perform image segmentation, quantify contrast-enhanced blood volumes in the bilateral cerebral hemispheres, and compute image features related to asymmetric cerebral blood flow patterns based on the cumulative cerebral blood flow curves of the two hemispheres. Next, image markers based on a single optimal feature and ML models fusing multiple features are developed and tested to classify AIS cases into good and poor prognosis according to the Modified Rankin Scale. The results show that an ML model trained using multiple features yields significantly higher classification performance than the image marker using the best single feature (p < 0.01). This study demonstrates the feasibility of predicting the prognosis of AIS patients in the hyperacute stage, which has the potential to assist clinicians in optimally treating and managing these patients.

    The third study develops and tests a new CAD scheme to predict prognosis in aneurysmal subarachnoid hemorrhage (aSAH) patients using brain CT images. Each patient had two sets of CT images, acquired at admission and prior to discharge. The CAD scheme segments intracranial brain regions into four subregions: cerebrospinal fluid (CSF), white matter (WM), gray matter (GM), and extraparenchymal blood (EPB). It then computes nine image features: the five volumes of the segmented sulci, EPB, CSF, WM, and GM, plus four volumetric ratios relative to the sulci. Subsequently, 16 ML models were built, using features computed from CT images acquired either at admission or prior to discharge, to predict eight prognosis-related parameters. The results show that ML models trained on admission CT images yield higher accuracy in predicting short-term clinical outcomes, while ML models trained on pre-discharge CT images yield higher accuracy in predicting long-term clinical outcomes. This study thus demonstrates the feasibility of predicting the prognosis of aSAH patients using new ML model-generated quantitative image markers.

    The fourth study develops and tests a new interactive computer-aided detection (ICAD) tool to quantitatively assess hemorrhage volumes. After loading each case, the ICAD tool first segments the intracranial brain volume and labels each CT voxel. Next, contour-guided image-thresholding techniques based on CT Hounsfield units are used to estimate and segment intracranial hemorrhage (ICH)-associated voxels. Two experienced neurology residents then examine and correct the ICH markings, categorized into either intraparenchymal hemorrhage (IPH) or intraventricular hemorrhage (IVH), to obtain the true markings. The volume and maximum two-dimensional diameter of each hemorrhage subtype are also computed to aid understanding of ICH prognosis. Agreement between the semi-automated ICAD segmentations and the residents' verified true markings is evaluated using the Dice similarity coefficient (DSC). The data analysis demonstrates that the new ICAD tool can segment and quantify ICH and other hemorrhage volumes with high DSC.

    Finally, the fifth study aims to bridge the gap between traditional radiomics and deep learning systems by comparing and assessing the two technologies in classifying breast lesions. One CAD scheme is applied to segment lesions and compute radiomics features, while another applies a pre-trained residual network (ResNet50) as a transfer learning model to extract automated features. Next, a principal component algorithm processes both the radiomics and the automated features to create optimal feature vectors, and several support vector machine (SVM) classifiers are built using the optimized radiomics or automated features. This study indicates that (1) a CAD scheme built using only deep transfer learning yields higher classification performance than the traditional radiomics-based model, (2) an SVM trained on the fused radiomics and automated features does not yield a significantly higher AUC, and (3) radiomics and automated features contain highly correlated information for lesion classification.

    In summary, across these studies I developed and investigated several key components of the CAD pipeline: (i) pre-processing algorithms, (ii) automatic detection and segmentation schemes, (iii) feature extraction and optimization methods, and (iv) ML and data analysis models. All developed CAD models are embedded in interactive, visually aided graphical user interfaces (GUIs). These techniques present innovative approaches to building quantitative image markers and optimal ML models. The study results indicate the potential of the underlying CAD schemes to assist radiologists in clinical settings in diagnosing disease and improving their overall performance.
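
    Two building blocks of the fourth study, Hounsfield-unit thresholding and the Dice similarity coefficient, are easy to illustrate. The following numpy sketch is a generic illustration under assumed parameters (the [50, 90] HU window and the synthetic slice are made up for the demo), not the dissertation's actual ICAD implementation.

    import numpy as np

    def hu_threshold_mask(ct_slice, lo=50, hi=90):
        # Voxels whose Hounsfield units fall inside [lo, hi]; acute blood is
        # typically bright on non-contrast CT, but this window is only an
        # illustrative assumption, not a clinical standard.
        return (ct_slice >= lo) & (ct_slice <= hi)

    def dice_similarity(pred, truth):
        # Dice similarity coefficient between two boolean masks:
        # DSC = 2*|A & B| / (|A| + |B|).
        pred, truth = pred.astype(bool), truth.astype(bool)
        denom = pred.sum() + truth.sum()
        return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0

    # Toy example: a synthetic 'CT slice' with one bright patch, scored
    # against a hand-made reference mask standing in for expert markings.
    rng = np.random.default_rng(1)
    ct = rng.normal(30, 10, size=(64, 64))   # fake HU background
    ct[20:30, 20:30] = 70                    # fake hemorrhage patch
    truth = np.zeros((64, 64), dtype=bool)
    truth[20:30, 20:30] = True
    print(f"DSC = {dice_similarity(hu_threshold_mask(ct), truth):.3f}")

    The noisy background pushes some voxels into the HU window, so the demo reports a DSC below 1.0; this mirrors why the study has residents correct the threshold-based markings before evaluation.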

    Enhancing Media Personalization by Extracting Similarity Knowledge from Metadata


    Congress UPV Proceedings of the 21st International Conference on Science and Technology Indicators

    This is the book of proceedings of the 21st Science and Technology Indicators Conference, which took place in València (Spain) from the 14th to the 16th of September 2016. This year's conference theme, ‘Peripheries, frontiers and beyond’, aimed to study the development and use of Science, Technology and Innovation indicators in spaces that have not been the focus of current indicator development, for example, the Global South or the Social Sciences and Humanities. The exploration of the margins and beyond proposed by the theme brought to the STI Conference an interesting array of new contributors from a variety of fields and geographies. This year's conference had a record 382 registered participants from 40 different countries: 23 European, 9 American, 4 Asia-Pacific, and 4 from Africa and the Near East. About 26% of participants came from outside Europe. There were also many participants (17%) from organisations outside academia, including governments (8%), businesses (5%), foundations (2%), and international organisations (2%). This is particularly important in a field that is practice-oriented. The chapters of the proceedings attest to the breadth of issues discussed: infrastructure, benchmarking and use of innovation indicators, societal impact and mission-oriented research, mobility and careers, the social sciences and humanities, participation and culture, gender, and altmetrics, among others. We hope that the diversity of this Conference has fostered productive dialogues and synergistic ideas and made a contribution, small as it may be, to the development and use of indicators that, being more inclusive, will foster a more inclusive and fair world.

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)
