
    Scalable CAIM Discretization on Multiple GPUs Using Concurrent Kernels

    CAIM (Class-Attribute Interdependence Maximization) is one of the state-of-the-art algorithms for discretizing data for which class labels are known. However, it can take a long time to run on high-dimensional, large-scale data with a large number of attributes and/or instances. This paper addresses this problem with a GPU-based implementation of the CAIM algorithm that significantly speeds up discretization on large, complex data sets. The implementation scales to multiple GPU devices and exploits the concurrent kernel execution capabilities of modern GPUs. The GPU-based CAIM model is evaluated against the original CAIM in single- and multi-threaded parallel configurations on 40 data sets with different characteristics. The results show large speedups, up to 139 times faster using 4 GPUs, which makes discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 hours to less than 2 minutes.
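
    The quantity CAIM maximizes can be illustrated with a short sequential sketch. Assuming the standard definition CAIM = (1/n) Σ_r max_r² / M_{+r}, where n is the number of intervals, max_r is the largest class count in interval r of the class-by-interval quanta matrix, and M_{+r} is the total number of samples in interval r, the hypothetical function `caim_criterion` below scores one candidate set of cut points. It is not the paper's GPU implementation. Intervals are taken as (d_{r-1}, d_r], and empty intervals are skipped to avoid division by zero; both are assumptions of this sketch.

```python
from bisect import bisect_left
from typing import Hashable, Sequence


def caim_criterion(values: Sequence[float], labels: Sequence[Hashable],
                   boundaries: Sequence[float]) -> float:
    """Hypothetical helper: CAIM score of cutting `values` at sorted `boundaries`.

    len(boundaries) interior cut points produce len(boundaries) + 1 intervals.
    """
    classes = sorted(set(labels), key=repr)
    n_intervals = len(boundaries) + 1
    # Quanta matrix: quanta[c][r] counts samples of class c that fall in interval r.
    quanta = {c: [0] * n_intervals for c in classes}
    for v, c in zip(values, labels):
        # bisect_left puts a value equal to a cut point in the interval to its left,
        # i.e. intervals are (d_{r-1}, d_r].
        r = bisect_left(boundaries, v)
        quanta[c][r] += 1
    total = 0.0
    for r in range(n_intervals):
        column = [quanta[c][r] for c in classes]
        m_plus_r = sum(column)  # M_{+r}: number of samples in interval r
        if m_plus_r > 0:        # skip empty intervals (assumption of this sketch)
            total += max(column) ** 2 / m_plus_r
    return total / n_intervals


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = list("aaabbb")
    print(caim_criterion(xs, ys, [3.5]))  # → 3.0 (each interval is pure)
    print(caim_criterion(xs, ys, []))     # → 1.5 (a single mixed interval)
```

    In the greedy CAIM search, a score like this is recomputed for every candidate cut point each time a boundary is added, and that repeated evaluation is the kind of work a parallel implementation can distribute.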

    Proposing a customized exokernel library to data mining

    Implementing customized system libraries in an exokernel environment is considered a promising approach to optimizing data mining processes. Customized libraries in exokernel environments have been used successfully to optimize other applications and are potentially suitable for demanding applications such as data mining. A prototype to test our hypothesis is under construction. This work introduces data mining and the exokernel environment and describes our prototype's building strategy.

    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the scalability problem in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on parallelizing a single kind of data mining algorithm or paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms, and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.

    Application of the Machine and Deep Learning Methods for the Classification of Cannabinoid- and Cathinone-Derived Compounds

    Objective: New psychoactive substances (NPS) have been developed rapidly to avoid legal entanglement. From 2013 to 2018, the number of cathinone-derived compounds increased from 30 to 89. In 2016, of 56 NPS compounds, 21 were identified as cannabinoid-derived; only 43 were regulated in the narcotics law. Artificial intelligence methods such as machine and deep learning are used for data processing and object recognition, including human pose recognition and image classification. Methods: Machine and deep learning methods for classifying cathinone- and cannabinoid-derived compounds were compared, using pharmacophore modeling as the reference method. To classify cathinone-derived compounds, the structures were transformed into fingerprints, which were used as input features for both the machine and deep learning methods. In contrast, the physicochemical properties (descriptor form) and the fingerprint form were used as inputs for the deep learning method to classify cannabinoid-derived substances. Results: In the cathinone-derived compound classification, the deep learning method achieved accuracy and Cohen's kappa values of 0.9932 and 0.992, respectively. Pharmacophore modeling also outperformed the machine learning method (0.911 and 0.708 vs. 0.718 and 0.673, respectively). In the cannabinoid-derived compound classification, the deep learning method with the fingerprint form had the highest accuracy and Cohen's kappa values (0.9904 and 0.9876). The same method with the descriptor form also outperformed pharmacophore modeling (0.8958 and 0.8622 vs. 0.68 and 0.396, respectively). Conclusion: The deep learning method shows potential for NPS classification.
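
    The results pair accuracy with Cohen's kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance from the label frequencies: κ = (p_o − p_e) / (1 − p_e). The abstract does not say how kappa was computed; the following is a minimal sketch with a hypothetical function name, `cohen_kappa`.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Hypothetical helper: Cohen's kappa, (p_o - p_e) / (1 - p_e)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where the prediction matches the truth.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over labels of P(true = k) * P(pred = k).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        # Both label sequences use one identical class; kappa is undefined here.
        return math.nan
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # 3 of 4 agree (p_o = 0.75); chance agreement p_e = (2*1 + 2*3) / 16 = 0.5.
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```

    The example shows why kappa can be much lower than accuracy: 75% agreement drops to κ = 0.5 once half of the agreement is attributed to chance.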

    The application of data mining by classification in a database of notified covid-19 cases in Manaus-AM

    This article presents information on the comorbidities that most aggravate the symptoms of SARS-CoV-2 (COVID-19), using data extracted from the database on the official website of the Ministry of Health, which defined a system to monitor the information recorded in each patient's diagnosis. Since the beginning of the pandemic, the city of Manaus has suffered severe consequences from the SARS-CoV-2 virus (COVID-19), making it important to predict which patients are at higher risk of death. We describe the origin and spread of the virus and the use of the DBMS MySQL and MySQL Workbench for data selection and preprocessing, followed by the Weka tool for machine learning, concluding with the classification of the comorbidities that most aggravate patients' clinical conditions.

    Discovery of association rules from medical data: classical and evolutionary approaches

    The paper presents a method for discovering association rules from medical data using an evolutionary approach. The proposed method (EGAR) uses a genetic algorithm as a knowledge discovery tool that extracts association rules from a data set. The method is compared with a well-known and widely used method, FPTree. The developed computer program is used to test the proposed method and to compare its results with those produced by FPTree. The program is a general and flexible rule generation tool that works with different data sets and both implemented methods. The experiments presented are performed on real medical data from the Wroclaw Clinic.
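
    Both FPTree-based mining and an evolutionary method such as EGAR produce rules of the form X ⇒ Y, which are commonly scored by support and confidence. The abstract does not state which measures EGAR's fitness function uses, so the sketch below only illustrates these two standard measures; the function names and the toy records are hypothetical and are not the Wroclaw Clinic data.

```python
from typing import FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0  # antecedent never occurs; treat the rule as having no confidence
    return support(antecedent | consequent, transactions) / s_a


# Hypothetical toy patient records (one set of findings per patient).
records = [
    frozenset({"hypertension", "diabetes"}),
    frozenset({"hypertension", "diabetes", "obesity"}),
    frozenset({"hypertension"}),
    frozenset({"obesity"}),
]

if __name__ == "__main__":
    print(support(frozenset({"hypertension"}), records))  # → 0.75
    print(confidence(frozenset({"hypertension"}), frozenset({"diabetes"}), records))
```

    Here the rule {hypertension} ⇒ {diabetes} holds in 2 of the 3 records that contain hypertension, so its confidence is 2/3 and its support is 2/4.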