515 research outputs found

    A data clustering method based on decision trees

    The use of decision trees for cluster analysis is investigated. A cluster analysis method is developed that partitions the instance space into clusters without requiring the number of clusters or their shape to be specified in advance, which significantly broadens its practical applicability. Experiments on cluster analysis problems using the proposed method were carried out.

    Measurement of body temperature and heart rate for the development of healthcare system using IOT platform

    Health can be defined as a state of complete mental, physical and social well-being and not merely the absence of disease or infirmity, according to the World Health Organization (WHO) [1]. Having a healthy body is the greatest blessing of life. Healthcare, which is the maintenance or improvement of health through the diagnosis, prevention, and treatment of injury, disease, illness, and other mental and physical impairments in human beings, is therefore required to maintain or improve health. The novel paradigm of the Internet of Things (IoT) has the potential to transform modern healthcare and improve the well-being of entire society [2]. IoT is a concept that aims to connec

    Improving SIEM for critical SCADA water infrastructures using machine learning

    Network Control Systems (NAC) have been used in many industrial processes. They aim to reduce the human factor burden and efficiently handle the complex processes and communication of those systems. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation, etc.). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks, so robust anomaly detection is a major requirement. However, building an accurate anomaly detection system is not an easy task, because it is difficult to differentiate between cyber-attacks and internal system failures (e.g. hardware failures). In this paper, we present a model that detects anomaly events in a water system controlled by SCADA. Six machine learning techniques have been used in building and evaluating the model. The model classifies different anomaly events including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and spoofing). Unlike other detection systems, our proposed work helps accelerate the mitigation process by notifying the operator with additional information when an anomaly occurs. This additional information includes the probability and confidence level of the event(s) occurring. The model is trained and tested using a real-world dataset.

    Social Bots for Online Public Health Interventions

    According to the Centers for Disease Control and Prevention, in the United States hundreds of thousands of people initiate smoking each year, and millions live with smoking-related diseases. Many tobacco users discuss their habits and preferences on social media. This work conceptualizes a framework for targeted health interventions to inform tobacco users about the consequences of tobacco use. We designed a Twitter bot named Notobot (short for No-Tobacco Bot) that leverages machine learning to identify users posting pro-tobacco tweets and select individualized interventions to address their interest in tobacco use. We searched the Twitter feed for tobacco-related keywords and phrases, and trained a convolutional neural network using over 4,000 tweets manually labeled as either pro-tobacco or not pro-tobacco. This model achieves a 90% recall rate on the training set and 74% on test data. Users posting pro-tobacco tweets are matched with former smokers with similar interests who posted anti-tobacco tweets. Algorithmic matching, based on the power of peer influence, allows for the systematic delivery of personalized interventions based on real anti-tobacco tweets from former smokers. Experimental evaluation suggests that our system would perform well if deployed. This research offers opportunities for public health researchers to increase health awareness at scale. Future work entails deploying the fully operational Notobot system in a controlled experiment within a public health campaign.
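
For readers unfamiliar with the recall figures quoted above, the following is a minimal sketch of how recall is computed for a binary labeling such as pro-tobacco (1) versus not pro-tobacco (0). The function name `recall` and the toy labels are illustrative assumptions; they are not the authors' data or code.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    # Positive examples the classifier found.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # Positive examples the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 1 = pro-tobacco, 0 = not pro-tobacco.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

score = recall(y_true, y_pred)
print(f"recall = {score:.2f}")
```

In this toy data, five tweets are truly positive and three of them are recovered, so recall is 3/5. False positives (the sixth tweet here) do not affect recall; they would lower precision instead.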

    Identifying Web Tables - Supporting a Neglected Type of Content on the Web

    The abundance of data on the Internet facilitates the improvement of extraction and processing tools. The trend in open data publishing encourages the adoption of structured formats like CSV and RDF. However, there is still a plethora of unstructured data on the Web which we assume contains semantics. For this reason, we propose an approach to derive semantics from web tables, which are still the most popular publishing tool on the Web. The paper also discusses methods and services for unstructured data extraction and processing as well as machine learning techniques to enhance such a workflow. The eventual result is a framework to process, publish and visualize linked open data. The software enables table extraction from various open data sources in the HTML format and an automatic export to the RDF format, making the data linked. The paper also evaluates machine learning techniques in conjunction with string similarity functions to be applied in a table recognition task.
    Comment: 9 pages, 4 figures

    A method of structural-parametric synthesis of neuro-fuzzy networks

    A method of structural-parametric synthesis of neuro-fuzzy networks is developed. The proposed method uses decision trees to build a neuro-fuzzy network, is not highly iterative and does not require solving a multidimensional optimization task to calculate network parameters. When citing the document, use the following link: http://essuir.sumdu.edu.ua/handle/123456789/2880

    Data mining in medical records for the enhancement of strategic decisions: a case study

    The impact and popularity of the concept of competition have been increasing in recent decades, and this has raised the importance of making the right decisions for organizations. Decision makers have had to adopt proper scientific methods instead of intuitive and emotional choices in the decision-making process. In this context, many decision support models and related systems are still being developed to assist strategic management mechanisms. There is also a critical need for automated approaches for the effective and efficient utilization of massive amounts of data to support corporations and individuals in strategic planning and decision-making. Data mining techniques have been used to uncover hidden patterns and relations, to summarize data in novel ways that are both understandable and useful to executives, and to predict future trends and behaviors in business. There has been a large body of research and practice focusing on different data mining techniques and methodologies. In this study, a large record set extracted from an outpatient clinic’s medical database is used to apply data mining techniques. In the first phase of the study, the raw data in the record set are collected, preprocessed, cleaned up and eventually transformed into a suitable format for data mining. In the second phase, some association rule algorithms are applied to the data set in order to uncover rules that quantify the relationship between some of the attributes in the medical records. The results are observed, and a comparative analysis of the results of the different association algorithms is made. The results showed that some critical and reasonable relations exist in the outpatient clinic operations of the hospital, which could help the hospital management to change and improve their managerial strategies regarding the quality of services given to outpatients.
    Keywords: Decision Making, Medical Records, Data Mining, Association Rules, Outpatient Clinic.
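
As a rough illustration of what "quantifying the relationship between attributes" means for association rules, the sketch below computes the two standard measures, support and confidence, over a handful of made-up records. The attribute names and the `visits` data are hypothetical and are not taken from the study; the abstract does not say which association algorithms were used, so no specific algorithm is implemented here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Hypothetical, already-discretized outpatient records (one set per visit).
visits = [
    {"age_over_60", "cardiology", "follow_up"},
    {"age_over_60", "cardiology"},
    {"age_under_60", "dermatology"},
    {"age_over_60", "cardiology", "follow_up"},
    {"age_under_60", "cardiology"},
]

rule_support = support(visits, {"cardiology", "follow_up"})
rule_confidence = confidence(visits, {"cardiology"}, {"follow_up"})
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; the functions above only score a single rule that is given by hand.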

    A General Framework for Fair Regression

    Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods, applicable to Gaussian processes, support vector machines, neural network regression and decision tree regression. Further, we focus on examining the effect of incorporating these constraints in decision tree regression, with direct applications to random forests and boosted trees amongst other widespread popular inference techniques. We show that the order of complexity of memory and computation is preserved for such models and tightly bound the expected perturbations to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use, and group labels are only required on training data.
    Comment: 8 pages, 4 figures, 2 pages references