515 research outputs found
A data clustering method based on decision trees
The use of decision trees for cluster analysis is investigated. A cluster analysis method is developed that partitions the instance space into clusters without requiring the number of clusters or their shape to be specified in advance, which significantly broadens its practical applicability. Experiments on cluster analysis problems were carried out using the proposed method.
Measurement of body temperature and heart rate for the development of healthcare system using IOT platform
Health can be defined as a state of complete mental, physical and social well-being and not merely the absence of disease or infirmity, according to the World Health Organization (WHO) [1]. A healthy body is the greatest blessing of life, so healthcare is needed to maintain or improve health: healthcare is the maintenance or improvement of health through the diagnosis, prevention, and treatment of injury, disease, illness, and other mental and physical impairments in human beings. The novel paradigm of the Internet of Things (IoT) has the potential to transform modern healthcare and improve the well-being of society as a whole [2].
IoT is a concept that aims to connec
Improving SIEM for critical SCADA water infrastructures using machine learning
Network Control Systems (NAC) have been used in many industrial processes. They aim to reduce the human factor burden and to handle the complex processes and communication of those systems efficiently. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation, etc.). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks; therefore, robust anomaly detection is a major requirement. However, building an accurate anomaly detection system is not easy, because it is difficult to differentiate between cyber-attacks and internal system failures (e.g. hardware failures). In this paper, we present a model that detects anomalous events in a water system controlled by SCADA. Six machine learning techniques were used to build and evaluate the model. The model classifies different anomaly events, including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and spoofing). Unlike other detection systems, our proposed work helps accelerate the mitigation process by giving the operator additional information when an anomaly occurs. This additional information includes the probability and confidence level of the event(s) occurring. The model is trained and tested using a real-world dataset.
Social Bots for Online Public Health Interventions
According to the Centers for Disease Control and Prevention, in the United
States hundreds of thousands of people initiate smoking each year, and millions
live with smoking-related diseases. Many tobacco users discuss their habits and
preferences on social media. This work conceptualizes a framework for targeted
health interventions to inform tobacco users about the consequences of tobacco
use. We designed a Twitter bot named Notobot (short for No-Tobacco Bot) that
leverages machine learning to identify users posting pro-tobacco tweets and
select individualized interventions to address their interest in tobacco use.
We searched the Twitter feed for tobacco-related keywords and phrases, and
trained a convolutional neural network using over 4,000 tweets manually
labeled as either pro-tobacco or not pro-tobacco. This model achieves
a 90% recall rate on the training set and 74% on test data. Users posting
pro-tobacco tweets are matched with former smokers with similar interests who
posted anti-tobacco tweets. Algorithmic matching, based on the power of peer
influence, allows for the systematic delivery of personalized interventions
based on real anti-tobacco tweets from former smokers. Experimental evaluation
suggests that our system would perform well if deployed. This research offers
opportunities for public health researchers to increase health awareness at
scale. Future work entails deploying the fully operational Notobot system in a
controlled experiment within a public health campaign.
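The recall figures quoted in this abstract can be read as the fraction of truly pro-tobacco tweets that the classifier flags. A minimal sketch of that calculation, assuming binary labels where 1 means pro-tobacco; the function name and the toy labels are illustrative and are not taken from the paper:

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were also predicted positive."""
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual_pos = sum(1 for t in y_true if t == positive)
    if actual_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    return true_pos / actual_pos


# Toy labels (hypothetical): 1 = pro-tobacco, 0 = not pro-tobacco.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]

# Four of the five pro-tobacco tweets are recovered.
print(recall(y_true, y_pred))  # 0.8
```

A gap between training and test recall, such as the one reported here, is what this measure would show when computed separately on the two label sets.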
Identifying Web Tables - Supporting a Neglected Type of Content on the Web
The abundance of data on the Internet facilitates the improvement of
extraction and processing tools. The trend toward open data publishing
encourages the adoption of structured formats like CSV and RDF. However, there
is still a plethora of unstructured data on the Web which we assume contains
semantics. For this reason, we propose an approach to derive semantics from web
tables, which are still the most popular publishing tool on the Web. The paper
also discusses methods and services of unstructured data extraction and
processing as well as machine learning techniques to enhance such a workflow.
The eventual result is a framework to process, publish and visualize linked
open data. The software enables table extraction from various open data
sources in HTML format and automatic export to the RDF format, making the
data linked. The paper also evaluates machine learning techniques
in conjunction with string similarity functions applied to a table
recognition task.
Comment: 9 pages, 4 figures
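The abstract does not say which string similarity functions were evaluated. As one common choice, a normalised Levenshtein similarity could be used to match a table header cell against a list of known property names. The sketch below is written under that assumption; the function names and the vocabulary are hypothetical:

```python
def levenshtein(a, b):
    """Edit distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the normalised strings are identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(header, vocabulary):
    """Return the vocabulary term most similar to the header, with its score."""
    scored = ((term, similarity(header, term)) for term in vocabulary)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical header cell with a typo, matched against hypothetical terms.
print(best_match("Populaton", ["population", "area", "name"]))
```

Scores of this kind could serve as input features for a classifier that decides whether an HTML table holds relational data, which is one way to combine machine learning with string similarity.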
A method for structural-parametric synthesis of neuro-fuzzy networks
Abstract – A method for the structural-parametric synthesis of neuro-fuzzy networks is developed. The proposed method uses decision trees to build a neuro-fuzzy network, is not highly iterative, and does not require solving a multidimensional optimization problem to calculate the network parameters.
When you are citing the document, use the following link http://essuir.sumdu.edu.ua/handle/123456789/2880
Data mining in medical records for the enhancement of strategic decisions: a case study
The impact and popularity of the concept of competition have been increasing in recent decades, and this has raised the importance of making the right decisions for organizations. Decision makers have had to use proper scientific methods instead of intuitive and emotional choices in the decision-making process. In this context, many decision support models and related systems are still being developed to assist strategic management mechanisms. There is also a critical need for automated approaches that make effective and efficient use of massive amounts of data to support corporations and individuals in strategic planning and decision-making. Data mining techniques have been used to uncover hidden patterns and relations, to summarize data in novel ways that are both understandable and useful to executives, and to predict future trends and behaviors in business. There is a large body of research and practice focusing on different data mining techniques and methodologies. In this study, data mining techniques are applied to a large record set extracted from an outpatient clinic’s medical database. In the first phase of the study, the raw data in the record set are collected, preprocessed, cleaned and eventually transformed into a format suitable for data mining. In the second phase, several association rule algorithms are applied to the data set to uncover rules that quantify the relationships between some of the attributes in the medical records. The results are observed, and a comparative analysis of the results from the different association algorithms is made.
The results show that some critical and reasonable relations exist in the outpatient clinic operations of the hospital, which could help hospital management change and improve their managerial strategies regarding the quality of services given to outpatients.
Keywords: Decision Making, Medical Records, Data Mining, Association Rules, Outpatient Clinic.
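Association rules are usually scored by support and confidence. These are the standard measures, and the abstract does not state which ones the study reports. A minimal sketch of both, assuming each medical record has been reduced to a set of attribute values; the attribute names below are invented for illustration:

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical outpatient records, one set of attribute values per visit.
records = [
    frozenset({"age>60", "cardiology", "follow_up"}),
    frozenset({"age>60", "cardiology"}),
    frozenset({"age<=60", "dermatology"}),
    frozenset({"age>60", "cardiology", "follow_up"}),
]

print(support(records, {"age>60", "cardiology"}))          # 0.75
print(confidence(records, {"age>60"}, {"cardiology"}))     # 1.0
print(confidence(records, {"cardiology"}, {"follow_up"}))  # 2 of 3 visits
```

Algorithms such as Apriori search for itemsets whose support exceeds a chosen threshold and then keep the rules whose confidence is high enough, so comparing algorithms on the same thresholds is one way to run the comparative analysis described above.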
A General Framework for Fair Regression
Fairness, through its many forms and definitions, has become an important
issue facing the machine learning community. In this work, we consider how to
incorporate group fairness constraints in kernel regression methods, applicable
to Gaussian processes, support vector machines, neural network regression and
decision tree regression. Further, we focus on examining the effect of
incorporating these constraints in decision tree regression, with direct
applications to random forests and boosted trees among other widely used
inference techniques. We show that the order of complexity of memory
and computation is preserved for such models and tightly bound the expected
perturbations to the model in terms of the number of leaves of the trees.
Importantly, the approach works on trained models and hence can be easily
applied to models in current use, and group labels are only required on training
data.
Comment: 8 pages, 4 figures, 2 pages of references