45,098 research outputs found
Recommended from our members
The effect of missing values using genetic programming on evolvable diagnosis
Medical databases usually contain missing values due the policy of
reducing stress and harm to the patient. In practice missing values has been a
problem mainly due to the necessity to evaluate mathematical equations obtained
by genetic programming. The solution to this problem is to use fill in methods to
estimate the missing values. This paper analyses three fill in methods: (1) attribute
means, (2) conditional means, and (3) random number generation. The methods
are evaluated using sensitivity, specificity, and entropy to explain the exchange in
knowledge of the results. The results are illustrated based on the breast cancer
database. Conditional means produced the best fill in experimental results
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
CASP-DM: Context Aware Standard Process for Data Mining
We propose an extension of the Cross Industry Standard Process for Data
Mining (CRISPDM) which addresses specific challenges of machine learning and
data mining for context and model reuse handling. This new general
context-aware process model is mapped with CRISP-DM reference model proposing
some new or enhanced outputs
A log mining approach for process monitoring in SCADA
SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions, which look legitimate, but which are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach of log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow
Using machine learning techniques to automate sky survey catalog generation
We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10(exp 7) galaxies and 10(exp 8) stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data
Data-driven design of intelligent wireless networks: an overview and tutorial
Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves
- …