Batch-Incremental Learning for Mining Data Streams
The data stream model for data mining places harsh restrictions on a learning algorithm. First, a model must be induced incrementally. Second, processing time for instances must keep up with their speed of arrival. Third, a model may only use a constant amount of memory, and must be ready for prediction at any point in time. We attempt to overcome these restrictions by presenting a data stream classification algorithm where the data is split into a stream of disjoint batches. Single batches of data can be processed one after the other by any standard non-incremental learning algorithm. Our approach uses ensembles of decision trees. These tree ensembles are iteratively merged into a single interpretable model of constant maximal size. Using benchmark datasets, the algorithm is evaluated for accuracy against state-of-the-art algorithms that make use of the entire dataset.
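The batch-wise idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: instead of merging decision trees into a single model, it keeps memory bounded by evicting the oldest per-batch model, and it uses a hypothetical one-feature threshold stump (`train_stump`) as a stand-in for "any standard non-incremental learning algorithm". The names `BatchIncrementalEnsemble`, `learn_one`, and `predict` are illustrative assumptions.

```python
from collections import Counter, deque
from typing import Callable, Deque, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]      # (feature vector, class label)
Model = Callable[[Sequence[float]], int]    # maps a feature vector to a label


def train_stump(batch: List[Instance]) -> Model:
    """Hypothetical batch learner: threshold feature 0 at the batch mean."""
    threshold = sum(x[0] for x, _ in batch) / len(batch)
    overall = Counter(y for _, y in batch).most_common(1)[0][0]
    left = Counter(y for x, y in batch if x[0] <= threshold)
    right = Counter(y for x, y in batch if x[0] > threshold)
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return lambda x: left_label if x[0] <= threshold else right_label


class BatchIncrementalEnsemble:
    """Buffers the stream into disjoint batches and trains one model per batch.

    Assumption: memory is bounded by keeping at most `max_models` models and
    dropping the oldest one, which is simpler than the tree merging described
    in the abstract.
    """

    def __init__(self, batch_size: int, max_models: int,
                 learner: Callable[[List[Instance]], Model]) -> None:
        self.batch_size = batch_size
        self.learner = learner
        self.buffer: List[Instance] = []
        self.models: Deque[Model] = deque(maxlen=max_models)

    def learn_one(self, x: Sequence[float], y: int) -> None:
        """Consume one instance; train a new model when a batch is full."""
        self.buffer.append((x, y))
        if len(self.buffer) == self.batch_size:
            self.models.append(self.learner(self.buffer))
            self.buffer = []  # batches are disjoint

    def predict(self, x: Sequence[float], default: int = 0) -> int:
        """Majority vote over the stored models; `default` before any batch."""
        if not self.models:
            return default
        votes = Counter(model(x) for model in self.models)
        return votes.most_common(1)[0][0]
```

The model is available for prediction at any time because `predict` only reads the models trained so far; the partially filled buffer is ignored until its batch completes.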
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
Weka: A machine learning workbench for data mining
The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user interfaces are provided for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing configurations for streamed data processing. These interfaces constitute an advanced environment for experimental data mining. The system is written in Java and distributed under the terms of the GNU General Public License
Joint Transaction Transmission and Channel Selection in Cognitive Radio Based Blockchain Networks: A Deep Reinforcement Learning Approach
To ensure that the data aggregation, data storage, and data processing are
all performed in a decentralized but trusted manner, we propose to use the
blockchain with the mining pool to support IoT services based on cognitive
radio networks. As such, the secondary user can send its sensing data, i.e.,
transactions, to the mining pools. After being verified by miners, the
transactions are added to the blocks. However, under the dynamics of the
primary channel and the uncertainty of the mempool state of the mining pool, it
is challenging for the secondary user to determine an optimal transaction
transmission policy. In this paper, we propose to use the deep reinforcement
learning algorithm to derive an optimal transaction transmission policy for the
secondary user. Specifically, we adopt a Double Deep-Q Network (DDQN) that
allows the secondary user to learn the optimal policy. The simulation results
clearly show that the proposed deep reinforcement learning algorithm
outperforms the conventional Q-learning scheme in terms of reward and learning
speed.
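For readers unfamiliar with Double Deep-Q Networks, the following sketch shows only the generic difference between a single-estimator bootstrap target and the Double Q-learning target, which selects the next action with one set of value estimates and evaluates it with another. The abstract does not define states, actions, or rewards, so the function names and the plain lists of action values used here are illustrative assumptions, not the paper's formulation.

```python
from typing import Sequence


def single_estimator_target(reward: float, gamma: float,
                            next_q: Sequence[float], done: bool) -> float:
    """Q-learning style target: the same estimates select and evaluate."""
    if done:
        return reward
    return reward + gamma * max(next_q)


def double_q_target(reward: float, gamma: float,
                    next_q_online: Sequence[float],
                    next_q_target: Sequence[float], done: bool) -> float:
    """Double Q-learning target.

    The online estimates choose the greedy next action; the target
    estimates supply the value of that action.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

With reward 1.0, discount 0.5, online values [1.0, 3.0], and target values [4.0, 2.0], the double target is 1.0 + 0.5 * 2.0 = 2.0, whereas taking the maximum over the target values alone gives 1.0 + 0.5 * 4.0 = 3.0. Decoupling selection from evaluation in this way is intended to reduce overestimation of action values.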
Robust Learning from Bites for Data Mining
Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions according to the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. Our main focus is on robust general support vector machines (SVM) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful to fit robust estimators in parametric models for huge data sets.
Keywords: breakdown point, convex risk minimization, data mining, distributed computing, influence function, logistic regression, robustness, scalability
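A rough sketch of the general idea follows, under stated assumptions: the data is split into disjoint subsets ("bites"), a model is fitted to each, and the per-bite predictions are summarised by their median together with an order-statistic confidence interval. The order-statistic interval is one standard distribution-free construction; the abstract does not say which construction the authors use. The helper names `bite_predictions` and `median_ci`, and the `fit` callback, are hypothetical.

```python
from math import comb
from statistics import median
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]                      # (x, y)
Fit = Callable[[List[Point]], Callable[[float], float]]


def bite_predictions(data: Sequence[Point], n_bites: int,
                     fit: Fit, x_new: float) -> List[float]:
    """Fit one model per disjoint bite and predict at x_new with each."""
    bites = [list(data[i::n_bites]) for i in range(n_bites)]
    return [fit(bite)(x_new) for bite in bites]


def median_ci(values: Sequence[float],
              confidence: float = 0.95) -> Tuple[float, float]:
    """Distribution-free confidence interval for the median.

    Returns (x_(k), x_(n-k+1)) for the largest k such that
    P(Binomial(n, 1/2) <= k - 1) <= (1 - confidence) / 2.
    """
    xs = sorted(values)
    n = len(xs)
    tail = (1.0 - confidence) / 2.0
    cdf, k = 0.0, 0
    for j in range(n // 2 + 1):
        cdf += comb(n, j) / 2 ** n
        if cdf > tail:
            break
        k = j + 1
    if k == 0:
        raise ValueError("too few values for the requested confidence")
    return xs[k - 1], xs[n - k]


def summarise(values: Sequence[float],
              confidence: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """Median of the per-bite predictions plus its confidence interval."""
    return median(values), median_ci(values, confidence)
```

Because each bite is fitted independently, the calls to `fit` could be distributed over several processors, and only one bite needs to be held in memory at a time.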
Data mining technology for the evaluation of web-based teaching and learning systems
Instructional design for Web-based teaching and learning environments causes problems for two reasons. Firstly, virtual forms of teaching and learning result in little or no direct contact between instructor and learner, making the evaluation of course effectiveness difficult. Secondly, the Web as a relatively new teaching and learning medium still requires more research into learning processes with this technology. We propose data mining (techniques to discover and extract knowledge from a database) as a tool to support the analysis of student learning processes and the evaluation of the effectiveness and usability of Web-based courses. We present and illustrate different data mining techniques for the evaluation of Web-based teaching and learning systems.