8,466 research outputs found
Data Mining Using Relational Database Management Systems
Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as back-tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transfered into the database system to speed up execution. We tested the extended system, refered to as WekaDB, and our results show that it achieves a much higher scalability than Weka, while providing the same output and maintaining good computation time
Enhancing Undergraduate AI Courses through Machine Learning Projects
It is generally recognized that an undergraduate introductory Artificial Intelligence course is challenging to teach. This is, in part, due to the diverse and seemingly disconnected core topics that are typically covered. The paper presents work funded by the National Science Foundation to address this problem and to enhance the student learning experience in the course. Our work involves the development of an adaptable framework for the presentation of core AI topics through a unifying theme of machine learning. A suite of hands-on semester-long projects are developed, each involving the design and implementation of a learning system that enhances a commonly-deployed application. The projects use machine learning as a unifying theme to tie together the core AI topics. In this paper, we will first provide an overview of our model and the projects being developed and will then present in some detail our experiences with one of the projects – Web User Profiling which we have used in our AI class
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in IT industry, this idea has been developed to design
a Traffic Classification Method using Data Mining techniques at the intersection of Machine
Learning Algorithm, Which will classify the normal and malicious traffic. This classification will
help to learn about the unknown attacks faced by IT industry. The notion of traffic classification
is not a new concept; plenty of work has been done to classify the network traffic for
heterogeneous application nowadays. Existing techniques such as (payload based, port based
and statistical based) have their own pros and cons which will be discussed in this
literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now
Review of analytical instruments for EEG analysis
Since it was first used in 1926, EEG has been one of the most useful
instruments of neuroscience. In order to start using EEG data we need not only
EEG apparatus, but also some analytical tools and skills to understand what our
data mean. This article describes several classical analytical tools and also
new one which appeared only several years ago. We hope it will be useful for
those researchers who have only started working in the field of cognitive EEG
Mining Bad Credit Card Accounts from OLAP and OLTP
Credit card companies classify accounts as a good or bad based on historical
data where a bad account may default on payments in the near future. If an
account is classified as a bad account, then further action can be taken to
investigate the actual nature of the account and take preventive actions. In
addition, marking an account as "good" when it is actually bad, could lead to
loss of revenue - and marking an account as "bad" when it is actually good,
could lead to loss of business. However, detecting bad credit card accounts in
real time from Online Transaction Processing (OLTP) data is challenging due to
the volume of data needed to be processed to compute the risk factor. We
propose an approach which precomputes and maintains the risk probability of an
account based on historical transactions data from offline data or data from a
data warehouse. Furthermore, using the most recent OLTP transactional data,
risk probability is calculated for the latest transaction and combined with the
previously computed risk probability from the data warehouse. If accumulated
risk probability crosses a predefined threshold, then the account is treated as
a bad account and is flagged for manual verification.Comment: Conference proceedings of ICCDA, 201
Automated data pre-processing via meta-learning
The final publication is available at link.springer.comA data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around.
As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives and nonexperienced users become overwhelmed.
We show that this problem can be addressed by an automated approach, leveraging ideas from metalearning.
Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result
of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.Peer ReviewedPostprint (published version
- …