Using Multi-Core HW/SW Co-design Architecture for Accelerating K-means Clustering Algorithm
The capability of classifying and clustering a desired set of data is an
essential part of building knowledge from data. However, as the size and
dimensionality of input data increase, the run-time of such clustering
algorithms is expected to grow superlinearly, making it a big challenge when
dealing with Big Data. K-means clustering is an essential tool for many big data
applications, including data mining, predictive analysis, forecasting studies,
and machine learning. However, due to the large size (volume) of Big Data and
the large dimensionality of its data points, even the application of a simple
K-means clustering may become extremely time- and resource-demanding, especially
when a fast and modular dataset analysis flow is needed. In this
paper, we demonstrate that a two-level filtering algorithm based on a
binary kd-tree structure can decrease the convergence time of the K-means
algorithm for large datasets. The two-level filtering algorithm based on a binary
kd-tree structure enables the software (SW) to naturally divide the classification
into smaller data sets, based on the number of available cores and the amount of
logic available in the target FPGA. The empirical results for this two-level
structure on a multi-core FPGA-based architecture show a 330X speed-up compared
to a conventional software-only solution.
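For reference, a minimal sketch of the plain Lloyd iteration that the two-level kd-tree filtering is meant to accelerate. It does not reproduce the filtering algorithm or the HW/SW partitioning; the function names (`kmeans`, `squared_distance`) and the random initialization from data points are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: the O(n*k*d) nearest-centroid scan that kd-tree
        # filtering tries to prune by discarding candidate centroids early.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are already stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Every iteration touches all n points and all k centroids; the filtering approach described above aims to avoid most of these distance computations by pruning centroids per kd-tree cell.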
FI-GRL: Fast Inductive Graph Representation Learning via Projection-Cost Preservation
Graph representation learning aims at transforming graph data into meaningful
low-dimensional vectors to facilitate the employment of machine learning and
data mining algorithms designed for general data. Most current graph
representation learning approaches are transductive: they require all nodes
in the graph to be known when learning graph representations, so they cannot
naturally generalize to unseen
nodes. In this paper, we present a Fast Inductive Graph Representation Learning
framework (FI-GRL) to learn nodes' low-dimensional representations. Our
approach can obtain accurate representations for seen nodes with provable
theoretical guarantees and can easily generalize to unseen nodes. Specifically,
in order to explicitly decouple nodes' relations expressed by the graph, we
transform nodes into a randomized subspace spanned by a random projection
matrix. This stage is guaranteed to preserve the projection-cost of the
normalized random walk matrix, which is closely related to the normalized cut of
the graph. Feature extraction is then achieved by performing singular value
decomposition on the obtained matrix sketch. By leveraging the property of
projection-cost preservation on the matrix sketch, the obtained representation
is nearly optimal. To deal with unseen nodes, we use a folding-in
technique to learn their meaningful representations. Empirically, when the
number of seen nodes is larger than that of unseen nodes, FI-GRL always
achieves excellent results. Our algorithm is fast, simple to implement and
theoretically guaranteed. Extensive experiments on real datasets demonstrate
the superiority of our algorithm in both efficacy and efficiency, in both
macroscopic-level (clustering) and microscopic-level (structural hole
detection) applications.
Comment: ICDM 2018, Full Version
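A minimal NumPy sketch of the pipeline described above: random projection of a normalized random walk matrix, SVD of the resulting sketch, and folding-in for unseen nodes. Assumptions: the walk matrix is taken as P = D^{-1}A (the paper's exact normalization may differ), the projection is Gaussian, every node has at least one edge, and the names `fi_grl_sketch` and `fold_in` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def fi_grl_sketch(adj: np.ndarray, sketch_dim: int, k: int, seed: int = 0):
    """Embed seen nodes via random projection + truncated SVD (illustrative only)."""
    deg = adj.sum(axis=1)
    # Assumed normalized random-walk matrix P = D^{-1} A (requires deg > 0).
    walk = adj / deg[:, None]
    rng = np.random.default_rng(seed)
    # Gaussian random projection R (n x sketch_dim), scaled so E[R R^T] = I.
    proj = rng.standard_normal((adj.shape[1], sketch_dim)) / np.sqrt(sketch_dim)
    sketch = walk @ proj  # n x sketch_dim matrix sketch
    _, _, vt = np.linalg.svd(sketch, full_matrices=False)
    basis = vt[:k].T  # top-k right singular vectors (sketch_dim x k)
    embeddings = sketch @ basis  # equals U_k * S_k for the seen nodes
    return embeddings, proj, basis


def fold_in(new_links: np.ndarray, proj: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Embed unseen nodes from their links to the seen nodes (one row per node)."""
    rows = new_links / new_links.sum(axis=1, keepdims=True)  # same row normalization
    return (rows @ proj) @ basis
```

Because the seen-node embeddings are the sketch rows projected onto the same basis, folding in a seen node's own adjacency row reproduces its embedding, which is a useful sanity check for the folding-in step.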
An intelligent recommendation system framework for student relationship management
To enhance student satisfaction, many services have been provided to meet student needs. A recommendation system is a significant service that can be used to assist students in several ways. This paper proposes a conceptual framework for an Intelligent Recommendation System to support Student Relationship Management (SRM) at a Thai private university. It presents the system architecture of an Intelligent Recommendation System (IRS) that aims to assist students in choosing an appropriate course for their studies. Moreover, this study intends to compare different data mining techniques in various recommendation systems and to determine appropriate algorithms for the proposed electronic IRS. The IRS also aims to support SRM in the university. The IRS has been designed using data mining and artificial intelligence techniques such as clustering, association rules and classification.
A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents
This paper offers a multi-disciplinary review of knowledge acquisition (KA)
methods in human activity systems. The review captures the degree of
involvement of various types of agencies in the knowledge acquisition process,
and proposes a classification with three categories of methods: the human
agent, the human-inspired agent, and the autonomous machine agent methods. In
the first two categories, the acquisition of knowledge is seen as a cognitive
task analysis exercise, while in the third category knowledge acquisition is
treated as an autonomous knowledge-discovery endeavour. The motivation for this
classification stems from the continuous change over time of the structure,
meaning and purpose of human activity systems, a change seen as the factor
that has fuelled researchers' and practitioners' efforts in knowledge acquisition
for more than a century.
We show through this review that the KA field is increasingly active due to
the ever-increasing pace of change in human activity, and conclude by
discussing the emergence of a fourth category of knowledge acquisition methods,
based on red-teaming and co-evolution.
A Data as a Service (DaaS) Model for GPU-based Data Analytics
Cloud-based services with resources to be provisioned for consumers are
increasingly the norm, especially with respect to Big data, spatiotemporal data
mining and application services that impose a user's agreed Quality of Service
(QoS) rules or Service Level Agreement (SLA). Considering the pervasive nature
of data centers and cloud systems, there is a need for real-time analytics of
these systems that accounts for cost, utility and energy. This work presents an
overlay model of a GPU system for Data as a Service (DaaS) that gives real-time
analysis of network data and of customers', investors' and users' data from
data centers or cloud systems. Using a modeled layer to define a learning
protocol and system, we give a custom, profitable system for DaaS on GPU. As
shown in the results, the GPU-enabled pre-processing and initial operations of
the clustering model analysis are promising. We examine the model on
real-world data sets to model big data or spatiotemporal data mining
services. We also report results of our model using clustering with
self-organizing feature maps (SOFM or SOM), a type of neural network, to produce
a distribution of the clustering for the DaaS model. The experimental results
thus far show a promising model that could enhance SLA- and/or QoS-based DaaS.
Comment: Accepted, 23 December 2017, by the IEEE IFIP NTMS Workshop on Big
Data and Emerging Trends WBD-ET 2018; it was later withdrawn because of
funding issues. An extended/enhanced version will be published at a later
date in a related journal.
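A minimal sketch of a one-dimensional self-organizing map, the kind of SOFM/SOM clustering mentioned above, written in plain Python. The map size, the linear learning-rate and neighbourhood-radius schedules, the assumption that data lie roughly in the unit box, and the names `train_som` and `best_unit` are all illustrative choices; the GPU-based DaaS layer is not modelled.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def best_unit(weights: List[Vector], x: Sequence[float]) -> int:
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def train_som(data: List[Sequence[float]], n_units: int, epochs: int = 50,
              lr0: float = 0.5, radius0: float = 1.0, seed: int = 0) -> List[Vector]:
    """Train a 1-D SOM whose units sit on a line at positions 0..n_units-1."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1)^dim (assumes data roughly in the unit box).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 1e-3)  # shrinking neighbourhood radius
            bmu = best_unit(weights, x)
            for u in range(n_units):
                # Gaussian neighbourhood on the 1-D map grid around the BMU.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights
```

After training, `best_unit(weights, x)` maps each record to a unit, and counting records per unit gives a distribution of the clustering over the map.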
The Survey of Data Mining Applications And Feature Scope
In this paper we focus on a variety of techniques, approaches and research
areas that are helpful and recognized as important fields of data mining
technology. Many multinational companies and large organizations operate in
different places in different countries, and each place of operation may
generate large volumes of data. Corporate decision makers require access to all
such sources to take strategic decisions. The data warehouse delivers significant
business value by improving the effectiveness of managerial decision-making. In an
uncertain and highly competitive business environment, the value of strategic
information systems such as these is easily recognized; however, in today's
business environment, efficiency or speed is not the only key to
competitiveness. Data of this kind is available in volumes ranging from
terabytes to petabytes, which has drastically changed the areas of science and
engineering. To analyze, manage and make decisions on such huge amounts of
data, we need the techniques known as data mining, which are transforming
many fields. This paper presents a number of applications of data mining
and also discusses the scope of data mining for further research.
Comment: International Journal of Computer Science, Engineering and
Information Technology (IJCSEIT), Vol.2, No.3, June 2012, 16 pages, 1 table
Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation
Remote sensing (RS) image retrieval is of great significance for geological
information mining. Over the past two decades, a large amount of research on
this task has been carried out, which mainly focuses on the following three
core issues: feature extraction, similarity metric and relevance feedback. Due
to the complexity and multiformity of ground objects in high-resolution remote
sensing (HRRS) images, there is still room for improvement in the current
retrieval approaches. In this paper, we analyze the three core issues of RS
image retrieval and provide a comprehensive review of existing methods.
Furthermore, with the goal of advancing the state of the art in HRRS image
retrieval, we focus on the feature extraction issue and investigate how to use
powerful deep representations to address this task. We conduct a systematic
investigation of the correlative factors that may affect the performance
of deep features. By optimizing each factor, we obtain remarkable retrieval
results on publicly available HRRS datasets. Finally, we explain the
experimental phenomena in detail and draw conclusions from our
analysis. Our work can serve as a guide for research on
content-based RS image retrieval.
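As an illustration of the similarity-metric step, a minimal sketch that ranks database images by cosine similarity between feature vectors. It assumes the deep features have already been extracted (for example, pooled CNN activations); cosine similarity is one common choice and not necessarily the metric the paper settles on, and `retrieve` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve(query: Sequence[float], database: List[Sequence[float]],
             top_k: int) -> List[int]:
    """Indices of the top_k database features most similar to the query."""
    scores = [cosine_similarity(query, feat) for feat in database]
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```

For example, `retrieve([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]], 2)` ranks the second feature first and the third second. Relevance feedback, the third core issue, would adjust the query or the metric between such rounds and is not shown.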
Grid-based Approaches for Distributed Data Mining Applications
The data mining field is an important source of large-scale applications and
datasets, which are becoming increasingly common. In this paper, we present
grid-based approaches for two basic data mining applications, and a performance
evaluation on an experimental grid environment that provides interesting
monitoring capabilities and configuration tools. We propose a new distributed
clustering approach and a distributed frequent itemset generation method, both
well adapted to grid environments. Performance evaluation is done using the
Condor system and its workflow manager DAGMan. We also compare this performance
analysis to a simple analytical model to evaluate the overheads related to the
workflow engine and the underlying grid system. In particular, this comparison
shows that realistic performance expectations are currently difficult to achieve
on the grid.
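The abstract does not detail its distributed itemset algorithm, so the following is only a sketch of one standard scheme, count distribution: each partition (standing in for a grid node) counts candidate supports locally, a coordinator sums the local counts, and Apriori-style candidate generation produces the next level. Partitions are simulated in-process, and the names `count_partition` and `distributed_frequent_itemsets` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def count_partition(transactions: List[Set[str]], candidates: List[Itemset]) -> Counter:
    """Local support counts for one data partition (one simulated grid node)."""
    counts: Counter = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate itemset is contained in the transaction
                counts[c] += 1
    return counts


def distributed_frequent_itemsets(partitions: List[List[Set[str]]],
                                  min_support: int) -> Dict[Itemset, int]:
    """Apriori with count distribution: local counting, global summation."""
    items = sorted({i for part in partitions for t in part for i in t})
    candidates: List[Itemset] = [frozenset([i]) for i in items]
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        # Each partition counts locally; the coordinator sums the local counts.
        total: Counter = Counter()
        for part in partitions:
            total.update(count_partition(part, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent
```

Only the candidate lists and the per-node count tables cross node boundaries in this scheme, which is why it is a common fit for loosely coupled environments such as a grid.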
Survey of data mining approaches to user modeling for adaptive hypermedia
The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.