A Survey of Heterogeneous Information Network Analysis
Most real systems consist of a large number of interacting, multi-typed
components, while most contemporary research models them as homogeneous
networks, without distinguishing different types of objects and links in the
networks. Recently, more and more researchers have begun to consider these
interconnected, multi-typed data as heterogeneous information networks, and
develop structural analysis approaches by leveraging the rich semantic meaning
of structural types of objects and links in the networks. Compared to the widely
studied homogeneous network, the heterogeneous information network contains
richer structure and semantic information, which provides plenty of
opportunities as well as a lot of challenges for data mining. In this paper, we
provide a survey of heterogeneous information network analysis. We will
introduce basic concepts of heterogeneous information network analysis, examine
its developments on different data mining tasks, discuss some advanced topics,
and point out some future research directions.
Comment: 45 pages, 12 figures
A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents
This paper offers a multi-disciplinary review of knowledge acquisition
methods in human activity systems. The review captures the degree of
involvement of various types of agencies in the knowledge acquisition process,
and proposes a classification with three categories of methods: the human
agent, the human-inspired agent, and the autonomous machine agent methods. In
the first two categories, the acquisition of knowledge is seen as a cognitive
task analysis exercise, while in the third category knowledge acquisition is
treated as an autonomous knowledge-discovery endeavour. The motivation for this
classification stems from the continuous change over time of the structure,
meaning and purpose of human activity systems, which is seen as the factor
that has fuelled researchers' and practitioners' efforts in knowledge
acquisition for more than a century.
We show through this review that the knowledge acquisition (KA) field is
increasingly active due to the ever-increasing pace of change in human
activity, and conclude by discussing the emergence of a fourth category of
knowledge acquisition methods, which are based on red-teaming and co-evolution.
State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques"
Several Networks of Excellence have been set up in the framework of the
European FP5 research program. Among these Networks of Excellence, the NEMIS
project focuses on the field of Text Mining.
Within this field, document processing and visualization was identified as
one of the key topics and the WG1 working group was created in the NEMIS
project, to carry out a detailed survey of techniques associated with the text
mining process and to identify the relevant research topics in related research
areas.
In this document we present the results of this comprehensive survey. The
report includes a description of the current state-of-the-art and practice, a
roadmap for follow-up research in the identified areas, and recommendations for
anticipated technological development in the domain of text mining.
Comment: 54 pages, Report of Working Group 1 for the European Network of
Excellence (NoE) in Text Mining and its Applications in Statistics (NEMIS)
Toward a Distributed Knowledge Discovery system for Grid systems
During the last decade or so, we have had a deluge of data from not only
science fields but also industry and commerce fields. Although the amount of
data available to us is constantly increasing, processing it becomes more and
more difficult. Efficient discovery of useful knowledge from these datasets is
therefore becoming a challenge and a massive economic need. This has led to
the need to develop large-scale data mining (DM) techniques to deal with these
huge datasets from either science or economic applications. In this chapter,
we present a new distributed data mining (DDM) system combining dataset-driven
and architecture-driven strategies. Dataset-driven strategies consider the size
and heterogeneity of the data, while architecture-driven strategies focus on
the distribution of the datasets. This system is based on Grid middleware
tools that integrate appropriate large data manipulation operations. It
therefore allows more dynamicity and autonomicity during the mining,
integrating and processing phases.
Network Vector: Distributed Representations of Networks with Global Context
We propose a neural embedding algorithm called Network Vector, which learns
distributed representations of nodes and the entire networks simultaneously. By
embedding networks in a low-dimensional space, the algorithm allows us to
compare networks in terms of structural similarity and to solve outstanding
predictive problems. Unlike alternative approaches that focus on node level
features, we learn a continuous global vector that captures each node's global
context by maximizing the predictive likelihood of random walk paths in the
network. Our algorithm is scalable to real world graphs with many nodes. We
evaluate our algorithm on datasets from diverse domains, and compare it with
state-of-the-art techniques in node classification, role discovery and concept
analogy tasks. The empirical results show the effectiveness and the efficiency
of our algorithm.
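The abstract does not give implementation details, so the following is only a minimal sketch of the one ingredient it names explicitly: sampling random walk paths from a network. The function name `random_walks`, its parameters, and the adjacency-list input format are assumptions made for illustration and are not taken from the Network Vector paper. The embedding training step itself is not shown.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks from every node of a graph.

    adjacency: dict mapping each node to a list of neighbouring nodes
               (hypothetical input format chosen for this sketch).
    Returns a list of walks, each walk being a list of nodes.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Small undirected path graph a - b - c used as a usage example.
example_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
example_walks = random_walks(example_graph, walk_length=4, walks_per_node=2)
```

In a random-walk-based embedding method, walks such as these would serve as the "sentences" from which node vectors (and, per the abstract, a global network vector) are learned.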
Multi Relational Data Mining Approaches: A Data Mining Technique
The multi-relational data mining (MRDM) approach has developed as an
alternative way of handling structured data such as that held in an RDBMS. It
allows mining to be performed on multiple tables directly. In MRDM, the
patterns span multiple tables (relations) of a relational database. Because the
data are spread over many tables, many problems arise in the practice of data
mining. To deal with this, one either constructs a single table by
propositionalisation or uses a multi-relational data mining algorithm. MRDM
approaches have been successfully applied in the area of bioinformatics. Three
popular pattern-finding techniques, classification, clustering and association,
are frequently used in MRDM. Using MRDM avoids expensive join operations and
semantic losses. This paper focuses on some of the application areas of MRDM
and future directions, as well as a comparison of ILP, GM, SSDM and MRDM.
Comment: 10 pages, 1 figure, 3 tables. Published with International Journal of
Computer Applications (IJCA)
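To illustrate what constructing a single table by propositionalisation can look like, here is a minimal sketch under stated assumptions: the two tables (`customers`, `orders`), their column names, and the chosen aggregates are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def propositionalise(customers, orders):
    """Flatten a one-to-many relation into one row per customer.

    customers: list of dicts with an "id" key (hypothetical schema).
    orders:    list of dicts with "customer_id" and "amount" keys.
    Each customer's orders are summarised by a count and a total.
    """
    totals = defaultdict(lambda: {"n_orders": 0, "total_amount": 0.0})
    for order in orders:
        agg = totals[order["customer_id"]]
        agg["n_orders"] += 1
        agg["total_amount"] += order["amount"]

    rows = []
    for customer in customers:
        # Customers without any orders receive zero-valued aggregates.
        agg = totals.get(customer["id"], {"n_orders": 0, "total_amount": 0.0})
        rows.append({**customer, **agg})
    return rows


customers = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
]
flat_table = propositionalise(customers, orders)
```

The flattened table can be fed to any single-table learner, but the individual orders are no longer visible. That is one form of the semantic loss the abstract mentions as a motivation for mining the multiple tables directly.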
Combining complex networks and data mining: why and how
The increasing power of computer technology does not dispense with the need
to extract meaningful information out of data sets of ever growing size, and
indeed typically exacerbates the complexity of this task. To tackle this
general problem, two methods have emerged, at chronologically different times,
that are now commonly used in the scientific community: data mining and complex
network theory. Not only do complex network analysis and data mining share the
same general goal, that of extracting information from complex systems to
ultimately create a new compact quantifiable representation, but they also
often address similar problems. In the face of that, a surprisingly low
number of researchers turn out to resort to both methodologies. One may then be
tempted to conclude that these two fields are either largely redundant or
totally antithetic. The starting point of this review is that this state of
affairs should be put down to contingent rather than conceptual differences,
and that these two fields can in fact advantageously be used in a synergistic
manner. An overview of both fields is first provided, some fundamental concepts
of which are illustrated. A variety of contexts in which complex network theory
and data mining have been used in a synergistic manner are then presented.
Contexts in which the appropriate integration of complex network metrics can
lead to improved classification rates with respect to classical data mining
algorithms and, conversely, contexts in which data mining can be used to tackle
important issues in complex network theory applications are illustrated.
Finally, ways to achieve a tighter integration between complex networks and
data mining, and open lines of research are discussed.
Comment: 58 pages, 19 figures
ABACUS: frequent pAttern mining-BAsed Community discovery in mUltidimensional networkS
Community Discovery in complex networks is the problem of detecting, for each
node of the network, its membership to one or more groups of nodes, the
communities, that are densely connected, or highly interactive, or, more
generally, similar, according to a similarity function. So far, the problem has
been widely studied in monodimensional networks, i.e. networks where only one
connection between two entities can exist. However, real networks are often
multidimensional, i.e., multiple connections between any two nodes can exist,
either reflecting different kinds of relationships, or representing different
values of the same type of tie. In this context, the problem of Community
Discovery has to be redefined, taking into account the multidimensional
structure of the graph. We define a new concept of community that groups
together nodes
sharing memberships to the same monodimensional communities in the different
single dimensions. As we show, such communities are meaningful and able to
group highly correlated nodes, even if they might not be connected in any of
the monodimensional networks. We devise ABACUS (Apriori-BAsed Community
discoverer in mUltidimensional networkS), an algorithm that is able to extract
multidimensional communities based on the apriori itemset miner applied to
monodimensional community memberships. Experiments on two different real
multidimensional networks confirm the meaningfulness of the introduced
concepts, and open the way for a new class of algorithms for community
discovery that do not rely on the dense connections among nodes.
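The idea of mining frequent itemsets over monodimensional community memberships can be sketched as follows. This is not the authors' ABACUS implementation: the function name, the `(dimension, community_id)` label format, and the thresholds are assumptions, and the level-wise search is a simplified Apriori-style loop without the usual subset-pruning step.

```python
from collections import defaultdict
from itertools import combinations


def multidimensional_communities(memberships, min_support=2, min_size=2):
    """Group nodes that share the same monodimensional communities.

    memberships: dict mapping node -> set of (dimension, community_id)
                 labels (hypothetical format for this sketch).
    Returns a dict mapping each frequent label set (a frozenset with at
    least `min_size` labels) to the set of nodes carrying all its labels.
    """
    # Level 1: nodes supporting each single label.
    support = defaultdict(set)
    for node, labels in memberships.items():
        for label in labels:
            support[frozenset([label])].add(node)
    current = {k: v for k, v in support.items() if len(v) >= min_support}

    result = {}
    size = 1
    while current:
        if size >= min_size:
            result.update(current)
        # Join frequent itemsets of the current size into larger candidates.
        candidates = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size + 1 and union not in candidates:
                nodes = current[a] & current[b]  # nodes holding every label
                if len(nodes) >= min_support:
                    candidates[union] = nodes
        current = candidates
        size += 1
    return result


memberships = {
    "a": {("d1", 0), ("d2", 0)},
    "b": {("d1", 0), ("d2", 0)},
    "c": {("d1", 0), ("d2", 1)},
}
communities = multidimensional_communities(memberships)
```

In this toy input, nodes `a` and `b` end up in one multidimensional community because they share memberships in both dimensions, regardless of whether they are directly connected in either one.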
The Survey of Data Mining Applications And Feature Scope
In this paper we focus on a variety of techniques, approaches and research
areas that are helpful and regarded as important in the field of data mining
technologies. Many multinational companies and large organizations operate in
different places in different countries. Each place of operation may generate
large volumes of data. Corporate decision makers require access to all such
sources in order to take strategic decisions. The data warehouse delivers
significant business value by improving the effectiveness of managerial
decision-making. In an uncertain and highly competitive business environment,
the value of strategic information systems such as these is easily recognized;
however, in today's business environment, efficiency or speed is not the only
key to competitiveness. Such huge amounts of data, on the order of terabytes
to petabytes, have drastically changed the areas of science and engineering.
To analyze, manage and make decisions from such huge amounts of data, we need
techniques called data mining, which are transforming many fields. This paper
presents a number of applications of data mining and also focuses on the scope
of data mining, which will be helpful in further research.
Comment: International Journal of Computer Science, Engineering and
Information Technology (IJCSEIT), Vol.2, No.3, June 2012, 16 pages, 1 table
Online Machine Learning in Big Data Streams
The area of online machine learning in big data streams covers algorithms
that (1) are distributed and (2) work from data streams with only a limited
possibility to store past data. The first requirement mostly concerns software
architectures and efficient algorithms. The second one also imposes nontrivial
theoretical restrictions on the modeling methods: In the data stream model,
older data is no longer available to revise earlier suboptimal modeling
decisions as the fresh data arrives.
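The data stream restriction described above can be made concrete with a minimal single-machine sketch: a logistic regression model updated by stochastic gradient descent, where each example is seen once and then discarded. The class name, method names, and learning rate are assumptions for illustration; the sketch does not reproduce any particular library discussed in the article, and it leaves out the distributed aspect entirely.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression trained one example at a time (no stored history)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # One gradient step on the log loss for this single example;
        # the example is not kept, so earlier decisions cannot be revisited.
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


# Usage: consume a (simulated) stream of labelled points exactly once.
model = OnlineLogisticRegression(n_features=1)
stream = [([1.0], 1), ([-1.0], 0)] * 200
for features, label in stream:
    model.learn_one(features, label)
```

Memory use stays constant in the length of the stream, which is the property that makes this style of update suitable when past data cannot be stored.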
In this article, we provide an overview of distributed software architectures
and libraries as well as machine learning models for online learning. We
highlight the most important ideas for classification, regression,
recommendation, and unsupervised modeling from streaming data, and we show how
they are implemented in various distributed data stream processing systems.
This article is reference material and not a survey. We do not attempt to
be comprehensive in describing all existing methods and solutions; rather, we
give pointers to the most important resources in the field. All related
sub-fields, online algorithms, online learning, and distributed data processing
are hugely dominant in current research and development with conceptually new
research results and software components emerging at the time of writing. In
this article, we refer to several survey results, both for distributed data
processing and for online machine learning. Compared to past surveys, our
article is different because we discuss recommender systems in extended detail
- …