Parallel Hierarchical Affinity Propagation with MapReduce
The accelerated evolution and explosion of the Internet and social media is
generating voluminous quantities of data (on zettabyte scales). Paramount
amongst the desires to manipulate and extract actionable intelligence from vast
big data volumes is the need for scalable, performance-conscious analytics
algorithms. To directly address this need, we propose a novel MapReduce
implementation of the exemplar-based clustering algorithm known as Affinity
Propagation. Our parallelization strategy extends to the multilevel
Hierarchical Affinity Propagation algorithm and enables tiered aggregation of
unstructured data with minimal free parameters, in principle requiring only a
similarity measure between data points. We detail the linear run-time
complexity of our approach, overcoming the limiting quadratic complexity of the
original algorithm. Experimental validation of our clustering methodology on a
variety of synthetic and real data sets (e.g. images and point data)
demonstrates our competitiveness against other state-of-the-art MapReduce
clustering techniques.
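For orientation, the message-passing core of Affinity Propagation can be sketched on a single machine as follows. This is a minimal sketch of the standard (flat, quadratic-cost) responsibility and availability updates, not the MapReduce or hierarchical variant the abstract proposes. The function name, the `preference` and `damping` parameters, and the toy data are assumptions made for illustration.

```python
def affinity_propagation(similarity, preference, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it selects.

    `similarity` is an n x n matrix (n >= 2); `preference` is placed on the
    diagonal and controls how readily a point becomes an exemplar.
    """
    n = len(similarity)
    # Copy the similarities and put the shared preference on the diagonal.
    S = [[preference if i == k else similarity[i][k] for k in range(n)]
         for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)

        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Each point picks the candidate k that maximises a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy usage (hypothetical data): two well-separated groups on a line,
# with negative squared distance as the similarity measure.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
similarity = [[-(x - y) ** 2 for y in points] for x in points]
labels = affinity_propagation(similarity, preference=-1.0)
```

Both update loops touch every pair of points, which is the quadratic cost the abstract refers to; only the similarity measure and the preference need to be supplied.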
Ontology mining for personalized search
Knowledge discovery for user information needs in users' local information repositories is a challenging task. Traditional data mining techniques cannot provide a satisfactory solution, because the local information repositories contain many uncertainties. In this chapter, we introduce ontology mining, a new methodology for addressing this issue, which aims to discover interesting and useful knowledge in databases in order to meet the specified constraints on an ontology. In this way, users can efficiently specify their information needs on the ontology rather than dig useful knowledge out of the huge amount of discovered patterns or rules. The proposed ontology mining model is evaluated by applying it to an information gathering system, and the results are promising.
Curbing domestic violence: instantiating C-K theory with formal concept analysis and emergent self organizing maps.
In this paper we propose a human-centered process for knowledge discovery from unstructured text that makes use of Formal Concept Analysis and Emergent Self Organizing Maps. The knowledge discovery process is conceptualized and interpreted as successive iterations through the Concept-Knowledge (C-K) theory design square. To illustrate its effectiveness, we report on a real-life case study of using the process at the Amsterdam-Amstelland police in the Netherlands aimed at distilling concepts to identify domestic violence from the unstructured text in actual police reports. The case study allows us to show how the process was not only able to uncover the nature of a phenomenon such as domestic violence, but also enabled analysts to identify many types of anomalies in the practice of policing. We will illustrate how the insights obtained from this exercise resulted in major improvements in the management of domestic violence cases.
Keywords: Formal concept analysis; Emergent self organizing map; C-K theory; Text mining; Actionable knowledge discovery; Domestic violence
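To make the Formal Concept Analysis ingredient concrete, the sketch below enumerates the formal concepts of a tiny document-term context by brute force. It illustrates only the basic FCA derivation operators, not the authors' process, and it does not cover Emergent Self Organizing Maps or C-K theory. The document ids, term names, and function names are hypothetical.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in `objs`."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """All (extent, intent) pairs closed under the two derivation operators.

    Brute force over attribute subsets, so only suitable for toy contexts.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(combo))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


# Hypothetical context: which terms occur in which documents.
context = {
    "doc1": {"term_a", "term_b"},
    "doc2": {"term_b", "term_c"},
    "doc3": {"term_a", "term_b", "term_c"},
}
concepts = formal_concepts(context)
```

Each resulting pair groups a maximal set of documents with the maximal set of terms they all share, which is the kind of structure an analyst would then inspect.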
Exploring Constraints Inconsistence for Value Decomposition and Dimension Selection Using Subspace Clustering
Abstract: Datasets in object-attribute-time form are referred to as three-dimensional (3D) data sets. Because 3D data sets contain many timestamps, they are very difficult to cluster, so a subspace clustering method is applied. Existing algorithms are inadequate for this clustering problem: most of them are not actionable (able to suggest profitable or beneficial actions), and the 3D structure complicates the clustering process. To cluster these 3D data sets, the proposed system introduces a new centroid-based concept called PCA. The PCA framework is introduced to provide excellent performance on financial and stock domain datasets through a unique combination of Singular Value Decomposition, Principal Component Analysis and 3D frequent itemset mining. The PCA framework prunes the entire search space to identify the significant subspaces and clusters the datasets based on the optimal centroid value. The framework also acts as a parallelization technique to tackle space and time complexity.
Strategic activities in support of young French SMEs
In this paper we closely study young French Small and Medium Enterprises (SMEs). We highlight the structure of these target firms and build a typology of the corresponding business models. The business models stemming from this typology are typical (to the greatest extent possible) and actionable. We are particularly interested in identifying groups of SMEs where government assistance would be particularly effective and strategically valuable for the national economy. One of our conclusions is that the typology is not based on a classical growth model that reflects progressive phases of development in the life of a young firm. Furthermore, it is ineffective and wasteful to focus government assistance efforts on firms based on their age. We identify groups of business models where assistance would be more efficient and strategically more effective.
Keywords: SME; growth; growth model; typology
Machine learning for Internet of Things data analysis: A survey
Rapid developments in hardware, software, and communication technologies have
allowed the emergence of Internet-connected sensory devices that provide
observation and data measurement from the physical world. By 2020, it is
estimated that the total number of Internet-connected devices being used will
be between 25 and 50 billion. As the numbers grow and technologies become more
mature, the volume of data published will increase. Internet-connected devices
technology, referred to as Internet of Things (IoT), continues to extend the
current Internet by providing connectivity and interaction between the physical
and cyber worlds. In addition to increased volume, the IoT generates Big Data
characterized by velocity in terms of time and location dependency, with a
variety of multiple modalities and varying data quality. Intelligent processing
and analysis of this Big Data is the key to developing smart IoT applications.
This article assesses the different machine learning methods that deal with the
challenges in IoT data by considering smart cities as the main use case. The
key contribution of this study is the presentation of a taxonomy of machine
learning algorithms explaining how different techniques are applied to the data
in order to extract higher level information. The potential and challenges of
machine learning for IoT data analytics will also be discussed. A use case of
applying Support Vector Machine (SVM) on Aarhus Smart City traffic data is
presented for a more detailed exploration. Comment: Digital Communications and Networks (2017)