Parallel Hierarchical Affinity Propagation with MapReduce

Author: Haber Rana
Mijatovic Nenad
Peter Adrian M.
Rose Dillon Mark
Rouly Jean Michel
Publication date: 28/03/2014

The accelerated evolution and explosion of the Internet and social media are generating voluminous quantities of data (on zettabyte scales). Paramount among the desires to manipulate and extract actionable intelligence from vast big data volumes is the need for scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, overcoming the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates our competitiveness against other state-of-the-art MapReduce clustering techniques.
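
For readers unfamiliar with the algorithm named in this abstract, the following is a minimal single-machine sketch of standard Affinity Propagation message passing. It is not the authors' MapReduce or hierarchical implementation, and it keeps the quadratic cost the abstract says their approach overcomes. The function names, the negative squared distance similarity, the damping factor, and the preference value are all assumptions chosen for illustration.

```python
def similarity_matrix(points, preference):
    """Negative squared distance between 1-D points (an assumed similarity).

    The diagonal holds the 'preference': how willing each point is to
    serve as an exemplar. Lower values tend to give fewer clusters.
    """
    n = len(points)
    return [[preference if i == k else -(points[i] - points[k]) ** 2
             for k in range(n)] for i in range(n)]


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it selects."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated support for k being an exemplar.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate with the largest combined evidence.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]  # two well-separated groups
    print(affinity_propagation(similarity_matrix(pts, preference=-10.0)))
```

Each iteration touches every pair of points, which is the quadratic behaviour that motivates a parallel or linear-time reformulation. Apart from the damping factor and iteration count, the only inputs are the pairwise similarities and the preference on the diagonal, which matches the abstract's remark that the method in principle requires only a similarity measure.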

arXiv.org e-Print Archive

Crossref

Ontology mining for personalized search

Author: Li Yuefeng
Tao Xiaohui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Knowledge discovery for user information needs in local user information repositories is a challenging task. Traditional data mining techniques cannot provide a satisfactory solution, because these repositories contain many uncertainties. In this chapter, we introduce ontology mining, a new methodology for solving this problem, which aims to discover interesting and useful knowledge in databases in order to meet the specified constraints on an ontology. In this way, users can efficiently specify their information needs on the ontology rather than dig useful knowledge out of the huge number of discovered patterns or rules. The proposed ontology mining model is evaluated by applying it to an information gathering system, and the results are promising.

University of Southern Queensland ePrints

Curbing domestic violence: instantiating C-K theory with formal concept analysis and emergent self organizing maps

Author: Dedene Guido
Elzinga Paul
Poelmans Jonas
Viaene Stijn

In this paper we propose a human-centered process for knowledge discovery from unstructured text that makes use of Formal Concept Analysis and Emergent Self Organizing Maps. The knowledge discovery process is conceptualized and interpreted as successive iterations through the Concept-Knowledge (C-K) theory design square. To illustrate its effectiveness, we report on a real-life case study of using the process at the Amsterdam-Amstelland police in the Netherlands aimed at distilling concepts to identify domestic violence from the unstructured text in actual police reports. The case study allows us to show how the process was not only able to uncover the nature of a phenomenon such as domestic violence, but also enabled analysts to identify many types of anomalies in the practice of policing. We will illustrate how the insights obtained from this exercise resulted in major improvements in the management of domestic violence cases.

Keywords: Formal concept analysis; Emergent self organizing map; C-K theory; Text mining; Actionable knowledge discovery; Domestic violence

Research Papers in Economics

Exploring Constraints Inconsistence for Value Decomposition and Dimension Selection Using Subspace Clustering

Author: Dr S Karthik
K Prema
K Sangeetha
Publication date: 02/04/2020

Abstract: Data sets in object-attribute-time form are referred to as three-dimensional (3D) data sets. Because 3D data sets contain many timestamps, they are very difficult to cluster, so a subspace clustering method is applied. Existing algorithms are inadequate for this clustering problem: most of them are not actionable (able to suggest profitable or beneficial actions), and the 3D structure complicates the clustering process. To cluster these 3D data sets, the proposed system introduces a new centroid-based framework called PCA. The PCA framework is intended to provide excellent performance on financial and stock-domain data sets through a unique combination of Singular Value Decomposition, Principal Component Analysis, and 3D frequent itemset mining. It prunes the entire search space to identify the significant subspaces and clusters the data sets based on the optimal centroid value. The framework also acts as a parallelization technique to tackle space and time complexity.

CiteSeerX

Strategic activities in support of young French SMEs

Author: B. Augier
B. Branchet
B. Quere
J.P. Boissin

In this paper we closely study young French Small and Medium Enterprises (SMEs). We highlight the structure of these target firms and build a typology of the corresponding business models. The business models stemming from this typology are typical (to the greatest extent possible) and actionable. We are particularly interested in identifying groups of SMEs where government assistance would be particularly effective and strategically valuable for the national economy. One of our conclusions is that the typology is not based on a classical growth model that reflects progressive phases of development in the life of a young firm. Furthermore, it is ineffective and wasteful to focus government assistance efforts on firms based on their age. We identify groups of business models where assistance would be more efficient and strategically more effective.

Keywords: SME; growth; growth model; typology

Research Papers in Economics

Machine learning for Internet of Things data analysis: A survey

Author: Adibi Peyman
Barekatain Mohammadamin
Barnaghi Payam
Mahdavinejad Mohammad Saeid
Rezvan Mohammadreza
Sheth Amit P.
Publication venue: 'Elsevier BV'
Publication date: 12/10/2017

Rapid developments in hardware, software, and communication technologies have allowed the emergence of Internet-connected sensory devices that provide observation and data measurement from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As the numbers grow and technologies become more mature, the volume of data published will increase. Internet-connected devices technology, referred to as Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case. The key contribution of this study is the presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying Support Vector Machine (SVM) on Aarhus Smart City traffic data is presented for a more detailed exploration.

Comment: Digital Communications and Networks (2017)

arXiv.org e-Print Archive

Directory of Open Access Journals

Scholar Commons - Institutional Repository of the University of South Carolina

University of Surrey

Surrey Research Insight
