
    Nonnegative matrix analysis for data clustering and compression

    Nonnegative matrix factorization (NMF) has become an increasingly popular data processing tool in recent years, widely used by various communities including computer vision, text mining and bioinformatics. It approximates each data sample in a collection by a linear combination of a set of nonnegative basis vectors weighted by nonnegative weights. This often enables meaningful interpretation of the data, motivates useful insights and facilitates tasks such as data compression, clustering and classification. These properties have given NMF various active roles in data analysis, e.g., as a dimensionality reduction tool [11, 75], clustering tool [94, 82, 13, 39], feature engine [40] and source separation tool [38]. Different methods based on NMF are proposed in this thesis. A modification of k-means clustering is chosen as one of the initialisation methods for NMF; experimental results demonstrate the effectiveness of this method, with improved compression performance. Independent principal component analysis (IPCA), which combines the advantages of both principal component analysis (PCA) and independent component analysis (ICA), is chosen as a further initialisation method for NMF, with improved clustering accuracy. We propose a new evolutionary optimisation strategy for NMF driven by three update schemes in the solution space, namely the NMF rule (or original movement), the firefly rule (or beta movement) and the survival-of-the-fittest rule (or best movement). This update strategy addresses both the clustering and the compression problems by using different system objective functions based on clustering and compression quality measurements. A hybrid initialisation approach is used, including state-of-the-art NMF initialisation methods as seed knowledge to increase the rate of convergence; there is no limitation on the number or type of initialisation methods used with the proposed optimisation approach. Numerous computer experiments on benchmark datasets verify the theoretical results and compare the techniques in terms of clustering/compression accuracy. Experimental results demonstrate the effectiveness of these methods, with improved clustering/compression performance. In the application to an EEG dataset, we employ several standard algorithms to cluster preprocessed EEG data, and we also explore ensemble clustering to obtain tight clusters. Based on the results obtained we can state the following: firstly, normalisation is necessary for this EEG brain dataset to obtain reasonable clustering; secondly, k-means, k-medoids and HC-Ward provide relatively better clustering results; thirdly, ensemble clustering enables us to tune the tightness of the clusters so that the research can be focused
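
    As a minimal illustration of the factorization this abstract builds on (not of the thesis' evolutionary or initialisation schemes), the sketch below applies the classic multiplicative-update NMF rule in NumPy and derives a crude clustering from the weight matrix. The function name, the random initialisation and the toy data are assumptions made for the example only.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix X (samples as columns) by W @ H with
    nonnegative factors, using the classic multiplicative update rule.
    Each column of X is then a nonnegative combination of the columns of W."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank))
    H = rng.random((rank, n_samples))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update the weights
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update the basis vectors
    return W, H

# Toy usage: cluster samples by their dominant basis vector.
X = np.random.default_rng(1).random((50, 30))
W, H = nmf_multiplicative(X, rank=3)
labels = H.argmax(axis=0)                      # crude clustering from the weights
error = np.linalg.norm(X - W @ H)              # compression quality (Frobenius norm)
```

    The thesis instead seeds W and H with modified k-means or IPCA and then optimises them with the proposed evolutionary movements; the snippet only shows the baseline "NMF rule" step that those movements build on.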

    Review of analytical instruments for EEG analysis

    Since it was first used in 1926, EEG has been one of the most useful instruments of neuroscience. To start working with EEG data we need not only the EEG apparatus, but also analytical tools and the skills to understand what the data mean. This article describes several classical analytical tools and also a newer one that appeared only a few years ago. We hope it will be useful for researchers who have only started working in the field of cognitive EEG

    Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench

    Challenges in creating comprehensive text-processing workflows include a lack of interoperability between individual components coming from different providers and the requirement that end users know programming techniques to compose such workflows. In this paper we demonstrate Argo, a web-based system that addresses these issues in several ways. It supports the widely adopted Unstructured Information Management Architecture (UIMA), which handles the problem of interoperability; it provides a web browser-based interface for developing workflows by drawing diagrams composed of a selection of available processing components; and it provides novel user-interactive analytics such as the annotation editor, which constitutes a bridge between automatic processing and manual correction. These features extend the target audience of Argo to users with limited or no technical background. Here, we focus specifically on the construction of advanced workflows, involving multiple branching and merging points, to facilitate various comparative evaluations. Together with the user-collaboration capabilities supported in Argo, we demonstrate several use cases including visual inspections, comparisons of multiple processing segments or complete solutions against a reference standard, inter-annotator agreement, and shared-task mass evaluations. Ultimately, Argo emerges as a one-stop workbench for defining, processing, editing and evaluating text-processing tasks
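
    Argo itself is a web-based workbench rather than a library, so the sketch below only illustrates one of the evaluation measures the abstract mentions, inter-annotator agreement, computed here as Cohen's kappa over two hypothetical annotators' label sequences. The helper function and the sample annotations are assumptions for the example, not part of Argo's interface.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences of equal length."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two annotators labelled independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of ten text spans (e.g. entity types).
ann1 = ["Gene", "Protein", "Gene", "O", "O", "Protein", "Gene", "O", "O", "Gene"]
ann2 = ["Gene", "Gene", "Gene", "O", "O", "Protein", "Gene", "O", "Gene", "Gene"]
print(f"kappa = {cohens_kappa(ann1, ann2):.3f}")
```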

    Estimation of capital and operation costs of backhoe loaders

    Accurate estimation of equipment costs is a key factor in the feasibility study and in the evaluation of design alternatives of mining projects. In this paper, capital and operation costs of backhoe loaders are estimated using multiple linear regression (MLR) based on principal component analysis (PCA). These cost functions consist of five independent variables: bucket size, digging depth, dump height, weight and horsepower. The MLR is conducted in two steps. First, the correlation between the independent variables is removed using the PCA technique. Thereafter, MLR functions are established using the selected significant PCs, and the total cost functions are expressed as functions of the initial variables. Finally, the accuracy of the functions is evaluated using the mean absolute error rate method
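
    A minimal sketch of the two-step procedure described above, on assumed toy data: the five machine parameters are decorrelated with PCA, cost is regressed on the leading components, the fitted model is mapped back onto the original variables, and accuracy is reported as a mean absolute error rate. The data values and the number of retained components are illustrative assumptions, not figures from the paper.

```python
import numpy as np

# Hypothetical backhoe-loader records: bucket size (m^3), digging depth (m),
# dump height (m), weight (t), horsepower -- and the observed capital cost.
X_raw = np.array([
    [0.9, 4.3, 2.7,  7.5,  74, 110_000.0],
    [1.0, 4.6, 2.9,  8.1,  92, 128_000.0],
    [1.2, 5.0, 3.1,  8.9, 100, 147_000.0],
    [1.3, 5.4, 3.3,  9.6, 109, 161_000.0],
    [1.5, 5.8, 3.6, 10.4, 121, 182_000.0],
    [1.6, 6.1, 3.8, 11.0, 130, 196_000.0],
])
X, y = X_raw[:, :5], X_raw[:, 5]

# Step 1: PCA on the standardised predictors removes their mutual correlation.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
k = 2                                   # number of significant PCs (assumed)
scores = Z @ Vt[:k].T

# Step 2: ordinary least squares on the retained principal components.
A = np.column_stack([np.ones(len(scores)), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Express the fitted cost function in terms of the original variables.
beta = Vt[:k].T @ coef[1:] / sigma
intercept = coef[0] - mu @ beta
y_hat = X @ beta + intercept

# Mean absolute error rate, the evaluation measure named in the abstract.
maer = np.mean(np.abs(y - y_hat) / y)
print(f"MAER = {maer:.3%}")
```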

    GraphCombEx: A Software Tool for Exploration of Combinatorial Optimisation Properties of Large Graphs

    We present a prototype of a software tool for exploration of multiple combinatorial optimisation problems in large real-world and synthetic complex networks. Our tool, called GraphCombEx (an acronym of Graph Combinatorial Explorer), provides a unified framework for scalable computation and presentation of high-quality suboptimal solutions and bounds for a number of widely studied combinatorial optimisation problems. Efficient representation and applicability to large-scale graphs and complex networks are particularly considered in its design. The problems currently supported include maximum clique, graph colouring, maximum independent set, minimum vertex clique covering, minimum dominating set, as well as the longest simple cycle problem. Suboptimal solutions and intervals for the optimal objective values are estimated using scalable heuristics. The tool is designed with extensibility in mind, with a view to adding further problems and new fast, high-performance heuristics in the future. GraphCombEx has already been successfully used as a support tool in a number of recent research studies using combinatorial optimisation to analyse complex networks, indicating its promise as a research software tool
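
    GraphCombEx's own heuristics are not detailed in the abstract; the sketch below merely illustrates, with networkx on a small synthetic network, the kind of scalable bounding such a tool can report: a greedily grown clique gives a lower bound and a greedy largest-first colouring an upper bound on the chromatic number, and together they form an interval for the optimal objective value. The graph model and parameters are assumptions for the example.

```python
import networkx as nx

# A synthetic complex network; GraphCombEx targets much larger graphs.
G = nx.barabasi_albert_graph(n=1_000, m=3, seed=42)

# Upper bound on the chromatic number: greedy largest-first colouring.
colouring = nx.coloring.greedy_color(G, strategy="largest_first")
n_colours = max(colouring.values()) + 1

# Lower bound: a maximal clique grown greedily from the highest-degree vertex.
v = max(G.degree, key=lambda d: d[1])[0]
clique = {v}
for u in sorted(G[v], key=G.degree, reverse=True):
    if all(G.has_edge(u, w) for w in clique):
        clique.add(u)

print(f"chromatic number lies in [{len(clique)}, {n_colours}]")
```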