
Recommended from our members

Multi-perspective analysis of mobile phone call data records: A visual analytics approach

Author: G Andrienko
JW Sammon
N Andrienko
RH Güting
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Analysis of human mobility is currently a hot research topic in data mining, geographic information science and visual analytics. While a wide variety of methods and tools are available, it is still hard to find recommendations for systematically considering a data set from multiple perspectives. To fill this gap, we demonstrate a workflow for a comprehensive analysis of a publicly available data set about the mobile phone calls of a large population over a long time period. We pay special attention to the evaluation of data properties. We outline potential applications of the proposed methods.

City Research Online

Fraunhofer-ePrints


A visual analytics framework for spatio-temporal analysis and modelling

Author: D Guo
D Keim
G Andrienko
Gennady Andrienko
JW Sammon
K Matković
MC Hao
N Andrienko
Natalia Andrienko
PC Kyriakidis
R Maciejewski
S Rinzivillo
T Kohonen
T Schreck
U Demšar
Y Kamarianakis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

To support analysis and modelling of large amounts of spatio-temporal data in the form of spatially referenced time series (TS) of numeric values, we combine interactive visual techniques with computational methods from machine learning and statistics. Clustering methods and interactive techniques are used to group TS by similarity. Statistical methods for TS modelling are then applied to representative TS derived from the groups of similar TS. The framework includes interactive visual interfaces to a library of modelling methods, supporting the selection of a suitable method, the adjustment of model parameters, and the evaluation of the resulting models. The models can be stored externally, communicated, and used for prediction and in further computational analyses. From the visual analytics perspective, the framework suggests a way to externalize spatio-temporal patterns emerging in the mind of the analyst as a result of interactive visual analysis: the patterns are represented as computer-processable and reusable models. From the statistical analysis perspective, the framework demonstrates how TS analysis and modelling can be supported by interactive visual interfaces, particularly in the case of numerous TS that are hard to analyse individually. From the application perspective, the framework suggests a way to analyse large numbers of spatial TS using well-established statistical methods for TS analysis.
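A minimal sketch of the "model the representative TS" step, assuming cluster labels are already available from some clustering method. Each group's representative is taken to be its pointwise mean, and a first-order autoregressive model AR(1), fitted by least squares, stands in for the library of modelling methods the framework offers. The function names and the choice of AR(1) are illustrative assumptions, not the framework's actual methods.

```python
from statistics import fmean


def cluster_representatives(series, labels):
    """Pointwise mean of the time series in each cluster (assumed representative)."""
    groups = {}
    for ts, label in zip(series, labels):
        groups.setdefault(label, []).append(ts)
    # zip(*members) walks the cluster's series one time step at a time
    return {
        label: [fmean(values) for values in zip(*members)]
        for label, members in groups.items()
    }


def fit_ar1(ts):
    """Least-squares fit of x[t] = c + phi * x[t-1] (AR(1), a stand-in model)."""
    x_prev, x_next = ts[:-1], ts[1:]
    mean_prev, mean_next = fmean(x_prev), fmean(x_next)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def predict_next(model, last_value):
    """One-step-ahead prediction from a fitted (c, phi) pair."""
    c, phi = model
    return c + phi * last_value
```

Because only one model per group is fitted, the number of models grows with the number of clusters rather than with the number of places; the stored `(c, phi)` pairs are the kind of compact, reusable artefact the abstract describes.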

City Research Online

Fraunhofer-ePrints

Hybrid cloud and cluster computing paradigms for life science applications

Author: Adam Hughes
Bingjing Zhang
C Evangelinos
Chu
E Walker
G Fox
GC Fox
Geoffrey Fox
Hui Li
J Dean
J Ekanayake
J Lange
Jaliya Ekanayake
Jong Youl Choi
Judy Qiu
JW Sammon
Saliya Ekanayake
Seung-Hee Bae
SH Bae
T Gunarathne
Tak-Lon Wu
Thilina Gunarathne
X Qiu
Yang Ruan
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

Mining protein loops using a structural alphabet and statistical exceptionality

Author: A Dembo
A Efimov
A Golovin
A Sacan
A Via
AC Camproux
Anne-Claude Camproux
AR Panchenko
B Oliva
BJ Polacco
BL Sibanda
BW Matthews
C Kiss
CG Hunter
CM Venkatachalam
D Leader
D Stuart
DF Burke
E Rocha
EG Hutchinson
EJ Milner-White
F den Hollander
G Ausiello
G Nuel
G Pugalenthi
GD Rose
Gregory Nuel
J Espadaler
J Martin
J van Helden
J Wojcik
JF Leszczynski
JM Kwasigroch
JS Fetrow
JS Richardson
Juliette Martin
JW Sammon
JW Torrance
KC Chou
L Regad
LE Donate
Leslie Regad
LN Johnson
LR Rabiner
LS Bernstein
M Hollander
M Mönnigmann
M Saraste
MY Leung
N Colloc'h
N Fernandez-Fuentes
O Sander
P Fuchs
PA Rice
PN Lewis
R Kolodny
S Karlin
S Kim
S Kullback
S Sourice
SA Benner
SD Rufino
V Pavone
W Kabsch
W Li
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein function, e.g., in binding sites and catalytic pockets. However, describing protein loops with conventional tools is difficult. Regular secondary structures (helices and strands) have been widely studied, whereas loops are hard to analyze because they are highly variable in both sequence and structure. Due to data sparsity, long loops have rarely been studied systematically. Results: We developed a simple and accurate method that describes and analyzes the structures of short and long loops using structural motifs, without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to their structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference, but interestingly, two thirds are shared by short (less than 12 residues) and long loops.
Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA, not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has been used to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
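The word extraction and the over-representation test can be sketched as follows, under stated assumptions. Each loop is assumed to be already encoded as an HMM-SA letter string with one letter per overlapping four-residue fragment, so a seven-residue fragment corresponds to four consecutive letters. The over-representation test here uses a simple independent-letter (order-0) background model with a Poisson tail probability; this is a swapped-in simplification, not the statistical exceptionality method used by the authors, and all function names and the `alpha` threshold are hypothetical.

```python
import math
from collections import Counter

# Assumption: 4 overlapping four-residue letters span a seven-residue fragment.
WORD_LEN = 4


def structural_words(letters, word_len=WORD_LEN):
    """All overlapping words of `word_len` consecutive structural letters."""
    return [letters[i:i + word_len] for i in range(len(letters) - word_len + 1)]


def count_words(loops, word_len=WORD_LEN):
    """Occurrence counts of every structural word across all loops."""
    counts = Counter()
    for loop in loops:
        counts.update(structural_words(loop, word_len))
    return counts


def recurrent_words(counts, min_count=30):
    """Words observed more than `min_count` times."""
    return {w: n for w, n in counts.items() if n > min_count}


def poisson_upper_tail(k, lam):
    """P(N >= k) for N ~ Poisson(lam), via 1 - P(N <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def over_represented(loops, word_len=WORD_LEN, alpha=1e-3):
    """Words whose count is improbably high under independent letters (order 0)."""
    letter_counts = Counter(ch for loop in loops for ch in loop)
    total = sum(letter_counts.values())
    freqs = {ch: n / total for ch, n in letter_counts.items()}
    positions = sum(max(len(loop) - word_len + 1, 0) for loop in loops)
    result = {}
    for word, n in count_words(loops, word_len).items():
        expected = positions * math.prod(freqs[ch] for ch in word)
        p_value = poisson_upper_tail(n, expected)
        if p_value < alpha:
            result[word] = (n, expected, p_value)
    return result
```

A higher-order Markov background, which conditions each letter on its predecessors, would give sharper expected counts than the order-0 model used here.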

Springer - Publisher Connector

A highly efficient multi-core algorithm for clustering extremely large datasets

Author: A Ben-Hur
A Bertoni
A Jain
AK Jain
AR Adl-Tabatabai
AWF Edwards
B Andreopoulos
B Chapman
C Herzeel
Consortium IH
D Lea
D Smirnov
DR Barr
E Levine
F Müller
G Dalgin
HA Kestler
Hans A Kestler
HW Kuhn
J Fridlyand
J Handl
J Larus
J MacQueen
Johann M Kraus
JW Sammon
K Fukunaga
L Hubert
L Kuncheva
M Anderson
M Ng
MK Kerr
N Shavit
P Jaccard
P Sham
PA Bernstein
R Development Core Team
R Duan
R Graham
R Jonker
R Rajwar
R Tibshirani
R Xu
RC Gentleman
S Monti
S Peyton-Jones
S Selim
T Kohonen
T Lange
U Drepper
W Feng
W Gropp
W Rand
WJ Conover
X Gao
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase further. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms rely largely on network communication protocols that connect multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute tasks among the cores of a single computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray-type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity analysis and cluster number estimation on the laboratory computer.
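A minimal sketch of data-parallel k-means in Python. It uses a swapped-in technique, not the authors' transactional-memory design: each worker assigns one chunk of points to their nearest centers and returns per-cluster partial sums, which are merged after every iteration, so no shared state needs locking. The function names, the chunking scheme and the `n_workers` parameter are hypothetical. CPython's global interpreter lock prevents threads from speeding up this pure-Python arithmetic; real speedups would need processes or a runtime such as the JVM used in the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign_chunk(chunk, centers):
    """Assign each point in a chunk to its nearest center and return labels
    plus per-cluster partial sums and counts (no shared state is mutated)."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for p in chunk:
        j = min(range(k), key=lambda c: _sq_dist(p, centers[c]))
        labels.append(j)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return labels, sums, counts


def parallel_kmeans(points, k, n_workers=4, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    size = -(-len(points) // n_workers)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    labels = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            # Map: each chunk is processed independently against fixed centers.
            results = list(pool.map(lambda ch, c=centers: _assign_chunk(ch, c), chunks))
            labels = [lab for r in results for lab in r[0]]  # map() keeps order
            # Reduce: merge partial sums into new centers.
            new_centers = []
            for j in range(k):
                n = sum(r[2][j] for r in results)
                if n == 0:
                    new_centers.append(centers[j])  # keep an empty cluster's center
                else:
                    dim = len(centers[j])
                    new_centers.append(
                        [sum(r[1][j][d] for r in results) / n for d in range(dim)])
            if new_centers == centers:
                break
            centers = new_centers
    return labels, centers
```

Rerunning `parallel_kmeans` with different `seed` values and comparing the resulting labelings is one simple way to mimic the repeated-run cluster stability analysis described in the abstract.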

CiteSeerX

Springer - Publisher Connector

Keele Research Repository

A Novel Semi-Supervised Methodology for Extracting Tumor Type-Specific MRS Sources in Human Brain Data

Author: A Devos
A Gibb
A Hyvärinen
A Pérez-Ruiz
A Vellido
A Vilamala
AK Jain
Alfredo Vellido
AP Candiota
AR Tate
C Ding
C Jutten
Carles Arús
Daniel Monleon
DD Lee
DN Louis
DW Ellison
Enrique Romero
FA Howe
G Fan
H Ishimaru
H Ohgaki
HAL Kiers
Héctor Ruiz
I Barba
Ian H. Jarman
Iván Olier
JF Cardoso
JM García-Gómez
JM Kros
José D. Martín
JW Sammon
KS Opstad
L Lukas
LM DeAngelis
M Esposito
M Julià-Sapé
M Law
M Murphy
Margarida Julià-Sapé
MC Martínez-Bisbal
MG Kounelakis
P Paatero
P Sajda
Paulo J. G. Lisboa
PJG Lisboa
S Amari
S Herminghaus
S Zafeiriou
Sandra Ortega-Martorell
SW Coons
X Castells
Y Huang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Background: The clinical investigation of human brain tumors often starts with a non-invasive imaging study, which provides information about tumor extent and location but little insight into the biochemistry of the analyzed tissue. Magnetic Resonance Spectroscopy can complement imaging by supplying a metabolic fingerprint of the tissue. This study analyzes single-voxel magnetic resonance spectra, which represent signal information in the frequency domain. Given that a single voxel may contain a heterogeneous mix of tissues, signal source identification is a relevant challenge for the problem of tumor type classification from the spectroscopic signal. Methodology/Principal Findings: Non-negative matrix factorization techniques have recently shown their potential for identifying meaningful sources in brain tissue spectroscopy data. In this study, we use a convex variant of these methods that can handle negatively valued data and generates sources that can be interpreted as tumor class prototypes. A novel approach to convex non-negative matrix factorization is proposed, in which prior knowledge about class information is used in model optimization. Class-specific information is integrated into this semi-supervised process by setting the metric of a latent variable space in which the matrix factorization is carried out. The reported experimental study comprises 196 cases of different tumor types drawn from two international, multi-center databases. The results indicate that the proposed approach outperforms a purely unsupervised process, achieving near-perfect correlation of the extracted sources with the mean spectra of the tumor types.
It also improves tissue type classification. Conclusions/Significance: We show that source extraction by unsupervised matrix factorization benefits from integrating the available class information, thus operating in a semi-supervised manner, for discriminative source identification and brain tumor labeling from single-voxel spectroscopy data. We are confident that the proposed methodology has wider applicability in biomedical signal processing.
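For orientation, the unsupervised Convex-NMF of Ding, Li and Jordan, on which convex variants like the one described above build, can be written as follows. The columns of the data matrix X are spectra and may contain negative values; the sources XW are combinations of observed spectra, which is what makes them interpretable as class prototypes. The paper's semi-supervised latent-space metric is not reproduced here, and the multiplicative updates are given as we recall them from that formulation (|Y| is the elementwise absolute value), so they should be checked against the original before use.

```latex
\begin{aligned}
&\min_{W \ge 0,\; G \ge 0} \; \lVert X - X W G^{\top} \rVert_F^2,
  \qquad X \in \mathbb{R}^{p \times n},\; W, G \in \mathbb{R}_{\ge 0}^{n \times k},\\
&Y = X^{\top} X, \qquad Y^{+} = \tfrac{1}{2}\left(\lvert Y \rvert + Y\right),
  \qquad Y^{-} = \tfrac{1}{2}\left(\lvert Y \rvert - Y\right),\\
&G_{ik} \leftarrow G_{ik}
  \sqrt{\frac{[Y^{+} W]_{ik} + [G W^{\top} Y^{-} W]_{ik}}
             {[Y^{-} W]_{ik} + [G W^{\top} Y^{+} W]_{ik}}},\\
&W_{ik} \leftarrow W_{ik}
  \sqrt{\frac{[Y^{+} G]_{ik} + [Y^{-} W G^{\top} G]_{ik}}
             {[Y^{-} G]_{ik} + [Y^{+} W G^{\top} G]_{ik}}}.
\end{aligned}
```

Because the data enter only through Y = X^T X, the updates stay non-negative even when the spectra themselves contain negative values.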

Public Library of Science (PLOS)

LJMU Research Online (Liverpool John Moores University)

E-space: Manchester Metropolitan University's Research Repository

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Diposit Digital de Documents de la UAB

A New Evolving Tree-Based Model with Local Re-learning for Document Clustering and Visualization

Author: A Allahyar
A Hotho
A Rauber
BC Moore
CC Hsu
Chee Peng Lim
D Tao
DD Lewis
E Lughofer
F Debole
GA Carpenter
HJ Kim
HS Hosseini
IH Witten
J Pakkanen
J Ye
J Yu
JC Bezdek
JW Sammon
K Lagus
Kai Meng Tay
M Belkin
MF Porter
N Bourgeois
NK Nagwani
NR Pal
R Arora
RJ Kuo
S Fabrizio
S Kaski
S Wold
SJ Pan
ST Roweis
T Kanungo
T Kohonen
Wui Lee Chang
X Rui
Y Liu
Y Luo
YS Lin
Publication venue: Springer Science and Business Media LLC

A social engineering model for poverty alleviation

Author: A Choi
A Mani
A Sen
A Smajgl
AB Atkinson
AK Chattopadhyay
D Duffie
E Engel
GM Fisher
J Foster
JB Tenenbaum
JM Grandmont
JW Sammon
KA Fox
M Ravallion
ME Tipping
N Kakwani
P Demartines
PC Mahalanobis
R Conte
R Kutner
RJ Aumann
SI Denisov
SK Mohanty
ST Roweis
TK Kumar
V Sitaramam
Publication venue: Springer Science and Business Media LLC
Publication date: 11/12/2020

Poverty, the quintessential denominator of a developing nation, has traditionally been defined against an arbitrary poverty line: individuals (or countries) below this line are deemed poor, and those above it are not. This has two pitfalls. First, absolute reliance on a single poverty line, based on basic food consumption rather than on the total consumption distribution, yields at best a partial poverty index. Second, a single expense descriptor is an exogenous quantity that does not evolve from income-expenditure statistics. Using extensive income-expenditure statistics from India, we show how a self-consistent endogenous poverty line can be derived from an agent-based stochastic model of market exchange that combines all expenditure modes (basic food, other food and non-food), with parameters estimated probabilistically using advanced machine learning tools. Our mathematical study establishes a consumption-based poverty measure that combines labor, commodity, and asset market outcomes, delivering an excellent tool for economic policy formulation.

Aston Publications Explorer

Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning

Author: A Alvernhe
A Gorchetchnikov
A Johnson
A Samsonovich
AA Fenton
AD Redish
BL McNaughton
C Barry
C Molter
CB Canto
D Derdikman
DJ Foster
DM Finch
E Tolman
EA Ludvig
EA Zilli
EI Moser
ET Rolls
F Sargolini
G Dragoi
G Konidaris
H Mhatre
HT Blair
IR Fiete
J Houk
J Jeanblanc
J O'Keefe
JB Tenenbaum
JW Sammon
K Doya
KB Kjelstrup
KI Blum
KJ Jeffery
Konrad P. Kording
LH Corbit
M Franzius
M Fyhn
MA Brown
MC Fuhs
ME Hasselmo
MR Mehta
MW Jung
N Burgess
N Daw
Nathaniel D. Daw
ND Daw
Nicholas J. Gustafson
P Dayan
PE Sharp
PF Krayniak
PJ Best
R Floyd
R Sutton
RE Suri
RF Langston
RS Sutton
RU Muller
S Mahadevan
S McClure
S Totterdell
T Hafting
T Solstad
TJ Wills
TS Collett
VH Brun
W Gerstner
W Schultz
WE Skaggs
Publication venue: Public Library of Science
Publication date: 01/10/2011

Reinforcement learning (RL) provides an influential characterization of the brain's mechanisms for learning to make advantageous choices. An important problem, though, is how complex tasks can be represented in a way that enables efficient learning. We consider this problem through the lens of spatial navigation, examining how two of the brain's location representations—hippocampal place cells and entorhinal grid cells—are adapted to serve as basis functions for approximating value over space for RL. Although much previous work has focused on these systems' roles in combining upstream sensory cues to track location, revisiting these representations with a focus on how they support this downstream decision function offers complementary insights into their characteristics. Rather than localization, the key problem in learning is generalization between past and present situations, which may not match perfectly. Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks verify the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences. However, work on generalization in RL suggests the underlying representation should respect the environment's layout. In particular, although it is often assumed that neurons track location in Euclidean coordinates (that a place cell's activity declines “as the crow flies” away from its peak), the relevant metric for value is geodesic: the distance along a path, around any obstacles. We formalize this intuition and present simulations showing how Euclidean, but not geodesic, representations can interfere with RL by generalizing inappropriately across barriers. 
Our proposal that place and grid responses should be modulated by geodesic distances suggests novel predictions about how obstacles should affect spatial firing fields, which provides a new viewpoint on data concerning both spatial codes.
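A minimal sketch of the geodesic-versus-Euclidean contrast, assuming a small grid world in which `#` cells are obstacles, 4-connected moves, and a Gaussian place-field tuning curve; the grid, the field width and the function names are hypothetical. Geodesic distance is computed by breadth-first search, so it follows paths around the barrier. Under linear value approximation, a reward learned at the field center generalizes to other states in proportion to this activation, so a Euclidean field leaks value through the wall while a geodesic one does not.

```python
import math
from collections import deque

# Hypothetical environment: a wall in column 2, passable only via the bottom row.
GRID = [
    "..#..",
    "..#..",
    "..#..",
    ".....",
]


def geodesic_distances(grid, start):
    """Shortest path lengths (4-connected moves) from `start`, avoiding '#' cells."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def place_cell_activation(distance, width=1.5):
    """Gaussian tuning curve as a function of distance from the field center."""
    return math.exp(-distance ** 2 / (2 * width ** 2))
```

From a field center at (0, 1), the cell at (0, 3) is only 2 units away as the crow flies but 8 steps away around the wall, so `place_cell_activation(2.0)` is about 0.41 while `place_cell_activation(8)` is below 1e-6.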

CiteSeerX