
Random Forests: some methodological insights

Author: Genuer Robin
Poggi Jean-Michel
Tuleau Christine
Publication date: 01/01/2008

This paper examines random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, from an experimental perspective. It first aims to confirm known but scattered advice on using random forests and to offer some complementary remarks, both for standard problems and for high-dimensional ones in which the number of variables hugely exceeds the sample size. The main contribution of the paper is twofold: it provides insights into the behavior of the variable importance index based on random forests, and it investigates two classical variable selection problems. The first is to find important variables for interpretation; the second is more restrictive and aims to design a good prediction model. The proposed strategy ranks the explanatory variables by their random forests importance score and then applies a stepwise ascending variable introduction.
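The two-stage strategy in the abstract (rank, then introduce variables stepwise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the importance scores and the `error_of` function are hypothetical placeholders (in the paper's setting they would come from random forests, for instance an importance score and an out-of-bag error estimate), and the "keep a variable only if it lowers the error by more than `threshold`" rule is an assumed stopping criterion.

```python
from typing import Callable, Sequence


def rank_by_importance(importances: dict[str, float]) -> list[str]:
    """Return variable names sorted by decreasing importance score."""
    return sorted(importances, key=lambda name: importances[name], reverse=True)


def stepwise_ascending(
    ranked: Sequence[str],
    error_of: Callable[[list[str]], float],
    threshold: float,
) -> list[str]:
    """Try variables in rank order and keep one only if adding it lowers the
    prediction error by more than `threshold` (assumed stopping rule)."""
    selected: list[str] = []
    current_error = error_of(selected)  # error of the empty model
    for name in ranked:
        candidate = selected + [name]
        candidate_error = error_of(candidate)
        if current_error - candidate_error > threshold:
            selected = candidate
            current_error = candidate_error
    return selected


if __name__ == "__main__":
    # Hypothetical importance scores and a toy additive error model.
    scores = {"x1": 0.9, "x2": 0.5, "x3": 0.3, "x4": 0.1}
    gains = {"x1": 0.4, "x2": 0.3, "x3": 0.01, "x4": 0.2}

    def toy_error(variables: list[str]) -> float:
        return 1.0 - sum(gains[v] for v in variables)

    print(stepwise_ascending(rank_by_importance(scores), toy_error, threshold=0.05))
```

With these toy numbers the selection is `['x1', 'x2', 'x4']`: `x3` is skipped because it lowers the error by only 0.01, which is below the threshold, while the lower-ranked `x4` still passes.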

arXiv.org e-Print Archive

HAL-UNICE

INRIA a CCSD electronic archive server

Community Aliveness: Discovering Interaction Decay Patterns in Online Social Communities

Author: A Capocci
A-L Barabási
C Cortes
DJ Watts
EM Jin
F Pedregosa
G Kossinets
H Ebel
ME Newman
Mohammed Abufouda
S. N Dorogovtsev
Publication date: 14/07/2017

Online Social Communities (OSCs) provide a medium for connecting people, sharing news, eliciting information, and finding jobs, among other things. The interaction dynamics among OSC members are not always growth dynamics. Instead, decay or inactivity dynamics often occur, which can make an OSC obsolete. Understanding the behavior and characteristics of the members of an inactive community helps sustain the growth dynamics of these communities and may prevent them from going out of service. In this work, we provide two models for predicting the interaction decay of community members: a Simple Threshold Model (STM) and a supervised machine learning classification framework. We evaluated both prediction models against a ground truth of decayed communities extracted from the StackExchange platform. The experiments revealed that the decay of member activity in these communities can be predicted with satisfactory performance, in terms of F1-score and accuracy, using the members' network-based and network-exogenous attributes. The upper bounds of the prediction performance of the methods we used are 0.91 for the F1-score and 0.83 for the accuracy. These results indicate that network-based attributes are correlated with member activity and that decay patterns can be found in terms of these attributes. The results also show that the structure of decayed communities can be used to support alive communities by discovering inactive members.
Comment: pre-print for the 4th European Network Intelligence Conference, 11-12 September 2017, Duisburg, Germany
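The two reported metrics can be made concrete with a short sketch. It assumes a binary labelling in which each member is either `"decayed"` or `"active"` (hypothetical label names), with `"decayed"` treated as the positive class; whether the paper uses binary or averaged F1 is not stated in the abstract. It computes accuracy and the binary F1-score, using the identity F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall.

```python
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    """Reject empty or mismatched label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of members whose predicted label matches the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = "decayed",  # hypothetical positive-class label
) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN) for the `positive` class."""
    _check(y_true, y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

For example, with three decayed and three active members where the model gets two of each right and swaps one of each, both accuracy and F1 come out at 2/3.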

arXiv.org e-Print Archive

Crossref

Segmentation of pelvic structures from preoperative images for surgical planning and guidance

Author: Gao Qinquan
Publication venue: Computing, Imperial College London
Publication date: 01/09/2014

Prostate cancer is one of the most frequently diagnosed malignancies globally and the second leading cause of cancer-related mortality in males in the developed world. In recent decades, many techniques have been proposed for prostate cancer diagnosis and treatment. With the development of imaging technologies such as CT and MRI, image-guided procedures have become increasingly important as a means to improve clinical outcomes. Analyzing the preoperative images and constructing 3D models prior to treatment would help doctors better localize and visualize the structures of interest, plan the procedure, diagnose disease, and guide the surgery or therapy. This requires the development of efficient and robust medical image analysis and segmentation technologies. The thesis mainly focuses on the development of segmentation techniques in pelvic MRI for image-guided robotic-assisted laparoscopic radical prostatectomy and external-beam radiation therapy. A fully automated multi-atlas framework is proposed for bony pelvis segmentation in MRI, guided by an MRI AE-SDM. With the guidance of the AE-SDM, a multi-atlas segmentation algorithm delineates the bony pelvis in a new MRI for which no CT is available. The proposed technique outperforms state-of-the-art algorithms for MRI bony pelvis segmentation. Using the SDM of the pelvis and its segmented surface, an accurate 3D pelvimetry system is designed and implemented to measure a comprehensive set of pelvic geometric parameters, in order to examine the relationship between these parameters and the difficulty of robotic-assisted laparoscopic radical prostatectomy. The system can be used in both manual and automated modes through a user-friendly interface. A fully automated and robust multi-atlas-based segmentation has also been developed to delineate the prostate in diagnostic MR scans, which show large variations in both prostate intensity and shape.
Two image analysis techniques are proposed: patch-based label fusion with local appearance-specific atlases, and multi-atlas propagation via a manifold graph on a database of both labeled and unlabeled images, for when only a limited number of labeled atlases is available. The proposed techniques achieve more robust and accurate segmentation results than other multi-atlas-based methods. The seminal vesicles are also a structure of interest for therapy planning, particularly for external-beam radiation therapy. As existing methods fail at the very demanding task of segmenting the seminal vesicles, a multi-atlas learning framework using random decision forests with graph-cut refinement is further proposed to solve this difficult problem. Motivated by the performance of this technique, I further extend multi-atlas learning to segment the prostate fully automatically from multispectral (T1- and T2-weighted) MR images via hybrid random forest (RF) classifiers and a multi-image graph cuts technique. The proposed method compares favorably with the previously proposed multi-atlas-based prostate segmentation. The work in this thesis covers different techniques for pelvic image segmentation in MRI. These techniques have been continually developed and refined, and their application to different specific problems shows increasingly promising results.
Open Access
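The common thread of the multi-atlas methods above is label fusion: several atlases are registered to a new image, each proposes a label per voxel, and the proposals are combined. The sketch below shows only the simplest baseline, per-voxel majority voting. It is not the thesis's patch-based fusion or manifold-graph propagation, and it assumes registration has already been done and that each propagated label map is flattened into a list of integer labels (a hypothetical data layout). Ties are broken toward the smallest label, an arbitrary choice made here.

```python
from collections import Counter
from typing import Sequence


def majority_vote_fusion(propagated: Sequence[Sequence[int]]) -> list[int]:
    """Fuse per-voxel labels from several registered atlases by majority vote.

    propagated[a][v] is the label that atlas a assigns to voxel v after
    registration to the target image. Ties go to the smallest label.
    """
    if not propagated:
        raise ValueError("need at least one propagated atlas")
    n_voxels = len(propagated[0])
    if any(len(labels) != n_voxels for labels in propagated):
        raise ValueError("all label maps must cover the same voxels")
    fused: list[int] = []
    for v in range(n_voxels):
        votes = Counter(atlas[v] for atlas in propagated)
        best = max(votes.values())
        # Deterministic tie-breaking: smallest label among the most-voted.
        fused.append(min(label for label, count in votes.items() if count == best))
    return fused
```

Weighted variants replace the equal votes with per-atlas or per-patch similarity weights; the patch-based approach described in the thesis belongs to that broader family, but its exact weighting is not given in this abstract.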

Spiral - Imperial College Digital Repository

Selection of Ordinally Scaled Independent Variables

Author: Gertheiss Jan
Hogger Sara
Oberhauser Cornelia
Tutz Gerhard
Publication date: 01/07/2009

Ordinal categorical variables are common in regression modeling. Although ordinal response variables have been well investigated, less work has been done on ordinal predictors. This article deals with the selection of ordinally scaled independent variables in the classical linear model, where the ordinal structure is taken into account by a difference penalty on adjacent dummy coefficients. It is shown how the Group Lasso can be used to select ordinal predictors, and an alternative blockwise Boosting procedure is proposed. Emphasis is placed on applying the presented methods to the (Comprehensive) ICF Core Set for chronic widespread pain. The paper is a preprint of an article accepted for publication in the Journal of the Royal Statistical Society Series C (Applied Statistics). Please use the journal version for citation.
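To make "a difference penalty on adjacent dummy coefficients" concrete, here is a minimal sketch for a single ordinal predictor with levels 1..k. It assumes level 1 is the reference category (coefficient fixed at 0) and a quadratic form, the sum of squared differences between adjacent coefficients, added to the residual sum of squares with a tuning weight `lam`. The function names are hypothetical, and the sketch does not implement the Group Lasso or blockwise Boosting selection described in the paper; it only shows the penalized objective that encodes the ordinal structure.

```python
from typing import Sequence


def dummy_encode(levels: Sequence[int], k: int) -> list[list[int]]:
    """Dummy-code an ordinal predictor with levels 1..k, using level 1 as the
    reference category (so each row has k - 1 entries)."""
    rows = []
    for level in levels:
        if not 1 <= level <= k:
            raise ValueError(f"level {level} outside 1..{k}")
        row = [0] * (k - 1)
        if level > 1:
            row[level - 2] = 1
        rows.append(row)
    return rows


def difference_penalty(dummy_coefs: Sequence[float]) -> float:
    """Sum of squared differences between coefficients of adjacent levels;
    the reference level's coefficient is fixed at 0."""
    beta = [0.0, *dummy_coefs]
    return sum((b - a) ** 2 for a, b in zip(beta, beta[1:]))


def penalized_rss(
    y: Sequence[float],
    dummies: Sequence[Sequence[int]],
    intercept: float,
    dummy_coefs: Sequence[float],
    lam: float,
) -> float:
    """Residual sum of squares of a one-predictor linear model plus
    lam times the adjacent-difference penalty."""
    rss = 0.0
    for y_i, row in zip(y, dummies):
        fitted = intercept + sum(x * b for x, b in zip(row, dummy_coefs))
        rss += (y_i - fitted) ** 2
    return rss + lam * difference_penalty(dummy_coefs)
```

Penalizing differences rather than the coefficients themselves shrinks neighboring levels toward each other, which is what lets the fit respect the ordering of the categories.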

Open Access LMU

Inferring transportation modes from GPS trajectories using a convolutional neural network

Author: Dabiri Sina
Heaslip Kevin
Publication venue: Elsevier BV
Publication date: 01/01/2018

Identifying the distribution of users' transportation modes is an essential part of travel demand analysis and transportation planning. With the advent of ubiquitous GPS-enabled devices (e.g., smartphones), a cost-effective approach for inferring commuters' mobility mode(s) is to leverage their GPS trajectories. Most studies have proposed mode inference models based on hand-crafted features and traditional machine learning algorithms. However, manual features have major drawbacks, including vulnerability to traffic and environmental conditions and human bias in designing effective features. One way to overcome these issues is to use Convolutional Neural Network (CNN) schemes, which can automatically derive high-level features from the raw input. Accordingly, in this paper we take advantage of CNN architectures to predict travel modes from raw GPS trajectories alone, where the modes are labeled as walk, bike, bus, driving, and train. Our key contribution is designing the layout of the CNN's input layer so that it is not only compatible with CNN schemes but also represents the fundamental motion characteristics of a moving object, including speed, acceleration, jerk, and bearing rate. Furthermore, we improve the quality of the GPS logs through several data preprocessing steps. Using the clean input layer, a variety of CNN configurations are evaluated to find the best CNN architecture. The highest accuracy, 84.8%, was achieved by an ensemble of the best CNN configuration. We compare our methodology with traditional machine learning algorithms as well as the seminal and most closely related studies to demonstrate the superiority of our framework.
Comment: 12 pages, 3 figures, 7 tables, Transportation Research Part C: Emerging Technologies
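The four motion characteristics named above can all be derived from consecutive GPS fixes. The sketch below is one plausible way to compute them and is not the paper's exact preprocessing. It assumes each point is a (latitude, longitude, timestamp-in-seconds) tuple, measures distance with the haversine formula on a spherical Earth, computes acceleration, jerk, and bearing rate as successive differences divided by the later segment's time step (an assumed convention), and takes the smaller of the two angles between consecutive bearings so that a turn from 359 to 1 degrees counts as 2 degrees.

```python
import math
from typing import NamedTuple, Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


class GpsPoint(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees
    t: float    # seconds


def haversine_m(p: GpsPoint, q: GpsPoint) -> float:
    """Great-circle distance in metres between two GPS points."""
    phi1, phi2 = math.radians(p.lat), math.radians(q.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(q.lon - p.lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(p: GpsPoint, q: GpsPoint) -> float:
    """Initial bearing from p to q in degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(p.lat), math.radians(q.lat)
    dlmb = math.radians(q.lon - p.lon)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def motion_features(points: Sequence[GpsPoint]) -> dict[str, list[float]]:
    """Per-segment speed (m/s), plus acceleration (m/s^2), jerk (m/s^3)
    and bearing rate (deg/s) from successive differences."""
    segments = list(zip(points, points[1:]))
    dts = [q.t - p.t for p, q in segments]
    if any(dt <= 0 for dt in dts):
        raise ValueError("timestamps must be strictly increasing")
    speed = [haversine_m(p, q) / dt for (p, q), dt in zip(segments, dts)]
    accel = [(s2 - s1) / dt for s1, s2, dt in zip(speed, speed[1:], dts[1:])]
    jerk = [(a2 - a1) / dt for a1, a2, dt in zip(accel, accel[1:], dts[2:])]
    bearing = [bearing_deg(p, q) for p, q in segments]
    bearing_rate = []
    for b1, b2, dt in zip(bearing, bearing[1:], dts[1:]):
        turn = abs(b2 - b1) % 360.0
        bearing_rate.append(min(turn, 360.0 - turn) / dt)  # smaller turning angle
    return {"speed": speed, "acceleration": accel, "jerk": jerk, "bearing_rate": bearing_rate}
```

For a trajectory of n points this yields n - 1 speeds, n - 2 accelerations and bearing rates, and n - 3 jerks. Any input-layer layout that stacks these per-point channels, as the abstract describes, would need to align or pad them to a common length.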

arXiv.org e-Print Archive

Monash University, Institute of Transport Studies: World Transit Research (WTR)