Random Forests: some methodological insights
This paper examines random forests from an experimental perspective. Random
forests, introduced by Leo Breiman in 2001, are an increasingly used statistical
method for classification and regression problems. The paper first aims to
confirm known but scattered advice for using random forests and to propose some
complementary remarks for both standard problems and high-dimensional ones, in
which the number of variables hugely exceeds the sample size. The main
contribution of this paper, however, is twofold: to provide some insights into
the behavior of the variable importance index based on random forests, and to
investigate two classical issues of variable selection. The first is to find
important variables for interpretation; the second is more restrictive and
aims to design a good prediction model. The strategy involves a ranking of
explanatory variables using the random forests importance score and a stepwise
ascending variable introduction strategy.
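The ranking-then-introduction strategy described above can be sketched in a few
lines. This is a minimal sketch under stated assumptions, not the paper's
procedure: the importance score is a generic permutation importance (mean
accuracy drop after shuffling one column) computed on whatever data is passed
in rather than on per-tree out-of-bag samples, and `subset_error` and the gain
threshold `min_gain` are hypothetical hooks that a caller would back with a
random forest refitted on each candidate subset.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def accuracy(predict: Model, X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Model, X: List[Row], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when a single column is randomly shuffled."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing column j with its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def ascending_introduction(ranking: List[int],
                           subset_error: Callable[[List[int]], float],
                           min_gain: float = 0.0) -> List[int]:
    """Introduce variables in decreasing-importance order, keeping one only
    if it lowers the current error by more than min_gain.
    `subset_error` and `min_gain` are hypothetical hooks (assumptions)."""
    selected: List[int] = []
    current = subset_error(selected)
    for var in ranking:
        err = subset_error(selected + [var])
        if current - err > min_gain:
            selected.append(var)
            current = err
    return selected


# Toy usage: a stand-in "model" that only looks at feature 0.
def toy_predict(row: Row) -> int:
    return int(row[0] > 0.5)


X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
y = [toy_predict(row) for row in X]
scores = permutation_importance(toy_predict, X, y)
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

Because the toy model ignores feature 1, shuffling that column never changes a
prediction and its score is exactly zero. With a toy error function such as
`lambda s: 0.0 if 0 in s else 0.6`, `ascending_introduction(ranking, ...)`
keeps only variable 0, since adding variable 1 brings no further gain.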
Community Aliveness: Discovering Interaction Decay Patterns in Online Social Communities
Online Social Communities (OSCs) provide a medium for connecting people,
sharing news, eliciting information, and finding jobs, among others. The
dynamics of the interaction among OSC members are not always growth dynamics.
Instead, decay or inactivity dynamics often set in, making an OSC obsolete.
Understanding the behavior and characteristics of the members of an inactive
community helps sustain the growth dynamics of these communities and may
prevent them from going out of service. In this work, we propose two models
for predicting the interaction decay of community members: a Simple Threshold
Model (STM) and a supervised machine learning classification framework. We
evaluated our prediction models in experiments on decayed communities
extracted from the StackExchange platform. The
results of the experiments revealed that it is possible, with satisfactory
prediction performance in terms of the F1-score and the accuracy, to predict
the decay of the activity of the members of these communities using
network-based attributes and network-exogenous attributes of the members. The
upper bound of the prediction performance of the methods we used is … and
… for the F1-score and the accuracy, respectively. These results indicate
that network-based attributes are correlated with the activity of the members
and that we can find decay patterns in terms of these attributes. The results
also showed that the structure of the decayed communities can be used to
support the alive communities by discovering inactive members.
Comment: Pre-print for the 4th European Network Intelligence Conference,
11-12 September 2017, Duisburg, Germany
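The abstract does not spell out how the Simple Threshold Model works, so the
following is only one plausible reading, written as an assumption: a member is
flagged as decayed when a single network-based attribute (here a hypothetical
degree value) falls below a cut-off chosen to maximize the F1-score on
labelled members. The sketch also shows how the two reported metrics, F1-score
and accuracy, are computed.

```python
from typing import List, Sequence, Tuple


def predict_decay(values: Sequence[float], threshold: float) -> List[int]:
    """Flag a member as decayed (1) when its attribute is below the threshold."""
    return [int(v < threshold) for v in values]


def f1_and_accuracy(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (F1-score, accuracy) for binary labels, with 1 = decayed."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return f1, correct / len(y_true)


def fit_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """Try every observed value (plus one above the maximum) as a cut-off
    and keep the one with the highest F1-score on the labelled data.
    This selection rule is an assumption, not taken from the paper."""
    candidates = sorted(set(values)) + [max(values) + 1]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1, _ = f1_and_accuracy(labels, predict_decay(values, t))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


# Toy usage with hypothetical degree values and decay labels.
degrees = [0, 1, 1, 2, 5, 7, 8, 10]
decayed = [1, 1, 1, 1, 0, 0, 0, 0]
threshold = fit_threshold(degrees, decayed)
```

On this toy data the chosen cut-off is 5, which separates the two groups
perfectly; a cut-off of 2 would miss one decayed member, giving an F1-score of
6/7 and an accuracy of 0.875.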
Segmentation of pelvic structures from preoperative images for surgical planning and guidance
Prostate cancer is one of the most frequently diagnosed malignancies globally and the second leading cause of cancer-related mortality in males in the developed world. In recent decades, many techniques have been proposed for prostate cancer diagnosis and treatment. With the development of imaging technologies such as CT and MRI, image-guided procedures have become increasingly important as a means to improve clinical outcomes. Analysis of the preoperative images and construction of 3D models prior to treatment would help doctors to better localize and visualize the structures of interest, plan the procedure, diagnose disease and guide the surgery or therapy. This requires efficient and robust medical image analysis and segmentation technologies to be developed.
The thesis mainly focuses on the development of segmentation techniques in pelvic MRI for image-guided robotic-assisted laparoscopic radical prostatectomy and external-beam radiation therapy. A fully automated multi-atlas framework is proposed for bony pelvis segmentation in MRI, guided by an MRI AE-SDM: with this guidance, a multi-atlas segmentation algorithm delineates the bony pelvis in a new MRI scan for which no CT is available. The proposed technique outperforms state-of-the-art algorithms for MRI bony pelvis segmentation. Using the SDM of the pelvis and its segmented surface, an accurate 3D pelvimetry system is designed and implemented to measure a comprehensive set of pelvic geometric parameters, in order to examine the relationship between these parameters and the difficulty of robotic-assisted laparoscopic radical prostatectomy. The system can be used in both manual and automated modes through a user-friendly interface.
A fully automated and robust multi-atlas-based segmentation method has also been developed to delineate the prostate in diagnostic MR scans, which show large variation in both the intensity and the shape of the prostate. Two image analysis techniques are proposed: patch-based label fusion with local appearance-specific atlases, and multi-atlas propagation via a manifold graph on a database of both labeled and unlabeled images, for cases where only a limited number of labeled atlases is available. The proposed techniques achieve more robust and accurate segmentation results than other multi-atlas-based methods.
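To make the idea of patch-based label fusion concrete, here is a deliberately
simplified one-dimensional sketch. It uses a generic, swapped-in technique
(non-local-means-style weighted voting), not the thesis's method with local
appearance-specific atlases; it assumes the atlases are already registered to
the target, and `radius` and the bandwidth `h` are hypothetical tuning
parameters.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def extract_patch(image: Sequence[float], i: int, radius: int) -> List[float]:
    """Intensities in a window of width 2*radius+1 around i (edges clamped)."""
    n = len(image)
    return [image[min(max(k, 0), n - 1)]
            for k in range(i - radius, i + radius + 1)]


def patch_label_fusion(target: Sequence[float],
                       atlas_images: List[Sequence[float]],
                       atlas_labels: List[Sequence[int]],
                       radius: int = 1, h: float = 1.0) -> List[int]:
    """Weighted voting: each registered atlas votes for its label at voxel i
    with weight exp(-SSD / h), where SSD compares target and atlas patches."""
    fused = []
    for i in range(len(target)):
        target_patch = extract_patch(target, i, radius)
        votes: Dict[int, float] = defaultdict(float)
        for image, labels in zip(atlas_images, atlas_labels):
            atlas_patch = extract_patch(image, i, radius)
            ssd = sum((a - b) ** 2 for a, b in zip(target_patch, atlas_patch))
            votes[labels[i]] += math.exp(-ssd / h)
        # The label with the largest accumulated weight wins.
        fused.append(max(votes, key=votes.get))
    return fused


# Toy 1D example: atlas A matches the target, atlas B is shifted by one voxel.
target = [0, 0, 10, 10, 0]
atlas_a, labels_a = [0, 0, 10, 10, 0], [0, 0, 1, 1, 0]
atlas_b, labels_b = [0, 10, 10, 0, 0], [0, 1, 1, 0, 0]
fused = patch_label_fusion(target, [atlas_a, atlas_b], [labels_a, labels_b])
```

With two atlases, plain majority voting would tie at voxels 1 and 3; the patch
weights resolve those ties in favour of the atlas whose local appearance
matches the target, so the fused labels equal atlas A's labels.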
The seminal vesicles are also a structure of interest for therapy planning, particularly for external-beam radiation therapy. As existing methods fail at the very challenging task of segmenting the seminal vesicles, a multi-atlas learning framework using random decision forests with graph-cut refinement is further proposed to solve this difficult problem. Motivated by the performance of this technique, I further extend multi-atlas learning to segment the prostate fully automatically from multispectral (T1- and T2-weighted) MR images, using hybrid random forest (RF) classifiers and a multi-image graph cuts technique. The proposed method compares favorably with the previously proposed multi-atlas-based prostate segmentation.
The work in this thesis covers different techniques for pelvic image segmentation in MRI. These techniques have been continually developed and refined, and their application to different specific problems shows ever more promising results.
Selection of Ordinally Scaled Independent Variables
Ordinal categorical variables are common in regression modeling. Although the case of ordinal response variables has been well investigated, less work has been done concerning ordinal predictors. This article deals with the selection of ordinally scaled independent variables in the classical linear model, where the ordinal structure is taken into account by use of a difference penalty on adjacent dummy coefficients. It is shown how the Group Lasso can be used for the selection of ordinal predictors, and an alternative blockwise Boosting procedure is proposed. Emphasis is placed on the application of the presented methods to the (Comprehensive) ICF Core Set for chronic widespread pain.
The paper is a preprint of an article accepted for publication in the Journal of the Royal Statistical Society Series C (Applied Statistics). Please use the journal version for citation.
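A difference penalty on adjacent dummy coefficients, as described above, can be
written down directly. The sketch assumes dummy coding with the lowest category
as reference (its coefficient fixed at zero) and the squared-difference form,
that is, the sum of squared differences between coefficients of neighbouring
levels. It also computes the Euclidean norm of those differences, the quantity
a Group Lasso-style penalty would shrink to zero to drop a whole predictor.
Both forms are illustrative assumptions, not a transcription of the article's
estimator.

```python
import math
from typing import List, Sequence


def full_coefficients(beta: Sequence[float]) -> List[float]:
    """Dummy coefficients for levels 1..k, level 1 as reference (fixed at 0)."""
    return [0.0] + list(beta)


def difference_penalty(beta: Sequence[float]) -> float:
    """Sum of squared differences between adjacent-level coefficients."""
    full = full_coefficients(beta)
    return sum((full[k] - full[k - 1]) ** 2 for k in range(1, len(full)))


def group_difference_norm(beta: Sequence[float]) -> float:
    """Euclidean norm of the adjacent differences; it is zero exactly when all
    coefficients equal the reference, i.e. the predictor drops out."""
    return math.sqrt(difference_penalty(beta))


def penalized_rss(y: Sequence[float], levels: Sequence[int], intercept: float,
                  beta: Sequence[float], lam: float) -> float:
    """Residual sum of squares for one ordinal predictor (levels coded 1..k)
    plus lam times the squared-difference penalty (assumed form)."""
    full = full_coefficients(beta)
    rss = sum((yi - intercept - full[x - 1]) ** 2 for yi, x in zip(y, levels))
    return rss + lam * difference_penalty(beta)


# Toy usage: a monotone effect over four ordered levels (level 1 is reference).
beta = [1.0, 1.5, 3.0]
penalty = difference_penalty(beta)
```

Large jumps between neighbouring levels are penalized, so the fitted effect is
smoothed along the ordinal scale; a predictor whose coefficients are all equal
to the reference contributes nothing and can be removed as a block.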
Inferring transportation modes from GPS trajectories using a convolutional neural network
Identifying the distribution of users' transportation modes is an essential
part of travel demand analysis and transportation planning. With the advent of
ubiquitous GPS-enabled devices (e.g., a smartphone), a cost-effective approach
for inferring commuters' mobility mode(s) is to leverage their GPS
trajectories. Most studies have proposed mode inference models based on
hand-crafted features and traditional machine learning algorithms. However,
manual features have some major drawbacks, including vulnerability to traffic
and environmental conditions and human bias in designing effective features.
One way to overcome these issues is to use Convolutional Neural Network (CNN)
schemes, which can automatically derive high-level features from the raw input.
Accordingly, in this paper, we take advantage of CNN architectures to predict
travel modes based only on raw GPS trajectories, where the modes are labeled as
walk, bike, bus, driving, and train. Our key contribution is designing the
layout of the CNN's input layer in such a way that it is not only compatible
with CNN schemes but also represents fundamental motion characteristics of a
moving object, including speed, acceleration, jerk, and bearing rate.
Furthermore, we improve the quality of GPS logs through several data
preprocessing steps. Using the clean input layer, a variety of CNN
configurations are evaluated to identify the best CNN architecture. The highest
accuracy, 84.8%, was achieved by an ensemble of the best CNN configuration. We
also compare our methodology with traditional machine learning algorithms and
with the seminal and most closely related studies to demonstrate the
superiority of our framework.
Comment: 12 pages, 3 figures, 7 tables, Transportation Research Part C:
Emerging Technologies
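The motion characteristics named above can be derived point by point from
consecutive GPS fixes. The sketch below is an assumption-laden illustration,
not the authors' preprocessing: it uses the haversine great-circle distance on
a spherical Earth, divides each finite difference by the time step of the
earlier segment, and stacks the four channels into a fixed-length layout whose
row order and `length` are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; haversine assumes a sphere
Point = Tuple[float, float, float]  # (latitude deg, longitude deg, time s)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(x, y)) % 360.0


def turn_deg(b1: float, b2: float) -> float:
    """Smallest absolute change between two bearings (0..180 degrees)."""
    d = abs(b2 - b1) % 360.0
    return min(d, 360.0 - d)


def motion_features(points: Sequence[Point]) -> Dict[str, List[float]]:
    """Point-level speed, acceleration, jerk and bearing rate for one
    trajectory. Assumes points are sorted with strictly increasing times."""
    pairs = list(zip(points, points[1:]))
    dt = [q[2] - p[2] for p, q in pairs]
    speed = [haversine_m(p[0], p[1], q[0], q[1]) / d
             for (p, q), d in zip(pairs, dt)]
    bearing = [bearing_deg(p[0], p[1], q[0], q[1]) for p, q in pairs]
    accel = [(speed[i + 1] - speed[i]) / dt[i] for i in range(len(speed) - 1)]
    jerk = [(accel[i + 1] - accel[i]) / dt[i] for i in range(len(accel) - 1)]
    bearing_rate = [turn_deg(bearing[i], bearing[i + 1]) / dt[i]
                    for i in range(len(bearing) - 1)]
    return {"speed": speed, "acceleration": accel, "jerk": jerk,
            "bearing_rate": bearing_rate}


def to_input_layer(features: Dict[str, List[float]],
                   length: int) -> List[List[float]]:
    """Hypothetical layout: one row per motion channel, truncated or
    zero-padded to a fixed length so every segment has the same shape."""
    channels = ("speed", "acceleration", "jerk", "bearing_rate")
    return [(features[c] + [0.0] * length)[:length] for c in channels]


# Toy usage: a short trajectory heading due north along longitude 0.
track = [(0.0, 0.0, 0.0), (0.001, 0.0, 10.0),
         (0.002, 0.0, 20.0), (0.004, 0.0, 30.0)]
feats = motion_features(track)
layer = to_input_layer(feats, 5)
```

Each derived channel is one value shorter than the one it is differenced from,
which is why the fixed-length layout pads with zeros; for this northbound toy
track the bearing never changes, so the bearing-rate channel is all zeros.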