Stochastic Blockmodeling for Online Advertising
Online advertising is a large and important industry. Knowledge of website
attributes can contribute greatly to business strategies for ad targeting,
content display, inventory purchase, and revenue prediction. Classical
inference on users and sites is challenging because the data are voluminous,
sparse, high-dimensional, and noisy. In this paper, we introduce a stochastic
blockmodel for the website relations induced by online user visitation
events. We propose two clustering algorithms to discover the intrinsic
structures of websites, and compare their performance with a goodness-of-fit
method and a deterministic graph partitioning method. We demonstrate the
effectiveness of our algorithms on both simulated data and an AOL website
dataset.
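The block structure this abstract describes can be illustrated with a minimal sketch: simulate a two-block stochastic block model and recover the blocks by spectral clustering on the adjacency matrix. This is a standard illustrative recovery method, not necessarily either of the paper's two algorithms; all parameter values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-block stochastic block model: nodes in the same block
# connect with probability p_in, across blocks with probability p_out.
n, p_in, p_out = 100, 0.5, 0.05
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)   # symmetric adjacency, no self-loops

# In this balanced two-block setting, the sign pattern of the eigenvector
# for the second-largest eigenvalue of A separates the two blocks.
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
v2 = eigvecs[:, -2]
est = (v2 > 0).astype(int)

# Agreement with the planted labels, up to a global label swap
acc = max(np.mean(est == labels), np.mean(est != labels))
```

With this signal strength the two leading eigenvalues sit well outside the spectral bulk, so the recovery is essentially exact.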
Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning
In machine learning, the exponential growth of data and the associated
``curse of dimensionality'' pose significant challenges, particularly with
expansive yet sparse datasets. Addressing these challenges, multi-view ensemble
learning (MEL) has emerged as a transformative approach, with feature
partitioning (FP) playing a pivotal role in constructing artificial views for
MEL. Our study introduces the Semantic-Preserving Feature Partitioning (SPFP)
algorithm, a novel method grounded in information theory. The SPFP algorithm
effectively partitions datasets into multiple semantically consistent views,
enhancing the MEL process. Through extensive experiments on eight real-world
datasets, ranging from high-dimensional data with few instances to
low-dimensional data with many instances, our method demonstrates notable efficacy.
It maintains model accuracy while significantly improving uncertainty measures
in scenarios where high generalization performance is achievable. Conversely,
it retains uncertainty metrics while enhancing accuracy where high
generalization accuracy is less attainable. An effect-size analysis further
reveals that the SPFP algorithm outperforms benchmark models with large
effect sizes and reduces computational demands through effective dimensionality
reduction. The substantial effect sizes observed in most experiments underscore
the algorithm's significant improvements in model performance.
Segmentation and quantitative evaluation of brain MRI data with a multi-phase three-dimensional implicit deformable model
Segmentation of three-dimensional anatomical brain images into tissue classes has applications in both clinical and research settings. This paper presents the implementation and quantitative evaluation of a four-phase three-dimensional active contour, implemented in a level set framework, for automated segmentation of brain MRIs. The segmentation algorithm performs an optimal partitioning of three-dimensional data based on homogeneity measures and naturally evolves to extract the different tissue types in the brain. Random seed initialization was used to speed up numerical computation and avoid the need for a priori information. This random initialization makes the method robust to variation in user expertise, biased a priori information, and input errors that could arise from variations in image quality. Experimentation on three MRI brain data sets showed that the optimal partitioning successfully labeled regions that accurately identified white matter, gray matter, and cerebrospinal fluid in the ventricles. Quantitative evaluation of the segmentation was performed by comparison to manually labeled data, computing false positive and false negative voxel assignments for the three tissue classes. We report high accuracy for the two comparison cases. These results demonstrate the efficiency and flexibility of this segmentation framework in performing the challenging task of automatically extracting brain tissue volume contours.
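The homogeneity-driven level-set evolution described above can be sketched in its simplest form: a two-phase, two-dimensional Chan-Vese active contour, rather than the paper's four-phase three-dimensional implementation. The initialization, step size, and smoothing weight below are illustrative assumptions.

```python
import numpy as np

def chan_vese(img, iters=300, dt=0.5, mu=0.1):
    """Two-phase Chan-Vese active contour on a level-set function phi:
    phi > 0 covers one homogeneous region, phi < 0 the other, and a
    curvature term penalizes irregular region boundaries."""
    h, w = img.shape
    Y, X = np.mgrid[:h, :w]
    # initialize phi as a centered circle: positive inside, negative outside
    phi = min(h, w) / 3.0 - np.sqrt((Y - h / 2) ** 2 + (X - w / 2) ** 2)
    for _ in range(iters):
        inside = phi > 0
        c1 = img[inside].mean() if inside.any() else 0.0      # mean inside
        c2 = img[~inside].mean() if (~inside).any() else 0.0  # mean outside
        gy, gx = np.gradient(phi)
        norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
        kappa = np.gradient(gy / norm)[0] + np.gradient(gx / norm)[1]
        # region competition: each pixel drifts toward the region whose
        # mean intensity it resembles; mu * kappa smooths the contour
        phi = np.clip(phi + dt * (mu * kappa - (img - c1) ** 2
                                  + (img - c2) ** 2), -5.0, 5.0)
    return phi > 0
```

Clipping phi keeps the level-set function from drifting far from the interface, a cheap stand-in for the reinitialization step a production implementation would use; extending to four phases uses two level-set functions whose sign combinations encode four tissue classes.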
A Recursive Partitioning Approach for Dynamic Discrete Choice Modeling in High Dimensional Settings
Dynamic discrete choice models are widely employed to answer substantive and
policy questions in settings where individuals' current choices have future
implications. However, estimation of these models is often computationally
intensive and/or infeasible in high-dimensional settings. Indeed, even
specifying the structure for how the utilities/state transitions enter the
agent's decision is challenging in high-dimensional settings when we have no
guiding theory. In this paper, we present a semi-parametric formulation of
dynamic discrete choice models that incorporates a high-dimensional set of
state variables, in addition to the standard variables used in a parametric
utility function. The high-dimensional set can include all the variables
that are not of primary interest but may affect people's choices and must
therefore be included in the estimation procedure, i.e., control variables.
We present a data-driven recursive partitioning algorithm that
reduces the dimensionality of the high-dimensional state space by taking
variation in choices and state transitions into account. Researchers can then
use the method of their choice to estimate the problem using the discretized
state space from the first stage. Our approach can reduce estimation bias
while making estimation feasible. We present Monte Carlo
simulations to demonstrate the performance of our method compared to standard
estimation methods where we ignore the high-dimensional explanatory variable
set.
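One way such a data-driven recursive partitioning could work is sketched below: greedily split the high-dimensional state space wherever a split most reduces within-cell variance of the observed choices, so that each resulting cell can serve as one discretized state. This is a schematic CART-style illustration, not the authors' algorithm; the depth, size limits, and quantile candidate thresholds are assumptions.

```python
import numpy as np

def grow(Z, y, depth=0, max_depth=3, min_size=25):
    """Greedy recursive partitioning of state variables Z so that cells
    are homogeneous in the observed choice y. A leaf stores the cell's
    mean choice rate; internal nodes are (feature, threshold, left, right)."""
    if depth == max_depth or len(y) < 2 * min_size or y.std() == 0:
        return float(y.mean())
    best = None
    for j in range(Z.shape[1]):
        for t in np.quantile(Z[:, j], [0.25, 0.5, 0.75]):
            left = Z[:, j] <= t
            nl = int(left.sum())
            if nl < min_size or len(y) - nl < min_size:
                continue  # skip splits that leave a cell too small
            sse = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    return (j, t, grow(Z[left], y[left], depth + 1, max_depth, min_size),
            grow(Z[~left], y[~left], depth + 1, max_depth, min_size))

def cell_rate(tree, z):
    """Map a state vector to its discretized cell's mean choice rate."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if z[j] <= t else hi
    return tree
```

The leaves define the discretized state space handed to the second-stage estimator; splitting on choice variance is one heuristic, and the paper's criterion also accounts for variation in state transitions.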