Stochastic Blockmodeling for Online Advertising
Online advertising is a large and important industry. Knowledge of website
attributes can contribute greatly to business strategies for ad targeting,
content display, inventory purchase, and revenue prediction. Classical
inference on users and sites is challenging because the data are voluminous,
sparse, high-dimensional, and noisy. In this paper, we introduce a stochastic
blockmodel for the website relations induced by online user visitation
events. We propose two clustering algorithms to discover the intrinsic
structures of websites, and compare their performance with a goodness-of-fit
method and a deterministic graph partitioning method. We demonstrate the
effectiveness of our algorithms on both simulated data and an AOL website
dataset.
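The block structure this abstract describes can be illustrated with a minimal sketch: simulate a two-block stochastic block model and recover the blocks by spectral clustering on the adjacency matrix. This is a standard illustrative recovery method, not necessarily either of the paper's two algorithms; all parameter values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-block stochastic block model: nodes in the same block
# connect with probability p_in, across blocks with probability p_out.
n, p_in, p_out = 100, 0.5, 0.05
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)   # symmetric adjacency, no self-loops

# In this balanced two-block setting, the sign pattern of the eigenvector
# for the second-largest eigenvalue of A separates the two blocks.
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
v2 = eigvecs[:, -2]
est = (v2 > 0).astype(int)

# Agreement with the planted labels, up to a global label swap
acc = max(np.mean(est == labels), np.mean(est != labels))
```

With this signal strength the two leading eigenvalues sit well outside the spectral bulk, so the recovery is essentially exact.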
Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning
In machine learning, the exponential growth of data and the associated
``curse of dimensionality'' pose significant challenges, particularly with
expansive yet sparse datasets. Addressing these challenges, multi-view ensemble
learning (MEL) has emerged as a transformative approach, with feature
partitioning (FP) playing a pivotal role in constructing artificial views for
MEL. Our study introduces the Semantic-Preserving Feature Partitioning (SPFP)
algorithm, a novel method grounded in information theory. The SPFP algorithm
effectively partitions datasets into multiple semantically consistent views,
enhancing the MEL process. Through extensive experiments on eight real-world
datasets, ranging from high-dimensional data with few instances to
low-dimensional data with many instances, our method demonstrates notable efficacy.
It maintains model accuracy while significantly improving uncertainty measures
in scenarios where high generalization performance is achievable. Conversely,
it retains uncertainty metrics while enhancing accuracy where high
generalization accuracy is less attainable. An effect-size analysis further
reveals that the SPFP algorithm outperforms benchmark models with large
effect sizes and reduces computational demands through effective dimensionality
reduction. The substantial effect sizes observed in most experiments underscore
the algorithm's significant improvements in model performance.
Segmentation and quantitative evaluation of brain MRI data with a multi-phase three-dimensional implicit deformable model
Segmentation of three-dimensional anatomical brain images into tissue classes has applications in both clinical and research settings. This paper presents the implementation and quantitative evaluation of a four-phase three-dimensional active contour, implemented in a level set framework, for automated segmentation of brain MRIs. The segmentation algorithm performs an optimal partitioning of three-dimensional data based on homogeneity measures and naturally evolves to extract the different tissue types in the brain. Random seed initialization was used to speed up numerical computation and avoid the need for a priori information. This random initialization makes the method robust to variation in user expertise, biased a priori information, and input errors that could arise from variations in image quality. Experimentation on three MRI brain data sets showed that the optimal partitioning successfully labeled regions that accurately identified white matter, gray matter, and cerebrospinal fluid in the ventricles. Quantitative evaluation of the segmentation was performed by comparison to manually labeled data, computing false positive and false negative voxel assignments for the three tissue classes. We report high accuracy for the two comparison cases. These results demonstrate the efficiency and flexibility of this segmentation framework in performing the challenging task of automatically extracting brain tissue volume contours.
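The homogeneity-driven level-set evolution described above can be sketched in its simplest form: a two-phase, two-dimensional Chan-Vese active contour, rather than the paper's four-phase three-dimensional implementation. The initialization, step size, and smoothing weight below are illustrative assumptions.

```python
import numpy as np

def chan_vese(img, iters=300, dt=0.5, mu=0.1):
    """Two-phase Chan-Vese active contour on a level-set function phi:
    phi > 0 covers one homogeneous region, phi < 0 the other, and a
    curvature term penalizes irregular region boundaries."""
    h, w = img.shape
    Y, X = np.mgrid[:h, :w]
    # initialize phi as a centered circle: positive inside, negative outside
    phi = min(h, w) / 3.0 - np.sqrt((Y - h / 2) ** 2 + (X - w / 2) ** 2)
    for _ in range(iters):
        inside = phi > 0
        c1 = img[inside].mean() if inside.any() else 0.0      # mean inside
        c2 = img[~inside].mean() if (~inside).any() else 0.0  # mean outside
        gy, gx = np.gradient(phi)
        norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
        kappa = np.gradient(gy / norm)[0] + np.gradient(gx / norm)[1]
        # region competition: each pixel drifts toward the region whose
        # mean intensity it resembles; mu * kappa smooths the contour
        phi = np.clip(phi + dt * (mu * kappa - (img - c1) ** 2
                                  + (img - c2) ** 2), -5.0, 5.0)
    return phi > 0
```

Clipping phi keeps the level-set function from drifting far from the interface, a cheap stand-in for the reinitialization step a production implementation would use; extending to four phases uses two level-set functions whose sign combinations encode four tissue classes.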
A Recursive Partitioning Approach for Dynamic Discrete Choice Modeling in High Dimensional Settings
Dynamic discrete choice models are widely employed to answer substantive and
policy questions in settings where individuals' current choices have future
implications. However, estimation of these models is often computationally
intensive and/or infeasible in high-dimensional settings. Indeed, even
specifying the structure for how the utilities/state transitions enter the
agent's decision is challenging in high-dimensional settings when we have no
guiding theory. In this paper, we present a semi-parametric formulation of
dynamic discrete choice models that incorporates a high-dimensional set of
state variables, in addition to the standard variables used in a parametric
utility function. The high-dimensional set can include all the variables
that are not of primary interest but may affect people's choices and must
therefore be included in the estimation procedure, i.e., control variables.
We present a data-driven recursive partitioning algorithm that
reduces the dimensionality of the high-dimensional state space by taking
variation in choices and state transitions into account. Researchers can then
use the method of their choice to estimate the problem using the discretized
state space from the first stage. Our approach can reduce estimation bias
while making estimation feasible. We present Monte Carlo
simulations to demonstrate the performance of our method compared to standard
estimation methods where we ignore the high-dimensional explanatory variable
set.
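One way such a data-driven recursive partitioning could work is sketched below: greedily split the high-dimensional state space wherever a split most reduces within-cell variance of the observed choices, so that each resulting cell can serve as one discretized state. This is a schematic CART-style illustration, not the authors' algorithm; the depth, size limits, and quantile candidate thresholds are assumptions.

```python
import numpy as np

def grow(Z, y, depth=0, max_depth=3, min_size=25):
    """Greedy recursive partitioning of state variables Z so that cells
    are homogeneous in the observed choice y. A leaf stores the cell's
    mean choice rate; internal nodes are (feature, threshold, left, right)."""
    if depth == max_depth or len(y) < 2 * min_size or y.std() == 0:
        return float(y.mean())
    best = None
    for j in range(Z.shape[1]):
        for t in np.quantile(Z[:, j], [0.25, 0.5, 0.75]):
            left = Z[:, j] <= t
            nl = int(left.sum())
            if nl < min_size or len(y) - nl < min_size:
                continue  # skip splits that leave a cell too small
            sse = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    return (j, t, grow(Z[left], y[left], depth + 1, max_depth, min_size),
            grow(Z[~left], y[~left], depth + 1, max_depth, min_size))

def cell_rate(tree, z):
    """Map a state vector to its discretized cell's mean choice rate."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if z[j] <= t else hi
    return tree
```

The leaves define the discretized state space handed to the second-stage estimator; splitting on choice variance is one heuristic, and the paper's criterion also accounts for variation in state transitions.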