24,004 research outputs found
Recommended from our members
On Nonregularized Estimation of Psychological Networks.
An important goal for psychological science is developing methods to characterize relationships between variables. Customary approaches use structural equation models to connect latent factors to a number of observed measurements, or test causal hypotheses between observed variables. More recently, regularized partial correlation networks have been proposed as an alternative approach for characterizing relationships among variables through off-diagonal elements in the precision matrix. While the graphical Lasso (glasso) has emerged as the default network estimation method, it was optimized in fields outside of psychology with very different needs, such as high dimensional data where the number of variables (p) exceeds the number of observations (n). In this article, we describe the glasso method in the context of the fields where it was developed, and then we demonstrate that the advantages of regularization diminish in settings where psychological networks are often fitted ( pâȘn ). We first show that improved properties of the precision matrix, such as eigenvalue estimation, and predictive accuracy with cross-validation are not always appreciable. We then introduce nonregularized methods based on multiple regression and a nonparametric bootstrap strategy, after which we characterize performance with extensive simulations. Our results demonstrate that the nonregularized methods can be used to reduce the false-positive rate, compared to glasso, and they appear to provide consistent performance across sparsity levels, sample composition (p/n), and partial correlation size. We end by reviewing recent findings in the statistics literature that suggest alternative methods often have superior performance than glasso, as well as suggesting areas for future research in psychology. The nonregularized methods have been implemented in the R package GGMnonreg
QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules
The need to prediscretize numeric attributes before they can be used in
association rule learning is a source of inefficiencies in the resulting
classifier. This paper describes several new rule tuning steps aiming to
recover information lost in the discretization of numeric (quantitative)
attributes, and a new rule pruning strategy, which further reduces the size of
the classification models. We demonstrate the effectiveness of the proposed
methods on postoptimization of models generated by three state-of-the-art
association rule classification algorithms: Classification based on
Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al, 2016),
and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from
the UCI repository show that the postoptimized models are consistently smaller
-- typically by about 50% -- and have better classification performance on most
datasets
An intelligent assistant for exploratory data analysis
In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships ina data set rather than simply develop predictive models for selectd variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction and model selection using a genetic algorithm
Recommended from our members
Patterns of genomic and phenomic diversity in wine and table grapes.
Grapes are one of the most economically and culturally important crops worldwide, and they have been bred for both winemaking and fresh consumption. Here we evaluate patterns of diversity across 33 phenotypes collected over a 17-year period from 580 table and wine grape accessions that belong to one of the world's largest grape gene banks, the grape germplasm collection of the United States Department of Agriculture. We find that phenological events throughout the growing season are correlated, and quantify the marked difference in size between table and wine grapes. By pairing publicly available historical phenotype data with genome-wide polymorphism data, we identify large effect loci controlling traits that have been targeted during domestication and breeding, including hermaphroditism, lighter skin pigmentation and muscat aroma. Breeding for larger berries in table grapes was traditionally concentrated in geographic regions where Islam predominates and alcohol was prohibited, whereas wine grapes retained the ancestral smaller size that is more desirable for winemaking in predominantly Christian regions. We uncover a novel locus with a suggestive association with berry size that harbors a signature of positive selection for larger berries. Our results suggest that religious rules concerning alcohol consumption have had a marked impact on patterns of phenomic and genomic diversity in grapes
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter becomes a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, high accuracy supervisory signal and multilanguage
sentiment prediction while respecting every privacy request applicable. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, already before including tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
arules - A Computational Environment for Mining Association Rules and Frequent Item Sets
Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
- âŠ