Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning
In biomedical research, many different types of patient data can be
collected, such as various types of omics data and medical imaging modalities.
Applying multi-view learning to these different sources of information can
increase the accuracy of medical classification models compared with
single-view procedures. However, collecting biomedical data can be expensive
and/or burdensome for patients, so it is important to reduce the amount of
data that must be collected. It is therefore necessary to develop multi-view
learning methods that can accurately identify the views that are most
important for prediction. In recent years, several biomedical studies have used
an approach known as multi-view stacking (MVS), where a model is trained on
each view separately and the resulting predictions are combined through
stacking. In these studies, MVS has been shown to increase classification
accuracy. However, the MVS framework can also be used for selecting a subset of
important views. To study the view selection potential of MVS, we develop a
special case called stacked penalized logistic regression (StaPLR). Compared
with existing view-selection methods, StaPLR can make use of faster
optimization algorithms and is easily parallelized. We show that nonnegativity
constraints on the parameters of the function that combines the views play an
important role in preventing unimportant views from entering the model. We
investigate the performance of StaPLR through simulations, and consider two
real data examples. We compare the performance of StaPLR with an existing view
selection method called the group lasso and observe that, in terms of view
selection, StaPLR is often more conservative and has a consistently lower false
positive rate. Comment: 26 pages, 9 figures. Accepted manuscript
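The stacking idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' StaPLR implementation: each view gets its own ridge-penalized logistic regression (these fits are independent, which is why they parallelize easily), the cross-validated predicted probabilities of those models become the meta-features, and the meta-learner is a lasso logistic regression with nonnegativity constraints, fitted here by projected proximal gradient descent. The toy data, penalty values, fold scheme, and helper names (`fit_logistic`, `cv_predictions`, `staplr_sketch`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.0, l1=0.0, nonneg=False, lr=0.5, iters=200):
    """Penalized logistic regression by proximal gradient descent.

    Minimizes mean log-loss + (l2 / 2) * ||w||^2 + l1 * ||w||_1 with an
    unpenalized intercept. With nonneg=True, w is also projected onto w >= 0.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        gb, gw = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r / n
            for j in range(p):
                gw[j] += r * xi[j] / n
        b -= lr * gb
        for j in range(p):
            wj = w[j] - lr * (gw[j] + l2 * w[j])
            if nonneg:
                w[j] = max(0.0, wj - lr * l1)  # prox of l1 * w plus the w >= 0 constraint
            else:
                w[j] = math.copysign(max(0.0, abs(wj) - lr * l1), wj)  # soft-threshold
    return b, w


def predict(model, X):
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def cv_predictions(X, y, folds=5, l2=0.1):
    # Out-of-fold probabilities from a ridge-penalized base learner for one view.
    out = [0.0] * len(X)
    for k in range(folds):
        test = [i for i in range(len(X)) if i % folds == k]
        train = [i for i in range(len(X)) if i % folds != k]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train], l2=l2)
        for i, prob in zip(test, predict(model, [X[i] for i in test])):
            out[i] = prob
    return out


def staplr_sketch(views, y, l1=0.01):
    # Level 1: one cross-validated prediction per view (independent, parallelizable).
    Z = [list(row) for row in zip(*(cv_predictions(V, y) for V in views))]
    # Level 2: nonnegative lasso logistic regression on the view predictions.
    return fit_logistic(Z, y, l1=l1, nonneg=True, lr=0.5, iters=1000)


# Hypothetical toy data: three views of four features; only view 0 is informative.
random.seed(0)
n = 150
views = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(n)] for _ in range(3)]
y = [1 if random.random() < sigmoid(2 * (row[0] + row[1])) else 0 for row in views[0]]
b, w = staplr_sketch(views, y)
print("view weights:", [round(v, 3) for v in w])
```

In this sketch a view whose meta-learner weight ends at exactly zero is dropped, and the projection onto w >= 0 stops a view whose cross-validated predictions happen to be negatively associated with the outcome from entering with a negative weight.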
Crossing the Line: Evidence for the Categorization Theory of Spatial Voting
Bølstad and Dinas (2017) propose a model of spatial voting, based on social identity theory, which suggests that supporting a candidate or policy on the other side of the ideological spectrum carries a disutility that common spatial models do not account for. Unfortunately, the data they use cannot speak directly to whether this disutility arises because individuals perceive their ideology as a social identity. We present the results of an experimental study that measures the norm against crossing the ideological spectrum; tests the cost of doing so, controlling for spatial effects; and demonstrates that this cost increases with the salience and strength of identity norms. By demonstrating the norm mechanism behind the disutility of crossing the ideological spectrum, we provide strong support for B&D's model.
Computational mechanisms for resolving misunderstandings
Imagine discussing yesterday's dinner with a friend: "It wasn't particularly tasty." Your friend concurs: "It was very salty!" Thinking you were talking about the appetizer (which wasn't salty at all), you're forced to reconsider which course your friend was talking about. Was the appetizer salty to her? Was she talking about the main course? People encounter misunderstandings in everyday conversation, yet quickly and seamlessly resolve them. How people do this is an explanatory challenge: the thing being talked about (i.e., the referent) is often not physically present during the conversation. Hence, there's no easy way for interlocutors to establish common ground via ostensive signaling (e.g., by pointing at the dish). We develop a model of speakers that use pragmatic reasoning to infer the referent inferred by listeners. We explore the performance of this model using agent-based simulated conversations. The results imply necessary and sufficient conditions for successful updating.
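The kind of pragmatic inference described here can be illustrated with a generic Rational Speech Act (RSA) style chain: a literal listener, a pragmatic speaker who reasons about that listener, and an inversion of the speaker model to infer which referent a conversational partner had in mind. The abstract does not say the authors use RSA; this is a swapped-in, simplified technique. The referents, utterances, fit values, prior, and `alpha` are hypothetical, and the sketch ignores the second hypothesis in the example (that the appetizer tasted salty to the friend).

```python
import math

# Hypothetical toy world: two candidate referents and two possible descriptions.
REFERENTS = ["appetizer", "main course"]
UTTERANCES = ["salty", "bland"]
# Soft semantics (an assumption): how well each description fits each dish.
FIT = {
    ("salty", "appetizer"): 0.1, ("salty", "main course"): 0.9,
    ("bland", "appetizer"): 0.9, ("bland", "main course"): 0.1,
}


def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


def literal_listener(utterance, prior):
    # L0(r | u) is proportional to fit(u, r) * P(r).
    return normalize({r: FIT[(utterance, r)] * prior[r] for r in REFERENTS})


def pragmatic_speaker(referent, prior, alpha=3.0):
    # S1(u | r) is proportional to L0(r | u) ** alpha: prefer utterances that
    # lead a literal listener to the intended referent.
    return normalize({u: literal_listener(u, prior)[referent] ** alpha for u in UTTERANCES})


def infer_partner_referent(utterance, prior):
    # Invert the speaker model: which referent did the partner most likely mean?
    return normalize({r: pragmatic_speaker(r, prior)[utterance] * prior[r] for r in REFERENTS})


# You believed the conversation was about the appetizer.
prior = {"appetizer": 0.7, "main course": 0.3}
posterior = infer_partner_referent("salty", prior)
print({r: round(p, 3) for r, p in posterior.items()})
```

Even with a prior favouring the appetizer, hearing "salty" shifts most of the probability to the main course under these assumed values, which is the kind of update the example describes.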
Research Openness in Canadian Political Science: Toward an Inclusive and Differentiated Discussion
In this paper, we initiate a discussion within the Canadian political science community about research openness and its implications for our discipline. This discussion is important because the Tri-Agency has recently released guidelines on data management and because a number of political science journals, from several subfields, have signed the Journal Editors’ Transparency Statement requiring data access and research transparency (DA-RT). As norms regarding research openness develop, an increasing number and range of journals and funding agencies may begin to implement DA-RT-type requirements. If Canadian political scientists wish to continue participating in the global political science community, we must pay careful attention to, and participate proactively in, ongoing developments concerning research openness.
The Bradley–Terry Regression Trunk approach for Modeling Preference Data with Small Trees
This paper introduces the Bradley-Terry regression trunk model, a novel probabilistic approach for the analysis of preference data expressed through paired comparison rankings. In some cases, it may be reasonable to assume that the preferences expressed by individuals depend on their characteristics. Within the framework of tree-based partitioning, we specify a tree-based model estimating the joint effects of subject-specific covariates over and above their main effects. We therefore combine a tree-based model and the log-linear Bradley-Terry model, using the outcome of the comparisons as the response variable. The proposed model provides a way to discover interaction effects when no a priori hypotheses are available. It produces a small tree, called a trunk, that represents a fair compromise between a simple interpretation of the interaction effects and an easy-to-read partition of judges based on their characteristics and the preferences they have expressed. We present an application to a real dataset following two different approaches, and a simulation study to test the model's performance. Simulations showed that model performance improves as the numbers of rankings and objects increase. In addition, performance improves considerably when the judges' characteristics have a strong impact on their choices.
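As a rough illustration of why splitting judges on a covariate matters, the sketch below fits a plain Bradley-Terry model with Hunter's (2004) MM algorithm, once on pooled comparisons and once per group of judges. It is not the regression trunk algorithm, which estimates main and interaction effects within a single log-linear model; the one-split "young/old" grouping, the items, and the win counts are hypothetical.

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, items, iters=500):
    """Bradley-Terry worths via Hunter's (2004) MM algorithm.

    comparisons: list of (winner, loser) pairs. Assumes every item wins at least
    once and the comparison graph is connected, so the MLE exists.
    Returns centred log-worths l, with P(i beats j) = 1 / (1 + exp(l[j] - l[i])).
    """
    wins = defaultdict(int)
    n_pair = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1
    pi = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(n_pair[frozenset((i, j))] / (pi[i] + pi[j]) for j in items if j != i)
            new[i] = wins[i] / denom
        scale = len(items) / sum(new.values())  # fix the arbitrary overall scale
        pi = {i: v * scale for i, v in new.items()}
    logs = {i: math.log(v) for i, v in pi.items()}
    mean = sum(logs.values()) / len(logs)
    return {i: v - mean for i, v in logs.items()}


def pairs(a, b, a_wins, b_wins):
    # Expand head-to-head counts into a list of (winner, loser) pairs.
    return [(a, b)] * a_wins + [(b, a)] * b_wins


items = ["A", "B", "C"]
# Hypothetical judges split on one binary covariate, with opposite tastes.
young = pairs("A", "B", 8, 2) + pairs("A", "C", 7, 3) + pairs("B", "C", 6, 4)
old = pairs("A", "B", 3, 7) + pairs("A", "C", 2, 8) + pairs("B", "C", 4, 6)

print("pooled:", fit_bradley_terry(young + old, items))
print("young: ", fit_bradley_terry(young, items))
print("old:   ", fit_bradley_terry(old, items))
```

With these made-up counts the pooled worths are nearly equal, while the two subgroups rank the items in opposite orders: the covariate-by-item interaction is invisible until the judges are partitioned.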
Continuous Sweep: an improved, binary quantifier
Quantification is a supervised machine learning task, focused on estimating
the class prevalence of a dataset rather than labeling its individual
observations. We introduce Continuous Sweep, a new parametric binary quantifier
inspired by the well-performing Median Sweep. Median Sweep is currently one of
the best binary quantifiers, but we modify it in three ways: (1) using
parametric class distributions instead of empirical distributions, (2)
optimizing decision boundaries instead of applying discrete decision rules,
and (3) calculating the mean instead of the median. We derive
analytic expressions for the bias and variance of Continuous Sweep under
general model assumptions. This is one of the first theoretical contributions
in the field of quantification learning. Moreover, these derivations enable us
to find the optimal decision boundaries. Finally, our simulation study shows
that Continuous Sweep outperforms Median Sweep in a wide range of situations.
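The three changes listed above can be made concrete with a small sketch. Both quantifiers invert the identity E[CC] = p * TPR + (1 - p) * FPR for the prevalence p at many thresholds. `median_sweep` follows Median Sweep as commonly described (empirical TPR/FPR at every candidate threshold, median of the adjusted counts); the filter `tpr - fpr > 0.25` follows a common variant and is an assumption here. `parametric_mean_sweep` is only a hypothetical illustration of the three modifications (normal class-conditional score distributions, a threshold interval, the mean); it uses fixed boundaries `lo` and `hi` instead of the optimized ones derived in the paper, and is not the Continuous Sweep estimator itself. The score distributions and sample sizes are made up.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, median


def frac_above(sorted_scores, t):
    # Fraction of scores strictly above threshold t.
    return (len(sorted_scores) - bisect_right(sorted_scores, t)) / len(sorted_scores)


def adjusted_count(cc, tpr, fpr):
    # Invert E[cc] = p * tpr + (1 - p) * fpr for p, clipped to [0, 1].
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


def split(scores, labels):
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    return pos, neg


def median_sweep(train_scores, train_labels, test_scores, min_gap=0.25):
    # Every training score is a candidate threshold; thresholds with a small
    # tpr - fpr denominator are skipped because the estimate is unstable there.
    pos, neg = split(train_scores, train_labels)
    test = sorted(test_scores)
    estimates = []
    for t in sorted(set(train_scores)):
        tpr, fpr = frac_above(pos, t), frac_above(neg, t)
        if tpr - fpr > min_gap:
            estimates.append(adjusted_count(frac_above(test, t), tpr, fpr))
    return median(estimates)


def parametric_mean_sweep(train_scores, train_labels, test_scores, lo, hi, steps=100):
    # Normal class-conditional score distributions, a fixed interval [lo, hi]
    # of thresholds, and the mean of the adjusted counts.
    pos, neg = split(train_scores, train_labels)
    d_pos, d_neg = NormalDist.from_samples(pos), NormalDist.from_samples(neg)
    test = sorted(test_scores)
    estimates = []
    for k in range(steps + 1):
        t = lo + (hi - lo) * k / steps
        tpr, fpr = 1 - d_pos.cdf(t), 1 - d_neg.cdf(t)
        estimates.append(adjusted_count(frac_above(test, t), tpr, fpr))
    return fmean(estimates)


# Hypothetical classifier scores: positives ~ N(1, 1), negatives ~ N(-1, 1).
rng = random.Random(1)
train_labels = [1] * 2000 + [0] * 2000
train_scores = [rng.gauss(1 if y else -1, 1) for y in train_labels]
test_scores = [rng.gauss(1 if rng.random() < 0.3 else -1, 1) for _ in range(5000)]

print("Median Sweep:", round(median_sweep(train_scores, train_labels, test_scores), 3))
print("parametric mean sweep:",
      round(parametric_mean_sweep(train_scores, train_labels, test_scores, -0.5, 0.5), 3))
```

Both estimates should land near the simulated test prevalence of 0.3; the sketch does not reproduce the paper's bias and variance derivations.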
Analyzing hierarchical multi-view MRI data with StaPLR: An application to Alzheimer's disease classification
Multi-view data refers to a setting where features are divided into feature
sets, for example because they correspond to different sources. Stacked
penalized logistic regression (StaPLR) is a recently introduced method that can
be used for classification and for automatically selecting the views that are most
important for prediction. We introduce an extension of this method to a setting
where the data has a hierarchical multi-view structure. We also introduce a new
view importance measure for StaPLR, which allows us to compare the importance
of views at any level of the hierarchy. We apply our extended StaPLR algorithm
to Alzheimer's disease classification, where different MRI measures have been
calculated from three scan types: structural MRI, diffusion-weighted MRI, and
resting-state fMRI. StaPLR can identify which scan types and which derived MRI
measures are most important for classification, and it outperforms elastic net
regression in classification performance. Comment: 36 pages, 9 figures. Accepted manuscript
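A hierarchical version of stacking can be sketched as one meta-learner per scan type, combining that scan type's MRI measures, followed by a top-level meta-learner combining the scan types. This is a minimal sketch under assumptions, not the authors' extended algorithm: the lowest-level cross-validated predictions are simulated rather than computed, every level uses a nonnegative lasso logistic regression, the middle level passes on in-sample predictions (a simplification), and the paper's new view importance measure is not implemented. The scan-type labels come from the abstract; the measure signals and helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_nonneg_lasso(Z, y, l1=0.01, lr=0.5, iters=400):
    """Logistic regression with an L1 penalty and w >= 0 (projected proximal gradient)."""
    n, p = len(Z), len(Z[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        gb, gw = 0.0, [0.0] * p
        for zi, yi in zip(Z, y):
            r = sigmoid(b + sum(a * c for a, c in zip(w, zi))) - yi
            gb += r / n
            for j in range(p):
                gw[j] += r * zi[j] / n
        b -= lr * gb
        w = [max(0.0, wj - lr * (gj + l1)) for wj, gj in zip(w, gw)]
    return b, w


def predict(model, Z):
    b, w = model
    return [sigmoid(b + sum(a * c for a, c in zip(w, zi))) for zi in Z]


def hierarchical_stack(hierarchy, y):
    """hierarchy maps scan type -> list of per-measure prediction columns."""
    mid_models, mid_preds = {}, []
    for scan, columns in hierarchy.items():
        Z = [list(row) for row in zip(*columns)]
        mid_models[scan] = fit_nonneg_lasso(Z, y)
        # Simplification: in-sample predictions; a faithful version would pass
        # cross-validated predictions up the hierarchy at this level too.
        mid_preds.append(predict(mid_models[scan], Z))
    top = fit_nonneg_lasso([list(row) for row in zip(*mid_preds)], y)
    return mid_models, top


rng = random.Random(0)
n = 200
y = [rng.randint(0, 1) for _ in range(n)]


def measure(signal):
    # Hypothetical out-of-fold probability column for one MRI measure.
    return [min(1.0, max(0.0, 0.5 + signal * (2 * yi - 1) + rng.gauss(0, 0.15))) for yi in y]


hierarchy = {
    "structural": [measure(0.25), measure(0.2)],
    "diffusion-weighted": [measure(0.1), measure(0.0)],
    "resting-state": [measure(0.0), measure(0.0)],
}
mid_models, (top_b, top_w) = hierarchical_stack(hierarchy, y)
for scan, weight in zip(hierarchy, top_w):
    print(scan, round(weight, 3))
```

Because every level is constrained to nonnegative weights with an L1 penalty, selection can happen at either level: a measure can be dropped within its scan type, or a whole scan type can be dropped at the top.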
The detection and modeling of direct effects in latent class analysis
Several approaches have been proposed for latent class modeling with external variables, including one-step, two-step, and three-step estimators. However, little is yet known about the performance of these approaches when direct effects of the external variable on the indicators of latent class membership are present. In the current article, we compare these approaches and investigate the consequences of not modeling such direct effects when they are present, as well as the power of residual and fit statistics to identify them. The simulation results show that not modeling direct effects can lead to severe parameter bias, especially with a weak measurement model. Both residual and fit statistics can be used to identify such effects, as long as the number and strength of these effects are low and the measurement model is sufficiently strong.
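What a direct effect means can be illustrated by simulation. The sketch below, whose parameter values and helper names are hypothetical, generates data from a two-class latent class model in which a binary covariate z affects class membership and, additionally, the first indicator directly. Because the true classes are known in a simulation, one can see that the within-class endorsement rate of the affected indicator depends on z while an unaffected indicator's does not; a model that omits the direct effect can only explain this association through the latent class. It does not implement the one-, two-, or three-step estimators or the residual and fit statistics compared in the article.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_lca(n, direct_effect, seed=0):
    """Two-class LCA with a binary covariate z and four binary indicators.

    z affects class membership and, when direct_effect != 0, also the first
    indicator directly (within class), which violates measurement invariance.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        c = int(rng.random() < logistic(-0.5 + 1.0 * z))
        ys = []
        for k in range(4):
            logit = 1.5 if c == 1 else -1.5  # class-specific indicator logits
            if k == 0:
                logit += direct_effect * z  # the direct effect z -> indicator 1
            ys.append(int(rng.random() < logistic(logit)))
        data.append((z, c, ys))
    return data


def within_class_rate(data, cls, z, k):
    # P(indicator k = 1 | class cls, covariate z), using the simulated true classes.
    values = [ys[k] for zz, c, ys in data if c == cls and zz == z]
    return sum(values) / len(values)


data = simulate_lca(20000, direct_effect=1.5)
gap_direct = within_class_rate(data, 0, 1, 0) - within_class_rate(data, 0, 0, 0)
gap_clean = within_class_rate(data, 0, 1, 1) - within_class_rate(data, 0, 0, 1)
print(round(gap_direct, 3), round(gap_clean, 3))
```

With these assumed values the affected indicator's within-class rate in class 0 moves from about logistic(-1.5) to logistic(0), while the unaffected indicator stays flat apart from sampling noise.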