3,667 research outputs found
On the Design and Analysis of Multiple View Descriptors
We propose an extension of popular descriptors based on gradient orientation
histograms (HOG, computed in a single image) to multiple views. It hinges on
interpreting HOG as a conditional density in the space of sampled images, where
the effects of nuisance factors such as viewpoint and illumination are
marginalized. However, such marginalization is performed with respect to a very
coarse approximation of the underlying distribution. Our extension leverages on
the fact that multiple views of the same scene allow separating intrinsic from
nuisance variability, and thus afford better marginalization of the latter. The
result is a descriptor that has the same complexity of single-view HOG, and can
be compared in the same manner, but exploits multiple views to better trade off
insensitivity to nuisance variability with specificity to intrinsic
variability. We also introduce a novel multi-view wide-baseline matching
dataset, consisting of a mixture of real and synthetic objects with ground
truthed camera motion and dense three-dimensional geometry
Recommended from our members
Beyond Standard Assumptions - Semiparametric Models, A Dyadic Item Response Theory Model, and Cluster-Endogenous Random Intercept Models
In most statistical analyses, quantitative education researchers often make simplifying assumptions regarding the manner in which their data was generated in order to answer some of these questions. These assumptions can help to reduce the complexity of the problem, and allow the researcher to describe their data using a simpler, and often times more interpretable, statistical model. However, making some of these assumptions when they are not true can lead to biased estimates and misleading answers. While the standard sets of assumptions associated with commonly-used statistical models are usually sufficient in a wide range of contexts, it will always be beneficial for education researchers to understand what they are, when they are reasonable, and how to modify them if necessary. This dissertation focuses on three of the most common models used in quantitative education research (viz. parametric models like Linear Models (LMs), Item Response Theory (IRT) models, and Random-Intercept Models (RIMs)), discusses the standard sets of assumptions that accompany these models, and then describes related models with less stringent sets of assumptions. In each of the following three chapters, we either explicitly unpack existing models that are useful but are currently still uncommon in the field of education research, or propose novel models and/or estimation strategies for these models. We begin in Chapter 1 with a common parametric model known as the Gaussian LM, and use it as a scaffold to better understand semiparametric models and their estimation. We begin by reviewing how the coefficients of the Gaussian LM are usually estimated using Maximum Likelihood (ML) or Least-Squares (LS). We then introduce the notion of an -estimator as well as that of a Regular Asymptotically Linear estimator, and show how they relate to the ML estimator. In particular, we introduce the notion of influence functions/curves and discuss their geometry together with concepts such as Hilbert spaces and tangent spaces. We then demonstrate, concretely, how to derive the so-called efficient influence function under the Gaussian LM, and show that it is precisely the influence function of the ML and (Ordinary) LS estimators. This shows that the ML estimator (at least under the Gaussian LM) is efficient. Using the foundation built, we move on from the Gaussian LM by relaxing both the assumption that the residuals are normally distributed, as well as the assumption that they have a constant variance, and define this as the Heteroskedastic Linear Model. Unlike the Gaussian LM, this is a semiparametric model. Where possible, we make use of intuition and analogous results from the parametric setting to help describe the workflow for obtaining an efficient estimator for the coefficients of the Heteroskedastic Linear Model. In particular, we derive the nuisance tangent space for this semiparametric model, and use it to obtain the efficient influence function for our model. We then show how to use the efficient influence function to obtain an efficient estimator (which happens to be the Weighted LS estimator) from the (Ordinary) LS estimator via a one-step approach as well as an estimating equations approach. We then conclude by directing readers to more advanced material, including references on more modern approaches to estimating more general semiparametric models such as Targeted Maximum Likelihood Estimation. In Chapter 2, we focus on a class of measurement models known as Item Response Theory models which are useful for measuring latent traits of a subject based on the subject's response to items. We relax the condition that the responses are only a result of the individual's latent trait (and possibly an external rater), and propose a dyadic Item Response Theory (dIRT) model for measuring interactions of pairs of individuals when the responses to items represent the actions (or behaviors, perceptions, etc.) of each individual (actor) made within the context of a dyad formed with another individual (partner). Examples of its use in education include the assessment of collaborative problem solving among students, or the evaluation of intra-departmental dynamics among teachers. The dIRT model generalizes both Item Response Theory models for measurement and the Social Relations Model for dyadic data. Here, the responses of an actor when paired with a partner are modeled as a function of not only the actor's inclination to act and the partner's tendency to elicit that action, but also the unique relationship of the pair, represented by two directional, possibly correlated, interaction latent variables. We discuss generalizations such as accommodating triads or larger groups, but focus on demonstrating the key idea in the dyadic case. We show that estimation may be performed using Markov-chain Monte Carlo implemented in \texttt{Stan}, making it straightforward to extend the dIRT model in various ways. Specifically, we show how the basic dIRT model can be extended to accommodate latent regressions, random effects, distal outcomes. We perform a simulation study that demonstrates that our estimation approach performs well. In the absence of educational data of this form, we demonstrate the usefulness of our proposed approach using speed-dating data instead, and find new evidence of pairwise interactions between participants, describing a mutual attraction that is inadequately characterized by individual properties alone.Finally, in Chapter 3, we consider the often implicit assumption made when estimating the coefficients of structural Random Intercept Models (RIMs) that covariates at all levels do not co-vary with the random intercepts. A violation of this assumption (called cluster-level endogeneity) leads to inconsistent estimates when using standard estimation procedures. For two-level RIMs with such endogeneity, Hausman and Taylor (HT) devised a consistent multi-step instrumental variable estimator using only internal instruments. We, instead, approach this problem by explicitly modeling the endogeneity using a Structural Equation Model (SEM). In this chapter, we compare, through simulation, the HT and SEM estimators, and evaluate their asymptotic and finite sample properties. We show that the SEM approach is also flexible enough to deal with different exchangeability assumptions for the covariates (e.g., whether the correlations between pairs of all units in a cluster are the same) and investigate how these exchangeability assumptions affect finite sample properties of the HT estimator. For the simulations, we propose a new procedure for generating cluster- and unit-level covariates and random intercepts with a fully flexible covariance structure. We also compare our approach to another common approach known as Multilevel Matching using data from the High School and Beyond survey
Parameter likelihood of intrinsic ellipticity correlations
Subject of this paper are the statistical properties of ellipticity
alignments between galaxies evoked by their coupled angular momenta. Starting
from physical angular momentum models, we bridge the gap towards ellipticity
correlations, ellipticity spectra and derived quantities such as aperture
moments, comparing the intrinsic signals with those generated by gravitational
lensing, with the projected galaxy sample of EUCLID in mind. We investigate the
dependence of intrinsic ellipticity correlations on cosmological parameters and
show that intrinsic ellipticity correlations give rise to non-Gaussian
likelihoods as a result of nonlinear functional dependencies. Comparing
intrinsic ellipticity spectra to weak lensing spectra we quantify the magnitude
of their contaminating effect on the estimation of cosmological parameters and
find that biases on dark energy parameters are very small in an
angular-momentum based model in contrast to the linear alignment model commonly
used. Finally, we quantify whether intrinsic ellipticities can be measured in
the presence of the much stronger weak lensing induced ellipticity
correlations, if prior knowledge on a cosmological model is assumed.Comment: 14 pages, 8 figures, submitted to MNRA
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose, REST, a systematic methodology to: (a) identify threads of interest
based on a, possibly incomplete, bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164k posts and
21K threads. First, REST robustness to initial keyword selection can extend the
user-provided keyword set and thus, it can recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3-76.9%. We see our approach as a first step for harnessing the wealth of
information of online forums in a user-friendly way, since the user can loosely
specify her keywords of interest
Recommended from our members
REST: A thread embedding approach for identifying and classifying user-specified information in security forums
National mapping survey of indoor radon levels in the Maltese Islands (2010-2011)
Aim: To conduct a national geographically based survey to determine the distribution of the mean annual indoor radon gas concentration levels in dwellings in the Maltese Islands and map these levels; to identify any areas with annual mean indoor radon gas concentrations higher than the current proposed WHO reference level of 100 Bq/m3; to determine an advisory national reference level for radon concentration in buildings. Method: Radon measurements were carried out in 85 buildings distributed over the Maltese Islands between November 2010 and November 2011 using alpha-track radon detectors. Retrieved detectors were analysed by a Health Protection Agency-accredited laboratory in the UK. The overall annual arithmetic and geometric mean indoor radon gas concentrations for the Maltese Islands were calculated. Results: The mean annual indoor radon concentration for the Maltese Islands was 32 Bq/m3, with a geometric mean of 25 Bq/m3 (standard deviation (SD) 25). The maximum level measured was 92 Bq/m3 and the minimum 11 Bq/m3. A radon map of the Maltese Islands was produced using the geographic mean annual indoor radon gas concentration level for each building. Conclusion: The mean annual indoor radon concentration in Malta was found to be well below the lowest proposed WHO reference levels with no dwellings having a mean annual indoor radon gas concentration above 100 Bq/m3. This national mapping survey for mean annual indoor radon gas concentration in the Maltese Islands indicates that the current proposed reference level of 100 Bq/m3 by the WHO may be adopted as the national reference level for the Maltese Islands.peer-reviewe
Integrated Visualization of Human Brain Connectome Data
Visualization plays a vital role in the analysis of multi-modal neuroimaging data. A major challenge in neuroimaging visualization is how to integrate structural, functional and connectivity data to form a comprehensive visual context for data exploration, quality control, and hypothesis discovery. We develop a new integrated visualization solution for brain imaging data by combining scientific and information visualization techniques within the context of the same anatomic structure. New surface texture techniques are developed to map non-spatial attributes onto the brain surfaces from MRI scans. Two types of non-spatial information are represented: (1) time-series data from resting-state functional MRI measuring brain activation; (2) network properties derived from structural connectivity data for different groups of subjects, which may help guide the detection of differentiation features. Through visual exploration, this integrated solution can help identify brain regions with highly correlated functional activations as well as their activation patterns. Visual detection of differentiation features can also potentially discover image based phenotypic biomarkers for brain diseases
- …