53 research outputs found
Item-Focused Trees for the Detection of Differential Item Functioning in Partial Credit Models
Various methods to detect differential item functioning (DIF) in item response models are available. However, most of these methods assume that the responses are binary, so methods for ordered response categories are scarce. In the present article, DIF in the widely used partial credit model is investigated. An item-focused tree is proposed that allows the detection of DIF items, which might affect the performance of the partial credit model. The method uses tree methodology, yielding a tree for each item that is detected as a DIF item. The visualization as trees makes the results easily accessible, as the obtained trees show which variables induce DIF and in which way. The new method is compared with alternative approaches, and simulations demonstrate its performance.
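As a rough illustration of the model in which DIF is studied here, the sketch below computes category probabilities for one item under the standard partial credit model. The function name, the ability value, and the threshold values are assumptions chosen for illustration; the item-focused tree method itself is not implemented.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta is the person ability; thresholds are the item's step parameters.
    Returns probabilities for categories 0..len(thresholds).
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds: the same item with shifted thresholds in a second
# group, which is one way DIF can show up at equal ability.
reference_group = pcm_probabilities(theta=0.0, thresholds=[-1.0, 0.0, 1.0])
focal_group = pcm_probabilities(theta=0.0, thresholds=[-0.5, 0.5, 1.5])
```

Two persons with the same ability but different group membership get different category probabilities here, which is the pattern a DIF detection method is meant to flag.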
Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data
Informative cluster size (ICS) arises in situations with clustered data where
a latent relationship exists between the number of participants in a cluster
and the outcome measures. Although this phenomenon has been sporadically
reported in statistical literature for nearly two decades now, further
exploration is needed in certain statistical methodologies to avoid potentially
misleading inferences. For inference about population quantities without
covariates, inverse cluster size reweightings are often employed to adjust for
ICS. Further, to study the effect of covariates on disease progression
described by a multistate model, the pseudo-value regression technique has
gained popularity in time-to-event data analysis. We seek to answer the
question: "How to apply pseudo-value regression to clustered time-to-event data
when cluster size is informative?" ICS adjustment by the reweighting method can
be applied at either of two steps: estimation of the marginal functions of the multistate
model and fitting of the estimating equations based on pseudo-value responses,
leading to four possible strategies. We present theoretical arguments and
thorough simulation experiments to ascertain the correct strategy for adjusting
for ICS. A further extension of our methodology is implemented to include
informativeness induced by the intra-cluster group size. We demonstrate the
methods in two real-world applications: (i) to determine predictors of tooth
survival in a periodontal study, and (ii) to identify indicators of ambulatory
recovery in spinal cord injury patients who participated in locomotor-training
rehabilitation.
Comment: 22 pages, 4 figures, 4 tables
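The inverse cluster size reweighting mentioned above can be illustrated for the simplest case, a population mean without covariates. This is a minimal sketch with made-up data and hypothetical function names; it does not cover the pseudo-value regression or multistate steps of the paper.

```python
def unweighted_mean(clusters):
    """Mean over all observations; large clusters dominate."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)

def inverse_cluster_size_mean(clusters):
    """Mean in which each observation has weight 1 / (its cluster size).

    This equals the average of the cluster means, so every cluster
    contributes equally regardless of its size. Clusters must be non-empty.
    """
    cluster_means = [sum(cluster) / len(cluster) for cluster in clusters]
    return sum(cluster_means) / len(cluster_means)

# Made-up data: the larger cluster also has the higher outcomes,
# so cluster size carries information about the outcome.
toy_clusters = [[1, 1, 1, 1], [0]]
naive = unweighted_mean(toy_clusters)               # 0.8
adjusted = inverse_cluster_size_mean(toy_clusters)  # 0.5
```

The gap between the two estimates is what an ICS adjustment is meant to address: the naive mean describes a typical observation, the reweighted mean a typical member of a typical cluster.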
Some Models and Tests for Carryover Effects and Trends in Recurrent Event Processes
Recurrent events experienced by individual units or systems occur in many fields. The main aim of this thesis is to develop formal tests for certain features of recurrent event processes and to discuss their properties. In particular, carryover effects and time trends are considered. The former relate to the clustering of events in time, and the latter refer to a tendency for the rate of event occurrence to change over time in some systematic way. Score tests are developed for models incorporating carryover effects or time trends. The tests considered are easily interpreted and based on simple models but have good robustness properties against a range of carryover and trend alternatives. Asymptotic properties of the test statistics are discussed both when the number of processes approaches infinity and when one process is under observation for a long time. In applications involving multiple systems or individuals, heterogeneity is often apparent, and there is a need for tests developed for such cases. Allowance for heterogeneity is, therefore, considered. Methods are applied to data sets from industry and medicine. The results are supported by simulation studies.
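To make the idea of a trend test concrete, the sketch below computes the classical Laplace statistic for a single process observed on a fixed interval. It is a textbook test named here as a stand-in and is not claimed to be one of the tests developed in the thesis; the function name and the example times are assumptions.

```python
import math

def laplace_trend_statistic(event_times, observation_end):
    """Laplace trend statistic for one process observed on (0, observation_end].

    Under a homogeneous Poisson process, given n events the event times are
    uniform on the interval, so their sum has mean n*T/2 and variance
    n*T^2/12. The standardized sum is approximately standard normal;
    large positive values suggest an increasing event rate.
    """
    n = len(event_times)
    if n == 0:
        raise ValueError("at least one event is required")
    centered_sum = sum(event_times) - n * observation_end / 2.0
    return centered_sum / (observation_end * math.sqrt(n / 12.0))

# Hypothetical event times bunched near the end of a 10-unit window.
late_events = laplace_trend_statistic([8.0, 9.0, 10.0], observation_end=10.0)
```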
Algebraic tools in phylogenomics.
In this thesis we develop interdisciplinary algebraic tools for genomic and phylogenetic problems.
To study the molecular evolution of species one often uses stochastic evolutionary models. The evolution is represented in a tree (called a phylogenetic tree) whose leaves represent current species and whose internal nodes correspond to their common ancestors. The length of a branch of the tree represents the number of mutations that have occurred between the two species adjacent to the branch. The evolution of DNA sequences in these species is then modeled with a hidden Markov process along the tree. If the Markov process is assumed to be continuous in time, it is usually assumed to be homogeneous as well and, if so, the model parameters are the instantaneous rate of mutation and the lengths of the branches. If the Markov process is discrete in time, then the model parameters are the conditional probabilities of nucleotide substitution along the tree and there is no assumption of homogeneity. The latter are the types of models we consider in this thesis, and they are therefore more general than the homogeneous continuous ones.
From this perspective we study the basic problems of phylogenetics: given a set of DNA sequences, which evolutionary model best fits the data, and how can we efficiently infer the model parameters? Moreover, as we also show in this thesis, species may not have evolved along a single tree but along a mixture of trees, so these questions need to be addressed in this more general case. For continuous-time, homogeneous evolutionary models, several solutions to these questions have been proposed during the last decades. In this thesis we solve these two problems for discrete-time evolutionary models, using algebraic techniques from linear algebra, group theory, algebraic geometry and algebraic statistics. In addition, our solution to the first problem is also valid for phylogenetic mixtures.
We have tested the methods proposed in this thesis on simulated data and on real data from the ENCODE Project (Encyclopedia Of DNA Elements). To test our methods, we also provide algorithms to generate sequences evolving under discrete-time models with a given expected number of mutations. Furthermore, we have proved that these algorithms generate all possible sequences (for most models). Tests on simulated data show that the methods are very accurate, and our results on real data confirm previously formulated hypotheses. All the methods in this thesis have been implemented for an arbitrary number of species and are publicly available.
Postprint (published version)
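A discrete-time Markov process along a tree, as described above, can be sketched as follows: each edge carries its own nucleotide substitution matrix, and no homogeneity across edges is assumed. This is an illustrative simulation only, not the thesis's algorithm for generating sequences with a prescribed expected number of mutations; the tree, node names, matrices, and function names are all assumptions.

```python
import random

NUCLEOTIDES = "ACGT"

def substitution_matrix(change_probability):
    """Simple 4x4 matrix: keep the state with probability 1 - p,
    otherwise move to one of the other three states uniformly."""
    off_diagonal = change_probability / 3.0
    return [[1.0 - change_probability if i == j else off_diagonal
             for j in range(4)] for i in range(4)]

def simulate_alignment(tree, root, edge_matrices, root_distribution, length, rng):
    """Simulate leaf sequences on a rooted tree.

    tree maps each internal node to its list of children;
    edge_matrices maps (parent, child) to a row-stochastic 4x4 matrix.
    """
    sequences = {root: rng.choices(range(4), weights=root_distribution, k=length)}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            matrix = edge_matrices[(parent, child)]
            # Each site evolves independently according to the edge's matrix.
            sequences[child] = [rng.choices(range(4), weights=matrix[state])[0]
                                for state in sequences[parent]]
            stack.append(child)
    leaves = [node for node in sequences if not tree.get(node)]
    return {leaf: "".join(NUCLEOTIDES[s] for s in sequences[leaf]) for leaf in leaves}

# Hypothetical three-leaf tree with a different matrix on every edge.
example_tree = {"root": ["inner", "leaf3"], "inner": ["leaf1", "leaf2"]}
example_edges = {
    ("root", "inner"): substitution_matrix(0.05),
    ("root", "leaf3"): substitution_matrix(0.20),
    ("inner", "leaf1"): substitution_matrix(0.10),
    ("inner", "leaf2"): substitution_matrix(0.02),
}
alignment = simulate_alignment(example_tree, "root", example_edges,
                               [0.25, 0.25, 0.25, 0.25], 30, random.Random(1))
```

Only the leaf sequences are returned, mirroring the fact that the internal nodes (ancestral species) are unobserved.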
Target Detection Architecture for Resource Constrained Wireless Sensor Networks within Internet of Things
Wireless sensor networks (WSN) within Internet of Things (IoT) have the potential
to address the growing detection and classification requirements among many
surveillance applications. RF sensing techniques are the next generation technologies
which offer distinct advantages over traditional passive means of sensing
such as acoustic and seismic which are used for surveillance and target detection
applications of WSN. RF sensing based WSN within IoT detect the presence of
designated targets by transmitting RF signals into the sensing environment and
observing the reflected echoes. In this thesis, an RF sensing based target detection
architecture for surveillance applications of WSN has been proposed to detect the
presence of stationary targets within the sensing environment.
With multiple sensing nodes operating simultaneously within the sensing region,
diversity among the sensing nodes in the choice of transmit waveforms is required.
Existing multiple access techniques to accommodate multiple sensing nodes within
the sensing environment are not suitable for RF sensing based WSN. In this thesis,
a diversity in the choice of the transmit waveforms has been proposed and transmit
waveforms which are suitable for RF sensing based WSN have been discussed. A
criterion has been defined to quantify the ease of detecting the signal and the energy
efficiency of the signal, based on which an ease of detection index and an energy efficiency
index, respectively, have been generated. The waveform selection criterion proposed
in this thesis takes the WSN sensing conditions into account and identifies the
optimum transmit waveform within the available choices of transmit waveforms
based on their respective ease of detection and energy efficiency indexes.
A target detector analyses the received RF signals to make a decision regarding
the existence or absence of targets within the sensing region. Existing target detectors
which are discussed in the context of WSN do not take the factors such
as interference and nature of the sensing environment into account. Depending
on the nature of the sensing environment, in this thesis the sensing environments are classified as homogeneous and heterogeneous sensing environments. Within
homogeneous sensing environments the presence of interference from the neighbouring
sensing nodes is assumed. A target detector has been proposed for WSN
within homogeneous sensing environments which can reliably detect the presence
of targets. Within heterogeneous sensing environments the presence of clutter and
interfering waveforms is assumed. A target detector has been proposed for WSN
within heterogeneous sensing environments to detect targets in the presence of
clutter and interfering waveforms. A clutter estimation technique has been proposed
to assist the proposed target detector to achieve increased target detection
reliability in the presence of clutter. A combination of compressive and two-step
target detection architectures has been proposed to reduce the transmission costs.
Finally, a 2-stage target detection architecture has been proposed to reduce the
computational complexity of the proposed target detection architecture.
Towards Generalized Characterization of Exoplanet Atmospheres with Transit Spectroscopy
The field of exoplanetary sciences has grown from an era of detection to one of characterization. To date, over 4000 exoplanets have been discovered and over 50 of them have been observed with primary transit spectroscopy methods. The current population of characterized exoplanets spans a wide range of parameter space; from ultra-hot Jupiters with atmospheric temperatures beyond 3000 K, to temperate mini Neptunes that may host water in their atmospheres. Upcoming observational facilities in the next two decades will deliver exquisite spectra of exoplanet atmospheres at wavelengths never probed before, with unprecedented precision, and at much higher resolution than currently possible, effectively expanding the number of exoplanets with observed spectra. Nonetheless, an increasingly diverse planet population and higher fidelity data necessarily demand more flexible, complex, and generalized modeling frameworks.
In this thesis, we present our work on atmospheric retrievals of exoplanets, focusing on investigating the robustness of the model assumptions inevitably employed to infer basic planetary conditions, compositional trends across the exoplanet mass range, and considerations for next-generation generalized retrieval frameworks. First, we present our systematic investigation of degeneracies between different model considerations in retrievals of transmission spectra and the observations that can resolve them. This study used a combination of Bayesian atmospheric retrievals and a range of common model assumptions, focusing on H2-rich atmospheres. We find that a combination of models including variable cloud coverage, prominent opacity sources, and high-precision optical and infrared spectra with current facilities enable constraints on cloud/haze properties and chemical abundances.
Second, we apply our atmospheric retrieval framework to a large sample of 19 exoplanets ranging from cool mini-Neptunes to hot Jupiters. This effort constitutes the largest (i.e., broad wavelength coverage, multiple chemical species, mini-Neptunes to Jupiter sized planets) homogeneous chemical abundance survey for transiting exoplanets to date. We find a mass–metallicity trend of increasing H2O abundances with decreasing mass, significantly lower than the mass–metallicity relation for carbon in the solar system giant planets and similar predictions for exoplanets. On the other hand, the Na and K mass–metallicity trends are generally consistent with the solar system metallicity trend. We argue that the trends observed in this sample suggest different formation pathways for these close-in exoplanets compared to the long-period solar system giants.
Third, we introduce Aurora, a next-generation retrieval framework for the characterization of H-rich and H-poor atmospheres. Here, we build upon state-of-the-art architectures and incorporate the following key advancements (a) a generalized compositional retrieval allowing for H-rich and H-poor atmospheres, (b) a generalized prescription for inhomogeneous clouds/hazes, (c) multiple Bayesian inference algorithms for high-dimensional retrievals, (d) modular considerations for refraction, forward scattering, and Mie scattering, and (e) noise modeling functionalities. We then carry out an investigation of the current and future chemical composition constraints for exoplanet atmospheres using this new retrieval framework. We estimate the abundance constraints achievable for hot Jupiters, mini Neptunes, and rocky exoplanets with current and upcoming observational facilities.
Lastly, we present our contribution to recent studies characterizing exoplanet atmospheres using ground and space-based facilities. We perform atmospheric retrievals on a diverse population of exoplanets from ultra-hot Jupiters to temperate mini Neptunes. Among the planets studied are WASP-127b, WASP-33b, WASP-21b, K2-18b, KELT-11b, and HAT-P-41b. Our results add to the vast chemical inventory of atomic and molecular species found in exoplanet atmospheres. Moreover, our analyses unveil some of the challenges when interpreting high-precision spectroscopic data and possible instrument systematics. The atmospheric reconnaissance presented in this work explores some of the considerations needed for generalized characterization of exoplanet atmospheres with upcoming ground-based and space-based facilities.
We conclude this dissertation by summarizing our findings and their implications for the broader field of exoplanet characterization. We discuss some of the outstanding questions from our research and the prospect of future modeling and retrieval approaches to robustly characterize exoplanet atmospheres. The lessons from this work highlight that, although the inferences derived from observations are strongly influenced by model assumptions, the use of physically motivated models with minimal assumptions and of broadband transmission spectra from current and future facilities can provide plausible estimates of the atmospheric properties of planets outside our solar system.
Gates Cambridge Trust, Bill & Melinda Gates Foundation OPP114
A Study of Bootstrap and Likelihood Based Methods in Non-Standard Problems.
In this dissertation we investigate bootstrap and likelihood based methods for constructing confidence intervals in some non-standard problems. The non-standard problems studied include problems with non-root-n convergence (e.g., cube-root convergence), estimation problems where the parameter is on the boundary, and the study of non-smooth/abrupt-change models.
We consider estimating a bounded parameter in the presence of nuisance parameters and propose methods of constructing confidence intervals for the parameter of interest in some typical examples that arise in high energy physics and astronomy.
In epidemiological applications interest lies in constructing confidence sets for the distribution function of time to infection/illness (the failure time) with interval censored data. We use a pseudo-likelihood function based on the marginal likelihood of a Poisson process to construct a pseudo-likelihood ratio statistic for testing point null hypotheses for the distribution function and show that the test statistic converges to a pivotal quantity.
A major part of the thesis has been motivated by an astronomy application: estimation of the dark matter distribution in dwarf galaxies. An essential component of the application involves estimation and inference on functions that obey shape restrictions, such as monotonicity or convexity. We study the performance of bootstrap methods for inference in two non-parametric estimation problems: the estimation of a monotone density and Wicksell's problem. Our results show the inconsistency of conventional bootstrap methods in the monotone density estimation problem; in fact, we claim that the bootstrap estimate of the sampling distribution does not have any weak limit conditionally (given the data), in probability. We establish limit distributions of shape restricted estimators and the consistency of bootstrap methods in Wicksell's problem.
Whether a dwarf spheroidal galaxy is in equilibrium or being tidally disrupted by the Milky Way is an important question for the study of its dark matter content and distribution. We investigate the presence of such a streaming motion, focusing our attention on the Leo I galaxy. Statistical tools include isotonic and change-point estimators, asymptotic theory and resampling methods. We find that although there is evidence for streaming, the effect is not alarming.
Ph.D., Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/61701/1/bodhi_1.pd
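The isotonic estimators listed among the statistical tools can be illustrated with the standard pool-adjacent-violators algorithm for an equally weighted, nondecreasing least-squares fit. This is a generic sketch with a hypothetical function name and toy data, not the dissertation's code or its specific estimators.

```python
def isotonic_regression(values):
    """Nondecreasing least-squares fit via pool-adjacent-violators.

    Adjacent blocks whose means violate the ordering are merged and
    replaced by their pooled mean until the fit is monotone.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Toy data with one order violation (3 followed by 2).
fit = isotonic_regression([1, 3, 2, 4])  # [1.0, 2.5, 2.5, 4.0]
```

The fit is piecewise constant, and that kind of non-smooth behaviour is typical of the non-standard estimation problems the dissertation studies.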