Statistical analysis of progressively first-failure-censored data via beta-binomial removals
Progressive first-failure censoring is widely used in practice when the experimenter wishes to remove some groups of test units before the first failure has been observed in every group. In practice, some test groups may drop out of the experiment at each progressive stage in a way that cannot be determined in advance. We therefore propose progressively first-failure-censored sampling with random removals, which allows surviving groups to be withdrawn during the execution of the life test with uncertain probability governed by a beta-binomial law. The generalized extreme value lifetime model is widely used to analyze a variety of extreme-value data, including flood flows, wind speeds, and radioactive emissions. Accordingly, when the sample observations are gathered under the suggested censoring plan, the Bayes and maximum likelihood approaches are used to estimate the parameters of the generalized extreme value distribution. Bayes estimates are produced under balanced symmetric and asymmetric loss functions. A hybrid Gibbs-within-Metropolis-Hastings method is suggested for drawing samples from the joint posterior distribution, and highest posterior density intervals are also provided. To assess how the suggested inferential approaches behave in the long run, extensive Monte Carlo simulation experiments are carried out, and two real-world datasets from clinical trials are examined to show the applicability and feasibility of the suggested methodology. The numerical results show that the proposed sampling mechanism is flexible enough to support either a classical or a Bayesian inferential approach to estimating any lifetime parameter.
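The removal mechanism is straightforward to prototype. Below is a minimal Python sketch of drawing beta-binomial removal counts during a progressively first-failure-censored test; all names are hypothetical, and a Weibull law stands in for the generalized extreme value lifetime model, so this is an illustration of the sampling scheme rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def beta_binomial(n, a, b, rng):
    """Beta-binomial draw: p ~ Beta(a, b), then R ~ Binomial(n, p)."""
    if n <= 0:
        return 0
    return rng.binomial(n, rng.beta(a, b))

def simulate_pffc_bb(k, group_size, m, a, b, rng):
    """Simulate m first-failure times from k groups under progressive
    first-failure censoring with beta-binomial removals (illustrative)."""
    # The first failure in a group of i.i.d. units is the group minimum;
    # a Weibull law stands in for the GEV lifetime model used in the paper.
    alive = sorted(rng.weibull(1.5, size=(k, group_size)).min(axis=1))
    observed, removals = [], []
    for i in range(m):
        observed.append(alive.pop(0))            # next first failure
        if i < m - 1:
            # Cannot remove so many groups that fewer than m-i-1 remain.
            n_max = len(alive) - (m - 1 - i)
            r = beta_binomial(n_max, a, b, rng)
            for _ in range(r):                    # withdraw r surviving groups
                alive.pop(rng.integers(len(alive)))
        else:
            r = len(alive)                        # terminate: remove the rest
            alive.clear()
        removals.append(r)
    return np.array(observed), np.array(removals)

times, rems = simulate_pffc_bb(k=20, group_size=5, m=10, a=2.0, b=5.0, rng=rng)
```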
Random Effect Models in the Statistical Analysis of Human Fecundability Data: Application to artificial insemination with sperm from donor.
The main aim of this dissertation is to explore methodological approaches to correlated binary data and to assess their suitability for the analysis of data on human fertility. The dataset concerns a study of Artificial Insemination by Donor (AID). AID represents an unusual research opportunity to study both male and female fecundability simultaneously. In each attempt to conceive, artificial insemination is carried out in consecutive ovulatory cycles until conception or a change of treatment. The probability of conception may differ between women, so the data are discrete-time survival data with censoring and between-subject heterogeneity. There is also potential heterogeneity between donors. Non-systematic allocation of donors to recipients ensures that the same woman receives semen from several donors; this added heterogeneity, as well as other cycle-dependent covariates, has to be taken into account. The analysis must also account for covariates, most of them time-varying. The dataset has a crossed hierarchical structure due to the presence of both female and male factors. This rather complicated "design" calls for unit-specific regression models. These models are presented, together with their lack of tractability outside some rather specific cases. The motivation for choosing Gaussian random effects in unit-specific regression models is discussed. We demonstrate the use of an approximate inference method, Penalized Quasi-Likelihood, which is shown to be a useful and practical way of carrying out preliminary data analysis. Finally, a Bayesian procedure (Gibbs sampling) provides validation and more accurate results, despite the intensive computation it requires.
The main substantive finding of the analysis is the unexpectedly pronounced heterogeneity of donor fecundability, even after conventional measures of sperm quality are included in the model. These measures were shown to be predictive at the donor level but not at the level of the individual donation.
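A minimal simulation sketch may help fix the data structure described above: each woman contributes one Bernoulli trial per ovulatory cycle until conception or censoring, with crossed Gaussian random effects for the woman and the donor. All names, sizes, and parameter values below are illustrative assumptions, not estimates from the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes and variance components (not taken from the thesis).
n_women, n_donors, max_cycles = 200, 40, 12
beta0 = -1.5                                 # baseline per-cycle log-odds of conception
u_w = rng.normal(0.0, 0.8, n_women)          # woman-level heterogeneity
u_d = rng.normal(0.0, 0.5, n_donors)         # donor-level heterogeneity

records = []                                 # (woman, donor, cycle, conceived)
for w in range(n_women):
    for cycle in range(1, max_cycles + 1):
        d = rng.integers(n_donors)           # donor re-assigned per cycle: crossed design
        p = 1.0 / (1.0 + np.exp(-(beta0 + u_w[w] + u_d[d])))
        conceived = rng.random() < p
        records.append((w, d, cycle, int(conceived)))
        if conceived:
            break                            # discrete-time survival: stop at the event
# Women who reach max_cycles without conceiving are right-censored.
```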
Bayesian Inference for Cure Rate Models
Survival analysis consists of a set of statistical methods in the field of biostatistics whose main aim is to study the time until the occurrence of a specified event, such as death. The majority of these methods assume that all the individuals taking part in the study are subject to the event of interest. However, there are situations where this assumption is unrealistic, since some observations are not susceptible to the event of interest, or cured. For this reason, survival models have been developed that allow for patients who may never experience the event, usually called long-term survivors. These models, called Cure Rate Models, assume that, as time increases, the survival function tends to a value p ∈ (0,1), representing the cure rate, instead of tending to zero as in standard survival analysis.
Recently, Rocha (2016) proposed a new approach to modelling survival studies with long-term survivors. His methodology was based on the use of defective distributions to model cure rates. In contrast to standard distributions, defective ones are characterized by probability density functions that integrate to less than one for certain choices of the domains of some of their parameters. The aim of the present thesis is to provide new Bayesian estimates for the parameters of the defective models used for cure rate modelling under the assumption of right censoring. We develop Markov chain Monte Carlo (MCMC) algorithms for inferring the parameters of a broad class of defective models, both for the baseline distributions (Gompertz and inverse Gaussian) and for their extensions under the Marshall-Olkin family of distributions. The Bayesian estimates of the distributions' parameters, as well as the associated credible intervals, are obtained from samples drawn from the joint posterior distribution.
In addition, the behaviour of the Bayesian estimates is evaluated and compared, through simulation experiments, with that of the maximum likelihood estimates obtained by Rocha (2016). Finally, we apply the competing models and approaches to real datasets and compare them through various statistical measures. This work is the first attempt to explore the advantages of the Bayesian approach to inference for defective cure rate models under a right-censoring mechanism, as well as the first presentation of new Bayesian estimates for several defective distributions, albeit without incorporating covariate information.
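As a concrete instance of the setting, consider the defective Gompertz model: with hazard b·exp(a·t) and a < 0, the survival function tends to exp(b/a) ∈ (0,1), which is the cure fraction. The sketch below, a random-walk Metropolis sampler with flat priors on the valid region, shows how such a posterior can be sampled under right censoring; all names and tuning values are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def loglik_defective_gompertz(a, b, t, delta):
    """Right-censored log-likelihood of the defective Gompertz model.
    Hazard h(t) = b*exp(a*t); survival S(t) = exp(-(b/a)*(exp(a*t) - 1)).
    For a < 0 the cure fraction is S(inf) = exp(b/a). delta = 1 for events."""
    if a >= 0 or b <= 0:
        return -np.inf                       # defective region only
    log_S = -(b / a) * (np.exp(a * t) - 1.0)
    log_h = np.log(b) + a * t
    return np.sum(delta * log_h + log_S)     # events: log f = log h + log S

def metropolis(t, delta, n_iter=20_000, step=0.05, rng=None):
    """Random-walk Metropolis over (a, b) with flat priors on the valid region."""
    rng = rng or np.random.default_rng(0)
    a, b = -0.1, 0.5                         # arbitrary starting values
    ll = loglik_defective_gompertz(a, b, t, delta)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        a_new, b_new = a + step * rng.normal(), b + step * rng.normal()
        ll_new = loglik_defective_gompertz(a_new, b_new, t, delta)
        if np.log(rng.random()) < ll_new - ll:
            a, b, ll = a_new, b_new, ll_new  # accept
        draws[i] = a, b
    return draws                             # posterior cure fraction: np.exp(b/a)
```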
Term burstiness: evidence, model and applications
The present thesis looks at the phenomenon of term burstiness in text. Term burstiness is defined as the multiple re-occurrence of a particular term in short succession after its first occurrence in a text. Term burstiness is important because it helps give structure and meaning to a document. Various kinds of term burstiness in text are studied, and their effect on a dataset is explored in a series of homogeneity experiments. A novel model of term burstiness is proposed, and evaluations based on the proposed model are performed on three different applications.
The "bag-of-words" assumption is often used in statistical Natural Language Processing and Information Retrieval applications. Under this assumption all structural and positional information about terms is lost and only frequency counts of the document are retained. As a result of counting frequencies only, the "bag-of-words" representation of text assumes that the probability of a word occurring remains constant throughout the text. This assumption is often used because of its simplicity and the ease with which it allows mathematical and statistical techniques to be applied to text. Though the assumption is known to be untrue [CG95b, CG95a, Chu00], applications based on it [SB97, Lew98, MN98, Seb02] appear not to be much hampered.
A series of homogeneity-based experiments is carried out to study the presence and extent of term burstiness against the term-independence-based homogeneity assumption on the dataset. A null hypothesis stating the homogeneity of a dataset is formulated and rejected in a series of experiments based on the χ² test, which tests the equality of two partitions of a dataset. Various schemes for partitioning a dataset are adopted to illustrate the effect of term burstiness and structure in text. This provides evidence of term burstiness in the dataset, and fine-grained information about the distribution of terms that can be used for characterizing or profiling a dataset.
A model of term burstiness in a dataset is proposed based on the gaps between successive occurrences of a particular term. Unlike other existing models, it is not based merely on frequency counts but takes into account structural and positional information about the term's occurrences in the document. The gaps between successive occurrences are modeled using a mixture of exponential distributions: the first exponential distribution gives the overall rate of occurrence of a term in a dataset, and the second gives the term's rate of re-occurrence in a burst, once it has already occurred. Since most terms occur in only a few documents, there is a large number of documents with no occurrences of a particular term; in the proposed model, the non-occurrence of a term in a document is accounted for by data censoring. It is not straightforward to obtain parameter estimates for such a complex model, so Bayesian statistics is used for the flexibility and ease it offers in fitting the model and obtaining parameter estimates. The model can be used for all kinds of terms, be they rare content words, medium-frequency terms, or frequent function words. The term re-occurrence model is instantiated and verified against the background of different collections, in the context of three different applications.
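The gap model described above admits a compact likelihood. The sketch below is a minimal illustration, fitted by maximum likelihood for brevity, whereas the thesis uses a Bayesian fit; the data and all names are hypothetical. Observed gaps contribute the mixture density, while gaps censored at the end of a document (the term did not re-occur) contribute the mixture survival function.

```python
import numpy as np
from scipy.optimize import minimize

def loglik_gap_mixture(theta, gaps, censored):
    """Two-exponential mixture over inter-occurrence gaps: lam1 is the overall
    rate of occurrence, lam2 the within-burst re-occurrence rate, pi the
    weight of the burst component. Censored gaps are only known to exceed
    the observed value."""
    pi, lam1, lam2 = theta
    if not (0.0 < pi < 1.0 and lam1 > 0.0 and lam2 > 0.0):
        return -np.inf
    dens = pi * lam2 * np.exp(-lam2 * gaps) + (1 - pi) * lam1 * np.exp(-lam1 * gaps)
    surv = pi * np.exp(-lam2 * gaps) + (1 - pi) * np.exp(-lam1 * gaps)
    return np.sum(np.where(censored, np.log(surv), np.log(dens)))

# Hypothetical gap data: short within-burst gaps mixed with long between-burst gaps.
gaps = np.array([1, 2, 1, 15, 1, 3, 40, 2, 1, 60], dtype=float)
cens = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], dtype=bool)
fit = minimize(lambda th: -loglik_gap_mixture(th, gaps, cens),
               x0=[0.5, 0.05, 0.5], method="Nelder-Mead")
```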
The applications include studying various terms within a dataset to identify behavioural differences between terms, studying similar terms across different datasets to detect stylistic features based on a term's distribution, and studying the characteristics of very frequent terms across different datasets. The model aids in identifying term characteristics in a dataset. It helps distinguish highly bursty content terms from less bursty function words, and it can differentiate between a frequent function word and a scattered one. It can be used to identify stylistic features in a term's distribution across texts of varying genres. The model also aids in understanding the behaviour of very frequent (usually function) words in a dataset.
Random Number Generators
The quasi-negative-binomial distribution was applied in queuing theory to determine the distribution of the total number of customers served before the queue vanishes, under certain assumptions. Some structural properties of the quasi-negative-binomial distribution (probability generating function, convolution, mode, and a recurrence relation for the moments) are discussed. The distribution's characterization and its relation to other distributions are investigated. A computer program was developed in R to obtain maximum likelihood (ML) estimates, and the distribution was fitted to several observed datasets to assess its goodness of fit.
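Since the abstract does not reproduce the quasi-negative-binomial pmf, the sketch below uses the standard negative binomial as a stand-in to illustrate the same workflow: maximum likelihood fitting of a discrete distribution followed by a chi-square goodness-of-fit test. All data and names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Hypothetical frequency table: counts[k] = number of observations equal to k.
counts = np.array([30, 25, 18, 12, 8, 4, 2, 1])
values = np.arange(len(counts))
sample = np.repeat(values, counts)

def nll(params):
    """Negative log-likelihood; the standard negative binomial stands in for
    the quasi-negative-binomial, whose pmf the abstract does not give."""
    r, p = params
    if r <= 0 or not (0.0 < p < 1.0):
        return np.inf
    return -np.sum(stats.nbinom.logpmf(sample, r, p))

r_hat, p_hat = minimize(nll, x0=[1.0, 0.5], method="Nelder-Mead").x

# Chi-square goodness of fit on the pooled frequency table.
expected = len(sample) * stats.nbinom.pmf(values, r_hat, p_hat)
expected[-1] += len(sample) * stats.nbinom.sf(values[-1], r_hat, p_hat)  # fold in tail mass
chi2 = np.sum((counts - expected) ** 2 / expected)
dof = len(counts) - 1 - 2     # bins - 1 - number of fitted parameters
p_value = stats.chi2.sf(chi2, dof)
```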