Efficient Two-Stage Group Testing Algorithms for Genetic Screening
Efficient two-stage group testing algorithms that are particularly suited for
rapid and less-expensive DNA library screening and other large scale biological
group testing efforts are investigated in this paper. The main focus is on
novel combinatorial constructions in order to minimize the number of individual
tests at the second stage of a two-stage disjunctive testing procedure.
Building on recent work by Levenshtein (2003) and Tonchev (2008), several new
infinite classes of such combinatorial designs are presented.
Comment: 14 pages; to appear in "Algorithmica". Part of this work has been
presented at the ICALP 2011 Group Testing Workshop; arXiv:1106.368
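The generic two-stage disjunctive procedure this abstract builds on can be sketched as follows — a minimal illustration using a random pooling design rather than the paper's combinatorial constructions (pool sizes and counts are arbitrary choices, not from the source):

```python
import random

random.seed(0)

def decode_candidates(n_items, pools, outcomes):
    """Disjunctive decoding: any item appearing in a negative pool is
    cleared; the rest remain candidates for the second stage."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return [i for i in range(n_items) if i not in cleared]

def two_stage_test(n_items, pools, defectives):
    """Stage 1: pooled tests (a pool is positive iff it holds a defective).
    Stage 2: test each surviving candidate individually."""
    outcomes = [bool(set(pool) & defectives) for pool in pools]
    candidates = decode_candidates(n_items, pools, outcomes)
    found = {i for i in candidates if i in defectives}  # individual tests
    return found, len(pools) + len(candidates)

n = 64
defectives = {3, 17}
# an illustrative random pooling design (not one of the paper's designs)
pools = [random.sample(range(n), 16) for _ in range(16)]
found, tests_used = two_stage_test(n, pools, defectives)
```

Since a defective item can never sit in a negative pool, the procedure always identifies all defectives exactly; the design question the paper studies is how to minimize the expected number of second-stage candidates.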
Group Testing with Random Pools: optimal two-stage algorithms
We study Probabilistic Group Testing of a set of N items each of which is
defective with probability p. We focus on the double limit of small defect
probability, p << 1, and large number of items, N >> 1, taking either p -> 0
after N -> infinity or jointly with N. In both settings
the optimal number of tests required to identify with certainty the
defectives via a two-stage procedure, T(N,p), is known to scale as
N p |log p|. Here we determine the sharp asymptotic value of T(N,p) / (N p |log p|) and construct a class of two-stage algorithms over which
this optimal value is attained. This is done by choosing a proper bipartite
regular graph (of tests and variable nodes) for the first stage of the
detection. Furthermore we prove that this optimal value is also attained on
average over a random bipartite graph where all variables have the same degree,
while the tests have Poisson-distributed degrees. Finally, we improve the
existing upper and lower bounds for the optimal number of tests in the case
where p -> 0 jointly with N.
Comment: 12 pages
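The first-stage random pool design described here (items with a fixed degree, so that test degrees come out approximately Poisson) can be sketched directly; the variable degree of 3 and the N p |log p| test budget are illustrative assumptions, not the paper's optimized values:

```python
import math
import random

random.seed(1)

def random_pool_design(n_items, n_tests, var_degree):
    """Each item joins `var_degree` distinct pools chosen uniformly at
    random; the resulting pool sizes are approximately Poisson."""
    pools = [set() for _ in range(n_tests)]
    for item in range(n_items):
        for t in random.sample(range(n_tests), var_degree):
            pools[t].add(item)
    return pools

n, p = 2000, 0.01
n_tests = int(n * p * abs(math.log(p)))   # budget on the N p |log p| scale
pools = random_pool_design(n, n_tests, var_degree=3)
defectives = {i for i in range(n) if random.random() < p}

# stage 1: pooled tests; stage 2: individually test every uncleared item
positive = [bool(pool & defectives) for pool in pools]
cleared = set()
for pool, pos in zip(pools, positive):
    if not pos:
        cleared |= pool
second_stage = [i for i in range(n) if i not in cleared]
total_tests = n_tests + len(second_stage)
```

Even this unoptimized design uses far fewer tests in total than testing all N items individually, while still never missing a defective.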
Multiple testing problems in classical clinical trial and adaptive designs
Multiplicity issues arise in a wide variety of situations in clinical trials, and statistical methods for multiple testing have gradually gained importance with the increasing number of complex clinical trial designs. In general, two types of multiple testing can be performed (Dmitrienko et al., 2009): union-intersection testing (UIT) and intersection-union testing (IUT). The UIT is of interest in this dissertation; thus, the familywise error rate (FWER) is required to be controlled in the strong sense.
A number of methods have been developed for controlling the FWER, including single-step and stepwise procedures. In single-step approaches, such as the simple Bonferroni method, the rejection decision for a hypothesis does not depend on the decisions for any other hypotheses. Single-step approaches can be improved in terms of power through stepwise approaches while still controlling the desired error rate. These procedures can be improved further by a parametric approach. In the first project, we developed a new and powerful single-step progressive parametric multiple (SPPM) testing procedure for correlated normal test statistics. Through simulation studies, we demonstrate that SPPM improves power substantially when the correlation is moderate and/or the magnitudes of effect sizes are similar.
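The single-step versus stepwise contrast above can be illustrated with the classical Bonferroni and Holm procedures (shown only as a sketch; the SPPM procedure developed in the dissertation is parametric and not reproduced here):

```python
def bonferroni(pvals, alpha=0.05):
    """Single-step: every hypothesis is tested at level alpha / m,
    so no rejection decision depends on any other."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down refinement: test ordered p-values at alpha / m,
    alpha / (m - 1), ... and stop at the first non-rejection.
    Controls the FWER strongly, like Bonferroni, but with more power."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

pvals = [0.001, 0.012, 0.021, 0.40]
bonf = bonferroni(pvals)   # rejects the first two hypotheses
step = holm(pvals)         # additionally rejects the third
```

On these illustrative p-values, Holm rejects a strict superset of what Bonferroni rejects, which is the power gain the paragraph refers to.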
Group sequential designs (GSD) are clinical trials allowing interim looks with the possibility of early termination due to efficacy, harm or futility, which can reduce the overall costs and timelines for the development of a new drug. However, repeated looks at the data also raise multiplicity issues and can inflate the type I error rate. Proper treatments of this error inflation have been discussed widely (Pocock, 1977; O'Brien and Fleming, 1979; Wang and Tsiatis, 1987; Lan and DeMets, 1983). Most literature on GSD focuses on a single endpoint; GSD with multiple endpoints, however, has also received considerable attention. The main focus of our second project is a GSD with multiple primary endpoints, in which the trial is to evaluate whether at least one of the endpoints is statistically significant. In this study design, multiplicity issues arise from repeated interims and multiple endpoints, so appropriate adjustments must be made to control the type I error rate. Our second purpose here is to show that the combination of multiple endpoints and repeated interim analyses can lead to a more powerful design. Via the multivariate normal distribution, we propose a method that allows simultaneous consideration of the interim analyses and all clinical endpoints. The new approach is derived from the closure principle and thus controls the type I error rate strongly. We evaluate the power under different scenarios and show that it compares favorably to other methods when the correlation among endpoints is non-zero.
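The type I error inflation from unadjusted repeated looks can be checked with a small Monte Carlo sketch (the unadjusted one-sided 1.96 boundary, look counts and per-look sample size here are illustrative, not from the dissertation):

```python
import random

random.seed(2)

def rejects_under_null(n_looks, z_crit=1.96, n_per_look=50):
    """One trial under H0 (mean 0): reject if the cumulative one-sided
    z-statistic crosses z_crit at any of the interim looks."""
    total, n = 0.0, 0
    for _ in range(n_looks):
        total += sum(random.gauss(0.0, 1.0) for _ in range(n_per_look))
        n += n_per_look
        if total / n ** 0.5 > z_crit:
            return True
    return False

sims = 2000
rate_one_look = sum(rejects_under_null(1) for _ in range(sims)) / sims
rate_five_looks = sum(rejects_under_null(5) for _ in range(sims)) / sims
```

A single look at 1.96 keeps the one-sided error near 0.025, while five unadjusted looks inflate it well above that — the inflation that Pocock- or O'Brien-Fleming-type boundaries are designed to remove.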
In the group sequential design framework, another interesting topic is the multiple arm multiple stage (MAMS) design, where multiple arms are involved in the trial from the beginning, with flexibility in treatment selection and stopping decisions during the interim analyses. One of the major hurdles of MAMS is the computational cost, which grows with the number of arms and interim looks. Various designs have been proposed to overcome this difficulty (Thall et al., 1988; Schaid et al., 1990; Follmann et al., 1994; Stallard and Todd, 2003; Stallard and Friede, 2008; Magirr et al., 2012; Wason et al., 2017) while still controlling the FWER against the potential inflation from multiple arm comparisons and multiple interim tests. Here, we consider a more flexible drop-the-loser design that allows safety information to inform treatment selection without a pre-specified arm-dropping mechanism, while still retaining reasonably high power. Two different types of stopping boundaries are proposed for such a design. The sample size is also adjustable if the winning arm is dropped due to safety considerations.
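A heavily simplified drop-the-loser sketch of the selection step (the rule here drops only on the interim mean; the design in the text also uses safety information, adjustable boundaries and FWER control, none of which are reproduced):

```python
import random

random.seed(3)

def drop_the_loser(arm_means, n_per_stage=30):
    """Stage 1: sample every arm; drop the arm with the worst interim mean
    (a stand-in for an efficacy/safety-based selection rule).
    Stage 2: continue sampling only the surviving arms."""
    stage1 = {arm: [random.gauss(mu, 1.0) for _ in range(n_per_stage)]
              for arm, mu in arm_means.items()}
    dropped = min(stage1, key=lambda arm: sum(stage1[arm]) / n_per_stage)
    final = {}
    for arm, xs in stage1.items():
        if arm == dropped:
            continue
        xs = xs + [random.gauss(arm_means[arm], 1.0) for _ in range(n_per_stage)]
        final[arm] = sum(xs) / len(xs)
    return dropped, final

arms = {"control": 0.0, "dose_low": 0.2, "dose_high": 0.5}
dropped, final_means = drop_the_loser(arms)
```

In a real MAMS trial the control arm would be retained regardless, and the stage-two boundaries would be adjusted for the selection — which is exactly where the computational burden discussed above comes from.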
Using Bayesian Statistics in Confirmatory Clinical Trials in the Regulatory Setting
Bayesian statistics plays a pivotal role in advancing medical science by
enabling healthcare companies, regulators, and stakeholders to assess the
safety and efficacy of new treatments, interventions, and medical procedures.
The Bayesian framework offers a unique advantage over the classical framework,
especially when incorporating prior information into a new trial with quality
external data, such as historical data or another source of co-data. In recent
years, there has been a significant increase in regulatory submissions using
Bayesian statistics due to its flexibility and ability to provide valuable
insights for decision-making, addressing the modern complexity of clinical
trials where frequentist trials are inadequate. For regulatory submissions,
companies often need to consider the frequentist operating characteristics of
the Bayesian analysis strategy, regardless of the design complexity. In
particular, the focus is on the frequentist type I error rate and power for all
realistic alternatives. This tutorial review aims to provide a comprehensive
overview of the use of Bayesian statistics in sample size determination in the
regulatory environment of clinical trials. Fundamental concepts of Bayesian
sample size determination and illustrative examples are provided to serve as a
valuable resource for researchers, clinicians, and statisticians seeking to
develop more complex and innovative designs.
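The frequentist operating characteristics mentioned above can be checked by simulation — a minimal sketch assuming a single-arm Beta-Binomial design, where the null boundary 0.3, the posterior cutoff 0.975 and the sample size are all illustrative choices, not values from the review:

```python
import random

random.seed(4)

def posterior_prob_gt(successes, n, theta0, a=1.0, b=1.0, draws=2000):
    """P(theta > theta0 | data) under a Beta(a, b) prior, estimated by
    Monte Carlo draws from the Beta(a + x, b + n - x) posterior."""
    post_a, post_b = a + successes, b + n - successes
    hits = sum(random.betavariate(post_a, post_b) > theta0 for _ in range(draws))
    return hits / draws

def simulated_type_i_error(theta0=0.3, n=50, cutoff=0.975, sims=500):
    """Frequentist type I error of the Bayesian rule 'declare success when
    P(theta > theta0 | data) > cutoff', simulated at the null boundary."""
    rejections = 0
    for _ in range(sims):
        x = sum(random.random() < theta0 for _ in range(n))
        rejections += posterior_prob_gt(x, n, theta0) > cutoff
    return rejections / sims

alpha_hat = simulated_type_i_error()
```

Regulators typically ask for exactly this kind of check: the Bayesian decision rule is fixed, and its type I error rate is evaluated over repeated sampling under the null, with the sample size tuned until both the error rate and power targets are met.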
Two-step estimation of simultaneous equation panel data models with censored endogenous variables
This paper presents some two-step estimators for a wide range of parametric panel data models with censored endogenous variables and sample selection bias. Our approach is to derive estimates of the unobserved heterogeneity responsible for the endogeneity/selection bias to include as additional explanatory variables in the primary equation. These are obtained through a decomposition of the reduced form residuals. The panel nature of the data allows adjustment, and testing, for two forms of endogeneity and/or sample selection bias. Furthermore, it incorporates roles for dynamics and state dependence in the reduced form. Finally, we provide an empirical illustration which features our procedure and highlights the ability to test several of the underlying assumptions.
Keywords: Estimation; Panel Data; Statistics
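The control-function idea behind such two-step estimators — include the reduced-form residuals as an extra regressor to absorb the heterogeneity — can be sketched in a deliberately simplified linear setting (no censoring, no panel structure; all coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

z = rng.normal(size=n)                        # exogenous instrument
u = rng.normal(size=n)                        # unobserved heterogeneity
x = 0.8 * z + u + 0.3 * rng.normal(size=n)    # endogenous regressor
y = 2.0 * x + 1.5 * u + rng.normal(size=n)    # true coefficient on x is 2.0

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# step 1: reduced form of x on z; the residuals proxy the heterogeneity u
v_hat = x - np.column_stack([ones, z]) @ ols(np.column_stack([ones, z]), x)
# step 2: add v_hat to the primary equation as a control function
beta_naive = ols(np.column_stack([ones, x]), y)[1]      # biased upward
beta_cf = ols(np.column_stack([ones, x, v_hat]), y)[1]  # consistent for 2.0
```

The naive regression of y on x inherits the endogeneity bias, while the control-function regression recovers the true coefficient, because conditioning on the reduced-form residual removes the correlation between x and the error.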
Forgetting Exceptions is Harmful in Language Learning
We show that in language learning, contrary to received wisdom, keeping
exceptional training instances in memory can be beneficial for generalization
accuracy. We investigate this phenomenon empirically on a selection of
benchmark natural language processing tasks: grapheme-to-phoneme conversion,
part-of-speech tagging, prepositional-phrase attachment, and base noun phrase
chunking. In a first series of experiments we combine memory-based learning
with training set editing techniques, in which instances are edited based on
their typicality and class prediction strength. Results show that editing
exceptional instances (with low typicality or low class prediction strength)
tends to harm generalization accuracy. In a second series of experiments we
compare memory-based learning and decision-tree learning methods on the same
selection of tasks, and find that decision-tree learning often performs worse
than memory-based learning. Moreover, the decrease in performance can be linked
to the degree of abstraction from exceptions (i.e., pruning or eagerness). We
provide explanations for both results in terms of the properties of the natural
language processing tasks and the learning algorithms.
Comment: 31 pages, 7 figures, 10 tables. Uses 11pt, fullname, a4wide tex
styles. Pre-print version of article to appear in Machine Learning 11:1-3,
Special Issue on Natural Language Learning. Figures on page 22 slightly
compressed to avoid page overload.
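The paper's central finding — editing away exceptional instances can hurt a memory-based learner — can be illustrated with a toy 1-nearest-neighbour sketch (the benchmark tasks and the typicality/class-prediction-strength criteria in the paper are far richer than this stand-in):

```python
def nn_predict(memory, x):
    """1-nearest-neighbour prediction over stored (value, label) instances."""
    return min(memory, key=lambda inst: abs(inst[0] - x))[1]

def edit_exceptions(memory):
    """Remove instances misclassified by their nearest other instance --
    a crude stand-in for low class-prediction-strength editing."""
    kept = []
    for i, (x, label) in enumerate(memory):
        others = memory[:i] + memory[i + 1:]
        if nn_predict(others, x) == label:
            kept.append((x, label))
    return kept

# a 1-D toy task: class "b" at 5.0 is an exception inside a run of "a"
memory = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (5.0, "b"), (8.0, "a"), (9.0, "a")]
edited = edit_exceptions(memory)

query = 5.1              # a test point drawn from the same exceptional pocket
pred_full = nn_predict(memory, query)     # the retained exception answers
pred_edited = nn_predict(edited, query)   # the exception has been forgotten
```

If exceptions come in small pockets that recur at test time — as they do in the NLP tasks studied — forgetting them costs generalization accuracy, exactly the effect reported above.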
CoCalc as a Learning Tool for Neural Network Simulation in the Special Course "Foundations of Mathematic Informatics"
The role of neural network modeling in the learning content of the special
course "Foundations of Mathematical Informatics" was discussed. The course was
developed for students of technical universities - future IT specialists -
and aimed at bridging the gap between theoretical computer science and its
applications: software, system and computing engineering. CoCalc was
justified as a learning tool for mathematical informatics in general and
neural network modeling in particular. Elements of a technique for using
CoCalc to study the topic "Neural network and pattern recognition" of the
special course "Foundations of Mathematic Informatics" are shown. Program
code was presented in the CoffeeScript language, implementing the basic
components of an artificial neural network: neurons, synaptic connections,
activation functions (tangential, sigmoid, stepped) and their derivatives,
methods of calculating the network's weights, etc. Application of the
Kolmogorov-Arnold representation theorem to determining the architecture of
multilayer neural networks was discussed. The implementation of a disjunctive
logical element and the approximation of an arbitrary function using a
three-layer neural network were given as examples. Based on the simulation
results, conclusions were drawn about the limits within which the constructed
networks retain their adequacy. Framework topics for individual research on
artificial neural networks are proposed.
Comment: 16 pages, 3 figures, Proceedings of the 13th International Conference
on ICT in Education, Research and Industrial Applications. Integration,
Harmonization and Knowledge Transfer (ICTERI, 2018)
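The disjunctive logical element example can be sketched as follows — the course materials use CoffeeScript, so this Python single-neuron analogue (sigmoid activation and its derivative, trained by the delta rule on the OR truth table) is only an illustrative stand-in:

```python
import math
import random

random.seed(6)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    """Derivative of the sigmoid, used in the weight-update rule."""
    s = sigmoid(z)
    return s * (1.0 - s)

# truth table of the disjunctive logical element (logical OR)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# a single sigmoid neuron trained by the delta rule (squared-error gradient)
w1, w2, b = (random.uniform(-1, 1) for _ in range(3))
lr = 2.0
for _ in range(2000):
    for (x1, x2), target in data:
        z = w1 * x1 + w2 * x2 + b
        delta = (sigmoid(z) - target) * d_sigmoid(z)
        w1 -= lr * delta * x1
        w2 -= lr * delta * x2
        b -= lr * delta

preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
```

OR is linearly separable, so a single neuron suffices; approximating an arbitrary function, as in the abstract's second example, is where the three-layer architecture and the Kolmogorov-Arnold theorem come into play.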