7,420 research outputs found

    The examination of baseline noise and the impact on the interpretation of low-template DNA samples

    Full text link
    It is common practice for DNA STR profiles to be analyzed using an analytical threshold (AT), but as more low template DNA (LT-DNA) samples are tested it has become evident that these thresholds do not adequately separate signal from noise. To confidently examine LT-DNA samples, the behavior and characteristics of the background noise of STR profiles must be better understood. The background noise of single-source LT-DNA STR profiles was therefore examined to characterize the noise distribution and determine how it changes with DNA template mass and injection time. Current noise models typically assume the noise is independent of fragment size but, given the tendency of the baseline noise to increase with template amount, it is important to establish whether the baseline noise is randomly distributed throughout the capillary electrophoresis (CE) run or whether it is situated in specific regions of the electropherogram. While it has been shown that the baseline noise of negative samples does not behave like the baseline noise of profiles generated using optimal levels of DNA, the ATs determined using negative samples have been shown to be similar to those developed with near-zero, low template mass samples. The distinction between low-template samples, where the noise is consistent regardless of target mass, and standard samples could be made at approximately 0.063 ng for samples amplified using the Identifiler™ Plus amplification kit (29-cycle protocol) and injected for 5 and 10 seconds. At amplification target masses greater than 0.063 ng, the average noise peak height increased and began to plateau between 0.5 and 1.0 ng for samples injected for 5 and 10 seconds.

    To examine the time-dependent nature of the baseline noise, the baselines of over 400 profiles were combined onto one axis for each target mass and each injection time. Areas of reproducibly higher noise peak heights were identified as areas of potential non-specific amplified product. When the samples were injected for 5 seconds, the baseline noise did not appear to be time dependent. However, when the samples were injected for either 10 or 20 seconds, three areas exhibited an increase in noise: at 118 bases in green, 231 bases in yellow, and 106 bases in red.

    If a probabilistic analysis or AT is to be employed for DNA interpretation, consideration must be given to how the validation or calibration samples are prepared. Ideally, the validation data should include all the variation seen within typical samples. To this end, a study was performed to examine possible sources of variation in the baseline noise within the electropherogram. Specifically, three samples were prepared at seven target masses using four different kit lots, four capillary lots, four amplification batches, or four injection batches. The distributions of the noise peak heights in the blue and green channels for samples with variable capillary lots, amplifications, and injections were similar, but the distribution of the noise heights for samples with variable kit lots was shifted; this shift arose because the average peak height of the individual kit lots varied by approximately two RFU. The yellow and red channels showed general agreement between the distributions of the samples run with variable kit lots, amplifications, and injections, but the samples run with various capillary lots had a distribution shifted to the left. When the distribution of the noise height for each capillary was examined, the average peak height variation was less than two RFU between capillary lots.

    Use of a probabilistic method requires an accurate description of the distribution of the baseline noise. Three distributions were tested: Gaussian, log-normal, and Poisson. The Poisson distribution did not approximate the noise distributions well. The log-normal distribution was a better approximation than the Gaussian, yielding a smaller sum of squared residuals. It was also shown that the choice of distribution affected the probability that a peak is noise, though how large an impact this difference makes on the final probability of an entire STR profile was not determined and may be of interest for future studies.
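
    To make the distribution comparison concrete, the sketch below fits Gaussian and log-normal distributions to an array of noise peak heights and compares them by the sum of squared residuals against the empirical density, the criterion described above. It is a minimal sketch: the synthetic `heights` data and all parameter values are placeholders, not values from the study.

```python
# A minimal sketch (not the study's code) of the distribution comparison
# described above: fit Gaussian and log-normal distributions to noise peak
# heights and compare fits by the sum of squared residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder data standing in for measured noise peak heights (RFU).
heights = rng.lognormal(mean=1.0, sigma=0.4, size=5000)

# Empirical density of the observed heights.
counts, edges = np.histogram(heights, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Maximum-likelihood fits of the two candidate distributions.
mu, sd = stats.norm.fit(heights)
shape, loc, scale = stats.lognorm.fit(heights, floc=0)

# The abstract reports the log-normal giving the smaller sum of squared residuals.
ssr_gauss = np.sum((stats.norm.pdf(centers, mu, sd) - counts) ** 2)
ssr_lognorm = np.sum((stats.lognorm.pdf(centers, shape, loc, scale) - counts) ** 2)
print(f"SSR Gaussian:   {ssr_gauss:.6f}")
print(f"SSR log-normal: {ssr_lognorm:.6f}")
```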

    Capturing the zero: a new class of zero-augmented distributions and multiplicative error processes

    Get PDF
    We propose a novel approach to model serially dependent positive-valued variables which realize a non-trivial proportion of zero outcomes. This is a typical phenomenon in financial time series observed at high frequencies, such as cumulated trading volumes or the time between potentially simultaneously occurring market events. We introduce a flexible point-mass mixture distribution and develop a semiparametric specification test explicitly tailored for such distributions. Moreover, we propose a new type of multiplicative error model (MEM) based on a zero-augmented distribution, which incorporates an autoregressive binary choice component and thus captures the (potentially different) dynamics of both zero occurrences and strictly positive realizations. Applying the proposed model to high-frequency cumulated trading volumes of liquid NYSE stocks, we show that the model captures both the dynamic and distributional properties of the data very well and is able to correctly predict future distributions. Keywords: High-frequency Data, Point-mass Mixture, Multiplicative Error Model, Excess Zeros, Semiparametric Specification Test, Market Microstructure. JEL Classification: C22, C25, C14, C16, C5
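
    As a rough sketch of this model class (not the paper's exact specification), the simulation below combines an autoregressive logit for the zero occurrences with MEM(1,1)-style dynamics for the conditional mean of the strictly positive part; all names and parameter values are illustrative assumptions.

```python
# Illustrative simulation of a zero-augmented multiplicative error model (MEM):
# an autoregressive logit governs whether an observation is zero, and strictly
# positive realizations follow x_t = mu_t * eps_t with MEM(1,1)-style dynamics.
# All parameter values are made up for illustration, not estimates from the paper.
import numpy as np

rng = np.random.default_rng(1)
T = 1000
omega, alpha, beta = 0.1, 0.2, 0.7   # conditional-mean (MEM) dynamics
c, rho = -1.0, 2.0                   # autoregressive binary-choice (logit) part

x = np.zeros(T)                      # observed series (e.g. cumulated volume)
mu = np.ones(T)                      # conditional mean of the positive part
z = np.zeros(T)                      # indicator: 1 if x_t > 0
for t in range(1, T):
    mu[t] = omega + alpha * x[t - 1] + beta * mu[t - 1]
    # Probability of a non-zero outcome depends on the lagged indicator.
    p_pos = 1.0 / (1.0 + np.exp(-(c + rho * z[t - 1])))
    z[t] = rng.random() < p_pos
    if z[t]:
        eps = rng.gamma(shape=2.0, scale=0.5)   # unit-mean positive innovation
        x[t] = mu[t] * eps

print(f"share of zeros: {np.mean(x == 0):.1%}")
```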

    Analytical methods for the study of color in digital images

    Get PDF
    The qualitative description of the colors that make up a digital image is a very simple task for the human visual system. For a computer, this task involves so many questions and so much data that it becomes an operation of great complexity. In this thesis we develop an automatic method for constructing the color palette of a digital image, attempting to answer the various questions that arise when working with color in the computational world. Developing this method entails deriving an automatic histogram-segmentation algorithm, which is constructed in detail in the thesis, and several applications of it are given. Finally, we also describe the workings of CProcess, a user-friendly piece of software developed for the easy understanding of color.
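
    As an illustration of the general idea (and only that; the thesis's own algorithm is more elaborate), the sketch below builds a small palette by segmenting a one-dimensional hue histogram at its local minima. The function name, the hue-only simplification, and all parameters are assumptions.

```python
# A much-simplified, hypothetical take on palette construction via histogram
# segmentation: split a 1-D hue histogram at its local minima and report one
# representative hue per segment. Not the thesis's exact algorithm.
import numpy as np
from PIL import Image

def palette_from_hue(path, bins=64):
    # Hue channel of the image, in PIL's 0-255 encoding.
    hsv = np.asarray(Image.open(path).convert("RGB").convert("HSV"), dtype=float)
    hue = hsv[..., 0].ravel()
    hist, edges = np.histogram(hue, bins=bins, range=(0, 255))

    # Segment boundaries at local minima of a lightly smoothed histogram.
    smooth = np.convolve(hist, np.ones(3) / 3, mode="same")
    minima = [i for i in range(1, bins - 1)
              if smooth[i] <= smooth[i - 1] and smooth[i] <= smooth[i + 1]]
    bounds = [0] + minima + [bins]

    # One representative hue (the histogram peak) per non-empty segment.
    palette = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if lo >= hi or hist[lo:hi].sum() == 0:
            continue
        peak = lo + int(np.argmax(hist[lo:hi]))
        palette.append(0.5 * (edges[peak] + edges[peak + 1]))
    return palette
```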

    Bayesian factorizations of big sparse tensors

    Full text link
    It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common low rank tensor factorization relies on parallel factor analysis (PARAFAC), which expresses a rank k tensor as a sum of rank one tensors. When observations are only available for a tiny subset of the cells of a big tensor, the low rank assumption is not sufficient and PARAFAC has poor performance. We induce an additional layer of dimension reduction by allowing the effective rank to vary across dimensions of the table. For concreteness, we focus on a contingency table application. Taking a Bayesian approach, we place priors on terms in the factorization and develop an efficient Gibbs sampler for posterior computation. Theory is provided showing posterior concentration rates in high-dimensional settings, and the methods are shown to have excellent performance in simulations and several real data applications.
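
    For concreteness, the sketch below shows the PARAFAC form the abstract refers to: a rank-k three-way tensor reconstructed as a sum of k outer products of factor-matrix columns. The Bayesian approach places priors on these factors and updates them by Gibbs sampling; only the deterministic reconstruction is sketched here, with illustrative dimensions.

```python
# The PARAFAC form the abstract builds on: a rank-k three-way tensor written as
# a sum of k rank-one tensors (outer products of factor-matrix columns). The
# Bayesian method places priors on the factors; only the deterministic
# reconstruction is shown here, with made-up dimensions.
import numpy as np

def parafac_reconstruct(A, B, C):
    """Rebuild an I x J x K tensor from factors A (I x k), B (J x k), C (K x k)."""
    k = A.shape[1]
    T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
    for r in range(k):
        # Each term is the outer product of one column from each factor matrix.
        T += np.einsum("i,j,l->ijl", A[:, r], B[:, r], C[:, r])
    return T

rng = np.random.default_rng(2)
A, B, C = (rng.standard_normal((n, 3)) for n in (4, 5, 6))
print(parafac_reconstruct(A, B, C).shape)  # (4, 5, 6)
```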