    A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings

    The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length n has been open for several decades. It is known [12] that the biased expected effect of a random letter-change on the optimal score implies an order of fluctuation linear in √n. However, in many situations where such a biased effect is observed empirically, it has been impossible to prove analytically. The main result of this paper shows that when the rescaled limit of the optimal alignment score increases in a certain direction, then the biased effect exists. On the basis of this result, one can quantify a confidence level for the existence of such a biased effect, and hence of an order √n fluctuation, based on simulation of optimal alignment scores. This is an important step forward, as the correct order of fluctuation was previously known only for certain special distributions [12],[13],[5],[10]. To illustrate the usefulness of our new methodology, we apply it to optimal alignments of strings written in the DNA alphabet. As scoring function, we use the BLASTZ default substitution matrix together with a realistic gap penalty. BLASTZ is one of the most widely used sequence alignment methodologies in bioinformatics. For this DNA setting, we show that with a high level of confidence, the fluctuation of the optimal alignment score is of order Θ(√n). An important special case of the optimal alignment score is the Longest Common Subsequence (LCS) of random strings. For binary sequences with equiprobable symbols, the question of the fluctuation of the LCS remains open. The symmetry in that case does not allow for our method. On the other hand, in real-life DNA sequences, it is not the case that all letters occur with the same frequency. So, for many real-life situations, our method makes it possible to determine the order of the fluctuation up to a high confidence level.
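
As a rough illustration of the simulation side of this abstract, the following minimal sketch estimates the spread of the LCS length for pairs of random strings. It is not the paper's method: it uses the plain LCS special case rather than the BLASTZ scoring, and the names `lcs_length` and `simulate_lcs_std` and the letter-frequency dictionary are assumptions made for this example.

```python
import random
import statistics


def lcs_length(x, y):
    """Length of the longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simulate_lcs_std(n, probs, trials, seed=0):
    """Sample standard deviation of the LCS length over `trials` random pairs.

    `probs` maps each letter to its probability (hypothetical input format).
    Requires trials >= 2.
    """
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[c] for c in letters]
    scores = []
    for _ in range(trials):
        x = rng.choices(letters, weights=weights, k=n)
        y = rng.choices(letters, weights=weights, k=n)
        scores.append(lcs_length(x, y))
    return statistics.stdev(scores)
```

Repeating `simulate_lcs_std` for increasing n and comparing the result with √n gives an informal feel for the order of fluctuation; it does not replace the confidence-level argument described in the abstract.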

    Record statistics and persistence for a random walk with a drift

    We study the statistics of records of a one-dimensional random walk of n steps, starting from the origin, and in presence of a constant bias c. At each time-step the walker makes a random jump of length \eta drawn from a continuous distribution f(\eta) which is symmetric around a constant drift c. We focus in particular on the case where f(\eta) is a symmetric stable law with a Lévy index 0 < \mu \leq 2. The record statistics depends crucially on the persistence probability which, as we show here, exhibits different behaviors depending on the sign of c and the value of the parameter \mu. Hence, in the limit of a large number of steps n, the record statistics is sensitive to these parameters (c and \mu) of the jump distribution. We compute the asymptotic mean record number after n steps as well as its full distribution P(R,n). We also compute the statistics of the ages of the longest and the shortest lasting record. Our exact computations show the existence of five distinct regions in the (c, 0 < \mu \leq 2) strip where these quantities display qualitatively different behaviors. We also present numerical simulation results that verify our analytical predictions. Comment: 51 pages, 22 figures. Published version (typos have been corrected)
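
The record-counting setup described above can be sketched numerically as follows. This is only an illustration under stated assumptions: jumps are Gaussian (the \mu = 2 member of the stable family), the starting position is counted as the first record, and the function names are invented for this example rather than taken from the paper.

```python
import random


def count_records(positions):
    """Count strict upper records; the first entry always counts as a record."""
    records = 0
    best = float("-inf")
    for x in positions:
        if x > best:
            records += 1
            best = x
    return records


def walk_with_drift(n, c, sigma=1.0, rng=None):
    """Positions x_0 = 0, x_1, ..., x_n with jumps c + Gaussian noise (assumed jump law)."""
    rng = rng or random.Random()
    x = 0.0
    path = [x]
    for _ in range(n):
        x += c + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def mean_record_number(n, c, trials, seed=0):
    """Monte Carlo estimate of the mean number of records after n steps."""
    rng = random.Random(seed)
    total = sum(count_records(walk_with_drift(n, c, rng=rng)) for _ in range(trials))
    return total / trials
```

Comparing `mean_record_number` for positive and negative c shows the sensitivity to the sign of the drift that the abstract describes; heavy-tailed jumps with \mu < 2 would need a different sampler.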

    Are Inflation Forecasts from Major Swedish Forecasters Biased?

    Inflation forecasts made 1999-2005 by Sveriges Riksbank and Konjunkturinstitet of Swedish inflation rates 1999-2007 are tested for unbiasedness; i.e., are the mean forecast errors zero? The bias is on the order of -0.1 percentage points for horizons below one year and on the order of 0.1 and 0.6 (depending on the inflation measure) above one year. Using the maximum entropy bootstrap for inference, the bias is significant, whereas inference using HAC indicates insignificance. Keywords: Forecast evaluation; inflation; unbiasedness; maximum entropy bootstrap
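
The unbiasedness question above (is the mean forecast error zero?) can be sketched with a naive calculation. This is a simplified stand-in, not the study's procedure: it assumes independent errors and so uses a plain t-statistic instead of the maximum entropy bootstrap or HAC inference named in the abstract. The sign convention (outcome minus forecast) and the name `forecast_bias` are also assumptions.

```python
import math
import statistics


def forecast_bias(forecasts, outcomes):
    """Return (mean error, naive t-statistic) for errors = outcome - forecast.

    Assumes independent errors; serially correlated errors would need
    HAC or bootstrap inference instead of this plain t-statistic.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if len(forecasts) < 2:
        raise ValueError("need at least two observations")
    errors = [o - f for f, o in zip(forecasts, outcomes)]
    mean_error = statistics.mean(errors)
    std_error = statistics.stdev(errors) / math.sqrt(len(errors))
    if std_error == 0:
        raise ValueError("errors have zero variance; t-statistic undefined")
    return mean_error, mean_error / std_error
```

A mean error far from zero relative to its standard error suggests bias, but overlapping forecast horizons typically induce autocorrelation, which is why the choice of inference method matters in the abstract.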

    Who Moonlights and Why?: Evidence from the SIPP

    Multiple job-holding is a significant characteristic of the labor market, with approximately 6 percent of all employed males reporting a second job in 1993 (Mishel and Bernstein, 1995, p. 226). Moonlighting reflects growing financial stress arising from declining earnings, as well as an increased need for flexibility to combine work and family. Approximately 40 percent of moonlighters report taking the second job due to economic hardship. Additionally, moonlighting is a reflection of the worker's choice to pursue entrepreneurial activities while maintaining the financial stability offered by the primary job. To restate in economic terminology, moonlighting arises from at least two distinct reasons. First, many individuals hold multiple jobs due to some sort of constraint on the primary job that limits that job's earnings capacity. Second, moonlighting may arise because the two jobs are not perfect substitutes. That is, the wage paid and utility lost from the forgone leisure may not completely reflect the benefits and costs of working. For example, working on the primary job may provide the worker with the credentials to acquire a higher paying second job, such as a university psychologist testifying in a jury trial. Or, working on the second job may provide some satisfaction not received in the same amount or manner from the primary job, such as a comedian who has a "regular" job by day and performs at night. In either example, the costs and benefits of both jobs are more complex than the monetary wages paid and the forgone value of leisure. When faced with such nonpecuniary benefits and costs, optimizing behavior may lead a worker to take two jobs. In contrast to workers who moonlight because they are constrained on their primary jobs (PJ), we expect these kinds of moonlighters to moonlight for longer periods of time because optimizing behavior leads them to supply labor to more than one job, even in the long run.
We might also expect to see smaller wage differences between jobs for such workers, and the second job (SJ) wage could even be higher than the primary job wage in some situations. Previous research on moonlighting, including Shishko and Rostker (1976), O'Connell (1979) and Krishnan (1990), acknowledges that multiple motives may exist but focuses only on the constraint motive. In related studies, Paxson and Sicherman (1994) explore moonlighting as an alternative avenue for adjusting short-run labor supply, and Abdukadir (1992) examines the possibility that moonlighting is caused by short-term liquidity constraints. Another possible motivation for moonlighting is that certain types of job situations present greater opportunities for tax evasion. Plewes and Stinson (1991) provide survey evidence from the 1989 Current Population Survey of the many distinct reasons for moonlighting reported by workers. The only studies in the moonlighting literature that model the joint motives for moonlighting correctly while controlling for the endogeneity of primary job hours are Lilja (1991) and Conway and Kimmel (1994). The latter improves upon Lilja (1991) by specifying a more plausible utility maximizing model and developing a superior instrument for PJ hours. This research examines the characteristics of moonlighters and the length of their moonlighting episodes with the goal of understanding who moonlights and why. The data are for prime-aged men and are drawn from the 1984 Survey of Income and Program Participation (SIPP) panel. The primary advantages of the SIPP are the detailed information provided on up to two jobs (including job start and end dates) and the relatively short length of time (four months) covered by each interview of the survey. Both of these qualities make it possible to identify brief (as well as long) periods of moonlighting, movements into and out of jobs, and the characteristics associated with each job.
Because moonlighting may be motivated by short-term financial needs, being able to observe short moonlighting durations is important. We begin by studying the personal and job-related characteristics of moonlighters and how the length of the moonlighting episode varies with these characteristics. We then estimate a duration model with unobserved heterogeneity to identify formally the determinants of moonlighting behavior when multiple motives may exist. Our expectation is that individuals who moonlight because they are constrained on their primary jobs might do so for shorter periods than those who are "job-packaging." Therefore, the hazard rate for workers who moonlight because of primary job constraints should be greater than for those with alternative motives, ceteris paribus. The mixed hazard function will vary as the composition of the sample changes with the duration of the moonlighting episode. By exploring the importance of heterogeneity and the direction of duration dependence of the mixed and structural hazard functions, we gain new insights into the determinants of moonlighting behavior. The descriptive analyses reveal that most moonlighters in our sample work full-time on their primary jobs and 15 to 20 hours a week on lower paying second jobs, and, in spite of those long hours, tend to be poorer than the average worker. Yet, a significant minority earns a higher wage on their second job. Our duration model results suggest that the structural hazard increases over time and there is significant unobserved heterogeneity. Taken together, these results are consistent with the presence of multiple motives for moonlighting, with the constraint motive being the most common. Keywords: moonlighting, jobs, Kimmel, Conway

    Locating regions in a sequence under density constraints

    Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples include the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result. Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computing
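
To make the problem statement above concrete, here is a minimal brute-force sketch of the "longest substring within a density range" task. It is deliberately not the O(n) algorithm the abstract announces: it checks every substring with prefix sums in O(n^2) time, and the function name and return convention are assumptions for this example.

```python
from itertools import accumulate


def longest_density_substring(bits, lo, hi):
    """Return (start, end) of a longest slice bits[start:end] whose
    proportion of 1s lies in [lo, hi], or None if no such slice exists.

    Brute force over all lengths, longest first; O(n^2) time.
    """
    prefix = [0] + list(accumulate(bits))  # prefix[i] = number of 1s in bits[:i]
    n = len(bits)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            ones = prefix[start + length] - prefix[start]
            # Compare counts instead of dividing, to avoid repeated division.
            if lo * length <= ones <= hi * length:
                return (start, start + length)
    return None
```

A baseline like this is mainly useful for checking a faster implementation on small inputs; for long genomic sequences the quadratic cost is exactly what the linear-time result avoids.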