
    Parameterized Strings: Algorithms and Applications

    The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols. A parameterized match (p-match) exists between two p-strings if the constants match exactly and there exists a bijection between the parameter symbols. Historically, p-strings have been employed in source code clone detection, plagiarism detection, and the analysis of structural similarity between biological sequences. By handling the intricacies of the parameterized suffix, we can efficiently address complex applications with data structures that are also reusable in traditional matching scenarios. In this dissertation, we extend data structures for p-strings (and variants) to address sophisticated string computations. We introduce a taxonomy of classes for longest factor problems. Using this taxonomy, we show an interesting connection between the parameterized longest previous factor (pLPF) and familiar data structures in string theory, including the border array, prefix array, longest common prefix array, and analogous p-string data structures. Exploiting this connection, we construct a multitude of data structures using the same general pLPF framework. Before this dissertation, the p-match was defined predominantly by the matching between uncompressed p-strings. Here, we introduce the compressed parameterized pattern match to find all p-matches between a pattern and a text, using only the pattern and a compressed form of the text. We present parameterized compression (p-compression) as a new way to losslessly compress data to support p-matching. Experimentally, it is shown that p-compression is competitive with standard compression schemes. Using p-compression, we address the compressed p-match independent of the underlying compression routine. Currently, p-string theory lacks the capability to support indeterminate symbols, which are essential for applications involving inexact matching such as music analysis. In this work, we propose and efficiently address two new types of p-matching with indeterminate symbols. (1) We introduce the indeterminate parameterized match (ip-match) to permit matching with indeterminate holes in a p-string. We support the ip-match by introducing data structures that extend the prefix array. (2) From a different perspective, the equivalence parameterized match (e-match) extends the p-match to treat intra-alphabet symbol classes as equivalence classes. We propose a method to perform the e-match using the p-string suffix array framework, i.e. the parameterized suffix array (pSA) and parameterized longest common prefix array (pLCP). Historically, direct constructions of the pSA and pLCP have suffered from quadratic time bounds in the worst case. Here, we introduce new p-string theory to efficiently construct the pSA/pLCP and break the theoretical worst-case time barrier. Biological applications have become a classical use of p-string theory. Here, we introduce the structural border array to provide a lightweight solution to the biologically oriented variant of the p-match, i.e. the structural match (s-match) on structural strings (s-strings). Following the s-match, we show how to use s-string suffix structures to support various pattern matching problems involving RNA secondary structures. Finally, we propose and construct the forward stem matrix (FSM), a data structure to access RNA stem structures, and we apply the FSM to the detection of hairpins and pseudoknots in an RNA sequence. This dissertation advances the state of the art in p-string theory by developing data structures for p-strings/s-strings and using p-string/s-string theory in new and old contexts to address various applications. Due to the flexibility of the p-string/s-string, the data structures and algorithms in this work are also applicable to the myriad problems in the string community that involve traditional strings.
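    The p-match definition above can be illustrated with a minimal sketch. It uses Baker's well-known "prev" encoding rather than any data structure from the dissertation itself: each parameter is replaced by the distance to its previous occurrence (0 for a first occurrence), and two p-strings of equal length p-match exactly when their encodings are equal. The function names and the convention of passing the parameter alphabet as a set are assumptions made for illustration.

```python
def prev_encode(s, params):
    """Prev-encode a p-string: constants stay as characters,
    each parameter becomes the distance to its previous occurrence
    (0 if it has not occurred before)."""
    last_seen = {}
    encoded = []
    for i, ch in enumerate(s):
        if ch in params:
            encoded.append(i - last_seen[ch] if ch in last_seen else 0)
            last_seen[ch] = i
        else:
            # Constants are kept as str, so they never collide with the
            # integer codes used for parameters.
            encoded.append(ch)
    return encoded


def p_match(s, t, params):
    """Return True if s and t p-match: constants agree position by
    position and the parameters correspond under some bijection."""
    return len(s) == len(t) and prev_encode(s, params) == prev_encode(t, params)


params = set("xyz")                      # hypothetical parameter alphabet
print(p_match("axbyx", "aybxy", params))  # parameters renamed consistently
print(p_match("axbyx", "axbxx", params))  # no bijection exists
```

    Comparing encodings avoids searching for the bijection explicitly, which is why this encoding underlies parameterized suffix structures such as the pSA.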

    Investigating the effects of inter-annual weather variation (1968-2016) on the functional response of cereal grain yield to applied nitrogen, using data from the Rothamsted Long-Term experiments

    The effect of weather on inter-annual variation in the crop yield response to nitrogen (N) fertilizer for winter wheat (Triticum aestivum L.) and spring barley (Hordeum vulgare L.) was investigated using yield data from the Broadbalk Wheat and Hoosfield Spring Barley long-term experiments at Rothamsted Research. Grain yields of crops from 1968 to 2016 were modelled as a function of N rates using a linear-plus-exponential (LEXP) function. The extent to which inter-annual variation in the parameters of these responses was explained by variations in weather (monthly summarized temperatures and rainfall), and by changes in the cultivar grown, was assessed. The inter-annual variability in rainfall and underlying temperature influenced the crop N response and hence grain yields in both crops. Asymptotic yields in wheat were particularly sensitive to mean temperature in November, April and May, and to total rainfall in October, February and June. In spring barley, asymptotic yields were sensitive to mean temperature in February and June, and to total rainfall in April to July inclusive and in September. The method presented here explores the separation of agronomic and environmental (weather) influences on crop yield over time. Fitting N response curves across multiple treatments can support an informative analysis of the influence of weather variation on yield variability. Whilst explanatory variables within such models are subject to confounding and collinearity, and other factors also influence yields over time, our study confirms the considerable impact of weather variables at certain times of the year. This emphasizes the importance of including temporal variation in weather when evaluating the impacts of climate change on crops.
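    As a rough illustration of the response curve named above, the linear-plus-exponential function is commonly written as y = a + b·r^N + c·N, where N is the applied nitrogen rate. The abstract does not state the parameterization used in the study, so this form, the parameter names, and the numeric values below are assumptions chosen only to show the shape of the curve.

```python
def lexp(n_rate, a, b, r, c):
    """Linear-plus-exponential yield response (assumed form):
    yield = a + b * r**N + c * N, with N the applied N rate."""
    return a + b * r ** n_rate + c * n_rate


# Hypothetical parameter values, not taken from the study.
a, b, r, c = 10.0, -6.0, 0.99, -0.005

for n_rate in (0, 50, 100, 200, 300):
    print(n_rate, round(lexp(n_rate, a, b, r, c), 2))
```

    With b negative and 0 < r < 1, the exponential term gives the steep initial rise in yield, while a small negative c lets the curve decline slowly at high N rates.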

    A survey on algorithmic aspects of modular decomposition

    Modular decomposition is a technique that applies to graphs but is not restricted to them. The notion of a module appears naturally in the proofs of many graph-theoretical theorems. Computing the modular decomposition tree is an important preprocessing step for solving a large number of combinatorial optimization problems. Since the first polynomial-time algorithm in the early 1970s, the algorithmics of modular decomposition have developed considerably. This paper surveys the ideas and techniques that arose from this line of research.
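    For readers unfamiliar with the term, a module of an undirected graph is a vertex set M such that every vertex outside M is adjacent either to all of M or to none of it. The check below is a minimal sketch of that definition only, not one of the decomposition algorithms the survey covers; the adjacency-dict representation and the function name are assumptions.

```python
def is_module(adj, candidate):
    """Return True if `candidate` is a module of the undirected graph
    `adj` (a dict mapping each vertex to the set of its neighbours)."""
    module = set(candidate)
    for v in adj:
        if v in module:
            continue
        # An outside vertex must see all of the module or none of it.
        hits = len(adj[v] & module)
        if hits not in (0, len(module)):
            return False
    return True


# Path a - b - c: {a, c} is a module, {a, b} is not.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(is_module(path, {"a", "c"}))
print(is_module(path, {"a", "b"}))
```

    Singletons and the full vertex set always pass this test; they are the trivial modules.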

    Water use efficiency of China's terrestrial ecosystems and responses to drought

    Water use efficiency (WUE) measures the trade-off between carbon gain and water loss of terrestrial ecosystems, and a better understanding of its dynamics and controlling factors is essential for predicting ecosystem responses to climate change. We assessed the magnitude, spatial patterns, and trends of WUE of China’s terrestrial ecosystems and its responses to drought using a process-based ecosystem model. During the period from 2000 to 2011, the national average annual WUE (net primary productivity (NPP)/evapotranspiration (ET)) of China was 0.79 g C kg−1 H2O. Annual WUE decreased in the southern regions because of the decrease in NPP and the increase in ET, and increased in most northern regions mainly because of the increase in NPP. Droughts usually increased annual WUE in Northeast China and central Inner Mongolia but decreased annual WUE in central China. “Turning-points” were observed for southern China, where moderate and extreme droughts reduced annual WUE and severe drought slightly increased annual WUE. The cumulative lagged effect of drought on monthly WUE varied by region. Our findings have implications for ecosystem management and climate policy making. WUE is expected to continue to change under future climate change, particularly as drought is projected to increase in both frequency and severity.

    Incremental learning of skills in a task-parameterized Gaussian Mixture Model

    The final publication is available at link.springer.com. Programming by demonstration techniques facilitate the programming of robots. Some of them allow the generalization of tasks through parameters, although they require new training when trajectories different from the ones used to estimate the model need to be added. One of the ways to re-train a robot is by incremental learning, which supplies additional information about the task and does not require teaching the whole task again. The present study proposes three techniques to add trajectories to a previously estimated task-parameterized Gaussian mixture model. The first technique estimates a new model by accumulating the new trajectory and the set of trajectories generated using the previous model. The second technique adds the parameters obtained for the new trajectories to those of the existing model. The third one updates the model parameters by running a modified version of the Expectation-Maximization algorithm with the information from the new trajectories. The techniques were evaluated in a simulated task and a real one, and they showed better performance than that of the existing model. Peer reviewed. Postprint (author's final draft).

    Characterization of Exoplanet Atmospheres with the Optical Coronagraph on WFIRST

    WFIRST-CGI is a NASA technology demonstration mission that is charged with demonstrating key technologies for future exo-Earth imaging missions in space. In the process, it will obtain images and low-resolution spectra of a handful to a dozen extrasolar planets and possibly protoplanetary disks. Its unprecedented contrast levels in the optical will provide astronomers with their first direct look at mature, Jupiter-sized planets at moderate separations. This paper addresses the question: what science can be done with such data? An analytic noise model, which is informed by the ongoing engineering developments, is used to compute maximum achievable signal-to-noise ratios and scientifically viable integration times for hypothetical star-planet systems, as well as to investigate the constraining power of various combinations of WFIRST-CGI photometric and spectral observations. This work introduces two simple models for planetary geometric albedos, which are inspired largely by the solar system's gas giants. The first planet model is a hybrid Jupiter-Neptune model, which separately treats the short and long wavelengths where chromophores and methane dominate absorption, respectively. The second planet model fixes cloud and haze properties in CoolTLusty to match Jupiter's albedo spectrum; it then perturbs only the metallicity. MCMC retrievals performed on simulated observations are used to assess the precision with which planet model parameters can be measured subject to different exposure times and observing cases. Fit results for both models' parameterizations of geometric albedo spectra demonstrate that a rough indication of the metallicity or methane content should be possible for some WFIRST-CGI targets. We conclude that real observations will likely be able to differentiate between extreme cases using these models, but will lack the precision necessary to uncover subtle trends. Comment: 29 pages, 25 figures, 2 tables

    A deeply embedded young protoplanetary disk around L1489 IRS observed by the submillimeter array

    Circumstellar disks are expected to form early in the process that leads to the formation of a young star, during the collapse of the dense molecular cloud core. It is currently not well understood at what stage of the collapse the disk is formed or how it subsequently evolves. We aim to identify whether an embedded Keplerian protoplanetary disk resides in the L1489 IRS system. Given the amount of envelope material still present, such a disk would represent a very young example of a protoplanetary disk. Using the Submillimeter Array (SMA) we have observed the HCO$^+$ $J = 3$--2 line with a resolution of about 1''. At this resolution a protoplanetary disk with a radius of a few hundred AU should be detectable, if present. Radiative transfer tools are used to model the emission from both continuum and line data. We find that these data are consistent with theoretical models of a collapsing envelope and Keplerian circumstellar disk. Models reproducing both the SED and the interferometric continuum observations reveal that the disk is inclined by 40$^\circ$, which is significantly different from the inclination of the surrounding envelope (74$^\circ$). This misalignment of the angular momentum axes may be caused by a gradient within the angular momentum in the parental cloud, or by L1489 IRS being a binary system rather than a single star. In the latter case, future observations looking for variability at sub-arcsecond scales may be able to constrain these dynamical variations directly. However, if stars form from turbulent cores, the accreting material will not have a constant angular momentum axis (although the average is well defined and conserved), in which case a misalignment of the angular momentum axes of the disk and the envelope is more likely. Comment: 11 pages, 13 figures, accepted by A&