62 research outputs found

    Multiple serial episode matching

    In a previous paper we generalized the Knuth-Morris-Pratt (KMP) pattern matching algorithm, defined a non-conventional kind of RAM, the MP-RAM (a RAM equipped with extra operations), and designed an O(n) on-line algorithm for solving the serial episode matching problem on MP-RAMs when there is only a single episode. Here we give two extensions of this algorithm to the case where we search for several patterns simultaneously, and compare them. More precisely, given q+1 strings (a text t of length n and q patterns m_1, ..., m_q) and a natural number w, the multiple serial episode matching problem consists in finding the number of size-w windows of text t which contain the patterns m_1, ..., m_q as subsequences, i.e. for each m_i, if m_i = p_1 ... p_k, the letters p_1, ..., p_k occur in the window in the same order as in m_i, but not necessarily consecutively (they may be interleaved with other letters). The main contribution is an algorithm solving this problem on-line in time O(nq).
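    The windowed-subsequence condition in this abstract can be sketched with a brute-force check. This is a minimal O(n·w) illustration per pattern, not the paper's O(n) MP-RAM algorithm; all names are illustrative:

    ```python
    def count_windows(text, pattern, w):
        """Count size-w windows of text containing pattern as a subsequence.

        Naive sketch: test each window independently. The paper's on-line
        algorithm avoids re-scanning overlapping windows.
        """
        def is_subseq(window, pat):
            it = iter(window)
            # 'c in it' advances the iterator, so matches must appear in order
            return all(c in it for c in pat)

        n = len(text)
        return sum(is_subseq(text[i:i + w], pattern) for i in range(n - w + 1))
    ```

    For several patterns, the same count is simply taken over windows containing every pattern, which is where the O(nq) bound of the paper comes from.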

    Compressed Subsequence Matching and Packed Tree Coloring

    We present a new algorithm for subsequence matching in grammar-compressed strings. Given a grammar of size n compressing a string of size N and a pattern string of size m over an alphabet of size σ, our algorithm uses O(n + nσ/w) space and either O(n + nσ/w + m log N log w · occ) or O(n + (nσ/w) log w + m log N · occ) time, where w is the word size and occ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for occ = o(n / log N) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure is in turn based on a new data structure for the tree color problem, where the node colors are packed in bit strings.
    Comment: To appear at CPM '1
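    The "next occurrence of a character after a given position" query at the heart of this abstract has a simple uncompressed analogue: an O(nσ)-space table. This sketch is only the uncompressed baseline (the paper's contribution is answering the same query on the grammar-compressed string); names are illustrative:

    ```python
    def build_next_occurrence(s):
        """nxt[i][c] = smallest j >= i with s[j] == c (absent if none).

        Built right-to-left in O(n * sigma) time and space for an
        *uncompressed* string s.
        """
        n = len(s)
        nxt = [dict() for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            nxt[i] = dict(nxt[i + 1])  # inherit occurrences to the right
            nxt[i][s[i]] = i
        return nxt

    def is_subsequence(pat, nxt):
        """O(|pat|) subsequence query against the precomputed table."""
        i = 0
        for c in pat:
            if c not in nxt[i]:
                return False
            i = nxt[i][c] + 1  # jump past the matched occurrence
        return True
    ```

    Each pattern character is resolved in one table lookup, which is the query pattern the compressed structure must support as well.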

    Faster subsequence recognition in compressed strings

    Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLPs), a scheme closely related to Lempel-Ziv compression. For an SLP-compressed text of length m̄ and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recognition running in time O(m̄ n² log n). We improve the running time to O(m̄ n^1.5). Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time O(m̄ n^1.5); the same problem with a compressed pattern is known to be NP-hard.
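    As a point of reference for the LCS result, the classic quadratic dynamic program on two uncompressed strings is sketched below. The paper's contribution is achieving O(m̄ n^1.5) while keeping the text compressed; this baseline is purely illustrative:

    ```python
    def lcs_length(a, b):
        """Classic O(len(a) * len(b)) LCS dynamic program.

        dp[i][j] = length of the longest common subsequence of
        a[:i] and b[:j].
        """
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m):
            for j in range(n):
                if a[i] == b[j]:
                    dp[i + 1][j + 1] = dp[i][j] + 1
                else:
                    dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
        return dp[m][n]
    ```

    Decompressing an SLP and running this DP would cost time proportional to the (possibly exponential) decompressed length, which is exactly what compressed-string algorithms avoid.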

    Critical Differences and Clues in Eta Car's 2009 Event

    We monitored Eta Carinae with HST WFPC2 and Gemini GMOS throughout the 2009 spectroscopic event, which was expected to differ from its predecessor in 2003 (Davidson et al. 2005). Here we report major observed differences between events, and their implications. Some of these results were quite unexpected. (1) The UV brightness minimum was much deeper in 2009. This suggests that physical conditions in the early stages of an event depend on different parameters than the "normal" inter-event wind. Extra mass ejection from the primary star is one possible cause. (2) The expected He II 4687 brightness maximum was followed several weeks later by another. We explain why this fact, and the timing of the 4687 maxima, strongly support a "shock breakup" hypothesis for X-ray and 4687 behavior as proposed 5-10 years ago. (3) We observed a polar view of the star via light reflected by dust in the Homunculus nebula. Surprisingly, at that location the variations of emission-line brightness and Doppler velocities closely resembled a direct view of the star, which should not have been true for any phenomena related to the orbit. This result casts very serious doubt on all the proposed velocity interpretations that depend on the secondary star's orbital motion. (4) Latitude-dependent variations of H I, He I and Fe II features reveal aspects of wind behavior during the event. In addition, we discuss implications of the observations for several crucial unsolved problems.
    Comment: 45 pages, 9 figures, submitted to Ap

    Subsequence Automata with Default Transitions

    Let S be a string of length n with characters from an alphabet of size σ. The subsequence automaton of S (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of S. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is O(nσ) and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with default transitions, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter k, 1 < k ≤ σ, we present a subsequence automaton with default transitions of size O(nk log_k σ) and delay O(log_k σ). Hence, with k = 2 we obtain an automaton of size O(n log σ) and delay O(log σ). On the other extreme, with k = σ, we obtain an automaton of size O(nσ) and delay O(1), thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.
    Comment: Corrected typo
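    The "straightforward construction" of the standard O(nσ) subsequence automaton mentioned in this abstract can be sketched as follows. This shows only the baseline automaton, not the paper's default-transition variant; names are illustrative:

    ```python
    def subsequence_automaton(s):
        """Build the standard subsequence automaton (DASG) of s.

        States are 0..n; delta[i][c] = smallest j > i with s[j-1] == c.
        Total size is O(n * sigma). Returns an acceptance predicate.
        """
        n = len(s)
        delta = [dict() for _ in range(n + 1)]
        nxt = {}
        for i in range(n, 0, -1):
            nxt = dict(nxt)        # each state keeps its own transition map
            nxt[s[i - 1]] = i      # nearest occurrence of s[i-1] to the right of i-1
            delta[i - 1] = nxt

        def accepts(pat):
            state = 0
            for c in pat:
                if c not in delta[state]:
                    return False
                state = delta[state][c]
            return True

        return accepts
    ```

    The default-transition trade-off in the paper replaces most of these O(σ) per-state entries with short chains of fallback transitions, shrinking the automaton at the cost of bounded delay per consumed character.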

    Influence of the Landscape Template on Chemical and Physical Habitat for Brown Trout Within a Boreal Stream Network

    We used the distribution of stream-dwelling brown trout (Salmo trutta) in a 67 km² boreal catchment to explore the importance of environmental organizing factors at a range of spatial scales, including whole-catchment characteristics derived from map data, and stream-reach chemical and physical characteristics. Brown trout were not observed at any sites characterized by pH < 5.0 during the spring snowmelt episode, matching published toxicity thresholds. Brown trout distributions were patchy even in less acidic regions of the stream network, positively associated with glaciofluvial substrate and negatively associated with fine sand/silty sediments. A multivariate model including only whole-catchment characteristics explained 43% of the variation in brown trout densities, while models with local site physical habitat characteristics or local stream chemistry explained 33% and 25%, respectively. At the stream-reach scale, physical habitat apparently played a primary role in organizing brown trout distributions in this stream network, with acidity placing an additional restriction by excluding brown trout from acidic headwater streams. Much of the strength of the association between catchment characteristics and fish could be explained by the correlation of catchment-scale landscape characteristics with local stream chemistry and site physical characteristics. These results, consistent with the concept of multiple hierarchical environmental filters regulating the distribution of this fish species, underline the importance of considering a range of spatial scales and both physical and chemical environments when attempting to manage or restore streams for brown trout.

    Mathematical Modelling and Machine Learning Methods for Bioinformatics and Data Science Applications

    Mathematical modeling is routinely used in the physical and engineering sciences to help understand complex systems and optimize industrial processes. Mathematical modeling differs from Artificial Intelligence because it does not rely exclusively on collected data to describe an industrial phenomenon or process; it is based on fundamental laws of physics or engineering that lead to systems of equations able to represent all the variables that characterize the process. Conversely, Machine Learning methods require large amounts of data to find solutions; they remain detached from the process that generated the data and try to infer the behavior of the object, material, or process under examination from observed samples. Mathematics allows us to formulate complex models with effectiveness and creativity, describing nature and physics. Together with the potential of Artificial Intelligence and data collection techniques, a new way of dealing with practical problems is possible. Inserting equations derived from the physical world into data-driven models can greatly enrich the information content of the sampled data, allowing us to simulate very complex phenomena with drastically reduced computation times. Combined approaches will constitute a breakthrough in cutting-edge applications, providing precise and reliable tools for the prediction of phenomena in biological macro/microsystems, for biotechnological applications, and for medical diagnostics, particularly in the field of precision medicine.