
    Staircase polygons: moments of diagonal lengths and column heights

    We consider staircase polygons, counted by perimeter and by sums of k-th powers of their diagonal lengths, k being a positive integer. We derive limit distributions for these parameters in the limit of large perimeter and compare the results with Monte Carlo simulations of self-avoiding polygons. We also analyse staircase polygons counted by width and by sums of powers of their column heights, and we apply our methods to related models of directed walks. Comment: 24 pages, 7 figures; to appear in proceedings of Counting Complexity: An International Workshop On Statistical Mechanics And Combinatorics, 10-15 July 2005, Queensland, Australia
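The abstract does not define diagonal length, so the sketch below illustrates only the column-height statistic. It enumerates staircase polygons of a given semi-perimeter one column at a time and tallies the sum of k-th powers of the column heights. Encoding a polygon as a list of (bottom, top) column intervals is our own choice, and so are all the function names. Enumerating by semi-perimeter keeps the set finite, whereas fixing only the width does not. As a sanity check, staircase (parallelogram) polygons of semi-perimeter n are counted by the Catalan number C_{n-1}.

```python
from collections import Counter
from math import comb


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def staircase_polygons(semi_perimeter):
    """Yield staircase polygons as lists of (bottom, top) column intervals, first bottom at 0.

    Consecutive columns satisfy b_i <= b_{i+1} < t_i <= t_{i+1}, so adjacent columns share an edge.
    """
    def extend(columns, width_left, height):
        bottom, top = columns[-1]
        if width_left == 0:
            if top == height:  # the last column must reach the top of the bounding box
                yield list(columns)
            return
        for b in range(bottom, top):
            for t in range(top, height + 1):
                columns.append((b, t))
                yield from extend(columns, width_left - 1, height)
                columns.pop()

    for width in range(1, semi_perimeter):
        height = semi_perimeter - width  # semi-perimeter = width + height of bounding box
        for t1 in range(1, height + 1):
            yield from extend([(0, t1)], width - 1, height)


def column_power_sum(polygon, k):
    """Sum of k-th powers of the column heights (k = 1 gives the area)."""
    return sum((top - bottom) ** k for bottom, top in polygon)


def distribution(semi_perimeter, k):
    """Number of polygons of the given semi-perimeter for each value of the column-power sum."""
    return Counter(column_power_sum(p, k) for p in staircase_polygons(semi_perimeter))
```

For example, `distribution(4, 2)` lists the five polygons of semi-perimeter 4 grouped by the sum of their squared column heights.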

    The lineage process in Galton--Watson trees and globally centered discrete snakes

    We consider branching random walks built on Galton--Watson trees whose offspring distribution has bounded support, conditioned to have $n$ nodes, and their rescaled convergence to the Brownian snake. We exhibit a notion of ``globally centered discrete snake'' that extends the usual setting in which the displacements are assumed to be centered. We show that under some additional moment conditions, when $n$ goes to $+\infty$, ``globally centered discrete snakes'' converge to the Brownian snake. The proof relies on a precise study of the lineage of the nodes in a Galton--Watson tree conditioned on its size, and of their links with a multinomial process [the lineage of a node $u$ is the vector indexed by $(k,j)$ giving the number of ancestors of $u$ having $k$ children and for which $u$ is a descendant of the $j$th one]. Some consequences concerning Galton--Watson trees conditioned on their size are also derived. Comment: Published at http://dx.doi.org/10.1214/07-AAP450 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
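The bracketed definition of lineage can be made concrete on an explicit plane tree. The sketch below is illustrative only. It assumes a tree encoding in which each node is the list of its children, and it identifies u by the 1-based child indices along the path from the root. Both choices, and the function name, are ours.

```python
from collections import Counter


def lineage(tree, path):
    """Lineage of node u in a plane tree.

    tree: each node is the list of its children (a leaf is []).
    path: 1-based child indices leading from the root to u.
    Returns a Counter mapping (k, j) to the number of strict ancestors of u that
    have k children and for which u descends from their j-th child.
    """
    counts = Counter()
    node = tree
    for j in path:
        counts[(len(node), j)] += 1  # this ancestor has len(node) children; u is below child j
        node = node[j - 1]
    return counts
```

The entries of the lineage always sum to the depth of u, since every strict ancestor contributes exactly one (k, j) pair.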

    The vertical profile of embedded trees

    Consider a rooted binary tree with $n$ nodes. Assign the abscissa $0$ to the root, and assign to the left (resp. right) child of a node of abscissa $i$ the abscissa $i-1$ (resp. $i+1$). We prove that the number of binary trees of size $n$ having exactly $n_i$ nodes at abscissa $i$, for $l \le i \le r$ (with $n = \sum_i n_i$), is $$\frac{n_0}{n_l n_r} \binom{n_{-1}+n_1}{n_0-1} \prod_{\substack{l \le i \le r \\ i \neq 0}} \binom{n_{i-1}+n_{i+1}-1}{n_i-1},$$ with $n_{l-1} = n_{r+1} = 0$. The sequence $(n_l, \ldots, n_{-1}; n_0, \ldots, n_r)$ is called the vertical profile of the tree. The vertical profile of a uniform random tree of size $n$ is known to converge, in a certain sense and after normalization, to a random measure called the integrated super-Brownian excursion, which motivates our interest in the profile. We prove similar-looking formulas for other families of trees whose nodes are embedded in $\mathbb{Z}$. We also refine these formulas by taking into account the number of nodes at abscissa $j$ whose parent lies at abscissa $i$, and/or the number of vertices at abscissa $i$ having a prescribed number of children at abscissa $j$, for all $i$ and $j$. Our proofs are bijective. Comment: 47 pages
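For small n the closed form can be checked by brute force. The sketch below is such a check, not the bijective proof. It enumerates every binary tree of a given size, tallies the vertical profiles, and compares each tally with the formula. The nested-tuple tree encoding and all function names are our own.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def binary_trees(n):
    """All binary trees with n nodes as nested (left, right) tuples; None is the empty tree."""
    if n == 0:
        return (None,)
    return tuple((left, right)
                 for k in range(n)  # k nodes on the left, n - 1 - k on the right
                 for left in binary_trees(k)
                 for right in binary_trees(n - 1 - k))


def vertical_profile(tree):
    """Map abscissa i -> n_i; root at 0, left child at i - 1, right child at i + 1."""
    counts = Counter()
    stack = [(tree, 0)]
    while stack:
        node, x = stack.pop()
        if node is not None:
            counts[x] += 1
            stack.append((node[0], x - 1))
            stack.append((node[1], x + 1))
    return counts


def profile_formula(profile):
    """Evaluate the closed form for a profile given as a dict abscissa -> n_i."""
    def n(i):
        return profile.get(i, 0)  # n_i = 0 outside [l, r], so n_{l-1} = n_{r+1} = 0

    l, r = min(profile), max(profile)
    value = Fraction(n(0), n(l) * n(r)) * comb(n(-1) + n(1), n(0) - 1)
    for i in range(l, r + 1):
        if i != 0:
            value *= comb(n(i - 1) + n(i + 1) - 1, n(i) - 1)
    return value


def check(size):
    """Compare the brute-force count of every realised profile with the formula."""
    tally = Counter(tuple(sorted(vertical_profile(t).items())) for t in binary_trees(size))
    for key, count in tally.items():
        assert profile_formula(dict(key)) == count, key
    return len(tally)
```

Exact `Fraction` arithmetic keeps the prefactor n_0/(n_l n_r) from introducing rounding errors before the product turns out to be an integer.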

    Stochastic Continuous Time Neurite Branching Models with Tree and Segment Dependent Rates

    In this paper we introduce a continuous time stochastic neurite branching model closely related to the discrete time stochastic BES-model. The discrete time BES-model underlies current attempts to simulate cortical development, but is difficult to analyze. The new continuous time formulation facilitates analytical treatment, allowing us to examine the structure of the model more closely. We derive explicit expressions for the time-dependent probabilities $p(\gamma, t)$ of finding a tree $\gamma$ at time $t$, valid for arbitrary continuous time branching models with tree- and segment-dependent branching rates. We show, for the specific case of the continuous time BES-model, that, as expected from our model formulation, the sums needed to evaluate expectation values of functions of the terminal segment number, $\mu(f(n), t)$, do not depend on the distribution of the total branching probability over the terminal segments. In addition, we derive a system of differential equations for the probabilities $p(n, t)$ of finding $n$ terminal segments at time $t$. For the continuous BES-model, this system of differential equations gives direct numerical access to functions depending only on the number of terminal segments, and we use it to evaluate the development of the mean and standard deviation of the number of terminal segments at time $t$. For comparison we discuss two cases where the mean and variance of the number of terminal segments are exactly solvable. We then discuss the numerical evaluation of the S-dependence of the solutions for the continuous time BES-model. The numerical results show clearly that higher S values, i.e. values such that more proximal terminal segments have higher branching rates than more distal terminal segments, lead to more symmetrical trees as measured by three tree symmetry indicators. Comment: 41 pages, 2 figures, revised structure and text improvement
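The sketch below shows how a pure-birth system of differential equations for p(n, t) yields the mean and standard deviation numerically. It is an illustration under stated assumptions, not the paper's model. We assume that a tree's total branching rate depends on n alone, through the hypothetical form lambda(n) = b * n**(1 - E). We also assume that each branching event turns one terminal segment into two, so n becomes n + 1. The paper's exact BES rates and normalisation may differ. Because the assumed rate ignores how branching is distributed over segments, S does not enter. For E = 0 the process reduces to a Yule process, whose mean e^{bt} and variance e^{2bt} - e^{bt} are known, and this gives a check on the integrator.

```python
import math


def branching_rates(n_max, b, E):
    """Hypothetical total branching rate lambda(n) = b * n**(1 - E); index 0 is unused."""
    return [0.0] + [b * n ** (1.0 - E) for n in range(1, n_max + 1)]


def master_rhs(p, lam):
    """dp_n/dt = lambda(n-1) p_{n-1} - lambda(n) p_n (each event adds one terminal segment)."""
    dp = [0.0] * len(p)
    for n in range(1, len(p)):
        dp[n] = -lam[n] * p[n]
        if n > 1:
            dp[n] += lam[n - 1] * p[n - 1]
    return dp


def evolve(t_end, b=1.0, E=0.0, n_max=100, steps=1000):
    """Classical RK4 from a single terminal segment at t = 0; mass leaving n_max is dropped."""
    lam = branching_rates(n_max, b, E)
    p = [0.0] * (n_max + 1)
    p[1] = 1.0
    h = t_end / steps
    for _ in range(steps):
        k1 = master_rhs(p, lam)
        k2 = master_rhs([x + 0.5 * h * d for x, d in zip(p, k1)], lam)
        k3 = master_rhs([x + 0.5 * h * d for x, d in zip(p, k2)], lam)
        k4 = master_rhs([x + h * d for x, d in zip(p, k3)], lam)
        p = [x + h * (a1 + 2 * a2 + 2 * a3 + a4) / 6
             for x, a1, a2, a3, a4 in zip(p, k1, k2, k3, k4)]
    return p


def mean_and_sd(p):
    """Mean and standard deviation of the number of terminal segments."""
    mass = sum(p)
    mean = sum(n * pn for n, pn in enumerate(p)) / mass
    var = sum(n * n * pn for n, pn in enumerate(p)) / mass - mean ** 2
    return mean, math.sqrt(var)
```

The truncation level `n_max` must be large enough that almost no probability reaches it by `t_end`. The step size must keep h * lambda(n_max) small for RK4 to remain stable.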

    Unsupervised Anomaly Detection: investigations on Isolation Forest

    In today's world, the growing amount of available information makes it possible to analyse many factors, one of which is anomaly detection. In recent years this problem has been addressed with machine learning, which makes it possible to recognise instances that do not conform to the expected behaviour of a system, the so-called outliers. One of the sectors that benefits most is industry, where data are the new wealth of companies, for example in boosting sales or in predictive maintenance. Several classes of methods have been proposed over the years; more recently, a new class based on isolation has been introduced, and its first method is Isolation Forest. This method has been very successful in both industrial applications and academic research, and many variants of it are now available. The basic intuition is simple: the anomaly score reflects how easily each instance can be separated from the rest, based on the average number of random splits required to isolate it completely. In this thesis, after a preliminary survey of the state of the art and an in-depth study of Isolation Forest, several variants of the method are developed with the aim of improving anomaly detection. These variants stem from insights into the two main phases of the method: the phase in which the feature and its split value are selected, and the phase in which the anomaly score of each instance is computed. Finally, numerical experiments on both artificial and real-world datasets compare the anomaly-detection performance of the variants with that of the standard method.
These experiments show that the Prob Split method appears to be the most promising of those developed, because it yields significant gains in detection while keeping the same computational cost as Isolation Forest.
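The scoring intuition described above belongs to the standard Isolation Forest of Liu, Ting and Zhou. The pure-Python sketch below implements only that baseline. It does not implement Prob Split or the thesis's other variants, which this abstract does not describe. The subsample size of 256 and the height limit ceil(log2 psi) follow the usual defaults. The function names are ours, and this is not scikit-learn's `IsolationForest`.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search among n points; normalises depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a uniform random value until isolation or the height limit."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points equal on this feature: stop here (a simplification)
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus c(size) to account for unsplit leaf points."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of psi points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two points")
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, forest):
    """s(x) = 2 ** (-mean path length / c(psi)); scores close to 1 suggest anomalies."""
    trees, psi = forest
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))
```

A point far from a Gaussian cluster, such as (6, 6), should be isolated after only a few splits. It should therefore score higher than the cluster centre.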

    The Horizontal Tunnelability Graph is Dual to Level Set Trees

    Time series data, reflecting phenomena such as climate patterns and stock prices, offer key insights for prediction and trend analysis. Contemporary research has independently developed disparate geometric approaches to time series analysis, including tree methods, visibility algorithms, and the persistence-based barcodes common in topological data analysis. This thesis combines these perspectives through the concept of horizontal tunnelability. We prove that a level set tree, recovered from its Harris path (a time series), is dual to that time series' horizontal tunnelability graph, itself a subgraph of the more common horizontal visibility graph. This result extends previous work by relating Merge Trees, Chiral Merge Trees, and Level Set Trees to one another and to visibility and persistence methods. The approach promises significant computational advantages and connects lines of work that were previously unrelated. To facilitate its implementation, we provide accompanying empirical code and discuss its advantages.
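The abstract does not define horizontal tunnelability, so we do not attempt to code it. For reference, the horizontal visibility graph that contains it uses the standard criterion: positions i < j are linked when every value strictly between them is lower than both x_i and x_j. A minimal quadratic-time sketch follows, with a function name of our own.

```python
def horizontal_visibility_edges(series):
    """Edges (i, j), i < j, such that series[k] < min(series[i], series[j]) for all i < k < j."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")  # max of the values strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # a bar at least as tall as series[i] blocks every later j
    return edges
```

The early `break` is safe because once a value at least as high as `series[i]` has been passed, no later point can satisfy the strict inequality with i.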