Staircase polygons: moments of diagonal lengths and column heights
We consider staircase polygons, counted by perimeter and sums of k-th powers
of their diagonal lengths, k being a positive integer. We derive limit
distributions for these parameters in the limit of large perimeter and compare
the results to Monte-Carlo simulations of self-avoiding polygons. We also
analyse staircase polygons, counted by width and sums of powers of their column
heights, and we apply our methods to related models of directed walks.
Comment: 24 pages, 7 figures; to appear in proceedings of Counting Complexity:
An International Workshop On Statistical Mechanics And Combinatorics, 10-15
July 2005, Queensland, Australi
The lineage process in Galton--Watson trees and globally centered discrete snakes
We consider branching random walks built on Galton--Watson trees with
offspring distribution having a bounded support, conditioned to have n nodes,
and their rescaled convergences to the Brownian snake. We exhibit a notion of
``globally centered discrete snake'' that extends the usual settings in which
the displacements are supposed centered. We show that under some additional
moment conditions, when n goes to +\infty, ``globally centered discrete
snakes'' converge to the Brownian snake. The proof relies on a precise study of
the lineage of the nodes in a Galton--Watson tree conditioned by the size, and
their links with a multinomial process [the lineage of a node u is the vector
indexed by (i,j) giving the number of ancestors of u having i children
and for which u is a descendant of the j-th one]. Some consequences
concerning Galton--Watson trees conditioned by the size are also derived.
Comment: Published at http://dx.doi.org/10.1214/07-AAP450 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org
The vertical profile of embedded trees
Consider a rooted binary tree with n nodes. Assign with the root the abscissa
0, and with the left (resp. right) child of a node of abscissa i the abscissa
i-1 (resp. i+1). We prove that the number of binary trees of size n having
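exactly n_i nodes at abscissa i, for l =< i =< r (with n = sum_i n_i), is
$$ \frac{n_0}{n_l n_r} \binom{n_{-1}+n_1}{n_0-1} \prod_{l \le i \le r, \, i \ne 0} \binom{n_{i-1}+n_{i+1}-1}{n_i-1}, $$
with n_{l-1}=n_{r+1}=0. The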
sequence (n_l, ..., n_{-1};n_0, ..., n_r) is called the vertical profile of the
tree. The vertical profile of a uniform random tree of size n is known to
converge, in a certain sense and after normalization, to a random measure called
the integrated superbrownian excursion, which motivates our interest in the
profile. We prove similar looking formulas for other families of trees whose
nodes are embedded in Z. We also refine these formulas by taking into account
the number of nodes at abscissa j whose parent lies at abscissa i, and/or the
number of vertices at abscissa i having a prescribed number of children at
abscissa j, for all i and j. Our proofs are bijective.
Comment: 47 page
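The definition of the vertical profile can be checked by brute force for small n. The sketch below enumerates all binary trees of a given size, tallies their profiles, and compares each tally with a closed-form product of binomial coefficients. The helper names (`binary_trees`, `vertical_profile`, `closed_form`, `profile_counts`) are illustrative, and the expression coded in `closed_form` is a transcription that should be treated as an assumption to be tested, not as an authoritative statement of the theorem.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def binary_trees(n):
    """All binary trees with n nodes as nested (left, right) tuples; None is the empty tree."""
    if n == 0:
        return (None,)
    trees = []
    for k in range(n):  # k nodes go to the left subtree
        for left in binary_trees(k):
            for right in binary_trees(n - 1 - k):
                trees.append((left, right))
    return tuple(trees)


def vertical_profile(tree):
    """Map abscissa -> number of nodes, with the root at abscissa 0."""
    counts = Counter()
    stack = [(tree, 0)]
    while stack:
        node, x = stack.pop()
        if node is None:
            continue
        counts[x] += 1
        left, right = node
        stack.append((left, x - 1))   # left child sits at abscissa x - 1
        stack.append((right, x + 1))  # right child sits at abscissa x + 1
    return counts


def closed_form(profile):
    """Assumed closed form for the number of trees with the given profile (dict abscissa -> count)."""
    n = lambda i: profile.get(i, 0)
    l, r = min(profile), max(profile)
    value = Fraction(n(0), n(l) * n(r)) * comb(n(-1) + n(1), n(0) - 1)
    for i in range(l, r + 1):
        if i != 0:
            value *= comb(n(i - 1) + n(i + 1) - 1, n(i) - 1)
    return value


def profile_counts(size):
    """Brute-force tally: profile (as a sorted tuple of items) -> number of trees."""
    tally = Counter()
    for tree in binary_trees(size):
        tally[tuple(sorted(vertical_profile(tree).items()))] += 1
    return tally
```

Comparing `closed_form(dict(key))` with the tally for every `key` in `profile_counts(size)` gives an independent check of the count for small sizes; the enumeration grows like the Catalan numbers, so it is only practical for small n.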
Stochastic Continuous Time Neurite Branching Models with Tree and Segment Dependent Rates
In this paper we introduce a continuous time stochastic neurite branching
model closely related to the discrete time stochastic BES-model. The discrete
time BES-model underlies current attempts to simulate cortical development,
but is difficult to analyze. The new continuous time formulation facilitates
analytical treatment, thus allowing us to examine the structure of the model
more closely. We derive explicit expressions for the time dependent
probabilities p(\gamma, t) for finding a tree \gamma at time t, valid for
arbitrary continuous time branching models with tree and segment dependent
branching rates. We show, for the specific case of the continuous time
BES-model, that as expected from our model formulation, the sums needed to
evaluate expectation values of functions of the terminal segment number
\mu(f(n),t) do not depend on the distribution of the total branching
probability over the terminal segments. In addition, we derive a system of
differential equations for the probabilities p(n,t) of finding n terminal
segments at time t. For the continuous BES-model, this system of differential
equations gives direct numerical access to functions only depending on the
number of terminal segments, and we use this to evaluate the development of the
mean and standard deviation of the number of terminal segments at a time t. For
comparison we discuss two cases where mean and variance of the number of
terminal segments are exactly solvable. Then we discuss the numerical
evaluation of the S-dependence of the solutions for the continuous time
BES-model. The numerical results show clearly that higher S values, i.e. values
such that more proximal terminal segments have higher branching rates than more
distal terminal segments, lead to more symmetrical trees as measured by three
tree symmetry indicators.
Comment: 41 pages, 2 figures, revised structure and text improvement
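To illustrate what numerical access to p(n,t) can look like, here is a minimal sketch of a pure-birth master equation, dp(n,t)/dt = rate(n-1) p(n-1,t) - rate(n) p(n,t), integrated with a classical Runge-Kutta scheme. It assumes that each branching event adds exactly one terminal segment and that the total branching rate depends only on n. The function names, the truncation at `n_max`, and the choice of integrator are assumptions made for this illustration, not details taken from the paper.

```python
import math


def integrate_master_equation(rate, n_max, t_end, dt=1e-3, n0=1):
    """Integrate dp(n)/dt = rate(n-1) p(n-1) - rate(n) p(n) for n = 1..n_max with RK4.

    Probability flowing out of state n_max is lost, so n_max must be large
    enough for that leak to be negligible (truncation assumption).
    """
    lam = [0.0] + [rate(n) for n in range(1, n_max + 1)]  # lam[0] is a dummy entry
    p = [0.0] * (n_max + 1)
    p[n0] = 1.0  # start with n0 terminal segments

    def deriv(q):
        d = [0.0] * (n_max + 1)
        for n in range(1, n_max + 1):
            d[n] = lam[n - 1] * q[n - 1] - lam[n] * q[n]
        return d

    for _ in range(round(t_end / dt)):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * dt * k for x, k in zip(p, k1)])
        k3 = deriv([x + 0.5 * dt * k for x, k in zip(p, k2)])
        k4 = deriv([x + dt * k for x, k in zip(p, k3)])
        p = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def mean_and_std(p):
    """Mean and standard deviation of the number of terminal segments."""
    mean = sum(n * pn for n, pn in enumerate(p))
    var = sum((n - mean) ** 2 * pn for n, pn in enumerate(p))
    return mean, math.sqrt(var)


# Usage: a linear total rate b*n (the textbook Yule process) has mean exp(b*t).
p = integrate_master_equation(lambda n: 1.0 * n, n_max=100, t_end=1.0)
mean, std = mean_and_std(p)
```

The linear rate in the usage lines is chosen only because its mean is known in closed form, which makes the integrator easy to validate; it is not claimed to be one of the exactly solvable cases discussed in the paper.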
Unsupervised Anomaly Detection: investigations on Isolation Forest
In today’s world, the increasing amount of available information makes it possible
to analyse several factors. One of these factors is anomaly detection. In recent
years, this problem has been addressed by machine learning, which makes it possible
to recognise instances that do not conform to the expected behaviour of a system,
so-called outliers.
One of the sectors that benefits most is the industrial sector, where data is the
new wealth of industries; consider, for example, boosting sales or predictive maintenance.
Over the years several classes of methods have been proposed; recently a new
class based on isolation has been introduced. The first method of the isolation-based
class is Isolation Forest. This method has been very successful both in industrial
applications and in academic research, which has made a large number of variants
available. The basic intuition is simple: the anomaly score reflects the
propensity of each instance to be separated, based on the average number of random
splits required to completely isolate a data instance.
In this thesis, after a preliminary survey of the state of the art and an in-depth
study of the Isolation Forest method, several variants of this method are developed,
with the aim of improving anomaly detection. These variants build on insights
into the two main phases: the phase where the feature and its split value are
selected, and the phase where the anomaly score is calculated for each instance.
Finally, numerical experiments on both artificial and real-world datasets compare
their anomaly detection performance with that of the standard method.
These experiments show that the Prob Split method appears to be the most
promising of all those developed, because it yields significant gains in detection
while keeping the same computational cost as the Isolation Forest method.
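The basic intuition described above (fewer random splits needed to isolate an instance means a higher anomaly score) can be sketched in a few lines. The code below is a minimal illustration of the standard Isolation Forest score, 2^(-E[h(x)] / c(n)), where h(x) is the path length of x and c(n) is the usual normalising constant. It does not implement Prob Split or any other variant from the thesis, and all function names and default parameters are assumptions chosen for this example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value between its min and max."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop instead of retrying another feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample (needs at least 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size


def path_length(tree, x):
    """Number of splits traversed by x, plus c(size) for the unsplit leaf it lands in."""
    depth = 0
    while tree[0] == "node":
        _, dim, split, left, right = tree
        tree = left if x[dim] < split else right
        depth += 1
    return depth + c_factor(tree[1])


def anomaly_score(forest, x):
    """Standard score: values near 1 suggest an anomaly, values well below 0.5 a normal point."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))
```

A typical use is to fit the forest on a dataset with `fit_forest(data)` and then rank instances by `anomaly_score(forest, x)`. The two phases mentioned in the abstract correspond here to `build_tree` (feature and split selection) and `anomaly_score` (scoring).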
The Horizontal Tunnelability Graph is Dual to Level Set Trees
Time series data, reflecting phenomena like climate patterns and stock prices, offer key insights for prediction and trend analysis. Contemporary research has independently developed disparate geometric approaches to time series analysis. These include tree methods, visibility algorithms, and the persistence-based barcodes common in topological data analysis. This thesis combines these perspectives through our concept of horizontal tunnelability. We prove that the level set tree obtained from its Harris path (a time series) is dual to that time series' horizontal tunnelability graph, itself a subgraph of the more common horizontal visibility graph. This technique extends previous work by relating Merge, Chiral Merge, and Level Set Trees to one another and to visibility and persistence methodologies. Our method promises significant computational advantages and clarifies the connections between previously unconnected work. To facilitate its implementation, we provide accompanying code and discuss its advantages.
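For readers unfamiliar with the horizontal visibility graph mentioned above, the following sketch builds it from a time series using the usual criterion: indices i < j are linked when every value strictly between them is lower than both endpoints. The horizontal tunnelability graph is not defined in this abstract, so it is deliberately not implemented here; the function name is an assumption made for this illustration.

```python
def horizontal_visibility_edges(series):
    """Return the edges (i, j), i < j, of the horizontal visibility graph of `series`."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        blocker = float("-inf")  # tallest value strictly between i and j so far
        for j in range(i + 1, n):
            if blocker < min(series[i], series[j]):
                edges.append((i, j))
            blocker = max(blocker, series[j])
            if blocker >= series[i]:
                break  # nothing further to the right is visible from i
    return edges


# Usage on a short series; neighbouring points are always linked.
edges = horizontal_visibility_edges([3, 1, 2, 4])
```

This quadratic-time loop is chosen for readability; faster constructions exist, but this version follows the definition directly.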