23 research outputs found

    A New Estimator of Intrinsic Dimension Based on the Multipoint Morisita Index

    The size of datasets has been increasing rapidly, both in the number of variables and in the number of events. As a result, the empty space phenomenon and the curse of dimensionality complicate the extraction of useful information. In general, however, data lie on non-linear manifolds of much lower dimension than that of the spaces in which they are embedded. In many pattern recognition tasks, learning these manifolds is a key issue, and it requires knowledge of their true intrinsic dimension. This paper introduces a new estimator of intrinsic dimension based on the multipoint Morisita index. It is applied to both synthetic and real datasets of varying complexity, and it is compared with other existing estimators. The proposed estimator turns out to be fairly robust to sample size and noise, unaffected by edge effects, able to handle large datasets, and computationally efficient.
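    As a rough illustration of the quantity this abstract builds on, the sketch below computes a multipoint Morisita index on a regular grid. It assumes the commonly cited form I = Q^(m-1) * sum_i n_i(n_i-1)...(n_i-m+1) / (N(N-1)...(N-m+1)), where Q is the number of grid cells, n_i the number of points in cell i, and N the total number of points. The function names, the unit-hypercube scaling and the grid layout are assumptions made for this example; the paper's estimator of intrinsic dimension is not reproduced here.

```python
from collections import Counter
from math import prod


def falling_factorial(n, m):
    """Return n * (n - 1) * ... * (n - m + 1)."""
    return prod(n - j for j in range(m))


def morisita_index(points, cells_per_axis, m=2):
    """Multipoint Morisita index of points lying in the unit hypercube [0, 1]^d.

    Hypothetical helper: the grid has cells_per_axis cells along each axis,
    so Q = cells_per_axis ** d cells in total.
    """
    n_points = len(points)
    if n_points < m:
        raise ValueError("need at least m points")
    dim = len(points[0])
    # Count the points falling into each grid cell (a coordinate of exactly 1.0
    # is clamped into the last cell).
    counts = Counter(
        tuple(min(int(x * cells_per_axis), cells_per_axis - 1) for x in p)
        for p in points
    )
    q = cells_per_axis ** dim
    numerator = sum(falling_factorial(c, m) for c in counts.values())
    return q ** (m - 1) * numerator / falling_factorial(n_points, m)


if __name__ == "__main__":
    clustered = [(0.1, 0.1)] * 5
    spread = [(0.1, 0.1), (0.6, 0.1), (0.1, 0.6), (0.6, 0.6)]
    print(morisita_index(clustered, 2))  # all points share one cell
    print(morisita_index(spread, 2))     # one point per cell
```

    With all points in a single cell the index equals Q^(m-1), and with at most one point per cell it is zero, which matches the reading of the index as a measure of clustering at a given cell size.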

    Selecting Features by their Resilience to the Curse of Dimensionality

    Real-world datasets are often high-dimensional and affected by the curse of dimensionality. This hinders their comprehensibility and interpretability. To reduce this complexity, feature selection aims to identify the features that are crucial for learning from the data. While measures of relevance and pairwise similarity are commonly used, the curse of dimensionality is rarely incorporated into the process of selecting features. Here we step in with a novel method that identifies the features that make it possible to discriminate data subsets of different sizes. By adapting recent work on computing intrinsic dimensionalities, our method is able to select the features that can discriminate data and thus weaken the curse of dimensionality. Our experiments show that our method is competitive and commonly outperforms established feature selection methods. Furthermore, we propose an approximation that allows our method to scale to datasets consisting of millions of data points. Our findings suggest that features that discriminate data and are connected to a low intrinsic dimensionality are meaningful for learning procedures. Comment: 16 pages, 1 figure, 2 tables

    A novel filter algorithm for unsupervised feature selection based on a space filling measure

    The research proposes a novel filter algorithm for unsupervised feature selection based on a space filling measure. A well-known criterion of space filling design, called the coverage measure, is adapted to dimensionality reduction problems. This measure was originally developed to judge the quality of a space filling design. In this work it is used to reduce redundancy in data. The proposed algorithm is evaluated on simulated data under several noise-injection scenarios. Furthermore, it is compared with benchmark feature selection methods on real UCI datasets.

    Model-based Filtering of Interfering Signals in Ultrasonic Time Delay Estimations

    This work presents model-based algorithmic approaches for interference-invariant time delay estimation that are specifically suited to estimating small time delay differences with a required resolution well below the sampling time. The methods are therefore particularly well suited to transit-time ultrasonic flow measurement, where the problem of interfering signals is especially pronounced. The main focus is on how multiple measurements with different time delays or process parameters can be used to suppress interfering signals in ultrasonic flow measurements while retaining good robustness to additive white Gaussian noise and a high resolution. To this end, a signal model is assumed that consists of stationary interfering signals, which do not depend on changing time delays, and target signals, which contain the measurement effect. First, the signal model of an ultrasonic flow measurement and its dynamic behaviour under temperature or time delay fluctuations are investigated. The aim is to generate valid simulation datasets with which the developed methods can be tested both under the premise that the data fit the signal model perfectly and under the premise that model errors are present. In the process, the properties of the signal model components, such as bandwidth, stationarity and temperature dependence, are identified. For this purpose, a new method for modelling the temperature dependence of the interfering signals is presented. After the entire measurement system has been characterized, the signal model, adapted to ultrasonic flow measurement, serves as the basis for two new methods that aim to reduce the effects of the interfering signals.
The first proposed technique extends the signal-dynamics-based approaches in the literature by relaxing the requirements on the necessary variance of the time delays. To this end, a new representation of multiple measurement signals as point clouds is introduced. The point clouds are then processed using principal component analysis and B-splines, which yields either interference-invariant time delay estimates or estimated interfering signals. In this context, a novel joint B-spline and registration estimation is developed to increase robustness. The second approach is a regression-based estimation of the time delay differences by learning adapted signal subspaces. These subspaces are computed efficiently with the analytic wavelet packet transform, before the resulting coefficients are transformed into features that correlate well with the time delay differences. In addition, a novel unsupervised subspace training approach is proposed and compared with conventional filter- and wrapper-based feature selection methods. Finally, both methods are tested in an experimental ultrasonic flow measurement system with a high level of interfering signals, showing that in most cases they outperform the methods from the literature. The quality of the methods is assessed by the accuracy of the time delay estimation, since the ground truth for the interfering signals cannot be determined reliably. Using various datasets, the dependencies on the hyperparameters, the process conditions and, in the case of the regression-based method, the training dataset are analysed.

    Model-based Filtering of Interfering Signals in Ultrasonic Time Delay Estimations

    This work presents model-based algorithmic approaches for interference-invariant time delay estimation that are specifically suited to estimating small time delay differences with a required resolution well below the sampling time. The methods are therefore particularly well suited to transit-time ultrasonic flow measurement, since the problem of interfering signals is especially prominent in this application.

    Geomorphometry 2020. Conference Proceedings

    Geomorphometry is the science of quantitative land surface analysis. It gathers various mathematical, statistical and image processing techniques to quantify morphological, hydrological, ecological and other aspects of a land surface. Common synonyms for geomorphometry are geomorphological analysis, terrain morphometry, terrain analysis and land surface analysis. The typical input to geomorphometric analysis is a square-grid representation of the land surface: a digital elevation (or land surface) model. The first Geomorphometry conference took place in 2009 in Zürich, Switzerland. Subsequent events were held in Redlands (California), Nánjīng (China), Poznan (Poland) and Boulder (Colorado), at intervals of about two years. The International Society for Geomorphometry (ISG) and the Organizing Committee scheduled the sixth Geomorphometry conference for June 2020 in Perugia, Italy. Worldwide safety measures meant the event could not be held in person, and we ruled out holding the conference remotely. We therefore postponed the event by one year: it will be organized in June 2021 in Perugia, hosted by the Research Institute for Geo-Hydrological Protection of the Italian National Research Council (CNR IRPI) and the Department of Physics and Geology of the University of Perugia. One of the reasons why we postponed the conference instead of cancelling it was the encouraging number of submitted abstracts. The abstracts are in fact short papers of four pages, including figures and references, and they were peer-reviewed by the Scientific Committee of the conference. This book is a collection of the contributions as revised by the authors after peer review.
We grouped them into seven classes, as follows:
• Data and methods (13 abstracts)
• Geoheritage (6 abstracts)
• Glacial processes (4 abstracts)
• LIDAR and high resolution data (8 abstracts)
• Morphotectonics (8 abstracts)
• Natural hazards (12 abstracts)
• Soil erosion and fluvial processes (16 abstracts)
The 67 abstracts represent 80% of the initial contributions. The remaining ones were either not accepted after peer review or withdrawn by their authors. Most of the contributions contain original material, and an extended version of a subset of them will be included in a special issue of a regular journal publication.

    DATA-DRIVEN ANALYSIS AND MAPPING OF THE POTENTIAL DISTRIBUTION OF MOUNTAIN PERMAFROST

    In alpine environments, mountain permafrost is defined as a thermal state of the ground and corresponds to any lithosphere material that is at or below 0°C for at least two years. Its degradation can lead to increased rockfall activity and sediment transfer rates. During the last 20 years, knowledge of this phenomenon has improved significantly thanks to many studies and monitoring projects, which have revealed an extremely discontinuous and complex spatial distribution, especially at the micro scale (the scale of a specific landform; tens to several hundreds of metres). The objective of this thesis was the systematic and detailed investigation of the potential of data-driven techniques for mountain permafrost distribution modelling. Machine learning (ML) algorithms are able to consider a greater number of parameters than classic approaches. Permafrost distribution can be modelled not only by using topo-climatic parameters as a proxy, but also by taking into account known field evidence of permafrost. This evidence was collected in a sector of the Western Swiss Alps and mapped from field data (thermal and geoelectrical data) and ortho-image interpretation (rock glacier inventorying). A permafrost dataset was built from this evidence and completed with environmental and morphological predictors. The data were first analysed with feature relevance techniques in order to identify the statistical contribution of each controlling factor and to exclude non-relevant or redundant predictors. Five classification algorithms from statistics and machine learning were then applied to the dataset and tested: logistic regression (LR), linear and non-linear support vector machines (SVM), multilayer perceptrons (MLP) and random forests (RF). These techniques inferred a classification function from labelled training data (pixels of permafrost absence and presence) to predict permafrost occurrence where it was unknown.
Classification performance, assessed with the area under the ROC curve (AUROC), ranged between 0.75 (linear SVM) and 0.88 (RF). These values are generally indicative of good model performance. Besides these statistical measures, a qualitative evaluation was performed using expert field knowledge. Both the quantitative and the qualitative evaluation suggested employing the RF algorithm to obtain the best model. As machine learning is a non-deterministic approach, an overview of the model uncertainties is also offered. It indicates the location of the most uncertain sectors, where further field investigations are required to improve the reliability of permafrost maps. RF proved efficient for permafrost distribution modelling thanks to consistent results that are comparable to the field observations. The use of environmental variables describing the micro-topography and the ground characteristics (such as curvature indices, NDVI or grain size) favoured the prediction of the permafrost distribution at the micro scale. The resulting maps showed variations in the probability of permafrost occurrence over distances of a few tens of metres. In some talus slopes, for example, a lower probability of occurrence was predicted in the mid-upper part of the slope. In addition, the lower limits of permafrost were automatically recognized from the permafrost evidence. Lastly, the high resolution of the input dataset (10 metres) made it possible to produce maps at the micro scale with a modelled permafrost spatial distribution that was less optimistic than that of traditional spatial models. The permafrost prediction was computed without resorting to altitude thresholds (above which permafrost may be found), so the strong discontinuity of mountain permafrost at the micro scale was better represented.
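    For readers unfamiliar with the AUROC figures quoted in this abstract, the following minimal sketch shows one standard way to compute the area under the ROC curve for a binary classifier: the fraction of (presence, absence) pairs in which the presence sample receives the higher score, with ties counted as one half. The function name and the toy scores are invented for illustration; the thesis's data and evaluation pipeline are not reproduced here.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted probabilities or decision values.
    labels: 1 for the positive class (e.g. permafrost presence), 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: four pixels, two with permafrost (label 1), two without.
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    toy_labels = [0, 0, 1, 1]
    print(auroc(toy_scores, toy_labels))
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. The quadratic loop keeps the sketch readable; a rank-based implementation would scale better to large rasters.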