4,041 research outputs found

    Shortest Path versus Multi-Hub Routing in Networks with Uncertain Demand

    We study a class of robust network design problems motivated by the need to scale core networks to meet increasingly dynamic capacity demands. Past work has focused on designing the network to support all hose matrices (all matrices not exceeding marginal bounds at the nodes). This model may be too conservative if additional information on traffic patterns is available. The other extreme is the fixed-demand model, where one designs the network to support peak point-to-point demands. We introduce a capped hose model to explore a broader range of traffic matrices, which includes the above two as special cases. It is known that optimal designs for the hose model are always determined by single-hub routing, and those for the fixed-demand model are based on shortest-path routing. We shed light on the wider space of capped hose matrices in order to see which traffic models are more shortest-path-like as opposed to hub-like. To address the space in between, we use hierarchical multi-hub routing templates, a generalization of hub and tree routing. In particular, we show that once peak capacities are added to the hose model, the single-hub tree-routing template is no longer cost-effective. This initiates the study of a class of robust network design (RND) problems restricted to these templates. Our empirical analysis is based on a heuristic for this new hierarchical RND problem. We also propose a routing indicator that accounts for the relative strengths of the marginals and peak demands, and use this information to choose the appropriate routing template. We benchmark our approach against other well-known routing templates, using representative carrier networks and a variety of capped hose traffic demands, parameterized by the relative importance of their marginals as opposed to their point-to-point peak demands.
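    The capped hose model above amounts to a simple membership test: a traffic matrix is admissible when each node's total ingress and egress respect its marginal bound and each pair respects its point-to-point cap. The sketch below is an illustrative reading of that definition, not the paper's formulation; the function name, data layout, and example bounds are all assumptions.

```python
def is_capped_hose(D, marginals, caps):
    """Check whether traffic matrix D (dict of dicts, D[u][v] = demand u->v)
    is a capped hose matrix: per-node marginal bounds plus per-pair peak caps."""
    nodes = list(marginals)
    for u in nodes:
        out_total = sum(D.get(u, {}).get(v, 0) for v in nodes if v != u)
        in_total = sum(D.get(v, {}).get(u, 0) for v in nodes if v != u)
        if out_total > marginals[u] or in_total > marginals[u]:
            return False  # marginal (hose) bound violated at node u
    for u in nodes:
        for v in nodes:
            if u != v and D.get(u, {}).get(v, 0) > caps.get((u, v), float("inf")):
                return False  # point-to-point peak cap violated
    return True

# The pure hose model is the special case caps = {} (no pair caps);
# the fixed-demand model caps each pair at its peak demand.
demand = {"a": {"b": 2, "c": 1}}
print(is_capped_hose(demand, {"a": 4, "b": 3, "c": 3}, {("a", "b"): 2}))  # True
```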

    Deep learning for clinical decision support in oncology

    Over the last decades, medical imaging methods such as computed tomography (CT) have become an indispensable tool of modern medicine, allowing for fast, non-invasive inspection of organs and tissue. The amount of acquired healthcare data has grown rapidly, increasing 15-fold within the last years, and now accounts for more than 30 % of the world's generated data volume. In contrast, the number of trained radiologists remains largely stable. Medical image analysis, settled between medicine and engineering, has therefore become a rapidly growing research field; its successful application may yield remarkable time savings and significantly improved diagnostic performance. Much of the work within medical image analysis focuses on radiomics, i.e., the extraction and analysis of hand-crafted imaging features. Radiomics, however, has been shown to be highly sensitive to external factors such as the acquisition protocol, with major implications for reproducibility and clinical applicability. Lately, deep learning has become one of the most widely employed methods for solving computational problems. With successful applications in diverse fields such as robotics, physics, mathematics, and economics, deep learning has revolutionized machine learning research. Having large amounts of training data is a key criterion for its successful application. Such data, however, are rare within medicine, as medical imaging is subject to a variety of data security and data privacy regulations. Moreover, medical imaging data often suffer from heterogeneous quality, label imbalance, and label noise, rendering a considerable fraction of deep learning-based algorithms inapplicable. Settled in the field of CT oncology, this work addresses these issues and shows ways to successfully handle medical imaging data using deep learning. It proposes novel methods for clinically relevant tasks such as lesion growth and patient survival prediction, confidence estimation, meta-learning and classifier ensembling, and deep decision explanation, yielding superior performance in comparison to state-of-the-art approaches while remaining applicable to a wide variety of applications. With this, the work contributes towards a clinical translation of deep learning-based algorithms, aiming for improved diagnosis and, ultimately, improved overall patient healthcare.

    Application of unmanned aerial vehicles in detecting crop yields and land cultivation practices

    A thesis submitted for the degree of Doctor of Philosophy in Environmental Protection. This thesis examines how machine learning (ML) technologies have aided significant advancements in image analysis in the area of precision agriculture. These multimodal computing technologies extend the use of machine learning to a broader spectrum of data collection and selection for the advancement of agricultural practices (Nawar et al., 2017). Such techniques help complex cropping systems make more informed decisions with less human intervention, and provide a scalable framework for incorporating expert knowledge of the precision agriculture (PA) system (Chlingaryan et al., 2018). Complexity, on the other hand, can be a disadvantage in crop trials: machine learning models require training and testing databases, while limited areas with small sampling sizes, time and space specificity, and environmental interventions complicate parameter selection and make a single empirical model impractical for an entire region. During the early stages of this work, we used relatively traditional machine learning methods, i.e., random forest regression (RFR), support vector regression (SVR), and artificial neural networks (ANN), to address the regression problem of predicting dry matter (DM) yields of red clover. This obtained favourable results; however, hyperparameter choice, the lengthy algorithm-selection process, data cleaning, and redundant collinearity issues significantly limited the machine learning application.
    We further discuss the recent trend of automated machine learning (AutoML), whose automated algorithm selection and hyperparameter optimization of deployable pipeline models has driven significant technological innovation in applied artificial intelligence. However, a knowledge gap remains in integrating ML technology with unmanned aerial systems (UAS) and hyperspectral imaging data for classification and regression applications. In this thesis, we explored a state-of-the-art (SOTA), entirely open-source AutoML framework, Auto-sklearn, built on one of the most frequently used machine learning systems, Scikit-learn. It was integrated with two unique AutoML visualization tools to examine the recognition and acceptance of multispectral vegetation index (VI) data collected from UAS, and of hyperspectral narrow-band VIs, across a varied spectrum of agricultural management practices (AMP). These practices comprise soil tillage method (STM), cultivation method (CM), and manure application (MA), applied across four crop-combination fields (red clover-grass mixture, spring wheat, pea-oat mixture, and spring barley). They have not been thoroughly evaluated before and lack characteristics accessible in agricultural remote sensing applications. The thesis further explores existing gaps in the knowledge base for several critical crop categories and cultivation management methods with respect to biomass and yield analysis, and seeks a better understanding of how remotely sensed, field-based, and multifunctional platforms can meet precision agriculture demands. To overcome these knowledge gaps, this research introduces a rapid, non-destructive, and low-cost framework for field-based biomass and grain yield modelling, as well as for the identification of agricultural management practices.
    The results may aid agronomists and farmers in establishing more accurate agricultural methods and in monitoring environmental conditions more effectively. Publication of this thesis is supported by the Estonian University of Life Sciences and by the Doctoral School of Earth Sciences and Ecology created under the auspices of the European Social Fund.
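    The multispectral vegetation indices (VIs) central to the abstract above are simple per-pixel band ratios. A minimal sketch of one widely used VI, the Normalized Difference Vegetation Index (NDVI = (NIR − Red) / (NIR + Red)), follows; the reflectance values are illustrative, and this is not code from the thesis.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    Returns 0.0 when both bands are zero to avoid division by zero."""
    return 0.0 if nir + red == 0 else (nir - red) / (nir + red)

# Reflectance pairs (NIR, Red) are illustrative; dense, healthy vegetation
# reflects strongly in NIR and absorbs Red, giving NDVI close to 1.
pixels = [(0.60, 0.08), (0.30, 0.25), (0.05, 0.04)]
print([round(ndvi(n, r), 2) for n, r in pixels])  # [0.76, 0.09, 0.11]
```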

    Autonomous Navigation for Unmanned Aerial Systems - Visual Perception and Motion Planning

    The abstract is provided in the attachment.

    Learning to compare nodes in branch and bound with graph neural networks

    In computer science, solving NP-hard problems in a reasonable time is of great importance: supply chain optimization, scheduling, routing, multiple biological sequence alignment, inference in probabilistic graphical models, and even some problems in cryptography are all examples. In practice, we model many of them as mixed integer linear optimization problems, which we solve using the branch-and-bound framework. An algorithm of this style divides a search space to explore it recursively (branch) and obtains optimality bounds by solving linear relaxations over the sub-spaces (bound). To specify an algorithm, one must set several parameters, such as how to explore search spaces, how to divide a search space once it has been explored, or how to tighten the linear relaxations. These policies can significantly influence solving performance.
    This work focuses on a novel way to derive a search policy, that is, a rule for selecting the next sub-space to explore given a current partitioning, using deep machine learning. First, we collect data summarizing, over a collection of given problems, which sub-spaces contain the optimum and which do not. By representing these sub-spaces as bipartite graphs encoding their characteristics, we train a graph neural network to determine the probability that a sub-space contains the optimal solution by supervised learning. This choice of model is particularly useful because it can adapt to problems of different sizes without modification.
    We show that our approach beats those of our competitors, consisting of simpler machine learning models trained from solver statistics, as well as the default policy of SCIP, a state-of-the-art open-source solver, on three NP-hard benchmarks: generalized independent set, fixed-charge multicommodity network flow, and maximum satisfiability problems.
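    The branch-and-bound loop described above can be sketched on a toy 0/1 knapsack: branching fixes one variable in or out, bounding solves the fractional (LP) relaxation of the node, and the search policy decides which open node to pop next. Here the policy is plain best-bound-first; in the paper's setting, a learned model would supply that node score. This is a hedged illustration under those assumptions, not the authors' implementation, and all names are invented.

```python
import heapq
import itertools

def fractional_bound(values, weights, cap, fixed):
    """Upper bound for a node: count the items fixed in, then fill the
    remaining capacity greedily by value density, taking a fraction of the
    last item (the LP relaxation of 0/1 knapsack). Assumes positive weights."""
    val, room = 0.0, cap
    free = []
    for i, (v, w) in enumerate(zip(values, weights)):
        if fixed.get(i) == 1:
            val += v
            room -= w
        elif i not in fixed:
            free.append((v / w, v, w))
    if room < 0:
        return float("-inf")          # node is infeasible
    for _, v, w in sorted(free, reverse=True):
        take = min(1.0, room / w)
        val += v * take
        room -= w * take
        if room <= 0:
            break
    return val

def branch_and_bound(values, weights, cap):
    counter = itertools.count()       # tiebreak so the heap never compares dicts
    best = 0.0
    heap = [(-fractional_bound(values, weights, cap, {}), next(counter), {})]
    while heap:
        # search policy: pop the open node with the best bound (best-first)
        neg_b, _, fixed = heapq.heappop(heap)
        if -neg_b <= best:
            continue                  # prune: bound cannot beat the incumbent
        i = len(fixed)
        if i == len(values):          # leaf: the bound equals the exact value
            best = -neg_b
            continue
        for choice in (1, 0):         # branch: fix item i in or out
            child = dict(fixed)
            child[i] = choice
            b = fractional_bound(values, weights, cap, child)
            if b > best:
                heapq.heappush(heap, (-b, next(counter), child))
    return best

print(branch_and_bound([10, 13, 8], [5, 7, 4], 9))  # 18.0 (items 0 and 2)
```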

    Understanding Random Forests: From Theory to Practice

    Data analysis and machine learning have become an integral part of the modern scientific methodology, offering automated procedures for predicting a phenomenon based on past observations, unraveling underlying patterns in data, and providing insights about the problem. Yet caution is needed to avoid using machine learning as a black-box tool; it should rather be considered a methodology, with a rational thought process entirely dependent on the problem under study. In particular, using an algorithm should ideally require a reasonable understanding of its mechanisms, properties, and limitations, in order to better apprehend and interpret its results. Accordingly, the goal of this thesis is to provide an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings, and interpretability. The first part of this work studies the induction of decision trees and the construction of ensembles of randomized trees, motivating their design and purpose whenever possible. Our contributions follow with an original complexity analysis of random forests, showing their good computational performance and scalability, along with an in-depth discussion of their implementation details, as contributed within Scikit-Learn. In the second part of this work, we analyse and discuss the interpretability of random forests through the lens of variable importance measures. The core of our contributions rests in the theoretical characterization of the Mean Decrease of Impurity variable importance measure, from which we prove and derive some of its properties in the case of multiway totally randomized trees and in asymptotic conditions. In consequence of this work, our analysis demonstrates that variable importances [...]. Comment: PhD thesis. Source code available at https://github.com/glouppe/phd-thesi
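    The Mean Decrease of Impurity (MDI) importance discussed above accumulates, over every node where a feature is used to split, the impurity decrease weighted by the fraction of samples reaching that node. A minimal sketch of that per-split quantity with Gini impurity follows; the data and function names are illustrative, not the thesis's formal multiway/asymptotic setting.

```python
def gini(labels):
    """Gini impurity of a label list: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def impurity_decrease(parent, left, right):
    """Weighted impurity decrease of one binary split: the quantity that MDI
    sums over all nodes where a given feature splits (each term further
    weighted by the fraction of samples reaching that node)."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# A pure split of a balanced binary node removes all impurity:
print(impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # 0.5
```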

    Active learning with applications to the diagnosis of parasites

    Advisors: Alexandre Xavier Falcão, Pedro Jussieu de Rezende. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Image datasets have grown large with the fast advances and varieties of imaging technologies, demanding urgent solutions for information processing, organization, and retrieval. Processing here aims to annotate an image by assigning to it a label that represents its semantic content. Annotation is crucial for the effective organization and retrieval of the information related to the images. However, manual annotation is unfeasible in large datasets, and successful automatic annotation by a pattern classifier strongly depends on the quality of a much smaller training set. Active learning techniques have been proposed to select representative training samples from the large dataset with a label suggestion, which can be either confirmed or corrected by the expert. Nevertheless, these techniques very often ignore the need for interactive response times during the active learning process. Therefore, this PhD thesis presents active learning methods that can reduce and/or organize the large dataset such that sample selection does not require reprocessing it entirely at every learning iteration. Moreover, selection can be interrupted as soon as a desired number of samples from the reduced and organized dataset is identified.
    These methods show increasing progress, first with data reduction only, and then with subsequent organization of the reduced dataset. The thesis also addresses a real problem, the diagnosis of parasites, in which the existence of a diverse class (i.e., the impurity class), with much larger size and samples that are similar to some types of parasites, makes data reduction considerably less effective. This problem is finally circumvented with a different type of data organization, which still allows interactive response times and yields a better and more robust active learning approach for the diagnosis of parasites. The methods have been extensively assessed with different types of unsupervised and supervised classifiers, using datasets from distinct applications and baseline approaches that rely on random sample selection and/or reprocess the entire dataset at each learning iteration. Finally, the thesis demonstrates that further improvements are obtained with semi-supervised learning. Doctorate in Computer Science.
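    The pool-based loop the abstract describes (pick the most informative unlabeled sample, let the expert confirm or correct the suggested label, retrain, repeat) can be sketched generically. The classifier below is a deliberately trivial nearest-centroid stand-in on 1-D data and the margin-style uncertainty score is an assumption; the thesis's own reduction/organization methods are not reproduced here.

```python
def nearest_centroid_predict(x, centroids):
    """Label of the nearest class centroid (a stand-in for a real classifier)."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def uncertainty(x, centroids):
    """Margin-style uncertainty: largest when the two nearest centroids are
    almost equally far from x, i.e. x sits near the decision boundary."""
    d = sorted(abs(x - c) for c in centroids.values())
    return -(d[1] - d[0])

def active_learning(pool, oracle, rounds):
    labeled = {}                                  # sample -> expert-confirmed label
    centroids = {0: min(pool), 1: max(pool)}      # crude initial model
    for _ in range(rounds):
        # selection: query the expert on the most uncertain unlabeled sample
        x = max((p for p in pool if p not in labeled),
                key=lambda p: uncertainty(p, centroids))
        labeled[x] = oracle(x)
        # retraining: recompute each centroid from the confirmed labels
        for c in centroids:
            pts = [p for p, lab in labeled.items() if lab == c]
            if pts:
                centroids[c] = sum(pts) / len(pts)
    return labeled, centroids

pool = [0.1, 0.2, 0.9, 1.1, 0.48, 0.52]
labeled, model = active_learning(pool, oracle=lambda x: int(x > 0.5), rounds=3)
print(sorted(labeled))                # the ambiguous mid-range points get queried
print(nearest_centroid_predict(0.3, model))
```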