Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
Learning-based pattern classifiers, including deep networks, have shown
impressive performance in several application domains, ranging from computer
vision to cybersecurity. However, it has also been shown that adversarial input
perturbations carefully crafted either at training or at test time can easily
subvert their predictions. The vulnerability of machine learning to such wild
patterns (also referred to as adversarial examples), along with the design of
suitable countermeasures, has been investigated in the research field of
adversarial machine learning. In this work, we provide a thorough overview of
the evolution of this research area over the last ten years and beyond,
starting from pioneering, earlier work on the security of non-deep learning
algorithms up to more recent work aimed at understanding the security properties
of deep learning algorithms, in the context of computer vision and
cybersecurity tasks. We report interesting connections between these
apparently-different lines of work, highlighting common misconceptions related
to the security evaluation of machine-learning algorithms. We review the main
threat models and attacks defined to this end, and discuss the main limitations
of current work, along with the corresponding future challenges towards the
design of more secure learning algorithms. Comment: Accepted for publication on Pattern Recognition, 201
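To make the idea of a test-time adversarial perturbation from the abstract above concrete, here is a minimal sketch on a linear classifier. The weights, bias, input, and perturbation budget `eps` are hypothetical values chosen for illustration; they are not taken from the survey, and the sign-of-the-gradient step is only one simple attack among the many the survey covers.

```python
# Minimal sketch of a test-time evasion attack on a linear classifier.
# All numbers below are hypothetical and chosen only for illustration.

def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear decision function: positive score means the positive class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evade(w, x, eps):
    """Shift every feature by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is w,
    so stepping against sign(w) lowers the score by eps * sum(|w_i|).
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [2.0, -1.0, 0.5]   # hypothetical weights
b = -0.5               # hypothetical bias
x = [0.6, 0.2, 0.4]    # hypothetical input, classified as positive
eps = 0.25             # hypothetical per-feature perturbation budget

x_adv = evade(w, x, eps)
print(score(w, b, x) > 0, score(w, b, x_adv) > 0)
```

With these values a per-feature change of at most 0.25 is enough to move the input across the decision boundary, which is the kind of small, carefully crafted change the abstract refers to.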
Predicting Pilot Misperception of Runway Excursion Risk Through Machine Learning Algorithms of Recorded Flight Data
The research used predictive models to determine pilot misperception of runway excursion risk associated with unstable approaches. The Federal Aviation Administration defined runway excursion as a veer-off or overrun of the runway surface. The Federal Aviation Administration also defined a stable approach as an aircraft meeting the following criteria: (a) on target approach airspeed, (b) correct attitude, (c) landing configuration, (d) nominal descent angle/rate, and (e) on a straight flight path to the runway touchdown zone. Continuing an unstable approach to landing was defined as Unstable Approach Risk Misperception in this research. A review of the literature revealed that an unstable approach followed by the failure to execute a rejected landing was a common contributing factor in runway excursions.
Flight Data Recorder data were archived and made available by the National Aeronautics and Space Administration for public use. These data were collected over a four-year period from the flight data recorders of a fleet of 35 regional jets operating in the National Airspace System. The archived data were processed and explored for evidence of unstable approaches and to determine whether or not a rejected landing was executed. Once identified, those data revealing evidence of unstable approaches were processed for the purposes of building predictive models.
SAS Enterprise Miner was used to explore the data, as well as to build and assess predictive models. The advanced machine learning algorithms utilized included: (a) support vector machine, (b) random forest, (c) gradient boosting, (d) decision tree, (e) logistic regression, and (f) neural network. The models were evaluated and compared to determine the best prediction model. Based on the model comparison, the decision tree model was determined to have the highest predictive value.
The Flight Data Recorder data were then analyzed to determine predictive accuracy of the target variable and to determine important predictors of the target variable, Unstable Approach Risk Misperception. Results of the study indicated that the predictive accuracy of the best performing model, decision tree, was 99%. Findings indicated that six variables stood out in the prediction of Unstable Approach Risk Misperception: (1) glideslope deviation, (2) selected approach speed deviation, (3) localizer deviation, (4) flaps not extended, (5) drift angle, and (6) approach speed deviation. These variables were listed in order of importance based on results of the decision tree predictive model analysis.
The results of the study are of interest to aviation researchers as well as airline pilot training managers. It is suggested that the ability to predict the probability of pilot misperception of runway excursion risk could influence the development of new pilot simulator training scenarios and strategies. The research aids avionics providers in the development of predictive runway excursion alerting display technologies.
A study of machine learning models application for porosity prediction using petrophysical well logs. Case Study: The Brent Group – Statfjord field
The use of machine learning algorithms for predictive analytics is making a growing impact in the field of petroleum geosciences. With the increasing cost and time-related factors for obtaining accurate porosity measurements from well logging and coring operations, machine learning (ML) provides a more economical and efficient solution to this challenge.
In this thesis, various ML models are applied to predict porosity in a well penetrating the reservoir interval of the Brent Group to Top Cook formation. The study area is the Statfjord field, located in the Norwegian sector of the North Sea. Statfjord produces oil and associated gas from Jurassic sandstone in the Cook formation, Brent and Statfjord Group.
Sixteen wells with several well logs serve as input features to predict the porosity in a blind well 33/9-4, all located in the field. The machine learning input features are the well logs, feature engineered logs, location points and the measured depth. The logs include: caliper, resistivity, gamma-ray, sonic, density; the engineered logs include: acoustic impedance and facies; the location: x,y,z; and the well's measured depth. The input features are varied and ingested into the ML models to estimate the porosity in the predefined reservoir interval.
The predicted porosity results for the blind well indicated an excellent performance demonstrated by the Bayesian ridge regression, linear regression and random forest models compared to the other ML models used in this study. These three algorithms are highly effective and accurate in predicting porosity with the limited range of the dataset and the results show they can be applied as a more general porosity estimation technique by varying the scale of the data samples and the number of wells.
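The engineered acoustic impedance log mentioned above is conventionally the product of bulk density and sonic velocity. The sketch below shows that calculation under stated assumptions: that the sonic log is a slowness in microseconds per foot and the density log is in g/cm³. The abstract does not give the units or the exact formula used in the thesis, and the function names are hypothetical.

```python
# Sketch: deriving an acoustic impedance log from sonic and density logs.
# Assumed units (not stated in the abstract): sonic slowness in us/ft,
# bulk density in g/cm^3. Function names are illustrative only.

FT_TO_M = 0.3048  # metres per foot

def velocity_m_per_s(dt_us_per_ft):
    """Convert sonic slowness (microseconds per foot) to velocity (m/s)."""
    return FT_TO_M * 1e6 / dt_us_per_ft

def acoustic_impedance(rhob_g_cc, dt_us_per_ft):
    """Acoustic impedance = density * velocity, in kg/(m^2 s)."""
    density_kg_m3 = rhob_g_cc * 1000.0
    return density_kg_m3 * velocity_m_per_s(dt_us_per_ft)

# Hypothetical log samples: (bulk density, sonic slowness) per depth step.
samples = [(2.30, 100.0), (2.45, 85.0), (2.20, 110.0)]
impedance_log = [acoustic_impedance(rhob, dt) for rhob, dt in samples]
print(impedance_log)
```

A column computed this way can be appended to the feature table alongside the raw logs before model fitting.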
Detecting Hypoglycemia Incidents Reported in Patients' Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance
BACKGROUND: Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety.
OBJECTIVE: We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients' secure messages.
METHODS: An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.
RESULTS: The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity=0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect.
CONCLUSIONS: Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has great potential to facilitate early detection and treatment of hypoglycemia.
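As an illustration of the cost-sensitive learning and the evaluation measures named in the abstract above, the sketch below computes inverse-frequency class weights for the reported class counts (114 positive out of 3000) and derives sensitivity, specificity, and F1 from a confusion matrix. The weighting rule (the common "balanced" heuristic, n / (number of classes × class count)) is an assumption, since the abstract does not say which cost scheme HypoDetect used, and the confusion-matrix counts are hypothetical.

```python
# Sketch: inverse-frequency class weights and confusion-matrix metrics.
# The "balanced" weighting rule is an assumption; the abstract does not
# state which cost scheme was used.

def balanced_class_weights(n_pos, n_neg):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = n_pos + n_neg
    return {1: n / (2 * n_pos), 0: n / (2 * n_neg)}

def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and F1 score from confusion counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Class counts from the abstract: 114 positive threads out of 3000.
weights = balanced_class_weights(n_pos=114, n_neg=3000 - 114)
print(weights)

# Hypothetical confusion counts, for illustration only.
print(binary_metrics(tp=8, fp=5, tn=85, fn=2))
```

Under this rule an error on a positive message costs roughly 25 times as much as an error on a negative one, which pushes the classifier toward higher sensitivity at some cost in precision.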
DATA-DRIVEN ANALYSIS AND MAPPING OF THE POTENTIAL DISTRIBUTION OF MOUNTAIN PERMAFROST
In alpine environments, mountain permafrost is defined as a thermal state of the ground and it corresponds to any lithosphere material that is at or below 0°C for at least two years. Its degradation can lead to increased rockfall activity and sediment transfer rates. During the last 20 years, knowledge of this phenomenon has significantly improved thanks to many studies and monitoring projects, revealing an extremely discontinuous and complex spatial distribution, especially at the micro scale (scale of a specific landform; tens to several hundreds of metres).
The objective of this thesis was the systematic and detailed investigation of the potential of data-driven techniques for mountain permafrost distribution modelling. Machine learning (ML) algorithms are able to consider a greater number of parameters compared to classic approaches. Not only can permafrost distribution be modelled by using topo-climatic parameters as a proxy, but also by taking into account known field evidence of permafrost. This evidence was collected in a sector of the Western Swiss Alps and mapped from field data (thermal and geoelectrical data) and ortho-image interpretation (rock glacier inventorying). A permafrost dataset was built from this evidence and completed with environmental and morphological predictors. Data were first analysed with feature relevance techniques in order to identify the statistical contribution of each controlling factor and to exclude non-relevant or redundant predictors. Five classification algorithms, belonging to statistics and machine learning, were then applied to the dataset and tested: Logistic regression (LR), linear and non-linear Support Vector Machines (SVM), Multilayer perceptrons (MLP) and Random forests (RF). These techniques inferred a classification function from labelled training data (pixels of permafrost absence and presence) to predict the permafrost occurrence where this was unknown.
Classification performance, assessed with the AUROC, ranged between 0.75 (linear SVM) and 0.88 (RF). These values are generally indicative of good model performance. Besides these statistical measures, a qualitative evaluation was performed by using field expert knowledge. Both quantitative and qualitative evaluation approaches suggested employing the RF algorithm to obtain the best model. As machine learning is a non-deterministic approach, an overview of the model uncertainties is also offered. It indicates the location of the most uncertain sectors, where further field investigations are required to improve the reliability of permafrost maps.
RF proved efficient for permafrost distribution modelling thanks to consistent results that are comparable to the field observations. The use of environmental variables illustrating the micro-topography and the ground characteristics (such as curvature indices, NDVI or grain size) favoured the prediction of the permafrost distribution at the micro scale. These maps presented variations of probability of permafrost occurrence within distances of a few tens of metres. In some talus slopes, for example, a lower probability of occurrence in the mid-upper part of the slope was predicted. In addition, permafrost lower limits were automatically recognized from permafrost evidence. Lastly, the high resolution of the input dataset (10 metres) allowed elaborating maps at the micro scale with a modelled permafrost spatial distribution that was less optimistic than that of traditional spatial models. The permafrost prediction was indeed computed without resorting to altitude thresholds (above which permafrost may be found) and the representation of the strong discontinuity of mountain permafrost at the micro scale was better respected.
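The AUROC used above to compare classifiers can be read as the probability that a randomly chosen permafrost-presence pixel receives a higher score than a randomly chosen absence pixel. The sketch below computes it directly from that pairwise definition; the labels and scores are hypothetical and are not data from the thesis.

```python
# Sketch: AUROC computed from its pairwise-ranking definition.
# Labels and scores below are hypothetical, for illustration only.

def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]               # 1 = presence, 0 = absence
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]   # hypothetical model probabilities
print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the reported range of 0.75 to 0.88 is read as good but not perfect discrimination.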
Raman spectroscopy for point of care urinary tract infection diagnosis
Urinary tract infections (UTIs) are one of the most common bacterial infections experienced by humans, with 150 million people suffering one or more UTIs each year. The massive scale at which UTIs occur translates to a tremendous health burden comprising patient morbidity and mortality, massive societal costs and a recognised contribution to expanding antimicrobial resistance. The considerable disease burden caused by UTIs is severely exacerbated by an outdated diagnostic paradigm characterised by inaccuracy and delay. Poor accuracy of screening tests, such as urinalysis, leads to misdiagnosis, which in turn results in delayed recognition or overtreatment. Additionally, these screening tests fail to identify the causative pathogen, causing an overreliance on broad-spectrum antimicrobials, which exacerbates burgeoning antimicrobial resistance. While diagnosis may be accurately confirmed through culture and sensitivity testing, the prolonged delay incurred negates the value of the information provided.
A novel diagnostic paradigm is required that targets rapid and accurate diagnosis of UTIs, while providing real-time identification of the causative pathogen. Achieving this precision management is contingent on the development of novel diagnostic technologies that bring accurate diagnosis and pathogen classification to the point of care.
The purpose of this thesis is to develop a technology that may form the core of a point-of-care diagnostic capable of delivering rapid and accurate pathogen identification directly from a urine sample. Raman spectroscopy is identified as a technology with the potential to fulfil this role, primarily mediated through its ability to provide rapid biochemical phenotyping without requiring prior biomass expansion. Raman spectroscopy has demonstrated an ability to achieve pathogen classification through the analysis of inelastically scattered light arising from pathogens. The central challenge to developing a Raman-based diagnostic for UTIs is enhancing the weak bacterial Raman signal while limiting the substantial background noise.
Developing a technology using Raman spectroscopy able to provide UTI diagnosis with uropathogen classification is contingent on developing a robust experimental methodology that harnesses the multitude of experimental and analytical parameters. The refined methodology is applied in a series of experimental works that demonstrate that the unique Raman spectra of pathogens have the potential for accurate classification. Achieving this at a clinically relevant pathogen load and in a clinically relevant timeframe is, however, dependent on overcoming the weak bacterial signal to improve the signal-to-noise ratio.
Surface-enhanced Raman spectroscopy (SERS) provides massive Raman signal enhancement of pathogens held in close apposition to noble metal nanostructures. Additionally, vacuum filtration is identified as a means of rapidly capturing pathogens directly from urine. SERS-active filters are developed by applying a gold nanolayer to commercially available membrane filters through physical vapour deposition. These SERS-active membrane filters perform the multiple roles of capturing pathogens and separating them from urine, while providing Raman signal enhancement through SERS. SERS-active filters are shown to achieve rapid and accurate diagnosis of infected samples, with real-time uropathogen classification, first using phantom urine samples and then in a pilot using clinical urine samples.
The Raman technology developed in this thesis will be further developed toward a clinically implementable technology capable of ameliorating the substantial burden of disease caused by UTIs.
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges. Comment: Accepted for publication in IEEE Communications Surveys and Tutorial