
    Machine Learning for Microcontroller-Class Hardware -- A Review

    Advances in machine learning have opened new opportunities to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has a high memory and compute footprint, hindering direct deployment on ultra-resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budgets stay within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into the different stages of model development by showcasing several use cases. Finally, we identify open research challenges and unsolved questions that demand careful consideration moving forward.
    Comment: Accepted for publication at IEEE Sensors Journal.
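As an illustration of the budget checks such a workflow must make, here is a minimal sketch of estimating whether an int8-quantized dense network fits a microcontroller's flash budget. All layer sizes, budgets, and function names are hypothetical illustrations, not taken from the paper:

```python
# Rough static-memory check for a hypothetical dense network targeting a
# microcontroller. The sizes and budgets below are illustrative assumptions.

def model_param_count(layer_sizes):
    """Parameters of a fully connected net: weights + biases per layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def fits_in_flash(layer_sizes, flash_budget_bytes, bytes_per_weight=1):
    """Check an int8-quantized model (1 byte per weight) against a flash budget."""
    return model_param_count(layer_sizes) * bytes_per_weight <= flash_budget_bytes

# A small keyword-spotting-style net: 49 inputs, two hidden layers, 4 classes.
sizes = [49, 64, 32, 4]
params = model_param_count(sizes)  # 5412 parameters
print(params, fits_in_flash(sizes, flash_budget_bytes=64 * 1024))
```

In a real workflow this check would iterate with the model design loop: if the model does not fit, the designer shrinks layers, prunes, or quantizes further and re-evaluates accuracy.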

    Geometry- and Accuracy-Preserving Random Forest Proximities with Applications

    Many machine learning algorithms use calculated distances or similarities between data observations to make predictions, cluster similar data, visualize patterns, or generally explore the data. Most distance or similarity measures do not incorporate known data labels and are thus considered unsupervised. Supervised methods for measuring distance exist which incorporate data labels and thereby exaggerate the separation between data points of different classes. This approach tends to distort the natural structure of the data. Instead of following similar approaches, we leverage a popular algorithm used for making data-driven predictions, known as random forests, to naturally incorporate data labels into similarity measures known as random forest proximities. In this dissertation, we explore previously defined random forest proximities and demonstrate their weaknesses in popular proximity-based applications. Additionally, we develop a new proximity definition that can be used to recreate the random forest’s predictions. We call these Random Forest Geometry- and Accuracy-Preserving proximities, or RF-GAP. We show, by proof and empirical demonstration, that RF-GAP proximities can be used to perfectly reconstruct the random forest’s predictions and, as a result, we argue that RF-GAP proximities provide a truer representation of the random forest’s learning when used in proximity-based applications. We provide evidence to suggest that RF-GAP proximities improve applications including imputing missing data, detecting outliers, and visualizing the data. We also introduce a new random forest proximity-based technique that generates 2- or 3-dimensional data representations which can be used as a tool to visually explore the data. We show that this method does well at portraying the relationship between data variables and the data labels, and we show quantitatively and qualitatively that it surpasses other existing methods for this task.
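For context, the classic random forest proximity that this work builds on (and that RF-GAP refines) is the fraction of trees in which two observations land in the same leaf. A minimal sketch follows; the per-tree leaf assignments are hand-written stand-ins for the output of a trained forest:

```python
# Classic (pre-RF-GAP) random forest proximity: the fraction of trees in
# which two samples fall into the same terminal node. In practice the leaf
# indices would come from a trained forest; here they are invented inputs.

def proximity(leaves_a, leaves_b):
    """leaves_a[t] is the leaf index of sample a in tree t."""
    same = sum(1 for la, lb in zip(leaves_a, leaves_b) if la == lb)
    return same / len(leaves_a)

# Three trees; the two samples share a leaf in 2 of 3 trees.
x_leaves = [0, 3, 1]
y_leaves = [0, 3, 2]
print(proximity(x_leaves, y_leaves))  # 2/3
```

RF-GAP modifies this definition (for example, accounting for how samples enter each tree) so that the resulting proximities exactly reproduce the forest's predictions, which is the dissertation's central construction.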

    Point-set manifold processing for computational mechanics: thin shells, reduced order modeling, cell motility and molecular conformations

    In many applications, one would like to perform calculations on smooth manifolds of dimension d embedded in a high-dimensional space of dimension D. Often, a continuous description of such a manifold is not known; instead, it is sampled by a set of scattered points in high dimensions. This poses a serious challenge. In this thesis, we approximate the point-set manifold as an overlapping set of smooth parametric descriptions, whose geometric structure is revealed by statistical learning methods and then parametrized by meshfree methods. This approach avoids any global parameterization, and hence is applicable to manifolds of any genus and complex geometry. It combines four ingredients: (1) partitioning of the point set into subregions of trivial topology, (2) automatic detection of the local geometric structure of the manifold by nonlinear dimensionality reduction techniques, (3) local parameterization of the manifold using smooth meshfree (here, local maximum-entropy) approximants, and (4) patching together the local representations by means of a partition of unity. In this thesis we show the generality, flexibility, and accuracy of the method in four different problems. First, we exercise it in the context of Kirchhoff-Love thin shells (d=2, D=3). We test our methodology against classical linear and nonlinear benchmarks in thin-shell analysis, and highlight its ability to handle point-set surfaces of complex topology and geometry. We then tackle problems of much higher dimensionality. We perform reduced order modeling in the context of finite deformation elastodynamics, considering a nonlinear reduced configuration space, in contrast with classical linear approaches based on Principal Component Analysis (d=2, D=10000's). We further quantitatively unveil the geometric structure of the motility strategy of a family of micro-organisms called Euglenids from experimental videos (d=1, D~30000's).
Finally, in the context of enhanced sampling in molecular dynamics, we automatically construct collective variables for the molecular conformational dynamics (d=1...6, D~30,1000's).
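Ingredient (4) can be illustrated with a minimal one-dimensional sketch: one Gaussian bump per patch, normalized so that the weights sum to one everywhere, which is what lets overlapping local parameterizations be blended into a global one. The patch centres and width below are arbitrary illustrative values, not from the thesis:

```python
import math

# Partition-of-unity blending weights in 1-D: each patch contributes a
# Gaussian bump, and normalizing the bumps guarantees the weights sum to 1
# at every point, so overlapping local representations can be patched
# together consistently.

def pou_weights(x, centres, width=1.0):
    """Normalized Gaussian weight of each patch at point x."""
    bumps = [math.exp(-((x - c) / width) ** 2) for c in centres]
    total = sum(bumps)
    return [b / total for b in bumps]

centres = [0.0, 1.0, 2.0]
w = pou_weights(0.4, centres)
print(w, sum(w))  # the three weights sum to 1
```

Near a patch centre that patch dominates, while in overlap regions the blended representation transitions smoothly between neighbouring local parameterizations.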

    Improving the Generalisability of Brain Computer Interface Applications via Machine Learning and Search-Based Heuristics

    Brain Computer Interfaces (BCI) are a domain of hardware/software in which a user can interact with a machine without the need for motor activity, communicating instead via signals generated by the nervous system. These interfaces provide life-altering benefits to users, and refinement will both allow their application to a much wider variety of disabilities and increase their practicality. The primary method of acquiring these signals is Electroencephalography (EEG). This technique is susceptible to a variety of different sources of noise, which compounds the inherent problems in BCI training data: large dimensionality, low numbers of samples, and non-stationarity between users and recording sessions. Feature Selection and Transfer Learning have been used to overcome these problems, but they fail to account for several characteristics of BCI. This thesis extends both of these approaches through the use of search-based algorithms. Feature Selection techniques known as Wrappers use ‘black box’ evaluation of feature subsets, leading to higher classification accuracies than ranking methods known as Filters. However, Wrappers are more computationally expensive and are prone to over-fitting to training data. In this thesis, we applied Iterated Local Search (ILS) to the BCI field for the first time in the literature, and demonstrated results competitive with state-of-the-art methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and Genetic Algorithms. We then developed ILS variants with guided perturbation operators. Linkage was used to develop a multivariate metric, Intrasolution Linkage, which takes into account pair-wise dependencies of features with the label, in the context of the solution. Intrasolution Linkage was then integrated into two ILS variants.
The Intrasolution Linkage Score was discovered to have a stronger correlation with a solution’s predictive accuracy on unseen data than Cross-Validation Error (CVE) on the training set, the typical approach to feature subset evaluation. Mutual Information was used to create Minimum Redundancy Maximum Relevance Iterated Local Search (MRMR-ILS). In this algorithm, the perturbation operator is guided by an existing Mutual Information measure, and the method was compared with current Filter and Wrapper methods. It was found to achieve generally lower CVE rates and higher predictive accuracy on unseen data than existing algorithms. It was also noted that solutions found by MRMR-ILS provided CVE rates with a stronger correlation to accuracy on unseen data than solutions found by other algorithms. We suggest that this may be due to the guided perturbation leading to solutions that are richer in Mutual Information. Feature Selection reduces computational demands and can increase the accuracy of our desired models, as evidenced in this thesis. However, limited quantities of training samples restrict these models and greatly reduce their generalisability. For this reason, utilising data from a wide range of users is an ideal solution. Due to the differences in neural structures between users, creating adequate models is difficult. We adopted an existing state-of-the-art ensemble technique, Ensemble Learning Generic Information (ELGI), and developed an initial optimisation phase: search is used to transplant instances between user subsets to increase the generalisability of each subset before combination in the ELGI. We termed this Evolved Ensemble Learning Generic Information (eELGI). The eELGI achieved higher accuracy than user-specific BCI models across all eight users.
Optimisation of the training dataset allowed smaller training sets to be used, offered protection against neural drift, and created models that performed similarly across participants, regardless of neural impairment. Through the introduction and hybridisation of search-based algorithms across several problems in BCI, we have been able to show improvements in modelling accuracy and efficiency. Ultimately, this represents a step towards more practical BCI systems that will provide life-altering benefits to users.
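The basic ILS loop (local search to a local optimum, perturbation, acceptance of improvements) can be sketched on a toy feature selection problem. The objective below is a synthetic stand-in for a wrapper's cross-validation score, with a hypothetical set of "relevant" features; it is purely illustrative, not the thesis' evaluation function:

```python
import random

# Toy Iterated Local Search over feature subsets. `score` is a synthetic
# objective that rewards a hidden set of relevant features and penalizes
# subset size; a real wrapper would use classifier cross-validation here.

RELEVANT = {1, 4, 7}  # hypothetical ground-truth relevant features

def score(subset):
    return 2 * len(subset & RELEVANT) - 0.5 * len(subset)

def local_search(subset, n_features):
    """Hill-climb with single-feature flips until no flip improves."""
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            cand = subset ^ {f}  # toggle feature f in or out
            if score(cand) > score(subset):
                subset, improved = cand, True
    return subset

def ils(n_features, iters=20, seed=0):
    rng = random.Random(seed)
    best = local_search(set(), n_features)
    for _ in range(iters):
        perturbed = best ^ {rng.randrange(n_features)}  # random perturbation
        cand = local_search(perturbed, n_features)
        if score(cand) > score(best):                   # accept if better
            best = cand
    return best

print(ils(10))  # recovers {1, 4, 7} under this toy objective
```

The thesis' variants replace the random perturbation with guided operators (Intrasolution Linkage, Mutual Information) so that the restart points are informed rather than blind.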

    Multivariate methods for interpretable analysis of magnetic resonance spectroscopy data in brain tumour diagnosis

    Malignant tumours of the brain represent one of the most difficult-to-treat types of cancer, owing to the sensitive organ they affect. Clinical management of the pathology becomes even more intricate as the tumour mass increases due to proliferation, meaning that an early and accurate diagnosis is vital for preventing its normal course of development. The standard clinical practice for diagnosis includes invasive techniques that may be harmful to the patient, a fact that has fostered intensive research into alternative non-invasive brain tissue measurement methods, such as nuclear magnetic resonance. One of its variants, magnetic resonance imaging, is already used on a regular basis to locate and bound the brain tumour; a complementary variant, magnetic resonance spectroscopy, despite its higher spatial resolution and its capability to identify biochemical metabolites that might become biomarkers of tumour within a delimited area, lags behind in terms of clinical use, mainly due to its difficult interpretability. The interpretation of magnetic resonance spectra corresponding to brain tissue thus becomes an interesting field of research for automated knowledge-extraction methods such as machine learning, always understanding their secondary role behind human expert medical decision making. The current thesis aims to contribute to the state of the art in this domain by providing novel techniques for the assistance of radiology experts, focusing on complex problems and delivering interpretable solutions. In this respect, an ensemble learning technique has been designed to accurately discriminate between the most aggressive brain tumours, namely glioblastomas and metastases; moreover, a strategy to increase the stability of biomarker identification in the spectra by means of instance weighting is provided.
From a different analytical perspective, a tool based on signal source separation, guided by tumour type-specific information, has been developed to assess the existence of different tissues in the tumoural mass, quantifying their influence in the vicinity of tumoural areas. This development has led to the derivation of a probabilistic interpretation of some source separation techniques, which provides support for uncertainty handling and strategies for estimating the most accurate number of differentiated tissues within the analysed tumour volumes. The provided strategies should assist human experts through the use of automated decision support tools, tackling interpretability and accuracy from different angles.
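The unmixing idea behind such source separation can be illustrated with a plain non-negative matrix factorization sketch (multiplicative updates): a non-negative "spectra" matrix V is factored into tissue sources H and mixing weights W, V ≈ WH. The tiny matrices below are generic illustrations, and this unguided factorization is not the thesis' tumour type-guided method:

```python
import random

# Tiny pure-Python NMF via multiplicative updates (Lee-Seung style).
# Rows of V are observed spectra; rows of H are recovered source spectra;
# W holds the per-observation mixing weights. All values stay non-negative.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, iters=200, seed=0):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9
    for _ in range(iters):
        WH = matmul(W, H)
        num, den = matmul(transpose(W), V), matmul(transpose(W), WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        WH = matmul(W, H)
        num, den = matmul(V, transpose(H)), matmul(WH, transpose(H))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def frob_err(V, W, H):
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))

# Four observed "spectra" mixed from two hidden sources [1,0,2] and [0,2,1].
V = [[1, 0, 2], [0, 2, 1], [1, 2, 3], [1, 4, 4]]
W, H = nmf(V, k=2)
print(frob_err(V, W, H))
```

The thesis' guided variant additionally injects tumour type-specific information and a probabilistic interpretation, which this generic sketch omits.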

    Visual and Camera Sensors

    This book includes 13 papers published in the Special Issue ("Visual and Camera Sensors") of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors.

    Computerized Classification of Surface Spikes in Three-Dimensional Electron Microscopic Reconstructions of Viruses

    The purpose of this research is to develop computer techniques for improved three-dimensional (3D) reconstruction of viruses from their electron microscopic images and for the subsequent improved classification of the surface spikes in the resulting reconstruction. The broader impact of such work is the following. Influenza is an infectious disease caused by rapidly changing viruses that appear seasonally in the human population. New strains of influenza viruses appear every year, with the potential to cause a serious global pandemic. Two kinds of spikes – hemagglutinin (HA) and neuraminidase (NA) – decorate the surface of the virus particles, and these proteins are primarily responsible for the antigenic changes observed in influenza viruses. Identifying the locations of both kinds of surface spikes in a new strain of influenza virus can be of critical importance for the development of a vaccine that protects against such a virus. Two major categories of reconstruction techniques are transform methods, such as weighted backprojection (WBP), and series expansion methods, such as the algebraic reconstruction techniques (ART) and the simultaneous iterative reconstruction technique (SIRT). Series expansion methods aim at estimating the object to be reconstructed by a linear combination of some fixed basis functions, and they typically estimate the coefficients in such an expansion by an iterative algorithm. The choice of the set of basis functions greatly influences the efficacy of a series expansion method. It has been demonstrated that using spherically symmetric basis functions (blobs), instead of the more traditional voxels, results in reconstructions of superior quality.
Our own research shows that, with the recommended data-processing steps performed on the projection images prior to reconstruction, ART (with its free parameters appropriately tuned) provides 3D reconstructions of viruses from tomographic tilt series that allow reliable quantification of the surface proteins, and that the same is not achieved using WBP or SIRT, the methods that have been routinely applied by practicing electron microscopists. Image segmentation is the process of recognizing different objects in an image. Segmenting an object from a background is not a trivial task, especially when the image is corrupted by noise and/or shading. One concept that has been successfully used to achieve segmentation in such corrupted images is fuzzy connectedness. This technique assigns to each element in an image a grade of membership in an object. Classification methods use a set of relevant features to identify the objects of each class. To distinguish between HA and NA spikes in this research, discussions with biologists suggested that there may be a single feature that can be used reliably for the classification process. The results of the fuzzy connectedness technique that we used to segment spikes from the background confirm the correctness of the biologists’ assumption. The single feature we used is the ratio of the width of the spike’s head to the width of its stem in 3D space; the ratio appears to be greater for NA than for HA. The proposed classifier is tested on different types of 3D reconstructions derived from simulated data. A methodology based on statistical hypothesis testing allowed us to evaluate the relative suitability of the reconstruction methods for the given classification task.
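The row-by-row projection idea behind ART can be sketched with Kaczmarz's method on a toy linear system, where each row of A x = b stands in for one projection measurement. The 2x2 system below is purely illustrative, far smaller than any real tomographic problem:

```python
# Kaczmarz iteration, the basic scheme underlying ART: cycle through the
# rows of A x = b and orthogonally project the current estimate onto the
# hyperplane defined by each row, optionally scaled by a relaxation factor
# (one of ART's tunable free parameters).

def art(A, b, sweeps=100, relax=1.0):
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            dot = sum(r * xi for r, xi in zip(row, x))
            norm2 = sum(r * r for r in row)
            c = relax * (bi - dot) / norm2      # signed correction along this row
            x = [xi + c * r for xi, r in zip(x, row)]
    return x

# Toy consistent system: 2x + y = 5, x + 3y = 10, exact solution (1, 3).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [5.0, 10.0]
x = art(A, b)
print([round(v, 4) for v in x])  # converges to [1.0, 3.0]
```

For consistent systems the iterates converge to a solution; the tuning of the relaxation factor and the ordering of rows are among the free parameters the abstract says must be chosen appropriately for virus reconstructions.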