10,322 research outputs found

    Machine learning in solar physics

    The application of machine learning in solar physics has the potential to greatly enhance our understanding of the complex processes that take place in the atmosphere of the Sun. By using techniques such as deep learning, we are now in a position to analyze large amounts of data from solar observations and identify patterns and trends that may not have been apparent using traditional methods. This can help us improve our understanding of explosive events like solar flares, which can have a strong effect on the Earth's environment. Predicting hazardous events on Earth is becoming crucial for our technological society. Machine learning can also improve our understanding of the inner workings of the Sun itself by allowing us to go deeper into the data and to propose more complex models to explain them. Additionally, the use of machine learning can help to automate the analysis of solar data, reducing the need for manual labor and increasing the efficiency of research in this field. Comment: 100 pages, 13 figures, 286 references, accepted for publication as a Living Review in Solar Physics (LRSP)

    Machine Learning Approaches for the Prioritisation of Cardiovascular Disease Genes Following Genome-wide Association Study

    Genome-wide association studies (GWAS) have revealed thousands of genetic loci, establishing themselves as a valuable method for unravelling the complex biology of many diseases. Although GWAS have grown in size and improved in study design to detect effects, identifying real causal signals and disentangling them from other highly correlated markers associated by linkage disequilibrium (LD) remains challenging. This has severely limited GWAS findings and brought the method’s value into question. Although thousands of disease susceptibility loci have been reported, causal variants and genes at these loci remain elusive. Post-GWAS analysis aims to dissect the heterogeneity of variant and gene signals. In recent years, machine learning (ML) models have been developed for post-GWAS prioritisation. These models range from logistic regression to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models (i.e., neural networks). When combined with functional validation, these methods have yielded important translational insights, providing a strong evidence-based approach to direct post-GWAS research. However, ML approaches are in their infancy across biological applications, and as they continue to evolve, an evaluation of their robustness for GWAS prioritisation is needed. Here, I investigate the landscape of ML across selected models, input features, bias risk, and output model performance, with a focus on building a prioritisation framework that is applied to blood pressure GWAS results and tested on re-application to blood lipid traits

    Fault diagnosis in aircraft fuel system components with machine learning algorithms

    There is high demand for, and interest in, considering the social and environmental effects of a component’s lifespan. Aircraft operation is among the most expensive businesses and is subject to the strictest reliability and safety constraints. The complexity of aircraft system designs has also advanced rapidly in the last decade. Consequently, fault detection, diagnosis and modification/repair procedures are becoming more challenging. The presence of a fault within an aircraft system can change system performance and cause operational downtime or, in a worst-case scenario, accidents. Condition-based maintenance (CBM), which predicts the state of the equipment from collected data, is widely used in aircraft maintenance, repair and overhaul (MRO) organisations. CBM uses diagnostic and prognostic models to decide on appropriate maintenance actions based on the Remaining Useful Life (RUL) of the components. The fuel system is crucial to the aircraft: even a minor failure can greatly affect the aircraft's safety. A failure that impairs the ability to deliver fuel to the engine has an immediate effect on system performance and safety, and the fuel on board may become unusable or unavailable for reaching the destination. There are very few diagnostic systems that monitor the health of the fuel system and even fewer that can contain detected faults. It is therefore necessary to develop fault detection for the aircraft fuel system, and future aircraft fuel systems must include a fault detection function. Using sensor information and machine learning techniques, the aircraft fuel system’s fault type can be detected in a timely manner. This thesis discusses the application of a data-driven technique to analyse healthy and faulty data collected from an aircraft fuel system model similar to that of the Boeing 777. The collected data are processed with machine learning techniques and the results are compared. PhD in Manufacturing

    Beam scanning by liquid-crystal biasing in a modified SIW structure

    A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, that radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, enabling the possibility to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium

    Towards optimal sensor placement for inverse problems in spaces of measures

    This paper studies the identification of a linear combination of point sources from a finite number of measurements. Since the data are typically contaminated by Gaussian noise, a statistical framework for its recovery is considered. It relies on two main ingredients: first, a convex but non-smooth Tikhonov point estimator over the space of Radon measures and, second, a suitable mean-squared error based on its Hellinger-Kantorovich distance to the ground truth. Assuming standard non-degenerate source conditions as well as applying careful linearization arguments, a computable upper bound on the latter is derived. On the one hand, this allows us to derive asymptotic convergence results for the mean-squared error of the estimator in the small variance case. On the other, it paves the way for applying optimal sensor placement approaches to sparse inverse problems. Comment: 31 pages, 8 figures

    Advertiser Learning in Direct Advertising Markets

    Direct buy advertisers procure advertising inventory at fixed rates from publishers and ad networks. Such advertisers face the complex task of choosing ads amongst myriad new publisher sites. We offer evidence that advertisers do not excel at making these choices. Instead, they try many sites before settling on a favored set, consistent with advertiser learning. We subsequently model advertiser demand for publisher inventory wherein advertisers learn about advertising efficacy across publishers' sites. Results suggest that advertisers spend considerable resources advertising on sites they eventually abandon -- in part because their prior beliefs about advertising efficacy on those sites are too optimistic. The median advertiser's expected CTR at a new site is 0.23%, five times higher than the true median CTR of 0.045%. We consider how pooling advertiser information remediates this problem. Specifically, we show that ads with similar visual elements garner similar CTRs, enabling advertisers to better predict ad performance at new sites. Counterfactual analyses indicate that gains from pooling advertiser information are substantial: over six months, we estimate a median advertiser welfare gain of $2,756 (a 15.5% increase) and a median publisher revenue gain of $9,618 (a 63.9% increase)
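The learning dynamic described in this abstract can be illustrated with a minimal sketch. It assumes a Beta prior over the click-through rate updated with binomial click data; this is a textbook stand-in, not the paper's actual demand model. The two CTR figures come from the abstract, while `PRIOR_STRENGTH` and the function names are hypothetical.

```python
def posterior_mean_ctr(prior_ctr, prior_strength, clicks, impressions):
    """Posterior mean CTR under a Beta prior and binomial click data.

    prior_strength is the number of pseudo-impressions the prior is worth
    (a hypothetical tuning knob, not a quantity from the abstract).
    """
    alpha = prior_ctr * prior_strength + clicks
    beta = (1.0 - prior_ctr) * prior_strength + (impressions - clicks)
    return alpha / (alpha + beta)


PRIOR_CTR = 0.0023       # median expected CTR at a new site (from the abstract)
TRUE_CTR = 0.00045       # true median CTR (from the abstract)
PRIOR_STRENGTH = 10_000  # assumption: prior counts as 10,000 impressions


def belief_path(impression_counts):
    """Belief about CTR after each impression count, using expected clicks."""
    return [
        posterior_mean_ctr(PRIOR_CTR, PRIOR_STRENGTH, TRUE_CTR * n, n)
        for n in impression_counts
    ]


beliefs = belief_path([0, 10_000, 100_000, 1_000_000])
for n, b in zip([0, 10_000, 100_000, 1_000_000], beliefs):
    print(f"{n:>9} impressions -> believed CTR {b:.5%}")
```

Under these assumptions the belief starts at the optimistic prior and moves toward the true rate only after many paid impressions, which is the sense in which an over-optimistic prior is costly.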

    Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)

    The book offers a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic of analysis. It therefore requires familiarity with other topics of Artificial Intelligence (AI), such as machine learning and digital image processing, as well as psychology and more. This makes it a great opportunity to write a book that covers all of these topics for readers ranging from beginners to professionals in the field of AI, including those without an AI background. Our goal is to provide a standalone introduction to FMER analysis in the form of theoretical descriptions for readers with no background in image processing, together with reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB libraries used in the text, which helps the reader apply the experiments in real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expression recognition, feature extraction and dimensionality reduction. Comment: This is the second edition of the book

    Optical Remote Sensing of Oil Spills by using Machine Learning Methods in the Persian Gulf: A Multi-Class Approach

    Marine oil spills are harmful to the environment and costly for society. Coastal areas are particularly vulnerable since they provide habitats for organisms, animals and marine ecosystems. This thesis studied machine learning methods to classify thick oil in a multi-class case, using remotely sensed multi-spectral data in the Persian Gulf. The study area covers a large area between the United Arab Emirates (UAE) and Iran. The dataset is extracted from 10 Sentinel-2 tiles on six spectral bands from 492 nm to 2202 nm. These images were annotated for four classes, namely thick oil, thin oil, ocean water and turbid water, by using the Bonn Agreement to analyse true color composite images. A variety of machine learning methods were trained and evaluated using this dataset. Then a robustness evaluation was done by applying selected machine learning methods to an independent dataset. Initially, multiple machine learning methods were included: three decision trees, six K-Nearest Neighbor (KNN) models, two Artificial Neural Network (ANN) models, two Naive Bayes models, and two discriminant models. Two KNN models and two ANN models were then picked for further evaluation. The results show that the fine KNN approach with two nearest neighbors had the best performance based on the computed statistical measures. However, the robustness evaluation showed that the tri-layered NN performed better. This thesis has shown that supervised machine learning with a multi-class approach can be used for oil spill monitoring using multi-spectral remote sensing data in the Persian Gulf
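As a rough illustration of the two-nearest-neighbor classifier mentioned in this abstract, the sketch below classifies a six-band pixel by Euclidean distance. The reflectance values are invented placeholders, not data from the thesis, and the tie-break rule (the closer neighbor wins) is an assumption, since the abstract does not say how ties were resolved.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=2):
    """Return the majority label among the k nearest training pixels.

    Ties are broken in favour of the closest neighbour (an assumption).
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = [train_y[i] for i in order[:k]]
    counts = Counter(nearest)
    top = max(counts.values())
    # nearest is ordered by distance, so the first label with the top
    # count is the one supported by the closest neighbour.
    for label in nearest:
        if counts[label] == top:
            return label


# Hypothetical six-band reflectances per pixel (illustrative values only).
train_X = [
    [0.10, 0.09, 0.08, 0.07, 0.05, 0.04],
    [0.11, 0.10, 0.08, 0.07, 0.05, 0.04],
    [0.14, 0.13, 0.11, 0.08, 0.04, 0.03],
    [0.15, 0.13, 0.11, 0.08, 0.04, 0.03],
    [0.06, 0.05, 0.03, 0.02, 0.01, 0.01],
    [0.07, 0.05, 0.03, 0.02, 0.01, 0.01],
    [0.12, 0.14, 0.15, 0.10, 0.03, 0.02],
    [0.12, 0.15, 0.16, 0.10, 0.03, 0.02],
]
train_y = [
    "thick oil", "thick oil",
    "thin oil", "thin oil",
    "ocean water", "ocean water",
    "turbid water", "turbid water",
]

pixel = [0.065, 0.05, 0.03, 0.02, 0.01, 0.01]
print(knn_predict(train_X, train_y, pixel))
```

With only two neighbors, a split vote is common near class boundaries, so the tie-break rule has a real effect on the predicted label.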

    Understanding and Mitigating Privacy Vulnerabilities in Deep Learning

    Advancements in Deep Learning (DL) have enabled leveraging large-scale datasets to train models that perform challenging tasks at a level that mimics human intelligence. In several real-world scenarios, the data used for training, the trained model, and the data used for inference can be private and distributed across multiple distrusting parties, posing a challenge for training and inference. Several privacy-preserving training and inference frameworks have been developed to address this challenge. For instance, frameworks like federated learning and split learning have been proposed to train a model collaboratively on distributed data without explicitly sharing the private data to protect training data privacy. To protect model privacy during inference, the model owners have adopted a client-server architecture to provide inference services, wherein the end-users are only allowed black-box access to the model’s predictions for their input queries. The goal of this thesis is to provide a better understanding of the privacy properties of the DL frameworks used for privacy-preserving training and inference. While these frameworks have the appearance of keeping the data and model private, the information exchanged during training/inference has the potential to compromise the privacy of the parties involved by leaking sensitive data. We aim to understand if these frameworks are truly capable of preventing the leakage of model and training data in realistic settings. In this pursuit, we discover new vulnerabilities that can be exploited to design powerful attacks that can overcome the limitations of prior works and break the illusion of privacy. Our findings highlight the limitations of these frameworks and underscore the importance of principled techniques to protect privacy. Furthermore, we leverage our improved understanding to design better defenses that can significantly deter the efficacy of an attack. Ph.D.

    DataComp: In search of the next generation of multimodal datasets

    Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai