
    Sparse reduced-rank regression for imaging genetics studies: models and applications

    We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model, for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR possesses better power to detect deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step prior to the imaging genetics study. We also present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. Finally, we illustrate the applicability of the proposed statistical models on three real imaging genetics datasets and highlight some potential associations.
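
    The abstract does not include code; as a rough sketch of the alternating estimation it describes, the Python fragment below fits Y ~ X B Aᵀ with an l1 penalty on B, alternating a lasso step for B with an orthogonal update for A. The function name `srrr`, the Procrustes update, and the parameters `rank` and `alpha` are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def srrr(X, Y, rank=2, alpha=0.1, n_iter=20, seed=0):
    """Alternate a lasso step for B with a Procrustes update for A
    to fit Y ~ X @ B @ A.T with an l1 penalty on B."""
    rng = np.random.default_rng(seed)
    q = Y.shape[1]
    A = np.linalg.qr(rng.standard_normal((q, rank)))[0]  # q x rank, orthonormal
    B = np.zeros((X.shape[1], rank))
    for _ in range(n_iter):
        T = Y @ A  # with A fixed, each column of B solves a lasso on Y @ A
        for k in range(rank):
            B[:, k] = Lasso(alpha=alpha, fit_intercept=False).fit(X, T[:, k]).coef_
        # with B fixed, A = U @ Vt from the SVD of Y.T @ X @ B (Procrustes)
        U, _, Vt = np.linalg.svd(Y.T @ X @ B, full_matrices=False)
        A = U @ Vt
    return B, A  # nonzero rows of B mark the selected genetic markers
```

    In this formulation, markers whose rows of B survive the penalty are the selected genetic variants, mirroring the subset-identification role the abstract describes.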

    Application of machine learning to agricultural soil data

    Agriculture is a major sector in the Indian economy. One key advantage of classifying and predicting soil parameters is to save specialized technicians the time and cost of expensive chemical analyses. In this context, this PhD thesis has been developed in three stages: 1. Classification for soil data: we used chemical soil measurements to classify many relevant soil parameters: village-wise fertility indices; soil pH and type; soil nutrients, in order to recommend suitable amounts of fertilizers; and preferable crop. 2. Regression for generic data: we developed an experimental comparison of many regressors on a large collection of generic datasets selected from the University of California at Irvine (UCI) machine learning repository. 3. Regression for soil data: we applied the regressors used in stage 2 to the soil datasets, directly predicting their numeric values. The accuracy of the prediction was evaluated for the ten soil problems, as an alternative to the prediction of the quantified values (classification) developed in stage 1.
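
    As a minimal illustration of the stage-1 classification setup, the sketch below cross-validates a classifier on stand-in soil-chemistry features. The synthetic features and labels and the choice of random forest are assumptions for illustration only; the thesis's actual models and data are not shown in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-ins for measured soil chemistry (e.g., pH, organic carbon, N, P, K)
# and a binary fertility-index label; real features would come from lab data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```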

    Degradation modeling and degradation-aware control of power electronic systems

    The power electronics market was valued at $23.25 billion in 2019 and is projected to reach $36.64 billion by 2027. Power electronic systems (PES) have been extensively used in a wide range of critical applications, including automotive, renewable energy, industrial variable-frequency drives, etc. Thus, the PESs' reliability and robustness are immensely important for the smooth operation of mission-critical applications. Power semiconductor switches are among the most vulnerable components in the PES, and their vulnerability impacts the reliability and robustness of the PES. Thus, switch-health monitoring and prognosis are critical for avoiding unexpected shutdowns and preventing catastrophic failures. The importance of prognosis studies increases dramatically with the growing popularity of the next-generation power semiconductor switches, wide-bandgap switches. These switches show immense promise in high-power, high-frequency operations due to their higher breakdown voltage and lower switching loss, but their wide adoption is limited by inadequate reliability studies. A thorough prognosis study comprising switch degradation modeling, remaining useful life (RUL) estimation, and degradation-aware controller development is important to enhance the PESs' robustness, especially with wide-bandgap switches. In this dissertation, three studies are conducted to achieve these objectives: 1) Insulated Gate Bipolar Transistor (IGBT) degradation modeling and RUL estimation, 2) cascode Gallium Nitride (GaN) Field-Effect Transistor (FET) degradation modeling and RUL estimation, and 3) degradation-aware controller design for a PES, the solid-state transformer (SST). The first two studies address the significant variation in RUL estimation and propose degradation identification methods for the IGBT and the cascode GaN FET. In the third study, a system-level integration of the switch degradation model is implemented in the SST. The insight into the switch's degradation pattern from the first two studies is integrated into developing a degradation-aware controller for the SST. State-of-the-art controllers do not consider switch degradation, which results in premature system failure. The proposed low-complexity, degradation-aware, adaptive SST controller ensures optimal degradation-aware power transfer and robust operation over the lifetime.
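
    As a hedged illustration of one common formulation of threshold-based RUL estimation (the dissertation's actual degradation models are not shown in the abstract), the sketch below fits an exponential trend to a synthetic degradation precursor and extrapolates to a failure threshold. The 20% threshold, the exponential model, and the signal itself are assumptions.

```python
import numpy as np

def estimate_rul(t, precursor, threshold=0.20):
    """Fit log(precursor) = a + b*t, extrapolate to the failure threshold,
    and return the time remaining beyond the last observation."""
    b, a = np.polyfit(t, np.log(precursor), 1)  # slope, intercept
    t_fail = (np.log(threshold) - a) / b        # time the fit crosses threshold
    return t_fail - t[-1]

t = np.linspace(0, 500, 50)          # hours of accelerated power cycling
drift = 0.01 * np.exp(0.004 * t)     # synthetic degradation precursor signal
print(f"Estimated RUL: {estimate_rul(t, drift):.0f} hours")
```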

    An extensive experimental survey of regression methods

    Regression is a very relevant problem in machine learning, with many different available approaches. The current work presents a comparison of a large collection of 77 popular regression models belonging to 19 families: linear and generalized linear models, generalized additive models, least squares, projection methods, LASSO and ridge regression, Bayesian models, Gaussian processes, quantile regression, nearest neighbors, regression trees and rules, random forests, bagging and boosting, neural networks, deep learning and support vector regression. These methods are evaluated using all the regression datasets of the UCI machine learning repository (83 datasets), with some exceptions due to technical reasons. The experimental work identifies several outstanding regression models: the M5 rule-based model with corrections based on nearest neighbors (cubist), the gradient boosted machine (gbm), the boosting ensemble of regression trees (bstTree) and the M5 regression tree. Cubist achieves the best squared correlation (R2) in 15.7% of datasets and is very near to it (difference below 0.2) for 89.1% of datasets, and the median of these differences over the dataset collection is very low (0.0192), compared e.g. to classical linear regression (0.150). However, cubist is slow and fails in several large datasets, while other similar regression models such as M5 never fail, and its difference to the best R2 is below 0.2 for 92.8% of datasets. Other well-performing regression models are the committee of neural networks (avNNet), extremely randomized regression trees (extraTrees, which achieves the best R2 in 33.7% of datasets), random forest (rf) and ε-support vector regression (svr), but they are slower and fail in several datasets. The fastest regression model is least angle regression (lars), which is 70 and 2,115 times faster than M5 and cubist, respectively. The model which requires the least memory is non-negative least squares (nnls), about 2 GB, similar to cubist, while M5 requires about 8 GB. For 97.6% of datasets there is a regression model among the 10 best which is very near (difference below 0.1) to the best R2, and this increases to 100% when allowing differences of 0.2. Therefore, provided that our dataset and model collections are representative enough, the main conclusion of this study is that, for a new regression problem, some model in our top 10 should achieve an R2 near to the best attainable for that problem. This work has received financial support from the Erasmus Mundus Euphrates programme [project number 2013-2540/001-001-EMA2], from the Xunta de Galicia (Centro singular de investigación de Galicia, accreditation 2016–2019) and the European Union (European Regional Development Fund, ERDF), from Project MTM2016-76969-P (Spanish State Research Agency, AEI) co-funded by the ERDF, and from the IAP network of the Belgian Science Policy.
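
    The study's headline statistics all derive from per-dataset gaps between each model's R2 and the best R2 on that dataset; the sketch below reproduces that bookkeeping. The score matrix here is synthetic and only the model names come from the paper.

```python
import numpy as np

# r2 is a (models x datasets) score matrix; the values are fabricated
# stand-ins, only the model names come from the study.
rng = np.random.default_rng(1)
r2 = np.clip(rng.normal(0.7, 0.15, size=(5, 20)), 0.0, 1.0)
models = ["cubist", "gbm", "bstTree", "M5", "lars"]

gap = r2.max(axis=0) - r2  # per-dataset distance to the best model
for name, g in zip(models, gap):
    print(f"{name:8s} best on {(g == 0).mean():.0%} of datasets, "
          f"within 0.2 on {(g < 0.2).mean():.0%}, median gap {np.median(g):.3f}")
```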

    Data Reduction and Deep-Learning Based Recovery for Geospatial Visualization and Satellite Imagery

    The storage, retrieval and distribution of data are critical aspects of big data management. Data scientists and decision-makers often need to share large datasets and make decisions on archiving or deleting historical data to cope with resource constraints, so there is an urgent need to reduce storage and transmission requirements. A potential approach to mitigate such problems is to reduce big datasets into smaller ones, which not only lowers storage requirements but also allows lighter transfers over the network. High-dimensional data often exhibit high repetitiveness and recurring patterns across different dimensions. Carefully prepared data, with redundancies removed, along with a machine learning model capable of reconstructing the whole dataset from its reduced version, can improve storage scalability and data transfer, and speed up the overall data management pipeline. In this thesis, we explore data reduction strategies for big datasets while ensuring that the data can be transferred and used ubiquitously by all stakeholders, i.e., the entire dataset can be reconstructed with high quality whenever necessary. One of our data reduction strategies follows a straightforward uniform pattern, which guarantees a minimum of 75% data size reduction. We also propose a novel variance-based reduction technique, which focuses on removing only redundant data and offers an additional 1% to 2% deletion rate. We have adopted various traditional machine learning and deep learning approaches for high-quality reconstruction. We evaluated our pipelines with big geospatial data and satellite imagery. Among them, our deep learning approaches have performed very well both quantitatively and qualitatively, with the capability of reconstructing high-quality features. We also show how to leverage temporal data for better reconstruction. For uniform deletion, the reconstruction accuracy observed is as high as 98.75% on average for spatial meteorological data (e.g., soil moisture and albedo), and 99.09% for satellite imagery. Pushing the deletion rate further with the variance-based deletion method, the decrease in accuracy remains within 1% for spatial meteorological data and 7% for satellite imagery.
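
    A minimal sketch of the uniform-reduction idea: keeping one sample per 2x2 block of a gridded field discards exactly 75% of the values, and the full grid is then reconstructed by interpolation. Here `griddata` stands in for the thesis's learned reconstruction models; the synthetic field and the cubic method are assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic gridded field standing in for, e.g., a soil-moisture raster.
field = np.fromfunction(lambda i, j: np.sin(i / 9) + np.cos(j / 11), (64, 64))

kept = field[::2, ::2]  # one value per 2x2 block: exactly 75% deleted
ki, kj = np.meshgrid(np.arange(0, 64, 2), np.arange(0, 64, 2), indexing="ij")
gi, gj = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")

# Reconstruct the full grid from the retained 25% by cubic interpolation.
recon = griddata((ki.ravel(), kj.ravel()), kept.ravel(), (gi, gj), method="cubic")
mask = ~np.isnan(recon)  # cubic interpolation leaves NaNs outside the hull
print(f"mean absolute error: {np.abs(recon[mask] - field[mask]).mean():.4f}")
```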

    On the predictability of U.S. stock market using machine learning and deep learning techniques

    Conventional market theories are considered an inconsistent approach in modern financial analysis. This thesis focuses mainly on the application of sophisticated machine learning and deep learning techniques to stock market statistical predictability and economic significance, over the benchmark conventional efficient market hypothesis and econometric models. Five chapters and three publishable papers are proposed altogether, and each chapter is developed to solve specific identifiable problems. Chapter one gives the general introduction of the thesis. It presents the statement of the research problems identified in the relevant literature, the objectives of the study and the significance of the study. Chapter two applies a plethora of machine learning techniques to forecast the direction of the U.S. stock market. Notable sophisticated techniques such as regularization, discriminant analysis, classification trees, Bayesian and neural networks are employed. The empirical findings reveal that the discriminant analysis classifiers, classification trees, Bayesian classifiers and penalized binary probit models significantly outperform the binary probit models both statistically and economically, proving to be significant alternatives for portfolio managers. Chapter three focuses mainly on the application of regression training (RT) techniques to forecast the U.S. equity premium. The RT models demonstrate significant evidence of equity premium predictability both statistically and economically relative to the benchmark historical average, delivering significant utility gains. Chapter four investigates the statistical predictive power and economic significance of financial stock market data using deep learning techniques; these techniques prove robust both statistically and economically when forecasting the equity premium out-of-sample using a recursive window method. Chapter five gives the summary and conclusion and presents areas of further research. Overall, the deep learning techniques produced the best results in this thesis. They seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk.
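
    As an illustration of the recursive-window evaluation mentioned above, the sketch below refits a simple direction classifier on an expanding window and scores one-step-ahead predictions. The lagged-return features, the logistic model, and the synthetic return series are stand-ins for exposition, not the thesis's models or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
ret = rng.normal(0.0003, 0.01, size=600)  # synthetic daily returns
X = np.column_stack([np.roll(ret, k) for k in (1, 2, 3)])[3:]  # lagged returns
y = (ret[3:] > 0).astype(int)             # next-day direction labels

hits = []
for t in range(400, len(y) - 1):          # recursive (expanding) window
    model = LogisticRegression().fit(X[:t], y[:t])
    hits.append(model.predict(X[t:t + 1])[0] == y[t])
print(f"out-of-sample hit rate: {np.mean(hits):.3f}")
```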

    Lazy Lasso for local regression

    Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds the lasso in an iterative procedure that alternately computes weights and performs a lasso regression step. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models across several kinds of scenarios.
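
    A minimal sketch of the alternating procedure the abstract describes: weight the training points by proximity to the query, fit a lasso with those weights, then recompute the weights using only the variables the lasso kept. The Gaussian kernel and the `alpha` and `bandwidth` parameters are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lazy_lasso_predict(X, y, x0, alpha=0.05, bandwidth=1.0, n_iter=5):
    """Weight training points by closeness to the query x0, fit a lasso
    with those weights, then recompute the weights using only the
    variables the lasso kept, and repeat."""
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * bandwidth ** 2))
    model = Lasso(alpha=alpha)
    for _ in range(n_iter):
        model.fit(X, y, sample_weight=w)
        active = model.coef_ != 0        # variables selected by the lasso
        if not active.any():
            break                        # all coefficients shrunk to zero
        d2 = np.sum((X[:, active] - x0[active]) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
    return model.predict(x0[None, :])[0]
```

    Restricting the distance computation to the active variables is what makes the locality adaptive: irrelevant dimensions stop influencing which neighbors count as close.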

    Leverage and influence diagnostics for Gibbs spatial point processes

    For point process models fitted to spatial point pattern data, we describe diagnostic quantities analogous to the classical regression diagnostics of leverage and influence. We develop a simple and accessible approach to these diagnostics, and use it to extend previous results for Poisson point process models to the vastly larger class of Gibbs point processes. Explicit expressions, and efficient calculation formulae, are obtained for models fitted by maximum pseudolikelihood, maximum logistic composite likelihood, and regularised composite likelihoods. For practical applications we introduce new graphical tools, and a new diagnostic analogous to the effect measure DFFIT in regression.
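
    For readers unfamiliar with the classical quantities being generalized, the sketch below computes leverage (the hat-matrix diagonal) and the DFFIT effect measure for an ordinary linear model. This shows only the regression baseline the paper's diagnostics are analogous to, not its Gibbs point process extension; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # design with intercept
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                         # leverage of each observation
e = y - H @ y                          # residuals
n, p = X.shape
s2 = e @ e / (n - p)
s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)  # leave-one-out variance
t = e / np.sqrt(s2_i * (1 - h))        # externally studentized residuals
dffits = t * np.sqrt(h / (1 - h))      # the DFFIT effect measure
print("largest |DFFIT|:", np.abs(dffits).max())
```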