
    Early Detection of Alzheimer's Disease with Blood Plasma Proteins using Support Vector Machines

    The successful development of amyloid-based biomarkers and tests for Alzheimer's disease (AD) represents an important milestone in AD diagnosis. However, two major limitations remain: amyloid-based diagnostic biomarkers and tests provide limited information about the disease process, and they are unable to identify individuals with the disease before significant amyloid-beta accumulation develops in the brain. The objective of this study is to develop a method for identifying potential blood-based non-amyloid biomarkers for early AD detection. The use of blood is attractive because it is accessible and relatively inexpensive. Our method is based mainly on machine learning (ML) techniques (support vector machines in particular) because of their ability to create multivariable models by learning patterns from complex data. Using novel feature selection and evaluation modalities, we identified 5 novel panels of non-amyloid proteins with the potential to serve as biomarkers of early AD. In particular, we found that the combination of A2M, ApoE, BNP, Eot3, RAGE and SGOT may be a key biomarker profile of early disease. Disease detection models based on the identified panels achieved sensitivity (SN) > 80%, specificity (SP) > 70%, and area under the receiver operating characteristic curve (AUC) of at least 0.80 at the prodromal stage of the disease (with higher performance at later stages). Existing ML models performed poorly in comparison at this stage, suggesting that their underlying protein panels may not be suitable for early disease detection. Our results demonstrate the feasibility of early AD detection using non-amyloid biomarkers.
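As a hedged sketch of the evaluation setup described above, an SVM can be trained on a protein panel and scored on sensitivity, specificity, and AUC. The data here is synthetic and the kernel and threshold are illustrative assumptions; the real study used measured plasma protein levels (A2M, ApoE, BNP, Eot3, RAGE, SGOT) and its own feature selection pipeline:

```python
# Synthetic stand-in for a 6-protein panel: class label depends on the
# first three features plus noise. Not the study's data or exact pipeline.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
n, p = 400, 6                      # 400 subjects, 6 proteins in the panel
X = rng.normal(size=(n, p))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)

prob = clf.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
sensitivity = tp / (tp + fn)       # SN: fraction of cases detected
specificity = tn / (tn + fp)       # SP: fraction of controls cleared
auc = roc_auc_score(y_te, prob)
print(f"SN={sensitivity:.2f} SP={specificity:.2f} AUC={auc:.2f}")
```

In the study's setting the feature columns would be the measured protein levels, and evaluation would typically use cross-validation rather than a single split.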

    Development of Quantitative Structure-Property Relationships (QSPR) using calculated descriptors for the prediction of the physico-chemical properties (nD, ρ, bp, ε and η) of a series of organic solvents.

    Quantitative structure-property relationship (QSPR) models were derived for predicting boiling point (at 760 mmHg), density (at 25 °C), viscosity (at 25 °C), static dielectric constant (at 25 °C), and refractive index (at 20 °C) of a series of pure organic solvents of structural formula X-CH2CH2-Y. A very large number of calculated molecular descriptors were derived by quantum chemical methods, molecular topology, and molecular geometry using the CODESSA software package. A comparative analysis of the multiple linear regression techniques (heuristic and best multilinear regression) implemented in CODESSA, with the multivariate PLS/GOLPE method, has been carried out. The performance of the different regression models has been evaluated by the standard deviation of prediction errors, calculated for the compounds of both the training set (internal validation) and the test set (external validation). Satisfactory QSPR models, from both predictive and interpretative points of view, have been obtained for all the studied properties.
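The regression-and-validation loop above can be illustrated with ordinary least squares standing in for CODESSA's heuristic and best-multilinear-regression routines. The descriptors and property values below are synthetic assumptions, not real solvent data:

```python
# QSPR-style workflow: fit a linear model on calculated descriptors and
# report the standard deviation of prediction errors on the training set
# (internal validation) and an external test set (external validation).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, d = 60, 4                        # 60 solvents, 4 selected descriptors
X = rng.normal(size=(n, d))
true_w = np.array([5.0, -3.0, 2.0, 1.0])
y = 100 + X @ true_w + rng.normal(scale=1.5, size=n)   # e.g. boiling point

train, test = np.arange(45), np.arange(45, 60)
model = LinearRegression().fit(X[train], y[train])

s_int = np.std(y[train] - model.predict(X[train]))     # internal validation
s_ext = np.std(y[test] - model.predict(X[test]))       # external validation
print(f"s(train)={s_int:.2f}  s(test)={s_ext:.2f}")
```

A model is judged satisfactory when both error standard deviations are small and comparable, i.e. the fit generalizes beyond the training compounds.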

    Joint covariate selection and joint subspace selection for multiple classification problems

    We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of ℓ2-norms of the blocks of coefficients associated with each covariate across different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates concurrently a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of joint subspace selection, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results exploring joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time.
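The sum-of-ℓ2-norms penalty over per-covariate coefficient blocks has a closed-form proximal operator, blockwise soft-thresholding: a covariate's whole row is zeroed for all tasks unless its joint norm exceeds the regularization level. A minimal numpy sketch (the two-covariate matrix is a made-up example, not from the paper):

```python
# Proximal operator of lam * sum_i ||W[i, :]||_2 (group soft-thresholding).
# Each row of W holds one covariate's coefficients across all tasks.
import numpy as np

def group_soft_threshold(W, lam):
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale                 # rows with norm <= lam become exactly 0

W = np.array([[3.0, 4.0],            # strong covariate (joint norm 5)
              [0.3, 0.4]])           # weak covariate (joint norm 0.5)
W_new = group_soft_threshold(W, lam=1.0)
print(W_new)                         # weak covariate is dropped for ALL tasks
```

This is why the penalty induces the shared sparsity pattern: a covariate enters or leaves the model jointly across every classification problem.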

    Decision models for fast-fashion supply and stocking problems in internet fulfillment warehouses

    Internet technology is being widely used to transform all aspects of the modern supply chain. Specifically, accelerated product flows and widespread information sharing across the supply chain have generated new sets of decision problems. This research addresses two such problems. The first focuses on fast-fashion supply (FFS) chains in which inventory and price are managed in real time to maximize retail cycle revenue. The second is concerned with explosive storage policies in Internet Fulfillment Warehouses (IFW). Fashion products are characterized by short product life cycles and market success uncertainty. An unsuccessful product will often require multiple price discounts to clear the inventory. The first topic proposes a switching solution for fast-fashion retailers who have preordered an initial or block inventory and plan to use channel switching as opposed to multiple discounting steps. The FFS Multi-Channel Switching (MCS) problem then is to monitor real-time demand and store inventory such that at the optimal period the remaining store inventory is sold at clearance and the warehouse inventory is switched to the outlet channel. The objective is to maximize the total revenue. With a linear projection of the moving average demand trend, an estimation of the remaining cycle revenue at any time in the cycle is shown to be a concave function of the switching time. Using a set of conditions, the objective is further simplified into cases. The Linear Moving Average Trend (LMAT) heuristic then prescribes whether a channel switch should be made in the next period. The LMAT is compared with the optimal policy and the No-Switch and Beta-Switch rules. The LMAT performs very well, and the majority of test problems provide a solution within 0.4% of the optimal. This confirms that LMAT can readily and effectively be applied to real-time decision making in an FFS.
An IFW is a facility built and operated exclusively for online retail, and a key differentiator is the explosive storage policy. Breaking with the single-stocking-location tradition, in an IFW small batches of the same stock keeping unit (SKU) are dispersed across the warehouse. Order fulfillment time performance is then closely related to the storage location decision, that is, for every incoming bulk, what is the specific storage location for each batch. Faster fulfillment is possible when SKUs are clustered such that narrow-band picklists can be efficiently generated. Stock location decisions are therefore a function of the demand arrival behavior and correlations with other SKUs. A Joint Item Correlation and Density Oriented (JICDO) Stocking Algorithm is developed and tested. JICDO is formulated to increase the probability that M pickable order items are stocked in a δ band of storage locations. It scans the current inventory dispersion to identify location bands with low SKU density and combines the storage affinity with correlated items. In small-problem testing against a MIP formulation and large-scale testing in a simulator, the JICDO performance is confirmed.
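The linear-moving-average-trend idea behind LMAT can be sketched in a few lines: fit a line to the moving average of recent demand, project remaining full-price sales, and signal a switch when the projection cannot clear the store inventory. The rule and numbers below are an illustrative simplification, not the paper's exact heuristic or data:

```python
# Toy LMAT-style switching signal. Assumptions: a simple moving average,
# a least-squares trend line, and a "projected sales < inventory" trigger.
import numpy as np

def switch_next_period(demand, inventory, periods_left, window=4):
    ma = np.convolve(demand, np.ones(window) / window, mode="valid")
    t = np.arange(len(ma))
    slope, intercept = np.polyfit(t, ma, 1)      # linear moving-average trend
    future = intercept + slope * (t[-1] + 1 + np.arange(periods_left))
    projected_sales = np.clip(future, 0, None).sum()
    return projected_sales < inventory           # cannot clear -> switch

demand = [50, 46, 44, 40, 37, 33, 30, 27]        # declining fashion demand
print(switch_next_period(demand, inventory=400, periods_left=5))
```

With 400 units left and only roughly 109 projected full-price sales, the sketch recommends switching; with 50 units left it does not.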

    The problem of variable selection for financial distress: applying GRASP metaheuristics

    We use the GRASP procedure to select a subset of financial ratios that are then used to estimate a logistic regression model to anticipate financial distress in a sample of Spanish firms. The algorithm we suggest is designed "ad hoc" for this type of variable. Reducing dimensionality has several advantages, such as reducing the cost of data acquisition, better understanding of the final classification model, and increasing efficiency and efficacy. The application of the GRASP procedure to preselect a reduced subset of financial ratios generated better results than those obtained by applying a logistic regression model directly to the set of 141 original financial ratios.
    Keywords: Genetic algorithms, Financial distress, Failure, Financial ratios, Variable selection, GRASP, Metaheuristic
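An illustrative GRASP construction pass for ratio selection looks like this: at each step, candidate features are scored, a restricted candidate list (RCL) keeps those within a fraction α of the best gain, and the next feature is drawn at random from the RCL. The data, scoring function, and parameters below are assumptions for illustration (and the local-search phase of GRASP is omitted for brevity), not the paper's algorithm:

```python
# Greedy randomized construction of a feature subset, scored by the training
# accuracy of a logistic regression on synthetic "financial ratio" data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))                   # 10 candidate ratios
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

def score(subset):
    if not subset:
        return 0.0
    return LogisticRegression().fit(X[:, subset], y).score(X[:, subset], y)

def grasp(k=3, alpha=0.3, iters=5):
    best, best_s = None, -1.0
    for _ in range(iters):
        subset = []
        while len(subset) < k:                   # greedy randomized construction
            cand = [j for j in range(X.shape[1]) if j not in subset]
            gains = np.array([score(subset + [j]) for j in cand])
            cutoff = gains.max() - alpha * (gains.max() - gains.min())
            rcl = [j for j, g in zip(cand, gains) if g >= cutoff]
            subset.append(int(rng.choice(rcl)))
        s = score(subset)
        if s > best_s:
            best, best_s = subset, s
    return sorted(best), best_s

subset, acc = grasp()
print(subset, round(acc, 2))
```

The randomized RCL step is what distinguishes GRASP from pure greedy selection: repeated restarts explore different high-quality subsets instead of committing to one greedy path.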

    Deducing corticotropin-releasing hormone receptor type 1 signaling networks from gene expression data by usage of genetic algorithms and graphical Gaussian models

    Background: Dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis is a hallmark of complex and multifactorial psychiatric diseases such as anxiety and mood disorders. About 50-60% of patients with major depression show HPA axis dysfunction, i.e. hyperactivity and impaired negative feedback regulation. The neuropeptide corticotropin-releasing hormone (CRH) and its receptor type 1 (CRHR1) are key regulators of this neuroendocrine stress axis. Therefore, we analyzed CRH/CRHR1-dependent gene expression data obtained from the pituitary corticotrope cell line AtT-20, a well-established in vitro model for CRHR1-mediated signal transduction. To extract significantly regulated genes from a genome-wide microarray data set and to deduce underlying CRHR1-dependent signaling networks, we combined supervised and unsupervised algorithms.
    Results: We present an efficient variable selection strategy that consecutively applies univariate and multivariate methods followed by graphical models. First, feature preselection was used to exclude genes not differentially regulated over time from the dataset. For multivariate variable selection, a maximum likelihood (MLHD) discriminant function within GALGO, an R package based on a genetic algorithm (GA), was chosen. The topmost genes representing major nodes in the expression network were ranked to find highly separating candidate genes. By using groups of five genes (chromosome size) in the discriminant function and repeating the genetic algorithm separately four times, we found eleven genes occurring in at least three of the top-ranked result lists of the four repetitions. In addition, we compared the results of GA/MLHD with the alternative optimization algorithms greedy selection and simulated annealing, as well as with the state-of-the-art method random forest. In every case we obtained a clear overlap of the selected genes, independently confirming the results of MLHD in combination with a genetic algorithm. With two unsupervised algorithms, principal component analysis and graphical Gaussian models, putative interactions of the candidate genes were determined and reconstructed by literature mining. Differential regulation of six candidate genes was validated by qRT-PCR.
    Conclusions: The combination of supervised and unsupervised algorithms in this study allowed extraction of a small subset of meaningful candidate genes from the genome-wide expression data set. Thereby, variable selection using different optimization algorithms based on linear classifiers as well as the nonlinear random forest method resulted in congruent candidate genes. The calculated interaction network connecting these new target genes was bioinformatically mapped to known CRHR1-dependent signaling pathways. Additionally, the differential expression of the identified target genes was confirmed experimentally.
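A toy version of the GA-based selection can clarify the "chromosomes of five genes" setup: each chromosome is a set of five gene indices, its fitness is the accuracy of a linear discriminant classifier (standing in for the MLHD function), and the population evolves by elitism plus point mutation. The data, population sizes, and mutation scheme are illustrative assumptions, not GALGO itself:

```python
# GA over 5-gene chromosomes scored by linear discriminant accuracy.
# Synthetic expression matrix: genes 0-2 carry the class signal.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n_genes, n_samples = 50, 120
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat([0, 1], n_samples // 2)
X[y == 1, :3] += 1.5                     # differentially regulated genes

def fitness(chrom):
    lda = LinearDiscriminantAnalysis().fit(X[:, chrom], y)
    return lda.score(X[:, chrom], y)

def mutate(chrom):
    new = chrom.copy()
    pos = rng.integers(len(new))
    new[pos] = rng.choice(np.setdiff1d(np.arange(n_genes), new))
    return new

pop = [rng.choice(n_genes, size=5, replace=False) for _ in range(20)]
for _ in range(30):                      # generations
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(p) for p in pop[:10]]   # elitism + mutation

best = max(pop, key=fitness)
print(sorted(best), round(fitness(best), 2))
```

Repeating such runs several times and intersecting the top-ranked chromosomes, as the study does across four GA repetitions, guards against any single run's stochastic bias.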

    Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage

    Peer reviewed. Full text: https://deepblue.lib.umich.edu/bitstream/2027.42/149351/1/mp13497.pdf