195 research outputs found

    A factor analysis model for functional genomics

    BACKGROUND: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories. RESULTS: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene-Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to the current best approach while our total computation time was faster by a factor of 4000. We discuss the unique challenges in performance evaluation of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance. CONCLUSION: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important gene ontology information to improve predictions.

    Weighted Distance Weighted Discrimination and Pairwise Variable Selection for Classification

    Statistical machine learning has attracted a lot of attention in recent years due to its broad applications in various fields. The driving statistical problem that is common throughout this dissertation is classification. This dissertation covers two major topics in classification. The first topic is weighted Distance Weighted Discrimination (weighted DWD or wDWD), an improved version of a recently proposed classification method. We show significant improvements are available in several situations. Using our proposed optimal weighting schemes, we show that wDWD is Fisher consistent under the overall misclassification criterion. In addition, we propose three alternative criteria and provide the corresponding optimal weights or adaptive weighting schemes for each of them. Mathematical validation of these ideas is established through the High-Dimensional, Low Sample-Size (HDLSS) asymptotic properties of wDWD. An important contribution is the weakening of the assumptions from Hall et al. (2005) and Ahn et al. (2007). We then extend the results to two classes. The HDLSS asymptotic properties of wDWD that we discuss here contain two results, one is about the misclassification rate of wDWD, the other explores the angle between the DWD direction and the optimal classification direction. The second topic of this dissertation is variable selection for classification. The goal is to find those variables that have weak marginal effects, but can lead to good classification results when they are viewed jointly. To accomplish this, we use a within-class permutation test called Significance test of Joint Effect (SigJEff). The resulting object of SigJEff is a set of pairs of variables with statistically significant joint effects. 
To extend our scope to joint effects with more than two variables, we introduce a new visualization approach to display the multiscale joint effects, called Multiscale Significance Display (MSD), and a general framework for variable selection procedures based on MSD, called Multiscale Variable Screening (MVS). MSD is a moving window approach, and it evaluates the joint effects of the variables in this window. The moving window is based on an order of variables. MVS seeks to find the best initial ordering in an iterative manner. Doctor of Philosophy

    Random Number Generators

    The quasi-negative-binomial distribution was applied to queuing theory to determine the distribution of the total number of customers served before the queue vanishes under certain assumptions. Some structural properties (probability generating function, convolution, mode and recurrence relation) for the moments of the quasi-negative-binomial distribution are discussed. The distribution's characterization and its relation to other distributions were investigated. A computer program was developed in R to obtain ML estimates, and the distribution was fitted to some observed data sets to test its goodness of fit.

    An M-QAM Signal Modulation Recognition Algorithm in AWGN Channel

    Computing distinct features from the input data before classification adds complexity to Automatic Modulation Classification (AMC) methods, which treat modulation classification as a pattern recognition problem. Although algorithms for Multilevel Quadrature Amplitude Modulation (M-QAM) under different channel scenarios are well documented, a search of the literature indicates that few studies have addressed the classification of high-order M-QAM schemes such as 128-QAM, 256-QAM, 512-QAM, and 1024-QAM. This work investigates the capability of natural-logarithm properties and the possibility of extracting Higher-Order Cumulant (HOC) features from the raw received data. The HOC features were extracted under an Additive White Gaussian Noise (AWGN) channel, with four effective parameters defined to distinguish the modulation types in the set 4-QAM to 1024-QAM. This approach makes the recognizer more intelligent and improves the classification success rate. Simulation results, obtained under statistical models for noisy channels, show that the algorithm recognizes M-QAM signals. Most results were promising and showed that the logarithmic classifier works well over both AWGN and different fading channels, and that it achieves a reliable recognition rate even at low signal-to-noise ratios (less than zero). It can therefore be considered an integrated AMC system for identifying high-order M-QAM signals using a unique logarithmic classifier, offering greater versatility and superior performance compared with previous work on automatic modulation identification systems. Comment: 18 pages
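
For readers unfamiliar with higher-order cumulant features, the following is a minimal sketch of the textbook fourth-order cumulant estimates for a complex baseband signal. It is not the logarithmic classifier described in the abstract, whose four parameters are not given here. The function names `qam_constellation` and `fourth_order_cumulants` are hypothetical, and the constellation helper assumes square M-QAM only (so it does not cover 128-QAM or 512-QAM).

```python
import math


def qam_constellation(m):
    """Return the points of a square M-QAM constellation with unit average power.

    Assumption: m is a perfect square (4, 16, 64, 256, 1024, ...).
    """
    side = math.isqrt(m)
    if side * side != m:
        raise ValueError("this sketch only handles square constellations")
    levels = range(-(side - 1), side, 2)  # e.g. -3, -1, 1, 3 for 16-QAM
    points = [complex(i, q) for i in levels for q in levels]
    power = sum(abs(p) ** 2 for p in points) / len(points)
    scale = math.sqrt(power)
    return [p / scale for p in points]


def fourth_order_cumulants(samples):
    """Estimate the cumulants C40 and C42 of a complex sample sequence.

    Uses the standard moment relations for a zero-mean signal:
      C40 = M40 - 3 * M20^2
      C42 = M42 - |M20|^2 - 2 * M21^2
    where M20 = E[x^2], M21 = E[|x|^2], M40 = E[x^4], M42 = E[|x|^4].
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any residual mean
    m20 = sum(v ** 2 for v in x) / n
    m21 = sum(abs(v) ** 2 for v in x) / n
    m40 = sum(v ** 4 for v in x) / n
    m42 = sum(abs(v) ** 4 for v in x) / n
    c40 = m40 - 3 * m20 ** 2
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c42
```

Applied to the noise-free constellation points themselves, `fourth_order_cumulants(qam_constellation(16))` gives the theoretical values for that scheme; with received samples, additive noise and a finite sample size will perturb the estimates.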

    Preparation of Silver Decorated Reduced Graphene Oxide Nanohybrid for Effective Photocatalytic Degradation of Indigo Carmine Dye

    Background: Even though silver decorated reduced graphene oxide (Ag-rGO) shows maximum absorptivity in the UV region, most of the research on the degradation of dyes using Ag-rGO is in the visible region. Therefore the present work focused on the photocatalytic degradation of indigo carmine (IC) dye in the presence of Ag-rGO as a catalyst under UV light irradiation. Methods: In this context, a silver-decorated reduced graphene oxide hybrid material was fabricated and its potential for the photocatalytic degradation of aqueous IC solution in the UV region was explored. The decoration of Ag nanoparticles on the surface of the rGO nanosheets is evidenced by TEM analysis. The extent of mineralization of the dye was measured by estimating chemical oxygen demand (COD) values before and after irradiation. Results: The synthesized Ag-rGO binary composites displayed excellent photocatalytic activity at 2 × 10⁻⁵ M IC concentration and 5 mg catalyst loading. The optical absorption spectrum of Ag-rGO showed an energy band-gap of 2.27 eV, which is significantly smaller than the band-gap of GO. 5 mg of Ag-rGO was found to be an optimum quantity for the effective degradation of IC dye. The degradation rate increases with the decrease in the concentration of the dye at alkaline pH conditions. The photocatalytic efficiency was 92% for the second time. Conclusion: The impact of the enhanced reactive species generation was consistent with higher photocatalytic dye degradation. The photocatalytic mechanism has been proposed and the hydroxyl radical was found to be the reactive species responsible for the degradation of dye. The feasibility of reusing the photocatalyst showed that the photocatalytic efficiency was very effective for the second time.

    Classification software technique assessment

    A catalog of software options is presented for the use of local user communities to obtain software for analyzing remotely sensed multispectral imagery. The resources required to utilize a particular software program are described. Descriptions of how a particular program analyzes data and the performance of that program for an application and data set provided by the user are shown. An effort is made to establish a statistical performance base for various software programs with regard to different data sets and analysis applications, to determine the status of the state-of-the-art

    Classification of non-heat generating outdoor objects in thermal scenes for autonomous robots

    We have designed and implemented a physics-based adaptive Bayesian pattern classification model that uses a passive thermal infrared imaging system to automatically characterize non-heat generating objects in unstructured outdoor environments for mobile robots. In the context of this research, non-heat generating objects are defined as objects that are not a source for their own emission of thermal energy, and so exclude people, animals, vehicles, etc. The resulting classification model complements an autonomous bot's situational awareness by providing the ability to classify smaller structures commonly found in the immediate operational environment. Since GPS depends on the availability of satellites and onboard terrain maps which are often unable to include enough detail for smaller structures found in an operational environment, bots will require the ability to make decisions such as go through the hedges or go around the brick wall. A thermal infrared imaging modality mounted on a small mobile bot is a favorable choice for receiving enough detailed information to automatically interpret objects at close ranges while unobtrusively traveling alongside pedestrians. The classification of indoor objects and heat generating objects in thermal scenes is a solved problem. A missing and essential piece in the literature has been research involving the automatic characterization of non-heat generating objects in outdoor environments using a thermal infrared imaging modality for mobile bots. Seeking to classify non-heat generating objects in outdoor environments using a thermal infrared imaging system is a complex problem due to the variation of radiance emitted from the objects as a result of the diurnal cycle of solar energy.
The model that we present will allow bots to see beyond vision to autonomously assess the physical nature of the surrounding structures for making decisions without the need for an interpretation by humans. Our approach is an application of Bayesian statistical pattern classification where learning involves labeled classes of data (supervised classification), assumes no formal structure regarding the density of the data in the classes (nonparametric density estimation), and makes direct use of prior knowledge regarding an object class's existence in a bot's immediate area of operation when making decisions regarding class assignments for unknown objects. We have used a mobile bot to systematically capture thermal infrared imagery for two categories of non-heat generating objects (extended and compact) in several different geographic locations. The extended objects consist of objects that extend beyond the thermal camera's field of view, such as brick walls, hedges, picket fences, and wood walls. The compact objects consist of objects that are within the thermal camera's field of view, such as steel poles and trees. We used these large representative data sets to explore the behavior of thermal-physical features generated from the signals emitted by the classes of objects and design our Adaptive Bayesian Classification Model. We demonstrate that our novel classification model not only displays exceptional performance in characterizing non-heat generating outdoor objects in thermal scenes but it also outperforms the traditional KNN and Parzen classifiers.
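
The three ingredients named in this abstract (labeled training data, nonparametric density estimation, and class priors) can be illustrated with a generic Parzen-window Bayes rule. This is a minimal sketch of that baseline idea only, not the authors' Adaptive Bayesian Classification Model; the function names, the Gaussian kernel, the `bandwidth` parameter, and the example feature vectors are all assumptions introduced for illustration.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Gaussian-kernel Parzen estimate of the class density at point x.

    x and each entry of samples are equal-length tuples of feature values.
    """
    d = len(x)
    norm = (2 * math.pi * bandwidth ** 2) ** (d / 2)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / (len(samples) * norm)


def classify(x, training, priors, bandwidth=1.0):
    """Return (best_label, posteriors) using Bayes' rule.

    training maps each class label to its list of labeled feature vectors.
    priors maps each class label to the prior probability that the class
    is present in the current area of operation.
    """
    scores = {
        label: priors[label] * parzen_density(x, samples, bandwidth)
        for label, samples in training.items()
    }
    evidence = sum(scores.values())
    if evidence > 0:
        posteriors = {label: s / evidence for label, s in scores.items()}
    else:
        # No kernel mass anywhere near x: fall back to the priors.
        posteriors = dict(priors)
    best = max(posteriors, key=posteriors.get)
    return best, posteriors
```

For example, with hypothetical one-feature training data `{"hedge": [(0.0,)], "brick wall": [(2.0,)]}` and a query at `(1.0,)`, the two class densities are equal, so the decision is driven entirely by the priors. This is the sense in which prior knowledge about which objects exist in an area can change the class assignment.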

    Vol. 16, No. 2 (Full Issue)


    Penalized mixed-effects ordinal response models for high-dimensional genomic data in twins and families

    The Brisbane Longitudinal Twin Study (BLTS) was conducted in Australia and was funded by the US National Institute on Drug Abuse (NIDA). Adolescent twins were sampled as a part of this study and surveyed about their substance use as part of the Pathways to Cannabis Use, Abuse and Dependence project. The methods developed in this dissertation were designed for the purpose of analyzing a subset of the Pathways data that includes demographics, cannabis use metrics, personality measures, and imputed genotypes (SNPs) for 493 complete twin pairs (986 subjects). The primary goal was to determine what combination of SNPs and additional covariates may predict cannabis use, measured on an ordinal scale as: “never tried,” “used moderately,” or “used frequently”. To conduct this analysis, we extended the ordinal Generalized Monotone Incremental Forward Stagewise (GMIFS) method for mixed models. This extension includes allowance for an unpenalized set of covariates to be coerced into the model as well as flexibility for user-specified correlation patterns between twins in a family. The proposed methods are applicable to high-dimensional (genomic or otherwise) data with ordinal response and specific, known covariance structure within clusters.

    A DISCOURSE ON CHILD MALNUTRITION: ANTHROPOMETRY, EMERGENT THEMES, QUALITY CONTROL MAXIMS, AND CLIMATIC AND ECONOMIC DETERMINANTS

    Malnutrition is a detrimental and significant plight for young children, responsible for 45% of all deaths among children worldwide. The aim of my dissertation is to assess the history of the science of anthropometry, synthesize the cumulative findings within the contemporary child malnutrition literature, dispute certain quality control maxims of anthropometric child-health surveys, and quantify the responsible latent factors of child malnutrition. These efforts are in service of a better characterization of malnutrition, a more reliable estimate of how many children are malnourished, and a better understanding of the geographical distribution and dynamic stochastic characteristics of malnutrition. It is essential to better understand malnutrition and its causes to suggest appropriate corrective policy. This dissertation consists of four principal essays, each from a unique conceptual perspective. The first essay is a historical and epistemological perspective of the science of anthropometry. I contextualize the legacy of child malnutrition efforts, including the link between eugenics and contemporary notions of “normal” child growth, the institutional power-struggle for child growth chart superiority, the obfuscated distinction between growth references and standards of growth, and the consequences of universal standards that do not reflect observable populations. The second essay is a systematic review of the literature, the largest of its kind to date. I synthesize 184 disaggregate empirical studies of the determinants of child malnutrition in Africa published since 1990. I find numerous opportunities for development within this corpus, in particular opportunities to enrich the scope, scale, and quantification of the field. The third essay is an analytical perspective on the quality control mechanisms applied to anthropometric surveys. 
I challenge the practice of rejecting datasets based on overlarge z-score standard deviation values and offer an alternative approach. The fourth essay is an econometric empirical analysis in Kenya and Nigeria of child malnutrition determinants. I use spatial Bayesian kriging and four-level random intercept hierarchical logit models to show the spatial heterogeneity of malnutrition prevalence, and to quantify various socio-economic and climatic determinants of child malnutrition. I find significant spatial and hierarchical relationships and determinants, which can move malnutrition rates by over 50%