970 research outputs found

    A Cluster Elastic Net for Multivariate Regression

    We propose a method for estimating coefficients in multivariate regression when the response variables have a clustering structure. The proposed method includes a fusion penalty, which shrinks the differences in fitted values between responses in the same cluster, and an L1 penalty for simultaneous variable selection and estimation. The method can be used whether the grouping structure of the response variables is known or unknown. When the clustering structure is unknown, the method simultaneously estimates the clusters of the responses and the regression coefficients. Theoretical results are presented for the penalized least squares case, including asymptotic results allowing for p >> n. We extend our method to the setting where the responses are binomial variables. We propose a coordinate descent algorithm for both the normal and binomial likelihoods, which can easily be extended to other generalized linear model (GLM) settings. Simulations and data examples from business operations and genomics are presented to show the merits of both the least squares and binomial methods. Comment: 37 pages, 11 figures
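
The abstract above describes a least squares loss combined with an L1 penalty and a fusion penalty on fitted values within each response cluster. The following is a minimal sketch of how such an objective could be evaluated, not the authors' implementation. The function names, the argument layout, and the scaling constants (1/(2n), gamma/2, and division by cluster size) are assumptions made for illustration only.

```python
from itertools import combinations


def fitted_values(X, B):
    """Return the n x r matrix of fitted values X @ B using plain lists."""
    n, p, r = len(X), len(B), len(B[0])
    return [[sum(X[i][j] * B[j][k] for j in range(p)) for k in range(r)]
            for i in range(n)]


def cluster_fused_objective(X, Y, B, clusters, lam, gamma):
    """Least squares loss + L1 penalty + within-cluster fusion penalty.

    X: n x p predictors, Y: n x r responses, B: p x r coefficients.
    clusters: list of lists of response (column) indices.
    The scaling constants used here are assumptions for this sketch.
    """
    n, r = len(X), len(B[0])
    F = fitted_values(X, B)
    # Squared-error loss summed over all observations and responses.
    loss = sum((Y[i][k] - F[i][k]) ** 2
               for i in range(n) for k in range(r)) / (2 * n)
    # L1 penalty on every coefficient, for variable selection.
    l1 = lam * sum(abs(b) for row in B for b in row)
    # Fusion penalty: squared distance between fitted values of each
    # pair of responses that share a cluster.
    fusion = 0.0
    for members in clusters:
        for l, m in combinations(members, 2):
            fusion += sum((F[i][l] - F[i][m]) ** 2
                          for i in range(n)) / len(members)
    return loss + l1 + 0.5 * gamma * fusion
```

When two responses in the same cluster have identical fitted values, the fusion term contributes nothing, which is the shrinkage target the abstract describes.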

    Ridge Fusion in Statistical Learning

    We propose a penalized likelihood method to jointly estimate multiple precision matrices for use in quadratic discriminant analysis and model-based clustering. A ridge penalty and a ridge fusion penalty are used to introduce shrinkage and promote similarity between precision matrix estimates. Block-wise coordinate descent is used for optimization, and validation likelihood is used for tuning parameter selection. Our method is applied in quadratic discriminant analysis and semi-supervised model-based clustering. Comment: 24 pages, 9 tables, 3 figures

    On the Use of Minimum Penalties in Statistical Learning

    Modern multivariate machine learning and statistical methodologies estimate parameters of interest while leveraging prior knowledge of the association between outcome variables. The methods that do allow for estimation of relationships typically do so through an error covariance matrix in multivariate regression, which does not scale to other types of models. In this article we propose the MinPen framework to simultaneously estimate the regression coefficients of the multivariate regression model and the relationships between outcome variables under mild assumptions. The MinPen framework utilizes a novel penalty based on the minimum function to exploit detected relationships between responses. An iterative algorithm that generalizes current state-of-the-art methods is proposed as a solution to the non-convex optimization required to obtain estimates. Theoretical results such as high-dimensional convergence rates, model selection consistency, and a framework for post-selection inference are provided. We extend the proposed MinPen framework to other exponential family loss functions, with a specific focus on multiple binomial responses. Tuning parameter selection is also addressed. Finally, simulations and two data examples are presented to show the finite sample properties of this framework.

    Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data

    The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
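
The experimental design in the abstract above, training on progressively smaller samples and scoring each model on the same held-out data, can be sketched as follows. This is an illustrative toy only: it uses synthetic two-class 2-D data and a hand-written 1-nearest-neighbor classifier as a stand-in, not the remotely sensed data or the tuned classifiers from the study. All function names and parameter values are hypothetical.

```python
import random


def make_data(n, rng):
    """Draw n labeled points from two overlapping 2-D Gaussian classes (toy data)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def predict_1nn(train, point):
    """Return the label of the training point nearest to `point`."""
    nearest = min(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy_by_training_size(sizes, n_test=200, seed=0):
    """Map each training set size to overall accuracy on one fixed test set."""
    rng = random.Random(seed)
    pool = make_data(max(sizes), rng)   # full training pool
    test = make_data(n_test, rng)       # held-out data shared by all sizes
    results = {}
    for size in sizes:
        train = rng.sample(pool, size)  # random subsample of the pool
        correct = sum(predict_1nn(train, x) == y for x, y in test)
        results[size] = correct / n_test
    return results


if __name__ == "__main__":
    for size, acc in accuracy_by_training_size([40, 315, 1000]).items():
        print(size, round(acc, 3))
```

Keeping the test set fixed across sizes means that differences in accuracy reflect the training sample rather than the evaluation sample.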

    Investigating the Present Day Cosmic Dust Flux at the Earth's Surface: Initial Results from the Kwajalein Micrometeorite Collection

    Examination of impact craters on the Long Duration Exposure Facility satellite indicates a present day micrometeoroid flux of approx. 30,000 tonnes [1 after 2]. But what portion of this material arrives at the Earth's surface as micrometeorites? Studies of available micrometeorite collections from deep sea sediments [e.g. 3], Greenland blue ice [e.g. 4] and the South Pole water well [e.g. 1] may be complicated by terrestrial weathering and, in some cases, collection bias (magnetic separation for deep sea sediments) and poorly constrained ages. We have recently set up a micrometeorite collection station on Kwajalein Island in the Republic of the Marshall Islands in the Pacific Ocean, using high volume air samplers to collect particles directly from the atmosphere. By collecting in this way, the terrestrial age of the particles is known, the weathering they experience is minimal, and we are able to constrain particle arrival times. Collecting at this location also exploits the considerably reduced anthropogenic background [5]. Method: High volume air samplers were installed on top of the two-story airport building on Kwajalein. These were fitted with polycarbonate membrane filters with 5 µm diameter perforations. The flow rates were set to 0.5 m³/min, and filters were changed once a week. After collection, filters were washed to remove salt and concentrate particles [see 5] in preparation for analysis by SEM. Results and Discussion: A selection of filters has been prepared and surveyed. Due to their ease of identification, our initial investigations have focused on particles resembling cosmic spherules. The spheres can be divided into three main groups: 1. Silicate spherules rich in Al, Ca, K and Na (to varying degrees), 2. Silicate spherules rich in Mg and Fe and 3. Fe-rich spherules. Group 1 spherules are often vesiculated and can occur as aggregates. They are similar in appearance and composition to volcanic microspheres [e.g. 6] and are thus likely terrestrial in origin (volcanic). Those of groups 2 and 3, however, typically exhibit quenched surface textures consistent with cosmic spherules. Initial results suggest there is significant variation in the abundance of these groups from filter to filter. Work is ongoing to fully characterize these spherules and to constrain their flux with time.
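
As a rough worked calculation, the flow rate and weekly filter change stated in the abstract above imply the volume of air drawn through each filter. The sketch below assumes a sampler runs continuously for the full week, which the abstract does not state. The constant and function names are illustrative.

```python
FLOW_RATE_M3_PER_MIN = 0.5        # flow rate given in the abstract
MINUTES_PER_WEEK = 60 * 24 * 7    # assumption: continuous operation for 7 days


def sampled_volume_m3(flow_rate_m3_per_min, minutes):
    """Volume of air passed through a filter at a constant flow rate."""
    return flow_rate_m3_per_min * minutes


weekly_volume = sampled_volume_m3(FLOW_RATE_M3_PER_MIN, MINUTES_PER_WEEK)
print(weekly_volume)  # 5040.0 cubic meters per filter
```

Any downtime during the week would reduce this figure proportionally.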

    Analysis of Cosmic Spherule Candidates from the Kwajalein Micrometeorite Collection

    The Kwajalein micrometeorite collection utilised high volume air samplers fitted with 5 micrometer laser-etched polycarbonate membrane filters to capture particles directly from the atmosphere. The filters were changed weekly over several months throughout 2011/12, providing the opportunity to investigate the contemporary flux of micrometeorites. We recently reported the results of our initial survey of cosmic spherule-like particles on several of these filters. We identified three main groups of particles based on bulk compositions: 1. Silicate spherules rich in Mg, Ca and Fe, 2. Silicate spherules rich in Al, Ca, K and/or Na and 3. Fe-rich spherules. Abundances appeared to change over time, suggesting links with celestial activity (e.g. meteor showers). However, spherules similar to groups 2 and 3 can be produced by terrestrial and anthropogenic activity (e.g. volcanic microspherules exhibit similar compositions to group 2 spherules, and metallic spherules similar to those of group 3 can be formed during fuel combustion). We are now studying the internal structures and chemistries of these spherules and comparing them against cosmic spherules identified in other collections to confirm their origins and further constrain the contemporary micrometeorite flux. Particles are being picked, embedded in resin and polished through to reveal their interiors. Here we will describe our ongoing analyses of these particles via SEM. We will also introduce our new collection using this method that is currently being performed in the Antarctic.

    Emotional valence and arousal affect reading in an interactive way: neuroimaging evidence for an approach-withdrawal framework

    A growing body of literature shows that the emotional content of verbal material affects reading, wherein emotional words are given processing priority compared to neutral words. Human emotions can be conceptualised within a two-dimensional model comprising emotional valence and arousal (intensity). These variables are at least in part distinct, but recent studies report interactive effects during implicit emotion processing and relate these to stimulus-evoked approach-withdrawal tendencies. The aim of the present study was to explore how valence and arousal interact at the neural level during implicit emotion word processing. The emotional attributes of written word stimuli were orthogonally manipulated based on behavioural ratings from a corpus of emotion words. Stimuli were presented during an fMRI experiment while 16 participants performed a lexical decision task, which did not require explicit evaluation of a word's emotional content. Results showed greater neural activation within right insular cortex in response to stimuli evoking conflicting approach-withdrawal tendencies (i.e., positive high-arousal and negative low-arousal words) compared to stimuli evoking congruent approach vs. withdrawal tendencies (i.e., positive low-arousal and negative high-arousal words). Further, a significant cluster of activation in the left extra-striate cortex was found in response to emotional compared with neutral words, suggesting enhanced perceptual processing of emotionally salient stimuli. These findings support an interactive two-dimensional approach to the study of emotion word recognition and suggest that the integration of valence and arousal dimensions recruits a brain region associated with interoception, emotional awareness and sympathetic functions.

    Long-lived photoexcited states in polydiacetylenes with different molecular and supramolecular organization

    With the aim of determining the importance of the molecular and supramolecular organization on the excited states of polydiacetylenes, we have studied the photoinduced absorption spectra of the red form of poly[1,6-bis(3,6-didodecyl-N-carbazolyl)-2,4-hexadiyne] (polyDCHD-S) and compared the results with those of the blue form of the same polymer. An interpretation of the data is given in terms of both the conjugation length and the interbackbone separation, also in relation to the photoinduced absorption spectra of both the blue and red forms of poly[1,6-bis(N-carbazolyl)-2,4-hexadiyne] (polyDCHD), which does not carry the alkyl substituents on the carbazolyl side groups. Information on the above properties is derived from the analysis of the absorption and Raman spectra of this class of polydiacetylenes.