4,005 research outputs found

    The NWRA Classification Infrastructure: Description and Extension to the Discriminant Analysis Flare Forecasting System (DAFFS)

    A classification infrastructure built upon Discriminant Analysis has been developed at NorthWest Research Associates for examining the statistical differences between samples of two known populations. The infrastructure originated in studies of the physical differences between flare-quiet and flare-imminent solar active regions; we describe herein some of its details, including: parametrization of large datasets, schemes for handling "null" and "bad" data in multi-parameter analysis, application of non-parametric multi-dimensional Discriminant Analysis, an extension through Bayes' theorem to probabilistic classification, and methods invoked for evaluating classifier success. The classifier infrastructure is applicable to a wide range of scientific questions in solar physics. We demonstrate its application to the question of distinguishing flare-imminent from flare-quiet solar active regions, updating results from the original publications that were based on different data and much smaller sample sizes. Finally, as a demonstration of "Research to Operations" efforts in the space-weather forecasting context, we present the Discriminant Analysis Flare Forecasting System (DAFFS), a near-real-time operationally-running solar flare forecasting tool that was developed from the research-directed infrastructure. Comment: J. Space Weather Space Climate: Accepted / in press; access supplementary materials through journal; some figures are less than full resolution for arXi

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities that otherwise could not be exploited. For example, an oracle able to predict the behavior of a certain application running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while reducing energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques for prediction and classification in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help readers interested in employing prediction in the optimization of multicore processor systems.

    Multi-Temporal Remote-Sensing-based Mapping and Characterization of Landscape Evolution of a Meandering River Floodplain

    Large meandering river floodplains are critical components of the Earth ecosystems for their high biodiversity and productivity. However, it is challenging to study these regions because of their complex land-covers and dynamic surface processes. This study applies soft classification and change-detection analysis to five Landsat 5 Thematic Mapper (TM) satellite images to examine long-term surface-cover composition and configuration change of the Rio Beni floodplain in Bolivia from 1987 to 2006. One hard/crisp classification algorithm (i.e., ISODATA) and two soft classification algorithms (i.e., Bayes classification and fuzzy classification) were applied to the study-area satellite images to examine the performances of classifying and mapping meandering river-floodplain environments between hard and soft classification approaches. In all five scenes, three algorithms achieved ~90% classification accuracy via hard classification outputs. However, the two soft algorithms were of more utility in this study because their results were less affected by “salt-and-pepper” noise and provided extra land-cover probability/membership layers. A novel change-detection algorithm was proposed in this study, namely Modified Change Vector Analysis (MCVA). The MCVA operated in fuzzy-membership space, considered change uncertainty during the thresholding stage, and utilized change-vector directions to modify the determination of change/no-change status for each pixel. A fuzzy Markov Random Field (FMRF) model was applied to further refine the change maps by incorporating spatial change uncertainty. A second thresholding stage was also applied to separate a type of change referred to as “transitional change,” which preserved fuzzy membership information and provided a concise map output. Compared with three traditional change-detection algorithms, the MCVA achieved higher change-detection accuracy and provided more detailed change dynamics regarding the land-surface change. 
Dynamics of major floodplain cover types (i.e., oxbow lakes, river, sand, forest, non-forest vegetation, and dry and wet soil) were investigated via multi-temporal analysis. Over the observing period of 1987 to 2006, 74.4% of pixels retained the same land-cover, 20% experienced clear land-cover change, and 5.6% experienced transitional land-cover change. The riparian area experienced more dramatic change than other parts of the Rio Beni floodplain during this period. Additional analysis of landscape metrics provided information regarding the spatial patterns of the land-cover, but future work would be needed to further examine its utility in understanding floodplain dynamics. This study provides information on remote-sensing-based mapping and quantitative characterization methods for meandering river floodplains. The spatiotemporal patterns of landscape on the Rio Beni floodplain can be used in sustainable management and protection of floodplain ecosystems.

    Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

    Future wireless networks have substantial potential for supporting a broad range of complex, compelling applications in both military and civilian fields, where users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks. Comment: 46 pages, 22 fig

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries and imbalanced classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques of data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results. Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

    Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We demonstrate how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy when fairness constraints are applied and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and investigate fairness risks in data with missing values. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination on standard (overused) tabular datasets. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination

    Spectral Detection of Acute Mental Stress with VIS-SWIR Hyperspectral Imagery

    The ability to identify a stressed person is becoming important across different work environments. Especially in higher-stress career fields, such as first responders and air traffic controllers, mental stress can inhibit a person's ability to accomplish their job. A person's efficiency and psychological state in the work environment can be impeded by poor mental health. Stress can result in harmful effects on the body, both physically and mentally, including depression, lack of sleep, and fatigue, which can lead to reduced work productivity. Research is being conducted to detect stress in workload-intensive environments. This thesis implements an imaging approach that utilizes hyperspectral data across the visible through shortwave infrared electromagnetic spectrum. The feature selection algorithms ReliefF, Support Vector Machine Attribute Evaluator (SVM AE), and Non-Correlated Aided Simulated Annealing Feature Selection-Integrated Distribution Function (NASAFS-IDF) are applied to the data to obtain features that discriminate between the two classes, stress and non-stress. The data is then classified using naive Bayes, Support Vector Machine (SVM), and decision tree methodologies. The feature set and classifier that produce the best classification results are identified using percent accuracy and area under the curve (AUC). The reported results are divided into contact and non-contact (NC) validation sets. The contact validation returned a high accuracy of 96.30% and a high AUC of 0.979. Validation on NC models returned a high accuracy of 99.64% and a high AUC of 0.998.

    Feature Selection for Identification of Transcriptome and Clinical Biomarkers for Relapse in Colon Cancer

    This study attempts to find good predictive biomarkers for recurrence in colon cancer across two data sources of both mRNA and miRNA expression from frozen tumor samples. In total, four datasets (two data sources and two data types) were examined: mRNA TCGA (n=446), miRNA TCGA (n=416), mRNA HDS (n=79), and miRNA HDS (n=128). The intersection of the feature spaces of the two data sources was used in the analysis so that models trained on one data source could be tested on the other. A set of wrapper and filter methods was applied to each dataset separately to perform feature selection, and from each model the k best features were selected, where k is taken from a list of set numbers between 2 and 250. A randomized grid search was used to optimize four classifiers over their hyperparameter space, where an additional hyperparameter was the feature selection method used. All models were trained with cross validation and tested on the other data source to determine generalization. Most models failed to generalize to the other data source, showing clear signs of overfitting. Furthermore, there was next to no overlap between the features selected from one data source and those selected from the other, indicating that the underlying feature distribution differed between the two sources, which is shown to be the case in a few examples. The best generalizing models were based on clinical information, and the second best on the combined feature space of mRNA and miRNA data. Master's Thesis in Informatics, INF399, MAMN-PROG, MAMN-IN