Search CORE

202,663 research outputs found

Identifying a Minimal Class of Models for High-dimensional Data

Author: Daniel Nevo
Ya &apos
Publication venue
Publication date: 01/01/2017
Field of study

Abstract Model selection consistency in the high-dimensional regression setting can be achieved only if strong assumptions are fulfilled. We therefore suggest to pursue a different goal, which we call a minimal class of models. The minimal class of models includes models that are similar in their prediction accuracy but not necessarily in their elements. We suggest a random search algorithm to reveal candidate models. The algorithm implements simulated annealing while using a score for each predictor that we suggest to derive using a combination of the lasso and the elastic net. The utility of using a minimal class of models is demonstrated in the analysis of two data sets

CiteSeerX

A survey of outlier detection methodologies

Author: Austin J.
Hodge V.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004
Field of study

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review

CiteSeerX

Crossref

White Rose Research Online

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

What are the Best Hierarchical Descriptors for Complex Networks?

Author: Abe S
Clauset A Moore C Newman M E
Costa L da F
Costa L da F Kaiser M Hilgetag C
Duda R O
Erdos P
Han J
Hand D
Luciano da Fontoura Costa
McLachlan G J
Roberto Fernandes Silva Andrade
Wasserman S
Publication venue: 'IOP Publishing'
Publication date: 29/05/2007
Field of study

This work reviews several hierarchical measurements of the topology of complex networks and then applies feature selection concepts and methods in order to quantify the relative importance of each measurement with respect to the discrimination between four representative theoretical network models, namely Erd\"{o}s-R\'enyi, Barab\'asi-Albert, Watts-Strogatz as well as a geographical type of network. The obtained results confirmed that the four models can be well-separated by using a combination of measurements. In addition, the relative contribution of each considered feature for the overall discrimination of the models was quantified in terms of the respective weights in the canonical projection into two dimensions, with the traditional clustering coefficient, hierarchical clustering coefficient and neighborhood clustering coefficient resulting particularly effective. Interestingly, the average shortest path length and hierarchical node degrees contributed little for the separation of the four network models.Comment: 9 pages, 4 figure

arXiv.org e-Print Archive

Crossref