3,626 research outputs found

    Harnessing machine learning for fiber-induced nonlinearity mitigation in long-haul coherent optical OFDM

    © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Coherent optical orthogonal frequency division multiplexing (CO-OFDM) has attracted a lot of interest in optical fiber communications due to its simplified digital signal processing (DSP) units, high spectral efficiency, flexibility, and tolerance to linear impairments. However, CO-OFDM's high peak-to-average power ratio imposes high vulnerability to fiber-induced non-linearities. DSP-based machine learning has been considered as a promising approach for fiber non-linearity compensation without sacrificing computational complexity. In this paper, we review the existing machine learning approaches for CO-OFDM in a common framework and review the progress in this area with a focus on practical aspects and comparison with benchmark DSP solutions.

    Comparison of ν-support vector regression and logistic equation for descriptive modeling of Lactobacillus plantarum growth

    Due to the complexity and high non-linearity of bioprocesses, most simple mathematical models fail to describe the exact behavior of biochemical systems. As a novel type of learning method, support vector regression (SVR) has a strong capability for problems involving small samples, nonlinearity, high dimensionality and local minima. In this paper, we developed a ν-SVR model with genetic algorithms (GA) for pre-estimation in Lactobacillus plantarum fermentation, and compared the predictive capability of the logistic model and the SVR model. A 5-fold cross-validation technique was applied in SVR training to avoid over-fitting. The SVR parameters were obtained at generation 150, and the optimal parameters were C = 235.8935, σ = 8.3608, ν = 0.7587. Correspondingly, the logistic model parameters μmax and xmax were estimated as 0.4791 and 0.3498, respectively. The experimental results demonstrated that the SVR model outperformed the logistic model based on the normalized mean square error (NMSE), mean absolute percentage error (MAPE) and the Pearson correlation coefficient R. We found that the ν-SVR model optimized by genetic algorithms could be a potential monitoring method for prediction of biomass. Key words: Support vector regression, genetic algorithm, logistic model, prediction of biomass

    The permafrost carbon inventory on the Tibetan Plateau: a new evaluation using deep sediment cores

    Acknowledgements: We are grateful to Dr. Jens Strauss and the other two anonymous reviewers for their insightful comments on an earlier version of this manuscript, and we appreciate the members of the IBCAS Sampling Campaign Teams for their assistance in field investigation. This work was supported by the National Basic Research Program of China on Global Change (2014CB954001 and 2015CB954201), National Natural Science Foundation of China (31322011 and 41371213), and the Thousand Young Talents Program.

    Mapping the geogenic radon potential for Germany by machine learning

    The radioactive gas radon (Rn) is considered an indoor air pollutant due to its detrimental effects on human health. In fact, exposure to Rn is among the most important causes of lung cancer after tobacco smoking. The dominant source of indoor Rn is the ground beneath the house. The geogenic Rn potential (GRP), a function of soil gas Rn concentration and soil gas permeability, quantifies what “earth delivers in terms of Rn” and represents a hazard indicator for elevated indoor Rn concentration. In this study, we aim at developing an improved spatial continuous GRP map based on 4448 field measurements of GRP distributed across Germany. We fitted three different machine learning algorithms, multivariate adaptive regression splines, random forest and support vector machines, utilizing 36 candidate predictors. Predictor selection, hyperparameter tuning and performance assessment were conducted using a spatial cross-validation where the data was iteratively left out by spatial blocks of 40 km × 40 km. This procedure counteracts the effect of spatial auto-correlation in predictor and response data and minimizes dependence of training and test data. The spatial cross-validated performance statistics revealed that random forest provided the most accurate predictions. The predictors selected as informative reflect geology, climate (temperature, precipitation and soil moisture), soil hydraulic, soil physical (field capacity, coarse fraction) and soil chemical properties (potassium and nitrogen concentration). Model interpretation techniques such as predictor importance as well as partial and spatial dependence plots confirmed the hypothesized dominant effect of geology on GRP, but also revealed significant contributions of the other predictors. Partial and spatial dependence plots gave further valuable insight into the quantitative predictor-response relationship and its spatial distribution. A comparison with a previous version of the German GRP map using 1359 independent test data indicates a significantly better performance of the random forest based map.
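The blocked spatial cross-validation described in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes planar coordinates in kilometres, holds out exactly one 40 km block per fold (the abstract does not say how blocks were grouped), and the names `block_id`, `spatial_block_folds` and the sample coordinates are hypothetical.

```python
from collections import defaultdict
from math import floor

BLOCK_KM = 40.0  # block edge length stated in the abstract


def block_id(x_km, y_km, size=BLOCK_KM):
    """Return the (column, row) index of the square block containing a point."""
    return (floor(x_km / size), floor(y_km / size))


def spatial_block_folds(points, size=BLOCK_KM):
    """Yield (held_out_block, train_indices, test_indices), one fold per block."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(points):
        blocks[block_id(x, y, size)].append(i)
    for held_out, test_idx in sorted(blocks.items()):
        # Training data are all points that lie outside the held-out block.
        train_idx = sorted(
            i for b, idx in blocks.items() if b != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Hypothetical measurement coordinates in km (illustration only).
points = [(5.0, 5.0), (12.0, 30.0), (45.0, 10.0), (79.9, 39.9), (10.0, 85.0)]
folds = list(spatial_block_folds(points))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because whole blocks are withheld, nearby (and therefore correlated) measurements never end up split between the training and test sets of the same fold.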

    When less is more: How increasing the complexity of machine learning strategies for geothermal energy assessments may not lead toward better estimates

    Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized data-driven methods and expert decisions to estimate resource favorability. Although expert decisions can add confidence to the modeling process by ensuring reasonable models are employed, expert decisions also introduce human and, thereby, model bias. This bias can present a source of error that reduces the predictive performance of the models and confidence in the resulting resource estimates. Our study aims to develop robust data-driven methods with the goals of reducing bias and improving predictive ability. We present and compare nine favorability maps for geothermal resources in the western United States using data from the U.S. Geological Survey's 2008 geothermal resource assessment. Two favorability maps are created using the expert decision-dependent methods from the 2008 assessment (i.e., weight-of-evidence and logistic regression). With the same data, we then create six different favorability maps using logistic regression (without underlying expert decisions), XGBoost, and support-vector machines paired with two training strategies. The training strategies are customized to address the inherent challenges of applying machine learning to the geothermal training data, which have no negative examples and severe class imbalance. We also create another favorability map using an artificial neural network. We demonstrate that modern machine learning approaches can improve upon systems built with expert decisions. We also find that XGBoost, a non-linear algorithm, produces greater agreement with the 2008 results than linear logistic regression without expert decisions, because the expert decisions in the 2008 assessment rendered the otherwise linear approaches non-linear despite the fact that the 2008 assessment used only linear methods. The F1 scores for all approaches appear low (F1 score < 0.10), do not improve with increasing model complexity, and, therefore, indicate the fundamental limitations of the input features (i.e., training data). Until improved feature data are incorporated into the assessment process, simple non-linear algorithms (e.g., XGBoost) perform equally well or better than more complex methods (e.g., artificial neural networks) and remain easier to interpret.
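To show how an F1 score below 0.10 can arise under severe class imbalance, here is a small sketch that computes F1 from confusion-matrix counts. The counts are invented for illustration and are not taken from the study; `f1_score` here is a hypothetical helper, not a library function.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall, from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 100 truly positive cells, of which 80 are found,
# but 1920 negative cells are also flagged as positive.
tp, fn, fp = 80, 20, 1920
score = f1_score(tp, fp, fn)
print(f"precision={tp / (tp + fp):.3f} recall={tp / (tp + fn):.3f} F1={score:.3f}")
```

With these assumed counts, recall is high (0.80) but precision is only 0.04, so the F1 score stays below 0.10: when positives are rare, even a modest false-positive rate dominates the score.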

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues