    Computational methods reveal novel functionalities of PIWI-interacting RNAs in human papillomavirus-induced head and neck squamous cell carcinoma.

    Human papillomavirus (HPV) infection is today the fastest-growing cause of head and neck squamous cell carcinoma (HNSCC), but its role in malignant transformation remains unclear. This study aimed to comprehensively investigate PIWI-interacting RNA (piRNA) alterations and functions in HPV-induced HNSCC. Using 77 RNA-sequencing datasets from TCGA, we examined differential piRNA expression between HPV16(+) HNSCC and HPV(-) normal samples and identified a panel of 30 HPV-dysregulated piRNAs. We then computationally investigated the potential mechanistic significance of these transcripts in HPV-induced HNSCC and found that the panel associates with the protein PIWIL4 and with the RTL family of retrotransposon-like genes, possibly through direct binding interactions. Several HPV-dysregulated transcripts also correlated with well-documented mutations and copy number variations in HNSCC, as well as with HNSCC clinical variables, suggesting that these piRNAs may play important roles in large-scale modulation of HNSCC in addition to their direct, smaller-scale interactions in this malignancy. The differential expression of key piRNAs, including NONHSAT077364, NONHSAT102574, and NONHSAT128479, was verified in vitro by evaluating endogenous expression in HPV(+) cancer vs. HPV(-) normal cell lines. Overall, this study provides a rigorous investigation of piRNA dysregulation in HPV-related HNSCC and lends insight into the idea that these small regulatory transcripts may play crucial and previously unidentified roles in tumor pathogenesis and progression.
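The abstract does not state how differential expression was scored. The sketch below is a minimal illustration only, assuming counts-per-million (CPM) normalization and a log2 fold-change cutoff with a pseudocount; the function names, the boolean `is_tumor` mask, and the threshold of 1.0 are hypothetical. A real analysis would also apply a statistical test and multiple-testing correction.

```python
import numpy as np


def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts-per-million normalization; rows are transcripts, columns are samples."""
    library_sizes = counts.sum(axis=0, keepdims=True)
    return counts / library_sizes * 1e6


def log2_fold_change(counts: np.ndarray, is_tumor: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Per-transcript log2 ratio of mean CPM in tumor vs. normal samples.

    `is_tumor` is a hypothetical boolean mask over the sample columns.
    """
    normalized = cpm(counts.astype(float))
    tumor_mean = normalized[:, is_tumor].mean(axis=1)
    normal_mean = normalized[:, ~is_tumor].mean(axis=1)
    # The pseudocount keeps the logarithm finite for transcripts with zero counts.
    return np.log2(tumor_mean + pseudocount) - np.log2(normal_mean + pseudocount)


def select_dysregulated(lfc: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Indices of transcripts whose absolute log2 fold change meets the threshold."""
    return np.flatnonzero(np.abs(lfc) >= threshold)
```

With `threshold=1.0`, a transcript is kept when its mean normalized expression differs at least twofold between groups, in either direction.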

    Classification of ozone pollution and analysis of meteorological factors in the Yangtze River Delta

    ABSTRACT Serious regional ozone (O3) pollution frequently affects the Yangtze River Delta (YRD). The formation mechanisms of these regional pollution events require further research, including the meteorological and emission factors that cause them and how these factors affect the distribution of O3. In this study, we first define a criterion for regional O3 pollution in the YRD and then select 248 regional pollution cases from 2015 to 2020 according to that criterion. For the cases in the main pollution months (May and June), the PCT (principal component analysis in T-mode) classification method is used to classify the spatial distribution of ozone concentrations in the YRD. The regional O3 distributions fall into five types. The overall type (Type 1) accounts for 15% of cases and is associated with a high-pressure center controlling the YRD. Under high pressure, the weather is sunny and hot, which favors ozone generation and intercity transport and causes extensive pollution. The double-center type (Type 2) accounts for 8%. In this type, the YRD lies under the front of a high-pressure system centered over North China, and the weather in the central and northern YRD favors O3 generation and transport. The inland type (Type 3) accounts for 24%. The main body of the high pressure is located over Mongolia, and easterly winds over the YRD favor the inland transport of O3 precursors. The northern coastal type (Type 4) accounts for 44%. The YRD is mainly controlled by a weak pressure field, and sunny weather with long-duration solar radiation in the northern coastal area favors O3 formation. The southern coastal type (Type 5) accounts for 10%; solar radiation is strong in the southern region, mainly under the influence of the post-offshore high pressure.
This study provides new insights into the relationship between O3 pollution distribution types and atmospheric circulation in the YRD and reveals differences in the potential meteorological impacts among the O3 pollution distribution types.
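To make the T-mode idea concrete, here is a minimal sketch, assuming each case is one O3 concentration map sampled at fixed stations. In T-mode the cases (days) act as the variables and the stations as the observations, so the PCA runs on the case-by-case correlation matrix, and each case is assigned to the component on which it loads most strongly. The function name is hypothetical, and this is a simplified stand-in: the PCT method as usually applied also rotates the retained components, which is omitted here.

```python
import numpy as np


def t_mode_pca_classify(fields: np.ndarray, n_types: int) -> np.ndarray:
    """Assign each pollution case to one of `n_types` distribution types.

    fields: array of shape (n_cases, n_stations); each row is one case's O3 map.
    Returns an integer label in [0, n_types) for every case.
    """
    # T-mode: cases act as variables and stations as observations, so transpose.
    x = fields.T.astype(float)  # (n_stations, n_cases)
    # Standardize each case over space so the PCA works on case-by-case correlations.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # Right singular vectors are eigenvectors of that correlation matrix.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    # Loadings per case, up to a constant factor that does not affect the argmax.
    loadings = vt[:n_types].T * s[:n_types]  # (n_cases, n_types)
    # Each case joins the component it loads on most strongly (sign ignored).
    return np.argmax(np.abs(loadings), axis=1)
```

Ignoring the loading sign merges a pattern with its mirror image into one type. A variant that keeps the sign would split each component into a positive and a negative type.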

    Application of Deep Learning for Early Screening of Colorectal Precancerous Lesions under White Light Endoscopy

    Background and Objective. Colorectal cancer (CRC) is a common gastrointestinal tumour with high morbidity and mortality. Endoscopic examination is an effective method for the early detection of digestive system tumours. However, for various reasons, missed diagnoses and misdiagnoses are common. Our goal is to use deep learning to build colorectal lesion detection, localization, and classification models based on white light endoscopic images, and to design a computer-aided diagnosis (CAD) system that helps physicians reduce the missed-diagnosis rate and improve detection accuracy. Methods. We collected and curated white light endoscopic images from patients undergoing colonoscopy. A convolutional neural network model was used to detect whether an image contains lesions: CRC, colorectal adenoma (CRA), or colorectal polyps. Accuracy, sensitivity, and specificity were used to evaluate this model. An instance segmentation model was then used to locate and classify the lesions in images containing lesions, and its performance was evaluated with mAP (mean average precision), AP50, and AP75. Results. For detecting whether an image contains lesions, we compared ResNet50 with four other models: AlexNet, VGG19, ResNet18, and GoogLeNet. ResNet50 outperformed the other models, with an accuracy of 93.0%, a sensitivity of 94.3%, and a specificity of 90.6%. For the localization and classification of lesions in images containing lesions, the Mask R-CNN model achieved mAP, AP50, and AP75 values of 0.676, 0.903, and 0.833, respectively. Conclusion. We developed and compared five models for detecting lesions in white light endoscopic images. ResNet50 showed the best performance, and the Mask R-CNN model could be used to locate and classify lesions in images containing lesions.
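The evaluation metrics named above can be stated precisely in a few lines. In this sketch, label 1 means "image contains a lesion". Accuracy, sensitivity, and specificity come from the binary confusion matrix. AP50 and AP75 count a detection as correct when its intersection over union (IoU) with the ground truth reaches 0.5 or 0.75. Mask R-CNN is normally scored with mask IoU; axis-aligned box IoU is shown here for brevity. All function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate: lesions correctly flagged
    specificity: float  # true negative rate: lesion-free images correctly cleared


def _ratio(num: int, den: int) -> float:
    """Safe division; returns NaN when a class is absent from the labels."""
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred) -> BinaryMetrics:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
    )


def box_iou(a, b) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Under AP75, a predicted lesion with IoU 0.7 against its ground truth counts as a miss, but under AP50 it counts as a hit. This explains why AP50 is the larger of the two reported values.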

    Application of Machine-Learning-Based Fusion Model in Visibility Forecast: A Case Study of Shanghai, China

    A visibility forecast model called the boosting-based fusion model (BFM) was established in this study. The model is a machine learning fusion model based on multisource data, including air pollutant concentrations, meteorological observations, Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol optical depth (AOD) data, and outputs from the operational Regional Atmospheric Environmental Modeling System for eastern China (RAEMS). Extreme gradient boosting (XGBoost), the light gradient boosting machine (LightGBM), and a numerical prediction method, RAEMS, were fused to build this prediction model. Three prediction models were used for the visibility prediction tasks: BFM, LightGBM based on multisource data (LGBM), and RAEMS. The training set covered 1 January 2015 to 31 December 2018. Several data preprocessing methods were applied to it, including synthetic minority over-sampling technique (SMOTE) resampling, loss function adjustment, and 10-fold cross-validation. In addition to the basic features (variables), additional spatial and temporal gradient features were considered. The testing set, covering 1 January to 31 December 2019, was used to validate the feasibility of BFM, LGBM, and RAEMS. Statistical indicators confirmed that the machine learning methods improved on the RAEMS forecast significantly and consistently. For the next 24/48 h, BFM achieved a root mean square error of 5.01/5.47 km and a correlation coefficient of 0.80/0.77, a substantial improvement over RAEMS. Statistics and binary score analyses for different areas of Shanghai also confirmed the reliability and accuracy of BFM, particularly in low-visibility forecasting. Overall, BFM is a suitable tool for visibility prediction and provides more accurate 24 h and 48 h visibility forecasts for Shanghai than LGBM and RAEMS. These results support real-time operational visibility forecasting.
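SMOTE addresses class imbalance, such as the relative rarity of low-visibility samples. It does this by synthesizing new minority samples along line segments between a minority sample and one of its nearest minority-class neighbors. The abstract does not say how the minority class was defined or how many samples were generated. The sketch below is a plain NumPy illustration of the core SMOTE interpolation step; the function name and parameters are hypothetical and do not represent the study's configuration.

```python
import numpy as np


def smote(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate `n_new` synthetic samples by SMOTE-style interpolation.

    minority: array of shape (n_samples, n_features) holding only minority-class
    rows (at least two of them).
    """
    rng = np.random.default_rng(seed)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbors per sample
    base = rng.integers(0, n, size=n_new)  # randomly chosen seed samples
    partner = neighbors[base, rng.integers(0, k, size=n_new)]  # one random neighbor each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[partner] - minority[base])
```

Because every synthetic point lies on a segment between two real minority samples, oversampling densifies the minority region without placing points outside it.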

    Multi-Model Grand Ensemble Hydrologic Forecasting in the Fu River Basin Using Bayesian Model Averaging

    Statistical post-processing of multi-model grand ensemble (GE) hydrologic predictions is necessary to achieve more accurate and reliable probabilistic forecasts. This paper presents a case study that applies Bayesian model averaging (BMA) to statistically post-process raw GE runoff forecasts in the Fu River basin in China at lead times ranging from 6 to 120 h. The raw forecasts were generated by running the Xinanjiang hydrologic model with ensemble forecasts (164 forecast members) from seven different “THORPEX Interactive Grand Global Ensemble” (TIGGE) weather centres as forcing inputs. Measures such as data transformation and high-dimensional optimization were included in the experiment to account for the practical water regime and data conditions. The results indicate that BMA post-processing can improve the performance of raw GE runoff forecasts, yielding better-calibrated and sharper predictive probability density functions (PDFs) over lead times from 24 to 120 h. An analysis of percentile forecasts for two flood events illustrates the potential of BMA GE probabilistic river discharge forecasts for taking precautions against severe flooding.
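BMA represents the predictive PDF as a weighted mixture of kernels, one centered on each ensemble member's forecast, with weights reflecting each member's skill over a training period. The sketch below is illustrative and uses the common Gaussian-kernel formulation fitted by expectation-maximization (EM) with a single shared variance. It assumes the member forecasts are already bias-corrected and that runoff has been transformed toward normality; the abstract mentions a data transformation but does not specify it. The function names are hypothetical, and the study's high-dimensional optimization for 164 members is not reproduced here.

```python
import numpy as np


def normal_pdf(y, mean, sigma):
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def fit_bma(forecasts: np.ndarray, observed: np.ndarray, n_iter: int = 500, tol: float = 1e-8):
    """EM for Gaussian BMA with a common variance.

    forecasts: (T, K) bias-corrected member forecasts; observed: (T,) observations.
    Returns the member weights (K,) and the shared kernel standard deviation.
    """
    T, K = forecasts.shape
    w = np.full(K, 1.0 / K)  # start from equal weights
    sigma = np.std(observed[:, None] - forecasts)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of member k for observation t.
        dens = w * normal_pdf(observed[:, None], forecasts, sigma)  # (T, K)
        total = dens.sum(axis=1, keepdims=True)
        z = dens / total
        # M-step: weights are mean responsibilities; sigma is the weighted residual spread.
        w = z.mean(axis=0)
        sigma = np.sqrt(np.sum(z * (observed[:, None] - forecasts) ** 2) / T)
        ll = np.sum(np.log(total))  # log-likelihood of the previous parameters
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, sigma


def bma_pdf(y: float, member_forecasts: np.ndarray, w: np.ndarray, sigma: float) -> float:
    """BMA predictive density at y for one forecast time."""
    return float(np.sum(w * normal_pdf(y, member_forecasts, sigma)))
```

A member that tracks the observations closely receives high responsibilities in the E-step and therefore a large weight. Percentile forecasts then follow from the quantiles of the fitted mixture.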

    A Fast Weighted Fuzzy C-Medoids Clustering for Time Series Data Based on P-Splines

    The rapid growth of digital information has produced massive amounts of time series data with rich features. Most time series data are noisy and contain outlier samples, which degrades clustering performance. To efficiently discover the hidden statistical information in such data, this study proposes a fast weighted fuzzy C-medoids clustering algorithm based on P-splines (PS-WFCMdd) for time series datasets. Specifically, the P-spline method is used to fit functional data to the original time series. The resulting smooth fitted data serve as the input to the clustering algorithm, improving its ability to process the data set during clustering. We then define a new weighting method to further reduce the influence of outlier samples in the weighted fuzzy C-medoids clustering process and thereby improve the robustness of the algorithm. To further improve clustering efficiency, we use the third version of Mueen’s Algorithm for Similarity Search (MASS 3) to measure the similarity between time series quickly and accurately. The new algorithm is compared with several other time series clustering algorithms, and its performance is evaluated experimentally on different types of time series. The experimental results show that the new method speeds up data processing and performs relatively well across the clustering evaluation indices.
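MASS obtains the z-normalized Euclidean distance between a query and every same-length window of a series. It computes all the sliding dot products at once with an FFT and then applies

d_i = sqrt(2m (1 - (QT_i - m μ_q μ_i) / (m σ_q σ_i))),

where m is the query length. The sketch below shows this basic form. It is not MASS 3, which additionally processes long series in FFT-sized segments for speed. It assumes no window of the series is constant, since σ_i = 0 would divide by zero. Function names are illustrative.

```python
import numpy as np


def sliding_dot_product(query: np.ndarray, series: np.ndarray) -> np.ndarray:
    """Dot product of `query` with every length-m window of `series`, via FFT."""
    m, n = len(query), len(series)
    size = n + m  # long enough that the linear convolution does not wrap around
    conv = np.fft.irfft(np.fft.rfft(query[::-1], size) * np.fft.rfft(series, size), size)
    return conv[m - 1:n]  # entries where the reversed query fully overlaps the series


def mass_distance_profile(query, series) -> np.ndarray:
    """z-normalized Euclidean distance from `query` to each window of `series`."""
    query = np.asarray(query, dtype=float)
    series = np.asarray(series, dtype=float)
    m = len(query)
    qt = sliding_dot_product(query, series)
    mu_q, sigma_q = query.mean(), query.std()
    # Rolling window mean and (population) std in O(n) using cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(series)))
    csum2 = np.concatenate(([0.0], np.cumsum(series ** 2)))
    mu_t = (csum[m:] - csum[:-m]) / m
    sigma_t = np.sqrt(np.maximum((csum2[m:] - csum2[:-m]) / m - mu_t ** 2, 0.0))
    # Pearson correlation per window, converted to z-normalized distance.
    corr = (qt - m * mu_q * mu_t) / (m * sigma_q * sigma_t)
    return np.sqrt(np.clip(2 * m * (1 - corr), 0.0, None))
```

The FFT step makes the cost roughly O(n log n) per query instead of O(nm) for a brute-force scan. This saving is what makes the distance computation fast enough to use inside an iterative clustering loop.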

    Estimating the Routing Parameter of the Xin’anjiang Hydrological Model Based on Remote Sensing Data and Machine Learning

    The parameters of hydrological models should be determined before those models are applied to estimate or predict hydrological processes. The Xin’anjiang (XAJ) hydrological model is widely used throughout China. Since the Prediction in Ungauged Basins (PUB) era, the regionalization of XAJ model parameters has been a subject of intense focus. However, many efforts have targeted parameters related to runoff yield using in situ data sets, and classic regression has predominantly been applied. In this paper, we employed remotely sensed underlying surface data and a machine learning approach to build models for estimating the runoff routing parameter CS of the XAJ model. The study was conducted on 114 catchments from the Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) data set, and the relationships between CS and various underlying surface characteristics were explored with a gradient-boosted regression tree (GBRT). The results showed that drainage density, stream source density, and catchment area were the three factors with the most significant impact on CS. The best correlation coefficient (r), root mean square error (RMSE), and mean absolute error (MAE) between the GBRT-estimated and calibrated CS were 0.96, 0.06, and 0.04, respectively, verifying the good performance of GBRT in estimating CS. Although bias was noted between the GBRT-estimated and calibrated CS, runoff simulations using the GBRT-estimated CS still achieved results comparable to those using the calibrated CS. Further validation on two catchments in China confirmed the overall robustness and accuracy of simulating runoff processes with the GBRT-estimated CS.
Our results confirm the following hypotheses: (1) with the help of a large sample of catchments and associated remote sensing data, the ML-based approach can capture the nonstationary and nonlinear relationships between CS and the underlying surface characteristics, and (2) CS estimated by ML from large samples is robust enough to guarantee the overall performance of the XAJ model. This study advances the methodology for quantitatively estimating XAJ model parameters and can be extended to other parameters or other models.
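The core of GBRT is fitting each new tree to the residuals of the current ensemble, which are the negative gradient of the squared-error loss, and adding it with a shrinkage factor. The scoring uses the same three statistics reported above (r, RMSE, MAE). The sketch below is minimal and uses depth-1 trees (stumps) for brevity, whereas practical GBRT implementations grow deeper trees. The feature columns are hypothetical placeholders for catchment characteristics such as drainage density, stream source density, and area, and the test data are synthetic.

```python
import numpy as np


def fit_stump(X: np.ndarray, residual: np.ndarray):
    """Single split (feature, threshold) that minimizes squared error on the residuals."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # drop the max so both sides are non-empty
            left = X[:, j] <= thr
            lv, rv = residual[left].mean(), residual[~left].mean()
            sse = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual
        return 0, np.inf, residual.mean(), residual.mean()
    return best[1:]


def predict_stump(stump, X: np.ndarray) -> np.ndarray:
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)


def fit_gbrt(X: np.ndarray, y: np.ndarray, n_rounds: int = 100, learning_rate: float = 0.1):
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared-error loss
        stump = fit_stump(X, residual)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_gbrt(model, X: np.ndarray) -> np.ndarray:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, X) for s in stumps)


def evaluate(y_true: np.ndarray, y_pred: np.ndarray):
    """Correlation coefficient, RMSE, and MAE between targets and estimates."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    return r, rmse, mae
```

The learning rate trades speed for robustness. Small steps require more rounds but make each tree correct only part of the remaining error, which tends to generalize better than fitting residuals fully at each step.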
