5 research outputs found
Dimensionality reduction for machine learning using statistical methods: A case study on predicting mechanical properties of steels
Steel manufacturing is a long and complicated process that includes refining, casting, and rolling; hundreds of process parameters can potentially influence the mechanical properties of the final products. This complexity makes it difficult to correlate input parameters with final mechanical properties. Machine learning models such as neural networks and XGBoost have been used to predict mechanical properties; however, interpretability remains an issue, especially for neural networks. In this study, a statistical method, iGATE, is utilised to reduce the dimension of the inputs for predicting the mechanical properties of hot-rolled steel plates. iGATE is found to successfully extract the key features and reduce the input dimension while maintaining high prediction accuracy. XGBoost with full inputs has the best prediction performance, with relative errors lower than 5%. With reduced input dimensions, the interference of irrelevant features diminishes and the ranking of the key features is more reliable. The iGATE methodology offers industry an opportunity to identify the key input parameters, in terms of materials chemistry and process variables, for optimising the mechanical properties of rolled plates.
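The "relative errors lower than 5%" figure can be made concrete with a short sketch. It assumes relative error means |predicted - measured| / |measured| for each plate, which the abstract does not define, and uses hypothetical yield-strength values; it does not implement iGATE or XGBoost.

```python
import numpy as np

def relative_errors(measured, predicted):
    """Per-sample relative error |predicted - measured| / |measured| (assumed definition)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(predicted - measured) / np.abs(measured)

def share_within(measured, predicted, tol=0.05):
    """Fraction of samples whose relative error is strictly below tol (5% by default)."""
    return float(np.mean(relative_errors(measured, predicted) < tol))

# Hypothetical measured vs. predicted yield strengths in MPa (illustrative only).
measured = [420.0, 510.0, 380.0, 465.0]
predicted = [430.0, 495.0, 392.0, 470.0]

print(relative_errors(measured, predicted).round(4))
print(share_within(measured, predicted))  # prints 1.0: every error here is below 5%
```

Reporting the share of samples under a tolerance, alongside the per-sample errors, is one way to read a claim such as "relative errors lower than 5%"; the paper may aggregate differently.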
Multiple morphological constraints based complex gland segmentation in colorectal cancer pathology image analysis
Histological assessment of glands is one of the major concerns in colon cancer grading. Because poorly differentiated colorectal glands cannot be segmented accurately, we propose an approach for segmenting glands in colon cancer images based on the characteristics of lumens and rough gland boundaries. First, we use a U-net for stain separation to obtain hematoxylin (H), eosin (E), and background stain intensity maps. Next, epithelial nuclei are identified in the histopathology images, and lumen segmentation is performed on the background intensity map. Then, we use similar triangles based on the axis of least inertia as the spatial characteristics of lumens and epithelial nuclei, and a triangle membership is used to select glandular contour candidates from the epithelial nuclei. By connecting lumens and epithelial nuclei, a more accurate gland segmentation is obtained from the rough gland boundary. The proposed stain separation approach is unsupervised; it makes the category information contained in the H&E image easy to identify and copes with uneven stain intensity and inconspicuous stain differences. In this project, we use deep learning to achieve stain separation by predicting the stain coefficients. Within the deep learning framework, we design a stain coefficient interval model to improve stain generalization performance. Another innovation is combining the inner lumen contour of the adenoma with the outer contour of the epithelial cells to obtain a precise gland contour. We compare the performance of the proposed algorithm with that of several state-of-the-art methods on publicly available datasets. The results show that the segmentation approach combining the characteristics of lumens and the rough gland boundary achieves better segmentation accuracy.
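The axis of least inertia mentioned above is a standard shape descriptor: the line through a region's centroid about which the region's second moment is smallest. A minimal NumPy sketch for a binary lumen mask follows. It assumes the mask is a 2-D boolean array in image coordinates (x to the right, y downward) and does not reproduce the similar-triangle construction or the triangle membership, which the abstract does not specify; the toy mask is hypothetical.

```python
import numpy as np

def axis_of_least_inertia(mask):
    """Return the centroid (x, y) and orientation (radians) of a binary region.

    The angle is measured from the +x axis in image coordinates, where y grows
    downward, and is normalised to the interval (-pi/2, pi/2].
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no foreground pixels")
    xc, yc = xs.mean(), ys.mean()
    dx, dy = xs - xc, ys - yc
    # Second-order central moments of the region.
    mu20 = np.sum(dx * dx)
    mu02 = np.sum(dy * dy)
    mu11 = np.sum(dx * dy)
    # Orientation that minimises the moment of inertia about the axis.
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    if theta <= -np.pi / 2:  # fold -pi/2 onto +pi/2 so the range is (-pi/2, pi/2]
        theta += np.pi
    return (xc, yc), theta

# A hypothetical elongated, horizontal "lumen": 2 rows x 16 columns.
lumen = np.zeros((20, 20), dtype=bool)
lumen[9:11, 2:18] = True
center, angle = axis_of_least_inertia(lumen)
print(center, np.degrees(angle))
```

The closed-form angle from central moments avoids an explicit eigen-decomposition of the 2x2 inertia matrix; both give the same axis.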
Metallurgical Data Science for Steel Industry: A Case Study on Basic Oxygen Furnace
The steel industry has developed sensorization to generate data for monitoring systems and steelmaking process control. The remaining challenges are data storage issues, a lack of cross-production data links, and erroneous datasets, which significantly increase the complexity of quality control. A data-driven approach based on artificial intelligence (AI) makes it possible to apply machine learning techniques to big datasets, with the aim of providing process–property optimization and identifying challenges and gaps in the data. Recently, computational capabilities and algorithmic developments have grown significantly in power and complexity, accelerating process optimization. Process–property optimization strategies for large-scale industrial data must deal with numerous influencing factors but limited data. As one of the largest production chains in the world, the steel industry faces an ever-increasing demand for larger components, high levels of functionality, and high quality of the final product. Herein, an integrated data-driven steelmaking case study is built with the aim of predicting and optimizing the final product composition and quality. Machine learning is used in collaboration with fundamental knowledge and first-principles calculations, with feedback into a backpropagation neural network (NN) model. Integrating data mining and machine learning generates reasonable predictions and addresses process efficiency within the steelmaking furnaces. The ultimate goal is to enhance the digitalization of the steel industry.
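The backpropagation neural network named above can be illustrated with a minimal sketch: a one-hidden-layer network trained by full-batch gradient descent on mean squared error, in plain NumPy. The synthetic inputs standing in for process variables, the target function, the layer size, and the learning rate are all hypothetical; the study's actual architecture, features, and coupling with first-principles calculations are not described in the abstract.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a 1-hidden-layer tanh network with backpropagation (full-batch GD on MSE)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule from the MSE loss back to each parameter.
        g_pred = 2.0 * err / len(X)
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Hypothetical data: three scaled "process variables" and one target property.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + 0.2 * X[:, 2]
params, losses = train_mlp(X, y)
print(f"MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

Writing the gradients out by hand keeps every backpropagation step visible; a production model would normally use a framework with automatic differentiation.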
A structural variation reference for medical and population genetics
Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
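The statistic about very large, rare SVs amounts to a per-sample filter over an SV call set. The sketch below shows that logic on a toy, hypothetical data structure. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2, or None for a missing call) and uses an assumed rarity cutoff of allele frequency below 1%, which is not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

RARE_AF = 0.01        # assumed rarity cutoff (allele frequency), not from the paper
MEGABASE = 1_000_000  # "very large" means longer than one megabase

@dataclass
class StructuralVariant:
    sv_id: str
    sv_type: str                    # e.g. "DEL" or "DUP" (hypothetical labels)
    length_bp: int
    genotypes: List[Optional[int]]  # alt-allele count per sample; None = no call

def allele_frequency(sv: StructuralVariant) -> float:
    """Alt-allele frequency over called diploid genotypes."""
    called = [g for g in sv.genotypes if g is not None]
    if not called:
        return 0.0
    return sum(called) / (2 * len(called))

def carriers_of_large_rare_svs(svs: List[StructuralVariant]) -> Set[int]:
    """Indices of samples carrying at least one rare SV longer than one megabase."""
    carriers: Set[int] = set()
    for sv in svs:
        if sv.length_bp > MEGABASE and allele_frequency(sv) < RARE_AF:
            carriers.update(i for i, g in enumerate(sv.genotypes) if g)
    return carriers

# Toy call set with 200 hypothetical samples.
n = 200
large_rare = StructuralVariant("sv1", "DEL", 2_000_000, [1 if i == 5 else 0 for i in range(n)])
large_common = StructuralVariant("sv2", "DUP", 1_500_000, [1 if i < 50 else 0 for i in range(n)])
small_rare = StructuralVariant("sv3", "DEL", 5_000, [1 if i == 7 else 0 for i in range(n)])

carriers = carriers_of_large_rare_svs([large_rare, large_common, small_rare])
print(sorted(carriers), f"{len(carriers) / n:.1%} of samples")  # prints [5] 0.5% of samples
```

Only the first variant passes both filters: the second is large but common (allele frequency 0.125), and the third is rare but far below one megabase.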
Longitudinal double-spin asymmetry and cross section for inclusive jet production in polarized proton collisions at √s = 200 GeV
We report a measurement of the longitudinal double-spin asymmetry A_LL and the differential cross section for inclusive midrapidity jet production in polarized proton collisions at √s = 200 GeV. The cross section data cover transverse momenta 5 < p_T < 50 GeV/c and agree with next-to-leading order perturbative QCD evaluations. The A_LL data cover 5 < p_T < 17 GeV/c and disfavor, at 98% C.L., maximal positive gluon polarization in the polarized nucleon.
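For readers outside spin physics, the double-spin asymmetry compares jet yields when the two beam helicities are the same versus opposite, corrected for the beam polarizations and the relative luminosity of the two helicity configurations. A commonly used form is sketched below as a small Python function; the variable names are illustrative, and a published analysis involves further corrections (backgrounds, detector effects, systematic uncertainties) that this sketch omits.

```python
def double_spin_asymmetry(n_same, n_opp, lumi_same, lumi_opp, p_beam1, p_beam2):
    """Longitudinal double-spin asymmetry A_LL from helicity-sorted yields.

    A_LL = (N_same - R * N_opp) / (P1 * P2 * (N_same + R * N_opp)), R = L_same / L_opp

    n_same:  jet yield when both beams have the same helicity (++ and --)
    n_opp:   jet yield for opposite helicities (+- and -+)
    lumi_*:  integrated luminosities for the two helicity configurations
    p_beam*: longitudinal polarizations of the two beams (0 to 1)
    """
    r = lumi_same / lumi_opp  # relative luminosity
    return (n_same - r * n_opp) / (p_beam1 * p_beam2 * (n_same + r * n_opp))


# Consistency check with made-up numbers: build yields N = L * sigma * (1 +/- P1*P2*A)
# from a known asymmetry, then recover it.
sigma, a_true = 1000.0, 0.02
lumi_same, lumi_opp = 1.0, 0.9
p1, p2 = 0.55, 0.60
n_same = sigma * lumi_same * (1 + p1 * p2 * a_true)
n_opp = sigma * lumi_opp * (1 - p1 * p2 * a_true)
print(double_spin_asymmetry(n_same, n_opp, lumi_same, lumi_opp, p1, p2))  # approximately 0.02
```

Scaling the opposite-helicity yield by R before taking the difference removes any asymmetry that would otherwise arise purely from unequal luminosities.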