8 research outputs found
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Image captioning is a challenging problem owing to the complexity in
understanding the image content and diverse ways of describing it in natural
language. Recent advances in deep neural networks have substantially improved
the performance of this task. Most state-of-the-art approaches follow an
encoder-decoder framework, which generates captions using a sequential
recurrent prediction model. However, in this paper, we introduce a novel
decision-making framework for image captioning. We utilize a "policy network"
and a "value network" to collaboratively generate captions. The policy network
provides local guidance by giving the confidence of predicting the next
word according to the current state. Additionally, the value network provides
global, lookahead guidance by evaluating all possible extensions of the
current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions.
We train both networks using an actor-critic reinforcement learning model, with
a novel reward defined by visual-semantic embedding. Extensive experiments and
analyses on the Microsoft COCO dataset show that the proposed framework
outperforms state-of-the-art approaches across different evaluation metrics.
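The interplay of local and global guidance described in this abstract can be illustrated with a small sketch. The abstract does not say how the two networks' outputs are combined, so the weighted sum below, the weight `lam`, the function names, and the toy numbers are all assumptions made for illustration, not the authors' method.

```python
import math

def combined_score(policy_prob: float, value: float, lam: float = 0.5) -> float:
    """Blend local confidence (log policy probability) with a global value estimate.

    The weighted sum and `lam` are assumed here; the abstract does not specify them.
    """
    return lam * math.log(policy_prob) + (1.0 - lam) * value

def pick_next_word(policy_probs: dict, values: dict, lam: float = 0.5) -> str:
    """Return the candidate word with the highest blended score."""
    return max(
        policy_probs,
        key=lambda word: combined_score(policy_probs[word], values[word], lam),
    )

# Toy numbers invented for illustration only.
policy_probs = {"dog": 0.6, "puppy": 0.3, "cat": 0.1}  # local next-word confidence
values = {"dog": 0.1, "puppy": 0.9, "cat": 0.1}        # lookahead estimate per extension

print(pick_next_word(policy_probs, values, lam=0.5))  # value estimate can change the choice
print(pick_next_word(policy_probs, values, lam=1.0))  # lam=1.0 uses the policy alone
```

With `lam=1.0` the choice depends only on the policy's confidence; lowering `lam` lets the lookahead estimate override a locally likely word.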
Estimating Brain Age with Global and Local Dependencies
The brain age has been proven to be a phenotype of relevance to cognitive
performance and brain disease. Achieving accurate brain age prediction is an
essential prerequisite for optimizing the predicted brain-age difference as a
biomarker. As a comprehensive biological characteristic, brain age is hard
to estimate accurately with models that rely on feature engineering and local
processing, such as convolution and recurrent operations that process one
local neighborhood at a time. Vision Transformers, by contrast, learn global
attentive interactions among patch tokens, introducing less inductive bias and
modeling long-range dependencies. Building on this, we propose a novel network
for brain age estimation that uses both global and local dependencies, where
the corresponding representations are captured by Successive Permuted
Transformer (SPT) and convolution blocks. The SPT brings computation efficiency
and locates the 3D spatial information indirectly via continuously encoding 2D
slices from different views. Finally, we collected a large cohort of 22645
subjects with ages ranging from 14 to 97, and our network performed best
among a series of deep learning methods, yielding a mean absolute error (MAE)
of 2.855 on the validation set and 2.911 on an independent test set.
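For readers unfamiliar with the metric, the MAE and the predicted brain-age difference mentioned in this abstract can be computed as in the minimal sketch below. The function names and the three sample ages are hypothetical and are not taken from the study's data.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average of |predicted - true| over all subjects."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)

def brain_age_differences(true_ages, predicted_ages):
    """Predicted minus chronological age for each subject."""
    return [p - t for t, p in zip(true_ages, predicted_ages)]

# Hypothetical ages for three subjects.
true_ages = [20.0, 45.0, 70.0]
predicted_ages = [22.0, 44.0, 73.0]

print(mean_absolute_error(true_ages, predicted_ages))    # 2.0
print(brain_age_differences(true_ages, predicted_ages))  # [2.0, -1.0, 3.0]
```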
Local learning algorithms with application to action recognition and video analysis
Title from PDF of title page (University of Missouri--Columbia, viewed on March 18, 2013). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Tony X. Han. Includes bibliographical references. Vita. Ph. D. University of Missouri--Columbia 2012. "December 2012" [ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.]
Activity analysis has been an active research area in recent years, owing to difficulties in feature description, video representation, and learning algorithms. Several algorithms are proposed to address these difficulties. First, a mid-level feature, Histogram of Oriented Gradients Variations (HOGV), is developed for action recognition to solve the feature description problem. The proposed HOGV is not only stable due to its cell-block structure, but is also capable of capturing the static and dynamic characteristics of human actions. Second, video representation is one of the key problems in video analysis. To cope with the multi-label and rich-context nature of videos, we propose to represent a video with a textual description resulting from video-to-text translation. This fully automatic translation is achieved by mapping local visual descriptors to keyword distributions using a Visual-Textual Distribution (VTD) tree learned autonomously from the Internet. Third, we propose a new local learning algorithm, motivated by the observation that training data are rarely evenly distributed in the input space, which degrades the performance of a linear SVM classifier. Partitioning the input space in tandem with local learning may alleviate the uneven data distribution problem. However, the extra model complexity introduced by partitioning frequently leads to overfitting. To solve this problem, we propose the Randomized Support Vector Forest (RSVF): many partitions of the input space are constructed, with partitioning regions amenable to the corresponding linear SVMs. The randomness of the partitions prevents the overfitting introduced by over-complicated partitioning. Finally, we further explore the potential of non-patch features. A generalized superpixelization algorithm with a boundary-preserving distance metric is proposed. It outperforms state-of-the-art superpixelization algorithms in three aspects: generalizing to color and highly textured images, generating compact superpixels, and computational efficiency.
Observing strain glass transition in Ti33Nb15Zr25Hf25O2 high entropy alloy with Elinvar effect
Exploring the phase transition of high entropy alloys (HEAs) with multiple major elements is of great importance for understanding the underlying physical mechanisms. Macroscopic martensitic phase transition has been frequently reported in HEAs; however, nanoscale microstructural phase evolution has not been investigated to the same extent. Herein, we have prepared the Ti33Nb15Zr25Hf25O2 HEA and investigated the strain glass transition and its associated properties using dynamic mechanical analysis and microstructure characterization. We have found that the elastic modulus in Ti33Nb15Zr25Hf25O2 HEA deviates from Wachtman's equation and observed the Elinvar effect in the form of temperature-independent modulus in the temperature range from 150 K to 450 K and frequency-dependent modulus around 220 K. The strain glass transition has been evidenced in Ti33Nb15Zr25Hf25O2 HEA by the formation and growth of nano-sized domains during in-situ transmission electron microscopy (TEM) cooling, and substantiated by the broken ergodicity during zero-field-cooling/field-cooling. The strain glass transition is believed to account for the Elinvar effect, where the modulus hardening of nano-sized domains dynamically compensates for the modulus softening of the transformable matrix. © 2023 Published by Elsevier Ltd on behalf of The editorial office of Journal of Materials Science & Technology.
DataSheet_1_Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer.xlsx
Background: Medical imaging is critical in clinical practice, and high-value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. NLP methods, including word segmentation, stop word removal, and n-gram language model establishment, were applied to each dataset. Then, a bag-of-words model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and others. We compared the accuracy of a priori selection of pertinent word strings with that of our machine learning methods. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% accuracy on the CT with/without contrast dataset using XGBoost. Logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00%, respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II, and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy of the string pattern search method was lower than that of the machine learning models. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.
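The feature-building steps named in the Methods (stop word removal, n-grams, a bag-of-words model, and high-frequency term selection) can be sketched in plain Python as follows. This is an illustrative sketch only: the study worked on segmented Chinese text, whereas the toy reports below are pre-tokenized English invented for the example, and the function names, the stop word list, and the `top_k` cutoff are all assumptions. The classifier stage is not shown.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_terms(tokens, stop_words, max_n=2):
    """Drop stop words, then collect 1-grams up to max_n-grams."""
    kept = [t for t in tokens if t not in stop_words]
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(kept, n))
    return terms

def build_vocabulary(docs, stop_words, max_n=2, top_k=4):
    """Select the top_k most frequent terms across all documents as features."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_terms(tokens, stop_words, max_n))
    return [term for term, _ in counts.most_common(top_k)]

def vectorize(tokens, vocabulary, stop_words, max_n=2):
    """Bag-of-words count vector for one document over the chosen vocabulary."""
    counts = Counter(extract_terms(tokens, stop_words, max_n))
    return [counts[term] for term in vocabulary]

# Toy, already-segmented reports (invented for illustration).
docs = [
    ["no", "abnormality", "in", "the", "liver"],
    ["fatty", "liver", "no", "abnormality"],
    ["liver", "lesion", "suggest", "transfer"],
]
stop_words = {"in", "the"}

vocabulary = build_vocabulary(docs, stop_words)
vectors = [vectorize(tokens, vocabulary, stop_words) for tokens in docs]
print(vocabulary)
print(vectors)
```

The resulting count vectors are the kind of input a classifier such as logistic regression or a random forest would consume.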