8 research outputs found

    Deep Reinforcement Learning-based Image Captioning with Embedding Reward

    Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.

    Estimating Brain Age with Global and Local Dependencies

    Brain age has been shown to be a phenotype relevant to cognitive performance and brain disease. Accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, brain age is hard to capture accurately with models that rely on feature engineering and local processing, such as local convolution and recurrent operations that process one local neighborhood at a time. In contrast, Vision Transformers learn globally attentive interactions among patch tokens, introducing less inductive bias and modeling long-range dependencies. Motivated by this, we propose a novel network for brain age estimation that learns both global and local dependencies, where the corresponding representations are captured by Successive Permuted Transformer (SPT) and convolution blocks. The SPT improves computational efficiency and locates 3D spatial information indirectly by continuously encoding 2D slices from different views. Finally, we collected a large cohort of 22,645 subjects with ages ranging from 14 to 97, and our network performed best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 on the validation set and 2.911 on an independent test set.

    Local learning algorithms with application to action recognition and video analysis

    Title from PDF of title page (University of Missouri--Columbia, viewed on March 18, 2013). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Tony X. Han. Includes bibliographical references. Vita. Ph.D. University of Missouri--Columbia 2012. "December 2012" [ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Activity analysis has been an active research area in recent years, owing to the difficulties posed by feature description, video representation, and learning algorithms. Several algorithms are proposed to address these difficulties. First, a mid-level feature, Histogram of Oriented Gradients Variations (HOGV), is developed for action recognition to solve the feature description problem. The proposed HOGV is not only stable due to its cell-block structure, but is also capable of capturing the static and dynamic characteristics of human actions. Second, video representation is one of the key problems in video analysis. To cope with the multi-label and rich-context nature of videos, we propose to represent a video with a textual description resulting from video-to-text translation. This fully automatic translation is achieved by mapping local visual descriptors to keyword distributions using a Visual-Textual Distribution (VTD) tree learned autonomously from the Internet. Third, we propose a new local learning algorithm, motivated by the observation that training data are rarely evenly distributed in the input space, which degrades the performance of the linear SVM classifier. Partitioning the input space in tandem with local learning may alleviate the uneven data distribution problem. However, the extra model complexity introduced by partitioning frequently leads to overfitting. To solve this problem, we propose the Randomized Support Vector Forest (RSVF): many partitions of the input space are constructed, with partitioning regions amenable to the corresponding linear SVMs. The randomness of the partitions prevents the overfitting introduced by over-complicated partitioning. Finally, we further explore the potential of non-patch features. A generalized superpixelization algorithm with a boundary-preserving distance metric is proposed. It outperforms state-of-the-art superpixelization algorithms in three aspects: generalizing to color and highly textured images, generating compact superpixels, and computational efficiency.

    Observing strain glass transition in Ti33Nb15Zr25Hf25O2 high entropy alloy with Elinvar effect

    Exploring the phase transition of high entropy alloys (HEAs) with multiple major elements is of great importance for understanding the underlying physical mechanisms. Macroscopic martensitic phase transition has been frequently reported in HEAs; however, nanoscale microstructural phase evolution has not been investigated to the same extent. Herein, we have prepared the Ti33Nb15Zr25Hf25O2 HEA and investigated the strain glass transition and its associated properties using dynamic mechanical analysis and microstructure characterization. We have found that the elastic modulus in Ti33Nb15Zr25Hf25O2 HEA deviates from Wachtman's equation and observed the Elinvar effect in the form of temperature-independent modulus in the temperature range from 150 K to 450 K and frequency-dependent modulus around 220 K. The strain glass transition has been evidenced in Ti33Nb15Zr25Hf25O2 HEA by the formation and growth of nano-sized domains during in-situ transmission electron microscopy (TEM) cooling, and substantiated by the broken ergodicity during zero-field-cooling/field-cooling. The strain glass transition is believed to account for the Elinvar effect, where the modulus hardening of nano-sized domains dynamically compensates for the modulus softening of the transformable matrix. © 2023 Published by Elsevier Ltd on behalf of The editorial office of Journal of Materials Science & Technology.

    DataSheet_1_Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer.xlsx

    Background: Medical imaging is critical in clinical practice, and high-value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. NLP methods including word segmentation, stop word removal, and n-gram language model establishment were applied to each dataset. Then, a bag-of-words model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR) and random forest (RF), among others. We compared the accuracy of a priori selection of pertinent word strings with that of our machine learning methodologies. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching an accuracy of 95.91% on the CT with/without contrast dataset using XGBoost. Logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00%, respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II, and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy of the string pattern search model was lower than that of the machine learning models. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.