Trajectory-Oriented Policy Optimization with Sparse Rewards
Mastering deep reinforcement learning (DRL) proves challenging in tasks
featuring scant rewards. These limited rewards merely signify whether the task
is partially or entirely accomplished, necessitating various exploration
actions before the agent garners meaningful feedback. Consequently, the
majority of existing DRL exploration algorithms struggle to acquire practical
policies within a reasonable timeframe. To address this challenge, we introduce
an approach leveraging offline demonstration trajectories for swifter and more
efficient online RL in environments with sparse rewards. Our pivotal insight
involves treating offline demonstration trajectories as guidance, rather than
mere imitation, allowing our method to learn a policy whose distribution of
state-action visitation marginally matches that of offline demonstrations. We
specifically introduce a novel trajectory distance relying on maximum mean
discrepancy (MMD) and cast policy optimization as a distance-constrained
optimization problem. We then illustrate that this optimization problem can be
streamlined into a policy-gradient algorithm, integrating rewards shaped by
insights from offline demonstrations. The proposed algorithm undergoes
evaluation across extensive discrete and continuous control tasks with sparse
and misleading rewards. The experimental findings demonstrate the significant
superiority of our proposed algorithm over baseline methods concerning diverse
exploration and the acquisition of an optimal policy. Comment: 6 pages, 7 figures
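The abstract above names a trajectory distance built on maximum mean discrepancy (MMD) but does not give its exact form. As a rough illustration only, the following sketch computes the standard biased estimate of squared MMD between two sets of state-action samples. The Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD between the sample sets xs and ys."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical state-action samples: one demonstration, two policy rollouts.
demonstration = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
rollout_near = [(0.1, 0.0), (1.1, 0.1), (2.0, 0.9)]
rollout_far = [(5.0, 5.0), (6.0, 5.0), (7.0, 6.0)]

print(mmd_squared(demonstration, rollout_near))
print(mmd_squared(demonstration, rollout_far))
```

In a reward-shaping setup of the kind the abstract describes, a quantity like this could serve as the distance term that keeps the policy's visitation close to the demonstrations; how the paper actually converts the distance into a reward is not stated in the abstract.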
Engineering Heterologous Production of Salicylate Glucoside and Glycosylated Variants
Salicylate 2-O-β-D-glucoside (SAG) is a plant-derived natural product with potential utility as both an anti-inflammatory and a plant protectant compound. Heterologous biosynthesis of SAG has been established in Escherichia coli through metabolic engineering of the shikimate pathways and introduction of a heterologous biosynthetic step to allow a more directed route to the salicylate precursor. The final SAG compound resulted from the separate introduction of an Arabidopsis thaliana glucosyltransferase enzyme. In this study, a range of heterologous engineering parameters were varied (including biosynthetic pathway construction, expression plasmid, and E. coli strain) to improve SAG specific production, in conjunction with a system demonstrating improved plasmid stability. In addition, the glucoside moiety of SAG was systematically varied through the introduction of the heterologous oliose and olivose deoxysugar pathways. Production of analogs was observed for each newly constructed pathway, demonstrating biosynthetic diversification potential; however, production titers were reduced relative to the original SAG compound.
A NEW METHOD TO CONTROL THE REGIONAL STRATA MOVEMENT OF SUPER-THICK WEAK CEMENTATION OVERBURDEN IN DEEP MINING
In western China, the deep mining areas with super-thick and weak cementation overburden are vast and sparsely populated, and their ecological environment is extremely fragile. With the large-scale exploitation of deep coal resources, the green mining problem, which is in essence the control of surface subsidence, is unavoidable. It is therefore necessary to study control technology for regional mining based on the evolution law of subsidence movement and energy-polling of super-thick and weak cementation overburden, and to put forward an economical design scheme that can control strata movement and surface subsidence to a certain degree. Based on key strata control theory, this paper proposes a subsidence control scheme of partial filling-partial caving in multi-working-face coordinated mining, studies its control mechanism through numerical simulation, and analyzes the control effect on strata movement and energy-polling for fully caving mining, backfill mining, wide strip skip-mining, and mixed filling mining. The conclusions are as follows: (1) The maximum value of energy-polling occurs on the coal pillars or on both sides of the goaf. As the width of the goaf increases, the maximum value of energy-polling increases parabolically. (2) In the partial filling-partial caving multiple working faces coordinated mining based on the main key stratum, the stress distribution of the composite backfill in the filling working face is parabolic, high on both sides and low in the middle. Moreover, in the composite backfill, the stress concentration degree of the outside coal pillar is greater than that of the inside coal pillar. (3) The control mechanism of partial filling-partial caving harmonious mining based on the main key layer structure is a double-control cooperative deformation system formed by the composite backfill and the main and sub-key layers structure. 
They jointly control the movement and energy accumulation of the overlying strata by greatly reducing the effective space available for upward transmission, and they absorb the wave subsidence trend of the overburden until it develops into a single flat subsidence basin. (4) Considering the recovery rate, pillar rate, area filling rate, technical difficulty, and subsidence coefficient, the partial filling-partial caving multiple working faces coordinated mining based on the main key stratum is the most cost-effective mining method for controlling surface subsidence. This paper provides guidance for controlling the regional strata movement and surface subsidence of deep mining with super-thick and weak cementation overburden.
Learning Diverse Policies with Soft Self-Generated Guidance
Reinforcement learning (RL) with sparse and deceptive rewards is challenging
because non-zero rewards are rarely obtained. Hence, the gradient calculated by
the agent can be stochastic and without valid information. Recent studies that
utilize memory buffers of previous experiences can lead to a more efficient
learning process. However, existing methods often require these experiences to
be successful and may overly exploit them, which can cause the agent to adopt
suboptimal behaviors. This paper develops an approach that uses diverse past
trajectories for faster and more efficient online RL, even if these
trajectories are suboptimal or not highly rewarded. The proposed algorithm
combines a policy improvement step with an additional exploration step using
offline demonstration data. The main contribution of this paper is that by
regarding diverse past trajectories as guidance, instead of imitating them, our
method directs its policy to follow and expand past trajectories while still
being able to learn without rewards and approach optimality. Furthermore, a
novel diversity measurement is introduced to maintain the team's diversity and
regulate exploration. The proposed algorithm is evaluated on discrete and
continuous control tasks with sparse and deceptive rewards. Compared with the
existing RL methods, the experimental results indicate that our proposed
algorithm is significantly better than the baseline methods regarding diverse
exploration and avoiding local optima. Comment: 23 pages, 19 figures
Adaptive trajectory-constrained exploration strategy for deep reinforcement learning
Deep reinforcement learning (DRL) faces significant challenges in addressing
the hard-exploration problems in tasks with sparse or deceptive rewards and
large state spaces. These challenges severely limit the practical application
of DRL. Most previous exploration methods relied on complex architectures to
estimate state novelty or introduced sensitive hyperparameters, resulting in
instability. To mitigate these issues, we propose an efficient adaptive
trajectory-constrained exploration strategy for DRL. The proposed method guides
the policy of the agent away from suboptimal solutions by leveraging incomplete
offline demonstrations as references. This approach gradually expands the
exploration scope of the agent and strives for optimality in a constrained
optimization manner. Additionally, we introduce a novel policy-gradient-based
optimization algorithm that utilizes adaptively clipped trajectory-distance
rewards for both single- and multi-agent reinforcement learning. We provide a
theoretical analysis of our method, including a deduction of the worst-case
approximation error bounds, highlighting the validity of our approach for
enhancing exploration. To evaluate the effectiveness of the proposed method, we
conducted experiments on two large 2D grid world mazes and several MuJoCo
tasks. The extensive experimental results demonstrate the significant
advantages of our method in achieving temporally extended exploration and
avoiding myopic and suboptimal behaviors in both single- and multi-agent
settings. Notably, the specific metrics and quantifiable results further
support these findings. The code used in the study is available at
https://github.com/buaawgj/TACE. Comment: 35 pages, 36 figures; accepted by Knowledge-Based Systems, not
published
Effects of tillage and maturity stage on the yield, nutritive composition, and silage fermentation quality of whole-crop wheat
Whole-crop wheat (Triticum aestivum, WCW) has a high nutritional value and digestibility. No-tillage (NT) can reduce energy and labor inputs in the agricultural production process, thus decreasing production costs. Many studies have examined planting techniques for WCW, but few have addressed no-tillage planting. This study aimed to compare the effects of different tillage methods and maturity stages on the yield, nutritive value, and silage fermentation quality of WCW. The experiment included two tillage methods (NT; conventional tillage, CT), two maturity stages (flowering stage; milk stage), and three years (2016-2017; 2017-2018; 2018-2019). Years had a strong influence on the yield and nutritional composition of WCW. This was mainly related to the amount of rainfall, as it affects the seedling emergence rate of wheat. Although tillage methods showed no significant effects on the yield, plant height, and stem number per plant of WCW (P > 0.05), the dry matter (DM) and crude protein (CP) yields of NT were 0.74 t/ha and 0.13 t/ha lower, respectively, than those of CT. Tillage methods showed no significant effects on the nutritive composition of WCW (P > 0.05). The WCW at the milk stage had greater DM (5.25 t/ha) and CP (0.60 t/ha) yields than that at the flowering stage (3.19 t/ha and 0.39 t/ha) (P < 0.05). The acid detergent fiber concentration of WCW decreased by 34.5% from the flowering to the milk stage, whereas the water-soluble carbohydrate concentration increased by 50.6%. The CP concentration at the milk stage was lower than that at the flowering stage (P < 0.05). The lactic acid concentration of NT silage (17.1 g/kg DM) was lower than that of CT silage (26.6 g/kg DM) (P < 0.05). The WCW silage at the milk stage had a lower NH3-N concentration (125 g/kg TN) than that at the flowering stage (169 g/kg TN) (P < 0.05). Wheat sown by NT and CT was of similar yield and nutritional value, irrespective of harvest stage. 
WCW harvested at the milk stage had greater yield and better nutritional composition and silage fermentation quality than that harvested at the flowering stage. Based on the results of the membership function analysis, no-tillage sowing of wheat was feasible and harvesting at the milk stage was recommended.
Investigation of the microcrack evolution in a Ti-based bulk metallic glass matrix composite
The initiation and evolution behavior of the shear bands and microcracks in a Ti-based metallic-glass–matrix composite (MGMC) were investigated using an in-situ tensile test under transmission electron microscopy (TEM). It was found that the plastic deformation of the Ti-based MGMC was related to the generation of a plastic deformation zone in the crystalline phase and a shear deformation zone in the glass phase near the crack tip. The dendrites can effectively suppress the propagation of the shear band. Before the rapid propagation of cracks, the extension of the plastic deformation zone and the shear deformation zone ahead of the crack tip is the main deformation pattern in the composite.
A new model construction based on the knowledge graph for mining elite polyphenotype genes in crops
Identifying polyphenotype genes that simultaneously regulate important agronomic traits (e.g., plant height, yield, and disease resistance) is critical for developing novel high-quality crop varieties. Predicting the associations between genes and traits requires the organization and analysis of multi-dimensional scientific data. The existing methods for establishing the relationships between genomic data and phenotypic data can only elucidate the associations between genes and individual traits. However, there are relatively few methods for detecting elite polyphenotype genes. In this study, a knowledge graph for traits regulating-genes was constructed by collecting data from the PubMed database and eight other databases related to the staple food crops rice, maize, and wheat as well as the model plant Arabidopsis thaliana. On the basis of the knowledge graph, a model for predicting traits regulating-genes was constructed by combining the data attributes of the gene nodes and the topological relationship attributes of the gene nodes. Additionally, a scoring method for predicting the genes regulating specific traits was developed to screen for elite polyphenotype genes. A total of 125,591 nodes and 547,224 semantic relationships were included in the knowledge graph. The accuracy of the knowledge graph-based model for predicting traits regulating-genes was 0.89, the precision rate was 0.91, the recall rate was 0.96, and the F1 value was 0.94. Moreover, 4,447 polyphenotype genes for 31 trait combinations were identified, among which the rice polyphenotype gene IPA1 and the A. thaliana polyphenotype gene CUC2 were verified via a literature search. Furthermore, the wheat gene TraesCS5A02G275900 was revealed as a potential polyphenotype gene that will need to be further characterized. 
Meanwhile, a Venn diagram analysis comparing the polyphenotype gene datasets (consisting of genes predicted by our model) with the transcriptome gene datasets (consisting of genes differentially expressed in response to disease, drought, or salt) showed that approximately 70% and 54% of the polyphenotype genes were identified in the transcriptome datasets of Arabidopsis and rice, respectively. The application of the knowledge graph-driven model for predicting traits regulating-genes represents a novel method for detecting elite polyphenotype genes.
MONITORING DYNAMIC GLOBAL DEFLECTION OF A BRIDGE BY MONOCULAR DIGITAL PHOTOGRAPHY
This study uses MDP (monocular digital photography) to monitor the dynamic global deflection of a bridge with the PST-TBP (photographing scale transformation-time baseline parallax) method, in which the reference system set near the camera is perpendicular to the photographing direction and need not be parallel to the bridge plane. A SONY350 camera was used to shoot the bridge every two seconds while an excavator was moving on the bridge, producing ten image sequences. Results show that the PST-TBP method is effective in solving the problem of the photographing direction being perpendicular to the bridge plane when monitoring the bridge by MDP. The PST-TBP method can achieve sub-pixel matching accuracy (0.3 pixels). The maximal deflection of the bridge is 55.34 mm, which is within the bridge's allowed value of 75 mm. The MDPS (monocular digital photography system) depicts deflection trends of the bridge in real time and can therefore warn of possible danger to the bridge in time. It provides key information for assessing bridge health on site and for studying the dynamic global deformation mechanism of a bridge caused by dynamic vehicle load. MDP is expected to be applied to monitoring the dynamic global deflection of bridges.
Prevalence of the GJB2 IVS1+1G>A mutation in Chinese hearing loss patients with monoallelic pathogenic mutation in the coding region of GJB2
Background: Mutations in the GJB2 gene are the most common cause of nonsyndromic recessive hearing loss in China. In about 6% of Chinese patients with severe to profound sensorineural hearing impairment, only monoallelic GJB2 mutations known to be either recessive or of unclear pathogenicity have been identified. This paper reports the prevalence of the GJB2 IVS1+1G>A mutation in a population of Chinese hearing loss patients with monoallelic pathogenic mutation in the coding region of GJB2. Methods: Two hundred and twelve patients, screened from 7133 cases of nonsyndromic hearing loss in China, with monoallelic mutation (mainly frameshift and nonsense mutation) in the coding region of GJB2 were examined for the GJB2 IVS1+1G>A mutation and mutations in the promoter region of this gene. Two hundred and sixty-two nonsyndromic hearing loss patients without GJB2 mutation and 105 controls with normal hearing were also tested for the GJB2 IVS1+1G>A mutation by sequencing. Results: Four patients with monoallelic mutation in the coding region of GJB2 were found carrying the GJB2 IVS1+1G>A mutation on the opposite allele. One patient with the GJB2 c.235delC mutation carried one variant, -3175 C>T, in exon 1 of GJB2. Neither the GJB2 IVS1+1G>A mutation nor any variant in exon 1 of GJB2 was found in the 262 nonsyndromic hearing loss patients without GJB2 mutation or in the 105 normal hearing controls. Conclusion: Testing for the GJB2 IVS1+1G>A mutation explained deafness in 1.89% of Chinese GJB2 monoallelic patients, and it should be included in routine testing of patients with GJB2 monoallelic pathogenic mutation.
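The 1.89% figure in the conclusion follows from the counts reported in the abstract: four carriers among the 212 monoallelic patients examined. A minimal sketch of that arithmetic, with a hypothetical helper name chosen for this example:

```python
def prevalence_percent(carriers: int, cohort_size: int) -> float:
    """Return carriers as a percentage of the cohort."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100.0 * carriers / cohort_size


# Counts taken from the abstract: 4 carriers among 212 monoallelic patients.
print(f"{prevalence_percent(4, 212):.2f}%")  # prints "1.89%"
```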