
    Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

    We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both the online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves $\widetilde{O}(S^5A^2\,\mathrm{sp}(h^*)\sqrt{T})$ regret after $T$ steps, where $S\times A$ is the size of the state-action space and $\mathrm{sp}(h^*)$ is the span of the optimal bias function. Our results are the first to achieve the optimal dependence on $T$ for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O}\left(\frac{SA\,\mathrm{sp}^2(h^*)}{\epsilon^2}+\frac{S^2A\,\mathrm{sp}(h^*)}{\epsilon}\right)$ samples, whereas the minimax lower bound is $\Omega\left(\frac{SA\,\mathrm{sp}(h^*)}{\epsilon^2}\right)$. Our results rest on two new techniques that are unique to the average-reward setting: 1) better discounted approximation by value-difference estimation; 2) efficient construction of a confidence region for the optimal bias function with space complexity $O(SA)$.
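
    To make the span $\mathrm{sp}(h^*) = \max_s h^*(s) - \min_s h^*(s)$ concrete, the sketch below computes the optimal gain, a bias function and its span for a small MDP by relative value iteration. This is a standard model-based planning method that assumes known transition probabilities and rewards and a unichain, aperiodic MDP; it is not the model-free algorithm described above, and the two-state MDP and all function names are hypothetical.

```python
# Relative value iteration: model-based planning, NOT the model-free method above.
# Assumes known P and R and a unichain, aperiodic MDP so the iteration converges.

def relative_value_iteration(P, R, ref_state=0, tol=1e-10, max_iter=100_000):
    """Return (gain, bias h with h[ref_state] == 0, greedy policy).

    P[s][a][t] = probability of moving from state s to state t under action a.
    R[s][a]    = expected one-step reward of action a in state s.
    """
    n_states = len(P)
    h = [0.0] * n_states
    gain = 0.0
    for _ in range(max_iter):
        # Bellman optimality backup: (T h)(s) = max_a [ r(s, a) + sum_t P(t | s, a) h(t) ]
        q = [[R[s][a] + sum(p * h[t] for t, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n_states)]
        th = [max(q_s) for q_s in q]
        # Renormalise at the reference state; this offset converges to the optimal gain.
        gain = th[ref_state]
        new_h = [v - gain for v in th]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # Greedy action with respect to the final backup.
    policy = [max(range(len(q_s)), key=q_s.__getitem__) for q_s in q]
    return gain, h, policy


def span(h):
    """sp(h) = max_s h(s) - min_s h(s)."""
    return max(h) - min(h)


# Hypothetical two-state, two-action MDP (all numbers invented for illustration).
P = [
    [[0.5, 0.5], [1.0, 0.0]],  # state 0: action 0 may move to state 1; action 1 stays
    [[0.5, 0.5], [0.0, 1.0]],  # state 1: action 0 may move to state 0; action 1 stays
]
R = [
    [0.0, 0.2],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]

gain, h, policy = relative_value_iteration(P, R)
print(f"gain ~ {gain:.3f}, bias = {[round(v, 3) for v in h]}, "
      f"sp(h*) ~ {span(h):.3f}, policy = {policy}")
```

    For this toy MDP the optimal policy takes action 0 in both states, the optimal gain is 0.5, and the bias normalised at state 0 is $h^* = (0, 1)$, so $\mathrm{sp}(h^*) = 1$.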

    Model Suggests Potential for Porites Coral Population Recovery After Removal of Anthropogenic Disturbance (Luhuitou, Hainan, South China Sea)

    Population models are important for resource management: even with incomplete monitoring data, they can indicate potential trajectories that are useful for planning. From size-frequency data on the Luhuitou fringing reef (Hainan, South China Sea), a matrix population model of massive corals (Porites lutea) was developed and used to project trajectories over 100 years with no disturbance and with random disturbances. The model reflects a largely open population of Porites lutea, with low local recruitment and a preponderance of imported recruits. With no further disturbance, the Porites lutea population will grow and its size structure will shift from a predominance of small size classes to a predominance of large ones; consequently, total Porites cover will increase. Even under random disturbances every 10 to 20 years, the Porites population could remain viable, albeit at lower cover. The models suggest that Luhuitou could recover once chronic anthropogenic disturbance is removed. Extending the coral reef reserves to protect the open coral community and its paths of connectivity is advisable and imperative for the conservation of Hainan’s coral reefs.
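
    As a rough illustration of how such a projection works, the sketch below iterates an open-population matrix model $n_{t+1} = A n_t + r$, where $A$ holds class-to-class transition rates and $r$ is a vector of imported recruits. The three size classes, every rate, the initial abundances and the treatment of a disturbance as uniform proportional mortality are hypothetical choices for illustration, not the values or disturbance model used in the study.

```python
import random

# Hypothetical 3-size-class projection for an open coral population.
# All rates are invented; they are NOT the parameters estimated for Porites lutea.

def project(A, n0, imported, years, disturbance_years=frozenset(), disturbance_survival=0.5):
    """Project n[t+1] = A n[t] + imported, optionally applying disturbances.

    A[i][j]: expected class-i colonies next year per class-j colony this year.
    imported: recruits arriving from outside the reef each year (open population).
    disturbance_years: years in which every class is scaled by disturbance_survival.
    """
    k = len(n0)
    n = list(n0)
    trajectory = [n]
    for t in range(1, years + 1):
        n = [sum(A[i][j] * n[j] for j in range(k)) + imported[i] for i in range(k)]
        if t in disturbance_years:
            n = [x * disturbance_survival for x in n]
        trajectory.append(n)
    return trajectory


def random_disturbance_years(horizon, rng, min_gap=10, max_gap=20):
    """Draw disturbance years separated by random gaps of min_gap..max_gap years."""
    years, t = set(), rng.randint(min_gap, max_gap)
    while t <= horizon:
        years.add(t)
        t += rng.randint(min_gap, max_gap)
    return years


# Classes: small, medium, large. Local recruitment is zero, so the population is
# sustained only by imported recruits entering the small class.
A = [
    [0.60, 0.00, 0.00],  # small colonies that remain small
    [0.20, 0.80, 0.00],  # small -> medium, medium stasis
    [0.00, 0.10, 0.95],  # medium -> large, large stasis
]
imported = [10.0, 0.0, 0.0]
n0 = [20.0, 5.0, 1.0]  # size structure initially dominated by small colonies

undisturbed = project(A, n0, imported, years=100)
shocks = random_disturbance_years(100, random.Random(1))
disturbed = project(A, n0, imported, years=100, disturbance_years=shocks)
print("undisturbed:", [round(x, 1) for x in undisturbed[-1]])
print("disturbed:  ", [round(x, 1) for x in disturbed[-1]])
```

    Because every diagonal entry of this hypothetical $A$ is below 1, the undisturbed projection approaches the equilibrium $(I - A)^{-1} r = (25, 25, 50)$, so dominance shifts from small to large colonies, qualitatively like the trend described above, while disturbed runs stay lower.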

    Design and Test Research on Cutting Blade of Corn Harvester Based on Bionic Principle

    Existing corn harvester cutting blades suffer from large cutting resistance, high energy consumption, and poor cut quality. Using bionic principles, a bionic blade was designed by extracting the profile curve of the cutting teeth on the B. horsfieldi palate. A comparative single-stalk cutting test on corn stalks collected at harvest time was carried out with a double-blade cutting test system. The results show that the bionic blades perform better, with strong cutting ability and good cut quality. Statistical analysis of the two groups of cutting-test data gave an average maximum cutting force of 480.24 N for the bionic blades and 551.31 N for the ordinary blades, and an average cutting energy of 3.91 J and 4.38 J, respectively. The bionic blade thus reduced the average maximum cutting force by 12.89% and the cutting energy consumption by 10.73%. Analysis of variance showed that blade type had a significant effect on the maximum cutting force and on the energy required to cut a corn stalk. This demonstrates that bionic blades reduce cutting force and energy consumption more effectively than ordinary blades.
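
    The percentage reductions follow directly from the reported averages as (ordinary - bionic) / ordinary. The short check below recomputes them; the function name is illustrative only.

```python
def percent_reduction(bionic, ordinary):
    """Relative reduction of the bionic blade versus the ordinary blade, in percent."""
    return (ordinary - bionic) / ordinary * 100


force = percent_reduction(480.24, 551.31)  # average maximum cutting force, N
energy = percent_reduction(3.91, 4.38)     # average cutting energy, J
print(f"maximum cutting force reduced by {force:.2f}%")  # -> 12.89%
print(f"cutting energy reduced by {energy:.2f}%")        # -> 10.73%
```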

    Downregulation of RPL6 by siRNA Inhibits Proliferation and Cell Cycle Progression of Human Gastric Cancer Cell Lines

    Our previous study revealed that human ribosomal protein L6 (RPL6) was up-regulated in multidrug-resistant gastric cancer cells and that over-expression of RPL6 could protect gastric cancer cells from drug-induced apoptosis. It was further demonstrated that up-regulation of RPL6 accelerated growth and enhanced the in vitro colony-forming ability of GES cells, while down-regulation of RPL6 had the opposite effect. The present study was designed to investigate the potential of RPL6 as a target for clinical gastric cancer therapy. The expression of RPL6 and cyclin E in gastric cancer tissues and normal gastric mucosa was evaluated by immunohistochemistry. RPL6 and cyclin E were expressed at higher levels in gastric cancer tissues than in normal gastric mucosa, and their expression was correlated in gastric cancer. Kaplan-Meier analysis of postoperative survival showed that patients with RPL6-positive expression had shorter survival than patients with RPL6-negative expression. RPL6 was then down-regulated by siRNA in the gastric cancer cell lines SGC7901 and AGS. Down-regulation of RPL6 reduced the colony-forming ability of gastric cancer cells in vitro and reduced cell growth in vivo. Moreover, down-regulation of RPL6 suppressed the G1-to-S phase transition in these cells. Further, RPL6 siRNA down-regulated cyclin E expression in SGC7901 and AGS cells. Taken together, these data suggest that RPL6 is over-expressed in human gastric cancer tissues and is associated with poor prognosis. Down-regulation of RPL6 could suppress cell growth and cell cycle progression, at least in part by down-regulating cyclin E, and might therefore serve as a novel approach to gastric cancer therapy.

    Spatial pattern of coral diversity in Luhuitou fringing reef, Sanya, China

    Eighty-four quadrats along 5 vertical transects of the Luhuitou fringing reef were investigated in detail using video-quadrat sampling and indoor interpretation. The results show that (1) the reef supports 69 species of hermatypic corals belonging to 24 genera and 13 families, whose abundances are unevenly distributed. (2) Porites lutea is the most dominant species, with an importance value percentage of up to 36.62%; Porites and Acropora are the dominant genera, with importance value percentages of 43.85% and 22.88%, respectively. (3) Coral communities show distinct spatial differences: both coral cover and coral diversity indices are higher on the northeastern transects than on the central and southern transects. (4) Coral communities also show marked zonation, with fewer coral species on the reef flat than on the reef slope. On the reef flat, the importance value percentage of the sole dominant genus, Porites, exceeds 50%, while on the reef slope the importance value percentages are 28.33% for the first dominant genus, Acropora, and 26.71% for the second, Porites. Further analysis suggests that the spatial and zonal differences in coral diversity are correlated with both natural environmental changes and human activities. The shallow reef flat is frequently exposed at low tide and receives more anthropogenic influence (including dredging and trampling) than the deeper reef slope, so the coral community on the reef flat is less well developed than that on the reef slope. The relatively low coral cover and diversity indices on the central and southern transects are closely related to heavy human activity around these sites, such as aquaculture, fishing, and coastal sewage discharge. Therefore, the impact of human activities must be taken into account when developing strategies to protect this coral reef.

    Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption

    We study offline reinforcement learning (RL) with heavy-tailed reward distributions and data corruption: (i) moving beyond sub-Gaussian reward distributions, we allow the rewards to have infinite variance; (ii) we allow corruption, where an attacker can arbitrarily modify a small fraction of the rewards and transitions in the dataset. We first derive a sufficient optimality condition for generalized Pessimistic Value Iteration (PEVI), which admits various estimators with proper confidence bounds and can be applied to multiple learning settings. To handle data corruption and heavy-tailed rewards, we prove that trimmed-mean estimation achieves the minimax optimal error rate for robust mean estimation under heavy-tailed distributions. We then plug the trimmed-mean estimator and its confidence bound into the PEVI algorithm to solve the robust offline RL problem. Standard analysis reveals that data corruption induces a bias term in the suboptimality gap, which gives the false impression that any data corruption prevents optimal policy learning. Using the optimality condition for generalized PEVI, we show that as long as the bias term is smaller than the "action gap", the policy returned by PEVI achieves the optimal value given sufficient data.
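
    The abstract does not specify which trimmed-mean variant is used, so the sketch below shows only a generic symmetric trimmed mean (sort, drop the k smallest and k largest values, average the rest) applied to hypothetical heavy-tailed rewards: Student-t with 2 degrees of freedom, which has a finite mean but infinite variance, with 5% of the samples overwritten by an adversary. The paper's estimator may differ, for example in how the trimming thresholds are chosen; all names and numbers here are illustrative assumptions.

```python
import math
import random


def trimmed_mean(samples, trim_fraction):
    """Generic symmetric trimmed mean (not necessarily the paper's exact variant).

    Drops the k = floor(trim_fraction * n) smallest and k largest values and
    averages the rest. trim_fraction should be at least the corrupted fraction
    and strictly below 0.5 so that some samples remain.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must lie in [0, 0.5)")
    xs = sorted(samples)
    k = int(trim_fraction * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def student_t2(rng):
    """Student-t sample with 2 degrees of freedom (mean 0, infinite variance)."""
    # T = Z / sqrt(V / 2) with V ~ chi^2_2, and V / 2 ~ Exponential(1).
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.expovariate(1.0))


rng = random.Random(0)
true_mean, n, corrupt_fraction = 1.0, 10_000, 0.05
rewards = [true_mean + student_t2(rng) for _ in range(n)]
# Hypothetical adversary: overwrite 5% of the rewards with a huge value.
for i in rng.sample(range(n), int(corrupt_fraction * n)):
    rewards[i] = 1_000.0

naive = sum(rewards) / n
robust = trimmed_mean(rewards, trim_fraction=0.1)
print(f"empirical mean {naive:.2f}, trimmed mean {robust:.2f}")
```

    The empirical mean is dragged far from the true mean of 1.0, while the trimmed mean stays close to it; the small remaining offset shows that trimming limits, but does not fully remove, the effect of one-sided corruption.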

    A Multi-Attribute Pheromone Ant Secure Routing Algorithm Based on Reputation Value for Sensor Networks

    With the development of wireless sensor networks, certain network problems have become more prominent, such as limited node resources, low data transmission security, and short network life cycles. To solve these problems effectively, it is important to design an efficient and trusted secure routing algorithm for wireless sensor networks. Traditional ant-colony optimization algorithms tend to converge only to local optima and do not consider the residual energy of nodes, among other problems. This paper introduces a multi-attribute pheromone ant secure routing algorithm based on reputation value (MPASR). The algorithm reduces the network's energy consumption and improves the reliability of node reputations by filtering nodes with higher coincidence rates and improving the method used to update the nodes' communication behaviors. In addition, the node reputation value, the residual node energy, and the transmission delay are combined into a synthetic pheromone, which is used in the random proportional rule of traditional ant-colony optimization to select the optimal data transmission path. Simulation results show that the improved algorithm increases both the security of data transmission and the quality of routing service.
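
    In standard ant-colony optimization, the random proportional rule moves an ant from node $i$ to neighbor $j$ with probability $p_{ij} = \tau_{ij}^{\alpha}\eta_{ij}^{\beta} / \sum_{l} \tau_{il}^{\alpha}\eta_{il}^{\beta}$, where $\tau$ is the pheromone and $\eta$ a heuristic visibility. The sketch below plugs a synthetic pheromone into that rule. The abstract does not give MPASR's combining formula, so the weighted sum of reputation, residual energy and an inverse-delay term, the weights, the use of $\eta = 1/\text{distance}$, and all names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Neighbor:
    node_id: int
    reputation: float       # trust score in [0, 1]
    residual_energy: float  # fraction of initial battery remaining, in [0, 1]
    delay: float            # link transmission delay, e.g. in ms
    distance: float         # used for the heuristic visibility eta = 1 / distance


def synthetic_pheromone(nb, w_rep=0.4, w_energy=0.4, w_delay=0.2):
    """Hypothetical weighted combination; MPASR's actual formula may differ."""
    return (w_rep * nb.reputation
            + w_energy * nb.residual_energy
            + w_delay / (1.0 + nb.delay))


def transition_probabilities(neighbors, alpha=1.0, beta=2.0):
    """Random proportional rule: p_j = tau_j^alpha * eta_j^beta / sum_l (tau_l^alpha * eta_l^beta)."""
    scores = [synthetic_pheromone(nb) ** alpha * (1.0 / nb.distance) ** beta
              for nb in neighbors]
    total = sum(scores)
    return [s / total for s in scores]


def choose_next_hop(neighbors, rng, alpha=1.0, beta=2.0):
    """Sample the next hop according to the random proportional rule."""
    probs = transition_probabilities(neighbors, alpha, beta)
    return rng.choices(neighbors, weights=probs, k=1)[0]


neighbors = [
    Neighbor(node_id=1, reputation=0.9, residual_energy=0.8, delay=5.0, distance=10.0),
    Neighbor(node_id=2, reputation=0.1, residual_energy=0.8, delay=5.0, distance=10.0),
]
probs = transition_probabilities(neighbors)
hop = choose_next_hop(neighbors, random.Random(42))
print([round(p, 3) for p in probs], "next hop:", hop.node_id)
```

    With equal energy, delay and distance, the trusted neighbor receives the larger share of probability, which is how a reputation-aware pheromone steers routes away from poorly rated nodes while still allowing exploration.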