34 research outputs found

    Differentiation With Shared Features And Cannibalization Of Information Goods

    Get PDF
    Large sunk costs of development, negligible costs of reproduction and distribution, and substantial economies of scale make information goods distinct from industrial goods. In this paper, we analyse versioning strategies for horizontally differentiated information goods with shared feature sets, discrete hierarchical groups, and continuous individual consumer tastes. Based on our modelling results, when cannibalization among different market segments is taken into account, it is always sub-optimal to differentiate information goods if the market is not fully differentiated or the characteristics of the information goods are not specifically designed to relate to certain market segments.

    Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

    Full text link
    We study offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the lack of online interaction with the environment, offline RL faces two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data collected a priori do not provide sufficient coverage of the environment. To tackle these challenges, we study policy learning in confounded MDPs with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and a numerical study motivated by kidney transplantation demonstrate the promising performance of the proposed methods.
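    As a rough illustration of the pessimism ingredient mentioned above, the sketch below penalizes value estimates at poorly covered state-action pairs in a tabular offline dataset. It is only a generic lower-confidence-bound construction under assumed names, not the paper's VF/MIS-based identification with instrumental variables.

```python
# Minimal sketch of pessimistic value estimation from a fixed offline dataset.
# Illustrative only: generic uncertainty-penalized fitted Q iteration, not the
# paper's instrumental-variable identification for confounded MDPs.
import numpy as np

def pessimistic_q(transitions, n_states, n_actions, gamma=0.95, beta=1.0, iters=200):
    """transitions: list of (s, a, r, s_next) tuples from the offline dataset."""
    counts = np.zeros((n_states, n_actions))
    reward_sum = np.zeros((n_states, n_actions))
    next_counts = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s_next in transitions:
        counts[s, a] += 1
        reward_sum[s, a] += r
        next_counts[s, a, s_next] += 1

    safe = np.maximum(counts, 1)
    r_hat = reward_sum / safe
    p_hat = next_counts / safe[..., None]
    bonus = beta / np.sqrt(safe)                 # uncertainty penalty (pessimism)

    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)
        q = r_hat - bonus + gamma * (p_hat @ v)  # penalize rarely covered (s, a) pairs
        q[counts == 0] = 0.0                     # no data at all: stay pessimistic
    return q
```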

    Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

    Full text link
    Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit its deviation from the behavior policy, since computing Q-values for out-of-distribution (OOD) actions suffers from errors due to distributional shift. The recently proposed In-sample Learning paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function for any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same value regularization used in existing works, but in a completely in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes. Comment: ICLR 2023 notable top 5
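    The sketch below illustrates the core "in-sample" idea: the value target is built only from actions that actually appear in the dataset, so the value function is never queried at unseen actions. The asymmetric weighting here is an IQL-style stand-in under assumed names; the implicit regularizers that define SQL and EQL differ in detail.

```python
# Minimal tabular sketch of in-sample value learning: only logged (s, a) pairs
# contribute to the value target, so no OOD action is ever evaluated.
# The asymmetric residual weight is a stand-in, not the SQL/EQL regularizer.
import numpy as np

def in_sample_value_update(v, q, batch, tau=0.7, lr=0.1):
    """v: state-value table, q: state-action value table,
    batch: iterable of (s, a) pairs drawn from the offline dataset."""
    for s, a in batch:
        diff = q[s, a] - v[s]                    # uses only the logged action a
        weight = tau if diff > 0 else (1 - tau)  # asymmetric weighting of the residual
        v[s] += lr * weight * diff
    return v
```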

    Cocktail: Learn a Better Neural Network Controller from Multiple Experts via Adaptive Mixing and Robust Distillation

    Get PDF
    Neural networks are increasingly applied to control and decision-making for learning-enabled cyber-physical systems (LE-CPSs). They have shown promising performance without requiring the development of complex physical models; however, their adoption is significantly hindered by concerns about their safety, robustness, and efficiency. In this work, we propose COCKTAIL, a novel design framework that automatically learns a neural network-based controller from multiple existing control methods (experts), which may be either model-based or neural network-based. In particular, COCKTAIL first performs reinforcement learning to learn an optimal system-level adaptive mixing strategy that incorporates the underlying experts with dynamically assigned weights, and then conducts teacher-student distillation with probabilistic adversarial training and regularization to synthesize a student neural network controller with improved control robustness (measured by a safe-control-rate metric with respect to adversarial attacks or measurement noise), control energy efficiency, and verifiability (measured by the computation time for verification). Experiments on three non-linear systems demonstrate significant advantages of our approach on these properties over various baseline methods. Comment: The paper has been accepted by Design Automation Conference 202
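    The fragment below sketches only the adaptive-mixing step: several expert controllers blended with state-dependent weights from a small mixer. The RL training of the mixer and the robust teacher-student distillation described in the abstract are omitted, and all names here are hypothetical.

```python
# Illustrative sketch of adaptive expert mixing (names hypothetical).
# The RL-trained mixer and the adversarial distillation stage are not shown.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def mixed_control(state, experts, mixer_weights):
    """experts: list of callables state -> control action;
    mixer_weights: (n_experts, dim_state) array acting as a linear mixer."""
    logits = mixer_weights @ state                      # state-dependent mixing logits
    w = softmax(logits)                                 # dynamically assigned expert weights
    actions = np.stack([expert(state) for expert in experts])
    return w @ actions                                  # weighted combination of expert outputs
```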

    Performance and mechanism of sand stabilization via microbial-induced CaCO3 precipitation using phosphogypsum

    Get PDF
    Phosphogypsum is a solid waste generated during the production of phosphoric acid, and its effective utilization is a complex challenge. In this research, an innovative and eco-friendly sand consolidation technique, i.e., microbial-induced CaCO3 precipitation using phosphogypsum (MICPP), is applied to achieve phosphogypsum mineralization and sand stabilization. Phosphogypsum is employed as the calcium source for sand consolidation. To elucidate the efficiency and mechanism of sand consolidation through MICPP, a series of experimental tests on sand columns using varying phosphogypsum dosages and consolidation methods are conducted. The results show a positive correlation between phosphogypsum dosage and the compressive strength of the specimens. Concurrently, as the amount of phosphogypsum increased, the permeability coefficient of the sand columns decreased and the production of CaCO3 increased. Notably, the immersion method exhibits a superior curing effect compared to the stirring method. The MICPP-treated specimens significantly mitigated the risk of environmental contamination. The CaCO3 precipitated by microbial action is predominantly calcite, which effectively fills voids, bonds surfaces, and bridges gaps in the sand columns, thereby substantially enhancing their performance.
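    As a back-of-the-envelope illustration of phosphogypsum as a calcium source, the snippet below converts a phosphogypsum mass to a theoretical CaCO3 ceiling using 1:1 calcium stoichiometry. The purity value is an assumption for illustration; real conversion is lower because of impurities and incomplete ureolysis, and this is not a calculation from the paper.

```python
# Illustrative stoichiometry: phosphogypsum is mostly CaSO4·2H2O, and each mole of
# calcium can precipitate at most one mole of CaCO3. Purity is an assumed value.
M_GYPSUM = 172.17   # g/mol, CaSO4·2H2O
M_CACO3 = 100.09    # g/mol, CaCO3

def max_caco3_from_phosphogypsum(mass_g, purity=0.9):
    moles_ca = purity * mass_g / M_GYPSUM
    return moles_ca * M_CACO3   # theoretical ceiling, assuming full conversion

print(max_caco3_from_phosphogypsum(100.0))  # ~52 g CaCO3 per 100 g phosphogypsum
```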

    Bone age assessment from articular surface and epiphysis using deep neural networks

    Get PDF
    Bone age assessment is of great significance for diagnosing genetic and endocrine diseases. Traditional bone age diagnosis relies mainly on experienced radiologists examining regions of interest in hand radiographs, but it is time-consuming and may even lead to large errors between the diagnosis result and the reference. Existing computer-aided methods predict bone age from general regions of interest but do not explore specific regions of interest in hand radiographs. This paper addresses these problems by predicting bone age from the articular surface and epiphysis in hand radiographs using deep convolutional neural networks. The articular surface and epiphysis datasets are established from the Radiological Society of North America (RSNA) pediatric bone age challenge, where the specific feature regions of the articular surface and epiphysis are manually segmented from the hand radiographs. Five convolutional neural networks, i.e., ResNet50, SENet, DenseNet-121, EfficientNet-b4, and CSPNet, are employed to improve the accuracy and efficiency of bone age diagnosis in clinical applications. Experiments show that the best-performing model yields a mean absolute error (MAE) of 7.34 months on the proposed articular surface and epiphysis datasets, which is more accurate and faster than radiologists. The project is available at https://github.com/YameiDeng/BAANet/, and the annotated dataset is published at https://doi.org/10.5281/zenodo.7947923.
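    The sketch below adapts one of the listed backbones (a torchvision ResNet50) for bone age regression trained with a mean-absolute-error objective, matching the reported metric. Data loading and the articular-surface/epiphysis cropping are omitted, and the released BAANet code may differ in detail.

```python
# Minimal sketch: a ResNet50 regressor for bone age in months with an MAE loss.
# Not the repository's exact architecture or training recipe.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)          # or ImageNet-pretrained weights
model.fc = nn.Linear(model.fc.in_features, 1)  # single output: bone age in months

criterion = nn.L1Loss()                        # MAE, the metric reported above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, ages):
    """images: (B, 3, H, W) tensor of cropped regions; ages: (B,) tensor in months."""
    optimizer.zero_grad()
    pred = model(images).squeeze(1)
    loss = criterion(pred, ages)
    loss.backward()
    optimizer.step()
    return loss.item()
```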

    A Study of the Vegetation and Floristic Affinity of the Limestone Forests in Southern and Southwestern China

    No full text
    Volume: 82; Start page: 570; End page: 58

    Cumulative Effect, Targeted Poverty Alleviation, and Firm Value: Evidence from China

    No full text
    This paper studies the influence of the annual cumulative earnings of Chinese listed TPA (targeted poverty alleviation) companies since 2004 on the companies’ value, using data from 2012 to 2019. It measures the long-term earnings persistence of these companies with the cumulative earnings scaled by each company’s market price at the end of the current year, and obtains a model of company value that combines each company’s earnings persistence with the long-term competitive strength of its products. The cumulative data from 2004 to 2012, 2005 to 2013, …, and 2011 to 2019 provide the data used for the regressions from 2012 to 2019. The TPA companies’ value is affected by long-term cumulative net profits and long-term competitive advantage: the higher a company’s accumulated net profit and the longer the duration of its competitive advantage, the more stable the increase in the company’s value and the higher the quality of that increase.
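    The sketch below shows the general shape of such a regression: firm value on rolling cumulative net profit (scaled by year-end market price) and a proxy for the duration of competitive advantage, estimated on the 2012-2019 sample. Column names and the specification are hypothetical, not the paper's exact model.

```python
# Illustrative panel-style regression of firm value on cumulative earnings and a
# competitive-advantage duration proxy. All column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def run_value_regression(df: pd.DataFrame):
    """df columns (assumed): firm, year, value, net_profit, advantage_duration, market_price."""
    df = df.sort_values(["firm", "year"])
    df["cum_profit"] = df.groupby("firm")["net_profit"].cumsum()
    df["cum_profit_scaled"] = df["cum_profit"] / df["market_price"]  # scale by year-end price
    sample = df[(df["year"] >= 2012) & (df["year"] <= 2019)]
    model = smf.ols("value ~ cum_profit_scaled + advantage_duration", data=sample).fit()
    return model.summary()
```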