5,408 research outputs found

    Congestion control in multi-serviced heterogeneous wireless networks using dynamic pricing

    Includes bibliographical references. Service providers (or operators) employ pricing schemes to help provide the desired QoS to subscribers and to maintain profitability among competitors. An economically efficient pricing scheme, which seamlessly integrates users’ preferences as well as service providers’ preferences, is therefore needed. Otherwise, pricing schemes can be viewed as promoting social unfairness in the dynamically priced network. However, earlier investigations have shown that the existing dynamic pricing schemes do not consider the users’ willingness to pay (WTP) before the price of services is determined. WTP is the amount a user is willing to pay based on the worth attached to the service requested. There are different WTP levels for different subscribers due to differences in demographics and in the value attached to the services requested. This research has addressed congestion control in the heterogeneous wireless network (HWN) by developing a dynamic pricing scheme that efficiently incentivises users to utilize radio resources. The proposed Collaborative Dynamic Pricing Scheme (CDPS), which identifies the users’ and operators’ preferences in determining the price of services, uses an intelligent approach for controlling congestion and enhancing both the users’ and operators’ utility. Thus, the CDPS addresses the congestion problem by firstly obtaining the users’ WTP from users’ historical response to price changes and incorporating the WTP factor to evaluate the service price. Secondly, it uses a reinforcement learning technique to illustrate how a price policy can be obtained for the enhancement of both users’ and operators’ utility, as the total utility reward obtained increases towards a defined ‘goal state’.

    Cooperation in Multi-Agent Reinforcement Learning

    As progress in reinforcement learning (RL) gives rise to increasingly general and powerful artificial intelligence, society needs to anticipate a possible future in which multiple RL agents must learn and interact in a shared multi-agent environment. When a single principal has oversight of the multi-agent system, how should agents learn to cooperate via centralized training to achieve individual and global objectives? When agents belong to self-interested principals with imperfectly-aligned objectives, how can cooperation emerge from fully-decentralized learning? This dissertation addresses both questions by proposing novel methods for multi-agent reinforcement learning (MARL) and demonstrating the empirical effectiveness of these methods in high-dimensional simulated environments. To address the first case, we propose new algorithms for fully-cooperative MARL in the paradigm of centralized training with decentralized execution. Firstly, we propose a method based on multi-agent curriculum learning and multi-agent credit assignment to address the setting where global optimality is defined as the attainment of all individual goals. Secondly, we propose a hierarchical MARL algorithm to discover and learn interpretable and useful skills for a multi-agent team to optimize a single team objective. Extensive experiments with ablations show the strengths of our approaches over state-of-the-art baselines. To address the second case, we propose learning algorithms to attain cooperation within a population of self-interested RL agents. We propose the design of a new agent who is equipped with the new ability to incentivize other RL agents and explicitly account for the other agents' learning process. This agent overcomes the challenging limitation of fully-decentralized training and generates emergent cooperation in difficult social dilemmas. 
Then, we extend and apply this technique to the problem of incentive design, where a central incentive designer explicitly optimizes a global objective only by intervening on the rewards of a population of independent RL agents. Experiments on the problem of optimal taxation in a simulated market economy demonstrate the effectiveness of this approach. Ph.D.

    Utilizing Reinforcement Learning and Computer Vision in a Pick-And-Place Operation for Sorting Objects in Motion

    This master's thesis studies the implementation of advanced machine learning (ML) techniques in industrial automation systems, focusing on applying machine learning to enable and evolve autonomous sorting capabilities in robotic manipulators. In particular, Inverse Kinematics (IK) and Reinforcement Learning (RL) are investigated as methods for controlling a UR10e robotic arm for pick-and-place of moving objects on a conveyor belt within a small-scale sorting facility. A camera-based computer vision system applying YOLOv8 is used for real-time object detection and instance segmentation. Perception data is utilized to ascertain optimal grip points, specifically through an implemented algorithm that outputs optimal grip position, angle, and width. As the implemented system includes testing and evaluation on a physical system, the intricacies of hardware control, specifically the reverse engineering of an OnRobot RG6 gripper, are elaborated as part of this study. The system is implemented on the Robot Operating System (ROS), and its design is driven in particular by high modularity and scalability. The camera-based vision system serves as the primary input, while the robot control is the output. The implemented system design allows for the evaluation of motion control employing both IK and RL. Computation of IK is conducted via MoveIt2, while the RL model is trained and computed in NVIDIA Isaac Sim. The high-level control of the robotic manipulator was accomplished with the use of Proximal Policy Optimization (PPO). The main result of the research is a novel reward function for the pick-and-place operation that takes into account distance and orientation from the target object. In addition, the provided system administers task control by independently initializing pick-and-place operation phases for each environment. The findings demonstrate that PPO was able to significantly enhance the velocity, accuracy, and adaptability of industrial automation. 
Our research shows that accurate control of the robot arm can be reached by training the PPO model purely by applying a digital twin simulation.
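
The abstract does not give the form of the thesis's reward function, so the following is only a minimal sketch of what a reward that "takes into account distance and orientation from the target object" might look like. The function name `pick_reward`, the weights `w_dist` and `w_orient`, and the use of a single yaw angle for orientation are all assumptions made for illustration, not the author's design.

```python
import math


def pick_reward(gripper_pos, target_pos, gripper_yaw, target_yaw,
                w_dist=1.0, w_orient=0.5):
    """Hypothetical shaped reward: penalise distance and yaw misalignment.

    Positions are (x, y, z) tuples in metres, yaw angles are in radians.
    The weights are illustrative assumptions, not values from the thesis.
    """
    # Euclidean distance between gripper and target object.
    dist = math.dist(gripper_pos, target_pos)
    # Wrap the yaw difference into [-pi, pi] before taking its magnitude.
    diff = gripper_yaw - target_yaw
    yaw_err = abs(math.atan2(math.sin(diff), math.cos(diff)))
    # Reward is highest (zero) when the gripper is on target and aligned.
    return -(w_dist * dist + w_orient * yaw_err)


# A pose closer to the target with the same orientation scores higher.
far = pick_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 0.0, 0.0)
near = pick_reward((2.0, 3.0, 0.0), (3.0, 4.0, 0.0), 0.0, 0.0)
```

A dense penalty of this kind gives the policy a learning signal at every step; a sparse bonus on a successful grasp would be a different design choice with different training behaviour.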

    Machine Learning Techniques and Stochastic Modeling in Mathematical Oncology

    The cancer stem cell hypothesis claims that tumor growth and progression are driven by a (typically) small niche of the total cancer cell population called cancer stem cells (CSCs). These CSCs can go through symmetric or asymmetric divisions to differentiate into specialised progenitor cells or reproduce new CSCs. While it was once held that this differentiation pathway was unidirectional, recent research has demonstrated that differentiated cells are more plastic than initially considered. In particular, differentiated cells can de-differentiate and recover their stem-like capacity. Two recent papers have considered how this rate of plasticity affects the evolutionary dynamic of an invasive, malignant population of stem cells and differentiated cells into existing tissue [64, 109]. These papers arrive at seemingly opposing conclusions, one claiming that increased plasticity results in increased invasive potential, and the other that increased plasticity decreases invasive potential. Here, we show that what is most important, when determining the effect on invasive potential, is how one distributes this increased plasticity between the compartments of resident and mutant-type cells. We also demonstrate how these results vary, producing non-monotone fixation probability curves, as inter-compartmental plasticity changes when differentiated cell compartments are allowed to continue proliferating, highlighting a fundamental difference between the two models. We conclude by demonstrating the stability of these qualitative results over various parameter ranges. Imaging flow cytometry is a tool that uses the high-throughput capabilities of conventional flow cytometry for the purposes of producing single cell images. We demonstrate the label-free prediction of mitotic cell cycle phases in Jurkat cells by utilizing brightfield and darkfield images from an imaging flow cytometer. 
The method is non-destructive, relies upon images only, and does not introduce (potentially confounding) dyes or biomarkers to the cell cycles. By utilizing deep convolutional neural networks regularized by generated, synthetic images in the presence of severe class imbalance we are able to produce an estimator that outperforms the previous state of the art on the dataset by 10-15%. The in-silico development of a chemotherapeutic dosing schedule for treating cancer relies upon a parameterization of a particular tumour growth model to describe the dynamics of the cancer in response to the dose of the drug. In practice, it is often prohibitively difficult to ensure the validity of patient-specific parameterizations of these models. As a result, sensitivities to these particular parameters can result in therapeutic dosing schedules that are optimal in principle but perform poorly on particular patients. In this study, we demonstrate that chemotherapeutic dosing strategies learned via reinforcement learning methods are more robust to perturbations in patient-specific parameter values than those learned via classical optimal control methods. By training a reinforcement learning agent on mean-value parameters and allowing the agent periodic access to a more easily measurable metric, relative bone marrow density, for the purpose of optimizing dose schedule while reducing drug toxicity, we are able to develop drug dosing schedules that outperform schedules learned via classical optimal control methods, even when such methods are allowed to leverage the same bone marrow measurements.

    Marketing Orientation, Customer Satisfaction and Retention: the Case of the Telecommunications Services Market in Jordan

    A great deal of attention has been devoted by researchers to examining different aspects of the relationship between marketing orientation (MO) and competitive advantage, mostly within causal relationship style research. However, the mechanisms and intermediate variables underlying this relationship remain vague and poorly investigated. Drawing upon mixed method research utilising qualitative and quantitative techniques, this study aims to offer further insight into this relationship within Jordan’s telecommunications market, focusing on customer satisfaction and customer retention as two prominent performance indicators in this market. Hence, this research set out to investigate the mechanisms and interrelationships that link marketing orientation and organisational performance, an issue that seems highly justified in a mature and competitive market where consumers have more choices, switching costs are decreasing and retention of the market base is becoming more and more difficult. As a case study undertaken in Jordan’s telecommunications market, the four main telecommunications operators in the market were represented. Quantitative data analysis was used to determine the variations between the main operators in the market regarding their adopted levels of marketing orientation. On the other hand, the qualitative technique - namely semi-structured interviews - represents the main instrument the study utilises to gain an in-depth insight into the relationship between marketing orientation (MO) and organisational performance. This qualitative tool enabled the researcher to construct a rich picture of the mechanisms and ways by which firms manage the different attitudinal dimensions of customer satisfaction and the behavioural dimensions of customer retention. 
Results of the research confirm significant variations between high- and low-marketing orientation telecommunications operators with regard to the approaches, drivers and mechanisms by which firms manage their capabilities to achieve customer satisfaction and customer retention. Thus, two different patterns were indicated which were associated with the adopted level of marketing orientation of these firms. The most important finding to come out of the research was that genuine marketing orientation is an integrated attitudinal-behavioural perspective. Hence, any deficiency in, or neglect of, either aspect will weaken a firm’s overall value creation capability, the main mission of the marketing-oriented firm. In addition, internal culture emerged as a critical success factor for marketing-oriented firms. It serves as the glue that ensures a firm’s values are adhered to, and also allows a clearer understanding of a firm’s vision and mission, which in turn makes these firms better able to translate their attitudes into practice on the ground. Moreover, the role of marketing orientation was substantial as it worked as a supportive environment that stimulates a firm’s capabilities to integrate and coordinate its resources and competencies into new ones in such a way as to enhance its overall performance as well as to achieve congruence with the changing business environment. The importance of this research stems from its nature and approach in studying the relationship between marketing orientation and organisational performance. The main issue being evaluated is different from the bulk of marketing orientation works that have focused on examining different aspects of marketing orientation and organisational performance within causal relationship-style research, and mostly within a short-run view. 
In contrast, this study is concerned with gaining an in-depth understanding of this relationship by evaluating its mechanisms and interrelationships, an aspect that was treated as a black box in prior research.

    A Reinforcement Learning based Cognitive Approach for Quality of Experience Management in the Future Internet

    This thesis aims at providing an innovative contribution to the definition of the Future Internet Core Platform, within the framework of the "La Sapienza" University research activities on the EU FP7 FI-WARE project. The thesis introduces and designs an innovative "Cognitive Application Interface" in charge of deriving key parameters driving the Network Control elements to meet personalised Application Quality of Experience Requirements. The thesis proposes the innovative concept of a dynamic association between Applications and Classes of Service. A Reinforcement Learning based approach is followed. A solution based on a standard Q-learning algorithm is proposed. Simulation results obtained using the OPNET simulation tool are described. Preliminary work on an alternative solution based on a Foe Q-Learning algorithm is also illustrated. The proposed framework is very flexible, allows QoE personalization, requires low processing capabilities and entails a very limited signalling overhead.
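
For readers unfamiliar with the "standard Q-learning algorithm" the abstract mentions, the update rule can be sketched as below. This is a generic tabular illustration, not the thesis's implementation: the table shape (two QoE states, three Classes of Service), the reward value, and the names `q_update` and `choose_class` are hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose_class(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice of an action (here, a Class of Service index)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])


# Hypothetical setup: 2 observed QoE states, 3 selectable Classes of Service.
q_table = [[0.0] * 3 for _ in range(2)]

# One interaction: in state 0 the application used class 1, observed a
# (made-up) reward of 1.0, and moved to state 1.
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
greedy = choose_class(q_table, state=0, epsilon=0.0)
```

A table of this size is cheap to store and update, which is in keeping with the low processing requirements the abstract claims, although the actual state and action spaces used in the thesis are not given here.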