8 research outputs found

    Designing experience replay algorithms for off-policy reinforcement learning by studying their sampling distributions

    Off-policy reinforcement learning algorithms use experience replay mechanisms to learn from experience gathered by earlier policies. Several algorithms have been proposed for sampling transitions from the replay buffer so as to make training more sample-efficient. However, a general framework for comparing them and explaining their performance differences is missing. In this thesis we propose the transition and state sampling distributions as tools for studying these algorithms and for comparing sampling strategies. An analysis from the distribution point of view reveals that replay buffers are imbalanced: parts of the state space are underrepresented, while others are massively overrepresented. This finding holds across several environments, especially in sparse-reward settings. We finish by proposing two algorithms that address the imbalance problem and show that they lead to better performance in sparse-reward tasks while matching our baselines in low-imbalance situations.

    Causal Discovery from Temporal Data: An Overview and New Perspectives

    Temporal data, representing chronological observations of complex systems, is a common data structure generated in many domains, such as industry, medicine and finance. Analyzing this type of data is extremely valuable for various applications. Accordingly, different temporal data analysis tasks, e.g., classification, clustering and prediction, have been proposed in the past decades. Among them, causal discovery, which learns the causal relations from temporal data, is considered an interesting yet critical task and has attracted much research attention. Existing causal discovery works can be divided into two highly correlated categories according to whether the temporal data is calibrated, i.e., multivariate time series causal discovery and event sequence causal discovery. However, most previous surveys focus only on time series causal discovery and ignore the second category. In this paper, we specify the correlation between the two categories and provide a systematic overview of existing solutions. Furthermore, we provide public datasets, evaluation metrics and new perspectives for temporal data causal discovery. Comment: 52 pages, 6 figures

    Evaluating productive efficiency: comparative study of commercial banks in Gulf countries

    Financial institutions are an integral part of any modern economy. In the 1970s and 1980s, Gulf Cooperation Council (GCC) countries made significant progress in financial deepening and in building a modern financial infrastructure. This study aims to evaluate the performance (efficiency) of financial institutions (the banking sector) in GCC countries. Because the selected variables include negative data for some banks and positive data for others, and the available evaluation methods are not helpful in this case, we developed a Semi-Oriented Radial Model (SORM) to perform this evaluation. Furthermore, since the SORM evaluation result provides only limited information for decision makers (bankers, investors, etc.), we proposed a second-stage analysis using the classification and regression (C&R) method, combining SORM results with other environmental data (financial, economic and political) to set rules for efficient banks. The results will therefore be useful to bankers seeking to improve their banks' performance and to investors seeking to maximize their returns. There are two main approaches to evaluating the performance of Decision Making Units (DMUs), and under each there are different methods with different assumptions. The parametric approach is based on econometric regression theory, and the nonparametric approach is based on mathematical linear programming theory. Under the nonparametric approach there are two methods: Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH). Under the parametric approach there are three: Stochastic Frontier Analysis (SFA), Thick Frontier Analysis (TFA) and Distribution-Free Analysis (DFA). The results show that DEA and SFA are the most applicable methods in the banking sector, but DEA seems to be the most popular among researchers.
However, DEA, like SFA, still faces many challenges. One of them is how to deal with negative data: DEA requires the assumption that all input and output values are non-negative, while in many applications negative outputs can appear, e.g. losses as opposed to profits. Although a few models have been developed under DEA to deal with negative data, we believe that each has its own limitations; we therefore developed a Semi-Oriented Radial Model (SORM) that can handle the negativity issue in DEA. The application of SORM shows that the overall performance of GCC banking is relatively high (85.6%). Although the efficiency score fluctuated over the study period (1998-2007) due to the second Gulf War and the international financial crisis, it remained higher than the efficiency scores of counterparts in other countries. Banks operating in Saudi Arabia seem to be the most efficient, followed by UAE, Omani and Bahraini banks, while banks operating in Qatar and Kuwait seem to be the least efficient; this is because these two countries were the most affected by the second Gulf War. The results also show no statistically significant relationship between operating style (Islamic or conventional) and bank efficiency. Although the difference is not statistically significant, Islamic banks seem to be more efficient than conventional banks, with an average efficiency score of 86.33% compared to 85.38%. Furthermore, Islamic banks seem to be more affected by the political crisis (second Gulf War), whereas conventional banks seem to be more affected by the financial crisis.

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held from March 27 to April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 41 full papers presented in the proceedings were carefully reviewed and selected from 141 submissions. The volume also contains 7 tool papers, 6 Tool Demo papers, and 9 SV-Comp Competition Papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers.