243 research outputs found

    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show that hybrid environments are the natural path to get the best of on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources, and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to which services are essential to make its usage easier. Moreover, the discussion of the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-growing wave of new HPC applications coming from big data and artificial intelligence. Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
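
The hybrid split described in this abstract (steady and sensitive workloads on-premise, peak demand sent to pay-as-you-go cloud resources) can be illustrated with a minimal sketch. It is not taken from the paper: the `Job` type, the `sensitive` flag, and the greedy `place_jobs` policy are hypothetical names and assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    sensitive: bool = False  # hypothetical flag: job must never leave the site


def place_jobs(jobs, on_prem_cores):
    """Greedy placement: fill on-premise capacity first, burst the rest to the cloud."""
    placement = {}
    free = on_prem_cores
    # Sensitive jobs are considered first so they claim on-premise capacity.
    for job in sorted(jobs, key=lambda j: not j.sensitive):
        if job.cores <= free:
            placement[job.name] = "on-premise"
            free -= job.cores
        elif job.sensitive:
            # Never sent off-site; waits until local capacity frees up.
            placement[job.name] = "queued"
        else:
            # Peak demand overflows to pay-as-you-go cloud resources.
            placement[job.name] = "cloud"
    return placement
```

A real scheduler would also weigh cost, data transfer, and queue waiting time; the sketch only shows the basic overflow decision.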

    A survey on elasticity management in PaaS systems

    Elasticity is a goal of cloud computing. An elastic system should manage its resources autonomically, adapting to dynamic workloads by allocating additional resources when the workload increases and deallocating resources when the workload decreases. PaaS providers should manage the resources of customer applications with the aim of converting those applications into elastic services. This survey identifies the requirements that such management imposes on a PaaS provider: autonomy, scalability, adaptivity, SLA awareness, composability and upgradeability. This document delves into the variety of mechanisms that have been proposed to deal with all those requirements. Although there are multiple approaches to address those concerns, the providers' main goal is the maximisation of profits. This compels providers to balance two opposing goals: maximising quality of service and minimising costs. Because of this, there are still several aspects that deserve additional research in order to find optimal adaptability strategies. Those open issues are also discussed. This work has been partially supported by EU FEDER and Spanish MINECO under research Grant TIN2012-37719-C03-01. Muñoz-Escoí, FD.; Bernabeu Aubán, JM. (2017). A survey on elasticity management in PaaS systems. Computing. 99(7):617-656. https://doi.org/10.1007/s00607-016-0507-8
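
The elastic behaviour described in this abstract (allocate resources when the workload increases, deallocate when it decreases) can be sketched with a simple proportional scaling rule. This is an illustrative assumption, not a mechanism taken from the survey: the function name `desired_replicas`, the 60% utilization target, and the replica bounds are all hypothetical.

```python
def desired_replicas(current, utilization_pct, target_pct=60,
                     min_replicas=1, max_replicas=10):
    """Size the pool so that average utilization moves toward target_pct.

    current         -- number of running replicas (assumed >= 1)
    utilization_pct -- observed average utilization, as an integer percentage
    """
    if current < 1 or target_pct <= 0:
        raise ValueError("current must be >= 1 and target_pct must be > 0")
    # Integer ceiling of current * utilization / target (avoids float rounding).
    wanted = -(-(current * utilization_pct) // target_pct)
    # Clamp to the bounds agreed with the customer (for example, by an SLA).
    return max(min_replicas, min(max_replicas, wanted))
```

For instance, under these assumptions `desired_replicas(4, 90)` asks for 6 replicas, while `desired_replicas(4, 30)` shrinks the pool to 2. The upper bound stands in for the cost side of the quality-versus-cost trade-off the abstract mentions.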

    Resource management in a containerized cloud: status and challenges

    Cloud computing relies heavily on virtualization, since virtual resources are typically leased to the consumer, for example as virtual machines. Efficient management of these virtual resources is of great importance, as it has a direct impact on both the scalability and the operational costs of the cloud environment. Recently, containers have been gaining popularity as a virtualization technology, due to their minimal overhead compared to traditional virtual machines and the portability they offer. Traditional resource management strategies, however, are typically designed for the allocation and migration of virtual machines, so the question arises of how these strategies can be adapted to the management of a containerized cloud. Apart from this, the cloud is also no longer limited to centrally hosted data center infrastructure. New deployment models, such as fog and mobile edge computing, have matured, bringing the cloud closer to the end user. These models could also benefit from container technology, as the newly introduced devices often have limited hardware resources. In this survey, we provide an overview of the current state of the art in resource management within the broad sense of cloud computing, complementary to existing surveys in the literature. We investigate how research is adapting to the recent evolutions within the cloud, namely the adoption of container technology and the introduction of the fog computing conceptual model. Furthermore, we identify several challenges and possible opportunities for future research.

    Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing

    Tesis por compendio[ES] Las aplicaciones científicas implican generalmente una carga computacional variable y no predecible a la que las instituciones deben hacer frente variando dinámicamente la asignación de recursos en función de las distintas necesidades computacionales. Las aplicaciones científicas pueden necesitar grandes requisitos. Por ejemplo, una gran cantidad de recursos computacionales para el procesado de numerosos trabajos independientes (High Throughput Computing o HTC) o recursos de alto rendimiento para la resolución de un problema individual (High Performance Computing o HPC). Los recursos computacionales necesarios en este tipo de aplicaciones suelen acarrear un coste muy alto que puede exceder la disponibilidad de los recursos de la institución o estos pueden no adaptarse correctamente a las necesidades de las aplicaciones científicas, especialmente en el caso de infraestructuras preparadas para la ejecución de aplicaciones de HPC. De hecho, es posible que las diferentes partes de una aplicación necesiten distintos tipos de recursos computacionales. Actualmente las plataformas de servicios en la nube se han convertido en una solución eficiente para satisfacer la demanda de las aplicaciones HTC, ya que proporcionan un abanico de recursos computacionales accesibles bajo demanda. Por esta razón, se ha producido un incremento en la cantidad de clouds híbridos, los cuales son una combinación de infraestructuras alojadas en servicios en la nube y en las propias instituciones (on-premise). Dado que las aplicaciones pueden ser procesadas en distintas infraestructuras, actualmente la portabilidad de las aplicaciones se ha convertido en un aspecto clave. Probablemente, las tecnologías de contenedores son la tecnología más popular para la entrega de aplicaciones gracias a que permiten reproducibilidad, trazabilidad, versionado, aislamiento y portabilidad. 
El objetivo de la tesis es proporcionar una arquitectura y una serie de servicios para proveer infraestructuras elásticas híbridas de procesamiento que puedan dar respuesta a las diferentes cargas de trabajo. Para ello, se ha considerado la utilización de elasticidad vertical y horizontal desarrollando una prueba de concepto para proporcionar elasticidad vertical y se ha diseñado una arquitectura cloud elástica de procesamiento de Análisis de Datos. Después, se ha trabajo en una arquitectura cloud de recursos heterogéneos de procesamiento de imágenes médicas que proporciona distintas colas de procesamiento para trabajos con diferentes requisitos. Esta arquitectura ha estado enmarcada en una colaboración con la empresa QUIBIM. En la última parte de la tesis, se ha evolucionado esta arquitectura para diseñar e implementar un cloud elástico, multi-site y multi-tenant para el procesamiento de imágenes médicas en el marco del proyecto europeo PRIMAGE. Esta arquitectura utiliza un almacenamiento distribuido integrando servicios externos para la autenticación y la autorización basados en OpenID Connect (OIDC). Para ello, se ha desarrollado la herramienta kube-authorizer que, de manera automatizada y a partir de la información obtenida en el proceso de autenticación, proporciona el control de acceso a los recursos de la infraestructura de procesamiento mediante la creación de las políticas y roles. Finalmente, se ha desarrollado otra herramienta, hpc-connector, que permite la integración de infraestructuras de procesamiento HPC en infraestructuras cloud sin necesitar realizar cambios en la infraestructura HPC ni en la arquitectura cloud. 
Cabe destacar que, durante la realización de esta tesis, se han utilizado distintas tecnologías de gestión de trabajos y de contenedores de código abierto, se han desarrollado herramientas y componentes de código abierto y se han implementado recetas para la configuración automatizada de las distintas arquitecturas diseñadas desde la perspectiva DevOps.[CA] Les aplicacions científiques impliquen generalment una càrrega computacional variable i no predictible a què les institucions han de fer front variant dinàmicament l'assignació de recursos en funció de les diferents necessitats computacionals. Les aplicacions científiques poden necessitar grans requisits. Per exemple, una gran quantitat de recursos computacionals per al processament de nombrosos treballs independents (High Throughput Computing o HTC) o recursos d'alt rendiment per a la resolució d'un problema individual (High Performance Computing o HPC). Els recursos computacionals necessaris en aquest tipus d'aplicacions solen comportar un cost molt elevat que pot excedir la disponibilitat dels recursos de la institució o aquests poden no adaptar-se correctament a les necessitats de les aplicacions científiques, especialment en el cas d'infraestructures preparades per a l'avaluació d'aplicacions d'HPC. De fet, és possible que les diferents parts d'una aplicació necessiten diferents tipus de recursos computacionals. Actualment les plataformes de servicis al núvol han esdevingut una solució eficient per satisfer la demanda de les aplicacions HTC, ja que proporcionen un ventall de recursos computacionals accessibles a demanda. Per aquest motiu, s'ha produït un increment de la quantitat de clouds híbrids, els quals són una combinació d'infraestructures allotjades a servicis en el núvol i a les mateixes institucions (on-premise). Donat que les aplicacions poden ser processades en diferents infraestructures, actualment la portabilitat de les aplicacions s'ha convertit en un aspecte clau. 
Probablement, les tecnologies de contenidors són la tecnologia més popular per a l'entrega d'aplicacions gràcies al fet que permeten reproductibilitat, traçabilitat, versionat, aïllament i portabilitat. L'objectiu de la tesi és proporcionar una arquitectura i una sèrie de servicis per proveir infraestructures elàstiques híbrides de processament que puguen donar resposta a les diferents càrregues de treball. Per a això, s'ha considerat la utilització d'elasticitat vertical i horitzontal desenvolupant una prova de concepte per proporcionar elasticitat vertical i s'ha dissenyat una arquitectura cloud elàstica de processament d'Anàlisi de Dades. Després, s'ha treballat en una arquitectura cloud de recursos heterogenis de processament d'imatges mèdiques que proporciona distintes cues de processament per a treballs amb diferents requisits. Aquesta arquitectura ha estat emmarcada en una col·laboració amb l'empresa QUIBIM. En l'última part de la tesi, s'ha evolucionat aquesta arquitectura per dissenyar i implementar un cloud elàstic, multi-site i multi-tenant per al processament d'imatges mèdiques en el marc del projecte europeu PRIMAGE. Aquesta arquitectura utilitza un emmagatzemament integrant servicis externs per a l'autenticació i autorització basats en OpenID Connect (OIDC). Per a això, s'ha desenvolupat la ferramenta kube-authorizer que, de manera automatitzada i a partir de la informació obtinguda en el procés d'autenticació, proporciona el control d'accés als recursos de la infraestructura de processament mitjançant la creació de les polítiques i rols. Finalment, s'ha desenvolupat una altra ferramenta, hpc-connector, que permet la integració d'infraestructures de processament HPC en infraestructures cloud sense necessitat de realitzar canvis en la infraestructura HPC ni en l'arquitectura cloud. 
It is worth noting that, during this thesis, different open-source job management and container technologies were used, open-source tools and components were developed, and recipes were implemented for the automated configuration of the different architectures designed from the DevOps perspective.
[EN] Scientific applications generally involve a variable and unpredictable computational workload that institutions must address by dynamically adjusting the allocation of resources to their different computational needs. Scientific applications may require high capacity, e.g. the concurrent use of computational resources for processing many independent jobs (High Throughput Computing or HTC), or high capability, by using high-performance resources to solve complex problems (High Performance Computing or HPC). The computational resources required by this type of application usually have a very high cost that may exceed the availability of the institution's resources, or they may not be well suited to the scientific applications, especially in the case of infrastructures prepared for the execution of HPC applications. Indeed, the different parts that compose an application may require different types of computational resources. Nowadays, cloud service platforms have become an efficient solution to meet the needs of HTC applications, as they provide a wide range of computing resources accessible on demand. For this reason, the number of hybrid computational infrastructures has increased in recent years. Hybrid computational infrastructures combine infrastructures hosted in cloud platforms with computational resources hosted in the institutions, which are called on-premise infrastructures. As scientific applications can be processed on different infrastructures, application delivery has become a key issue.
Nowadays, containers are probably the most popular technology for application delivery, as they ease reproducibility, traceability, versioning, isolation, and portability. The main objective of this thesis is to provide an architecture and a set of services to build hybrid processing infrastructures that fit the needs of different workloads. Hence, the thesis considers aspects such as elasticity and federation. It first explores the use of vertical and horizontal elasticity by developing a proof of concept that provides vertical elasticity on top of an elastic cloud architecture for data analytics. Afterwards, an elastic cloud architecture comprising heterogeneous computational resources was implemented for medical image processing, using multiple processing queues for jobs with different requirements. This architecture was developed in collaboration with the company QUIBIM. In the last part of the thesis, the previous work was evolved to design and implement an elastic, multi-site and multi-tenant cloud architecture for medical image processing in the framework of the European project PRIMAGE. This architecture uses storage integrating external services for authentication and authorization based on OpenID Connect (OIDC). The tool kube-authorizer was developed to provide access control to the resources of the processing infrastructure automatically, by creating policies and roles from the information obtained in the authentication process. Finally, another tool, hpc-connector, was developed to enable the integration of HPC processing infrastructures into cloud infrastructures without requiring modifications to either infrastructure.
It should be noted that, during this thesis, several contributions were made to open-source container and job management technologies: open-source tools and components were developed, along with recipes for the automated configuration of the designed architectures from a DevOps perspective. The results support the feasibility of combining vertical and horizontal elasticity to implement deadline-based QoS policies, as well as the feasibility of the federated authentication model for combining public and on-premise clouds.
López Huguet, S. (2021). Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172327
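The abstract above says that kube-authorizer creates policies and roles from the information obtained during OIDC authentication. The following is only a minimal sketch of that idea, not the actual tool: it assumes a Kubernetes cluster with RBAC, a `groups` claim and a hypothetical `tenant` claim in the token (only `sub` is a standard OIDC claim), and a hypothetical `GROUP_RULES` table. It builds the manifests as plain dictionaries and does not contact any cluster.

```python
from __future__ import annotations

from typing import Any

RBAC_API = "rbac.authorization.k8s.io"

# Hypothetical mapping from an identity-provider group to Kubernetes RBAC rules.
GROUP_RULES: dict[str, list[dict[str, Any]]] = {
    "researchers": [
        {"apiGroups": ["batch"], "resources": ["jobs"],
         "verbs": ["create", "get", "list"]},
    ],
}


def rbac_for_claims(claims: dict[str, Any],
                    group_rules: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Build Role and RoleBinding manifests for one authenticated user.

    Assumption: `sub` is already a valid Kubernetes object name; a real tool
    would have to sanitise it.
    """
    user = claims["sub"]
    namespace = claims.get("tenant", "default")  # hypothetical claim
    manifests: list[dict[str, Any]] = []
    for group in claims.get("groups", []):
        rules = group_rules.get(group)
        if rules is None:
            continue  # groups without a known rule set grant nothing
        role_name = f"{group}-role"
        manifests.append({
            "apiVersion": f"{RBAC_API}/v1",
            "kind": "Role",
            "metadata": {"name": role_name, "namespace": namespace},
            "rules": rules,
        })
        manifests.append({
            "apiVersion": f"{RBAC_API}/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{user}-{group}", "namespace": namespace},
            "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_API}],
            "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API},
        })
    return manifests


# Example token claims (illustrative values only).
claims = {"sub": "alice", "groups": ["researchers", "visitors"], "tenant": "project-a"}
manifests = rbac_for_claims(claims, GROUP_RULES)
```

In this sketch, unknown groups are ignored so that a token can never grant more than the table allows; whether the real tool behaves this way is not stated in the abstract.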

    RHAS: robust hybrid auto-scaling for web applications in cloud computing

    HPC for Urgent Decision-Making

    Emerging use cases from incident response planning and broad-scope European initiatives (e.g. Destination Earth [1,2], European Green Deal and Digital Package [21]) are expected to require federated, distributed infrastructures combining computing and data platforms. These will provide elasticity, enabling users to build applications and integrate data for thematic specialisation and decision support within ever-shortening response time windows. For prompt and, in particular, urgent decision support, the conventional usage modes of HPC centres are not adequate: they rely on relatively long-term arrangements for time-scheduled exclusive use of HPC resources, and enforce well-established yet time-consuming policies for granting access. In urgent decision support scenarios, managers or members of incident response teams must initiate processing and control the required resources based on their real-time judgement of how a complex situation evolves over time. This circle of clients is distinct from the regular users of HPC centres, and they must interact with HPC workflows on demand and in real time, while engaging significant HPC and data processing resources in or across HPC centres. This white paper considers the technical implications of supporting urgent decisions by establishing flexible usage modes for computing, analytics and AI/ML-based applications using HPC and large, dynamic assets. The target decision support use cases will involve ensembles of jobs, data staging to support workflows, and interactions with services and facilities external to HPC systems and centres. Our analysis identifies the need for flexible and interactive access to HPC resources, particularly in the context of dynamic workflows processing large datasets.
This poses several technical and organisational challenges: short-notice secure access to HPC and data resources, dynamic resource allocation and scheduling, coordination of resource managers, support for data-intensive workflows (including data staging on node-local storage), preemption of already running workloads, and interactive steering of simulations. Federation of services and resources across multiple sites will help to increase availability, provide elasticity for time-varying resource needs, and make it possible to exploit data locality.
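One of the challenges listed above is the preemption of already running workloads when an urgent job arrives. As an illustration only, the sketch below shows one possible victim-selection policy; the `Job` type, the priority scale, and the function `select_victims` are hypothetical and are not taken from the white paper or from any real resource manager.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int      # nodes held (running job) or requested (urgent job)
    priority: int   # assumption: a larger value means more urgent


def select_victims(running: list[Job], free_nodes: int, urgent: Job) -> list[Job] | None:
    """Pick running jobs to preempt so that `urgent` can start.

    Returns an empty list if enough nodes are already free, or None if even
    preempting every lower-priority job would not free enough nodes.
    """
    needed = urgent.nodes - free_nodes
    if needed <= 0:
        return []
    # Only strictly lower-priority jobs may be preempted; take the least
    # urgent first and, among equals, the largest to limit the victim count.
    candidates = sorted(
        (job for job in running if job.priority < urgent.priority),
        key=lambda job: (job.priority, -job.nodes),
    )
    victims: list[Job] = []
    for job in candidates:
        victims.append(job)
        needed -= job.nodes
        if needed <= 0:
            return victims
    return None


# Illustrative workload: 2 free nodes, an urgent job needing 12.
running = [Job("climate-run", 8, 1), Job("cfd-study", 4, 2), Job("flood-forecast", 4, 9)]
victims = select_victims(running, free_nodes=2, urgent=Job("wildfire-ensemble", 12, 10))
```

A production scheduler would also need to weigh checkpointing cost and data staging, which this greedy sketch deliberately leaves out.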