Query Optimization in Elasticsearch

Abstract

In this graduation thesis I present the Elasticsearch database, the history of its origin, and where it sits on the software market in terms of use cases. I give a short overview of real-world use cases and briefly examine trends in its popularity compared with related products and with the market as a whole. Drawing on my own experience, the literature, the official documentation, and the experience of other users, I examine cases that caused the database to operate problematically. For each case I consider whether solving the problem through automatic query optimization is possible and worthwhile. I also treat each case historically, since Elasticsearch has changed substantially over the last year with its new versions. I find that automatic query optimization is not worthwhile, since the Elasticsearch developers have resolved most of the cases through architectural changes, internal optimization, and a change to the query language that removes ambiguity for the user when expressing queries. I find that the most important property of a well-functioning cluster is an appropriate shard size, which cannot be easily changed. Choosing it requires experimental capacity planning, which I also describe. Beyond optimal shard size, there are a number of bad practices, also described in the thesis, that can bring a cluster down and that users of the database need to know.
