Approximate Top-K Retrieval from Hidden Relations
We consider the evaluation of approximate top-k queries from relations whose values are unknown a priori. Such relations can arise, for example, in the context of expensive predicates or cloud-based data sources. The task is to find an approximate top-k set that is close to the exact one while keeping the total processing cost low. The cost of a query is the sum of the costs of the entries that are read from the hidden relation.
A novel aspect of this work is that we consider prior information about the values in the hidden matrix. We propose an algorithm that uses regression models at query time to assess whether a row of the matrix can enter the top-k set given that only a subset of its values are known. The regression models are trained with existing data that follows the same distribution as the relation subjected to the query.
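The pruning idea described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes that a row's score is the sum of its entries, that every entry read costs one unit, and it replaces the trained regression models with a deliberately simple stand-in (per-column means of the training data). The names `fit_column_means`, `approx_top_k`, `n_probe`, and `margin` are hypothetical.

```python
import heapq


def fit_column_means(training_rows):
    """Stand-in for a trained regression model (assumption): per-column means."""
    n_cols = len(training_rows[0])
    n_rows = len(training_rows)
    return [sum(r[j] for r in training_rows) / n_rows for j in range(n_cols)]


def approx_top_k(hidden_rows, k, col_means, n_probe=1, margin=0.0):
    """Return (indices of the approximate top-k rows, number of entries read).

    Only the first n_probe entries of each row are read up front. The rest of
    the row is read only if the predicted total score could still reach the
    current k-th best exact score.
    """
    heap = []  # min-heap of (exact score, row index) for the current top-k
    cost = 0
    for i, row in enumerate(hidden_rows):
        known = row[:n_probe]
        cost += n_probe
        # Predicted total: known part plus the model's guess for the unread part.
        predicted = sum(known) + sum(col_means[n_probe:])
        if len(heap) == k and predicted + margin < heap[0][0]:
            continue  # unlikely to enter the top-k, skip the remaining reads
        score = sum(row)  # read the remaining entries
        cost += len(row) - n_probe
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    top = [i for _, i in sorted(heap, reverse=True)]
    return top, cost


training = [[1, 1, 1], [3, 3, 3]]
hidden = [[9, 9, 9], [0, 5, 5], [8, 8, 8], [1, 0, 0]]
top, cost = approx_top_k(hidden, k=2, col_means=fit_column_means(training))
print(top, cost)
```

A larger `margin` makes pruning more conservative: fewer rows are skipped, so the cost rises and the result moves closer to the exact top-k set.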
To evaluate the algorithm and to compare it with a method previously proposed in the literature, we conduct experiments using data from a context-sensitive Wikipedia search engine. The results indicate that the proposed method outperforms the baseline algorithms in terms of cost while maintaining a high accuracy of the returned results.
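Potential consequences of hypothetical nuclear power plant accidents at Finnish nuclear power plants (Hypoteettisten ydinvoimalaitosonnettomuuksien mahdolliset seuraukset kotimaisissa ydinvoimalaitoksissa)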
Knowledge of the potential consequences of a nuclear power plant accident is an important aspect of emergency preparedness. The protective actions during a nuclear emergency are based on the radiological consequences of the release. This thesis studies three hypothetical nuclear power plant accident scenarios of different magnitudes. The operating Finnish power reactor units Loviisa 1&2 and Olkiluoto 1&2 are included in the study. Modeling of the consequences is based on historical weather data from the years 2012-2015 retrieved with the AROME and HARMONIE operational weather forecast models. Dispersion and deposition calculations are done with the SILAM dispersion model. Dose rates and doses are calculated with the threat assessment tool TIUKU, developed by the Finnish Radiation and Nuclear Safety Authority. Post-processing of the data is done with Python and its computational libraries. The results are compared to operational intervention levels and dose criteria. The sufficiency of the emergency planning zones (EPZs) is analysed as well. The comparison shows that protective actions are needed outside the emergency planning zones in the worst scenario studied; in the other scenarios the EPZs were found to be sufficient.
Algorithms for finding orders and analyzing sets of chains
Rankings of items are a useful concept in a variety of applications, such as clickstream analysis, some voting methods, bioinformatics, and other fields of science such as paleontology. This thesis addresses two problems related to such data. The first problem is about finding orders, while the second one is about analyzing sets of orders.
We address two different tasks in the problem of finding orders. We can find orders either by computing an aggregate of a set of known orders, or by constructing an order for a previously unordered data set. For the first task we show that bucket orders, a subclass of partial orders, are a useful structure for summarizing sets of orders. We formulate an optimization problem for finding such partial orders, show that it is NP-hard, and give an efficient randomized algorithm for finding approximate solutions to it. Moreover, we show that the expected cost of a solution found by the randomized algorithm differs from the cost of the optimal solution only by a constant factor. For the second task we propose a simple method for sampling orders for 0–1 vectors that is based on the consecutive ones property.
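To make the notion of a bucket order concrete, the following is a minimal sketch of a pivot-style randomized procedure that aggregates a set of total orders into a bucket order. It is offered as an illustration under stated assumptions and is not necessarily the algorithm of the thesis: the functions `pair_fractions` and `bucket_pivot` and the threshold parameter `beta` are hypothetical names, and every input order is assumed to rank the same set of items.

```python
import random


def pair_fractions(orders):
    """frac[u][v] = fraction of input orders in which u precedes v."""
    items = orders[0]
    frac = {u: {v: 0.0 for v in items} for u in items}
    for order in orders:
        pos = {x: i for i, x in enumerate(order)}
        for u in items:
            for v in items:
                if pos[u] < pos[v]:
                    frac[u][v] += 1.0 / len(orders)
    return frac


def bucket_pivot(items, frac, rng, beta=0.25):
    """Return a bucket order as a list of buckets (lists of items)."""
    if not items:
        return []
    pivot = rng.choice(items)
    left, bucket, right = [], [pivot], []
    for u in items:
        if u == pivot:
            continue
        if frac[u][pivot] >= 0.5 + beta:
            left.append(u)    # u clearly precedes the pivot
        elif frac[pivot][u] >= 0.5 + beta:
            right.append(u)   # u clearly follows the pivot
        else:
            bucket.append(u)  # no clear preference: tie with the pivot
    return (bucket_pivot(left, frac, rng, beta)
            + [bucket]
            + bucket_pivot(right, frac, rng, beta))


orders = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["a", "b", "d", "c"],
    ["b", "a", "d", "c"],
]
frac = pair_fractions(orders)
result = bucket_pivot(list(orders[0]), frac, random.Random(0))
print(result)
```

In this input the orders disagree evenly on a versus b and on c versus d, but all agree that a and b precede c and d, so the two pairs end up as two tied buckets.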
For analyzing orders, we discuss three different methods. First, we give an algorithm for clustering sets of orders. The algorithm is a variant of Lloyd's iteration for solving the k-means problem. We also give two different approaches for mapping orders to vectors in a high-dimensional Euclidean space. These mappings are used on the one hand for clustering, and on the other hand for creating two-dimensional visualizations (scatterplots) of sets of orders. Finally, we discuss randomization testing in the case of orders. To this end we propose an MCMC algorithm for creating random sets of orders that preserve certain well-defined properties of a given set of orders. The random data sets can be used to assess the statistical significance of the results obtained, e.g., by clustering.
Randomization algorithms for large sparse networks
In many domains it is necessary to generate surrogate networks, e.g., for hypothesis testing of different properties of a network. Generating surrogate networks typically requires that different properties of the network are preserved, e.g., edges may not be added or deleted and edge weights may be restricted to certain intervals. In this paper we present an efficient property-preserving Markov chain Monte Carlo method termed CycleSampler for generating surrogate networks in which (1) edge weights are constrained to intervals and vertex strengths are preserved exactly, and (2) edge weights and vertex strengths are both constrained to intervals. These two types of constraints cover a wide variety of practical use cases. The method is applicable to both undirected and directed graphs. We empirically demonstrate the efficiency of the CycleSampler method on real-world data sets. We provide an implementation of CycleSampler in R, with parts implemented in C. Peer reviewed
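One way to see how edge weights can change while vertex strengths stay fixed is to shift weights alternately up and down along an even-length cycle of an undirected graph. The sketch below illustrates that single move only. It is an assumption-laden illustration, not the CycleSampler implementation: the function names, the dictionary-based graph representation, and the uniform choice of the shift are all hypothetical.

```python
import random


def perturb_even_cycle(weights, cycle_edges, bounds, rng):
    """Shift weights by +d, -d, +d, ... along an even-length cycle.

    Every vertex on the cycle touches one "+d" edge and one "-d" edge, so its
    strength (the sum of its incident edge weights) is unchanged. The shift d
    is drawn uniformly from the range that keeps each weight in its interval.
    """
    assert len(cycle_edges) % 2 == 0
    lo, hi = float("-inf"), float("inf")
    for idx, e in enumerate(cycle_edges):
        w_lo, w_hi = bounds[e]
        if idx % 2 == 0:   # this edge receives +d
            lo = max(lo, w_lo - weights[e])
            hi = min(hi, w_hi - weights[e])
        else:              # this edge receives -d
            lo = max(lo, weights[e] - w_hi)
            hi = min(hi, weights[e] - w_lo)
    d = rng.uniform(lo, hi)
    new = dict(weights)
    for idx, e in enumerate(cycle_edges):
        new[e] += d if idx % 2 == 0 else -d
    return new


def strengths(weights):
    """Vertex strength = sum of the weights of incident edges (undirected)."""
    s = {}
    for (u, v), w in weights.items():
        s[u] = s.get(u, 0.0) + w
        s[v] = s.get(v, 0.0) + w
    return s


weights = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 3.0, ("d", "a"): 4.0}
bounds = {e: (0.0, 5.0) for e in weights}
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
surrogate = perturb_even_cycle(weights, cycle, bounds, random.Random(1))
print(surrogate)
print(strengths(weights), strengths(surrogate))
```

Repeating such moves on different cycles yields a random walk over weight assignments that all share the original vertex strengths; whether that walk samples from a desired distribution is a separate question that this sketch does not address.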
High-yield production of biologically active recombinant protein in shake flask culture by combination of enzyme-based glucose delivery and increased oxygen transfer
This report describes the combined use of an enzyme-based glucose release system (EnBase®) and a high-aeration shake flask (Ultra Yield Flask™). The benefit of this combination is demonstrated by an over 100-fold improvement in the active yield of recombinant alcohol dehydrogenase expressed in E. coli. Compared to Terrific Broth and ZYM-5052 autoinduction medium, the EnBase system improved yield mainly through increased productivity per cell. A four-fold increase in oxygen transfer by the Ultra Yield Flask contributed to higher cell density with EnBase but not with the other tested media, and consequently the product yield per ml of EnBase culture was further improved.
Teacher Agency and Futures Thinking
Problems encountered in top-down school reforms have repeatedly highlighted the significance of teachers’ agency in educational change. At the same time, temporality has been identified as a key element in teachers’ agency, with teachers’ beliefs about the future and experiences of the past shaping their agentic orientations. However, research on teachers’ future orientations is typically limited to short-term trajectories, as opposed to long-term visions of education. To address this, we draw on a futures studies perspective to give more explicit attention to teachers’ long-term visions of their work. We argue that the method of future narratives, already well-established in the field of futures studies, is a fruitful methodological framework for studying these long-term visions. In this paper, we first show that the futures studies approach is theoretically compatible with the ecological model of teacher agency. We then outline the method of future narratives to point out the possibilities it offers. Finally, we illustrate our approach with an exploratory analysis of a small set of future narratives where teachers imagine a future workday. Our analysis reveals that the narratives offer a rich view of teachers’ longer-term visions of education, including instances of reflecting on the role of education in relation to broader societal developments. Our study suggests that this novel approach can provide tools for research on teacher agency as well as practical development of teacher education, addressing long-term educational issues and policies. Peer reviewed