27 research outputs found

Confidence of Gaussian Processes

Author: Ruusmann Laura
Publication date: 01/01/2018

Machine learning is a field of computer science concerned with the ability of computer systems to learn independently. Machine learning methods are used for both descriptive and predictive purposes. When a machine learning model is used to predict a real-valued number, the task is called regression. In practice, regression often has to take into account that wrong predictions may have costly consequences. The harm from wrong predictions can be reduced if the model can itself estimate how accurate its prediction is. One example of such an estimate is an interval that the model believes contains the true value with 95% probability. Providing such an interval is a distinctive property of the Gaussian process regression model, and the interval is called a confidence interval. It is important that the model's estimate of its own accuracy matches reality and that the model is not overconfident. Evaluating the confidence of machine learning models matters because software built on such models is increasingly trusted with decisions that carry more responsibility. This Bachelor's thesis focuses on the confidence of Gaussian process regression models. It examines how often true values fall inside the intervals in which the model predicts them to lie with 95% probability. Measurements on 6651 models show that the majority of true labels fall inside the confidence interval in fewer than 95% of cases, which means that the Gaussian process regression model is overconfident. The mean proportion of true labels inside the confidence interval per model was 0.93. The main result is that for 73% of the models the confidence interval contained fewer true labels than the stated probability implies. It is noteworthy that the model was more often overconfident for the smallest and largest input values. According to the thesis, the confidence of Gaussian processes had not been studied before; this work provides an evaluation of how reliable Gaussian process regression models are, and its results help users of Gaussian processes take the method's overconfidence into account.
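The coverage measurement described in this abstract can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the 1.96 multiplier for a two-sided 95% Gaussian interval, and the toy numbers are all assumptions made for illustration. Given predictive means and standard deviations from any Gaussian process regressor, it counts how often the true value lies inside the 95% interval and compares that share with the nominal 0.95.

```python
from typing import Sequence

# Assumption: predictive distributions are Gaussian, so a two-sided 95%
# interval is mean +/- 1.96 standard deviations (1.96 is an approximation).
Z_95 = 1.96


def interval_coverage(
    y_true: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    z: float = Z_95,
) -> float:
    """Return the share of true values lying inside mean +/- z * std."""
    if not (len(y_true) == len(mean) == len(std)):
        raise ValueError("y_true, mean and std must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")
    hits = sum(
        1
        for y, m, s in zip(y_true, mean, std)
        if m - z * s <= y <= m + z * s
    )
    return hits / len(y_true)


def is_overconfident(coverage: float, nominal: float = 0.95) -> bool:
    """A model is called overconfident here if coverage is below nominal."""
    return coverage < nominal


# Hypothetical toy data: two true values fall inside their intervals, two do not.
y_true = [0.0, 1.0, 5.0, 2.0]
mean = [0.1, 1.0, 2.0, 2.5]
std = [0.1, 0.5, 1.0, 0.1]

coverage = interval_coverage(y_true, mean, std)
print(coverage, is_overconfident(coverage))
```

Under this definition, the mean coverage of 0.93 reported in the abstract would count as overconfident, since it is below the nominal 0.95.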

ADA University of Tartu

How should the completeness and quality of curated nanomaterial data be evaluated?

Author: Aberg
Arts
Auvinen
Baalousha
Badireddy
Batini
Beronius
Bondarenko
Bouwmeester
Brazma
Casals
Chirico
Cho
Christine Ogilvie Hendren
Christoffer Åberg
Clarissa Marquardt
Clift
Cohen
Crist
Dearden
Dhawan
Doak
Domey
Domingos
Donaldson
Durda
European Commission
Fadeel
Fenech
Field
Fostel
Fred Klaessig
Fu
Gajewicz
Gallud
Gebel
Gilbertson
Golbamaki
Grieger
Guadagnini
Gunsolus
Guzan
Hackley
Hall
Handy
Hanne Vriens
Hendren
Hirsch
Hobbs
Holden
Hole
Horst
Hristozov
Hubert Rauscher
Huk
Iseult Lynch
Izak-Nau
Jeliazkova
John Rumble
Kaiser
Kaplan
Karcher
Karlsson
Kettner
Kim
Klimisch
Kookana
Kroll
Krug
Kühnel
Laborda
Lee
Leggett
Li
Liu
Lohse
Lubinski
Lynch
Lövestam
Ma
Madden
Maiorano
Marchese Robinson
Mark D. Hoover
Marquardt
Martínez-Bartolomé
Maynard
Mikolajczyk
Milani
Miller
Mills
Mitrano
Monopoli
Morris
Murdock
Mustad
Myers
Nanayakkara
Nel
Nme
Nowack
Oberdörster
OECD
Ong
Oomen
Orchard
Ostraat
Paszkiewicz
Peter Hoet
Petersen
Powers
Przybylak
Puzyn
Rauscher
Richard L. Marchese Robinson
Rocca-Serra
Ronit Purian
Rumble
Ruusmann
Rösslein
Sandra Karcher
Sansone
Schneider
Shankar
Simkó
Spek
Stacey L. Harper
Stefaniak
Taylor
Thomas
Tomasz Puzyn
Van Noorden
van Reeuwijk
Vrček
Walkey
Ware
Warheit
Watson
Willie Peijnenburg
Wörle-Knirsch
Xia
Yang
Zhang
Ågerstrand
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2016

Nanotechnology is of increasing significance. Curation of nanomaterial data into electronic databases offers opportunities to better understand and predict nanomaterials' behaviour. This supports innovation in, and regulation of, nanotechnology. It is commonly understood that curated data need to be sufficiently complete and of sufficient quality to serve their intended purpose. However, assessing data completeness and quality is non-trivial in general and is arguably especially difficult in the nanoscience area, given its highly multidisciplinary nature. The current article, part of the Nanomaterial Data Curation Initiative series, addresses how to assess the completeness and quality of (curated) nanomaterial data. In order to address this key challenge, a variety of related issues are discussed: the meaning and importance of data completeness and quality, existing approaches to their assessment and the key challenges associated with evaluating the completeness and quality of curated nanomaterial data. Both considerations specific to the nanoscience area and lessons that can be learned from other relevant scientific disciplines are examined. Hence, the scope of this discussion ranges from physicochemical characterisation requirements for nanomaterials and interference of nanomaterials with nanotoxicology assays to broader issues such as minimum information checklists, toxicology data quality schemes and computational approaches that facilitate evaluation of the completeness and quality of (curated) data. This discussion is informed by a literature review and a survey of key nanomaterial data curation stakeholders. Finally, drawing upon this discussion, recommendations are presented concerning the central question: how should the completeness and quality of curated nanomaterial data be evaluated?

LJMU Research Online (Liverpool John Moores University)

JRC Publications Repository

University of Groningen

KITopen

University of Birmingham Research Portal

ScholarsArchive@OSU

Leiden University Scholary Publications

Lirias

Crossref

Proceedings - University of Groningen

Repository KITopen

ARTS repository - University of Groningen

PubMed Central

Dissertations of the University of Groningen

Perspectives from the NanoSafety Modelling Cluster on the validation criteria for (Q)SAR models used in nanotechnology

Author: Aberg
Agnieszka Gajewicz
Alberto Fernández
Alexander
Andrea-N. Richarz
Astefanei
Baalousha
Bai
Baldi
Baskin
Berrar
Boulos
Braga-Neto
Brehm
Burello
Cassano
Chen
Cherkasov
Chirico
Cho
Chomenidis
Clark
Coe
Cohen
Consonni
Cooper
Crist
Cronin
Dearden
DLA
Domey
Donaldson
Dragos
Drakakis
EChA
Editorial
Efron
Emilio Benfenati
Eriksson
Esbensen
Fahmy
Ferreira
Fourches
Gadaleta
Gajewicz
Gasteiger
Golbamaki
Golbraikh
Gorodkin
Gramatica
Guyon
Handy
Hansch
Hanser
Haralambos Sarimveis
Hassellov
Hastings
Hawkins
Helma
Hu
Iqbal
Janna Hastings
Jeliazkova
Jin
Johnson
Kaplan
Kar
Keefer
Kennard
Khan
Kleandrova
Kroll
Krug
Le
Lewinski
Lewis
Lindgren
Lindh
Liu
Lopez
Low
Luan
Lubinski
Lusted
Lv
Lövestam
Manke
Manthos G. Papadopulos
Marchese Robinson
Mark T.D. Cronin
Marquardt
Massart
Mikolajczyk
Miller
Mitchell
Murdock
Nel
Netzeva
Newcombe
Nina Jeliazkova
Norinder
OECD
Oksel
Palmer
Pathakoti
Pavan
Petersen
Piir
Potter
Powers
Przybylak
Puzyn
Rauscher
Richard L. Marchese Robinson
Richarz
Robert Rallo
Ross
Roy
Rucker
Ruusmann
Sahlin
Serrano-Andres
Sheridan
Silva
Singh
Sizochenko
Stefaniak
Sushko
Tantra
Taylor
Tetko
Thomas
Todeschini
Tomasz Puzyn
Topliss
Toropov
Toropova
Tropsha
Visser
Vladimir Lobaskin
Vriens
Walkey
Wehrens
Weininger
Winkler
Wirnitzer
Wold
Wood
Worth
Xia
Yang
Yasri
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/02/2018

Nanotechnology and the production of nanomaterials have been expanding rapidly in recent years. Since many types of engineered nanoparticles are suspected to be toxic to living organisms and to have a negative impact on the environment, the process of designing new nanoparticles and their applications must be accompanied by a thorough exposure risk analysis. (Quantitative) Structure-Activity Relationship ([Q]SAR) modelling offers promising options among the available methods for risk assessment. These in silico models can be used to predict a variety of properties, including the toxicity of newly designed nanoparticles. However, (Q)SAR models must be appropriately validated to ensure the clarity, consistency and reliability of predictions. This paper is a joint initiative from recently completed European research projects focused on developing (Q)SAR methodology for nanomaterials. The aim was to interpret and expand the guidance for the well-known “OECD Principles for the Validation, for Regulatory Purposes, of (Q)SAR Models”, with reference to nano-(Q)SAR, and present our opinions on the criteria to be fulfilled for models developed for nanoparticles.

LJMU Research Online (Liverpool John Moores University)

Crossref

White Rose Research Online

Dekabrist Wilhelm Küchelbecker [The Decembrist Wilhelm Küchelbecker]

Author: Ruusmann Ants
Publication venue: Tartu : Tartu Riiklik Ülikool
Publication date: 01/01/1960

https://www.ester.ee/record=b5637248*es

ADA University of Tartu

Defitsiit kui osa nõukogudeaegsest argielust Eesti NSV-s : fenomenoloogiline vaatepunkt [Shortage as part of Soviet-era everyday life in the Estonian SSR: a phenomenological viewpoint]

Author: Ruusmann Reet
Publication date: 13/08/2010

The electronic version of the research master's thesis does not contain the publications.

ADA University of Tartu

Qualitative and quantitative aspects of acute aquatic toxicity to Tetrahymena pyriformis

Author: Ruusmann Villu
Publication date: 01/01/2006

ADA University of Tartu

Comparison of category-level, item-level and general sales forecasting models

Author: Ruusmann Laura
Publication venue: Tartu Ülikool
Publication date: 01/01/2020

Sales forecasting is the process of estimating future sales. In this thesis, multiple methods are tested to achieve the best forecasting accuracy with the lowest computational requirements. Three families of methods are investigated: a traditional statistical forecasting approach (ARIMA), classical machine learning techniques (specifically ensemble methods) and deep learning methods (specifically recurrent neural networks with LSTM architectures). The study uses real-world sales transaction data from a large retail company in a Baltic country, and the aim of this thesis is to improve the company's current sales forecasting system. Here we show that improving on the current sales forecasting is possible and additionally analyse the influence of promotional sales on prediction accuracy. The results show that using a combination of multiple item-level decision tree-based ensemble models yields the best prediction accuracy with regard to training complexity. Additionally, when comparing the accuracy of forecasts for promotional and non-promotional sales, a variant of ARIMA achieves the most accurate results when forecasting promotional sales.

ADA University of Tartu

"The Needs of Consumers Oblige": Daily Problems and Criticism of the System in Public Letters in the 1960s and 1980s

Author: Reet Ruusmann
Publication venue: University of Tartu, Estonian National Museum, Estonian Literary Museum
Publication date: 01/12/2010

This article examines public letters as an expression of everyday household problems and practices. The sources used are those letters from among the archival materials of the Tartu Retail Trade Association and the Tallinn Markets Administration that have survived from the 1960s and 1980s. Using qualitative thematic analysis, I examine the points of criticism that have arisen in connection with retail trade and the (deficit) reality behind it, along with specific behavioural practices. In addition, my aim is to show the relations between individuals and the state, and their dynamics, by presenting criticism with the help of discourse analysis.

Journals from University of Tartu

Directory of Open Access Journals

From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions

Author: Uko Maran
Villu Ruusmann
Publication venue: Springer Science and Business Media LLC
Publication date: 01/07/2013

Crossref

QSAR DataBank - an approach for the digital organization and archiving of QSAR model information

Author: Sulev Sild
Uko Maran
Villu Ruusmann
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Research efforts in the field of descriptive and predictive Quantitative Structure-Activity Relationships or Quantitative Structure–Property Relationships produce around one thousand scientific publications annually. All the materials and results are mainly communicated using printed media. The printed media in its present form has obvious limitations when it comes to effectively representing mathematical models, including complex and non-linear ones, and large bodies of associated numerical chemical data. It is not supportive of secondary information extraction or reuse efforts, while in silico studies pose additional requirements for accessibility, transparency and reproducibility of the research. This gap can and should be bridged by introducing domain-specific digital data exchange standards and tools. The current publication presents a formal specification of the quantitative structure-activity relationship data organization and archival format called the QSAR DataBank (QsarDB for shorter, or QDB for shortest).

Results: The article describes the QsarDB data schema, which formalizes QSAR concepts (objects and relationships between them), and the QsarDB data format, which formalizes their presentation for computer systems. The utility and benefits of QsarDB have been thoroughly tested by solving everyday QSAR and predictive modeling problems, with examples in the field of predictive toxicology, and it can be applied to a wide variety of other endpoints. The work is accompanied by an open source reference implementation and tools.

Conclusions: The proposed open data, open source, and open standards design is open to public and proprietary extensions on many levels. Selected use cases exemplify the benefits of the proposed QsarDB data format. General ideas for future development are discussed.

Crossref

Springer - Publisher Connector

PubMed Central