    Visualizing large epidemiological data sets using depth and density

    Only abstract. Paper copies of master's theses are listed in the Helka database (http://www.helsinki.fi/helka). Electronic copies of master's theses are either available as open access or only on thesis terminals in the Helsinki University Library.

    This work focuses on visualizing large data sets in epidemiology. In epidemiology, studies of rare events in a population in particular require large sample sizes. Large data sets are also collected in longitudinal studies and by national health services and other government authorities. Developments in computer technology over the last ten years have made the processing of large data sets more efficient, which has opened new opportunities for statistical data analysis. Specific methods have been developed for visualizing large data sets, since most traditional data analysis tools are not necessarily efficient. In this work, a data set is defined as large on the basis of two typical characteristics: plotting or computation times are long, or plots show extensive overplotting. The visualization of large data sets is discussed in terms of these two characteristics. The discussion is restricted to the bivariate case with two continuous variables, optionally augmented with one or two categorical variables, and to graphical methods based on the concepts of data depth and density. Both approaches deal with overplotting by aggregating data into groups, which are visualized instead of individual data points. Depth-based methods define a depth value for each observation and visualize groups determined by the depth values. The most studied depth-based method is the bagplot, whose modification, the grouped bagplot, is introduced in this work. Density-based methods, on the other hand, divide the two-dimensional plane into bins and analyze the data points in each bin separately. Density-based methods are better known and more widely used than depth-based methods, so this thesis emphasizes depth-based methods. The graphical methods are applied to data sets of the MORGAM Project, a large international follow-up study of cardiovascular diseases and genetic risk factors. The methods are compared using an interactive web application, which was developed as part of this work for researchers of the MORGAM Project. The user of the web application selects a data set to be analyzed and the graphical methods to be used, and the application then shows the corresponding graphs. The comparison showed that the bimodality of a data set was detected only with density-based methods, because depth-based methods assume that the underlying data set is unimodal. On the other hand, group comparison was more efficient with depth-based methods. The comparison of processing times showed that depth-based methods have longer processing times than density-based methods.
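
The two aggregation ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `bin_counts` and `approx_halfspace_depth` are hypothetical, rectangular bins stand in for whatever binning the thesis uses, and halfspace (Tukey) depth, the depth notion behind the bagplot, is only approximated over a finite set of directions.

```python
import math
from collections import Counter


def bin_counts(points, x_range, y_range, nx, ny):
    """Density idea: count how many points fall into each rectangular bin."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the plotting window
        # Clamp so that points on the upper edge land in the last bin.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts


def approx_halfspace_depth(p, points, n_directions=180):
    """Depth idea: approximate halfspace depth of p within points.

    For each direction, count the points lying in the closed halfplane
    whose boundary passes through p; the depth is the smallest count.
    Only n_directions evenly spaced directions are tried (an assumption
    made for brevity), so the result is an upper bound on the true depth.
    """
    px, py = p
    depth = len(points)
    for k in range(n_directions):
        angle = 2 * math.pi * k / n_directions
        ux, uy = math.cos(angle), math.sin(angle)
        count = sum(1 for x, y in points if (x - px) * ux + (y - py) * uy >= 0)
        depth = min(depth, count)
    return depth
```

A plot built on `bin_counts` draws one symbol or color per bin instead of one per observation, while a depth-based plot groups observations by their depth values (central points have high depth, outlying points low depth). The per-point loop over directions and observations also hints at why depth-based methods tend to take longer to compute.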

    The relationship between gambling expenditure, socio-demographics, health-related correlates and gambling behaviour: a cross-sectional population-based survey in Finland

    Aims: To investigate gambling expenditure and its relationship with socio-demographics, health-related correlates and past-year gambling behaviour. Design: Cross-sectional population survey. Setting: Population-based survey in Finland. Participants: Finnish people aged 15-74 years drawn randomly from the Population Information System. The participants in this study were past-year gamblers with gambling expenditure data available (n = 3251, 1418 women and 1833 men). Measurements: Expenditure shares, means of weekly gambling expenditure (WGE, Euro) and monthly gambling expenditure as a percentage of net income (MGE/NI, %) were calculated. The correlates used were perceived health, smoking, mental health [Mental Health Inventory (MHI)-5], alcohol use [Alcohol Use Disorders Identification Test (AUDIT)-C], game types, gambling frequency, gambling mode and gambling severity [South Oaks Gambling Screen (SOGS)]. Findings: Gender (men versus women) was significantly associated with gambling expenditure, with exp = 1.40, 95% confidence interval (CI) = 1.29, 1.52 and P [...]. Conclusions: In Finland, male gender is significantly associated with both weekly gambling expenditure and monthly gambling expenditure relative to net income. People in Finland with lower incomes contribute proportionally more of their income to gambling compared with middle- and high-income groups. Peer reviewed

    Socio-Demographic Factors, Gambling Behaviour, and the Level of Gambling Expenditure : A Population-Based Study

    The aim of this study was to examine the relationship between socio-demographic factors, gambling behaviour, and the level of gambling expenditure (GE). The data were drawn from the population-based Gambling Harms Survey 2016 and 2017 conducted in Finland. The data were linked to register-based variables. Past-year gamblers were included (Wave 1; n=5 805, both Waves; n=2 165). The study showed that of the 4.2% of gamblers that produced 50.0% of the total GE in 2016, 33.1% of the GE was produced by those with a gambling problem and 43.3% by those with an at-risk gambling pattern. Compared to gamblers in the lowest GE group, those in the highest GE group were more likely to be men, aged 25 or older, with upper secondary education, have a high income, be on disability pension or sickness allowance, be frequent gamblers, gamble on at least six game types, and show at-risk and problem gambling patterns. Cumulative weekly GE by income tertiles remained fairly stable between the years. The results suggest that GE is highly concentrated. Among the small group of high-intensity consumers, the majority of the revenue comes from at-risk and problem gambling. Participants in the low GE group differ from those in the intermediate and high GE groups in terms of socio-demographics and gambling behaviour. Peer reviewed
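
The concentration statement in this abstract (a small share of gamblers producing half of total GE) corresponds to a simple cumulative-share calculation. The Python sketch below shows one way such a figure could be computed. The function name `share_of_gamblers_producing` and the example values are hypothetical; the study's actual procedure (for example, any survey weighting) is not described in the abstract and is not reproduced here.

```python
def share_of_gamblers_producing(expenditures, target_share=0.5):
    """Smallest fraction of gamblers, taken from the highest spenders down,
    whose combined expenditure reaches target_share of the total."""
    total = sum(expenditures)
    if total <= 0:
        raise ValueError("total expenditure must be positive")
    running = 0.0
    ranked = sorted(expenditures, reverse=True)  # highest spenders first
    for rank, value in enumerate(ranked, start=1):
        running += value
        if running >= target_share * total:
            return rank / len(expenditures)
    return 1.0


# Hypothetical weekly expenditures for six gamblers (not study data).
example = [50, 10, 10, 10, 10, 10]
print(share_of_gamblers_producing(example))
```

In this toy example one gambler out of six accounts for half of the total, so the function returns one sixth. A highly concentrated distribution, as reported in the abstract, yields a small returned fraction.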

    Suitability of random forest analysis for epidemiological research: Exploring sociodemographic and lifestyle-related risk factors of overweight in a cross-sectional design.

    Aims: Factors that contribute to the development of overweight are numerous and form a complex structure with many unknown interactions and associations. We aimed to explore this structure (i.e. the mutual importance or hierarchy of sociodemographic and lifestyle-related risk factors of being overweight) using a machine-learning technique called random forest (RF). The results were compared with traditional logistic regression (LR) analysis. Methods: The cross-sectional FINRISK 2007 Study included 4757 Finns (aged 25-74 years). Information on participants' lifestyle and sociodemographic characteristics was collected with questionnaires. Diet was assessed using a validated food-frequency questionnaire. Height and weight were measured. Participants with a body mass index (BMI) ≥ 25 kg/m² were classified as overweight. R statistical software was used to run RF analysis ('randomForest') to derive estimates for variable importance and out-of-bag error, which were compared to an LR model. Results: In total, 704 (32%) men and 1119 (44%) women had normal BMI, whereas 1502 (69%) men and 1432 (57%) women had BMI ≥ 25. Estimated error rates for the models were similar (RF vs. LR: 42% vs. 40% for men, 38% vs. 35% for women). Both models ranked age, education and physical activity as the most important risk factors for being overweight, but RF ranked macronutrients (carbohydrates and protein) as more important compared to LR. Conclusions: RF did not demonstrate higher power in variable selection compared to LR in our study. The features of RF are more likely to appear beneficial in settings with a larger number of predictors. Peer reviewed

    Gambling expenditure by game type among weekly gamblers in Finland (BMC Public Health)

    Background: Excessive expenditure and financial harms are core features of problem gambling. There are various forms of gambling and their nature varies. The aim was to measure gambling expenditure by game type while controlling for demographics and other gambling participation factors. A further aim was to find out how each game type was associated with gambling expenditure when the number of game types played is adjusted for. Methods: Using data from the 2015 Finnish Gambling survey on adult gamblers (n = 3555), multiple log-linear regression was used to examine the effects of demographics, gambling participation, and engaging in different game types on weekly gambling expenditure (WGE) and relative gambling expenditure (RGE). Conclusions: It seems that overall gambling frequency is the strongest indicator of high gambling expenditure. Our results showed that different game types had different effect sizes on gambling expenditure. Weekly gambling on horse races and non-monopoly games had the greatest increasing effect on expenditure. However, different game types also varied based on their popularity. The extent of potential harms caused by high expenditure therefore also varies at the population level. Based on our results, future prevention and harm minimization efforts should be tailored to different game types for greater effectiveness. Peer reviewed
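
As an illustration of how an effect from a log-linear regression is usually read, the Python sketch below reduces the model to a single 0/1 predictor, which is a deliberate simplification of the multiple regression named in the Methods. With one binary predictor, the least-squares slope on the log scale equals the difference in mean log expenditure between the two groups, so its exponential is the ratio of geometric mean expenditures. The function name `exp_coefficient_binary` and the data are hypothetical.

```python
import math


def exp_coefficient_binary(expenditure, group):
    """exp(b) from regressing log(expenditure) on one 0/1 predictor.

    Assumes every expenditure is strictly positive, since zero values
    cannot be log-transformed.
    """
    logs1 = [math.log(y) for y, g in zip(expenditure, group) if g == 1]
    logs0 = [math.log(y) for y, g in zip(expenditure, group) if g == 0]
    if not logs1 or not logs0:
        raise ValueError("both groups need at least one observation")
    # OLS slope for a single binary predictor: difference in group means.
    b = sum(logs1) / len(logs1) - sum(logs0) / len(logs0)
    return math.exp(b)


# Hypothetical weekly expenditures (Euro); group 1 plays a given game type.
ratio = exp_coefficient_binary([10, 10, 20, 20], [0, 0, 1, 1])
print(round(ratio, 2))
```

An exponentiated coefficient above 1 means higher expenditure in the group coded 1, and a value of 2 here means that group's geometric mean expenditure is twice that of the other group. In a multiple regression the same reading applies with the other predictors held fixed.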

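    Gambling, gambling problems, and attitudes and opinions on gambling in 2007-2019: Finnish Gambling 2019 (Rahapelaaminen, peliongelmat ja rahapelaamiseen liittyvät asenteet ja mielipiteet vuosina 2007-2019 : Suomalaisten rahapelaaminen 2019)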

    Gambling among Finns has been surveyed through regularly repeated population studies. Gambling games include lottery games such as Lotto, sports and horse betting, slot machines, scratch cards and casino games. These games can also be played on the internet. The report examines Finns' gambling, gambling problems, and opinions and attitudes related to gambling. It brings together the results published in spring 2020 and previously unpublished results broken down by social group, as well as information on the gambling environment, motivational factors and gambling harms. New themes include digital gaming, digital gaming problems, risky alcohol use and psychological distress. The comparison over time covers the years 2007, 2011, 2015 and 2019. The information supports decision-making and serves researchers and those working in the prevention and treatment of gambling harms and in treatment services. The study also provides reliable information for the needs of gambling harm prevention work. The report was produced at the Finnish Institute for Health and Welfare (Terveyden ja hyvinvoinnin laitos) on commission from the Ministry of Social Affairs and Health.

    Effectiveness of a Gamification Strategy to Prevent Childhood Obesity in Schools : A Cluster Controlled Trial

    Objective: The aim of this study was to examine the effectiveness of a school-based gamification strategy to prevent childhood obesity. Methods: Schools were randomized in Santiago, Chile, between March and May 2018 to control or to receive a nutrition and physical activity intervention using a gamification strategy (i.e., the use of points, levels, and rewards) to achieve healthy challenges. The intervention was delivered for 7 months and participants were assessed at 4 and 7 months. Primary outcomes were mean difference in BMI z score and waist circumference (WC) between trial arms at 7 months. Secondary outcomes were mean difference in BMI and systolic and diastolic blood pressure between trial arms at 7 months. Results: A total of 24 schools (5 controls) and 2,197 students (653 controls) were analyzed. Mean BMI z score was lower in the intervention arm compared with control (adjusted mean difference -0.133, 95% CI: -0.25 to -0.01), whereas no evidence of reduction in WC was found. Mean BMI and systolic blood pressure were lower in the intervention arm compared with control. No evidence of reduction in diastolic blood pressure was found. Conclusions: The multicomponent intervention was effective in preventing obesity but not in reducing WC. Gamification is a potentially powerful tool to increase the effectiveness of school-based interventions to prevent obesity. Peer reviewed