    Exploratory Analysis of Human Sleep Data

    In this thesis we develop data mining techniques to analyze sleep irregularities in humans. We investigate the effects of several demographic, behavioral and emotional factors on sleep progression and on a patient's susceptibility to sleep-related and other disorders. Mining is performed over subjective and objective data collected from patients visiting the UMass Medical Center and the Day Kimball Hospital for treatment. Subjective data are obtained from patient responses to questions posed in a sleep questionnaire. Objective data comprise observations and clinical measurements recorded by sleep technicians using a suite of instruments collectively called a polysomnogram. We create suitable filters to capture significant events within sleep epochs. We propose and employ a Window-based Association Rule Mining Algorithm to discover associations among sleep progression, pathology, demographics and other factors. This algorithm is a modified and extended version of the Set-and-Sequences Association Rule Mining Algorithm developed at WPI to support the mining of association rules from complex data types. We analyze both the medical and the statistical significance of the associations discovered by our algorithm. We also develop predictive classification models using logistic regression and compare the results with those obtained through association rule mining.
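The abstract does not spell out how the Window-based Association Rule Mining Algorithm works, so the following is only a minimal sketch of the general idea, not the thesis's algorithm or the WPI Set-and-Sequences algorithm. It assumes that each sleep epoch has already been filtered into a set of event labels, that a "window" is a fixed number of consecutive epochs whose labels are merged, and that rules are single-item rules A → B scored by the usual support and confidence. The function names and event labels are hypothetical.

```python
from itertools import combinations


def windows(epochs, size):
    """Yield the union of event labels over each run of `size` consecutive epochs."""
    for start in range(len(epochs) - size + 1):
        merged = set()
        for epoch in epochs[start:start + size]:
            merged |= epoch
        yield frozenset(merged)


def support(itemset, transactions):
    """Fraction of windows that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def mine_rules(epochs, size, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules A -> B."""
    transactions = list(windows(epochs, size))
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair = frozenset({a, b})
        s_pair = support(pair, transactions)
        if s_pair < min_support:
            continue
        # Confidence of A -> B is support(A and B) / support(A); try both directions.
        for ante, cons in ((a, b), (b, a)):
            conf = s_pair / support(frozenset({ante}), transactions)
            if conf >= min_confidence:
                rules.append((ante, cons, s_pair, conf))
    return rules


# Hypothetical filtered epochs: each set holds the events detected in one epoch.
epochs = [{"N2"}, {"N2", "arousal"}, {"REM", "apnea"}, {"REM", "apnea", "arousal"}, {"N2"}]
for rule in mine_rules(epochs, size=2, min_support=0.5, min_confidence=0.9):
    print(rule)
```

Merging epochs into windows lets an event in one epoch be associated with an event in a neighboring epoch, which a purely per-epoch transaction model would miss; the window size is a tuning assumption here.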

    Quantification of Physical Activity and Sleep Behaviors with Wearable Sensors : Analysis of a large-scale real-world heart rate variability dataset

    Wearable monitoring devices, such as smartwatches, are used for monitoring personal health, fitness, health behaviors and well-being in daily life. Nowadays, wearable devices are popular and many consumers use them, in particular, to record their physical activity and sleep. Data recorded with wearable devices is an example of real-world data that can provide practical observations and insights on health and wellness, but its analyses pose challenges for research. 
Consumers conduct continuous recordings with wearable devices in non-research settings. Hence, any analysis of wearable real-world monitoring data must take into account the limitations and inaccuracies of the data, as well as the sampling biases and incomplete representativeness of the population that arise from the uncontrolled data collection setting. To date, there are no well-established methods for analyzing health behaviors and well-being from continuous wearable monitoring data. Consequently, real-world health monitoring data is rarely used for research, although it could provide valuable observations and insights on health behaviors and well-being. This thesis analyzes a large-scale real-world dataset of wearable heart rate variability (HRV) recordings to quantify physical activity (PA) and sleep, two of the most important health behaviors. Specifically, the thesis focuses on methods for quantifying PA, on temporal patterns of PA behavior, and on the associations that PA, alcohol intake and other lifestyle factors have with sleep. In addition, this thesis aims to evaluate the feasibility of using real-world wearable monitoring data, together with applicable analysis methodologies, for scientific research, and to demonstrate the observations and data-driven hypotheses that the results provide. The study material was an anonymized real-world HRV monitoring dataset of 52,273 Finnish employees, gathered and prepared by Firstbeat Technologies Oy (Jyväskylä, Finland), a Finnish company that develops and provides HRV analytics for stress, recovery and exercise. The dataset included three-day continuous HRV recordings performed in free-living settings, combined with self-reports of alcohol intake, work times and sleep times. 
The recordings were originally performed for a routine wellness program (Firstbeat Lifestyle Assessment) provided to the employees by their employers as part of preventive occupational healthcare and health promotion. In this thesis, PA behavior was quantified from the recordings using an HRV-based estimate of oxygen uptake. Sleep was quantified by the regulation of the autonomic nervous system (ANS), using traditional HRV parameters and novel HRV-based indices of recovery. Both statistical and machine-learning methods were employed in the analysis. Temporal variations in PA behavior were observed: the amount of PA was highest on weekends and at the beginning of the year. The amount of PA quantified by absolute oxygen consumption was higher for men than for women, higher for younger than for older subjects, and higher for individuals of normal weight than for obese individuals. However, PA levels were more similar between subjects when their physical fitness level was taken into account in quantifying PA. Moreover, PA behavior was associated with sleep. After a day that included PA, the parasympathetic regulation of the ANS and recovery during sleep were diminished, but regular PA seemed to increase parasympathetic regulation of the ANS and aid recovery during sleep. The most important predictor of ANS regulation during sleep, however, was acute alcohol intake. Acute alcohol intake dose-dependently diminished the parasympathetic regulation of the ANS and recovery during sleep, an effect that was already observable after only 1–2 standard units of alcohol. Moreover, the same alcohol intake, normalized by body weight, seemed to affect ANS regulation more in younger subjects than in older ones, but the effect was similar in sedentary and physically active subjects, as well as in men and women. 
Many of the results obtained in this thesis accord with the findings of previous studies, such as the higher PA level on weekends, the higher amount of absolute-intensity PA in men, younger and normal-weight subjects, and the relationship of PA and alcohol intake with ANS regulation during sleep. On the other hand, the results of this thesis provide new observations, for example about the interaction between alcohol intake and subjects’ background characteristics, which could not be studied before because of limited and homogeneous study populations. In conclusion, the results of this thesis demonstrate that real-world wearable monitoring data can be feasible for scientific research, and that its results not only support the findings of existing studies but also provide new observations, insights and data-driven hypotheses. Real-world evidence facilitates our understanding of aspects of health behaviors and wellness that cannot be studied in more traditional, controlled research settings. These real-world insights can further be used to design more personalized and targeted health interventions and as tools for promoting health and well-being.
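The abstract names "traditional HRV parameters" and an HRV-based oxygen uptake estimate but gives no formulas. The sketch below is illustrative only and assumes two things not stated in the text: that RMSSD (the root mean square of successive RR-interval differences, a standard time-domain HRV parameter associated with parasympathetic activity) is representative of the traditional parameters, and that "taking fitness into account" means expressing oxygen uptake as a percentage of the person's maximal oxygen uptake (%VO2max). Firstbeat's proprietary oxygen uptake estimation is not reproduced; the VO2 values, the 50% cut-off and the function names are hypothetical.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def relative_intensity(vo2, vo2max):
    """Oxygen uptake expressed as a percentage of the person's maximal uptake."""
    return 100.0 * vo2 / vo2max


def minutes_at_or_above(vo2_per_minute, vo2max, threshold_pct):
    """Count minutes whose relative intensity reaches a (hypothetical) cut-off."""
    return sum(1 for v in vo2_per_minute if relative_intensity(v, vo2max) >= threshold_pct)


# Same absolute oxygen uptake (ml/kg/min) for two people with different fitness.
minutes = [20.0, 20.0, 30.0]
print(minutes_at_or_above(minutes, vo2max=50.0, threshold_pct=50.0))  # → 1 (fitter person)
print(minutes_at_or_above(minutes, vo2max=30.0, threshold_pct=50.0))  # → 3 (less fit person)
```

The example shows why group differences can shrink once fitness is considered: identical absolute oxygen uptake counts as a smaller share of capacity for a fitter person, so relative-intensity PA depends on both behavior and fitness.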

    Predicting Mental Health Crisis in Veterans: Early Warning Signs, Precursors and Protective Factors

    Mental health (MH) conditions have recently increased, largely as a result of socio-demographic changes. Posttraumatic stress disorder (PTSD) is one of the most common mental health disorders in the US. PTSD is especially troubling among combat veterans leaving service, in whom it occurs at double the rate of the general population. PTSD severity is associated with risk-taking behaviors such as substance abuse, non-suicidal self-injury and sexual risk behaviors. Psychological disorders are often preceded by early warning signs, and recognizing the early warning signs of PTSD can help prevent the return or worsening of PTSD symptoms. Ecological momentary assessment (EMA) studies are well suited to tracking symptom fluctuations in real time, and they are effective for monitoring crisis events in veterans. Mobile applications are a common means of gathering such EMA information from participants. Our research focuses on developing interpretable machine learning (ML) models that use socio-demographic data and EMA data collected in natural settings to predict high PTSD risk in veterans and to identify those who engage in risky behaviors. Findings from these models can be integrated with existing m-health frameworks to generate text alerts to mentors when crisis patterns are observed in their mentees. Such an integrated crisis prediction and alerting system would help peer mentors plan interventions.
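The abstract does not say which interpretable models were built, so the following is a hypothetical sketch of one common interpretable choice, a logistic regression whose coefficients can be read directly, wired to a simple mentor alert. It is not the authors' model. The feature names, the toy data, the 0.5 threshold and the alert text are all assumptions made for illustration; the fit uses plain batch gradient descent so the example needs only the standard library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(bias, weights, xi):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = predict_proba(bias, weights, xi) - yi
            grad_b += err
            for j, x in enumerate(xi):
                grad_w[j] += err * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def maybe_alert(mentee_id, bias, weights, xi, threshold=0.5):
    """Return a text alert for the peer mentor when predicted risk reaches the threshold."""
    p = predict_proba(bias, weights, xi)
    if p >= threshold:
        return f"Alert: mentee {mentee_id} shows a high-risk pattern (p={p:.2f})"
    return None


# Hypothetical features: [mean daily EMA symptom score (0-4), risky behavior reported (0/1)]
FEATURES = ["ema_symptom_score", "risky_behavior"]
X = [[0.0, 0], [0.5, 0], [1.0, 0], [3.0, 1], [3.5, 1], [4.0, 0]]
y = [0, 0, 0, 1, 1, 1]
bias, weights = fit_logistic(X, y)
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.2f}")  # sign and size of each coefficient are directly readable
print(maybe_alert("M-017", bias, weights, [3.8, 1]))
```

The interpretability comes from the model form: each coefficient states how a feature shifts the log-odds of high risk, which a mentor-facing system could surface alongside the alert.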

    A Database For Exploratory Analysis of Human Sleep

    This thesis focuses on the design, development, and exploratory analysis of a human sleep data repository. We have collected comprehensive data for 1,046 sleep disorder patients and created a terabyte-scale database system to handle it. The data for each patient were collected from the patient's medical records and from the patient's all-night sleep study (about 0.6 gigabytes per patient in total). Data collected from the patient's medical record contain more than 70 attributes, including demographic data, smoking, drinking, and exercise habits, depression and daytime sleepiness questionnaires, and overall medical history. Data collected from the patient's all-night sleep study consist of 50–55 time-series signals recorded over a period of 6–8 hours at the hospital's sleep clinic. These signals include, among others, an electroencephalogram, electromyogram, electrooculogram, and electrocardiogram, as well as signals tracking blood oxygen level, body position, limb movements, snoring, and blood pressure. An additional 350 attributes summarize sleep-related events taking place during the night-long study, including sleep stages, arousals, and respiratory disturbances. In developing our database system, particular attention was paid to a design that effectively handles the size and complexity of the data, describes the structure of sleep data in clinically meaningful terms, and facilitates the discovery of patterns in sleep data using machine learning algorithms. We have interfaced our database with Weka, a well-known data mining system. To the best of our knowledge, our database is one of the world's largest and most comprehensive in the domain of human sleep disorders.
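The abstract describes the kinds of data stored but not the schema, so the following is a hypothetical sketch of one way such a repository could be organized, using Python's built-in `sqlite3` module. The table and column names, the choice to keep raw signal samples in external files referenced by path (a common approach for terabyte-scale time series), and the helper functions are all assumptions. The ARFF export follows Weka's documented text format (`@RELATION`, `@ATTRIBUTE ... NUMERIC`, `@DATA`) and is a simplification of whatever interface the thesis actually built.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    age INTEGER,
    sex TEXT,
    smoker INTEGER              -- 0/1 flag taken from the medical record
);
CREATE TABLE signal (
    signal_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    channel TEXT NOT NULL,      -- e.g. 'EEG', 'EMG', 'SpO2'
    sample_rate_hz REAL NOT NULL,
    file_path TEXT NOT NULL     -- raw samples are kept outside the database
);
CREATE TABLE sleep_event (
    event_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    event_type TEXT NOT NULL,   -- e.g. 'arousal', 'apnea', 'stage:REM'
    onset_s REAL NOT NULL,
    duration_s REAL NOT NULL
);
"""


def build(conn):
    conn.executescript(SCHEMA)


def event_counts(conn, event_type):
    """Per-patient count of one event type, including patients with zero events."""
    rows = conn.execute(
        """SELECT p.patient_id, p.age, COUNT(e.event_id)
           FROM patient p LEFT JOIN sleep_event e
             ON e.patient_id = p.patient_id AND e.event_type = ?
           GROUP BY p.patient_id ORDER BY p.patient_id""",
        (event_type,),
    )
    return rows.fetchall()


def to_arff(rows, relation, attributes):
    """Render numeric rows as a minimal Weka ARFF document."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    lines += ["", "@DATA"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

A usage sketch: create an in-memory database with `sqlite3.connect(":memory:")`, call `build`, insert a few patients and events, then pass `event_counts(conn, "apnea")` to `to_arff` to obtain a file Weka can open.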

    The association between manganese exposure, parkinsonism, and quality of life in South African manganese mine workers

    A research report submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in partial fulfillment of the requirements for the degree of Master of Science in Epidemiology & Biostatistics. February 2018. Background: Manganese is an essential micronutrient for humans, but excessive levels are harmful. Manganese neurotoxicity is associated with parkinsonism, and the associated motor deficits can affect the daily activities and quality of life (QoL) of manganese-exposed persons. Objectives: In this study, we sought to investigate the associations between manganese exposure, parkinsonism and QoL in South African manganese mine workers in the period 2010–2014. Methods: This was a secondary analysis of data from 418 South African manganese mine workers already recruited into a prospective study of the association between manganese (Mn) mining exposure and parkinsonism. Parkinsonism, the primary outcome, was defined as a Unified Parkinson’s Disease Rating Scale motor subsection (part 3) score (UPDRS-3) ≥15. The 39-item Parkinson’s Disease Questionnaire (PDQ-39) was used to assess miners’ health status, or QoL, the secondary outcome. Cumulative manganese exposure in mg/m³-years (measured as inhalable dust) was estimated using an exposure matrix applied to participants’ job histories. We used Mann-Whitney and Pearson’s chi-square tests to compare baseline continuous and categorical characteristics by parkinsonism status. Multiple linear and logistic regression modeling was used to quantify associations. Results: The mean age of the manganese mine workers was 41.5 years (SD=11.9); 97.6% were male. Average cumulative manganese exposure at baseline was estimated as 3.7 mg/m³-years (SD=5.8), with a mean exposure duration of 13.5 years (SD=11.7). The prevalence of parkinsonism was 29.4%. Participants’ characteristics, stratified by parkinsonism status, differed significantly by age, education, and comorbid disease. 
Parkinsonism prevalence decreased significantly with increasing education level (p=0.029) and was higher in those with comorbidities (36.4% vs 25.9%, p=0.042). Participants with parkinsonism were generally older (mean age 45.3 vs 39.6 years, p<0.0001). Mean QoL sub-scores and total scaled PDQ-39 scores were higher (indicating poorer QoL) in mine workers with parkinsonism than in those without. We found no evidence of a monotonic dose-response relationship between cumulative manganese exposure and parkinsonism. Similarly, there was no statistically significant association between QoL and cumulative manganese exposure. Being aged 40 years or older was an independent risk factor for parkinsonism (OR=2.11, 95% CI: 1.18–3.78). Parkinsonism (β=0.63, p=0.004) and age (β=−0.48, p=0.031) were strong predictors of QoL. Conclusion: We found a strong association between parkinsonism and QoL in manganese mine workers, confirming previous reports in manganese-exposed welders. There was no evidence of an association between parkinsonism and manganese exposure. The lack of a monotonic dose-response relationship between parkinsonism and manganese exposure may be due to the healthy worker survivor effect, a non-linear relationship, or exposure misclassification.
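The Methods describe three computations without formulas: a cumulative exposure from job histories, a threshold case definition, and odds ratios from logistic regression. The sketch below shows the standard form of each under stated assumptions. Cumulative exposure is taken as the sum of concentration × years over a worker's jobs; the job titles and concentrations in `JEM` are invented for illustration and are not the study's exposure matrix. The odds-ratio interval assumes a Wald interval on the log-odds scale, which the report does not confirm.

```python
import math

# Hypothetical exposure matrix: mean inhalable Mn concentration (mg/m3) by job title.
JEM = {"underground operator": 0.40, "plant operator": 0.25, "office": 0.01}


def cumulative_exposure(job_history, jem):
    """Sum of concentration x years over a worker's jobs, in mg/m3-years."""
    return sum(jem[job] * years for job, years in job_history)


def has_parkinsonism(updrs3_score, cutoff=15):
    """Case definition from the report: UPDRS-3 motor score >= 15."""
    return updrs3_score >= cutoff


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Back out the standard error of log(OR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


history = [("underground operator", 10), ("office", 5)]
print(cumulative_exposure(history, JEM))      # mg/m3-years for this made-up history

# Consistency check on the reported age effect: OR=2.11, 95% CI 1.18-3.78.
se = se_from_ci(1.18, 3.78)
print(round(se, 3), odds_ratio_ci(math.log(2.11), se))
```

Applied to the reported age result, the interval implies a standard error of about 0.30 on the log-odds scale, and recomputing the interval from ln(2.11) recovers roughly 1.18 to 3.78, so the published figures are internally consistent with a Wald interval.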

    Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

    Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video), much of it expressed in rich and ambiguous natural language. Traditionally, natural language has been analyzed with qualitative data-analysis approaches, such as manual coding. Yet the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we show how to use probabilistic topic modeling via latent Dirichlet allocation (LDA), an unsupervised text-mining technique, together with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides guidance for conducting their own text-mining studies and for evaluating the quality of others’ studies.
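To make the LDA step concrete, here is a minimal sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, written with the standard library only. This is one standard inference method for LDA; the tutorial may well use a different one (for example variational inference through a library), and the hyperparameters `alpha`, `beta`, the iteration count and the toy reviews are assumptions. The per-document topic proportions it returns are the kind of features a downstream classifier, such as the LASSO multinomial logistic regression mentioned above, would consume; that second step is not shown.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on lists of tokens.

    Returns (doc_topic, topic_word, vocab) as smoothed probability tables.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]    # word counts per topic
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # current topic of every token

    def update(d, k, wi, delta):
        ndk[d][k] += delta
        nkw[k][wi] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):              # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            update(d, k, index[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                update(d, z[d][i], wi, -1)      # remove the token's current assignment
                # Full conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                update(d, k, wi, +1)

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
                  for k in range(n_topics)]
    return doc_topic, topic_word, vocab


# Toy "reviews" (hypothetical), already tokenized.
reviews = [["slow", "crash", "bug"], ["fast", "easy", "nice"],
           ["crash", "bug", "slow"], ["easy", "nice", "fast"]]
doc_topic, topic_word, vocab = lda_gibbs(reviews, n_topics=2)
for k, row in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:3]
    print(k, [vocab[i] for i in top])
```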

    Behavioral periodicity detection from 24h wrist accelerometry and associations with cardiometabolic risk and health-related quality of life

    Periodicities (repeating patterns) are observed in many human behaviors. Their strength may capture untapped patterns that incorporate sleep, sedentary, and active behaviors into a single metric indicative of better health. We present a framework to detect periodicities from longitudinal wrist-worn accelerometry data. GENEActiv accelerometer data were collected continuously from 20 participants (17 men, 3 women, aged 35–65) for 13.9 to 102.0 consecutive days. Cardiometabolic risk biomarkers and health-related quality of life metrics were assessed at baseline. Periodograms were constructed to determine patterns emerging from the accelerometer data. Periodicity strength was calculated using circular autocorrelations for time-lagged windows. The most notable periodicity was at 24 h, indicating a circadian rest-activity cycle; however, its strength varied significantly across participants. Periodicity strength was most consistently associated with LDL-cholesterol (coefficients 0.40–0.79, all p < 0.05) and triglycerides (coefficients 0.68–0.86, all p < 0.05), but was also associated with hs-CRP and health-related quality of life, even after adjusting for demographics, self-rated physical activity and insomnia symptoms. Our framework demonstrates a new method for characterizing behavior patterns longitudinally that captures relationships between 24 h accelerometry data and health outcomes.
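The two steps named above, a periodogram to find dominant periods and a circular autocorrelation to score their strength, can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' framework: activity is assumed to be aggregated into hourly counts, the periodogram is a direct discrete Fourier transform (fine for short series, slow for long ones), and periodicity strength is the circular autocorrelation of the whole series at the candidate lag rather than over the paper's time-lagged windows. The synthetic signal is hypothetical.

```python
import cmath
import math


def _centered(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def periodogram(x):
    """Power at Fourier frequencies k/N (k = 1..N//2), keyed by period in samples."""
    n = len(x)
    c = _centered(x)
    power = {}
    for k in range(1, n // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(c))
        power[n / k] = abs(s) ** 2 / n
    return power


def circular_autocorrelation(x, lag):
    """Autocorrelation with wrap-around: sample t is paired with sample (t + lag) mod N."""
    n = len(x)
    c = _centered(x)
    denom = sum(v * v for v in c)
    if denom == 0:
        raise ValueError("constant signal has no defined autocorrelation")
    return sum(c[t] * c[(t + lag) % n] for t in range(n)) / denom


# Hypothetical hourly activity counts over one week with a clean 24 h rhythm.
hours = [10 + 5 * math.sin(2 * math.pi * t / 24) for t in range(24 * 7)]
power = periodogram(hours)
print(max(power, key=power.get))                      # → 24.0 (dominant period, hours)
print(round(circular_autocorrelation(hours, 24), 3))  # → 1.0 (strength at 24 h)
```

A perfectly regular rhythm scores 1 at its period; real accelerometry would score lower, and that shortfall is what varies across participants.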