
    Doctor of Philosophy

    Family health history (FHH) is an independent risk factor for predicting an individual's chance of developing selected chronic diseases. Although various FHH tools have been developed, many research questions remain unanswered. Before FHH can be used as an effective risk assessment tool in public health screenings or population-based research, it is important to understand the quality of the collected data and to evaluate risk prediction models. No prior study was identified that predicts risk by applying machine learning to FHH alone. This dissertation addressed several questions. First, using mixed methods, we defined 50 requirements for documenting FHH in a population-based study. Second, we examined the accuracy of self- and proxy-reported FHH data in the Health Family Tree database by comparing the disease and risk factor rates generated from this database with rates recorded in a cancer registry and in standard public health surveys. The rates generated from the Health Family Tree were statistically lower than those from public sources (exceptions: stroke rates were the same, and exercise rates were higher). Third, we validated the Health Family Tree risk prediction algorithm. The very high risk category (≥2) predicted the risk of all diseases studied for the adult population (20 to 99 years of age), and this predictive ability held when disease rates from public sources were used as the reference in the relative risk model. The referent population used to establish the expected rate of disease affected risk classification: the lower expected disease rates generated by the Health Family Tree, compared with the rates from public sources, caused more persons to be classified as high risk. Finally, we constructed and evaluated new predictive models using three machine learning classifiers (logistic regression, Bayesian networks, and support vector machines). A limited set of information about first-degree relatives was used to predict future disease.
In summary, combining FHH with valid risk algorithms provides a low-cost tool for identifying persons at risk for common diseases. These findings may be especially useful when developing strategies to screen populations for common diseases and to identify those at highest risk for public health interventions or population-based research.
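The abstract's point that the choice of referent population changes risk classification can be illustrated with a generic observed-to-expected ratio. This is a minimal sketch under stated assumptions, not the Health Family Tree algorithm (whose weighting the abstract does not describe): the `Relative` record, the 10-year age bands, and the toy rate tables are all hypothetical. Only the ≥2 "very high risk" cutoff comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Relative:
    age: int        # hypothetical field: current age or age at death
    affected: bool  # whether the relative was reported to have the disease


def age_band(age: int) -> int:
    """Map an age to a 10-year band (0, 10, ..., 90)."""
    return min(age // 10 * 10, 90)


def family_risk_ratio(relatives: list[Relative], rates: dict[int, float]) -> float:
    """Observed affected relatives divided by the number expected from reference rates."""
    observed = sum(1 for r in relatives if r.affected)
    expected = sum(rates[age_band(r.age)] for r in relatives)
    if expected == 0:
        return float("inf") if observed else 0.0
    return observed / expected


def classify(ratio: float, very_high_cutoff: float = 2.0) -> str:
    """Apply the >= 2 'very high risk' cutoff mentioned in the abstract."""
    return "very high" if ratio >= very_high_cutoff else "below very high"


# Toy reference rates (illustrative only, not real surveillance data).
public_rates = {0: 0.0, 10: 0.0, 20: 0.01, 30: 0.02, 40: 0.05,
                50: 0.10, 60: 0.15, 70: 0.20, 80: 0.25, 90: 0.30}
# A hypothetical referent population with uniformly lower (halved) rates.
internal_rates = {band: rate / 2 for band, rate in public_rates.items()}

family = [Relative(45, False), Relative(52, False), Relative(70, True), Relative(75, False)]

for name, rates in [("public", public_rates), ("internal", internal_rates)]:
    ratio = family_risk_ratio(family, rates)
    print(f"{name}: ratio={ratio:.2f} -> {classify(ratio)}")
```

With the same family, halving the expected rates doubles the ratio (about 1.82 versus 3.64), moving this person across the cutoff into the very high category. This mirrors the direction of the effect the abstract reports for the lower Health Family Tree rates.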

    Mining Social Media to Understand Consumers' Health Concerns and the Public's Opinion on Controversial Health Topics.

    Social media websites are increasingly used by the general public as a venue to express health concerns and discuss controversial medical and public health issues. This information could be used for public health surveillance as well as for soliciting public opinion. In this thesis, I developed methods to extract health-related information from multiple sources of social media data and conducted studies that use text-mining techniques to generate insights from the extracted information. To understand the availability and characteristics of health-related information in social media, I first identified users who seek health information online and participate in online health communities, and analyzed their motivations and behavior through two case studies: user-created groups on MedHelp and a diabetes online community on Twitter. Through a review of tweets mentioning eye-related medical concepts identified by MetaMap, I identified the common reasons that tweets are mislabeled by natural language processing tools tuned for biomedical text, and trained a classifier to exclude non-medically-relevant tweets, increasing the precision of the extracted data. Furthermore, I conducted two studies to evaluate how effectively text-mining techniques applied to social media can capture public opinion on controversial medical and public health issues. The first study applied topic modeling and text summarization to automatically distill users' key concerns about the purported link between autism and vaccines. The outputs of the two methods covered most of the public concerns about MMR vaccines reported in previous survey studies. In the second study, I estimated the public's view of the Affordable Care Act (ACA) by applying sentiment analysis to four years of Twitter data, and demonstrated that the rates of positive and negative responses measured by tweet sentiment are in general agreement with the results of the Kaiser Family Foundation poll.
Finally, I designed and implemented a system that automatically collects and analyzes online news comments to help researchers, public health workers, and policy makers better monitor and understand public opinion on issues such as controversial health-related topics.
PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/120714/1/owenliu_1.pd

    A Transdisciplinary Emergent Approach for Systems and Interventions (EASI)

    In modeling human behavior and social structures, several factors can emerge over time. These can be attributed to the availability of new data, increased complexity, changes to the organizational structure, interventions, the introduction of innovative technology or services, and improved knowledge and treatments. We hypothesize a new class of emergent decision support systems that continually evolve to account for this Causal Drift. In this work, we illustrate the application of the Emergent Approach to Systems and Interventions (EASI™) methodology with the example of a Community Intervention Activity Model (CIAM) to reduce the rate of diabetic hospitalization at the local/county level. A key contribution of this work is the design of an efficient, theoretically informed, emergent data collection system. A second key contribution is that it offers practitioners a methodology to systematically determine which data need to be collected and then to model the collected data. Thus, the EASI™ methodology supports the efficient capture of data that have utility in decision making. To ensure the applicability of this work, the publicly available Behavioral Risk Factor Surveillance System (BRFSS) and Social Vulnerability Index (SVI) data sets were used. The EASI™ method has four significant advantages: a) the prediction is based on a theoretically informed causal structure, which allows it to be used as a basis for evaluating interventions, unlike deep learning and other machine-based structure learning methods that are susceptible to spurious associations; b) existing data are used to evaluate the clinical relevance of predictions; c) it leverages high-dimensional synthetic observational health data to model health objectives; and d) it provides guidance on transforming the system from the emergent basis to a new emergent system as new knowledge is gained.
The dissertation proposes, implements, and evaluates the EASI™ methodology as applied to a CIAM for reducing diabetic hospitalizations.

    Pathopoiesis Mechanism of Smoking and Shared Genes in Pancreatic Cancer

    Pancreatic cancer (PC) remains a significant, unresolved issue because of its complex genetic blueprint and the lack of reliable detection markers. The purpose of this study was to examine the possible correlation between tobacco use, gender, and age in the etiopathogenesis of PC and other cancer types with a shared-gene association (CTSG-A). The unified paradigm of cancer causation was used to understand the pathopoiesis mechanism of smoking and shared genes in PC. A cross-sectional study was performed using secondary data from the cancer survivorship module of the 2014 Behavioral Risk Factor Surveillance System survey. Results of ordinal logistic regression analyses indicated no correlation between smoking and the prevalence of PC and CTSG-A, but gender and age were significant predictors. Gender had a statistically significant effect on the prediction of PC/CTSG-A induction and promotion. The probability of developing the disease increased as persons reached 62 to 69 years of age. Findings may enhance the understanding of environmental, genetic, and biodemographic interactions in disease evolution (induction, promotion, and expression periods). Findings may also be used to promote population health and improve health behaviors for individuals in vulnerable, high-risk groups.
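For reference, ordinal logistic regression is most commonly fitted in the proportional-odds (cumulative logit) form shown below. The abstract does not state the parametrization or how the ordinal outcome was coded, so treating the predictors as smoking status, gender, and age in this form is an assumption.

```latex
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \ln \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \theta_j - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad j = 1, \dots, J-1,
  \qquad \mathbf{x} = (x_{\text{smoking}},\, x_{\text{gender}},\, x_{\text{age}})
```

Under this form, exp(β_k) is the factor by which a one-unit increase in x_k multiplies the odds of falling above any given category j, and that factor is the same for every j. A finding of "no correlation" for smoking corresponds to an estimated β_smoking that is not significantly different from zero.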

    Alleviating Environmental Health Disparities Through Community Science and Data Integration

    Environmental contamination is a fundamental determinant of health and well-being, and when the environment is compromised, vulnerabilities are generated. The complex challenges associated with environmental health and food security are influenced by current and emerging political, social, economic, and environmental contexts. To address these “wicked” problems, local, state, and federal agencies conduct disparate public health surveillance efforts. More recently, citizen/community science (CS) monitoring efforts have begun providing site-specific data. One of the biggest challenges in using these government datasets for a holistic assessment of environmental exposure, let alone incorporating CS data, is data management and interoperability. To facilitate a more holistic perspective and approach to solution generation, we have developed a method that provides a common data model, allowing environmental health researchers working at different scales and in different research domains to exchange data and ask new questions. We anticipate that this method will help address environmental health disparities, which are unjust and avoidable, while ensuring that CS datasets are ethically integrated to achieve environmental justice. Specifically, we used a transdisciplinary research framework to develop a methodology for integrating CS data with existing governmental environmental monitoring and social attribute data (vulnerability and resilience variables) from 10 different federal and state agencies. A key challenge in integrating such diverse datasets is the lack of widely adopted ontologies for vulnerability and resiliency factors. In addition to following the best practice of submitting new term requests to existing ontologies to fill gaps, we have also created an application ontology, the Superfund Research Project Data Interface Ontology (SRPDIO).

    Social analytics for health integration, intelligence, and monitoring

    Today, patient-generated social health data are abundant, and healthcare is shifting from an authoritative, provider-centric model to collaborative, patient-oriented care. The aim of this dissertation is to provide a Social Health Analytics framework that uses social data to address interdisciplinary research challenges in Big Data Science and Health Informatics. The specific research issues and objectives are described below. The first objective is semantic integration of heterogeneous health data sources, which can range from structured to unstructured and include patient-generated social data as well as authoritative data. Without such integration, an information seeker has to spend time selecting information from many websites and integrating it into a coherent mental model. An integrated health data model is designed to accommodate data features from different sources. The model uses semantic linked data for lightweight integration and supports a set of analytics and inferences over the data sources. A prototype analytical and reasoning tool called “Social InfoButtons,” which can be linked from existing EHR systems, is developed to help doctors understand and take into consideration the behaviors, patterns, or trends in patients’ healthcare practices during a patient’s care. The tool can also offer insights that help public health officials make better-informed policy decisions. The second objective is near-real-time monitoring of disease outbreaks using social media. Research on epidemic detection based on search query terms entered by millions of users is limited by the fact that query terms are not easily accessible to non-affiliated researchers. Publicly available Twitter data are therefore exploited to develop the Epidemics Outbreak and Spread Detection System (EOSDS). EOSDS provides four visual analytics tools for monitoring epidemics, namely Instance Map, Distribution Map, Filter Map, and Sentiment Trend, to investigate public health threats in space and time.
The third objective is to capture, analyze, and quantify public health concerns through sentiment classification of Twitter data. Traditional public health surveillance systems have difficulty detecting and monitoring health-related concerns and changes in public attitudes toward health-related issues because of their expense and significant time delays. A two-step sentiment classification model is built to measure concern. In the first step, personal tweets are distinguished from non-personal tweets. In the second step, personal negative tweets are further separated from personal non-negative tweets. In the proposed classification, training data are labeled by an emotion-oriented, clue-based method, and three machine learning models are trained and tested. The Measure of Concern (MOC) is computed from the number of personal negative tweets. A timeline of the MOC is also generated to monitor public concern levels, which is important for allocating health emergency resources and for policy making. The fourth objective is predicting medical condition incidence and progression trajectories using patients’ self-reported data on PatientsLikeMe. Some medical conditions are correlated with each other to a measurable degree (“comorbidities”). A prediction model is provided to predict comorbidities, rank future conditions by their likelihood, and predict possible progression trajectories given an observed medical condition. The novel trajectory prediction models are validated by showing that they cover the comorbidities reported in the medical literature.
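The two-step classification and the MOC timeline described above can be sketched as a simple cascade. This is an illustration under loud assumptions: the keyword rules `is_personal` and `is_negative` are hypothetical stand-ins for the three trained machine learning models, and treating the MOC as a plain daily count of personal negative tweets is an assumed definition, since the abstract only says it is "computed from the number" of such tweets.

```python
import re
from typing import Callable, Iterable

# Hypothetical keyword cues standing in for the trained step-1 and step-2 models.
PERSONAL_CUES = {"i", "i'm", "me", "my"}
NEGATIVE_CUES = {"worried", "scared", "afraid", "sick"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, keeping apostrophes (e.g. "i'm")."""
    return set(re.findall(r"[a-z']+", text.lower()))


def is_personal(text: str) -> bool:
    return bool(tokens(text) & PERSONAL_CUES)


def is_negative(text: str) -> bool:
    return bool(tokens(text) & NEGATIVE_CUES)


def two_step_label(
    text: str,
    personal_clf: Callable[[str], bool] = is_personal,
    negative_clf: Callable[[str], bool] = is_negative,
) -> str:
    # Step 1: separate personal from non-personal tweets.
    if not personal_clf(text):
        return "non-personal"
    # Step 2: only personal tweets reach the sentiment step.
    return "personal-negative" if negative_clf(text) else "personal-non-negative"


def moc_timeline(tweets: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Daily count of personal negative tweets (assumed MOC definition)."""
    counts: dict[str, int] = {}
    for day, text in tweets:
        counts.setdefault(day, 0)  # keep days with zero concern in the timeline
        if two_step_label(text) == "personal-negative":
            counts[day] += 1
    return dict(sorted(counts.items()))


# Made-up example tweets, for illustration only.
tweets = [
    ("2024-01-01", "I'm worried about the flu"),
    ("2024-01-01", "CDC publishes flu guidance"),
    ("2024-01-02", "My kids are sick with the flu"),
    ("2024-01-02", "I got my flu shot today"),
    ("2024-01-03", "Flu season update from the health department"),
]
print(moc_timeline(tweets))
```

Because the classifiers are passed in as functions, the keyword stand-ins can be swapped for any trained models that expose a text-to-boolean prediction without changing the cascade or the timeline logic.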

    Can social norms explain long-term trends in alcohol use? Insights from inverse generative social science

    Social psychological theory posits entities and mechanisms that attempt to explain observable differences in behavior. For example, dual process theory suggests that an agent's behavior is influenced by intentional (arising from reasoning involving attitudes and perceived norms) and unintentional (i.e., habitual) processes. To pass the generative sufficiency test as an explanation of alcohol use, we argue that the theory should be able to explain notable patterns of alcohol use in the population, e.g., the distinct differences in drinking prevalence and average quantities consumed by males and females. In this study, we further develop and apply inverse generative social science (iGSS) methods to an existing agent-based model of dual process theory of alcohol use. Using iGSS, implemented within a multi-objective grammar-based genetic program, we search through the space of model structures to identify whether a single parsimonious model can best explain both male and female drinking, or whether separate and more complex models are needed. Focusing on alcohol use trends in New York State, we identify an interpretable model structure that achieves high goodness-of-fit for both male and female drinking patterns simultaneously, and which also validates successfully against reserved trend data. This structure offers a novel interpretation of the role of norms in formulating drinking intentions, but the structure's theoretical validity is questioned by its suggestion that individuals with low autonomy would act against perceived descriptive norms. Improved evidence on the distribution of autonomy in the population is needed to understand whether this finding is substantive or is a modeling artefact.

    Population-level Indicators of Physical Activity, Sedentary Behaviour and Sleep in Canada based on Twitter

    Social media platforms contain large amounts of freely and publicly available data that could be used to measure population characteristics across different geographical regions. Analyzing public data sources such as social media has shown promising results for public health measurement and monitoring. This thesis addresses challenges in building systems that collect high volumes of data from social media platforms. More specifically, we look at Twitter data processing, filtering, and aggregation to provide population-level indicators of physical activity, sedentary behavior, and sleep (PASS). In the first part of the thesis, we describe the complete machine learning pipeline we built: (i) Twitter data collection from November 2017 to May 2018; (ii) data preparation through manual annotation, keyword filtering, and an active learning technique for labelling 10,283 tweets; and (iii) training a classifier to identify PASS-related tweets. Training the model involves building an initial classifier to efficiently find relevant tweets in subsequent annotation iterations. Our classifiers include an ensemble model consisting of several shallow machine learning algorithms, along with deep learning algorithms. In the second part of the thesis, we examine the performance of different solutions. We provide benchmark results for the task of classifying PASS-related tweets for the various algorithms considered. We also derive health indicators by aggregating and computing the proportion of classified tweets by province, and compare our metrics with the prevalence of obesity, diabetes, and mood disorders from the Canadian Community Health Survey. Our work shows how machine learning can be used to complement public health data and better inform health policy makers to improve the lives of Canadians.
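The province-level indicator step (the share of classified tweets by province) can be sketched as a small aggregation. This is a sketch under assumptions: the abstract does not say whether tweets receive one PASS label or a binary relevant/irrelevant label, so the three category codes `PA`, `SB`, and `SL` and the `(province, label)` input shape are hypothetical, and the sample data are made up.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical labels: physical activity, sedentary behaviour, sleep.
CATEGORIES = ("PA", "SB", "SL")


def pass_indicators(
    tweets: Iterable[tuple[str, Optional[str]]]
) -> dict[str, dict[str, float]]:
    """Proportion of each province's tweets assigned to each PASS category.

    Each tweet is (province, label); label is None when the classifier
    judged the tweet not PASS-related.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for province, label in tweets:
        totals[province] += 1  # denominator: all collected tweets in the province
        if label is not None:
            hits[province][label] += 1
    return {
        province: {cat: hits[province][cat] / n for cat in CATEGORIES}
        for province, n in totals.items()
    }


# Made-up classifier output, for illustration only.
classified = [
    ("ON", "PA"), ("ON", None), ("ON", "SL"), ("ON", None),
    ("BC", "PA"), ("BC", "PA"), ("BC", None), ("BC", "SB"),
]
for province, shares in pass_indicators(classified).items():
    print(province, shares)
```

Normalizing by each province's total tweet volume, rather than reporting raw counts, keeps provinces with very different numbers of Twitter users comparable before the indicators are set against survey prevalence figures.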

    F as in Fat: How Obesity Policies Are Failing in America, 2005

    Examines national and state obesity rates and government policies. Challenges the research community to focus on major research questions to inform policy decisions, and policymakers to pursue actions to combat the obesity crisis.