2023 SDSU Data Science Symposium Presentation Abstracts
This document contains abstracts for presentations and posters at the 2023 SDSU Data Science Symposium.
Generating Reliable and Responsive Observational Evidence: Reducing Pre-analysis Bias
A growing body of evidence generated from observational data has demonstrated the potential to influence decision-making and improve patient outcomes. For observational evidence to be actionable, however, it must be generated reliably and in a timely manner. Large distributed observational data networks enable research on diverse patient populations at scale and develop new sound methods to improve reproducibility and robustness of real-world evidence. Nevertheless, the problems of generalizability, portability and scalability persist and compound. As analytical methods only partially address bias, reliable observational research (especially in networks) must address the bias at the design stage (i.e., pre-analysis bias) including the strategies for identifying patients of interest and defining comparators.
This thesis synthesizes and enumerates a set of challenges to addressing pre-analysis bias in observational studies and presents mixed-methods approaches and informatics solutions for overcoming a number of those obstacles. We develop frameworks, methods and tools for scalable and reliable phenotyping including data source granularity estimation, comprehensive concept set selection, index date specification, and structured data-based patient review for phenotype evaluation. We cover the research on potential bias in the unexposed comparator definition including systematic background rates estimation and interpretation, and definition and evaluation of the unexposed comparator.
We propose that the use of standardized approaches and methods as described in this thesis not only improves reliability but also increases responsiveness of observational evidence. To test this hypothesis, we designed and piloted a Data Consult Service, a service that generates new on-demand evidence at the bedside. We demonstrate that it is feasible to generate reliable evidence to address clinicians’ information needs in a robust and timely fashion and provide our analysis of the current limitations and future steps needed to scale such a service.
Repeatable and reusable research - Exploring the needs of users for a Data Portal for Disease Phenotyping
Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it hard to compare different study findings and hinders the ability to conduct repeatable and reusable research. Objective: This thesis aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, for both new and existing data portals for phenotypes (concept libraries). Methods: Exploratory sequential mixed methods were used in this thesis to look at which concept libraries are available, how they are used, what their characteristics are, where there are gaps, and what needs to be done in the future from the point of view of the people who use them. This thesis consists of three phases: 1) two qualitative studies, including one-to-one interviews with researchers, clinicians, machine learning experts, and senior research managers in health data science, as well as focus group discussions with researchers working with the Secured Anonymized Information Linkage databank, 2) the creation of an email survey (i.e., the Concept Library Usability Scale), and 3) a quantitative study with researchers, health professionals, and clinicians. Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. 
The participants mentioned several facilitators that would encourage them to: 1) share their work, such as receiving citations from other researchers; and 2) reuse the work of others, such as saving the time and effort that they frequently spend on creating new code lists from scratch. They also pointed out several barriers that could inhibit them from: 1) sharing their work, such as concerns about intellectual property (e.g., if they shared their methods before publication, other researchers would use them as their own); and 2) reusing others' work, such as a lack of confidence in the quality and validity of their code lists. Participants suggested some developments that they would like to see in order to make research done with routine data more reproducible, such as a drive for more transparency in research methods documentation, including publishing complete phenotype definitions and clear code lists. Conclusions: The findings of this thesis indicated that most participants valued a concept library for phenotypes. However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform such as the CALIBER research platform. Analysis of interviews, focus group discussions, and qualitative studies revealed that different users have different requirements, facilitators, barriers, and concerns about concept libraries. This work set out to investigate whether concept libraries should be developed in Kuwait to facilitate improved data sharing. However, the recommendation at the end of this thesis is that this would be unlikely to be cost effective or highly valued by users, and that investment in open access research publications may be of more value to the Kuwait research/academic community.
Phenotyping with Partially Labeled, Partially Observed Data
Identifying a group of individuals that share a common set of characteristics is a conceptually simple task, which is often difficult in practice. Such phenotyping problems emerge in various settings, including the analysis of clinical data. In this setting, phenotyping is often stymied by persistent data quality issues. These include a lack of reliable labels to indicate the presence or absence of characteristics of interest, and significant missingness in observed variables.
This dissertation introduces methods for learning phenotypes when the data contain missing values (partially observed) and labels are scarce (partially labeled). Aim 1 utilizes an unsupervised probabilistic graphical model to learn phenotypes from partially observed data. Aim 2 introduces a related semi-supervised probabilistic graphical model for learning phenotypes from partially labeled clinical data. Finally, Aim 3 describes a method for training deep generative models when the training data contain missing values. The algorithm is then applied in a semi-supervised setting where it accounts for partially labeled data as well.
Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges
Artificial General Intelligence (AGI), possessing the capacity to comprehend,
learn, and execute tasks with human cognitive abilities, engenders significant
anticipation and intrigue across scientific, commercial, and societal arenas.
This fascination extends particularly to the Internet of Things (IoT), a
landscape characterized by the interconnection of countless devices, sensors,
and systems, collectively gathering and sharing data to enable intelligent
decision-making and automation. This research embarks on an exploration of the
opportunities and challenges towards achieving AGI in the context of the IoT.
Specifically, it starts by outlining the fundamental principles of IoT and the
critical role of Artificial Intelligence (AI) in IoT systems. Subsequently, it
delves into AGI fundamentals, culminating in the formulation of a conceptual
framework for AGI's seamless integration within IoT. The application spectrum
for AGI-infused IoT is broad, encompassing domains ranging from smart grids,
residential environments, manufacturing, and transportation to environmental
monitoring, agriculture, healthcare, and education. However, adapting AGI to
resource-constrained IoT settings necessitates dedicated research efforts.
Furthermore, the paper addresses constraints imposed by limited computing
resources, intricacies associated with large-scale IoT communication, as well
as the critical concerns pertaining to security and privacy.
Mining Behavioral Patterns from Mobile Big Data
Mobile devices connected to the Internet are a ubiquitous platform that can easily record a large amount of data describing human behavior. Specifically, the data collected from mobile devices, referred to as mobile big data, reveal important social and economic information. Therefore, analyzing mobile big data is valuable for several stakeholders, ranging from smartphone manufacturers to network operators and app developers.
This thesis aims to discover and understand behavioral patterns from mobile big data based on large real-world datasets. Specifically, this thesis reveals patterns from three domains: people, time, and location. First, we explore mobile big data from the people domain and propose a framework to discover users' daily activity patterns from their mobile app usage. By applying the framework to a real-world dataset consisting of 653,092 users, we successfully extract five common patterns among millions of people, including commuting, pervasive socializing, nightly entertainment, afternoon reading, and nightly socializing. Second, still from the people domain, we derive group health conditions by using their smartphone usage data. In particular, we collect mobile usage records of 452 users in North America. We then demonstrate the potential for inferring group health conditions (i.e., COVID-19 outbreak stages) by leveraging less privacy-sensitive smartphone data, including CPU usage, memory usage, and network connections. Third, we mine the behavior patterns from the time domain. We reveal the evolution of mobile app usage by conducting a longitudinal study on 1,465 users from 2012 to 2017. The results show that users' app usage significantly changes over time. However, the evolution in app-category usage and individual app usage are different in terms of popularity distribution, usage diversity, and correlations. Last, with respect to the location domain, we leverage city-scale spatiotemporal mobile app usage data to reveal urban land usage patterns. We prove the strong correlation between mobile usage behavior and location features, which brings a new angle to urban analytics.
Modern Views of Machine Learning for Precision Psychiatry
In light of the NIMH's Research Domain Criteria (RDoC), the advent of
functional neuroimaging, novel technologies and methods provide new
opportunities to develop precise and personalized prognosis and diagnosis of
mental disorders. Machine learning (ML) and artificial intelligence (AI)
technologies are playing an increasingly critical role in the new era of
precision psychiatry. Combining ML/AI with neuromodulation technologies can
potentially provide explainable solutions in clinical practice and effective
therapeutic treatment. Advanced wearable and mobile technologies also call for
the new role of ML/AI for digital phenotyping in mobile mental health. In this
review, we provide a comprehensive overview of the ML methodologies and
applications by combining neuroimaging, neuromodulation, and advanced mobile
technologies in psychiatry practice. Additionally, we review the role of ML in
molecular phenotyping and cross-species biomarker identification in precision
psychiatry. We further discuss explainable AI (XAI) and causality testing in a
closed-human-in-the-loop manner, and highlight the ML potential in multimedia
information extraction and multimodal data fusion. Finally, we discuss
conceptual and practical challenges in precision psychiatry and highlight ML
opportunities in future research.