5 research outputs found

    Toward a Collaborative Platform for Hybrid Designs Sharing a Common Cohort

    This doctoral thesis binds together four included papers into a thematic whole and is simultaneously an independent work proposing a platform that facilitates epidemiological research. Population-based prospective cohort studies typically recruit a relatively large group of participants representative of a studied population and follow them over years or decades. This group of participants is called a cohort. As part of the study, the participants may be asked to answer extensive questionnaires, undergo medical examinations, donate blood samples, and participate in several rounds of follow-up. The collected data can also include information from other sources, such as health registers. In prospective cohort studies, the participants initially do not have the investigated diagnoses, but statistically, a certain percentage will be diagnosed with a disease each year. The studies enable researchers to investigate how those who develop a disease differ from those who do not. Often, many new studies can be nested within a cohort study. Data for a subgroup of the cohort are then selected and analyzed. A new study combined with an existing cohort is said to have a hybrid design. When a research group uses the same cohort as a basis for multiple new studies, these studies often share similarities in the workflow for designing the study and the analysis. The thesis shows the potential for developing a platform that encourages the reuse of work from previous studies and systematizes the study design workflows to improve time efficiency and reduce the risk of errors. However, the study data are subject to strict laws and regulations pertaining to privacy and research ethics. Therefore, the data must be stored and accessed within a secured IT environment where researchers log in to conduct analyses, with minimal possibilities to install analytics software not already provided by default.
    Further, transferring the data from the secured IT environment to a local computer or a public cloud is prohibited. Nevertheless, researchers can usually upload and run script files, e.g., written in R and run in RStudio. A consequence is that researchers, who often have limited software engineering skills, may rely mainly on self-written code for their analyses. Such code may be developed unsystematically, with a high risk of errors and of re-implementing solutions already developed in preceding studies within the group. The thesis makes a case for a platform providing collaboration software as a service (SaaS) that addresses the challenges of the described research context, and it proposes the platform's architecture and design. Its main characteristic, and contribution, is the separation of concerns between the SaaS, which operates independently of the data, and a secured IT environment where data can be accessed and analyzed. The platform lets the researchers define the data analysis for the study using the cloud-based software. The definition is then automatically transformed into an executable version, represented as source code in a scripting language already supported by the secure environment where the data resides. The author has not found systems that solve the same problem in a similar way. However, the work is informed by cloud computing, workflow management systems, data analysis pipelines, low-code, no-code, and model-driven development.
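
The separation described in this abstract, where an analysis is defined away from the data and then turned into a script that runs where the data resides, can be illustrated with a minimal sketch. Everything here is a hypothetical illustration and not the platform's actual design: the `StudySpec` fields, the `render_r_script` function, the file name `cohort.csv`, and the choice of a `glm` call in the generated R text are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StudySpec:
    """Hypothetical, data-free description of a nested study's analysis."""
    data_file: str                       # file name inside the secured environment (assumed)
    outcome: str                         # name of the outcome column (assumed)
    covariates: list[str] = field(default_factory=list)
    family: str = "binomial"             # model family passed to R's glm (assumed)


def render_r_script(spec: StudySpec) -> str:
    """Turn the specification into R source text. No study data is read here."""
    formula = f"{spec.outcome} ~ {' + '.join(spec.covariates) or '1'}"
    lines = [
        "# Generated script: upload and run inside the secured environment",
        f'cohort <- read.csv("{spec.data_file}")',
        f"model <- glm({formula}, data = cohort, family = {spec.family})",
        "print(summary(model))",
    ]
    return "\n".join(lines) + "\n"


# The specification lives outside the secured environment; only the
# generated text needs to be uploaded and executed next to the data.
spec = StudySpec("cohort.csv", "case", ["age", "bmi"])
script = render_r_script(spec)
print(script)
```

The point of the sketch is only that the generating side never needs access to the cohort data; it produces plain script text for a language the secured environment already supports.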

    The Beauty of Complex Designs

    The increasing use of omics data in epidemiology enables many novel study designs, but also introduces challenges for data analysis. We describe the possibilities for systems epidemiological designs in the Norwegian Women and Cancer (NOWAC) study and show how the complexity of NOWAC enables many beautiful new study designs. We discuss the challenges of implementing these designs and analyzing the data. Finally, we propose a systems architecture for swift design and exploration of epidemiological studies.

    Autostrata: Improved Automatic Stratification for Coarsened Exact Matching

    We commonly adjust for confounding factors in analytical observational epidemiology to reduce biases that distort the results. Stratification and matching are standard methods for reducing confounder bias. Coarsened exact matching (CEM) is a recent method that uses stratification to coarsen variables into categorical variables, enabling exact matching of exposed and nonexposed subjects. CEM's standard approach to stratifying variables is histogram binning. However, histogram binning creates strata of uniform widths and does not distinguish between exposed and nonexposed subjects. We present Autostrata, a novel algorithmic approach to stratification that produces improved results in CEM and gives the researcher more control.
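
To make the baseline concrete, the following is a minimal sketch of CEM with uniform-width histogram binning, the standard approach the abstract contrasts with. It is not Autostrata, whose algorithm is not described here. The function names, the `exposed` flag, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def histogram_bin(value, lo, hi, n_bins):
    """Index of the equal-width bin containing value (the last bin includes hi)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)


def coarsened_exact_match(subjects, variables, n_bins):
    """Group subjects by their coarsened covariates and keep only the strata
    that contain at least one exposed and one nonexposed subject."""
    ranges = {
        v: (min(s[v] for s in subjects), max(s[v] for s in subjects))
        for v in variables
    }
    strata = defaultdict(list)
    for s in subjects:
        key = tuple(histogram_bin(s[v], *ranges[v], n_bins) for v in variables)
        strata[key].append(s)
    return {
        key: group
        for key, group in strata.items()
        if any(s["exposed"] for s in group) and any(not s["exposed"] for s in group)
    }


# Toy data (invented): age is the only confounder being coarsened.
subjects = [
    {"age": 30, "exposed": True},
    {"age": 34, "exposed": False},
    {"age": 52, "exposed": True},
    {"age": 58, "exposed": True},
    {"age": 70, "exposed": False},
]
matched = coarsened_exact_match(subjects, ["age"], n_bins=4)
print(sorted(matched))
```

With four equal-width bins over ages 30 to 70, only the first bin holds both an exposed and a nonexposed subject, so the other three subjects are discarded. The bin edges are fixed by the range alone, regardless of where exposed and nonexposed subjects fall, which is the limitation the abstract points to.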

    Greedy knot selection algorithm for restricted cubic spline regression

    Non-linear regression modeling is common in epidemiology, both for prediction and for estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method and is, for example, highly relevant to Cox proportional hazards regression analysis. RCS regression uses third-order polynomials joined at knot points to model non-linear relationships. The standard approach is to place knots at a regular sequence of quantiles between the outer boundaries. A regression curve can easily be fitted to the sample using a relatively high number of knots. The problem is then overfitting, where a regression model fits the given sample well but does not generalize well to other samples. A low knot count is thus preferred. However, the standard knot selection process can lead to underperformance in the sparser regions of the predictor variable, especially when using a low number of knots. It can also lead to overfitting in the denser regions. We present a simple greedy search algorithm using a backward method for knot selection. In simulation experiments, it shows reduced prediction error and lower Bayesian information criterion scores compared to the standard knot selection process. We have implemented the algorithm as part of an open-source R package, knutar.
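
The standard knot placement mentioned in this abstract can be sketched as follows. This shows only the quantile-based baseline, not the greedy backward algorithm and not the knutar API. The function names are hypothetical, and the default outer boundaries at the 5th and 95th percentiles are an assumption chosen for the example.

```python
def quantile(sorted_x, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_x) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_x) - 1)
    frac = pos - lower
    return sorted_x[lower] * (1 - frac) + sorted_x[upper] * frac


def quantile_knots(x, n_knots, outer=(0.05, 0.95)):
    """Place n_knots at equally spaced quantiles between the outer boundaries.

    The outer quantiles are an assumed default for this sketch.
    """
    if n_knots < 2:
        raise ValueError("at least two knots are needed to span the boundaries")
    xs = sorted(x)
    lo, hi = outer
    step = (hi - lo) / (n_knots - 1)
    return [quantile(xs, lo + i * step) for i in range(n_knots)]


# Toy predictor: the integers 0..100, so quantiles map directly to values.
knots = quantile_knots(list(range(101)), n_knots=3)
print(knots)
```

For this evenly spread toy predictor the three knots land at roughly 5, 50, and 95. Because the positions follow the data density, a skewed predictor would concentrate knots in its dense region and leave sparse regions with few or none, which is the weakness the abstract describes for low knot counts.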

    Swirlwave. Cloudless wide area friend-to-friend networking middleware for smartphones

    Swirlwave is a middleware that enables peer-to-peer and distributed computing for Internet-connected devices with the following characteristics: the devices lack publicly reachable IP addresses, they can be expected to disconnect from the network for periods of time, and they frequently change network locations. This is the typical case for smartphones. The middleware fits into the friend-to-friend subcategory of peer-to-peer systems, meaning that the overlay network is built on top of already existing trust relationships among its users. It is independent of clouds and application servers, it has built-in encryption for confidentiality and authentication, and it aims to be easily extensible for new applications. The solution described in the thesis was implemented for smartphones running the Android operating system, but its principles are not limited to this platform.