23 research outputs found

    DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes

    BACKGROUND: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). RESULTS: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. 
CONCLUSIONS: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.

    TMEM107 recruits ciliopathy proteins to subdomains of the ciliary transition zone and causes Joubert syndrome

    The transition zone (TZ) ciliary subcompartment is thought to control cilium composition and signalling by facilitating a protein diffusion barrier at the ciliary base. TZ defects cause ciliopathies such as Meckel–Gruber syndrome (MKS), nephronophthisis (NPHP) and Joubert syndrome (JBTS). However, the molecular composition and mechanisms underpinning TZ organization and barrier regulation are poorly understood. To uncover candidate TZ genes, we employed bioinformatics (coexpression and co-evolution) and identified TMEM107 as a TZ protein mutated in oral–facial–digital syndrome and JBTS patients. Mechanistic studies in Caenorhabditis elegans showed that TMEM-107 controls ciliary composition and functions redundantly with NPHP-4 to regulate cilium integrity, TZ docking and assembly of membrane to microtubule Y-link connectors. Furthermore, nematode TMEM-107 occupies an intermediate layer of the TZ-localized MKS module by organizing recruitment of the ciliopathy proteins MKS-1, TMEM-231 (JBTS20) and JBTS-14 (TMEM237). Finally, MKS module membrane proteins are immobile and super-resolution microscopy in worms and mammalian cells reveals periodic localizations within the TZ. This work expands the MKS module of ciliopathy-causing TZ proteins associated with diffusion barrier formation and provides insight into TZ subdomain architecture.

    Simultaneous component methods to identify common and distinctive mechanisms underlying linked data

    Often, different kinds of information are collected simultaneously about the same group of entities. The entities may be, for example, the items of a questionnaire, and the different kinds of information collected about them may be the answers of respondents from different cultures. Entities may also be participants, about whom different kinds of information are gathered by administering several questionnaires to them. In all these cases, the collected information can be grouped into different data blocks. In the first case one obtains, for each culture, one respondent-by-item data block, and in the second case one participant-by-item data block per questionnaire. Because the information in the different data blocks pertains to the same group of entities, such data are also called linked data. In the first case the data blocks are linked via the items, in the second case via the participants. Many research questions can be associated with linked data. This thesis focuses on the search for mechanisms that can explain the variation in the data. To this end, it relies on the family of simultaneous component analysis (SCA) methods. In SCA, the data blocks are decomposed into a limited number of components that explain a maximal amount of the variance in the data. When searching for underlying mechanisms, a further question is which mechanisms are common to all data blocks and which are specific to one or a few of them. Traditional SCA methods do not answer this question adequately, because they yield components that mix common and specific information. This thesis addresses the problem across five chapters.
In Chapter 1, we scrutinize the concept of variance explained by a simultaneous component in a data block, since this concept seems, at first sight, eminently suited to formalizing common and specific mechanisms. We prove that, in many cases, determining the variance that a simultaneous component uniquely explains in a data block is problematic unless certain mathematical conditions are met. In Chapter 2, we present a first method to disentangle common and specific information in linked data. This method uses the rotational freedom of an SCA model and proceeds in two steps: first, the linked data are analyzed with SCA; next, the resulting simultaneous components are rotated in a special way to disentangle the common and specific information. We characterize this method as a soft method because it loses no model fit. In Chapter 3, we present a software package with which the entire data-analytic process of the method of Chapter 2 can be carried out. Besides being user-friendly, this software package facilitates the choice of model parameters such as the number of mechanisms and their status as common or specific. It also offers several ways of preprocessing the data. Both a standalone version of the software package and a MATLAB version are freely available. In some cases it may further be advisable to impose extra constraints on the component structure in order to pull the common and specific components apart. Compared with the first method, the interpretation of the components will thereby improve. However, imposing extra constraints will also lead to a loss in model fit.
To find an optimal balance between loss in fit and gain in interpretability, we present in Chapter 4 an extension of the first method that makes it possible to impose extra constraints with a user-specified level of strictness. As such, this method is characterized as semi-soft/hard. Finally, in the methods presented so far, the primary focus is on finding specific components, whereas the common components have more of a residual status. Sometimes, however, it is very important to detect common components. Therefore, in Chapter 5 we present a third method whose primary focus is finding common mechanisms. For this, we partly draw on ideas from a regression context.
nrpages: 156; status: published

    The Impact of a Change in Employment on Three Work-Related Diseases: A Retrospective Longitudinal Study of 10,530 Belgian Employees

    Background: The literature on the extent to which a change in employment contributes to good health is contradictory or shows inconsistent results. The aim of this study was to investigate whether an association exists between a change in employment and cardiovascular, musculoskeletal and neuropsychological diseases in a sample of 10,530 Belgian workers over a seven-year follow-up period. Methods: The following factors were analysed: demographic variables, a change in employment and work-related risks. Being on medication for cardiovascular, musculoskeletal or neuropsychological diseases was used as a proxy for each of the three health issues. Logistic regression models for autocorrelated data with repeated measures were used to examine each medication type. Results: A change in employment and psychosocial load can have an important effect on the cardiovascular health of employees. Demographic variables, such as BMI and age, are risk factors for all three medication types. Repetitive manual tasks, handling static, exposure to noise levels of 87 dB, mechanical and/or manual handling of loads, and shift work were found to be positively associated with medication taken for musculoskeletal diseases. Exposure to noise of 80 dB(A), managing physical loads and night work were found to be associated with being on medication for neuropsychological diseases. Physical activity and skill levels were found to be protective factors against being on medication for neuropsychological diseases. Conclusions: Change in employment and psychosocial load were found to be two important risk factors for being on medication for cardiovascular diseases (CVD). Dealing with loads, doing shift work and daily exposure to noise of 87 dB correlated with being on medication for musculoskeletal diseases (MSD). Dealing with physical loads, doing night work and exposure to noise of 80 dB were risk factors for being on medication for neuropsychological diseases (NPD), while physical activity and higher reported skill levels were found to be protective factors for NPD.

    Observational Surveillance Approach to Detect Novel Work-Related Diseases and Hazards: An Application to a Belgian Occupational Health and Safety Database.

    Rapid changes in working conditions give rise to new occupational health risks. We applied the Spectrosome approach, a network-based analysis, to investigate associations between disease and multiple occupational exposures.

    SCA with rotation to distinguish common and distinctive information in linked data

    Often data are collected that consist of different blocks that all contain information about the same entities (e.g., items, persons, or situations). To unveil both the information that is common to all data blocks and the information that is distinctive for one or a few of them, an integrated analysis of all data blocks together may be most useful. Interesting classes of methods for such an approach are simultaneous-component and multigroup factor analysis methods. These methods yield dimensions underlying the data at hand. Unfortunately, in the results of such analyses, common and distinctive types of information are mixed up. This article proposes a novel method to disentangle the two kinds of information by making use of the rotational freedom of component and factor models. We illustrate this method with data from a cross-cultural study of emotions.
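
    The rotational freedom mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' DISCO-SCA implementation and does not include their rotation criterion. It assumes that the blocks share their rows (the entities), that any pre-processing has already been done, and that SCA is computed as a truncated SVD of the concatenated blocks. The function names `sca`, `rotate`, `fit` and `block_ssq` and the random toy data are hypothetical.

```python
import numpy as np


def sca(blocks, n_components):
    """SCA sketch: truncated SVD of blocks that share their rows (entities)."""
    X = np.hstack(blocks)                          # entities x all variables
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components]                        # component scores, orthonormal columns
    P = Vt[:n_components].T * s[:n_components]     # loadings of all blocks, stacked
    return T, P


def rotate(T, P, B):
    """Apply the same orthogonal rotation B to scores and loadings."""
    return T @ B, P @ B


def fit(blocks, T, P):
    """Proportion of the total sum of squares reproduced by the model."""
    X = np.hstack(blocks)
    return 1.0 - np.sum((X - T @ P.T) ** 2) / np.sum(X ** 2)


def block_ssq(blocks, P):
    """Sum of squared loadings per block (rows) and per component (columns)."""
    edges = np.cumsum([0] + [b.shape[1] for b in blocks])
    return np.array([np.sum(P[a:b] ** 2, axis=0)
                     for a, b in zip(edges[:-1], edges[1:])])


rng = np.random.default_rng(0)
blocks = [rng.standard_normal((20, 5)), rng.standard_normal((20, 7))]  # toy data only
T, P = sca(blocks, n_components=3)
B, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # an arbitrary orthogonal matrix
T_rot, P_rot = rotate(T, P, B)

print(fit(blocks, T, P), fit(blocks, T_rot, P_rot))  # fit is unchanged by the rotation
print(block_ssq(blocks, P))                          # component-by-block pattern before
print(block_ssq(blocks, P_rot))                      # and after the rotation
```

    Any orthogonal rotation leaves the overall fit untouched while redistributing how strongly each component loads on each block. A method that looks for a rotation in which some components have near-zero loadings in some blocks can therefore separate common from distinctive components without losing fit.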

    Performing DISCO-SCA to search for distinctive and common information in linked data

    Behavioral researchers often obtain information about the same set of entities from different sources. A main challenge in the analysis of such data is to reveal, on the one hand, the mechanisms underlying all of the data blocks under study and, on the other hand, the mechanisms underlying a single data block or a few such blocks only (i.e., common and distinctive mechanisms, respectively). A method called DISCO-SCA has been proposed by which such mechanisms can be found. The goal of this article is to make the DISCO-SCA method more accessible, in particular for applied researchers. To this end, we first illustrate the different steps in a DISCO-SCA analysis with data from the domain of psychiatric diagnosis. Second, we present the DISCO-SCA graphical user interface (GUI). The main benefits of the DISCO-SCA GUI are that it is easy to use, strongly facilitates the choice of model selection parameters (such as the number of mechanisms and their status as being common or distinctive), and is freely available.
    Keywords: Common and distinctive, Simultaneous component analysis, Rotation, Linked data, Graphical user interface

    The long-term effect of job mobility on workers’ mental health: a propensity score analysis

    Objectives: The main purpose of this longitudinal study was to elucidate the impact of external job mobility, due to a change of employer, on mental health. Methods: A cohort of Belgian employees from the IDEWE occupational medicine registry was followed up for twenty-seven years, from 1993 to 2019. The use of drugs for neuropsychological diseases was considered an objective indicator of mental health. The covariates were related to demographic, physical and behavioural characteristics and to occupational and work-related risks. Propensity scores (PS) were calculated with a Cox regression model with time-varying covariates. PS matching was used to eliminate the systematic differences in subjects’ characteristics and to balance the covariates’ distribution at every time point. Results: The unmatched sample included 11,246 subjects, of whom 368 (3.3%) changed their job during the baseline year and 922 (8.2%) left their employer during the follow-up. More than half of the matched sample were male, were less than 38 years old, did not smoke, were physically active, were of normal weight, and were not exposed to shift work, noise, job strain or physical load. A strong association between job mobility and neuropsychological treatment was found in the matched analysis (HR = 2.065, 95%CI = 1.397–3.052, P-value < 0.001) and confirmed in the sensitivity analysis (HR = 2.012, 95%CI = 1.359–2.979, P-value < 0.001). Furthermore, physical activity was found to have a protective role, and job strain a harmful role, in neuropsychological treatment. Conclusions: Our study found that workers with external job mobility have a doubled risk of treatment with neuropsychological medication compared to workers without job mobility.

    Scores of the 28 E. coli samples on the DISCO-SCA components.

    Scores of the 28 samples on the components. The first two columns describe the experimental design; the next two the scores on the distinctive LC components; the fifth and sixth the scores on the distinctive GC components; and the last the scores on the common component.

    Proportion of variance accounted for the comparative genomics data by DISCO-SCA, by the GSVD, and by the adapted GSVD.

    Proportion of variance accounted for the comparative genomics data by DISCO-SCA (with 3 common components: C1, C4, C5; one human component: C3; and one yeast component: C2), and by the GSVD and adapted GSVD. The components are ordered to have maximum congruence between the different analysis methods.