7 research outputs found

    On computational methods for spatial mapping of the human proteome

    Proteins are complex molecules that are involved in almost every task in the body. In general, the role a protein fulfills is highly dependent on where in the cell it is located, its subcellular localization. In order to understand human biology, it is therefore imperative to gain insight into the world of proteins by examining their subcellular distribution and their interactions with each other. This thesis focuses on the development of computational models capable of performing large-scale spatial protein analysis at the subcellular level. Within that scope, we developed models that classify the localization of proteins in immunofluorescence microscopy images, and we show how such models can be integrated with other methods to gain novel insights into the roles and spatially dependent functions of proteins.
    In Paper I, we present and combine two separate methods for large-scale protein localization. The first method is an integration of a protein localization task as a mini-game within an established massively multiplayer online video game. The second method consists of the first image-based deep neural network model capable of multi-label subcellular localization classification. We show that both methods enable accurate and scalable high-throughput analysis of subcellular protein localization that overcomes many of the challenges associated with such a dataset. We also show that combining the two methods yields better results than either does on its own, resulting in a model that is nearing human performance.
    In Paper II, based on the success of the neural network model from Paper I, we continue the investigation into the use of deep neural networks for subcellular protein localization. In an effort to find the best possible model for such tasks, a machine learning image competition was developed. Over 2,000 teams participated with various kinds of architectures, resulting in a predictor that far outperforms the one presented in Paper I. The winning model is analyzed thoroughly, and we show that its internal feature representation contains biologically relevant information and that it can be used for quantitative analysis of protein patterns.
    Paper III takes the feature representation of immunofluorescence images from the model developed in Paper II and integrates it with features extracted from affinity purification experiments to create a hierarchical map of the human cell's architecture. This method creates a map of protein communities grouped by subcellular structures, of which approximately 54% are putatively novel. We show that the map is biologically significant by validating several of the novel findings using affinity purification experiments and in situ fractionation.
    In Paper IV, we apply what was learned in Papers I and II to create a model that identifies proteins residing within micronuclei. We apply the model to the image data from the Human Protein Atlas to create the first extensive mapping of the micronuclear proteome. Through enrichment analysis of the identified proteins, we propose that micronuclei harbor a more diverse set of functions than previously thought. We find that the micronuclear proteome is highly interconnected and contains many proteins that show visible variations across different micronuclei, and we theorize on what this means for their role in the cell.
    In conclusion, Papers I and II examine and establish the possibilities of using deep neural networks for systematic subcellular protein localization analysis. Papers III and IV build upon what was learned in Papers I and II and use their models to examine protein distribution patterns and provide novel biological insights.
    QC 2022-11-17

    Deep learning is combined with massive-scale citizen science to improve large-scale image classification

    Pattern recognition and classification of images are key challenges throughout the life sciences. We combined two approaches for large-scale classification of fluorescence microscopy images. First, using the publicly available data set from the Cell Atlas of the Human Protein Atlas (HPA), we integrated an image-classification task into a mainstream video game (EVE Online) as a mini-game, named Project Discovery. Participation by 322,006 gamers over 1 year provided nearly 33 million classifications of subcellular localization patterns, including patterns that were not previously annotated by the HPA. Second, we used deep learning to build an automated Localization Cellular Annotation Tool (Loc-CAT). This tool classifies proteins into 29 subcellular localization patterns and can deal efficiently with multi-localization proteins, performing robustly across different cell types. Combining the annotations of gamers and deep learning, we applied transfer learning to create a boosted learner that can characterize subcellular protein distribution with an F1 score of 0.72. We found that engaging players of commercial computer games provided data that augmented deep learning and enabled scalable and readily improved image classification.
    QC 20181001
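    The abstract above reports an F1 score for a multi-label task, where each image can carry several localization labels at once. As a rough illustration only, the sketch below computes a macro-averaged F1 over per-image label sets. The abstract does not say which averaging scheme was used, so macro averaging is an assumption here, and the function names, label names, and toy data are all hypothetical.

```python
from typing import List, Sequence, Set


def f1_for_label(true_sets: Sequence[Set[str]],
                 pred_sets: Sequence[Set[str]],
                 label: str) -> float:
    """F1 for one label, treating it as a binary present/absent decision per image."""
    pairs = list(zip(true_sets, pred_sets))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    denom = 2 * tp + fp + fn
    # Convention chosen for this sketch: a label that never occurs scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(true_sets: Sequence[Set[str]],
             pred_sets: Sequence[Set[str]],
             labels: List[str]) -> float:
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    return sum(f1_for_label(true_sets, pred_sets, l) for l in labels) / len(labels)


# Toy data (hypothetical): three images, two possible localization labels.
labels = ["nucleus", "cytosol"]
true_sets = [{"nucleus"}, {"nucleus", "cytosol"}, {"cytosol"}]
pred_sets = [{"nucleus"}, {"nucleus"}, {"cytosol", "nucleus"}]

score = macro_f1(true_sets, pred_sets, labels)
print(f"macro F1 = {score:.3f}")
```

    Macro averaging gives every label equal weight, so rare localization patterns count as much as common ones; a micro average would instead pool the counts across labels before computing F1.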