    On-Chip Living-Cell Microarrays for Network Biology

    Using large-scale data to identify gene-gene relationships

    The electronic version of the thesis does not include the publications. In order to understand the basic principles of how organisms function, and to be able to influence biological processes, we need to understand the relationships between genes and proteins. Modern high-throughput technology enables researchers to study different aspects of biological processes rapidly. This, however, has led to steady growth in the amount of available data, and to a need for more sophisticated methods for analysing raw data, combining different data sources, and visualising the results. Additionally, computational modelling is required to test whether our understanding of biological processes is supported by the available data. A variety of bioinformatics methods are used to demonstrate how different types of high-throughput data can be combined to identify relationships between genes. Furthermore, it is shown that combining various data types from different sources adds value to already published data. In the thesis, data from publications on embryonic stem cell regulation were collected and made available through the Embryonic Stem Cell Database (ESCDb). Complementary data in the database allow researchers to find relationships between genes that could not be found by analysing only one dataset at a time. One of the main findings of this study shows how computational modelling on data from ESCDb led to the identification of a novel pluripotency regulator, IL11. Additionally, integration of different data types led to the identification of alternative gene regulatory modules of the core pluripotency regulator OCT4. Similarly, combining conservation data with regulatory motif analysis led to the identification of three new regulators of adipocyte differentiation. This thesis also covers an innovative methodology, VisHiC, for the automatic identification and visualisation of functionally related gene sets, which allows researchers to find relevant gene sets for further characterisation in large high-throughput datasets. This doctoral thesis demonstrates that integrating different high-throughput datasets enables establishing gene-gene relationships that would not be possible when looking at a single data type in isolation.

    Animated interval scatter-plot views for the exploratory analysis of large scale microarray time-course data.

    Microarray technologies, a relatively new development, allow biologists to monitor the activity of thousands of genes (normally around 8,000) in parallel across multiple stages of a biological process. While this new perspective on biological functioning is recognised as potentially having a significant impact on the diagnosis, treatment, and prevention of diseases, biologists can begin to unlock this potential only through effective analysis of the data produced. A significant obstacle to effective analysis of microarray time-course data is the combined scale and complexity of the data, which makes it difficult to reveal certain significant patterns. In particular, less dominant patterns, and specifically patterns that occur over smaller intervals of an experiment's overall time frame, are more difficult to find. Existing techniques can find either unexpected patterns of activity over the majority of an experiment's time frame or expected patterns of activity over smaller intervals, but no technique, or combination of techniques, is suitable for finding unexpected patterns of activity over smaller intervals. To overcome this limitation we have developed the Time-series Explorer, which specifically supports biologists in revealing these types of pattern by allowing them to control an animated interval scatter-plot view of their data. This paper discusses the aspects of the technique that make such an animated overview viable and describes the results of a user evaluation assessing the practical utility of the technique within the wider context of microarray time-series analysis.
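    The abstract does not specify how an interval is mapped onto the scatter-plot axes. As a minimal sketch, assuming each animation frame covers a window of `width` consecutive time points and draws every gene at (expression at the window start, expression at the window end), the frames could be precomputed as below. The function name `interval_frames` and this axis mapping are illustrative assumptions, not the Time-series Explorer's documented design.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def interval_frames(series: Dict[str, List[float]], width: int) -> List[Dict[str, Point]]:
    """Build one scatter-plot frame per time interval of `width` steps.

    Assumed (hypothetical) axis mapping: each gene is drawn at
    (value at interval start, value at interval end), so genes whose
    expression changes within the interval move away from the diagonal y = x.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if not series:
        return []
    # Only time points shared by every gene can be animated.
    n_points = min(len(values) for values in series.values())
    frames: List[Dict[str, Point]] = []
    # Slide the interval one time step at a time; each frame is one animation step.
    for start in range(n_points - width):
        end = start + width
        frames.append({gene: (values[start], values[end]) for gene, values in series.items()})
    return frames


genes = {"geneA": [1.0, 1.1, 3.0, 3.2], "geneB": [2.0, 2.0, 2.1, 2.0]}
frames = interval_frames(genes, width=1)
print(len(frames), frames[1]["geneA"])  # 3 (1.1, 3.0)
```

    Playing the frames in order animates the view; a gene such as geneA that jumps between the second and third time points leaves the diagonal only in the frame covering that short interval, which is the kind of localized pattern the abstract describes.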

    MaTSE: the gene expression time-series explorer.

    Background: High-throughput gene expression time-course experiments provide a perspective on biological functioning that is recognized as having huge value for the diagnosis, treatment, and prevention of diseases. There are, however, significant challenges to properly exploiting these data due to their massive scale and complexity. In particular, existing techniques are ill suited to finding patterns of changing activity over a limited interval of an experiment's time frame. The Time-Series Explorer (TSE) was developed to overcome this limitation by allowing users to explore their data by controlling an animated scatter-plot view. MaTSE improves and extends TSE by allowing users to visualize data with missing values, cross-reference multiple conditions, highlight gene groupings, and collaborate by sharing their findings. Results: MaTSE was developed using an iterative software development cycle that involved a high level of user feedback and evaluation. The resulting software combines a variety of visualization and interaction techniques that work together to allow biologists to explore their data and reveal temporal patterns of gene activity. These include a scatter plot that can be animated to view different temporal intervals of the data, a multiple coordinated view framework to support the cross-referencing of multiple experimental conditions, a novel method for highlighting overlapping groups in the scatter plot, and a pattern browser component that can be used with scatter-plot box queries to support cooperative visualization. A final evaluation demonstrated the tool's effectiveness in allowing users to find unexpected temporal patterns, and the benefits of functionality such as the overlay of gene groupings and the ability to store patterns. Conclusions: We have developed a new exploratory analysis tool, MaTSE, that allows users to find unexpected patterns of temporal activity in gene expression time-series data. Overall, the study demonstrated the benefits of an iterative software development life cycle and allowed us to investigate some visualization problems that are likely to be common in the field of bioinformatics. The subjects involved in the final evaluation were positive about the potential of MaTSE to help them find unexpected patterns in their data and characterized MaTSE as an exploratory tool valuable for hypothesis generation and the creation of new biological knowledge.
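    A scatter-plot box query selects the genes whose points fall inside a rectangle drawn on the current frame. The sketch below shows one simple way such a query, and a store of named queries that a pattern browser could share, might be represented. `BoxQuery`, `select_genes` and `saved_patterns` are hypothetical names, and the inclusive-bounds rule is an assumption rather than MaTSE's documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BoxQuery:
    """Axis-aligned rectangle drawn on the scatter plot (hypothetical structure)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive bounds: points lying on the box edge are selected.
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def select_genes(frame: Dict[str, Tuple[float, float]], box: BoxQuery) -> List[str]:
    """Return genes whose point in the current frame lies inside the box, sorted by name."""
    return sorted(gene for gene, (x, y) in frame.items() if box.contains(x, y))


# Hypothetical pattern store: named box queries that can be saved and shared.
saved_patterns: Dict[str, BoxQuery] = {}

frame = {"geneA": (1.1, 3.0), "geneB": (2.0, 2.1), "geneC": (0.5, 0.4)}
saved_patterns["rising"] = BoxQuery(x_min=0.0, x_max=1.5, y_min=2.5, y_max=4.0)
print(select_genes(frame, saved_patterns["rising"]))  # ['geneA']
```

    Because a saved query is just four numbers, it can be re-applied to every frame of the animation or passed to a collaborator, which is one plausible reading of how stored patterns support cooperative visualization.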

    Improving stroke risk prediction and individualised treatment in carotid atherosclerosis

    Background: Unstable carotid atherosclerosis causes stroke, but methods to identify patients and lesions at risk are lacking. Currently, risk estimation is based on measurements of stenosis and neurological symptoms, which determine whether patients receive medical treatment alone or combined with carotid endarterectomy. The efficacy of this approach is low, and higher accuracy in diagnosis and therapy is warranted. Imaging of carotid plaque morphology using software that visualises plaque components may improve assessment of plaque phenotype and stroke risk. These studies aimed, first, to investigate whether, and if so how, carotid plaque morphology assessed by image analysis of computed tomography angiography (CTA) is associated with ongoing biology in the corresponding specimen; second, to determine whether risk stratification by clinical risk scores can be linked to these associations; and finally, to determine whether the ongoing biological processes can be specifically predicted from CTA image analysis. Methods: Plaque features were analysed in pre-operative CTA with dedicated software. In studies I and II, plaques were stratified into high and low groups for each quantified feature and profiled with microarrays, followed by bioinformatic analyses. Immunohistochemistry was performed to evaluate the findings in plaques. In study III, cohorts were stratified by patient phenotype into high and low clinical stroke risk scores (CAR and ABCD2); these groups were then profiled with microarrays, followed by bioinformatic analyses and correlation analyses with plaque morphology in CTA. In study IV, the microarray transcriptomes were individually coupled to morphological data from the CTA analysis to develop machine-intelligence models that predict gene expression from a CTA image. The models were then tested in unseen patients. Results: In study I, stabilising markers and processes related to smooth muscle cells (SMCs) and extracellular matrix (ECM) organisation were associated with highly calcified plaques, while inflammatory and lipid-related processes were repressed. PRG4, a novel marker for atherosclerosis, was identified as the most up-regulated gene in highly calcified plaques. Study II showed that carotid lesions with a large lipid-rich necrotic core, intraplaque haemorrhage or plaque burden were characterised by molecular signatures coupled with inflammation and extracellular matrix degradation, typically linked with instability. Symptoms were associated with a large lipid-rich necrotic core and plaque burden. A cross-validated prediction model for symptoms showed that plaque morphology by CTA alone was superior to stenosis degree. Study III revealed that high clinical risk scores in CAR and ABCD2 reflect a plaque phenotype linked to immune response and coagulation, in which the novel ABCB5 was one of the most up-regulated genes. The high risk scores correlated with the plaque components matrix and calcification, but showed no positive association with stenosis degree. Study IV yielded 414 robustly predicted transcripts from the CTA image analysis, and pathway analysis of these showed biological processes associated with the typical pathophysiology of atherosclerosis and plaque instability. Model testing demonstrated a good correlation between predicted and observed transcript expression levels, and pathway analysis revealed a unique dominant mechanism for each individual. Conclusions: Biological processes in carotid plaques associated with vulnerability can be linked to plaque morphology assessed by CTA image analysis. Patient phenotype classified by clinical risk scores is associated with plaque phenotype and morphology in CTA. In this small pilot study, the biological processes in the atherosclerotic plaque could be predicted from CTA analysis of plaque morphology, offering a possible route to precision medicine after validation in larger-scale studies.
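    Studies I and II stratified plaques into high and low groups for each quantified CTA feature before comparing their microarray profiles. The abstract does not state the cut-off, so the sketch below assumes a median split and log2-scale expression values, and computes a per-gene difference in mean log2 expression between the groups. The function names are hypothetical and the numbers are toy values, not study data.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def median_split(feature: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split patients into (high, low) groups at the median of one CTA plaque feature.

    Assumption: patients exactly at the median are placed in the low group.
    """
    cut = median(feature.values())
    high = sorted(p for p, v in feature.items() if v > cut)
    low = sorted(p for p, v in feature.items() if v <= cut)
    return high, low


def log2_fold_changes(
    expression: Dict[str, Dict[str, float]],  # gene -> patient -> log2 expression
    high: List[str],
    low: List[str],
) -> Dict[str, float]:
    """Per gene: mean log2 expression in the high group minus mean in the low group."""
    return {
        gene: mean(values[p] for p in high) - mean(values[p] for p in low)
        for gene, values in expression.items()
    }


# Toy values for illustration only (not study data).
calcification = {"p1": 10.0, "p2": 50.0, "p3": 30.0, "p4": 70.0}
high, low = median_split(calcification)
expression = {"GENE_X": {"p1": 2.0, "p2": 5.0, "p3": 3.0, "p4": 6.0}}
print(high, low, log2_fold_changes(expression, high, low))
# ['p2', 'p4'] ['p1', 'p3'] {'GENE_X': 3.0}
```

    A real analysis would add a statistical test and multiple-testing correction per gene; the sketch only shows the stratify-then-compare structure described in the Methods.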

    Proximity in chromatin : opportunities for innovations

    Mammalian chromosomes communicate extensively with each other via long-range chromatin interactions. These interactions are mostly mediated by proteins that work as teams to control genes in the cells. Studying these interactions could also help unravel the mechanisms of diseases such as cancer from new perspectives. How the chromatin fiber is packaged, and how this relates to the epigenetic marks that regulate its accessibility to govern lineage-specific gene expression repertoires, is currently the focus of immense efforts worldwide. Moreover, how chromosomes are hierarchically folded, and how they relate to each other and to structural hallmarks of the nucleus, is largely uncharted territory in large cell populations, let alone in individual cells. This thesis emphasises the analysis of pivotal chromatin features in single cells. First, interactions between the genome organizer CTCF and PARP1, a factor involved in DNA repair, were demonstrated using the ISPLA technique. Such interactions likely underlie the formation of chromatin networks. Next, novel strategies and techniques were developed to visualize chromosomal structures and 3D networks by scoring chromatin proximities within individual cells. One strategy is a novel method termed Chromatin In Situ Proximity (ChrISP), which visualizes and identifies proximities between chromatin fibers and other structural hallmarks in single cells at a resolution below 170 Å, beyond that of the light microscope. With it, large-scale changes in the conformation of a single human chromosome upon administration of reprogramming cues could be visualized. Finally, this innovation was further developed to explore differences in proximities between the chromatin fibers that organize chromosome territories. The novel design, termed “rainbow ChrISP”, translates physical 3D distances between chromatin fibers into different colors visualized with a conventional microscope. This technique produced new insights into chromosome conformations and their regulation, enhancing our understanding of their governing principles in single cells during development and disease.

    Knowledge visualization: From theory to practice

    Visualizations are known to be efficient tools that help users analyze complex data. However, understanding the displayed data and finding the underlying knowledge is still difficult. In this work, a new approach is proposed based on an understanding of the definition of knowledge. Although many definitions are used in different areas, this work focuses on representing knowledge as part of a visualization and on showing the benefit of adopting knowledge representation. Specifically, this work begins by examining interaction and reasoning in visual analytics systems; then a new definition of knowledge visualization and its underlying knowledge conversion processes are proposed. Knowledge is differentiated into explicit and tacit knowledge. Instead of representing the data directly, the value of the explicit knowledge associated with the data is determined through a cost/benefit analysis. According to its importance, this knowledge is displayed to help the user understand the complex data through visual analytical reasoning and discovery.
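    The abstract states that explicit knowledge is valued through a cost/benefit analysis and displayed according to its importance, but gives no formula. As a minimal sketch, assuming net value is simply benefit minus cost and that only a fixed number of items (`budget`) fit on screen, selection could look like the code below. `KnowledgeItem`, the scoring rule and `budget` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnowledgeItem:
    """A piece of explicit knowledge attached to the data (hypothetical structure)."""
    label: str
    benefit: float  # estimated gain to the user's reasoning if shown
    cost: float     # estimated cost of showing it (screen space, attention)

    @property
    def value(self) -> float:
        # Assumed scoring rule: net value = benefit minus cost.
        return self.benefit - self.cost


def select_for_display(items: List[KnowledgeItem], budget: int) -> List[str]:
    """Return labels of up to `budget` items with positive net value, most valuable first."""
    worthwhile = [item for item in items if item.value > 0]
    worthwhile.sort(key=lambda item: item.value, reverse=True)
    return [item.label for item in worthwhile[:budget]]


items = [
    KnowledgeItem("cluster A", benefit=5.0, cost=1.0),
    KnowledgeItem("outlier", benefit=2.0, cost=3.0),
    KnowledgeItem("trend", benefit=4.0, cost=0.5),
]
print(select_for_display(items, budget=2))  # ['cluster A', 'trend']
```

    Items whose cost outweighs their benefit are never shown, and the remaining ones are prioritized, which mirrors the idea of displaying knowledge in accordance with its importance.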

    Methods for reusing public gene expression data

    The electronic version of the thesis does not include the publications. Public gene expression databases contain data on more than a million biological samples from hundreds of tissues and diseases. In principle, the expression pattern of essentially all genes is known for each of these samples. Thus, it is possible to carry out biological studies without performing new experiments. The size of the datasets, however, poses several challenges: appropriate analysis requires specific statistical skills, useful information is well hidden in the datasets, and the analysis itself is time-consuming. All these factors prevent wider use of public gene expression data. The goal of this thesis is to facilitate the re-use of expression data by developing analysis methods and tools. One of the biggest obstacles to re-using expression data is its accessibility. For that reason, we have created two web environments that allow users to run complex analysis pipelines on public gene expression data. The first visualises embryonic stem cell data from the FunGenES consortium. The second allows users to search for genes with similar behaviour across hundreds of public datasets. Performing analyses over multiple datasets eventually creates a need to integrate the results. For this task we created a rank aggregation algorithm specifically designed for lists of genes. When studying multiple datasets, it is important to have a good overview of their contents. To allow rapid functional characterisation of datasets, we have created a visualisation method that produces compact but informative visual summaries of the data. The methods and tools described here were created with practical considerations in mind and have already been used in various studies.
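    The abstract does not describe the rank aggregation algorithm itself. As a minimal sketch of one well-known way to aggregate gene lists, the code below computes a robust-rank-aggregation-style score: each gene's normalized ranks are compared with uniformly random ranks, using the fact that the probability that the k-th smallest of n Uniform(0, 1) values is at most x equals the probability that a Binomial(n, x) variable is at least k. The names `rra_scores` and `order_stat_cdf`, the rank of 1.0 for genes missing from a list, and the simple multiply-and-cap correction are assumptions; this is not presented as the thesis's implementation.

```python
from math import comb
from typing import Dict, List


def order_stat_cdf(k: int, n: int, x: float) -> float:
    """P(k-th smallest of n Uniform(0,1) values <= x) = P(Binomial(n, x) >= k)."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_scores(ranked_lists: List[List[str]]) -> Dict[str, float]:
    """Robust-rank-aggregation-style score per gene (smaller = more consistently top-ranked).

    Assumptions: a gene missing from a list gets normalized rank 1.0; each list is
    normalized by its own length; the minimum over order statistics is multiplied by
    the number of lists and capped at 1 as a simple Bonferroni-type correction.
    """
    n = len(ranked_lists)
    # Precompute 0-based positions so each lookup is O(1).
    positions = [{gene: i for i, gene in enumerate(lst)} for lst in ranked_lists]
    genes = {gene for lst in ranked_lists for gene in lst}
    scores: Dict[str, float] = {}
    for gene in genes:
        # Normalized rank in (0, 1]: position i becomes (i + 1) / list length.
        ranks = sorted((pos[gene] + 1) / len(pos) if gene in pos else 1.0 for pos in positions)
        rho = min(order_stat_cdf(k, n, r) for k, r in enumerate(ranks, start=1))
        scores[gene] = min(1.0, rho * n)
    return scores


lists = [["A", "B", "C", "D"], ["A", "C", "B", "D"], ["B", "A", "D", "C"]]
scores = rra_scores(lists)
print(min(scores, key=scores.get), scores["A"])  # A 0.375
```

    Gene A is near the top of every list, so its ranks are improbably small under random ordering and it receives the lowest score; genes with mixed positions are capped at 1.0. Taking the minimum over all order statistics lets a gene score well even if it is highly ranked in only some of the lists.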

    Do You Know What I Know?: Situational Awareness of Co-located Teams in Multidisplay Environments

    Modern collaborative environments often provide an overwhelming amount of visual information on multiple displays. In complex project settings, the amount of visual information on multiple displays, and the multitude of personal and shared interaction devices in these environments, can reduce team members' awareness of ongoing activities, their understanding of shared visualisations, and their awareness of who is in control of shared artifacts. The research reported in this thesis addresses situational awareness (SA) support for co-located teams working on team projects in multidisplay environments. Situational awareness becomes even more critical when the content of multiple displays changes rapidly and when the displays provide large amounts of information. This work aims to gain insights into the design and evaluation of shared display visualisations that afford situational awareness and group decision making. This thesis reports the results of three empirical user studies in three different domains: life science experimentation, decision making in brainstorming teams, and agile software development. The first and second user studies evaluate the impact of Highlighting-on-Demand and Chain-of-Thoughts SA support on group decision making and awareness. The third user study presents the design and evaluation of a shared awareness display for software teams. By providing supportive visualisations on a shared large display, we aimed to reduce distraction from the primary task and to enhance the group decision-making process and the perceived task performance.

    Status and Potential of Single-cell Transcriptomics for Understanding Plant Development and Functional Biology

    Funding Information: University of Western Australia. Acknowledgments: The authors would like to extend sincere thanks to Robert Salomon for inspiring them to write this manuscript. Resources were provided by The University of Western Australia. Peer reviewed. Postprint.