Collective analysis of multiple high-throughput gene expression datasets
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Modern technologies have resulted in the production of numerous high-throughput biological datasets, but the development of capable computational methods has not kept pace with the generation of new high-throughput datasets. Among the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work addresses this gap by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering separately to multiple heterogeneous datasets which measure the expression of the same set of genes, and then combines the resulting partitions in accordance with one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another. This broadens the range of questions which can be addressed by computational methods, because existing clustering, consensus clustering, and biclustering methods are inapplicable to the aforementioned objectives. Moreover, to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and earlier versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. In collaboration with biologists, these applications have led to various biological insights.
In yeast, the role of the poorly understood gene CMR1 in the cell cycle has been further elucidated. A novel subset of poorly understood yeast genes has also been discovered whose expression profile is consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced hypotheses regarding the regulation of the pathways that produce such cells, while malaria data analysis is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, together with the results and findings of many applications of these methods to real biological datasets from multiple contexts.
Funding: National Institute for Health Research (NIHR) and the Brunel College of Engineering, Design and Physical Science
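The type-A idea above — genes that end up in the same cluster in every dataset — can be illustrated with a toy sketch. This is only an illustration of the general consensus idea, not the actual UNCLES algorithm; the k-means routine, its deterministic farthest-point initialisation, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def _init_centers(X, k):
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, k, iters=20):
    # Minimal Lloyd's algorithm: assign each gene to its nearest centre,
    # then move each centre to the mean of its assigned genes.
    centers = _init_centers(X, k)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)

def coclustered_everywhere(datasets, gene=0, k=2):
    """Indices of genes falling in the same cluster as `gene` in EVERY dataset."""
    together = None
    for X in datasets:
        labels = kmeans(X, k)
        same = labels == labels[gene]
        together = same if together is None else (together & same)
    return np.flatnonzero(together)

# Two toy datasets over the same 10 genes: genes 0-4 share one expression
# profile and genes 5-9 another, consistently in both datasets.
g = 0.01 * np.arange(5)[:, None]
X1 = np.vstack([np.tile([1., 0., 1., 0.], (5, 1)) + g,
                np.tile([0., 1., 0., 1.], (5, 1)) + g])
X2 = np.vstack([np.tile([2., 0., 0., 2.], (5, 1)) + g,
                np.tile([0., 2., 2., 0.], (5, 1)) + g])
consistent = coclustered_everywhere([X1, X2])  # genes co-clustered with gene 0
```

Here `consistent` recovers genes 0-4, the subset whose co-expression holds across both datasets; a type-B-style query would instead keep genes co-clustered in one subset of datasets but not another.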
Chemical Similarity and Threshold of Toxicological Concern (TTC) Approaches: Report of an ECB Workshop held in Ispra, November 2005
There are many national, regional and international programmes – either regulatory or voluntary – to assess the hazards or risks of chemical substances to humans and the environment. The first step in making a hazard assessment of a chemical is to ensure that there is adequate information on each of the endpoints. If adequate information is not available, then additional data are needed to complete the dataset for the substance. For reasons of resources and animal welfare, it is important to limit the number of tests that have to be conducted, where this is scientifically justifiable. One approach is to consider closely related chemicals as a group, or chemical category, rather than as individual chemicals. In a category approach, data for chemicals and endpoints that have already been tested are used to estimate the hazard for untested chemicals and endpoints. Categories of chemicals are selected on the basis of similarities in biological activity that are associated with a common underlying mechanism of action.
A homologous series of chemicals exhibiting a coherent trend in biological activity can be rationalised on the basis of a constant change in structure, and this type of grouping is relatively straightforward. The challenge lies in identifying the relevant structural and physicochemical characteristics that enable more sophisticated groupings to be made on the basis of similarity in biological activity and, hence, purported mechanism of action. Linking two chemicals together and rationalising their similarity with reference to one or more endpoints has largely been carried out on an ad hoc basis. Even with larger groups, the process and approach are ad hoc and based on expert judgement. There is still very little guidance on the tools and approaches for grouping chemicals systematically.
In November 2005, the ECB Workshop on Chemical Similarity and Thresholds of Toxicological Concern (TTC) Approaches was convened to identify the available approaches that currently exist to encode similarity and how these can be used to facilitate the grouping of chemicals. This report aims to capture the main themes that were discussed.
In particular, it outlines a number of different approaches that can facilitate the formation of chemical groupings, in terms of the context under consideration and the information likely to be required. Grouping methods were divided into four classes – knowledge-based, analogue-based, unsupervised, and supervised – and a flowchart was constructed to capture a possible workflow, highlighting where and how these approaches might best be applied.
JRC.I.3 - Toxicology and chemical substance
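One common way to quantify the structural similarity that underlies analogue-based grouping is the Tanimoto (Jaccard) coefficient computed over fingerprint bit sets. The sketch below uses invented toy bit sets (not real chemical fingerprints) and an arbitrary threshold of 0.5, purely to show the mechanics:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy fingerprints: sets of "on" bit positions (purely illustrative).
fp_query = {1, 4, 7, 9}
fp_candidates = {"chem_A": {1, 4, 7, 8}, "chem_B": {2, 3, 5}}

# Group the candidates whose similarity to the query exceeds the threshold.
analogues = [name for name, fp in fp_candidates.items()
             if tanimoto(fp_query, fp) >= 0.5]
```

In a real setting the bit sets would come from a fingerprinting scheme (e.g. structural keys), and the threshold would be chosen per endpoint; the coefficient itself is simply intersection over union.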
Mass data exploration in oncology: An information synthesis approach
New technologies and equipment allow for mass treatment of samples, and research teams share acquired data on an ever larger scale. In this context, scientists face a major data exploitation problem. More precisely, using these datasets through data mining tools, or introducing them into a classical experimental approach, requires a preliminary understanding of the information space in order to direct the process. But acquiring this grasp on the data is a complex activity, one seldom supported by current software tools. The goal of this paper is to introduce a solution to this scientific data grasp problem. Illustrated in the Tissue MicroArrays application domain, the proposal is based on the notion of synthesis, which is inspired by Information Retrieval paradigms. The envisioned synthesis model gives a central role, through the notion of task, to the study the researcher wants to conduct. It allows for the implementation of a task-oriented Information Retrieval prototype system. Case studies and user studies were used to validate this prototype. It opens interesting prospects for extending the model or adapting it to other application domains.
Human lymphopoiesis and its study using single-cell analysis
Development of human B-lymphocytes is a convoluted process. A self-renewing stem cell progenitor in a primary lymphoid tissue commits to the lymphoid lineage. Subsequent B-lineage commitment entails somatic gene recombination processes which lead to the eventual expression of a surface antigen receptor. Functionality of the B-cell receptor, as well as successful testing of the cell for autoreactivity, are preconditions for differentiation into a mature B-lymphocyte. Processes within this development are often investigated using single-cell analysis via flow cytometry, fluorescence-activated cell sorting and mass cytometry. Coupling these high-throughput methods with modern approaches to data analysis carries enormous potential for revealing rare cell populations and aberrant events in haematopoiesis.
Keywords: B-lymphocyte, lymphopoiesis, flow cytometry, FACS, mass cytometry, cluster analysis, FlowSOM, PCA, t-SNE, Wanderlust.
Department of Cell Biology, Faculty of Science
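Of the analysis methods listed in the keywords, PCA is the simplest to sketch: events (cells) measured on several markers are projected onto the directions of greatest variance, which often separates cell populations. The marker matrix below is synthetic, purely for illustration; real cytometry data would first be compensated and transformed.

```python
import numpy as np

def pca(X, n_components=2):
    """Project events onto the top principal components (SVD on centred data)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy "cytometry" matrix: 6 events x 3 markers, two obvious populations
# (rows 0-2 vs rows 3-5); values are invented.
X = np.array([[1.0, 2.0, 0.0], [1.1, 2.1, 0.0], [0.9, 1.9, 0.1],
              [5.0, 0.0, 3.0], [5.1, 0.1, 3.0], [4.9, 0.0, 2.9]])
Y = pca(X, 2)  # the two populations separate along the first component
```

FlowSOM and t-SNE play analogous roles at larger scale (clustering on a self-organising map and non-linear embedding, respectively), but the workflow is the same: reduce, then look for populations.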
Proposal for an Organic Web, The missing link between the Web and the Semantic Web, Part 1
A huge amount of information is produced in digital form. The Semantic Web stems from the realisation that dealing efficiently with this production requires getting better at interlinking digital informational resources. Its focus is on linking data. Linking data, however, is not enough. We need to provide infrastructural support for linking all sorts of informational resources, including resources whose understanding and fine interlinking require domain-specific human expertise. At a time when many problems scale to planetary dimensions, it is essential to scale the coordination of information processing and information production without giving up on expertise and depth of analysis, and without forcing onto thinkers, decision-makers and innovators languages and formalisms that suit only some forms of intelligence. This article makes a proposal in this direction, in line with the idea of interlinking championed by the Semantic Web.
Comment: Supplementary material by Guillaume Bouzige and Mathilde Noua
Don't judge comics by their cover: fostering reading and writing through the CLT framework
New methodologies are gradually becoming a trend in education. Leaving aside traditional approaches, the unit aims to adopt learner-centred teaching with the purpose of adapting to the needs the actual context demands. As some problems in reading and writing have been identified, the unit is meant to foster students' motivation by using comics within a CLT approach. In this line, students will become communicatively effective. Keywords: Diversity, Communicative Language Teaching, motivation, comics, written skills
The hardware implementation of an artificial neural network using stochastic pulse rate encoding principles
In this thesis the development of a hardware artificial neuron device and artificial neural network using stochastic pulse rate encoding principles is considered. After a review of neural network architectures and algorithmic approaches suitable for hardware implementation, a critical review of hardware techniques which have been considered in analogue and digital systems is presented. New results are presented demonstrating the potential of two learning schemes which adapt through the use of a single reinforcement signal. The techniques for computation using stochastic pulse rate encoding are presented and extended with novel circuits relevant to the hardware implementation of an artificial neural network. The generation of random numbers is key to the encoding of data into the stochastic pulse rate domain. The formation of random numbers and of multiple random bit sequences from a single PRBS generator has been investigated. Two techniques, simulated annealing and genetic algorithms, have been applied successfully to the problem of optimising the configuration of a PRBS random number generator for the formation of multiple random bit sequences and hence random numbers. A complete hardware design for an artificial neuron using stochastic pulse rate encoded signals has been described, designed, simulated, fabricated and tested before the device was configured into a network to perform simple test problems. The implementation has shown that the processing elements of the artificial neuron are small and simple, but that there can be a significant overhead in encoding information into the stochastic pulse rate domain. The stochastic artificial neuron has the capability of on-line weight adaptation. The implementation of reinforcement schemes using the stochastic neuron as a basic element is discussed.
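The core trick of stochastic pulse rate encoding — representing a value in [0, 1] as the density of ones in a random bit stream, so that multiplication of two independently encoded values reduces to a bitwise AND — can be sketched in software (the stream length and seed below are arbitrary choices for the illustration, and a hardware PRBS generator is replaced here by Python's pseudo-random generator):

```python
import random

def encode(p, n, rng):
    """Encode probability p as a stochastic pulse stream of n bits."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams: ones-density ~ product of inputs."""
    return [a & b for a, b in zip(a_bits, b_bits)]

def decode(bits):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)

rng = random.Random(42)
n = 100_000
a = encode(0.6, n, rng)
b = encode(0.5, n, rng)
est = decode(stochastic_multiply(a, b))  # approaches 0.6 * 0.5 = 0.3
```

This also shows the overhead the abstract mentions: a single multiplication needs a long bit stream (accuracy improves only as the square root of the stream length), while the arithmetic element itself is just an AND gate.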