Tetraodon genome confirms Takifugu findings : most fish are ancient polyploids
An evolutionary hypothesis suggested by studies of the genome of the tiger pufferfish Takifugu rubripes has now been confirmed by comparison with the genome of a close relative, the spotted green pufferfish Tetraodon nigroviridis. Ray-finned fish underwent a whole-genome duplication some 350 million years ago that might explain their evolutionary success
Of dups and dinos : evolution at the K/Pg boundary
Fifteen years into sequencing entire plant genomes, more than 30 paleopolyploidy events have been mapped on the tree of flowering plants (and many more when transcriptome data sets are also considered). While some genome duplications are very old and occurred early in the evolution of dicots and monocots, or even before, others are more recent and seem to have occurred independently in many different plant lineages. Strikingly, a majority of these duplications date to somewhere between 55 and 75 million years ago (mya), and thus likely correlate with the K/Pg boundary. If true, this would suggest that plants that had their genome duplicated at that time had an increased chance of surviving the most recent mass extinction event, at 66 mya, which wiped out a majority of plant and animal life, including all non-avian dinosaurs. Here, we review several processes, both neutral and adaptive, that might explain the establishment of polyploid plants following the K/Pg mass extinction.
Event based text mining for integrated network construction
The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have been increasingly useful to extract such information from the literature in an automatic way, but up to now the main focus of text mining in the systems biology field has been restricted mostly to the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view of the biological system as a whole, which can then be used in further investigations to analyse specific properties of the network.
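As an illustration only, the idea of combining one graph per interaction type into an integrated network could be sketched as follows. The function name, the edge representation, and the entity and interaction names are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def integrate_networks(networks):
    """Merge per-type edge lists into a single labelled network.

    `networks` maps an interaction type (hypothetical labels such as
    "binding") to a list of (source, target) pairs. The result maps each
    pair to the set of interaction types that connect it.
    """
    integrated = defaultdict(set)
    for interaction_type, edges in networks.items():
        for source, target in edges:
            integrated[(source, target)].add(interaction_type)
    return dict(integrated)


# Hypothetical entities and interaction types, for illustration only.
example = {
    "binding": [("geneA", "geneB")],
    "regulation": [("geneA", "geneB"), ("geneB", "geneC")],
}
combined = integrate_networks(example)
```

An edge supported by several interaction types keeps all of its labels, so each type-specific network can still be recovered from the combined one.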
Accurate RT-qPCR gene expression analysis on cell culture lysates
Gene expression quantification on cultured cells using the reverse transcription quantitative polymerase chain reaction (RT-qPCR) typically involves an RNA purification step that limits sample processing throughput and precludes parallel analysis of large numbers of samples. An approach in which cDNA synthesis is carried out on crude cell lysates instead of on purified RNA samples can offer a fast and straightforward alternative. Here, we evaluate such an approach, benchmarking Ambion's Cells-to-CT kit with the classic workflow of RNA purification and cDNA synthesis, and demonstrate its good accuracy and superior sensitivity
Determination of 2D implanted ion distributions using inverse radon transform methods
Two methods are presented for the experimental determination of 2D implanted ion distributions resulting from implantations with a line source into amorphous targets. It is shown that the relation between the 2D distribution and the depth profiles resulting from tilted angle implantations is described by the Radon transformation. The inverse transformation has been applied to accurately measured depth profiles. The first method uses a digitization of the 2D distribution and the second method uses a parameterized function for the 2D distribution. The methods are tested for a 400 keV boron implantation in an amorphous layer of silicon. The experimentally obtained 2D distributions are compared with a TRIM Monte Carlo simulation. A good agreement between experiment and simulation is observed.
zt: A Software Tool for Simple and Partial Mantel Tests
Different methods of data analysis (e.g. clustering and ordination) are based on distance matrices. In some cases, researchers may wish to compare several distance matrices with one another in order to test a hypothesis concerning a possible relationship between these matrices. However, this is not always self-evident. Usually, values in distance matrices are, in some way, correlated and therefore the usual assumption of independence between objects is violated in the classical tests approach. Furthermore, spurious correlations can often be observed when comparing two distance matrices. A classic example is the comparison between genetic and environmental distances. Colonies that are in close proximity to each other tend to have similar environments and therefore there will be a positive correlation between environmental and geographical distances. Such colonies will also be more likely to exchange migrants so that genetic distances will be positively correlated with spatial distances. The consequence is that an observed positive association between genetic and environmental distances may be simply due to spatial effects. The most widely used method to account for distance correlations is a procedure known as the Mantel test (Mantel, '67; Mantel and Valand, '70, following the pioneering work of Daniels, '44; Daniels and Kendall, '47). The simple Mantel test considers two matrices while an extension known as the partial Mantel test considers three matrices. These tools are widely used in different fields of research such as population genetics, ecology, anthropology, psychometrics and sociology.
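For readers unfamiliar with the procedure, a simple Mantel test can be sketched as a permutation test on the correlation between the off-diagonal entries of two distance matrices. The sketch below is a minimal pure-Python illustration and not the zt implementation; the function names, the one-sided p-value, and the default of 999 permutations are assumptions made for the example.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permute(matrix, order):
    """Reorder rows and columns of a matrix with the same permutation."""
    return [[matrix[i][j] for j in order] for i in order]


def simple_mantel(a, b, permutations=999, seed=0):
    """Return (r, p): the matrix correlation and a one-sided p-value.

    Objects of matrix b are relabelled at random; rows and columns move
    together, which preserves the dependence structure within b.
    """
    rng = random.Random(seed)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(len(a)))
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(flat_a, upper_triangle(permute(b, order))) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)
```

Permuting whole objects, rather than individual matrix entries, is what lets the test cope with the non-independence of distances described above. The partial Mantel test, which controls for a third matrix, is not covered by this sketch.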
Improving the adaptability of simulated evolutionary swarm robots in dynamically changing environments
One of the important challenges in the field of evolutionary robotics is the development of systems that can adapt to a changing environment. However, the ability to adapt to unknown and fluctuating environments is not straightforward. Here, we explore the adaptive potential of simulated swarm robots that contain a genomic encoding of a bio-inspired gene regulatory network (GRN). An artificial genome is combined with a flexible agent-based system, representing the activated part of the regulatory network that transduces environmental cues into phenotypic behaviour. Using an artificial life simulation framework that mimics a dynamically changing environment, we show that separating the static from the conditionally active part of the network contributes to better adaptive behaviour. Furthermore, in contrast with most hitherto developed ANN-based systems that need to re-optimize their complete controller network from scratch each time they are subjected to novel conditions, our system uses its genome to store GRNs whose performance was optimized under a particular environmental condition for a sufficiently long time. When subjected to a new environment, the previous condition-specific GRN might become inactivated, but remains present. This ability to store 'good behaviour' and to disconnect it from the novel rewiring that is essential under a new condition allows faster re-adaptation if any of the previously observed environmental conditions is re-encountered. As we show here, applying these evolutionary-based principles leads to accelerated and improved adaptive evolution in a non-stable environment.
Analysis of a Gibbs sampler method for model based clustering of gene expression data
Over the last decade, a large variety of clustering algorithms have been
developed to detect coregulatory relationships among genes from microarray gene
expression data. Model based clustering approaches have emerged as
statistically well grounded methods, but the properties of these algorithms
when applied to large-scale data sets are not always well understood. An
in-depth analysis can reveal important insights about the performance of the
algorithm, the expected quality of the output clusters, and the possibilities
for extracting more relevant information out of a particular data set. We have
extended an existing algorithm for model based clustering of genes to
simultaneously cluster genes and conditions, and used three large compendia of
gene expression data for S. cerevisiae to analyze its properties. The algorithm
uses a Bayesian approach and a Gibbs sampling procedure to iteratively update
the cluster assignment of each gene and condition. For large-scale data sets,
the posterior distribution is strongly peaked on a limited number of
equiprobable clusterings. A GO annotation analysis shows that these local
maxima are all biologically equally significant, and that simultaneously
clustering genes and conditions performs better than only clustering genes and
assuming independent conditions. A collection of distinct equivalent
clusterings can be summarized as a weighted graph on the set of genes, from
which we extract fuzzy, overlapping clusters using a graph spectral method. The
cores of these fuzzy clusters contain tight sets of strongly coexpressed genes,
while the overlaps exhibit relations between genes showing only partial
coexpression. Comment: 8 pages, 7 figures
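The step of summarizing several clusterings as a weighted graph on the set of genes can be illustrated with a small sketch. It assumes that every clustering receives the same weight, in line with the equiprobable clusterings mentioned in the abstract, and that an edge weight is the fraction of clusterings in which two genes share a cluster. The function name and data layout are hypothetical, and the graph spectral step that extracts fuzzy clusters is not shown.

```python
from collections import defaultdict
from itertools import combinations


def coclustering_graph(clusterings):
    """Build a weighted co-clustering graph from several clusterings.

    Each clustering is a dict mapping a gene name to a cluster label.
    The returned dict maps a sorted gene pair to the fraction of
    clusterings in which both genes carry the same label.
    """
    weights = defaultdict(float)
    share = 1.0 / len(clusterings)  # equal weight per clustering (assumption)
    for assignment in clusterings:
        members = defaultdict(list)
        for gene, label in assignment.items():
            members[label].append(gene)
        for genes in members.values():
            for first, second in combinations(sorted(genes), 2):
                weights[(first, second)] += share
    return dict(weights)


# Two hypothetical clusterings of three genes.
runs = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 0},
]
graph = coclustering_graph(runs)
```

Pairs with a weight close to 1 co-cluster in nearly every solution and would be candidates for the tight cluster cores, while intermediate weights correspond to the partial coexpression relations found in the overlaps.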
Java-ML: a machine learning library
Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license
Extracting protein-protein interactions from text using rich feature vectors and feature selection
Because of the intrinsic complexity of natural language, automatically extracting accurate information from text remains a challenge. We have applied rich feature vectors derived from dependency graphs to predict protein-protein interactions using machine learning techniques. We present the first extensive analysis of applying feature selection in this domain, and show that it can produce more cost-effective models. For the first time, our technique was also evaluated on several large-scale cross-dataset experiments, which offers a more realistic view on model performance.
During benchmarking, we encountered several fundamental problems hindering comparability with other methods. We present a set of practical guidelines to set up a meaningful evaluation.
Finally, we have analysed the feature sets from our experiments before and after feature selection, and evaluated the contribution of both lexical and syntactic information to our method. The gained insight will be useful to develop better performing methods in this domain.