13 research outputs found
A new approach for the construction of historical databases - NoSQL Document-oriented databases: the example of AtlantoCracies
This article proposes, and justifies, the use of document-oriented databases as a flexible, easy-to-use, and powerful digital
tool in the field of historical research. First, the reasons that have made relational databases the predominant instrument among
historians are studied, while detailing the problems involved in their use. Next, the way in which historians have tried to address these
problems by using other digital tools is explained, as well as the limitations that such use entails. Through a case study, that of
European aristocratic networks in early modern times, it is shown, however, that document-oriented databases present notable
advantages and have greater explanatory power for the historian's work. Thanks to their flexibility, they are better adapted to the
often-unpredictable nature of historical sources without diminishing their ease of use or their analytical potential.
Junta de Andalucía UPO-1264973; Junta de Andalucía HUM 100
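As a rough illustration of the flexibility argument, the sketch below stores two records as schema-free documents and queries them. Plain Python dictionaries stand in for a document store; every name, field, and value is invented for the example and is not taken from the AtlantoCracies database.

```python
# Hypothetical records: each document carries only the fields its sources attest.
people = [
    {
        "name": "Person A",
        "titles": [{"title": "Count", "from": 1650}],
        "marriages": [{"spouse": "Person B", "year": 1655}],
    },
    {
        "name": "Person B",
        # No titles are recorded for this person; an extra field appears instead.
        "marriages": [{"spouse": "Person A", "year": 1655}],
        "offices": ["Viceroy"],
    },
]


def find(docs, field):
    """Return the documents that contain the given top-level field."""
    return [d for d in docs if field in d]


def marriage_edges(docs):
    """Collect marriage ties as sorted name pairs, i.e. a simple network edge list."""
    edges = set()
    for d in docs:
        for m in d.get("marriages", []):
            edges.add(tuple(sorted((d["name"], m["spouse"]))))
    return sorted(edges)


print([d["name"] for d in find(people, "offices")])
print(marriage_edges(people))
```

Neither record had to conform to a fixed table layout, which is the property the abstract attributes to document-oriented databases.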
Clustering Main Concepts from e-Mails
E-mail is one of the most common ways to communicate, accounting in some cases for up to 75% of a company's communication, with every employee spending about 90 minutes a day on e-mail tasks such as filing and deleting. This paper deals with the generation of clusters of relevant words from e-mail texts. Our approach consists of applying text mining techniques and then data mining techniques to obtain related concepts extracted from sent and received messages. We have developed a new neighborhood-based clustering algorithm, which takes into account the similarity values among words obtained in the text mining phase. The potential of these applications is enormous, and only a few companies, mainly large organizations, have invested in this kind of project so far, taking advantage of employees' knowledge in future decisions.
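The abstract does not spell out the clustering algorithm, so the following is only a generic sketch of the idea of growing clusters through word neighborhoods. It assumes that pairwise word similarities are already available from the text mining phase and that a word's neighborhood is every word linked to it at or above a threshold; the function name, the threshold, and the sample values are hypothetical.

```python
def neighborhood_clusters(similarity, threshold):
    """Group words by repeatedly absorbing neighbors.

    similarity: dict mapping frozenset({word_a, word_b}) -> value in [0, 1],
                with word_a != word_b.
    Two words are neighbors when their similarity reaches the threshold.
    """
    neighbors = {}
    for pair, value in similarity.items():
        a, b = sorted(pair)
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if value >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters, seen = [], set()
    for word in sorted(neighbors):
        if word in seen:
            continue
        # Grow the cluster from this word until no new neighbor can be added.
        cluster, frontier = {word}, [word]
        while frontier:
            current = frontier.pop()
            for other in neighbors[current] - cluster:
                cluster.add(other)
                frontier.append(other)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


# Invented similarity values for illustration only.
sims = {
    frozenset({"invoice", "payment"}): 0.8,
    frozenset({"payment", "budget"}): 0.6,
    frozenset({"meeting", "agenda"}): 0.7,
    frozenset({"invoice", "meeting"}): 0.1,
}
print(neighborhood_clusters(sims, 0.5))
```

With a threshold of 0.5 the sample data separates into a finance-related group and a meeting-related group; the paper's own algorithm may use a different neighborhood definition.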
Ensemble and Greedy Approach for the Reconstruction of Large Gene Co-Expression Networks
In recent years, the vast amount of genetic information generated by high-throughput approaches has led to the need for new methods for data handling. The integrative analysis of gene information of diverse nature could provide a much-sought overview for studying complex biological systems and processes. In this sense, Co-expression Gene Networks (CGN) have become a powerful tool in the comprehensive analysis of gene expression. Such networks represent relationships between genes (or gene products) by means of a graph composed of nodes and edges, where nodes represent genes and edges the relationships among them. Among the main features of CGN, sparseness and scale-free topology may notably affect subsequent network analysis. Within this framework, structure optimization techniques are also important for reducing the size of the networks, not only improving their topology but also keeping a positive prediction ratio. On the other hand, ensemble strategies have significantly improved the precision of results by combining different measures or methods.
In this work, we present Ensemble and Greedy networks (EnGNet), a novel two-step method for CGN inference. First, EnGNet uses an ensemble strategy to generate co-expression networks. The final score is estimated by majority voting among three different methods, i.e. the Spearman and Kendall coefficients and Normalized Mutual Information. Second, a greedy algorithm optimizes both the size and the topological features of the network. The results show not only that this method obtains reliable networks, but also that it significantly improves their topology.
Moreover, the usefulness of the method is shown by an application to a human dataset on post-traumatic stress disorder (PTSD), revealing an innate immunity-mediated response to this pathology, in accordance with previous studies. These results are indicative of the potential of CGN, and EnGNet in particular, for unveiling the genetic causes of complex diseases. Finally, the implications of CGN for biomarker discovery could lead research towards earlier detection and effective treatment of these diseases.
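The ensemble step described above can be sketched as a majority vote over three precomputed score tables. This is a minimal illustration, not the EnGNet implementation: it assumes the Spearman, Kendall, and NMI values have already been computed for each gene pair, and the thresholds, the two-out-of-three rule, and all sample values are assumptions made for the example. The greedy optimization step is not covered.

```python
def majority_vote_edges(scores, thresholds, min_votes=2):
    """Keep a gene pair when at least `min_votes` measures support it.

    scores:     {measure_name: {(gene_a, gene_b): value}}
    thresholds: {measure_name: cutoff}
    Absolute values are compared so that strong negative correlations also
    count as evidence of co-expression (NMI is non-negative anyway).
    """
    votes = {}
    for measure, pair_scores in scores.items():
        for pair, value in pair_scores.items():
            if abs(value) >= thresholds[measure]:
                votes[pair] = votes.get(pair, 0) + 1
    return sorted(pair for pair, n in votes.items() if n >= min_votes)


# Invented scores for three gene pairs.
scores = {
    "spearman": {("g1", "g2"): 0.91, ("g1", "g3"): 0.40, ("g2", "g3"): -0.85},
    "kendall": {("g1", "g2"): 0.78, ("g1", "g3"): 0.35, ("g2", "g3"): -0.65},
    "nmi": {("g1", "g2"): 0.72, ("g1", "g3"): 0.75, ("g2", "g3"): 0.71},
}
thresholds = {"spearman": 0.7, "kendall": 0.7, "nmi": 0.7}
print(majority_vote_edges(scores, thresholds))
```

Here the pair (g1, g3) is dropped because only one measure supports it, while (g2, g3) survives on two votes.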
CarGene: Characterisation of sets of genes based on metabolic pathways analysis
The great amount of biological information provides scientists
with an incomparable framework for testing the results of new algorithms.
Several tools have been developed for analysing gene enrichment, and
most of them are Gene Ontology-based tools. We developed a Kyoto
Encyclopedia of Genes and Genomes (KEGG)-based tool that provides a
friendly graphical environment for analysing gene enrichment. The tool
integrates two statistical corrections and simultaneously analyses the
information about many groups of genes in both a visual and a textual
manner. We tested the usefulness of our approach on a previous
analysis (Huttenhower et al.). Furthermore, our tool is freely available
(http://www.upo.es/eps/bigs/cargene.html).
Ministerio de Ciencia y Tecnología TIN2007-68084-C02-00; Ministerio de Ciencia e Innovación PCI2006-A7-0575; Junta de Andalucía P07-TIC-02611; Junta de Andalucía TIC-20
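Pathway-based gene enrichment is commonly scored with a hypergeometric test followed by a multiple-testing correction. The abstract does not name the test or the two corrections CarGene uses, so the sketch below is an assumption-laden illustration: it uses the hypergeometric upper tail and a Bonferroni correction, and the pathway names and counts are invented.

```python
from math import comb


def hypergeom_pvalue(population, pathway_size, sample_size, hits):
    """P(X >= hits) when `sample_size` genes are drawn from `population`
    genes, of which `pathway_size` are annotated to the pathway."""
    total = comb(population, sample_size)
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}


# Hypothetical gene group of 20 genes drawn from a 6000-gene genome.
raw = {
    "pathway_X": hypergeom_pvalue(6000, 50, 20, 5),
    "pathway_Y": hypergeom_pvalue(6000, 300, 20, 1),
}
print(bonferroni(raw))
```

A pathway with five hits out of 20 genes remains significant after correction in this toy setting, whereas a single hit in a large pathway does not.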
Discovering α-patterns from gene expression data
Biclustering techniques aim to find subsets of
genes that show similar activity patterns under a subset of conditions. In this
paper we characterize a specific type of pattern, which we have called α-pattern,
and present an approach consisting of a new biclustering algorithm specifically
designed to find α-patterns, in which the gene expression values evolve across
the experimental conditions showing a similar behavior inside a band that ranges
from 0 up to a pre-defined threshold called α. The α value guarantees the
co-expression among genes. We have tested our method on the Yeast dataset and
compared the results to the biclustering algorithms of Cheng & Church (2000)
and Aguilar & Divina (2005). Results show that the algorithm finds interesting
biclusters, grouping genes with similar behaviors and maintaining a very low
mean squared residue.
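The two quantities mentioned above can be sketched as follows. The mean squared residue follows the standard Cheng & Church definition: the average of (a_ij - rowmean_i - colmean_j + overallmean)^2 over the bicluster. The α-band check is only one plausible reading of the abstract, assumed here to mean that, in every condition, the spread between the highest and lowest gene values is at most α; the function names and sample matrix are hypothetical.

```python
def mean_squared_residue(matrix):
    """Cheng & Church mean squared residue of a bicluster (rows = genes)."""
    rows, cols = len(matrix), len(matrix[0])
    row_means = [sum(r) / cols for r in matrix]
    col_means = [sum(matrix[i][j] for i in range(rows)) / rows for j in range(cols)]
    overall = sum(row_means) / rows
    total = sum(
        (matrix[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def within_alpha_band(matrix, alpha):
    """Assumed reading of an alpha-pattern: per condition (column), the gap
    between the largest and smallest gene value does not exceed alpha."""
    return all(max(col) - min(col) <= alpha for col in zip(*matrix))


# Two genes that rise in parallel across three conditions.
bicluster = [
    [1, 2, 3],
    [2, 3, 4],
]
print(mean_squared_residue(bicluster))
print(within_alpha_band(bicluster, 1))
```

Genes that differ only by a constant shift give a residue of zero, which is why a low mean squared residue is read as coherent behavior.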
Neighborhood-Based Clustering of Gene-Gene Interactions
In this work, we propose a new greedy clustering algorithm to identify groups of related genes. Clustering algorithms usually analyze genes in order to group those with similar behavior. Instead, our approach groups pairs of genes that present similar positive and/or negative interactions. Our approach has some interesting properties. For instance, the user can specify how the range of each gene is segmented into labels, some of which will mean expressed or inhibited (depending on the gradation). A function then transforms each pair of labels, over all label combinations, into another label that identifies the type of interaction. From these pairs of genes and their interactions we build clusters in a greedy, iterative fashion, considering two pairs of genes similar if they have the same number of relevant interactions. Initial two-gene clusters grow iteratively based on their neighborhood until the set of clusters no longer changes. The algorithm allows the researcher to modify all the criteria (discretization mapping function, gene-gene mapping function, and filtering function) and provides considerable flexibility to obtain clusters based on the level of precision needed.
The performance of our approach is experimentally tested on the yeast dataset. The final number of clusters is low and the genes within them show a significant level of cohesion, as shown graphically in the experiments.
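The first two user-defined criteria, the discretization mapping and the gene-gene mapping, might look like the sketch below. The three labels, the cutoffs, and the rule that equal labels mean a positive interaction are assumptions chosen for illustration; the paper lets the researcher supply these functions, and the clustering and filtering steps are not shown.

```python
def discretize(value, low, high):
    """Hypothetical discretization mapping: segment a gene's range into labels."""
    if value <= low:
        return "inhibited"
    if value >= high:
        return "expressed"
    return "neutral"


def interaction(label_a, label_b):
    """Hypothetical gene-gene mapping: turn a pair of labels into an interaction type."""
    if "neutral" in (label_a, label_b):
        return None  # not a relevant interaction
    return "positive" if label_a == label_b else "negative"


def count_interactions(profile_a, profile_b, low, high):
    """Count relevant interactions between two genes across all conditions."""
    counts = {"positive": 0, "negative": 0}
    for a, b in zip(profile_a, profile_b):
        kind = interaction(discretize(a, low, high), discretize(b, low, high))
        if kind is not None:
            counts[kind] += 1
    return counts


# Invented expression profiles over four conditions.
g1 = [2.0, -1.5, 0.1, 1.8]
g2 = [1.9, -2.0, 1.7, -1.6]
print(count_interactions(g1, g2, low=-1.0, high=1.0))
```

Counts of this kind are what the similarity between two gene pairs would be based on, since the abstract compares pairs by their number of relevant interactions.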
Gene expression data analysis (Análisis de datos de expresión genética)
Gene expression data analysis is one of the fundamental tasks in Bioinformatics. Carrying out this analysis requires the application of Data Mining techniques. Clustering techniques have proven very useful for discovering groups of genes that take part in the same cellular function or are regulated in the same way. Recently, Biclustering has been proposed as a method for discovering specific behavior patterns in which the expression values of a subset of genes evolve in the same way across a subset of laboratory conditions. This article reviews the different techniques used in gene expression data analysis, studying Biclustering-based methods in depth, and discusses the different validation methods for evaluating the model generated by each proposal.
Autonomous work as a teaching tool (El trabajo autónomo como herramienta didáctica)
The aim of this article is to present three case studies, from three courses of the degree in Ingeniería Técnica en Informática de Gestión (Technical Engineering in Management Computing) at the Universidad Pablo de Olavide, in which the students' autonomous work was the tool used to address the problems caused by the reduction in class hours that follows from the implementation of the European Higher Education Area (EEES). These problems were more acute in the blended-learning mode of the degree, in which students, usually working professionals, have their required attendance hours reduced by 50% to make it easier to combine study and work. The results obtained, in terms of success rates and dropout percentages, show an improvement in the courses' results, corroborating the usefulness of well-planned autonomous work.
Peer-reviewed article.
A Deterministic Model to Infer Gene Networks from Microarray Data
Microarray experiments help researchers to construct the structure of gene regulatory networks, i.e., networks representing relationships among different genes. Filter and knowledge extraction processes
are necessary in order to handle the huge amount of data produced
by microarray technologies. We propose regression tree techniques as
a method to identify gene networks. Regression trees are a very useful technique for estimating the numerical values of the target outputs.
They are very often more precise than linear regression models because
they can fit different linear regressions to separate areas of the search
space. In our approach, we generate a single regression tree for each gene
from a set of genes, taking as input the remaining genes, to finally build
a graph from all the relationships among output and input genes. In this
paper, we simplify the approach by setting a single seed, the gene
ARN1, and building the graph around it. The final model might give
some clues for understanding the dynamics, the regulation, or the topology
of the gene network from one (or several) seeds, since it gathers relevant genes with accurate connections. The performance of our approach
is experimentally tested on the yeast Saccharomyces cerevisiae dataset
(Rosetta compendium).
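The "one tree per gene, remaining genes as inputs" scheme can be illustrated with a heavily simplified stand-in. Instead of a full regression tree with linear models in its leaves, the sketch below fits a depth-one tree (a single split with constant leaves) per target gene and draws an edge from the gene chosen for the split to the target. The expression values and the gene names other than ARN1 are invented, and none of this reproduces the paper's actual model.

```python
def sse(values):
    """Sum of squared errors around the mean of a leaf."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_stump(target, predictors):
    """Find the single split (gene, threshold) that best predicts `target`.

    Returns (error, gene, threshold), or None when no predictor can be split.
    """
    best = None
    for gene, values in predictors.items():
        # Dropping the largest value keeps both sides of the split non-empty.
        for threshold in sorted(set(values))[:-1]:
            left = [t for v, t in zip(values, target) if v <= threshold]
            right = [t for v, t in zip(values, target) if v > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, gene, threshold)
    return best


def infer_edges(expression):
    """Build directed edges (input gene -> output gene), one stump per gene."""
    edges = []
    for target_gene, target in expression.items():
        predictors = {g: v for g, v in expression.items() if g != target_gene}
        stump = best_stump(target, predictors)
        if stump is not None:
            edges.append((stump[1], target_gene))
    return edges


# Invented expression levels over four experiments.
expression = {
    "ARN1": [1.0, 1.1, 3.0, 3.2],
    "geneB": [0.2, 0.3, 2.0, 2.1],
    "geneC": [5.0, 1.0, 4.0, 1.2],
}
print(infer_edges(expression))
```

Restricting the loop to a single target would correspond to the seed-based simplification the abstract describes for ARN1.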