
    Open Data and Quantitative Techniques for Anthropology of Road Traffic

    What kinds of questions about human mobility can computational analysis help answer, and how can the findings be translated into anthropology? We analyzed a publicly available data set of road traffic counters in Slovenia to address these questions. The data reveal how a nation drives, how it travels for tourism, which locations it prefers, what it does during the week and at the weekend, and how its habits change over the year. We conducted the empirical analysis in two parts. First, we defined interesting traffic spots and designed computational methods to find them in a large data set. As shown in the paper, traffic counters hint at potential causes and effects in driving practices that we can interpret anthropologically. Second, we used clustering to find groups of similar traffic counters, each described by its daily profile. Clustering revealed the main features of road traffic in Slovenia. Using these two quantitative approaches, we outline the general properties of road traffic in the country and identify and explain interesting outliers. We show that quantitative data analysis only partially answers anthropological questions, but it can be a valuable tool for preliminary research. We conclude that open data are a useful component of anthropological analysis and that the quantitative discovery of small local events can help us pinpoint future fieldwork sites.
    Comment: 17 pages, 7 figures
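The abstract groups traffic counters by their daily profiles but does not say which clustering algorithm is used. The sketch below assumes plain k-means (Lloyd's algorithm) on 24-hour profiles normalized to sum to 1, so that counters are compared by the shape of their day rather than by traffic volume. The counters, the `commuter`/`leisure` generators, and the farthest-first initialisation are hypothetical illustrations, not the paper's data or method.

```python
# Hypothetical data and parameters; clusters counters by daily-profile shape.


def normalize(profile):
    """Scale an hourly count profile to sum to 1 (compare shape, not volume)."""
    total = sum(profile)
    return [x / total for x in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest(profiles, k):
    """Deterministic farthest-first initialisation of k centroids."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(profiles, k, iters=20):
    """Plain k-means on normalized daily profiles."""
    centroids = init_farthest(profiles, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each counter joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles]
        # Update step: each centroid becomes the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def commuter(volume):
    """Hypothetical commuter road: peaks at 07:00 and 16:00."""
    return [volume * (5 if h in (7, 16) else 1) for h in range(24)]


def leisure(volume):
    """Hypothetical leisure road: broad midday peak."""
    return [volume * (5 if 11 <= h <= 14 else 1) for h in range(24)]


counters = [commuter(100), commuter(300), leisure(50), leisure(200)]
labels, centroids = kmeans([normalize(c) for c in counters], k=2)
print(labels)  # → [0, 0, 1, 1]
```

Counters with very different volumes but the same daily rhythm end up in the same cluster, which is the point of normalizing before clustering. Outliers could then be flagged as counters far from every centroid.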

    Sparse data embedding and prediction by tropical matrix factorization

    Background: Matrix factorization methods are linear models with limited capability to model complex relations. In our work, we use the tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for estimating missing (unknown) values in sparse data. Results: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions. Conclusion: STMF is the first work that uses the tropical semiring on sparse data. We show that in certain cases semirings are useful because they capture structure differently from standard linear algebra, in a way that is simpler to understand.
    This work is supported by the Slovene Research Agency, Young Researcher Grant (52096) awarded to AO, and research core funding (P1-0222 to PO and P2-0209 to TC).
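In the tropical (max-plus) semiring, addition is replaced by `max` and multiplication by `+`, so the matrix product becomes (A ⊗ B)[i][j] = max_k (A[i][k] + B[k][j]). The sketch below shows that product and one standard tropical step: with the left factor `U` held fixed (an assumption; STMF learns both factors), the largest `V` satisfying U ⊗ V ≤ X on the observed entries is V[k][j] = min over observed i of (X[i][j] − U[i][k]). Missing entries are skipped, and U ⊗ V then predicts them. This is an illustration of max-plus factorization, not the STMF update rule; `U`, `X`, and the helper names are hypothetical.

```python
INF = float("inf")


def maxplus_product(A, B):
    """Tropical (max-plus) product: (A ⊗ B)[i][j] = max_k (A[i][k] + B[k][j])."""
    return [
        [max(a + B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
        for row in A
    ]


def greatest_subsolution(X, U):
    """Largest V with (U ⊗ V)[i][j] <= X[i][j] on every observed entry.

    Missing entries of X are None and are skipped (the sparse case).
    V[k][j] = min over observed i of (X[i][j] - U[i][k]).
    """
    rank, n_cols = len(U[0]), len(X[0])
    V = [[INF] * n_cols for _ in range(rank)]
    for k in range(rank):
        for j in range(n_cols):
            diffs = [X[i][j] - U[i][k] for i in range(len(X)) if X[i][j] is not None]
            if diffs:
                V[k][j] = min(diffs)
    return V


U = [[0, 1], [2, 0], [1, 3]]              # hypothetical left factor, assumed known
X = [[1, 5, 2], [3, 4, 4], [3, None, 4]]  # one missing value at row 2, column 1
V = greatest_subsolution(X, U)
X_hat = maxplus_product(U, V)
print(X_hat)  # → [[1, 5, 2], [3, 4, 4], [3, 7, 4]]
```

Because each output entry is a maximum, a single large term can dominate it. This is one way to see why a tropical model can follow extreme values instead of being pulled toward the mean.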

    VizRank: Finding Informative Data Projections in Functional Genomics by Machine Learning

    VizRank is a tool that finds interesting two-dimensional projections of class-labeled data. When applied to multi-dimensional functional genomics data sets, VizRank can systematically find relevant biological patterns.
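A minimal sketch of ranking two-dimensional projections of class-labeled data. It assumes that a projection is "interesting" when a k-nearest-neighbour classifier, evaluated with leave-one-out in that projection, separates the classes well. It only considers axis-parallel scatterplots (feature pairs) and skips feature normalization, so it should be read as an illustration of the idea rather than the published tool's scoring. The data and helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier on 2-D points."""
    correct = 0
    for i, p in enumerate(points):
        # k nearest other points, each paired with its class label.
        neighbours = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )[:k]
        vote = Counter(lab for _, lab in neighbours).most_common(1)[0][0]
        correct += vote == labels[i]
    return correct / len(points)


def rank_projections(data, labels, k=3):
    """Score every feature-pair scatterplot; best projection first."""
    scores = []
    for a, b in combinations(range(len(data[0])), 2):
        points = [(row[a], row[b]) for row in data]
        scores.append((knn_loo_accuracy(points, labels, k), (a, b)))
    return sorted(scores, reverse=True)


# Hypothetical data: feature 0 separates the classes; features 1 and 2 are noise.
data = [(0, 0, 3), (1, 3, 0), (0, 1, 1), (10, 0, 2), (11, 3, 3), (10, 2, 0)]
labels = ["A", "A", "A", "B", "B", "B"]
ranking = rank_projections(data, labels)
print(ranking[0], ranking[-1][1])  # best-scoring pair, then the noise-only pair (1, 2)
```

Scoring by classification accuracy turns "interesting" into something measurable, so many candidate projections can be searched automatically and only the top few shown to a biologist.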

    Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of protein-RNA interactions.

    UV cross-linking and immunoprecipitation (CLIP) and individual-nucleotide resolution CLIP (iCLIP) are methods to study protein-RNA interactions in untreated cells and tissues. Here, we analyzed six published and two novel data sets to confirm that both methods identify protein-RNA cross-link sites, and to identify a slight uridine preference of UV-C-induced cross-linking. Comparing Nova CLIP and iCLIP data revealed that cDNA deletions have a preference for TTT motifs, whereas iCLIP cDNA truncations are more likely to identify clusters of YCAY motifs as the primary Nova binding sites. In conclusion, we demonstrate how each method impacts the analysis of protein-RNA binding specificity.
    RIGHTS: This article is licensed under the BioMed Central licence (http://www.biomedcentral.com/about/license), which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
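The abstract reports that iCLIP truncations point to clusters of YCAY motifs (Y = C or U; written C/T in cDNA or genomic sequence) as the primary Nova binding sites. A minimal sketch of scanning a sequence for such clusters follows. The window size and minimum motif count are hypothetical defaults chosen for illustration, not values taken from the paper.

```python
import re

# Y = pyrimidine (C or T in cDNA/genomic sequence). The zero-width lookahead
# lets overlapping motifs such as TCATCAT count twice.
YCAY = re.compile(r"(?=[CT]CA[CT])")


def ycay_positions(seq):
    """0-based start positions of every YCAY motif in seq."""
    return [m.start() for m in YCAY.finditer(seq.upper())]


def merge_spans(spans):
    """Merge overlapping (start, end) spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ycay_clusters(seq, window=30, min_count=3):
    """Spans where at least min_count YCAY motifs start within `window` nt.

    window and min_count are hypothetical defaults, not values from the paper.
    """
    pos = ycay_positions(seq)
    spans = [
        (pos[i], pos[i + min_count - 1] + 4)  # +4 so the last motif is included
        for i in range(len(pos) - min_count + 1)
        if pos[i + min_count - 1] - pos[i] < window
    ]
    return merge_spans(spans)


seq = "AAAA" + "TCATTCATCCAC" + "G" * 40 + "TCAT"
print(ycay_positions(seq))  # → [4, 8, 12, 56]
print(ycay_clusters(seq))   # → [(4, 16)]
```

The isolated motif at position 56 is not reported, which mirrors the distinction between a single YCAY and a cluster of them.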

    Control of a neuronal morphology program by an RNA-binding zinc finger protein, Unkempt

    Cellular morphology is an essential determinant of cellular function in all kingdoms of life, yet little is known about how cell shape is controlled. Here we describe a molecular program that controls the early morphology of neurons through a metazoan-specific zinc finger protein, Unkempt. Depletion of Unkempt in mouse embryos disrupts the shape of migrating neurons, while ectopic expression confers neuronal-like morphology to cells of different nonneuronal lineages. We found that Unkempt is a sequence-specific RNA-binding protein and identified its precise binding sites within coding regions of mRNAs linked to protein metabolism and trafficking. RNA binding is required for Unkempt-induced remodeling of cellular shape and is directly coupled to a reduced production of the encoded proteins. These findings link post-transcriptional regulation of gene expression with cellular shape and have general implications for the development and disease of multicellular organisms.

    A systems view of spliceosomal assembly and branchpoints with iCLIP.

    Studies of spliceosomal interactions are challenging due to their dynamic nature. Here we used spliceosome iCLIP, which immunoprecipitates SmB along with small nuclear ribonucleoprotein particles and auxiliary RNA binding proteins, to map spliceosome engagement with pre-messenger RNAs in human cell lines. This revealed seven peaks of spliceosomal crosslinking around branchpoints (BPs) and splice sites. We identified RNA binding proteins that crosslink to each peak, including known and candidate splicing factors. Moreover, we detected the use of over 40,000 BPs with strong sequence consensus and structural accessibility, which align well with nearby crosslinking peaks. We show how the position and strength of BPs affect the crosslinking patterns of spliceosomal factors, which bind more efficiently upstream of strong or proximally located BPs and downstream of weak or distally located BPs. These insights exemplify spliceosome iCLIP as a broadly applicable method for transcriptomic studies of splicing mechanisms.
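Peaks of crosslinking "around branchpoints" are typically seen by aligning crosslink positions on each BP and counting crosslinks at every relative offset, giving a metaprofile. The sketch below assumes single-strand, single-chromosome integer coordinates and a hypothetical flank size. It illustrates the general idea and is not the paper's pipeline.

```python
from bisect import bisect_left, bisect_right


def bp_metaprofile(crosslinks, branchpoints, flank=50):
    """Crosslink counts at each offset (crosslink - branchpoint) in [-flank, flank].

    Assumes all positions share one strand and chromosome. Index
    `offset + flank` of the returned list holds the count for that offset.
    """
    xs = sorted(crosslinks)
    profile = [0] * (2 * flank + 1)
    for bp in branchpoints:
        # Binary search for the crosslinks inside this branchpoint's window.
        lo = bisect_left(xs, bp - flank)
        hi = bisect_right(xs, bp + flank)
        for x in xs[lo:hi]:
            profile[x - bp + flank] += 1
    return profile


profile = bp_metaprofile([100, 98, 205, 300, 195], branchpoints=[100, 200], flank=10)
print({i - 10: c for i, c in enumerate(profile) if c})  # → {-5: 1, -2: 1, 0: 1, 5: 1}
```

Summing over thousands of BPs makes position-specific peaks stand out. Splitting BPs into strong and weak (or proximal and distal) groups before summing would show the upstream and downstream shifts the abstract describes.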

    Insights into the design and interpretation of iCLIP experiments

    Background: Ultraviolet (UV) crosslinking and immunoprecipitation (CLIP) identifies the sites on RNAs that are in direct contact with RNA-binding proteins (RBPs). Several variants of CLIP exist, which require different computational approaches for analysis. This variety of approaches can create challenges for a novice user and can hamper insights from multi-study comparisons. Here, we produce data with multiple variants of CLIP and evaluate the data with various computational methods to better understand their suitability. Results: We perform experiments for PTBP1 and eIF4A3 using individual-nucleotide resolution CLIP (iCLIP), employing either UV-C or photoactivatable 4-thiouridine (4SU) combined with UV-A crosslinking, and compare the results with published data. As previously noted, the positions of complementary DNA (cDNA)-starts depend on cDNA length in several iCLIP experiments, and we now find that this is caused by constrained cDNA-ends, which can result from the sequence and structure constraints of RNA fragmentation. These constraints are overcome when fragmentation by RNase I is efficient and when a broad cDNA size range is obtained. Our study also shows that if RNase does not efficiently cut within the binding sites, the original CLIP method is less capable of identifying the longer binding sites of RBPs. In contrast, we show that a broad size range of cDNAs in iCLIP allows the cDNA-starts to efficiently delineate the complete RNA-binding sites. Conclusions: We demonstrate the advantage of iCLIP and related methods that can amplify cDNAs that truncate at crosslink sites, and we show that computational analyses based on cDNA-starts are appropriate for such methods.
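An analysis "based on cDNA-starts" treats the 5' end of each mapped read as the point where reverse transcription truncated. The sketch below assumes the common iCLIP convention that the crosslink site is the nucleotide immediately upstream of the cDNA-start on the RNA strand (the abstract itself does not spell this out). It also assumes 0-based half-open coordinates and reads already stripped of barcodes and adapters. The `Read` record and helper names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    """A mapped iCLIP read after barcode/adapter removal (hypothetical record)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def crosslink_site(read):
    """Position one nucleotide upstream of the cDNA-start on the RNA strand.

    On "+" the cDNA-start (read 5' end) is `start`, so the site is start - 1.
    On "-" the read 5' end is end - 1 and upstream on the RNA is one base
    higher in genome coordinates, so the site is `end`.
    """
    return read.start - 1 if read.strand == "+" else read.end


def count_crosslinks(reads):
    """Count crosslink events per (chrom, strand, position)."""
    return Counter((r.chrom, r.strand, crosslink_site(r)) for r in reads)


reads = [
    Read("chr1", 100, 130, "+"),  # cDNAs of different lengths that share a
    Read("chr1", 100, 150, "+"),  # start both support the same crosslink site
    Read("chr1", 200, 240, "-"),
]
counts = count_crosslinks(reads)
print(counts[("chr1", "+", 99)])   # → 2
print(counts[("chr1", "-", 240)])  # → 1
```

Keying on the start rather than the end is what makes the count independent of cDNA length. This is why constrained cDNA-ends matter less once a broad cDNA size range is obtained.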