The social in the platform trap: Why a microscopic system focus limits the prospect of social machines
“Filter bubble”, “echo chambers”, “information diet”: the metaphors used to describe today’s information dynamics on social media platforms are fairly diverse. People use them to describe the impact of the viral spread of fake, biased or purposeless content online, as witnessed during the recent race for the US presidency or the latest outbreak of the Ebola virus (in the latter case a tasteless racist meme drowned out any meaningful content). This undermines the potential envisioned to arise from emergent activities of human collectives on the World Wide Web, as exemplified by the Arab Spring mass movements or digital disaster response supported by the Ushahidi tool suite.
MEME-LaB: motif analysis in clusters
Genome-wide expression analysis can result in large numbers of clusters of co-expressed genes. While there are tools for ab initio discovery of transcription factor binding sites, most do not provide a quick and easy way to study large numbers of clusters. To address this, we introduce a web tool called MEME-LaB. The tool wraps MEME (an ab initio motif finder), providing an interface for users to input multiple gene clusters, retrieve promoter sequences, run motif finding, and then easily browse and condense the results, facilitating better interpretation of results from large-scale datasets.
MEME Suite: tools for motif discovery and searching
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm, which allows discovery of motifs containing gaps. Three sequence scanning algorithms (MAST, FIMO and GLAM2SCAN) allow numerous DNA and protein sequence databases to be scanned for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence logos for each discovered motif, as well as buttons that allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF, the ensemble starts with position weight matrices (PWMs) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace a
machine classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification, this method improves performance (measured by the F1 score) by
about 10 percentage points over (a) the motif scanning method and (b) the
coexpression-based association method. The top motif outperformed the five
component algorithms as well as two other common algorithms (BEST and DEME). For
identifying individual binding sites on a benchmark cross-species database
(Tompa et al., 2005), we match the best performer without much human
intervention. The method also improved performance on mammalian TFs.
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information. Comment: 33 pages
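The abstract does not spell out how a PWM is built or scored against a promoter sequence. As a minimal sketch, assuming a uniform background, a pseudocount of 1 and base-2 log-odds weights, the basic object the ensemble starts from can be built from aligned sites and slid along a sequence as follows. The function names `build_pwm`, `score_window` and `best_hit` and the toy sites are hypothetical; this is not the authors' ensemble or any of its component algorithms.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM (one dict per position) from equal-length aligned sites.

    Assumption: uniform background and a fixed pseudocount; both are illustrative.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # log2 of observed frequency over background frequency
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds weights for one window of the sequence."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Slide the PWM along the sequence; return (offset, score) of the best window."""
    width = len(pwm)
    hits = [(i, score_window(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[1])

# Hypothetical toy sites, not data from the paper
sites = ["TATAAT", "TATATT", "TATAAT", "TAAAAT"]
pwm = build_pwm(sites)
print(best_hit(pwm, "GGCTATAATGC"))
```

In an ensemble setting, scores like these (one per component PWM) would be the kind of features a downstream classifier consumes; how the paper reduces and combines them is not shown here.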
Sequence information gain based motif analysis
Background: The detection of regulatory regions in candidate sequences is essential for understanding the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information-theoretic metrics for finding regulatory sequences in promoter regions. Results: This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Q-residuals projections or information-theoretic detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that for 70% of the studied Transcription Factor Binding Sites the SIGMA detector performs better and behaves more robustly than the compared methods, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in modelling the non-linear co-variability in the binding motif positions. Conclusions: Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of cis-regulatory sequence detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance regardless of the co-variability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu. Postprint (published version)
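The abstract does not describe SIGMA's model in enough detail to reproduce it. For readers new to information-theoretic motif analysis, a much simpler related quantity is the per-position relative entropy (information content) of a motif against a background distribution. The sketch assumes equal-length aligned DNA sites, a uniform background and a pseudocount of 0.5; the name `column_information` is hypothetical. This is not the SIGMA detector, which additionally models non-linear co-variability between positions.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_information(sites, pseudocount=0.5, background=None):
    """Relative entropy (bits) of each motif column against a background.

    Assumptions: equal-length aligned DNA sites, uniform background,
    and a small pseudocount so that no frequency is zero.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    total = len(sites) + pseudocount * len(BASES)
    info = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        ic = 0.0
        for b in BASES:
            p = (counts[b] + pseudocount) / total
            ic += p * math.log2(p / background[b])
        info.append(ic)
    return info

# Hypothetical sites: column 0 is fully conserved, column 1 is uniform
info = column_information(["AC", "AG", "AT", "AA"])
print(info)
```

A conserved column scores well above zero, while a column whose bases match the background scores zero. Note that this per-column measure treats positions independently, which is exactly the limitation the abstract says SIGMA addresses.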
Memetic Artificial Bee Colony Algorithm for Large-Scale Global Optimization
Memetic computation (MC) has emerged recently as a new paradigm of efficient
algorithms for solving the hardest optimization problems. On the other hand,
artificial bee colony (ABC) algorithms perform well when solving
continuous and combinatorial optimization problems. This study brings these
two technologies under the same roof. As a result, a memetic ABC (MABC)
algorithm has been developed that is hybridized with two local search
heuristics: the Nelder-Mead algorithm (NMA) and the random walk with direction
exploitation (RWDE). The former is oriented more towards exploration, while the
latter is oriented more towards exploitation of the search space. A stochastic
adaptation rule was employed to control the balance between exploration and
exploitation. The MABC algorithm was applied to a special suite on Large Scale
Continuous Global Optimization at the 2012 IEEE Congress on Evolutionary
Computation. The results obtained by MABC are comparable with those of
DECC-G, DECC-G*, and MLCC. Comment: CONFERENCE: IEEE Congress on Evolutionary Computation, Brisbane,
Australia, 201
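The abstract names RWDE but does not give its update rule. A minimal sketch of one simplified variant is shown here. It draws a random unit direction, keeps stepping along it while the objective improves, and halves the step size after a run of unsuccessful directions. The function `rwde`, its parameter names and defaults, and the sphere test function are all hypothetical. The sketch omits the ABC population, the Nelder-Mead component and the stochastic adaptation rule, and the exact RWDE rule used in MABC may differ.

```python
import math
import random

def rwde(f, x0, step=1.0, min_step=1e-6, max_failures=50, max_evals=20000, seed=0):
    """Random walk with direction exploitation (simplified, assumed variant).

    Draw a random unit direction; if a step along it improves f, keep stepping
    in that direction while it keeps improving; after `max_failures`
    consecutive unsuccessful directions, halve the step size.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    evals = 1
    while step >= min_step and evals < max_evals:
        failures = 0
        while failures < max_failures and evals < max_evals:
            u = [rng.gauss(0.0, 1.0) for _ in x]
            norm = math.sqrt(sum(c * c for c in u)) or 1.0
            u = [c / norm for c in u]  # random unit direction
            cand = [xi + step * ui for xi, ui in zip(x, u)]
            fc = f(cand)
            evals += 1
            if fc < fx:
                # Direction exploitation: keep moving along u while f improves
                while fc < fx and evals < max_evals:
                    x, fx = cand, fc
                    cand = [xi + step * ui for xi, ui in zip(x, u)]
                    fc = f(cand)
                    evals += 1
                failures = 0
            else:
                failures += 1
        step *= 0.5  # shrink the step after repeated failures
    return x, fx

def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(c * c for c in v)

best_x, best_f = rwde(sphere, [3.0, -4.0])
print(best_x, best_f)
```

Reusing a successful direction is what biases this local search towards exploitation, which matches the role the abstract assigns to RWDE relative to Nelder-Mead.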