A Peer-reviewed Newspaper About_ Excessive Research
Research on machines, research with machines, and research as a machine.
Publication resulting from research workshop at Exhibition Research Lab, Liverpool John Moores University, organised in collaboration with Liverpool John Moores University and Liverpool Biennial, and transmediale festival for art and digital culture, Berlin
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes, particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data
Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015
As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required.
To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems.
In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries.
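The idea of an index that embeds social context can be illustrated with a minimal in-memory sketch. It is not the dissertation's framework: the record fields (`id`, `text`, `ts`, `user`) and the functions `build_index` and `query` are hypothetical, and a plain dictionary stands in for the NoSQL store.

```python
from collections import defaultdict

def build_index(records, extra_fields):
    """Build an inverted index whose entries carry extra context fields.

    records: iterable of dicts with at least "id" and "text" (assumed schema).
    extra_fields: names of the context fields to embed in each index entry.
    """
    index = defaultdict(list)
    for rec in records:
        context = tuple(rec[f] for f in extra_fields)
        # One entry per distinct term; the context travels with the entry.
        for term in set(rec["text"].lower().split()):
            index[term].append((rec["id"], context))
    return index

def query(index, term, predicate):
    """Return ids for a term, filtered on the embedded context only."""
    return [rid for rid, ctx in index.get(term.lower(), []) if predicate(ctx)]

records = [
    {"id": 1, "text": "Malaria drug news", "user": "u1", "ts": 10},
    {"id": 2, "text": "drug trial", "user": "u2", "ts": 20},
]
idx = build_index(records, ("ts", "user"))
recent = query(idx, "drug", lambda ctx: ctx[0] >= 15)
```

Because the timestamp and user sit inside each index entry, a query constrained on both text and context can be answered without fetching the original records.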
The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms.
In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms.
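The delta-broadcast idea can be sketched as follows. This is a simplified illustration rather than the dissertation's algorithm: centroids are assumed to be unnormalized sparse sum vectors stored as dictionaries, assignment uses a plain dot product, and the names `assign`, `local_deltas`, and `apply_deltas` are hypothetical.

```python
from collections import defaultdict

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def assign(point, centroids):
    """Index of the centroid most similar to the point (first wins on ties)."""
    return max(range(len(centroids)), key=lambda i: dot(point, centroids[i]))

def local_deltas(points, centroids):
    """One worker: record only the dimensions its batch actually changes."""
    deltas = defaultdict(lambda: defaultdict(float))
    for p in points:
        c = assign(p, centroids)
        for k, v in p.items():
            deltas[c][k] += v
    return {c: dict(d) for c, d in deltas.items()}

def apply_deltas(centroids, all_deltas):
    """Synchronization step: merge every worker's sparse deltas in place."""
    for deltas in all_deltas:
        for c, d in deltas.items():
            for k, v in d.items():
                centroids[c][k] = centroids[c].get(k, 0.0) + v
    return centroids

centroids = [{"a": 1.0}, {"b": 1.0}]
# Both workers compute deltas against the same centroid snapshot.
d1 = local_deltas([{"a": 1.0, "c": 1.0}], centroids)
d2 = local_deltas([{"b": 2.0}], centroids)
apply_deltas(centroids, [d1, d2])
```

Each message contains only the dimensions touched by a worker's batch, so its size grows with the batch's nonzero entries rather than with the full centroid dimensionality.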
Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
Selection on Visual Opsin Genes in Diurnal Neotropical Frogs and Loss of the SWS2 Opsin in Poison Frogs
Amphibians are ideal for studying visual system evolution because their biphasic (aquatic and terrestrial) life history and ecological diversity expose them to a broad range of visual conditions. Here, we evaluate signatures of selection on visual opsin genes across Neotropical anurans and focus on three diurnal clades that are well-known for the concurrence of conspicuous colors and chemical defense (i.e., aposematism): poison frogs (Dendrobatidae), Harlequin toads (Bufonidae: Atelopus), and pumpkin toadlets (Brachycephalidae: Brachycephalus). We found evidence of positive selection on 44 amino acid sites in LWS, SWS1, SWS2, and RH1 opsin genes, of which one in LWS and two in RH1 have been previously identified as spectral tuning sites in other vertebrates. Given that anurans have mostly nocturnal habits, the patterns of selection revealed new sites that might be important in spectral tuning for frogs, potentially for adaptation to diurnal habits and for color-based intraspecific communication. Furthermore, we provide evidence that SWS2, normally expressed in rod cells in frogs and some salamanders, has likely been lost in the ancestor of Dendrobatidae, suggesting that under low-light levels, dendrobatids have inferior wavelength discrimination compared to other frogs. This loss might follow the origin of diurnal activity in dendrobatids and could have implications for their behavior. Our analyses show that assessments of opsin diversification across taxa could expand our understanding of the role of sensory system evolution in ecological adaptation.
Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping
We propose Quootstrap, a method for extracting quotations, as well as the
names of the speakers who uttered them, from large news corpora. Whereas prior
work has addressed this problem primarily with supervised machine learning, our
approach follows a fully unsupervised bootstrapping paradigm. It leverages the
redundancy present in large news corpora, more precisely, the fact that the
same quotation often appears across multiple news articles in slightly
different contexts. Starting from a few seed patterns, such as ["Q", said S.],
our method extracts a set of quotation-speaker pairs (Q, S), which are in turn
used for discovering new patterns expressing the same quotations; the process
is then repeated with the larger pattern set. Our algorithm is highly scalable,
which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus.
Validating our results against a crowdsourced ground truth, we obtain 90%
precision at 40% recall using a single seed pattern, with significantly higher
recall values for more frequently reported (and thus likely more interesting)
quotations. Finally, we showcase the usefulness of our algorithm's output for
computational social science by analyzing the sentiment expressed in our
extracted quotations.
Comment: Accepted at the 12th International Conference on Web and Social Media
(ICWSM), 201
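The bootstrapping loop described in this abstract can be sketched on a toy corpus. This is a deliberately simplified illustration, not the Quootstrap implementation: the `$Q`/`$S` placeholder syntax, the capture rules for quotations and speaker names, and all function names are assumptions, and the sketch omits any pattern scoring or filtering.

```python
import re

# Hypothetical capture rules: a quotation is any run without double quotes,
# a speaker is one or more capitalized words.
Q_GROUP = r'(?P<Q>[^"]+)'
S_GROUP = r'(?P<S>[A-Z][a-z]+(?: [A-Z][a-z]+)*)'

def compile_pattern(pattern):
    """Turn a template such as '"$Q", said $S.' into a compiled regex."""
    regex = re.escape(pattern)
    regex = regex.replace(re.escape("$Q"), Q_GROUP)
    regex = regex.replace(re.escape("$S"), S_GROUP)
    return re.compile(regex)

def extract_pairs(sentences, patterns):
    """Apply every pattern to every sentence and collect (Q, S) pairs."""
    pairs = set()
    for pattern in patterns:
        regex = compile_pattern(pattern)
        for sentence in sentences:
            m = regex.fullmatch(sentence)
            if m:
                pairs.add((m.group("Q"), m.group("S")))
    return pairs

def discover_patterns(sentences, pairs):
    """Generalize sentences that contain a known quotation and its speaker."""
    patterns = set()
    for q, s in pairs:
        for sentence in sentences:
            if f'"{q}"' in sentence and s in sentence:
                patterns.add(sentence.replace(q, "$Q").replace(s, "$S"))
    return patterns

def bootstrap(sentences, seeds, rounds=2):
    """Alternate between extracting pairs and discovering new patterns."""
    patterns, pairs = set(seeds), set()
    for _ in range(rounds):
        pairs |= extract_pairs(sentences, patterns)
        patterns |= discover_patterns(sentences, pairs)
    return pairs, patterns

sentences = [
    '"We will win", said Ann Lee.',
    'Ann Lee told reporters: "We will win".',
    'Bob Ray told reporters: "Taxes are too high".',
]
pairs, patterns = bootstrap(sentences, {'"$Q", said $S.'}, rounds=2)
```

In this toy corpus, the seed pattern only matches the first sentence. The repeated quotation in the second sentence yields a new pattern, which in the next round recovers the Bob Ray quotation that the seed alone would miss.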
Gene Regulatory Compatibility in Bacteria: Consequences for Synthetic Biology and Evolution
Mechanistic understanding of gene regulation is crucial for rational engineering of new genetic systems through synthetic biology. Genetic engineering efforts in new organisms are often hampered by a lack of knowledge about how regulatory components function in new host contexts. This dissertation focuses on efforts to overcome these challenges through the development of generalizable experimental methods for studying the behavior of DNA regulatory sequences in diverse species at large scale.
Chapter 2 describes experimental approaches for quantitatively assessing the functions of thousands of diverse natural regulatory sequences through a combination of metagenomic mining, high-throughput DNA synthesis and deep sequencing. By employing these methods in three distinct bacterial species, we revealed striking functional differences in gene regulatory capacity. We identified regulatory sequences with activity levels spanning several orders of magnitude, which will aid in efforts to engineer diverse bacterial species. We also demonstrate functional species-selective gene circuits with programmable host behaviors that may be useful for microbial community engineering. In Chapter 3 we provide evidence for the evolution of altered stringency in σ70-mediated transcriptional activation based on patterns of initiation and activity from promoters of diverse compositions. We show that the contrast in GC content between a regulatory element and the host genome dictates both the likelihood and the magnitude of expression. We also discuss the potential implications of this proposed mechanism on horizontal gene transfer.
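As a small illustration of the quantity involved, the GC contrast between a regulatory element and a host genome could be computed as below. The abstract does not define the contrast formally, so the signed difference used here is an assumption, as are the function names `gc_content` and `gc_contrast`.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq) / len(seq)

def gc_contrast(element, genome):
    """Assumed definition: element GC fraction minus host genome GC fraction."""
    return gc_content(element) - gc_content(genome)

# Toy sequences only; real inputs would be a promoter and a full genome.
contrast = gc_contrast("ATAT", "GGCC")
```

A negative value here means the element is more AT-rich than the host sequence it is compared against.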
The next two chapters focus on efforts aimed at extending the high-throughput methods described in earlier chapters to new organisms. Chapter 4 presents an in vitro approach for multiplexed gene expression profiling. Through the development and use of cell-free expression systems made from diverse bacteria, it was possible to rapidly acquire thousands of transcriptional measurements in small volume reactions, enabling comparisons of regulatory sequence function across multiple species. In Chapter 5 we characterize the restriction-modification system repertoires of several commensal bacterial species. We also describe ongoing efforts to develop methods for bypassing these systems in order to increase transformation efficiencies in species that are difficult or impossible to transform using current approaches.
The CbrB regulon: Promoter dissection reveals novel insights into the CbrAB expression network in Pseudomonas putida
CbrAB is a high-ranking global regulatory system exclusive to the pseudomonads that responds to carbon-limiting conditions. It has become necessary to define the particular regulon of CbrB and discriminate it from the downstream cascades through other regulatory components. We have performed in vivo binding analysis of CbrB in P. putida and determined that it directly controls the expression of at least 61 genes; 20% are involved in regulatory functions, including the previously identified CrcZ and CrcY small regulatory RNAs. The remaining are porins or transporters (20%), metabolic enzymes (16%), activities related to protein translation (5%) and ORFs of uncharacterised function (38%). Among the latter, we have selected the operon PP2810-13 to make an exhaustive analysis of the CbrB binding sequences, together with those of crcZ and crcY. We describe the implication of three independent non-palindromic subsites with variable spacing in three different targets (CrcZ, CrcY and operon PP2810-13) in CbrAB activation. CbrB is a quite peculiar σN-dependent activator since it is barely dependent on phosphorylation for transcriptional activation. With the depiction of the precise contacts of CbrB with the DNA, the analysis of the multimerisation status and its dependence on other factors such as RpoN or IHF, we propose a model of transcriptional activation.
Ministerio de Economía y Competitividad BIO2014-57545-