156 research outputs found

Compile-Time Query Optimization for Big Data Analytics

Author: Leonidas Fegaras
Publication venue: RonPub
Publication date: 01/01/2019

Many emerging programming environments for large-scale data analysis, such as Map-Reduce, Spark, and Flink, provide Scala-based APIs that consist of powerful higher-order operations that ease the development of complex data analysis applications. However, despite the simplicity of these APIs, many programmers prefer to use declarative languages, such as Hive and Spark SQL, to code their distributed applications. Unfortunately, most current data analysis query languages are based on the relational model and cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well-integrated with the host programming language, as they are based on an incompatible data model. To address these shortcomings, we introduce a new query language for data-intensive scalable computing that is deeply embedded in Scala, called DIQL, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile-time. In contrast to other query languages, our query embedding eliminates impedance mismatch as any Scala code can be seamlessly mixed with SQL-like syntax, without having to add any special declaration. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively. The DIQL query optimizer uses algebraic transformations to derive all possible joins in a query, including those hidden across deeply nested queries, thus unnesting nested queries of any form and any number of nesting levels. The optimizer also uses general transformations to push down predicates before joins and to prune unneeded data across operations. 
DIQL has been implemented on three Big Data platforms, Apache Spark, Apache Flink, and Twitter's Cascading/Scalding, and has been shown to have competitive performance relative to Spark DataFrames and Spark SQL for some complex queries. This paper extends our previous work on embedded data-intensive query languages by describing the complete details of the formal framework and the query translation and optimization processes, and by providing more experimental results that give further evidence of the performance of our system.
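The predicate push-down mentioned in this abstract can be illustrated with a minimal sketch. This is plain Python rather than DIQL or Scala, and the relations, the `join` helper, and the `high_paid` predicate are hypothetical names invented for the illustration. It shows only that filtering one input before an equi-join yields the same rows as filtering after it, which is the property such a rewrite relies on.

```python
# Hypothetical toy relations; not taken from the paper.
employees = [
    {"name": "Ann", "dept": 1, "salary": 120},
    {"name": "Bob", "dept": 2, "salary": 80},
    {"name": "Cy", "dept": 1, "salary": 95},
]
departments = [{"dept": 1, "title": "R&D"}, {"dept": 2, "title": "Sales"}]


def join(left, right, key):
    """Naive nested-loop equi-join on a shared key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def high_paid(row):
    """Predicate that only inspects an attribute of the employees relation."""
    return row["salary"] > 90


# Plan A: join first, then filter the joined rows.
plan_a = [row for row in join(employees, departments, "dept") if high_paid(row)]

# Plan B: push the predicate below the join, so fewer rows are joined.
plan_b = join([e for e in employees if high_paid(e)], departments, "dept")
```

Both plans produce the same rows here because the predicate refers only to columns of one join input.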

RonPub -- Research Online Publishing

Applicative Bidirectional Programming with Lenses

Author: Ellis T.
Fegaras L.
Hayashi Y.
Jaskelioff M.
Mac Lane S.
Matsuda K.
Mu S.-C.
Rajkumar R.
Reynolds J. C.
van Laarhoven T.
Yu Y.
Publication date: 01/08/2015

A bidirectional transformation is a pair of mappings between source and view data objects, one in each direction. When the view is modified, the source is updated accordingly with respect to some laws. One way to reduce the development and maintenance effort of bidirectional transformations is to have specialized languages in which the resulting programs are bidirectional by construction, giving rise to the paradigm of bidirectional programming. In this paper, we develop a framework for applicative-style and higher-order bidirectional programming, in which we can write bidirectional transformations as unidirectional programs in standard functional languages, opening up access to the bundle of language features previously only available to conventional unidirectional languages. Our framework essentially bridges two very different approaches to bidirectional programming, namely the lens framework and Voigtländer’s semantic bidirectionalization, creating a new programming style that combines the benefits of both.
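As a minimal sketch of the "pair of mappings" and "laws" this abstract refers to, a lens can be modelled as a `get` function from source to view and a `put` function that writes a modified view back into the source. The `Lens` class and the `first` lens below are hypothetical illustrations in Python, not the paper's framework. The two laws checked are the standard well-behavedness laws, commonly called GetPut and PutGet.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    """A pair of mappings: source -> view, and (source, new view) -> new source."""
    get: Callable[[Any], Any]
    put: Callable[[Any, Any], Any]


# Hypothetical lens whose view is the first component of a pair.
first = Lens(
    get=lambda source: source[0],
    put=lambda source, view: (view, source[1]),
)


def satisfies_get_put(lens, source):
    """GetPut: writing back an unmodified view leaves the source unchanged."""
    return lens.put(source, lens.get(source)) == source


def satisfies_put_get(lens, source, view):
    """PutGet: reading after a write returns exactly the view that was written."""
    return lens.get(lens.put(source, view)) == view
```

For example, `first.put((1, "a"), 42)` gives `(42, "a")`: the view changes while the rest of the source is preserved.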

Crossref

Kent Academic Repository

Explore Bristol Research

Translation of Array-based Loop Programs to Optimized SQL-based Distributed Programs

Author: Leonidas Fegaras
Md Hasanuzzaman Noor
Tanvir Ahmed Khan
Tanzima Sultana
Publication venue: RonPub
Publication date: 01/01/2022

Many data analysis programs are expressed in terms of array operations in sequential loops. However, these programs do not scale well to large amounts of data that cannot fit in the memory of a single computer, and they have to be rewritten to work on Big Data analysis platforms, such as Map-Reduce and Spark. We present a novel framework, called SQLgen, that automatically translates sequential loops on arrays to distributed data-parallel programs, specifically Spark SQL programs. We further extend this framework by introducing OSQLgen, which automatically parallelizes array-based loop programs to distributed data-parallel programs on block arrays. First, our framework translates the sequential loops on arrays to monoid comprehensions and then to Spark SQL. For SQLgen, the SQL is over coordinate arrays, while for OSQLgen it is over block arrays. As block arrays are more compact than coordinate arrays, computations on block matrices are significantly faster than on arrays in the coordinate format. Since not all array-based loops can be translated to SQL on block arrays, we focus on certain patterns of loops that match an algebraic structure known as a semiring. Many linear algebra operations, such as matrix multiplication required in many machine learning algorithms, as well as many graph programs that are equivalent to a semiring can be translated to distributed data-parallel programs on block arrays using OSQLgen, thus giving us a substantial performance gain. Finally, to evaluate our framework, we compare the performance of OSQLgen with GraphX, GraphFrames, MLlib, and hand-written Spark SQL programs on coordinate and block arrays on various real-world problems.
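To make the semiring pattern concrete, the sketch below multiplies two sparse matrices stored in coordinate format (a mapping from `(row, column)` to value) with the addition and multiplication operators passed in as parameters. It is plain single-machine Python, not SQLgen or OSQLgen output, and the function name `semiring_matmul` is an assumption made for the illustration. Swapping `(+, *)` for `(min, +)` turns the same loop into one step of a shortest-path computation, which is why graph programs can share the matrix-multiplication pattern.

```python
import operator


def semiring_matmul(a, b, add, mul):
    """Multiply coordinate-format matrices under a semiring (add, mul).

    a and b map (row, col) -> value; missing entries act as the semiring zero.
    """
    result = {}
    for (i, k), x in a.items():
        for (k2, j), y in b.items():
            if k == k2:  # join condition on the shared index
                term = mul(x, y)
                if (i, j) in result:
                    result[(i, j)] = add(result[(i, j)], term)
                else:
                    result[(i, j)] = term
    return result


# A 2x2 example matrix in coordinate format.
m = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}

# Ordinary arithmetic semiring: the usual matrix product.
product = semiring_matmul(m, m, operator.add, operator.mul)

# (min, +) semiring: cheapest two-hop path cost between each pair of nodes.
two_hop = semiring_matmul(m, m, min, operator.add)
```

The nested loop with the `k == k2` test is a join followed by a group-by aggregation, which is the shape a SQL translation of this loop would take.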

RonPub -- Research Online Publishing

How functional programming mattered

Author: Abelson
Armstrong
Armstrong
Arts
Arts
Axelsson
Baars
Backus
Bahr
Barthe
Bertot
Bird
Bird
Bird
Bird
Bird RS Moor
Blelloch
Blelloch
Bringert
Carette
Chakravarty
Chakravarty MMT Leshchinskiy
Chetali
Chin
Claessen
Claessen
Cole
Cole
de Moor
Dean
Devriese
Dijkstra
Dybvig
Elliott
Elliott
Epstein
Farmer
Fegaras
Felleisen
Ford
Gibbons
Gibbons
Gill
Halloway
Hammond
Hansen MR Rischel
Harris
Hinze
Hinze
Hu
Hu
Hu
Hu
Hu
Hudak
Hudak
Hudak
Hudak
Hudak
Hudak
Hughes
Hughes
Hughes
Hughes JM Bolinder
Hutton
Hutton
Jones
Katayama
Katayama
Kiselyov
Launchbury
Launchbury
Leijen
Leroy
Liang
Lindley
Loidl HW Rubio
Matsuzaki
Mcbride
Meijer
Meijer
Milner
Minsky
Minsky
Moggi
Moggi
Morita
Mu
Naiman
Norell
Odersky
Oliveira BCS Moors
Paterson
Paulson
Persson
Peyton Jones
Peyton Jones
Peyton Jones SL Wadler
Reynolds
Sagonas
Schrijvers
Sculthorpe
Seibel
Sheard
Skillicorn
Smith
Snyder
Steele
Steele
Svenningsson
Svenningsson
Swierstra
Swierstra SD Duponcheel
Takano
Takano
Tesson
Wadler
Wadler
Wadler
Wadler
Wampler
Yang
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

In 1989, when functional programming was still considered a niche topic, Hughes wrote a visionary paper arguing convincingly ‘why functional programming matters’. More than two decades have passed. Has functional programming really mattered? Our answer is a resounding ‘Yes!’. Functional programming is now at the forefront of a new generation of programming technologies, and enjoying increasing popularity and influence. In this paper, we review the impact of functional programming, focusing on how it has changed the way we may construct programs, the way we may verify programs, and fundamentally the way we may think about programs.

Crossref

Chalmers Research

Kent Academic Repository

Chalmers Publication Library

Explore Bristol Research

Transcriptional regulation of Elf-1: locus-wide analysis reveals four distinct promoters, a tissue-specific enhancer, control by PU.1 and the importance of Elf-1 downregulation for erythroid maturation

Author: Andrew D. Wood
Athanasiou
Bassuk
Berthold Göttgens
Bockamp
Brudno
Chan
Cheng
Chou
Clark
Davis
Dube
Fernando J. Calero-Nieto
Follows
Forsberg
Garrett-Sinha
Geng
Gottgens
Gottgens
Gottgens
Gottgens
Hegen
Huang
Jin
Josette-Renée Landry
Juang
Juang
Juang
Karantzoulis-Fegaras
Landry
Landry
Leiden
Mayor
Miranda-Saavedra
Moreau-Gachelin
Nicola K. Wilson
Nishiyama
Nottingham
Nowling
O'Reilly
Okuno
Pimanda
Rao
Rekhtman
Rozen
Sankaran
Sarah Kinston
Seth
Song
Suzuki
Svenson
Voo
Wadman
Wang
Wang
Wood
Zhang
Publication venue: Oxford University Press

Ets transcription factors play important roles during the development and maintenance of the haematopoietic system. One such factor, Elf-1 (E74-like factor 1) controls the expression of multiple essential haematopoietic regulators including Scl/Tal1, Lmo2 and PU.1. However, to integrate Elf-1 into the wider regulatory hierarchies controlling haematopoietic development and differentiation, regulatory elements as well as upstream regulators of Elf-1 need to be identified. Here, we have used locus-wide comparative genomic analysis coupled with chromatin immunoprecipitation (ChIP-chip) assays which resulted in the identification of five distinct regulatory regions directing expression of Elf-1. Further, ChIP-chip assays followed by functional validation demonstrated that the key haematopoietic transcription factor PU.1 is a major upstream regulator of Elf-1. Finally, overexpression studies in a well-characterized erythroid differentiation assay from primary murine fetal liver cells demonstrated that Elf-1 downregulation is necessary for terminal erythroid differentiation. Given the known activation of PU.1 by Elf-1 and our newly identified reciprocal activation of Elf-1 by PU.1, identification of an inhibitory role for Elf-1 has significant implications for our understanding of how PU.1 controls myeloid–erythroid differentiation. Our findings therefore not only represent the first report of Elf-1 regulation but also enhance our understanding of the wider regulatory networks that control haematopoiesis.

Crossref

PubMed Central

A novel G-quadruplex-forming GGA repeat region in the c-myb promoter is a critical regulator of promoter activity

Author: Arimondo
Baldrich
Barak
Bellon
Bender
Bossone
Chamboredon
Cogoi
Daheron
De Armond
DesJardins
Dexheimer
Diana J. Uribe
Freyer
Giraldo
Gonda
Griffin
Guerra
Hafner
Hammond-Kosack
Heckman
Izzo
Ji
Jordan-Sciutto
Karantzoulis-Fegaras
Keniry
Komatsu
Laurence H. Hurley
Lew
Luger
Matsugami
Matsugami
Matsugami
Matsugami
McCann
Michelotti
Mohaghegh
Neidle
Nicolaides
Nirula
Oh
Parks
Parks
Perrotti
Pyrc
Ramsay
Regan M. Memmott
Sakatsume
Salas
Scot W. Ebbinghaus
Scott
Siddiqui-Jain
Slamon
Song
Song
Sullivan
SunMi L. Palumbo
Takimoto
Todokoro
Watson
Watson
Williams
Yang
Yi
Yuan
Yulia Krotova-Khan
Zaug
Publication venue: Oxford University Press

The c-myb promoter contains multiple GGA repeats beginning 17 bp downstream of the transcription initiation site. GGA repeats have been previously shown to form unusual DNA structures in solution. Results from chemical footprinting, circular dichroism and RNA and DNA polymerase arrest assays on oligonucleotides representing the GGA repeat region of the c-myb promoter demonstrate that the element is able to form tetrad:heptad:heptad:tetrad (T:H:H:T) G-quadruplex structures by stacking two tetrad:heptad G-quadruplexes formed by two of the three (GGA)4 repeats. Deletion of one or two (GGA)4 motifs destabilizes this secondary structure and increases c-myb promoter activity, indicating that the G-quadruplexes formed in the c-myb GGA repeat region may act as a negative regulator of the c-myb promoter. Complete deletion of the c-myb GGA repeat region abolishes c-myb promoter activity, indicating dual roles of the c-myb GGA repeat element as both a transcriptional repressor and an activator. Furthermore, we demonstrated that Myc-associated zinc finger protein (MAZ) represses c-myb promoter activity and binds to the c-myb T:H:H:T G-quadruplexes. Our findings show that the T:H:H:T G-quadruplex-forming region in the c-myb promoter is a critical cis-acting element and may repress c-myb promoter activity through MAZ interaction with G-quadruplexes in the c-myb promoter.

Crossref

PubMed Central

EZH2 modulates angiogenesis in vitro and in a mouse model of limb ischemia

Author: Aicher
Andrea Caporali
Apostolou
Caporali
Caporali
Consortium
Costanza Emanueli
Crosson
Delgado-Olguín
Donovan
Dragneva
Dreger
Emanueli
ENCODE Project Consortium
Ezhkova
Ferrari
Fish
Fulton
Gianni D Angelini
Granger
Greer
Grochot-Przeczek
Grzenda
Han
He
He
Ilaria Floris
Illi
Johnson
Karantzoulis-Fegaras
Karolchik
Kermani
Kottakis
Lamalice
Lopez
Madeddu
Marco Meloni
Margueron
Mathelier
Matouk
Meuchel
Micol Marchetti
Min
Modarresi
Ohtani
Ohtani
Pasini
Pruunsild
Raul Urrutia
Searles
Shi
Silvestre
Tan
Tijana Mitić
Yoo
Young
Yu
Zhu
Publication venue: Springer Science and Business Media LLC
Publication date: 07/10/2014

Epigenetic mechanisms may regulate the expression of pro-angiogenic genes, thus affecting reparative angiogenesis in ischemic limbs. The enhancer of zeste homolog-2 (EZH2) induces the trimethylation of lysine 27 on histone H3 (H3K27me3), which represses gene transcription. We explored (i) if EZH2 expression is regulated by hypoxia and ischemia; (ii) the impact of EZH2 on the expression of two pro-angiogenic genes: eNOS and BDNF; (iii) the functional effect of EZH2 inhibition on cultured endothelial cells (ECs); (iv) the therapeutic potential of EZH2 inhibition in a mouse model of limb ischemia (LI). EZH2 expression was increased in cultured ECs exposed to hypoxia (control: normoxia) and in ECs extracted from mouse ischemic limb muscles (control: absence of ischemia). EZH2 increased the H3K27me3 abundance on regulatory regions of eNOS and BDNF promoters. In vitro RNA silencing or pharmacological inhibition by 3-deazaneplanocin (DZNep) of EZH2 increased eNOS and BDNF mRNA and protein levels and enhanced functional capacities (migration, angiogenesis) of ECs under either normoxia or hypoxia. In mice with experimentally induced LI, DZNep increased angiogenesis in ischemic muscles, the circulating levels of pro-angiogenic hematopoietic cells and blood flow recovery. Targeting EZH2 for inhibition may open new therapeutic avenues for patients with limb ischemia.

Crossref

PubMed Central

Edinburgh Research Explorer

Enlighten

Explore Bristol Research

Using the Parametricity Theorem for Program Fusion

Author: Leonidas Fegaras

Program fusion techniques have long been proposed as an effective means of improving program performance and of eliminating unnecessary intermediate data structures. This paper proposes a new approach to program fusion that is based entirely on the type signatures of programs. First, for each function, a recursive skeleton is extracted that captures its pattern of recursion. Then, the parametricity theorem of this skeleton is derived, which provides a rule for fusing this function with any function. This method generalizes other approaches that use fixed parametricity theorems to fuse programs.

1 Introduction

There has been much recent work on using higher-order operators, such as fold [11] and build [8, 5], to automate program fusion [2] and deforestation [13]. Even though these methods do a good job on fusing programs, they are only effective if programs are expressed in terms of these operators. This limits their applicability to conventional functional languages. To ameliorate this pr..
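What "eliminating unnecessary intermediate data structures" means can be seen in a small hand-fused example. This is only an illustration in Python of the effect of fusion on a fold: it does not implement the parametricity-based method the paper proposes, and both function names are hypothetical.

```python
from functools import reduce


def sum_of_squares_unfused(xs):
    """Two passes: build an intermediate list of squares, then fold over it."""
    squares = [x * x for x in xs]  # intermediate data structure
    return reduce(lambda acc, y: acc + y, squares, 0)


def sum_of_squares_fused(xs):
    """One pass: the squaring step is merged into the fold's combining function."""
    return reduce(lambda acc, x: acc + x * x, xs, 0)
```

Both functions return the same value for any finite sequence of numbers; the fused version never materializes the list of squares.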

CiteSeerX

Supporting Bulk Synchronous Parallelism in Map-Reduce Queries

Author: Leonidas Fegaras
Publication date: 21/11/2012

One of the major drawbacks of the Map-Reduce (MR) model is that, to simplify reliability and fault tolerance, it does not preserve data in memory across consecutive MR jobs: an MR job must dump its data to the distributed file system before they can be read by the next MR job. This restriction imposes a high overhead on complex MR workflows and graph algorithms, such as PageRank, which require repetitive MR jobs. The Bulk Synchronous Parallelism (BSP) programming model, on the other hand, has been recently advocated as an alternative to the MR model that does not suffer from this restriction, and, under certain circumstances, allows complex repetitive algorithms to run entirely in the collective memory of a cluster. We present a framework for translating complex declarative queries for scientific and graph data analysis applications to both MR and BSP evaluation plans, leaving the choice to be made at run-time based on the available resources. If the resources are sufficient, the query will be evaluated entirely in memory based on the BSP model; otherwise, the same query will be evaluated based on the MR model.

CiteSeerX

Crossref