57 research outputs found
Elixir: synthesis of parallel irregular algorithms
Algorithms in new application areas like machine learning and data analytics usually operate on unstructured sparse graphs. Writing efficient parallel code to implement these algorithms is very challenging for a number of reasons.
First, there may be many algorithms to solve a problem, and each algorithm may have many implementations. Second, synchronization, which is necessary for correct parallel execution, introduces potential problems such as data races and deadlocks. These issues interact in subtle ways, making the best solution dependent both on the parallel platform and on properties of the input graph. Consequently, implementing and selecting the best parallel solution can be a daunting task for non-experts, since we have few models for predicting the performance of parallel sparse graph programs on parallel hardware.
This dissertation presents a synthesis methodology and a system, Elixir, that addresses these problems by (i) allowing programmers to specify solutions at a high level of abstraction, and (ii) generating many parallel implementations automatically and using search to find the best one. An Elixir specification consists of a set of operators capturing the main algorithm logic and a schedule specifying how to efficiently apply the operators. Elixir employs sophisticated automated reasoning to merge these two components, and uses techniques based on automated planning to insert synchronization and synthesize efficient parallel code.
Experimental evaluation of our approach demonstrates that the performance of the Elixir-generated code is competitive with, and can even outperform, hand-optimized code written by expert programmers for many interesting graph benchmarks.
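To make the operator/schedule split concrete, here is a minimal sketch in Python rather than Elixir's own specification language: a worklist single-source shortest-paths computation where the operator carries the algorithm logic and an interchangeable schedule decides the application order. The graph and all names are illustrative assumptions, not Elixir's actual syntax or synthesis output.

```python
import heapq

GRAPH = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}  # node -> [(succ, weight)]

def relax_edge(dist, u, v, w):
    """Operator: one shortest-path edge relaxation."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True                    # v changed, so it must be rescheduled
    return False

def sssp(source, schedule="priority"):
    """Schedule: 'priority' behaves like Dijkstra, 'fifo' like Bellman-Ford;
    the choice affects efficiency, not the final distances."""
    dist = {n: float("inf") for n in GRAPH}
    dist[source] = 0
    work = [(0, source)]
    while work:
        if schedule == "priority":
            _, u = heapq.heappop(work)
        else:                          # FIFO order
            _, u = work.pop(0)
        for v, w in GRAPH[u]:
            if relax_edge(dist, u, v, w):
                if schedule == "priority":
                    heapq.heappush(work, (dist[v], v))
                else:
                    work.append((dist[v], v))
    return dist

print(sssp("a"))                       # {'a': 0, 'b': 1, 'c': 2}
```

Swapping the schedule changes only performance characteristics, never the result, which is the kind of implementation space the abstract describes Elixir searching over.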
A Survey on Automated Program Repair Techniques
With the rapid development and large-scale adoption of software, modern society increasingly relies on software systems, and the problems exposed by software have come to the fore: software defects have become an important factor troubling developers. In this context, Automated Program Repair (APR) techniques have emerged, aiming to fix software defects automatically and to reduce manual debugging work. In particular, benefiting from advances in deep learning, numerous learning-based APR techniques have emerged in recent years, bringing new opportunities for APR research. To give researchers a quick overview of the complete development of APR techniques and of future opportunities, we revisit the evolution of APR techniques and discuss the latest advances in APR research in depth. In this paper, the development of APR techniques is introduced in terms of four patch generation schemes: search-based, constraint-based, template-based, and learning-based. Moreover, we propose a uniform set of criteria to review and compare APR tools, summarize the advantages and disadvantages of APR techniques, and discuss the current state of APR development. Furthermore, we introduce research in the technical areas related to APR that has also provided strong motivation to advance APR development. Finally, we analyze current challenges and future directions, especially highlighting the critical opportunities that large language models bring to APR research.
Comment: This paper's earlier version was submitted to CSUR in August 202
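As a concrete illustration of the oldest of these four schemes, here is a minimal generate-and-validate loop in the search-based style. The buggy function, the mutation space, and the test suite are toy assumptions, not any tool surveyed in the paper.

```python
import itertools

BUGGY_SRC = "def clamp(x, lo, hi):\n    return min(lo, min(x, hi))\n"  # first min should be max

TESTS = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]  # (args, expected)

def candidate_patches(src):
    """Mutation space: flip each 'min' call site to 'max' independently."""
    sites = [i for i in range(len(src)) if src.startswith("min", i)]
    for flips in itertools.product([False, True], repeat=len(sites)):
        patched = src
        for site, flip in zip(sites, flips):
            if flip:                    # 'min' and 'max' have equal length,
                patched = patched[:site] + "max" + patched[site + 3:]
        yield patched                   # so site offsets stay valid

def passes_tests(src):
    env = {}
    exec(src, env)                      # compile and load the candidate
    return all(env["clamp"](*args) == want for args, want in TESTS)

# Keep the first candidate patch that makes the whole suite pass.
print(next(p for p in candidate_patches(BUGGY_SRC) if passes_tests(p)))
```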
Programming Languages and Systems
This open access book constitutes the proceedings of the 29th European Symposium on Programming, ESOP 2020, which was planned to take place in Dublin, Ireland, in April 2020, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The actual ETAPS 2020 meeting was postponed due to the COVID-19 pandemic. The papers deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.
Reversible Computation: Extending Horizons of Computing
This open access State-of-the-Art Survey presents the main recent scientific outcomes in the area of reversible computation, focusing on those that emerged during COST Action IC1405 "Reversible Computation - Extending Horizons of Computing", a European research network that operated from May 2015 to April 2019. Reversible computation is a new paradigm that extends the traditional forwards-only mode of computation with the ability to execute in reverse, so that computation can run backwards as easily and naturally as forwards. It aims to deliver novel computing devices and software, and to enhance existing systems by equipping them with reversibility. There are many potential applications of reversible computation, including languages and software tools for reliable and recovery-oriented distributed systems and revolutionary reversible logic gates and circuits, but these can only be realized and have lasting effect if firm conceptual and theoretical foundations are established first.
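A minimal sketch of the forwards/backwards execution idea, using a toy register machine in which every instruction has a syntactic inverse. The instruction set and encoding are illustrative assumptions, not a construct from the survey.

```python
# Each instruction is self-inverse or has a named inverse, so a program can be
# run backwards by inverting its instructions and reversing their order.
INVERSE = {"ADD": "SUB", "SUB": "ADD", "SWAP": "SWAP"}

def step(regs, op, a, b):
    if op == "ADD":
        regs[a] += regs[b]
    elif op == "SUB":
        regs[a] -= regs[b]
    elif op == "SWAP":
        regs[a], regs[b] = regs[b], regs[a]

def run(regs, program, backwards=False):
    insns = ([(INVERSE[op], a, b) for op, a, b in reversed(program)]
             if backwards else program)
    for op, a, b in insns:
        step(regs, op, a, b)
    return regs

prog = [("ADD", "x", "y"), ("SWAP", "x", "y")]
state = run({"x": 1, "y": 2}, prog)        # forwards:  {'x': 2, 'y': 3}
print(run(state, prog, backwards=True))    # backwards: {'x': 1, 'y': 2}
```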
The Posthuman Reality of Feed-Based Social Media Systems
The conceptual boundary between the subject and the user parallels the boundary between humanist and posthumanist definitions of the human being, and the challenges of new media communications technology today impel this evolution. My dissertation discusses subjectivity as the self-differentiation of a particular set of processes, and the influence of communications media upon this process. Here, it includes the basis of differentiation for an I, including the question of identity, potential agency, and knowledge. The collage of attributes that constitutes a portrait of what I call the user, the subject of online social media, is demonstrably emergent, dispersed, and discursive; in terms of agency and sovereignty, the user, as with other instances of posthuman subjectivity, is contingent upon its media ecology and is decidedly less free than other definitions of subjectivity (such as the self-sovereign individual of the social contract, which comes to be as a negation of contingency). The concept of self-sovereignty excludes the influences of history, and other influences upon the emergence of the subject, emphasizing an exclusively internal causation. The user's existence, conversely, is processual and dispersed throughout networks; its being and agency are dividual, not individual. The subjectivity of the user must thus be thought in terms of its mediated contingency, as the self-sovereign agency characteristic of humanist traditions is less applicable to today's media ecologies. I argue that the traits of the subject in humanist traditions can be interpreted as the epiphenomena of societies whose information ecology was dominated by logocentric, typographic literacy. Today, with the advent of social media and its users, we can understand from a new vantage how subjectivities are modulated, amplified, and attenuated by technical distributions, particularly the unseen (and unseeable) non-human agents in the computation systems that constitute online social networks.
Learning to Edit Code : Towards Building General Purpose Models for Source Code Editing
The way software developers edit code day-to-day tends to be repetitive, often reusing existing code elements. Many researchers have tried to automate this repetitive code editing process by mining specific change templates, but such templates are typically implemented manually for automated application. Consequently, template-based automated code editing is very tedious to implement. In addition, it is often narrowly scoped and poorly tolerant of noise. Machine learning, especially deep learning-based techniques, could help solve these problems thanks to its generalization ability and noise tolerance.
The advancement of deep neural networks and the availability of vast open-source evolutionary data open up the possibility of automatically learning those templates from the wild and applying them in the appropriate context. However, deep neural network-based modeling of code changes, and of code in general, introduces specific problems that need attention from the research community. For instance, source code exhibits strictly defined syntax and semantics inherited from the properties of the Programming Language (PL). In addition, the source code vocabulary (the possible number of tokens) can be arbitrarily large.
This dissertation formulates automated code editing as a multi-modal translation problem: given a piece of code, its context, and some guidance, the objective is to generate the edited code. In particular, we divide the problem into two sub-problems, source code understanding and generation. We empirically show that deep neural networks (models in general) for these problems should be aware of PL properties (i.e., syntax and semantics). This dissertation investigates two primary directions for endowing models with knowledge of PL properties: (i) explicit encoding, where we design models catering to a specific property, and (ii) implicit encoding, where we train a very large model to learn these properties from a very large corpus of source code in an unsupervised way.
With explicit encoding, we custom-design the model to cater to a specific property. As an example of such models, we developed CODIT, a tree-based neural model for syntactic correctness. We design CODIT around the Context-Free Grammar (CFG) of the programming language. Instead of generating source code directly, CODIT first generates the tree structure by sampling production rules from the CFG. Such a mechanism prohibits the selection of infeasible production rules. In a later stage, CODIT generates the edited code conditioned on the tree generated earlier. Such conditioning makes the edited code syntactically correct. CODIT showed promise in learning code edit patterns in the wild and proved effective in automatic program repair. In another empirical study, we showed that a graph-based model is better suited for source code understanding tasks such as vulnerability detection.
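A minimal sketch of this grammar-constrained first stage, assuming a toy CFG and a uniform stand-in for the learned rule scorer; this is not CODIT's actual grammar, model, or code.

```python
import random

# Toy CFG: nonterminal -> candidate right-hand sides. Anything not listed
# as a key is a terminal token.
GRAMMAR = {
    "stmt": [["expr", ";"], ["if", "(", "expr", ")", "stmt"]],
    "expr": [["ID"], ["ID", "+", "expr"], ["ID", "==", "expr"]],
}

def sample_tree(symbol, depth=0, max_depth=8):
    """Expand `symbol` by choosing only productions the CFG allows, so every
    generated tree is syntactically well-formed by construction."""
    if symbol not in GRAMMAR:
        return symbol                          # terminal token
    rules = GRAMMAR[symbol]
    # A trained model would score the candidate rules; uniform random choice
    # stands in for it here. Past the depth cap, take the first (shortest)
    # rule to guarantee termination.
    rhs = rules[0] if depth >= max_depth else random.choice(rules)
    return (symbol, [sample_tree(s, depth + 1, max_depth) for s in rhs])

print(sample_tree("stmt"))
```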
On the other hand, with implicit encoding, we use a very large (several hundred million parameters) yet generic model. We pre-train these models on a super-large (usually hundreds of gigabytes) collection of source code and code metadata. We empirically show that, if sufficiently pre-trained, such models are capable of learning PL properties such as syntax and semantics. In this dissertation, we developed two such pre-trained models with two different learning objectives. First, we developed PLBART, the first pre-trained encoder-decoder model for source code, and show that such pre-training enables the model to generate syntactically and semantically correct code. Further, we present an in-depth empirical study of using PLBART in automated code editing. Finally, we develop another pre-trained model, NatGen, to encode into the model the natural coding conventions followed by developers. To design NatGen, we first deliberately modify code from the developer-written version while preserving the original semantics. We call such rewrites 'de-naturalizing' transformations. Following previous studies on induced unnaturalness in code, we defined several such transformations and applied them to developer-written code. We pre-train NatGen to reverse their effect; in that way, NatGen learns to generate code similar to what developers write by undoing the unnaturalness induced by our forceful 'de-naturalizing' transformations. NatGen performs well in code editing and other source code generation tasks.
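To make 'de-naturalizing' concrete, here is a minimal sketch of one semantics-preserving, style-degrading rewrite of the kind described above. The specific transformation and class name are illustrative assumptions; the dissertation defines its own set.

```python
import ast

class DeNaturalize(ast.NodeTransformer):
    """Rewrite `x += e` into the semantically equal but less idiomatic
    `x = x + e`; a model like NatGen is trained to undo such rewrites."""
    def visit_AugAssign(self, node):
        if isinstance(node.op, ast.Add) and isinstance(node.target, ast.Name):
            return ast.copy_location(
                ast.Assign(
                    targets=[node.target],
                    value=ast.BinOp(
                        left=ast.Name(id=node.target.id, ctx=ast.Load()),
                        op=ast.Add(),
                        right=node.value,
                    ),
                ),
                node,
            )
        return node

tree = DeNaturalize().visit(ast.parse("total += price"))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))    # -> total = total + price
```

Pre-training then pairs the transformed code (input) with the original developer-written code (target), so the model learns to restore naturalness.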
The models and empirical studies produced for this dissertation go beyond the scope of automated code editing and apply to other software engineering automation problems such as code translation, code summarization, code generation, vulnerability detection, and clone detection. Thus, we believe this dissertation will influence and contribute to the advancement of AI4SE and PLP.
A risk mitigation framework for construction / asset management of real estate and infrastructure projects
The increasing demand for residential, office, retail, and service buildings, as well as hotels and recreation facilities, has been encouraging investors from both the private and public sectors to develop new communities and cities that meet this mixed demand in one location. Such projects are huge in size, include several diversified functions, and are usually implemented over many years. The master schedules of real estate projects are usually initiated at an early stage of development, and the decision to start investing in infrastructure systems that can ultimately serve a fully occupied community or city is usually taken during this early stage. This applies to all services, such as water, electricity, sewage, telecom, natural gas, roads, urban landscape, and cooling and heating. Following the feasibility phase and its generated implementation schedule, construction of the infrastructure system starts together with a number of real estate projects from different portfolios (retail, residential, commercial, etc.). Development of the remaining real estate projects continues in parallel with customer occupancy of the completed ones. The occurrence of unforeseen risk events after the construction of the infrastructure system is complete may force decision makers to relax the implementation of the remaining unconstructed projects within their developed communities, postponing those projects while keeping the original feasibility-based sequence unchanged. Decision makers may also change the sequence of implementation, prioritizing a certain portfolio or location zone over others depending on changes in market demand. Such changes may adversely impact the profit planned in the original feasibility study, whether that profit is generated by the real estate portfolios and/or their serving infrastructure system. The negative impact may arise from delayed occupancy of the completed real estate projects, which reduces service demand and ultimately leaves the early-implemented infrastructure system underutilized. This research aims to develop a dynamic decision support prototype system that quantifies the impact of unforeseen risks on the profitability of real estate projects and their infrastructure system when projects' implementation schedules change. It also aims to support decision makers with a scheduled portfolio mix that maximizes the Expected Gross Profit (EGP) of real estate projects and their infrastructure system. The provided schedules can be based on either location zone or portfolio type, to meet certain market conditions or to respect implementation constraints between neighboring projects. To achieve these objectives, a Risk Impact Mitigation (RIM) decision support system is developed. RIM consists of four models: the Real Estate Scheduling Optimization Model (RESOM), the Sustainable Landscape Optimization Model (SLOM), the District Cooling Optimization Model (DCOM), and the Water Simulation Optimization Model (WSOM). Integrated with the three specialized infrastructure models (SLOM, DCOM, and WSOM), RESOM provides EGP values for the individual infrastructure systems. The three infrastructure models provide the demand profiles corresponding to a RESOM-generated implementation schedule; RESOM then uses these profiles to calculate profits from the projects' capital expenditure and financial expenses.
The three infrastructure models included in this research (SLOM, DCOM, and WSOM) address the urban landscape, district cooling, and water systems, respectively. RIM is applied to a large-scale real estate development in Egypt that was subjected to difficult political and financial circumstances not forecast when the original feasibility studies were prepared. RIM is validated through a questionnaire distributed to 31 experts of different academic and professional backgrounds. RIM's models produced the expected results for the different real-life cases the experts tested as part of the validation process. The validation indicated that RIM's results are consistent and in compliance with expected results, and that the system is useful and novel in supporting real estate decision makers in mitigating risk impacts on their profits. The validation also indicated promising benefits and a potential need for a commercial version to be developed for future application within the industry.
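A minimal sketch of the RESOM-style coupling described above, reduced to a toy exhaustive search over implementation orders. The projects, revenue rates, and cost figures are invented placeholders, not the dissertation's models.

```python
from itertools import permutations

# Toy stand-ins: per-period revenue rates for three projects and a flat
# capital/financing cost per scheduled period.
RATE = {"retail_A": 90, "residential_B": 60, "commercial_C": 120}
COST_PER_PERIOD = 100

def expected_gross_profit(schedule):
    """Toy EGP: a project finished in slot k earns its rate for the remaining
    periods of the horizon (standing in for the demand profiles the
    infrastructure models would supply); costs scale with the horizon."""
    horizon = len(schedule)
    revenue = sum(RATE[p] * (horizon - slot) for slot, p in enumerate(schedule))
    return revenue - COST_PER_PERIOD * horizon

best = max(permutations(RATE), key=expected_gross_profit)
print(best, expected_gross_profit(best))   # highest-rate project scheduled first
```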