8 research outputs found

    Visual data flow programming languages: challenges and opportunities


    Proposed Unified Visual Programming Language

    Over time, computer scientists have edited their programs by keypunching, then by typing, and more recently by typing combined with pointing and clicking. The need to improve the editing of a computer program is driven mainly by the necessity to reduce errors and increase productivity. Furthermore, facilitating the editing of a computer program can make it easier for novices to learn programming. The advent of visual programming languages (VPLs) offers an opportunity to considerably reduce editing errors, and also to provide a great tool for non-programmers. For instance, the programmer is no longer required to remember the syntax or constructs of a language in order to type it; instead, he/she builds up a program by pointing at and clicking on visual objects, perhaps combined with typing. This can lead to less error-prone programs and higher productivity, since the programmer types less. There are still many challenges to surmount to increase the usability and intuitiveness of VPLs, especially for large-scale general-purpose programming. One of the main problems in VPLs is that of achieving scalability. This study aims to contribute to the area of visual programming languages by proposing the framework for a visual programming language that unifies the best features of four existing VPLs. This unified VPL, or UVPL, is designed to better achieve scalability, which is generally lacking in current VPLs. The study also provides a short survey of the four selected VPLs, and a set of metrics to evaluate visual programs.
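The abstract does not spell out the proposed metrics, so the following is only a hedged sketch of one plausible family: size and wiring measures computed over a node-and-wire representation of a visual program. All names here are illustrative, not taken from the study.

def metrics(nodes, edges):
    # Fan-in/fan-out per node: a rough proxy for local comprehension load.
    fan_in = {n: 0 for n in nodes}
    fan_out = {n: 0 for n in nodes}
    for src, dst in edges:
        fan_out[src] += 1
        fan_in[dst] += 1
    return {
        "size": len(nodes),                      # analogous to lines of code
        "wiring": len(edges),                    # proxy for visual clutter
        "max_fan_in": max(fan_in.values(), default=0),
        "max_fan_out": max(fan_out.values(), default=0),
    }

print(metrics(
    nodes=["read", "filter", "sum", "print"],
    edges=[("read", "filter"), ("filter", "sum"), ("sum", "print")],
))  # {'size': 4, 'wiring': 3, 'max_fan_in': 1, 'max_fan_out': 1}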

    Exploring Visual Programming Concepts for Probabilistic Programming Languages

    Probabilistic programming is a way to create systems that help us make decisions in the face of uncertainty. Many everyday decisions involve judgment about relevant factors that we do not directly observe. Historically, one way to help make decisions under uncertainty has been to use a probabilistic reasoning system, which combines our knowledge of a situation with the laws of probability to determine the unobserved factors that are critical to the decision. Typically, the several observations are combined through Bayesian statistics, under whose interpretation existing knowledge (priors) is combined with observations in order to gather evidence towards competing hypotheses.

Compared with other machine learning methods (such as random forests, neural networks or linear regression), which take homogeneous data as input (requiring the user to separate their domain into different models), probabilistic programming can leverage the data's original structure. Moreover, it provides full probability distributions over both the predictions and the parameters of the model, whereas those ML methods can only give the user a certain degree of confidence in the predictions.

Until recently, probabilistic reasoning systems have been limited in scope and hard to apply to many real-world situations. Models are communicated using a mix of natural language, pseudocode and mathematical formulae, and solved using special-purpose, one-off inference methods. Rather than precise specifications suitable for automatic inference, graphical models typically serve as coarse, high-level descriptions, eliding critical aspects such as fine-grained independence, abstraction and recursion.

Probabilistic programming is a new approach that makes probabilistic reasoning systems easier to build and more widely applicable. A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models, in such a way that the program itself is the model, and to then perform inference in those models. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science and natural language communities. By empowering users with a common dialect in the form of a programming language, rather than requiring each of them to undertake the non-trivial and error-prone task of writing their own models and hand-tailored inference algorithms for the problem at hand, probabilistic programming encourages exploration, since different models take less time to set up and evaluate, and enables sharing knowledge in the form of best practices, patterns and tools such as optimized compilers or interpreters, debuggers, IDEs, optimizers and profilers.

PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible. One can easily see this by looking at the reusable components PPLs offer, one of them being the inference engine, which can be plugged into different models. For instance, it is easy to replace traditional exact Bayesian network inference, which requires time exponential in the number of variables, with approximation algorithms such as Markov chain Monte Carlo (MCMC) or Variational Message Passing (VMP), which make it possible to compute large hierarchical models by resorting to sampling and approximation.
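To make the "program is the model" idea concrete, here is a minimal, self-contained sketch (not taken from the dissertation): the model is ordinary code computing a log-posterior for a coin's bias under a uniform prior, and a generic Metropolis-Hastings kernel, the simplest MCMC algorithm, performs inference over it.

import math, random

data = [1, 1, 0, 1, 1, 1, 0, 1]                 # observed coin flips

def log_posterior(theta):
    # Prior: Uniform(0, 1); likelihood: independent Bernoulli(theta) flips.
    if not 0.0 < theta < 1.0:
        return -math.inf
    return sum(math.log(theta if x else 1.0 - theta) for x in data)

def metropolis(log_p, x0, steps=20000, scale=0.1):
    x, lp, samples = x0, log_p(x0), []
    for _ in range(steps):
        proposal = x + random.gauss(0.0, scale)  # symmetric random-walk proposal
        lp_prop = log_p(proposal)
        if random.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop            # accept the move
        samples.append(x)
    return samples

samples = metropolis(log_posterior, 0.5)
print(sum(samples) / len(samples))               # approximate posterior mean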
PPLs are often built on a base language (i.e., they are embedded in a host language like R, Java or Scala), although some PPLs, such as WinBUGS and Stan, offer a self-contained language with no obvious origin in another language.

There have been successful applications of visual programming in several domains, including education (MIT's Scratch and Microsoft's VPL), general-purpose programming (NoFlo), 3D modeling (Blender) and data science (RapidMiner and Weka Knowledge Flow). The latter, being popular products, have shown that there is added value in providing a graphical representation for working with data. However, as of today no tool provides a graphical representation for a PPL.

DARPA, the main funder of PPL research, considers one of the key points of its Probabilistic Programming for Advancing Machine Learning program to be making models easier to write (reducing development time, encouraging experimentation and reducing the level of expertise required to develop such models). Visual programming is well suited to these objectives, so, building on the enormous flexibility of PPLs and the advantages of probabilistic models, we want to take advantage of the graphical intuition given by the data visualization that data scientists are now accustomed to, and attempt to provide model and algorithm visualization by rethinking how to capture the (usually textual) programmatic formalisms in a graphical manner.

The goal of this dissertation is thus to explore graphical representations of a probabilistic programming language through node-based programming. The hypothesis under consideration is that graphical representations (not to be confused with Bayesian graphical models) are more intuitive and easier to learn than full-blown PPLs. We intend to validate this hypothesis by ensuring that classical problems solved in the literature with PPLs are also supported by our graphical representation, and then by measuring how quickly a group of people trained in statistics can produce a viable model in each alternative.
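As one way to picture such a node-based representation, here is a hedged sketch (the node and field names are hypothetical, not the dissertation's design): a model held as a graph of distribution nodes, with wires written as "@node" references, compiled down to one line of textual PPL per node.

model = {
    "bias":  {"dist": "Beta", "args": {"alpha": 1, "beta": 1}},
    "flips": {"dist": "Bernoulli", "args": {"p": "@bias"},   # wire from 'bias'
              "observed": [1, 1, 0, 1]},
}

def to_pseudo_ppl(model):
    # Walk the node graph and emit a line of textual pseudo-PPL per node.
    lines = []
    for name, node in model.items():
        args = ", ".join(f"{k}={v}" for k, v in node["args"].items())
        verb = "observe" if "observed" in node else "sample"
        lines.append(f"{name} ~ {verb} {node['dist']}({args})")
    return "\n".join(lines)

print(to_pseudo_ppl(model))
# bias ~ sample Beta(alpha=1, beta=1)
# flips ~ observe Bernoulli(p=@bias)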

    Iteration constructs in data-flow visual programming languages

    Many visual programming languages (VPLs) rely on the data-flow paradigm, probably because of its simple and intuitive functioning mechanism. However, there are cases where more powerful programming constructs are needed to deal with complex problems. For example, iteration is undoubtedly an important aspect of programming, and should allow repetitive behaviors to be specified in compact and easy ways. Most existing data-flow VPLs provide special constructs to implement iteration, thereby departing from the pure data-flow paradigm in favor of program simplicity. This paper has three main purposes: (1) to provide a survey of the mechanisms used by some representative data-flow VPLs to carry out iteration; (2) to investigate, given a pure data-flow VPL, the minimum set of characteristics that, once added to the VPL, allows iteration to be implemented; and (3) to show real data-flow iteration implementations that rely on the characteristics in such a minimum set.
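As an illustration of expressing iteration without leaving the data-flow paradigm, the sketch below simulates the classic merge/switch gating scheme from the dataflow literature: a state token circulates through the loop-body nodes until a predicate node steers it out of the cycle. The node names and the sequential simulation are illustrative, not taken from the paper.

def factorial_dataflow(n):
    # Merge gate: on the first firing, the initial token enters the cycle;
    # afterwards, tokens arriving on the feedback edge are taken instead.
    token = (1, 1)                      # state token (i, acc)
    while True:
        i, acc = token
        # Predicate node: its boolean output controls the switch gate.
        if i > n:
            return acc                  # switch routes the token out of the cycle
        # Loop-body nodes fire; the result token re-enters the merge gate.
        token = (i + 1, acc * i)

print(factorial_dataflow(5))            # 120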

    Assisted Reuse of Pattern-Based Composition Knowledge for Mashup Development

    The first generation of the World Wide Web (WWW) gave users instantaneous access to a large diversity of knowledge. The second generation (Web 2.0) brought a fundamental change in the way people interact with and through the World Wide Web, making it a platform not only for communication and sharing information but also for software development (e.g., web service composition). Web mashup, or mashup development, is a Web 2.0 development approach in which users are expected to create applications by combining multiple data sources, application logic and UI components from the web to cater to their situational application needs. In reality, however, creating even a simple mashup application is a complex task that can only be managed by skilled developers. Examples of ready-made mashup models are one of the main sources of help for users who do not know how to design a mashup, provided that suitable examples can be found (examples that have an analogy with the modeling situation faced by the user); tutorials, expert colleagues or friends, and, of course, Google are also typical means of finding help. However, searching for help does not always succeed, and retrieved information is seldom immediately usable as it is, since the retrieved pieces of information are not contextual, i.e., not immediately applicable to the given modeling problem. Motivated by the development challenges faced by naive users of existing mashup tools, in this thesis we propose to aid such users by enabling assisted reuse of pattern-based composition knowledge. We show how it is possible to effectively assist these users in their development task with contextual, interactive recommendations of composition knowledge in the form of mashup model patterns. We study a set of recommendation algorithms with different levels of performance and describe a flexible pattern weaving approach for the one-click reuse of patterns. We demonstrate the generality of our algorithms and approach by implementing two prototype tools for two different mashup platforms. Finally, we validate the usefulness of our assisted development approach through thorough empirical tests and two user studies with our prototype tools.
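As a rough illustration of contextual recommendation (the thesis's actual algorithms are not reproduced here), the sketch below ranks patterns from a library by how much of each pattern's component set already appears on the user's canvas; every name is hypothetical.

def recommend(partial_model, pattern_library, k=3):
    """Rank patterns by overlap with the components already on the canvas."""
    def score(pattern):
        components = pattern["components"]
        return len(partial_model & components) / len(components)
    ranked = sorted(pattern_library, key=score, reverse=True)
    return [p["name"] for p in ranked[:k] if score(p) > 0]

library = [
    {"name": "geo-mashup", "components": {"map", "geocoder", "rss-feed"}},
    {"name": "dashboard",  "components": {"chart", "table", "rss-feed"}},
]
print(recommend({"map", "rss-feed"}, library))   # ['geo-mashup', 'dashboard']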

    Developments in Dataflow Programming

    Dataflow has historically been motivated by parallelism, by programmability, or by some combination of the two. This work, rather than being directed primarily at parallelism or programmability, instead aims at maximising the overall utility of the system at large to the programmer. It should be easy to create well-constructed, flexible programs that comply with the principles of software engineering and architecture, while the proposed system should also be capable of performing practical real-life tasks and be as widely applicable as possible. With those aims in mind, this project has four goals:

* to argue for a unified global dataflow coordination system, extensible so as to accommodate components of any form that may exist now or in the future;
* to establish a link between the design of such a system and the principles of software engineering and architecture;
* to design a dataflow coordination system based on those principles, aiming where possible to embed them in the design so that applying them becomes easy or automatic for programmers; and
* to implement and test components of the proposed system, using it to build a set of three sample algorithms.

Taking the best ideas proposed in dataflow programming in the past (those that most effectively embed the principles of software engineering) and extending them with new proposals where necessary, a collection of interactions and functionalities is proposed, including a novel way of using partial evaluation of functions and data dimensionality to represent iteration in an acyclic graph. The proposed design was implemented as far as necessary to construct three test algorithms: calculating a factorial, generating terms of the Fibonacci sequence and performing a merge sort. The implementation succeeded in representing iteration in acyclic dataflow, and the test algorithms generated correct results, limited only by the numerical representation capabilities of the underlying language. Testing and working with the implemented system revealed how important it is to usability that the system be visual, interactive and, in a distributed environment, always available. Proposed further work falls into three categories: writing a full specification (in particular, defining the interfaces by which components will interact); developing new features to extend the functionality; and further developing the test implementation. The conclusion summarises the vision of a unified global dataflow coordination system and makes an appeal for cooperation on its development as an open, non-profit dataflow system run for the good of its community, rather than allowing a proliferation of competing systems run for commercial gain.
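The abstract's key technical idea, representing iteration in an acyclic graph via partial evaluation and data dimensionality, is not detailed here, so the following is a hedged sketch of the general flavour only: the loop becomes an extra input dimension, and ordinary acyclic nodes (a partially evaluated fold) traverse it, echoing the thesis's factorial and Fibonacci test cases.

from functools import partial, reduce
import operator

# Node: materialise the iteration space as a data dimension (1..n).
def dimension(n):
    return range(1, n + 1)

# Partial evaluation specialises a generic fold node into a product node,
# so no cyclic "feedback" edge is needed in the graph.
product = partial(reduce, operator.mul)

def factorial(n):                 # acyclic pipeline: dimension -> product
    return product(dimension(n), 1)

def fib_step(state, _):           # one step of the Fibonacci recurrence
    a, b = state
    return (b, a + b)

def fibonacci(n):                 # fold over an index dimension, no cycle
    return reduce(fib_step, range(n), (0, 1))[0]

print(factorial(5), fibonacci(10))   # 120 55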

    A comparison of programming notations for a tertiary level introductory programming course

    Increasing pressure from national government to improve throughput at South African tertiary education institutions presents challenges to educators of introductory programming courses. In response, educators must adopt effective methods and strategies that help novice programmers succeed in such courses. One approach that seeks to increase and maintain satisfactory throughput is to modify the teaching model in these courses by adjusting presentation techniques. This thesis investigates the effect of integrating an experimental iconic programming notation and its associated development environment with existing conventional textual technological support in the teaching model of a tertiary level introductory programming course. The investigation compares the performance of novice programmers using only conventional textual technological support with that of novice programmers using the integrated iconic and conventional textual technological support. In preparation for the investigation, interpretation of existing knowledge on the behaviour of novice programmers while learning to program results in a novel framework of eight novice programmer requirements for technological support in an introductory programming course. This framework is applied in the examination of existing categories of technological support as well as in the design of new technological support for novice programmers learning to program; it thus informs both the selection of existing and the design of new introductory programming technological support. The findings of the investigation provide strong evidence that the performance of novice programmers in a tertiary level introductory programming course improves significantly with the inclusion of iconic technological support in the teaching model. The benefits are particularly evident among the portion of the novice programmer population identified as being at risk of not succeeding in the course. Novice programmers identified as at risk perform substantially better when using iconic technological support concurrently with conventional textual technological support than their peers who use only the latter. Considerably more at-risk novice programmers using the integrated form of technological support are in fact successful in the introductory programming course compared with their counterparts who use conventional textual technological support only. The contributions of this thesis address deficiencies in current documented research, primarily in a number of distinct areas, namely:

• formalisation of a novel framework of novice programmer requirements for technological support in an introductory programming course;
• application of the framework as a formal evaluation technique;
• application of the framework in the design of a visual iconic programming notation and development environment;
• enhancement of existing empirical evidence and experimental research methodology typically applied to studies in programming; and
• a proposal for a modified introductory programming course teaching model.

The thesis has effectively applied substantial existing research on the cognitive model of the novice programmer as well as on experimental technological support. The increase of throughput to a recommended rate of 75 percent in the tertiary level introductory programming course at the University of Port Elizabeth is attributed solely to the incorporation of iconic technological support in the teaching model of the course.