896 research outputs found

    LSound: An L-System Framework for Score Generation

    Get PDF
    Visual language or visual representation has been used in the past few years in order to express the knowledge in graphics. One of the important graphical elements is fractal and L-Systems is a mathematics-based grammatical model for modelling cell development and plant topology. From the plant model, L-Systems can be interpreted as music sound and score. In this paper, LSound which is a Visual Language Programming (VLP) framework has been developed as a tool that can model plant to music sound and generate music score and vice versa. The objectives of this research have three folds: (i) To expand the grammar dictionary of L-Systems music based on visual programming, (ii) To design and produce a user-friendly and icon-based visual language framework typically for L-Systems musical score generation which helps the basic learners in musical field and (iii) To generate music score from plant models and vice versa using L-Systems method. This research undergoes four phases methodology where the plant is first modeled, then the music is interpreted, followed by the output of music sound through MIDI and a score is generated. Technically, LSound was compared to other existing applications in the aspects of the capability of modelling the plant, rendering the music and generating the sound. LSound is a flexible framework in which the plant can be easily altered through arrow-based programming and the music score can be altered through the music symbols and notes which encourages non-experts to work with L-Systems and music

    L-Music: uma abordagem para composição musical assistida usando L-Systems

    Get PDF
    Generative music systems have been researched for an extended period of time. The scientific corpus of this research field is translating, currently, into the world of the everyday musician and composer. With these tools, the creative process of writing music can be augmented or completely replaced by machines. The work in this document aims to contribute to research in assisted music composition systems. To do so, a review on the state of the art of these fields was performed and we found that a plethora of methodologies and approaches each provide their own interesting results (to name a few, neural networks, statistical models, and formal grammars). We identified Lindenmayer Systems, or L-Systems, as the most interesting and least explored approach to develop an assisted music composition system prototype, aptly named L-Music, due to the ability of producing complex outputs from simple structures. L-Systems were initially proposed as a parallel string rewriting grammar to model algae plant growth. Their applications soon turned graphical (e.g., drawing fractals), and eventually they were applied to music generation. Given that our prototype is assistive, we also took the user interface and user experience design into its well-deserved consideration. Our implemented interface is straightforward, simple to use with a structured visual hierarchy and flow and enables musicians and composers to select their desired instruments; select L-Systems for generating music or create their own custom ones and edit musical parameters (e.g., scale and octave range) to further control the outcome of L-Music, which is musical fragments that a musician or composer can then use in their own works. Three musical interpretations on L-Systems were implemented: a random interpretation, a scale-based interpretation, and a polyphonic interpretation. All three approaches produced interesting musical ideas, which we found to be potentially usable by musicians and composers in their own creative works. Although positive results were obtained, the developed prototype has many improvements for future work. Further musical interpretations can be added, as well as increasing the number of possible musical parameters that a user can edit. We also identified the possibility of giving the user control over what musical meaning L-Systems have as an interesting future challenge.Sistemas de geração de música têm sido alvo de investigação durante períodos alargados de tempo. Recentemente, tem havido esforços em passar o conhecimento adquirido de sistemas de geração de música autónomos e assistidos para as mãos do músico e compositor. Com estas ferramentas, o processo criativo pode ser enaltecido ou completamente substituído por máquinas. O presente trabalho visa contribuir para a investigação de sistemas de composição musical assistida. Para tal, foi efetuado um estudo do estado da arte destas temáticas, sendo que foram encontradas diversas metodologias que ofereciam resultados interessantes de um ponto de vista técnico e musical. Os sistemas de Lindenmayer, ou L-Systems, foram selecionados como a abordagem mais interessante, e menos explorada, para desenvolver um protótipo de um sistema de composição musical assistido com o nome L-Music, devido à sua capacidade de produzirem resultados complexos a partir de estruturas simples. Os L-Systems, inicialmente propostos para modelar o crescimento de plantas de algas, são gramáticas formais, cujo processo de reescrita de strings acontece de forma paralela. As suas aplicações rapidamente evoluíram para interpretações gráficas (p.e., desenhar fractais), e eventualmente também foram aplicados à geração de música. Dada a natureza assistida do protótipo desenvolvido, houve uma especial atenção dada ao design da interface e experiência do utilizador. Esta, é concisa e simples, tendo uma hierarquia visual estruturada para oferecer uma orientação coesa ao utilizador. Neste protótipo, os utilizadores podem selecionar instrumentos; selecionar L-Systems ou criar os seus próprios, e editar parâmetros musicais (p.e., escala e intervalo de oitavas) de forma a gerarem excertos musicais que possam usar nas suas próprias composições. Foram implementadas três interpretações musicais de L-Systems: uma interpretação aleatória, uma interpretação à base de escalas e uma interpretação polifónica. Todas as interpretações produziram resultados musicais interessantes, e provaram ter potencial para serem utilizadas por músicos e compositores nos seus trabalhos criativos. Embora tenham sido alcançados resultados positivos, o protótipo desenvolvido apresenta múltiplas melhorias para trabalho futuro. Entre elas estão, por exemplo, a adição de mais interpretações musicais e a adição de mais parâmetros musicais editáveis pelo utilizador. A possibilidade de um utilizador controlar o significado musical de um L-System também foi identificada como uma proposta futura relevante

    Unsupervised structure induction and multimodal grounding

    Get PDF
    Structured representations build upon symbolic abstraction (e.g., words in natural language and visual concepts in natural images), offer a principled way of encoding our perceptions about the physical world, and enable the human-like generalization of machine learning systems. The predominant paradigm for learning structured representations of the observed data has been supervised learning, but it is limited in several respects. First, supervised learning is challenging given the scarcity of labeled data. Second, conventional approaches to structured prediction have been relying on a single modality (e.g., either images or text), ignoring the learning cues that may have been specified in and can be readily obtained from other modalities of data. In this thesis, we investigate unsupervised approaches to structure induction in a multimodal setting. Unsupervised learning is inherently difficult in general, let alone inducing complex and discrete structures from data without direct supervision. By considering the multimodal setting, we leverage the alignments between different data modalities (e.g., text, audio, and images) to facilitate the learning of structure-induction models, e.g., knowing that the individual words in ``a white pigeon'' always appear with the same visual object, a language parser is likely to treat them as a whole (i.e., phrase). The multimodal learning setting is practically viable because multimodal alignments are generally abundant. For example, they can be found in online posts such as news and tweets that usually contain images and associated text, and in (YouTube) videos, where audio, scripts, and scenes are synchronized and grounded in each other. We develop structure-induction models, which are capable of exploiting bimodal image-text alignments, for two modalities: (1) for natural language, we consider unsupervised syntactic parsing with phrase-structure grammars and regularize the parser by using visual image groundings; and (2) for visual images, we induce scene graph representations by mapping arguments and predicates in the text to their visual counterparts (i.e., visual objects and relations among them) in an unsupervised manner. While useful, crossmodal alignments are not always abundantly available on the web, e.g., the alignments between non-speech audio and text. We tackle the challenge by sharing the visual modality between image-text alignment and image-audio alignment; images function as a pivot and connect audio and text. The contributions of this thesis span from model development to data collection. We demonstrated the feasibility of applying multimodal learning techniques to unsupervised structure induction and multimodal alignment collection. Our work opens up new avenues for multimodal and unsupervised structured representation learning

    Learning the language of apps

    Get PDF
    To explore the functionality of an app, automated test generators systematically identify and interact with its user interface (UI) elements. A key challenge is to synthesize inputs which effectively and efficiently cover app behavior. To do so, a test generator has to choose which elements to interact with but, which interactions to do on each element and which input values to type. In summary, to better test apps, a test generator should know the app's language, that is, the language of its graphical interactions and the language of its textual inputs. In this work, we show how a test generator can learn the language of apps and how this knowledge is modeled to create tests. We demonstrate how to learn the language of the graphical input prior to testing by combining machine learning and static analysis, and how to refine this knowledge during testing using reinforcement learning. In our experiments, statically learned models resulted in 50\% less ineffective actions an average increase in test (code) coverage of 19%, while refining these through reinforcement learning resulted in an additional test (code) coverage of up to 20%. We learn the language of textual inputs, by identifying the semantics of input fields in the UI and querying the web for real-world values. In our experiments, real-world values increase test (code) coverage ~10%; Finally, we show how to use context-free grammars to integrate both languages into a single representation (UI grammar), giving back control to the user. This representation can then be: mined from existing tests, associated to the app source code, and used to produce new tests. 82% test cases produced by fuzzing our UI grammar can reach a UI element within the app and 70% of them can reach a specific code location.Automatisierte Testgeneratoren identifizieren systematisch Elemente der Benutzeroberfläche und interagieren mit ihnen, um die Funktionalität einer App zu erkunden. Eine wichtige Herausforderung besteht darin, Eingaben zu synthetisieren, die das App-Verhalten effektiv und effizient abdecken. Dazu muss ein Testgenerator auswählen, mit welchen Elementen interagiert werden soll, welche Interaktionen jedoch für jedes Element ausgeführt werden sollen und welche Eingabewerte eingegeben werden sollen. Um Apps besser testen zu können, sollte ein Testgenerator die Sprache der App kennen, dh die Sprache ihrer grafischen Interaktionen und die Sprache ihrer Texteingaben. In dieser Arbeit zeigen wir, wie ein Testgenerator die Sprache von Apps lernen kann und wie dieses Wissen modelliert wird, um Tests zu erstellen. Wir zeigen, wie die Sprache der grafischen Eingabe lernen vor dem Testen durch maschinelles Lernen und statische Analyse kombiniert und wie dieses Wissen weiter verfeinern beim Testen Verstärkung Lernen verwenden. In unseren Experimenten führten statisch erlernte Modelle zu 50% weniger ineffektiven Aktionen, was einer durchschnittlichen Erhöhung der Testabdeckung (Code) von 19% entspricht, während die Verfeinerung dieser durch verstärkendes Lernen zu einer zusätzlichen Testabdeckung (Code) von bis zu 20% führte. Wir lernen die Sprache der Texteingaben, indem wir die Semantik der Eingabefelder in der Benutzeroberfläche identifizieren und das Web nach realen Werten abfragen. In unseren Experimenten erhöhen reale Werte die Testabdeckung (Code) um ca. 10%; Schließlich zeigen wir, wie kontextfreien Grammatiken verwenden beide Sprachen in einer einzigen Darstellung (UI Grammatik) zu integrieren, wieder die Kontrolle an den Benutzer zu geben. Diese Darstellung kann dann: aus vorhandenen Tests gewonnen, dem App-Quellcode zugeordnet und zur Erstellung neuer Tests verwendet werden. 82% Testfälle, die durch Fuzzing unserer UI-Grammatik erstellt wurden, können ein UI-Element in der App erreichen, und 70% von ihnen können einen bestimmten Code-Speicherort erreichen

    Learning to Generate 3D Training Data

    Full text link
    Human-level visual 3D perception ability has long been pursued by researchers in computer vision, computer graphics, and robotics. Recent years have seen an emerging line of works using synthetic images to train deep networks for single image 3D perception. Synthetic images rendered by graphics engines are a promising source for training deep neural networks because it comes with perfect 3D ground truth for free. However, the 3D shapes and scenes to be rendered are largely made manual. Besides, it is challenging to ensure that synthetic images collected this way can help train a deep network to perform well on real images. This is because graphics generation pipelines require numerous design decisions such as the selection of 3D shapes and the placement of the camera. In this dissertation, we propose automatic generation pipelines of synthetic data that aim to improve the task performance of a trained network. We explore both supervised and unsupervised directions for automatic optimization of 3D decisions. For supervised learning, we demonstrate how to optimize 3D parameters such that a trained network can generalize well to real images. We first show that we can construct a pure synthetic 3D shape to achieve state-of-the-art performance on a shape-from-shading benchmark. We further parameterize the decisions as a vector and propose a hybrid gradient approach to efficiently optimize the vector towards usefulness. Our hybrid gradient is able to outperform classic black-box approaches on a wide selection of 3D perception tasks. For unsupervised learning, we propose a novelty metric for 3D parameter evolution based on deep autoregressive models. We show that without any extrinsic motivation, the novelty computed from autoregressive models alone is helpful. Our novelty metric can consistently encourage a random synthetic generator to produce more useful training data for downstream 3D perception tasks.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163240/1/ydawei_1.pd

    Pathway to Future Symbiotic Creativity

    Full text link
    This report presents a comprehensive view of our vision on the development path of the human-machine symbiotic art creation. We propose a classification of the creative system with a hierarchy of 5 classes, showing the pathway of creativity evolving from a mimic-human artist (Turing Artists) to a Machine artist in its own right. We begin with an overview of the limitations of the Turing Artists then focus on the top two-level systems, Machine Artists, emphasizing machine-human communication in art creation. In art creation, it is necessary for machines to understand humans' mental states, including desires, appreciation, and emotions, humans also need to understand machines' creative capabilities and limitations. The rapid development of immersive environment and further evolution into the new concept of metaverse enable symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate the novel way for art data collection to constitute the base of a new form of human-machine bidirectional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future Machine artists, which comes with the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of machine artists, demonstrate its functions and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for AI-based symbiotic art form and community with an economic model built on NFT technology. Ethical issues for the development of machine artists are also discussed

    An Adjectival Interface for procedural content generation

    Get PDF
    Includes abstract.Includes bibliographical references.In this thesis, a new interface for the generation of procedural content is proposed, in which the user describes the content that they wish to create by using adjectives. Procedural models are typically controlled by complex parameters and often require expert technical knowledge. Since people communicate with each other using language, an adjectival interface to the creation of procedural content is a natural step towards addressing the needs of non-technical and non-expert users. The key problem addressed is that of establishing a mapping between adjectival descriptors, and the parameters employed by procedural models. We show how this can be represented as a mapping between two multi-dimensional spaces, adjective space and parameter space, and approximate the mapping by applying novel function approximation techniques to points of correspondence between the two spaces. These corresponding point pairs are established through a training phase, in which random procedural content is generated and then described, allowing one to map from parameter space to adjective space. Since we ultimately seek a means of mapping from adjective space to parameter space, particle swarm optimisation is employed to select a point in parameter space that best matches any given point in adjective space. The overall result, is a system in which the user can specify adjectives that are then used to create appropriate procedural content, by mapping the adjectives to a suitable set of procedural parameters and employing the standard procedural technique using those parameters as inputs. In this way, none of the control offered by procedural modelling is sacrificed â although the adjectival interface is simpler, it can at any point be stripped away to reveal the standard procedural model and give users access to the full set of procedural parameters. As such, the adjectival interface can be used for rapid prototyping to create an approximation of the content desired, after which the procedural parameters can be used to fine-tune the result. The adjectival interface also serves as a means of intermediate bridging, affording users a more comfortable interface until they are fully conversant with the technicalities of the underlying procedural parameters. Finally, the adjectival interface is compared and contrasted to an interface that allows for direct specification of the procedural parameters. Through user experiments, it is found that the adjectival interface presented in this thesis is not only easier to use and understand, but also that it produces content which more accurately reflects usersâ intentions

    Procedural content generation for games

    Full text link
    Virtual worlds play an increasingly important role in game development today. Whether in the entertainment industry, education, collaboration or data visualization - virtual space offers a freely definable environment that can be adapted to any purpose. Nevertheless, the creation of complex worlds is time-consuming and cost-intensive. A classic example for the use of a virtual world is a driving simulator where learner drivers can test their skills. The goal of the generation process is to model a realistic city that is large enough to move around for a long time without constantly passing places that have already been seen. Streets must be realistically modeled, have intersections, represent highways and country roads and create an image through buildings that create the greatest possible immersion in the virtual world. But there is still a lack of life. Pedestrians have to populate the streets in large numbers, other cars have to take part in the traffic, and a driving instructor has to sit next to the learner driver, commenting on the actions and chatting away on long journeys. In short, the effort to model such a world by hand would be immense. This thesis deals with different approaches to generate digital content for virtual worlds procedurally i.e., algorithmically. In the first part of this thesis, virtual, three-dimensional road networks are generated using a pre-defined network graph. The nodes in the graph can be generated procedurally or randomly or can be imported from open data platforms, e.g., from OpenStreetMaps (OSM). The automatic detection of intersections makes the generation flexible. The textures used for roads and intersections are constructed from prefabricated sprites whenever possible, or, in the case of a very individual construction, are newly generated during generation. The ability to create multi-lane roads gives the virtual cities a higher degree of realism. The interstices of the road network usually contain buildings, industrial areas, common areas or agricultural land. Once these so-called parcels have been identified, they can be populated with precisely these contents. In this dissertation we focus on accessible residential buildings. The second part of this thesis discusses a novel method of building generation that allows to procedurally create walk-in, multi-storey buildings. The proceeding of simple mesh generation as shown in the road network generation is extended by rules and constraints that allow a flexible floor planning and guarantee a connection of all rooms by a common corridor per floor and a staircase. Since a cityscape is usually characterised by different building shapes, the generation can be parameterized with regard to texturing, roof design, number of floors, and window and door layout. In order to ensure performance when rendering the city, each building is generated in three levels of detail. The lowest level only shows the outer walls, the highest level shows the interior rooms including stairs, doors and window frames. Once the environment is created in a way that allows the player a certain immersion, the game world has to be filled with life. Thus, the third part of this thesis discusses the procedural creation of stories for games based on pre-trained language models. The focus here is on an interactive, controlled way of playing, in which the player can interact with the objects, persons and places of the story and influence the plot. Actions generated from the entities of the previous section of the story should give a feeling of a prepared story, but always ensure the greatest possible flexibility of course. The controlled use of places, people and objects in the player's inventory allows a porting to a three-dimensional game world as well as the gameplay in the form of a text adventure. All methods for creating digital content presented in this thesis were fully implemented and evaluated with respect to usability and performance

    Deep Architectures for Visual Recognition and Description

    Get PDF
    In recent times, digital media contents are inherently of multimedia type, consisting of the form text, audio, image and video. Several of the outstanding computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research work has already been carried out in the field of Automatic Image Annotation (AIA), Image Captioning and Video Tagging. Video Captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches available today and attempts their classification and analysis based on different parameters, viz., type of captioning methods (generation/retrieval), type of learning models employed, the desired output description length generated, etc. This dissertation also attempts to critically analyze the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the final quality of the resultant video descriptions generated. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages are also included. In this study a novel approach for video captioning on the Microsoft Video Description (MSVD) dataset and Microsoft Video-to-Text (MSR-VTT) dataset is proposed using supervised learning techniques to train a deep combinational framework, for achieving better quality video captioning via predicting semantic tags. We develop simple shallow CNN (2D and 3D) as feature extractors, Deep Neural Networks (DNNs and Bidirectional LSTMs (BiLSTMs) as tag prediction models and Recurrent Neural Networks (RNNs) (LSTM) model as the language model. The aim of the work was to provide an alternative narrative to generating captions from videos via semantic tag predictions and deploy simpler shallower deep model architectures with lower memory requirements as solution so that it is not very memory extensive and the developed models prove to be stable and viable options when the scale of the data is increased. This study also successfully employed deep architectures like the Convolutional Neural Network (CNN) for speeding up automation process of hand gesture recognition and classification of the sign languages of the Indian classical dance form, ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single hand gestures belonging to 27 classes that were collected from (i) Google search engine (Google images), (ii) YouTube videos (dynamic and with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds). 2) exploring the effectiveness of CNNs for identifying and classifying the single hand gestures by optimizing the hyperparameters, and 3) evaluating the impacts of transfer learning and double transfer learning, which is a novel concept explored for achieving higher classification accuracy

    Proceedings of the 9th Arab Society for Computer Aided Architectural Design (ASCAAD) international conference 2021 (ASCAAD 2021): architecture in the age of disruptive technologies: transformation and challenges.

    Get PDF
    The ASCAAD 2021 conference theme is Architecture in the age of disruptive technologies: transformation and challenges. The theme addresses the gradual shift in computational design from prototypical morphogenetic-centered associations in the architectural discourse. This imminent shift of focus is increasingly stirring a debate in the architectural community and is provoking a much needed critical questioning of the role of computation in architecture as a sole embodiment and enactment of technical dimensions, into one that rather deliberately pursues and embraces the humanities as an ultimate aspiration
    • …
    corecore