197 research outputs found

    Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

    Get PDF
    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/

    Automatic characterization and generation of music loops and instrument samples for electronic music production

    Get PDF
    Repurposing audio material to create new music - also known as sampling - was a foundation of electronic music and is a fundamental component of this practice. Currently, large-scale databases of audio offer vast collections of audio material for users to work with. The navigation on these databases is heavily focused on hierarchical tree directories. Consequently, sound retrieval is tiresome and often identified as an undesired interruption in the creative process. We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows for organizing unstructured collections and a faster retrieval for music-making. The generation of loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based data-driven methodologies for classification and generation.Repurposing audio material to create new music - also known as sampling - was a foundation of electronic music and is a fundamental component of this practice. Currently, large-scale databases of audio offer vast collections of audio material for users to work with. The navigation on these databases is heavily focused on hierarchical tree directories. Consequently, sound retrieval is tiresome and often identified as an undesired interruption in the creative process. We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows for organizing unstructured collections and a faster retrieval for music-making. The generation of loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based data-driven methodologies for classification and generation

    Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

    Get PDF
    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/

    Deep Learning Methods for Instrument Separation and Recognition

    Get PDF
    This thesis explores deep learning methods for timbral information processing in polyphonic music analysis. It encompasses two primary tasks: Music Source Separation (MSS) and Instrument Recognition, with focus on applying domain knowledge and utilising dense arrangements of skip-connections in the frameworks in order to reduce the number of trainable parameters and create more efficient models. Musically-motivated Convolutional Neural Network (CNN) architectures are introduced, emphasizing kernels with vertical, square, and horizontal shapes. This design choice allows for the extraction of essential harmonic and percussive features, which enhances the discrimination of different instruments. Notably, this methodology proves valuable for Harmonic-Percussive Source Separation (HPSS) and instrument recognition tasks. A significant challenge in MSS is generalising to new instrument types and music styles. To address this, a versatile framework for adversarial unsupervised domain adaptation for source separation is proposed, particularly beneficial when labeled data for specific instruments is unavailable. The curation of the Tap & Fiddle dataset is another contribution of the research, offering mixed and isolated stem recordings of traditional Scandinavian fiddle tunes, along with foot-tapping accompaniments, fostering research in source separation and metrical expression analysis within these musical styles. Since our perception of timbre is affected in different ways by transient and stationary parts of sound, the research investigates the potential of Transient Stationary-Noise Decomposition (TSND) as a preprocessing step for frame-level recognition. A method that performs TSND of spectrograms and feeds the decomposed spectrograms to a neural classifier is proposed. Furthermore, this thesis introduces a novel deep learning-based approach for pitch streaming, treating the task as a note-level instrument classification. Such an approach is modular, meaning that it can also successfully stream predicted note-events and not only labelled ground truth note-event information to corresponding instruments. Therefore, the proposed pitch streaming method enables third-party multi-pitch estimation algorithms to perform multi-instrument AMT

    Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

    Get PDF
    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/

    AI-generated Content for Various Data Modalities: A Survey

    Full text link
    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent works, AIGC developments have been attracting lots of attention recently, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we also discuss the challenges and potential future research directions

    Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

    Get PDF
    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/

    Automated Rhythmic Transformation of Drum Recordings

    Get PDF
    Within the creative industries, music information retrieval techniques are now being applied in a variety of music creation and production applications. Audio artists incorporate techniques from music informatics and machine learning (e.g., beat and metre detection) for generative content creation and manipulation systems within the music production setting. Here musicians, desiring a certain sound or aesthetic influenced by the style of artists they admire, may change or replace the rhythmic pattern and sound characteristics (i.e., timbre) of drums in their recordings with those from an idealised recording (e.g., in processes of redrumming and mashup creation). Automated transformation systems for rhythm and timbre can be powerful tools for music producers, allowing them to quickly and easily adjust the different elements of a drum recording to fit the overall style of a song. The aim of this thesis is to develop systems for automated transformation of rhythmic patterns of drum recordings using a subset of techniques from deep learning called deep generative models (DGM) for neural audio synthesis. DGMs such as autoencoders and generative adversarial networks have been shown to be effective for transforming musical signals in a variety of genres as well as for learning the underlying structure of datasets for generation of new audio examples. To this end, modular deep learning-based systems are presented in this thesis with evaluations which measure the extent of the rhythmic modifications generated by different modes of transformation, which include audio style transfer, drum translation and latent space manipulation. The evaluation results underscore both the strengths and constraints of DGMs for transformation of rhythmic patterns as well as neural synthesis of drum sounds within a variety of musical genres. New audio style transfer (AST) functions were specifically designed for mashup-oriented drum recording transformation. The designed loss objectives lowered the computational demands of the AST algorithm and offered rhythmic transformation capabilities which adhere to a larger rhythmic structure of the input to generate music that is both creative and realistic. To extend the transformation possibilities of DGMs, systems based on adversarial autoencoders (AAE) were proposed for drum translation and continuous rhythmic transformation of bar-length patterns. The evaluations which investigated the lower dimensional representations of the latent space of the proposed system based on AAEs with a Gaussian mixture prior (AAE-GM) highlighted the importance of the structure of the disentangled latent distributions of AAE-GM. Furthermore, the proposed system demonstrated improved performance, as evidenced by higher reconstruction metrics, when compared to traditional autoencoder models. This implies that the system can more accurately recreate complex drum sounds, ensuring that the produced rhythmic transformation maintains richness of the source material. For music producers, this means heightened fidelity in drum synthesis and the potential for more expressive and varied drum tracks, enhancing the creativity in music production. This work also enhances neural drum synthesis by introducing a new, diverse dataset of kick, snare, and hi-hat drum samples, along with multiple drum loop datasets for model training and evaluation. Overall, the work in this thesis increased the profile of the field and hopefully will attract more attention and resources to the area, which will help drive future research and development of neural rhythmic transformation systems

    Mixing Methods: Practical Insights from the Humanities in the Digital Age

    Get PDF
    The digital transformation is accompanied by two simultaneous processes: digital humanities challenging the humanities, their theories, methodologies and disciplinary identities, and pushing computer science to get involved in new fields. But how can qualitative and quantitative methods be usefully combined in one research project? What are the theoretical and methodological principles across all disciplinary digital approaches? This volume focusses on driving innovation and conceptualising the humanities in the 21st century. Building on the results of 10 research projects, it serves as a useful tool for designing cutting-edge research that goes beyond conventional strategies
    corecore