GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
Symbolic music generation aims to create musical notes, which can help users
compose music, such as generating target instrument tracks based on provided
source tracks. In practical scenarios where there's a predefined ensemble of
tracks and various composition needs, an efficient and effective generative
model that can generate any target tracks based on the other tracks becomes
crucial. However, previous efforts have fallen short in addressing this
necessity due to limitations in their music representations and models. In this
paper, we introduce a framework known as GETMusic, with ``GET'' standing for
``GEnerate music Tracks.'' This framework encompasses a novel music
representation ``GETScore'' and a diffusion model ``GETDiff.'' GETScore
represents musical notes as tokens and organizes tokens in a 2D structure, with
tracks stacked vertically and progressing horizontally over time. At a training
step, each track of a music piece is randomly selected as either the target or
source. The training involves two processes: In the forward process, target
tracks are corrupted by masking their tokens, while source tracks remain as the
ground truth; in the denoising process, GETDiff is trained to predict the
masked target tokens conditioning on the source tracks. Our proposed
representation, coupled with the non-autoregressive generative model, empowers
GETMusic to generate music with arbitrary source-target track combinations.
Our experiments demonstrate that the versatile GETMusic outperforms prior works
proposed for certain specific composition tasks.
Comment: 13 pages, 4 figures
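The training scheme the abstract describes, in which each track of a piece is randomly assigned as source or target and target tokens are masked, can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the `[MASK]` token, the 50% assignment probability, and the tiny two-track score are all assumptions.

```python
import random

MASK = "[MASK]"

def corrupt(score, rng=random):
    """Forward process: randomly assign each track as source or target,
    then mask every token of the target tracks. Source tracks are kept
    as ground truth. `score` is a dict: track name -> list of tokens."""
    targets = {t for t in score if rng.random() < 0.5}
    if not targets:                      # ensure at least one target track
        targets = {rng.choice(list(score))}
    corrupted = {
        t: [MASK] * len(toks) if t in targets else list(toks)
        for t, toks in score.items()
    }
    return corrupted, targets

# Toy 2D score: tracks stacked vertically, time running horizontally.
score = {"melody": ["C4", "E4", "G4"], "bass": ["C2", "C2", "G2"]}
corrupted, targets = corrupt(score)
# A denoising model would now be trained to predict the masked tokens of
# the target tracks, conditioned on the untouched source tracks.
```

Because any subset of tracks can land on the target side, a model trained this way can serve any source-target combination at inference time, which is the flexibility the abstract emphasizes.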
EmoGen: Eliminating Subjective Bias in Emotional Music Generation
Music is used to convey emotions, and thus generating emotional music is
important in automatic music generation. Previous work on emotional music
generation directly uses annotated emotion labels as control signals, which
suffers from subjective bias: different people may annotate different emotions
on the same music, and one person may feel different emotions under different
situations. Therefore, directly mapping emotion labels to music sequences in an
end-to-end way would confuse the learning process and hinder the model from
generating music with general emotions. In this paper, we propose EmoGen, an
emotional music generation system that leverages a set of emotion-related music
attributes as the bridge between emotion and music, and divides the generation
into two stages: emotion-to-attribute mapping with supervised clustering, and
attribute-to-music generation with self-supervised learning. Both stages are
beneficial: in the first stage, the attribute values around the clustering
center represent the general emotions of these samples, which help eliminate
the impacts of the subjective bias of emotion labels; in the second stage, the
generation is completely disentangled from emotion labels and thus free from
the subjective bias. Both subjective and objective evaluations show that EmoGen
outperforms previous methods in emotion control accuracy and music quality,
demonstrating its advantage in generating emotional music. Music samples
generated by EmoGen are available at https://ai-muzic.github.io/emogen/, and
the code is available at https://github.com/microsoft/muzic/.
Comment: 12 pages, 7 figures
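The first stage, mapping an emotion to attribute values near a cluster center, can be illustrated with a minimal sketch. The attribute set (tempo, note density) and the use of a plain per-emotion centroid are stand-ins of my own, not the paper's actual clustering procedure.

```python
# Hypothetical sketch of an emotion-to-attribute mapping: each emotion is
# represented by the centroid of emotion-related music attributes of its
# samples, so the second stage can condition on attributes and never sees
# the (subjectively biased) emotion labels.

def attribute_centroids(samples):
    """samples: list of (emotion_label, attribute_vector). Returns the mean
    attribute vector per emotion, playing the role of a cluster center."""
    sums, counts = {}, {}
    for emotion, attrs in samples:
        acc = sums.setdefault(emotion, [0.0] * len(attrs))
        for i, v in enumerate(attrs):
            acc[i] += v
        counts[emotion] = counts.get(emotion, 0) + 1
    return {e: [v / counts[e] for v in acc] for e, acc in sums.items()}

# Illustrative attributes: (tempo in BPM, note density in [0, 1]).
data = [("happy", [120.0, 0.8]), ("happy", [140.0, 0.9]), ("sad", [60.0, 0.3])]
centers = attribute_centroids(data)
# centers["happy"] is the averaged attribute vector, roughly [130.0, 0.85];
# generation would be conditioned on these values, not on the label itself.
```

Averaging over many samples washes out individual annotators' disagreements, which is the sense in which the centroid captures the "general" emotion of a cluster.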
MuseCoco: Generating Symbolic Music from Text
Generating music from text descriptions is a user-friendly mode since the
text is a relatively easy interface for user engagement. While some approaches
utilize texts to control music audio generation, editing musical elements in
generated audio is challenging for users. In contrast, symbolic music offers
ease of editing, making it more accessible for users to manipulate specific
musical elements. In this paper, we propose MuseCoco, which generates symbolic
music from text descriptions with musical attributes as the bridge to break
down the task into text-to-attribute understanding and attribute-to-music
generation stages. MuseCoco stands for Music Composition Copilot that empowers
musicians to generate music directly from given text descriptions, offering a
significant improvement in efficiency compared to creating music entirely from
scratch. The system has two main advantages: Firstly, it is data efficient. In
the attribute-to-music generation stage, the attributes can be directly
extracted from music sequences, making the model training self-supervised. In
the text-to-attribute understanding stage, the text is synthesized and refined
by ChatGPT based on the defined attribute templates. Secondly, the system can
achieve precise control with specific attributes in text descriptions and
offers multiple control options through attribute-conditioned or
text-conditioned approaches. MuseCoco outperforms baseline systems in terms of
musicality, controllability, and overall score by at least 1.27, 1.08, and 1.32
respectively. Besides, there is a notable enhancement of about 20% in objective
control accuracy. In addition, we have developed a robust large-scale model
with 1.2 billion parameters, showcasing exceptional controllability and
musicality.
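The data-efficiency claim rests on the fact that musical attributes can be read directly off a symbolic sequence, so the attribute-to-music stage needs no human labels. A minimal sketch of such self-supervised attribute extraction follows; the attribute set here (note count, pitch range, total duration) is an illustrative stand-in, not the paper's attribute list.

```python
# Extract simple conditioning attributes directly from a symbolic melody.
# Each note is a (midi_pitch, duration_in_beats) pair.

def extract_attributes(notes):
    """Return a small attribute dict that could condition a generator."""
    pitches = [p for p, _ in notes]
    return {
        "note_count": len(notes),
        "pitch_range": max(pitches) - min(pitches),   # in semitones
        "total_beats": sum(d for _, d in notes),
    }

melody = [(60, 1.0), (64, 0.5), (67, 0.5), (72, 2.0)]  # C4 E4 G4 C5
attrs = extract_attributes(melody)
# attrs == {"note_count": 4, "pitch_range": 12, "total_beats": 4.0}
```

Since these labels come for free from the training corpus, the attribute-to-music model can be trained on any symbolic music data, while only the text-to-attribute stage needs synthesized text.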
First come, First served: Enhancing the Convenience Store Service Experience
One distinctive characteristic of Taiwanese city streets is the omnipresence of convenience stores. These clean, brightly lit stores operate 24 hours a day, seven days a week, and offer a wide range of constantly updated lifestyle products and services. Past research on convenience stores has often overlooked the work experiences of convenience store employees and their contribution to the overall service experience. Thus, the goal of this exploratory study is to examine the convenience store work environment and to provide some suggestions for in-store technological enhancements. Data were collected through in-depth interviews, field observations, and Living Lab methodologies. Our research reveals that convenience store employees experience several types of physical, mental, and emotional strain throughout their shifts. These strains often derive from excessive physical exertion and unpleasant interactions with customers. We suggest that certain in-store technological enhancements, such as seamless sensing and seamful actuating, can alleviate employees' sense of pressure and anxiety during customer interactions.
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
AI-empowered music processing is a diverse field that encompasses dozens of
tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension
tasks (e.g., music classification). For developers and amateurs, it is very
difficult to grasp all of these tasks to satisfy their requirements in music
processing, especially considering the huge differences in the representations
of music data and the model applicability across platforms among various tasks.
Consequently, it is necessary to build a system to organize and integrate these
tasks, and thus help practitioners to automatically analyze their demand and
call suitable tools as solutions to fulfill their requirements. Inspired by the
recent success of large language models (LLMs) in task automation, we develop a
system, named MusicAgent, which integrates numerous music-related tools and an
autonomous workflow to address user requirements. More specifically, we build
1) a toolset that collects tools from diverse sources, including Hugging Face,
GitHub, and Web APIs, and 2) an autonomous workflow empowered by LLMs (e.g.,
ChatGPT) to organize these tools and automatically decompose user requests into
multiple sub-tasks and invoke corresponding music tools. The primary goal of
this system is to free users from the intricacies of AI-music tools, enabling
them to concentrate on the creative aspect. By granting users the freedom to
effortlessly combine tools, the system offers a seamless and enriching music
experience.
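The decompose-then-dispatch workflow the abstract describes can be sketched as a tiny skeleton. In the real system the planner is an LLM; here it is a hard-coded stub, and every tool name and function below is an illustrative assumption, not part of MusicAgent's actual API.

```python
# Skeleton of an LLM-driven tool workflow: a registry maps sub-task names
# to tools; a planner decomposes a request into sub-tasks, which are then
# dispatched in order, each tool's output feeding the next.

TOOLS = {
    "separate_vocals": lambda x: f"vocals({x})",
    "classify_genre": lambda x: f"genre({x})",
}

def plan(request):
    """Stand-in for the LLM planner: map a request to a sub-task list."""
    if "genre" in request and "vocal" in request:
        return ["separate_vocals", "classify_genre"]
    return ["classify_genre"]

def run(request, audio):
    result = audio
    for task in plan(request):
        result = TOOLS[task](result)   # chain the tools
    return result

print(run("find the genre of the vocal line", "song.wav"))
# → genre(vocals(song.wav))
```

The point of the design is that the user states a goal in natural language and never has to know which tool handles which sub-task, which is the "freedom from intricacies" the abstract promises.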
Phase evolution and superconductivity enhancement in Se-substituted MoTe2 thin films
The strong spin-orbit coupling (SOC) and numerous crystal phases in few-layer
transition metal dichalcogenides (TMDCs) MX2 (M = W, Mo and X = Te, Se, S) have
led to a variety of novel physics, such as the Ising superconductivity and
quantum spin Hall effect realized in monolayer 2H- and Td-MX2, respectively.
Consecutive tailoring of the MX2 structure from the 2H to the Td phase may
realize the long-sought topological superconductivity in one material system by
incorporating superconductivity and the quantum spin Hall effect together. In
this work, by combining Raman spectroscopy, X-ray photoelectron spectroscopy
(XPS), scanning transmission electron microscopy (STEM) imaging, and electrical
transport measurements, we demonstrate that consecutive structural phase
transitions from the Td to the 1T' to the 2H polytype can be realized as the
Se-substitution concentration increases. More importantly, the Se substitution
is found to notably enhance the superconductivity of the MoTe2 thin film, which
is interpreted as the introduction of two-band superconductivity. The chemical-
constituent-induced phase transition offers a new strategy to study the
superconductivity and the possible topological superconductivity, as well as to
develop phase-sensitive devices based on MX2 materials.
Comment: 27 pages, 5 figures
Transport evidence of asymmetric spin-orbit coupling in few-layer superconducting 1Td-MoTe2
Two-dimensional (2D) transition metal dichalcogenides (TMDCs) MX2 (M = W, Mo,
Nb and X = Te, Se, S) with strong spin-orbit coupling (SOC) possess plenty of
novel physics, including superconductivity. Due to the Ising SOC, monolayer
NbSe2 and gated MoS2 of the 2H structure can realize the Ising
superconductivity phase, which manifests itself with an in-plane upper critical
field far exceeding the Pauli paramagnetic limit. Surprisingly, we find that
few-layer 1Td-structure MoTe2 also exhibits an in-plane upper critical field
which goes beyond the Pauli paramagnetic limit. Importantly, the in-plane upper
critical field shows an emergent two-fold symmetry, which is different from the
isotropic behavior in 2H-structure TMDCs. We show that this is a result of an
asymmetric SOC in 1Td-structure TMDCs. The asymmetric SOC is very strong and
estimated to be on the order of tens of meV. Our work provides the first
transport evidence of a new type of asymmetric SOC in TMDCs, which may give
rise to novel superconducting and spin transport properties. Moreover, our
findings mostly depend on the symmetry of the crystal and apply to a whole
class of 1Td TMDCs, such as 1Td-WTe2, which is under intense study due to its
topological properties.
Comment: 34 pages, 12 figures
A fast and robust open-switch fault diagnosis method for variable-speed PMSM system
Traditional open-switch fault diagnosis methods suffer from poor rapidity or robustness. To solve this issue, a new differential current observer-based fault diagnosis method is proposed in this article. With the designed differential observer, fault symptoms (residuals) can be generated and used for fault diagnosis easily. Considering that the residuals are sensitive to the motor operating condition in conventional model-based methods due to model error, an adaptive fault detection threshold is designed. As a result, false detections and missed detections caused by changes in the working condition can be avoided, and stronger robustness against speed, load, and parameter variations can be achieved with superior rapidity compared with existing methods. Finally, the rapidity and robustness of the proposed fault diagnosis method are verified through experimental results.
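The core idea, comparing an observer residual against a threshold that tracks the operating point, can be sketched in a few lines. The threshold law used here (a base value plus a term proportional to the measured current) is an assumed stand-in for illustration, not the article's actual design, and the numeric constants are arbitrary.

```python
# Residual-based fault detection with an adaptive threshold: the threshold
# grows with the operating current, so load or speed changes that inflate
# the residual slightly do not trigger false alarms, while a genuine
# open-switch fault still pushes the residual well past the threshold.

def detect_fault(residual, phase_current, base=0.05, k=0.1):
    """Return True when |residual| exceeds the adaptive threshold."""
    threshold = base + k * abs(phase_current)
    return abs(residual) > threshold

# Healthy operation under high load: modest residual stays below threshold.
assert not detect_fault(residual=0.08, phase_current=1.0)
# Open-switch fault: the residual jumps well past the threshold.
assert detect_fault(residual=0.5, phase_current=1.0)
```

With a fixed threshold, the first case would have to be tuned against the worst-case load and would slow detection; letting the threshold scale with the operating point is what buys both robustness and rapidity.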
Identification and Construction of a Long Noncoding RNA Prognostic Risk Model for Stomach Adenocarcinoma Patients
Background. Long noncoding RNA-based prognostic biomarkers have demonstrated great potential in the diagnosis and prognosis of cancer patients. However, systematic assessment of a multiple lncRNA-composed prognostic risk model is lacking in stomach adenocarcinoma (STAD). This study is aimed at constructing a lncRNA-based prognostic risk model for STAD patients. Methods. RNA sequencing data and clinical information of STAD patients were retrieved from The Cancer Genome Atlas (TCGA) database. Differentially expressed lncRNAs (DElncRNAs) were identified using the R software. Univariate and multivariate Cox regression analyses were performed to construct a prognostic risk model. The survival analysis, C-index, and receiver operating characteristic (ROC) curve were employed to assess the sensitivity and specificity of the model. The results were verified using the GEPIA online tool and our clinical samples. Pearson correlation coefficient analysis, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were performed to indicate the potential biological functions of the selected lncRNA. Results. A total of 1917 DElncRNAs were identified from 343 cases of STAD tissues and 30 cases of noncancerous tissues. According to univariate and multivariable Cox regression analyses, four DElncRNAs (AC129507.1, LINC02407, AL022316.1, and AP000695.2) were selected to establish a prognostic risk model. There was a significant difference in the overall survival between high-risk patients and low-risk patients based on this risk model. The C-index of the model was 0.652. The area under the curve (AUC) for the ROC curve was 0.769. GEPIA results confirmed the expression and prognostic significance of AP000695.2 in STAD. Our clinical data confirmed that upregulated expression of AP000695.2 was correlated with the T stage, distant metastasis, and TNM stage in STAD. 
GO and KEGG analyses demonstrated that AP000695.2 was closely related to the tumorigenesis process. Conclusions. In this study, we constructed a lncRNA-based prognostic risk model for STAD patients. Our study will provide novel insight into the diagnosis and prognosis of STAD patients.
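A multivariate-Cox risk model of this kind scores each patient as a weighted sum of the selected lncRNAs' expression values, then splits the cohort at the median score into high- and low-risk groups. The sketch below illustrates that scoring step; the four lncRNA names come from the study, but the coefficients are made up for illustration and are not the fitted values.

```python
# Illustrative risk scoring for a four-lncRNA prognostic model.
# Coefficients are hypothetical placeholders, not the study's estimates.
COEFS = {"AC129507.1": 0.4, "LINC02407": 0.3,
         "AL022316.1": -0.2, "AP000695.2": 0.5}

def risk_score(expression):
    """expression: lncRNA name -> expression value for one patient."""
    return sum(c * expression[g] for g, c in COEFS.items())

def split_by_median(patients):
    """patients: patient id -> expression dict. Median split into groups."""
    scores = {pid: risk_score(e) for pid, e in patients.items()}
    med = sorted(scores.values())[len(scores) // 2]
    return {pid: ("high" if s > med else "low") for pid, s in scores.items()}
```

The survival comparison in the study is then between these two groups, and the ROC/AUC and C-index quantify how well the continuous score separates outcomes.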