3,265 research outputs found

    An Unsupervised Autoregressive Model for Speech Representation Learning

    Full text link
    This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variability, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing it to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over surface features and other supervised and unsupervised approaches. Further analysis shows that our model captures different levels of speech information at different layers: the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.
    Comment: Accepted to Interspeech 2019. Code available at: https://github.com/iamyuanchung/Autoregressive-Predictive-Coding
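    To make the training objective concrete, below is a minimal sketch of an autoregressive future-frame predictor in PyTorch. It assumes log-mel input features, a GRU encoder, and an L1 loss on frames a few steps ahead; the layer sizes and prediction shift are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class APCSketch(nn.Module):
    """Minimal autoregressive predictive coding sketch: an RNN reads past
    acoustic frames and predicts a frame several steps in the future."""
    def __init__(self, n_mels=80, hidden=512, layers=3):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)

    def forward(self, x):
        h, _ = self.rnn(x)   # (batch, time, hidden)
        return self.proj(h)  # predicted frames, same shape as x

def apc_loss(model, mels, shift=3):
    # Predict the frame `shift` steps ahead of each position (L1 loss).
    pred = model(mels[:, :-shift])
    return nn.functional.l1_loss(pred, mels[:, shift:])
```

    After pre-training, the hidden states `h` (rather than the predicted frames) serve as the learned representations for downstream tasks.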

    Constructing an Index for Brand Equity: A Hospital Example

    Get PDF
    If two hospitals provide identical services in all respects except the brand name, why are customers willing to pay more for one hospital than the other? That is, the brand name is not just a name but a name that contains value (brand equity). Brand equity is the value that the brand name endows to the product, such that consumers are willing to pay a premium price for products carrying that brand name. Accordingly, a company needs to manage its brand carefully so that its brand equity does not depreciate. Although measuring brand equity is important, managers have no brand equity index that is psychometrically robust yet parsimonious enough for practice. Indeed, index construction is quite different from conventional scale development, and researchers might still be unaware of the potential appropriateness of formative indicators for operationalizing particular constructs. To this end, drawing on the brand equity literature and following the index construction procedure, this study creates a brand equity index for a hospital. The results reveal a parsimonious five-indicator brand equity index that adequately captures the full domain of brand equity. This study also illustrates the differences between index construction and scale development.
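    For illustration only: a formative index of this kind is typically operationalized as a weighted linear composite of its indicators. The sketch below forms such a composite in Python; the five indicator names and the weights are hypothetical placeholders, since the paper's actual indicators and estimated weights are not reproduced here.

```python
import numpy as np

# Hypothetical indicator scores for one hospital brand: columns are the
# five indicators, rows are survey respondents (names are illustrative).
indicators = ["awareness", "perceived_quality", "loyalty", "trust", "image"]
X = np.random.default_rng(0).normal(size=(200, 5))

weights = np.array([0.25, 0.30, 0.20, 0.15, 0.10])  # assumed indicator weights

# Formative index: standardize each indicator, then take the weighted sum,
# so the indicators jointly *form* the construct rather than reflect it.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
brand_equity_index = Z @ weights  # one composite score per respondent
```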

    CONVERSER: Few-Shot Conversational Dense Retrieval with Synthetic Data Generation

    Full text link
    Conversational search provides a natural interface for information retrieval (IR). Recent approaches have demonstrated promising results in applying dense retrieval to conversational IR. However, training dense retrievers requires large amounts of in-domain paired data, which hinders the development of conversational dense retrievers, as abundant in-domain conversations are expensive to collect. In this paper, we propose CONVERSER, a framework for training conversational dense retrievers with at most 6 examples of in-domain dialogues. Specifically, we utilize the in-context learning capability of large language models to generate conversational queries given a passage in the retrieval corpus. Experimental results on the conversational retrieval benchmarks OR-QuAC and TREC CAsT 19 show that CONVERSER achieves performance comparable to fully supervised models, demonstrating the effectiveness of the proposed framework for few-shot conversational dense retrieval. All source code and generated datasets are available at https://github.com/MiuLab/CONVERSER
    Comment: Accepted to SIGDIAL 2023
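    As a rough illustration of the in-context generation step, the sketch below assembles a few-shot prompt that asks an LLM to write a conversational query for a new passage. The prompt wording and the `llm_generate` call are placeholders, not CONVERSER's actual prompt or model interface.

```python
def build_fewshot_prompt(examples, passage, history):
    """Assemble a few-shot prompt from (passage, history, query) triples.
    `examples` holds the handful of in-domain dialogue examples; the
    template wording is illustrative, not CONVERSER's actual prompt."""
    parts = []
    for ex in examples:
        parts.append(f"Passage: {ex['passage']}\n"
                     f"Conversation so far: {ex['history']}\n"
                     f"Next user query: {ex['query']}\n")
    parts.append(f"Passage: {passage}\n"
                 f"Conversation so far: {history}\n"
                 f"Next user query:")
    return "\n".join(parts)

# synthetic_query = llm_generate(build_fewshot_prompt(fewshot, p, h))
# llm_generate is a stand-in for whatever LLM completion API is used;
# the generated (query, passage) pairs then train the dense retriever.
```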

    Testing Monotonicity of Mean Potential Outcomes in a Continuous Treatment

    Full text link
    While most treatment evaluations focus on binary interventions, a growing literature also considers continuously distributed treatments, e.g. hours spent in a training program to assess its effect on labor market outcomes. In this paper, we propose a Cramér-von Mises-type test of whether the mean potential outcome given a specific treatment has a weakly monotonic relationship with the treatment dose under a weak unconfoundedness assumption. This is useful for testing shape restrictions, e.g. whether increasing the treatment dose always has a non-negative effect, no matter what the baseline level of treatment is. We formally show that the proposed test controls asymptotic size and is consistent against any fixed alternative. These theoretical findings are supported by the method's finite-sample behavior in our Monte Carlo simulations. As an empirical illustration, we apply the test to the Job Corps study and reject a weakly monotonic relationship between the treatment (hours in academic and vocational training) and labor market outcomes such as earnings or employment.
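    To give a flavor of how such a statistic can be computed, the sketch below measures a Cramér-von Mises-type L2 distance between a smoothed dose-response estimate and its best weakly increasing approximation. The kernel smoother and isotonic projection are simple stand-ins for the paper's actual estimator, and the bootstrap that would supply critical values is omitted.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def cvm_monotonicity_stat(dose, outcome, grid_size=100):
    """Stylized CvM-type distance between a smoothed dose-response curve
    and its closest weakly increasing curve; larger values indicate
    stronger evidence against monotonicity. Not the paper's estimator."""
    grid = np.linspace(dose.min(), dose.max(), grid_size)
    # Nadaraya-Watson smoother as a stand-in for the paper's
    # dose-response estimator under weak unconfoundedness.
    h = 0.1 * (dose.max() - dose.min())
    w = np.exp(-0.5 * ((grid[:, None] - dose[None, :]) / h) ** 2)
    m_hat = (w * outcome).sum(axis=1) / w.sum(axis=1)
    # Project the smoothed curve onto the set of monotone curves.
    m_mono = IsotonicRegression(increasing=True).fit(grid, m_hat).predict(grid)
    return np.mean((m_hat - m_mono) ** 2)  # CvM-type L2 distance
```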

    Hey! I Have Something for You: Paging Cycle Based Random Access for LTE-A

    Get PDF
    The surge of M2M devices imposes new challenges for the current cellular network architecture, especially in radio access networks. One of the key issues is that M2M traffic, characterized by small data and massive connection requests, causes significant collisions and congestion during network access via the random access (RA) procedure. To resolve this problem, we propose rpHint, a paging cycle-based protocol that facilitates the random access procedure in LTE-A. The high-level idea of our design is to leverage a UE's paging cycle as a hint to preassign RA preambles, so that UEs avoid preamble collisions in the first place. rpHint has two modes: (1) collision-free paging, which completely prevents cross-collisions between paged user equipment (UEs) and random access UEs, and (2) collision-avoidance paging, which alleviates cross-collisions. Moreover, we formulate a mathematical model to derive the optimal paging ratio that maximizes the expected number of successful UEs. This analysis also allows us to dynamically switch to the better of the two modes. We show via extensive simulations that our design increases the number of successful UEs in an RA procedure by more than 3× compared to the legacy RA scheme of LTE.
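    As a toy illustration of the preamble-partitioning idea, the sketch below reserves a block of the 64 LTE RA preambles for paged UEs, derived deterministically from a UE's identity and paging occasion, while contention-based UEs draw only from the remaining block. The hash function and block size are assumptions, not the protocol's actual mapping.

```python
import random

N_PREAMBLES = 64   # RA preambles per RA opportunity in LTE
RESERVED = 16      # assumed block reserved for paged UEs

def paged_preamble(ue_id: int, paging_frame: int) -> int:
    """Collision-free paging mode (sketch): a paged UE derives a dedicated
    preamble from its identity and paging occasion, so it cannot collide
    with contention-based UEs. The hash below is illustrative only."""
    return (ue_id + paging_frame) % RESERVED

def contention_preamble() -> int:
    # Non-paged UEs contend only over the non-reserved preambles,
    # eliminating cross-collisions with paged UEs.
    return random.randrange(RESERVED, N_PREAMBLES)
```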

    On Optimizing Signaling Efficiency of Retransmissions for Voice LTE

    Get PDF
    The emergence of voice over LTE enables voice traffic to be carried over 4G packet-switched networks. Since voice traffic is characterized by small payloads and frequent transmissions, the corresponding control channel overhead would be high. Semi-persistent scheduling (SPS) was hence proposed in LTE-A to reduce this overhead. However, as wireless channels typically fluctuate, the numerous retransmissions caused by poor channel conditions, which are still scheduled dynamically, lead to a large overhead. To reduce the control message overhead caused by SPS retransmissions, we propose a new SPS retransmission protocol. Different from traditional SPS, which removes the downlink control information (DCI) directly, we compress some key fields of all retransmissions' DCIs in the same subframe into a fixed-length hint. Thus, the base station does not need to send this information to different users individually but simply announces the hint as a broadcast message. In this way, we reduce the signaling overhead while preserving the flexibility of dynamic scheduling. Our simulation results show that, by enabling DCI compression, our design improves signaling efficiency by 2.16× and increases spectral utilization by up to 60%.
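    To illustrate the hint idea, the sketch below bit-packs a few key DCI fields for each retransmission in a subframe into one broadcast blob. The choice of fields and their widths (RNTI, resource-block start and length, HARQ process id, MCS) are illustrative assumptions, not the paper's actual encoding.

```python
import struct

def pack_retx_hint(retransmissions):
    """Compress key DCI fields for every retransmission in one subframe
    into a single fixed-format broadcast hint (illustrative encoding)."""
    hint = bytearray()
    for r in retransmissions:
        # 16-bit RNTI, 8-bit RB start, 8-bit RB length,
        # 3-bit HARQ process id packed with a 5-bit MCS index.
        hint += struct.pack(">HBBB", r["rnti"], r["rb_start"], r["rb_len"],
                            (r["harq_id"] << 5) | r["mcs"])
    return bytes(hint)

# Each UE scans the broadcast hint for its own RNTI instead of receiving
# an individual DCI, saving per-user control channel messages.
```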

    Model Extraction Attack against Self-supervised Speech Models

    Full text link
    Self-supervised learning (SSL) speech models generate meaningful representations of given clips and achieve impressive performance across various downstream tasks. A model extraction attack (MEA) refers to an adversary stealing the functionality of a victim model with only query access. In this work, we study the MEA problem against SSL speech models with a small number of queries. We propose a two-stage framework to extract the model. In the first stage, SSL is conducted on a large-scale unlabeled corpus to pre-train a small speech model. In the second stage, we actively sample a small portion of clips from the unlabeled corpus and query the target model with these clips, acquiring their representations as labels for the small model's second-stage training. Experimental results show that our sampling methods can effectively extract the target model without knowing any information about its architecture.
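    A minimal sketch of the second-stage objective, assuming the pre-trained student regresses onto the victim model's frame-level representations with an L1 loss; the loss choice and tensor shapes are assumptions, not necessarily the paper's exact setup.

```python
import torch
import torch.nn as nn

def extraction_step(student, clips, victim_reprs, optimizer):
    """One stage-2 update: fit the SSL-pre-trained student to the victim
    model's query responses on actively sampled clips (assumed L1 loss).
    clips: (batch, time) waveforms; victim_reprs: (batch, frames, dim)."""
    pred = student(clips)  # student's representations for the same clips
    loss = nn.functional.l1_loss(pred, victim_reprs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```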