286 research outputs found

    Unsupervised extraction of recurring words from infant-directed speech

    To date, most computational models of infant word segmentation have worked from phonemic or phonetic input, or have used toy datasets. In this paper, we present an algorithm for word extraction that works directly from naturalistic acoustic input: infant-directed speech from the CHILDES corpus. The algorithm identifies recurring acoustic patterns that are candidates for identification as words or phrases, and then clusters together the most similar patterns. The recurring patterns are found in a single pass through the corpus using an incremental method, where only a small number of utterances are considered at once. Despite this limitation, we show that the algorithm is able to extract a number of recurring words, including some that infants learn earliest, such as Mommy and the child’s name. We also introduce a novel information-theoretic evaluation measure.

    Recruitment Market Trend Analysis with Sequential Latent Variable Models

    Recruitment market analysis provides valuable understanding of industry-specific economic growth and plays an important role for both employers and job seekers. With the rapid development of online recruitment services, massive recruitment data have been accumulated and enable a new paradigm for recruitment market analysis. However, traditional methods for recruitment market analysis largely rely on the knowledge of domain experts and classic statistical models, which are usually too general to model large-scale dynamic recruitment data and have difficulty capturing fine-grained market trends. To this end, in this paper, we propose a new research paradigm for recruitment market analysis that leverages unsupervised learning techniques to automatically discover recruitment market trends from large-scale recruitment data. Specifically, we develop a novel sequential latent variable model, named MTLVM, which is designed to capture the sequential dependencies of corporate recruitment states and is able to automatically learn the latent recruitment topics within a Bayesian generative framework. In particular, to capture the variability of recruitment topics over time, we design hierarchical Dirichlet processes for MTLVM. These processes allow the evolving recruitment topics to be generated dynamically. Finally, we implement a prototype system to empirically evaluate our approach on real-world recruitment data from China. By visualizing the results from MTLVM, we reveal many interesting findings, for example that the popularity of LBS-related jobs peaked in the second half of 2014 and decreased in 2015. Comment: 11 pages, 30 figures, SIGKDD 201

    A Conserved DNA Repeat Promotes Selection of a Diverse Repertoire of Trypanosoma brucei Surface Antigens from the Genomic Archive.

    African trypanosomes are mammalian pathogens that must regularly change their protein coat to survive in the host bloodstream. Chronic trypanosome infections are potentiated by their ability to access a deep genomic repertoire of Variant Surface Glycoprotein (VSG) genes and switch from the expression of one VSG to another. Switching VSG expression is largely based on DNA recombination events that result in chromosome translocations between an acceptor site, which houses the actively transcribed VSG, and a donor gene, drawn from an archive of more than 2,000 silent VSGs. One element implicated in these duplicative gene conversion events is a DNA repeat of approximately 70 bp that is found in long regions within each BES and short iterations proximal to VSGs within the silent archive. Early observations showing that 70-bp repeats can be recombination boundaries during VSG switching led to the prediction that VSG-proximal 70-bp repeats provide recombinatorial homology. Yet, this long-held assumption had not been tested and no specific function for the conserved 70-bp repeats had been demonstrated. In the present study, the 70-bp repeats were genetically manipulated under conditions that induce gene conversion. In this manner, we demonstrated that 70-bp repeats promote access to archival VSGs. Synthetic repeat DNA sequences were then employed to identify the length, sequence, and directionality of repeat regions required for this activity. In addition, manipulation of the 70-bp repeats allowed us to observe a link between VSG switching and the cell cycle that had not been appreciated. Together, these data provide definitive support for the long-standing hypothesis that 70-bp repeats provide recombinatorial homology during switching. Yet, the fact that silent archival VSGs are selected under these conditions suggests the 70-bp repeats also direct DNA pairing and recombination machinery away from the closest homologs (silent BESs) and toward the rest of the archive.

    Speech Facilitates the Categorization of Motions in 9-Month-Old Infants

    Two experiments were used to investigate the influence of both native and non-native speech on the categorization of a set of an object’s motions by 9-month-olds. In Experiment 1, infants were habituated to a set of three object motions and tested with familiar and novel motions. Results of Experiment 1 show that infants were more likely to categorize the motion stimuli if they listened to either the native or non-native speech during the categorization process than if they listened to music or heard nothing at all. Results of Experiment 2 show that discrimination of the motions was not impaired by the presence of the labeling phrases. These results are consistent with a number of findings that report a unique influence of labels on categorization of static objects in infancy and extend those findings to categorization of motions.

    What is typical about the typicality effect in category-based induction?


    A perspective on SIDS pathogenesis. The hypotheses: plausibility and evidence

    Several theories of the underlying mechanisms of Sudden Infant Death Syndrome (SIDS) have been proposed. These theories have borne relatively narrow beach-head research programs attracting generous research funding sustained for many years at expense to the public purse. This perspective endeavors to critically examine the evidence and bases of these theories and determine their plausibility; and questions whether or not a safe and reasoned hypothesis lies at their foundation. The Opinion sets specific criteria by asking the following questions: 1. Does the hypothesis take into account the key pathological findings in SIDS? 2. Is the hypothesis congruent with the key epidemiological risk factors? 3. Does it link 1 and 2? Falling short of any one of these answers, by inference, would imply insufficient grounds for a sustainable hypothesis. Some of the hypotheses overlap, for instance, notional respiratory failure may encompass apnea, prone sleep position, and asphyxia which may be seen to be linked to co-sleeping. For the purposes of this paper, each element will be assessed on the above criteria.

    Intergenerational Practice in the Community—What Does the Community Think?

    The many changes that occur in the lives of older people put them at an increased risk of being socially isolated and lonely. Intergenerational programs for older adults and young children can potentially address this shortfall, because of the perceived benefit from generations interacting. This study explores whether there is an appetite in the community for intergenerational programs for community dwelling older adults. An online survey was distributed via social media, research team networks, and snowballing recruitment with access provided via QR code or hyperlink. Semi-structured interviews were undertaken with potential participants of a pilot intergenerational program planned for the Eastern Suburbs of Sydney, Australia in 2020. The interviews were thematically analyzed. Over 250 people completed the survey, and 21 interviews took place with older adults (10) and parents of young children (11). The data showed that participants were all in favor of intergenerational programs, but there were different perceptions about who benefits most and how. The study highlighted considerations to be addressed in the development of effective and sustainable intergenerational programs. For example, accessing people in the community who are most socially isolated and lonely was identified as a primary challenge. More evidence-based research is needed to support involvement of different cohorts, such as those who are frail, or living with physical or cognitive limitations.

    Learning and Long-Term Retention of Large-Scale Artificial Languages

    Recovering discrete words from continuous speech is one of the first challenges facing language learners. Infants and adults can make use of the statistical structure of utterances to learn the forms of words from unsegmented input, suggesting that this ability may be useful for bootstrapping language-specific cues to segmentation. It is unknown, however, whether performance shown in small-scale laboratory demonstrations of “statistical learning” can scale up to allow learning of the lexicons of natural languages, which are orders of magnitude larger. Artificial language experiments with adults can be used to test whether the mechanisms of statistical learning are in principle scalable to larger lexicons. We report data from a large-scale learning experiment that demonstrates that adults can learn words from unsegmented input in much larger languages than previously documented and that they retain the words they learn for years. These results suggest that statistical word segmentation could be scalable to the challenges of lexical acquisition in natural language learning. National Science Foundation (U.S.) (NSF DDRIG #0746251)