    What Would They Say? Predicting Users Comments in Pinterest

    When we refer to an image that attracts our attention, it is natural to mention not only what is literally depicted in the image, but also the sentiments, thoughts and opinions that it evokes in us. In this work, we depart from the standard mainstream tasks of associating tags or keywords with an image, or generating descriptions of image content, and introduce the novel task of automatically generating user comments for an image. We present a new dataset collected from the social media platform Pinterest and propose a strategy based on building joint textual and visual user models, tailored to the specifics of this task. We conduct an extensive experimental analysis of our approach in both qualitative and quantitative terms, which allows assessing the value of the proposed approach and shows encouraging results against several existing image-to-text methods.

    ‘Costa da Morte’ ataxia is spinocerebellar ataxia 36: clinical and genetic characterization

    Spinocerebellar ataxia 36 was recently described in Japanese families as a new type of spinocerebellar ataxia with motor neuron signs. It is caused by a GGCCTG repeat expansion in intron 1 of NOP56. Family interviews and documentary research allowed us to reconstruct two extensive, multigenerational kindreds stemming from the same village in Costa da Morte (Galicia, Spain) in the 17th century. We found that the spinocerebellar ataxia 36 mutation co-segregates with the disease in these families, in which we had previously identified an ∼0.8 Mb linkage region on chromosome 20p. Subsequent screening revealed the NOP56 expansion in eight additional Galician ataxia kindreds. While normal alleles contain 5–14 hexanucleotide repeats, expanded alleles range from ∼650 to 2500 repeats, within a shared haplotype. Further expansion of repeat size was frequent, especially upon paternal transmission, while instances of allele contraction were observed in maternal transmissions. We found a total of 63 individuals carrying the mutation, 44 of whom were confirmed to be clinically affected; over 400 people are at risk. We describe here the detailed clinical picture, consisting of a late-onset, slowly progressive cerebellar syndrome with variable eye movement abnormalities and sensorineural hearing loss. There were signs of denervation in the tongue, as well as mild pyramidal signs, but otherwise no signs of classical amyotrophic lateral sclerosis. Magnetic resonance imaging findings were consistent with the clinical course, showing atrophy of the cerebellar vermis in initial stages, later evolving to a pattern of olivo-ponto-cerebellar atrophy. We estimated that the founder mutation originated in Galicia ∼1275 years ago. Out of 160 Galician families with spinocerebellar ataxia, 10 (6.3%) were found to have spinocerebellar ataxia 36, while 15 (9.4%) showed one of the other routinely tested dominant spinocerebellar ataxia types. 
Spinocerebellar ataxia 36 is thus, so far, the most frequent dominant spinocerebellar ataxia in this region, which may have implications for American countries historically linked to Spanish emigration.

    Ataxin-3 phosphorylation decreases neuronal defects in spinocerebellar ataxia type 3 models

    Several neurodegenerative diseases are caused by aberrant elongation of repeated glutamine sequences normally found in particular human proteins. Although the proteins involved are ubiquitously distributed in human tissues, toxicity targets only specific neuronal populations. Changes caused by an expanded polyglutamine protein may be influenced by endogenous cellular mechanisms, which could be harnessed to produce neuroprotection. Here, we show that ataxin-3, the protein involved in spinocerebellar ataxia type 3, also known as Machado-Joseph disease, causes dendritic and synapse loss in cultured neurons when expanded. We report that S12 of ataxin-3 is phosphorylated in neurons and that mutating this residue to mimic a constitutively phosphorylated state counters the neuromorphologic defects observed. In rats stereotaxically injected with lentiviral vectors encoding expanded ataxin-3, mutation of serine 12 reduces aggregation, neuronal loss, and synapse loss. Our results suggest that S12 plays a role in the pathogenic pathways mediated by polyglutamine-expanded ataxin-3 and that phosphorylation of this residue protects against toxicity.

    Latent Variable Models for Language and Image Understanding in Social Media and E-Commerce Data

    More content has been created in the past few years than in the entire history of humankind. With the exponential growth of user-contributed content, it becomes increasingly important to develop systems capable of intelligently processing both language and images. While understanding language appears effortless for humans from a young age, it is quite a challenging task for computers. Languages are inherently ambiguous and rich: many words can denote the same concept, and conversely, the same word can represent multiple things. This is accentuated on the wild and noisy Web, where users playfully and organically create new words and assign new meanings to existing terms. Consider, for example, the word "happy". It has many synonyms according to a standard English thesaurus: cheerful, glad, joyful, merry, etc. On the Web, however, users choose a wider range of terms to denote the same concept: cheerio, cherry-merry, cheryl, grooved, psyched, stoked, and the list keeps evolving. To find all the documents that refer to one particular concept, it would therefore not be sufficient to rely on a thesaurus. Instead, we wish to develop algorithms that automatically learn semantically related words from organic and noisy data, without relying on prior knowledge or dictionaries. In this context, we address the task of cross-idiomatic linking of Web sources. This task consists of connecting textual content from different domains in which similar concepts are discussed but language usage differs greatly. Specifically, we focus on linking social media posts from the popular site Pinterest.com to e-commerce products from Amazon.com. The task is framed in an information retrieval setting, where pins (short snippets of text that a Pinterest user has posted online about something they are interested in) are used as queries, and Amazon products comprise the target collection. 
We develop novel textual representations based on the family of latent Dirichlet allocation (LDA) models. Our core insight is that we can learn representations that bridge the query and target language by leveraging pairs of aligned documents, i.e., documents that discuss the same topic using different words. Our proposed multi-idiomatic latent Dirichlet allocation (MiLDA) model explicitly takes into account the topic distribution shared between sources, while modeling both the differences and the similarities in language. The first set of contributions of this work is as follows: 1) we constructed a new benchmark dataset composed of pins from Pinterest, Amazon product descriptions and the corresponding user reviews, accompanied by relevance annotations of randomly chosen pins with respect to the Amazon products. 2) We proposed, performed and assessed the novel task of cross-idiomatic linking described above. 3) We developed representations for cross-idiomatic modeling of noisy textual sources, as found on the Web. 4) We performed a systematic empirical comparison of the performance of different latent variable models for connecting cross-idiomatic sources. Besides language, understanding images is also challenging. While humans can easily "translate" visual concepts into words and vice versa, machines are far less skilled at this. The challenge is that the raw representations of images and text (as normally stored in a computer) do not reveal their actual meaning; they are just large arrays of numbers. We develop representations that allow us to semantically connect images and language. In this context, we address the task of cross-modal search: given a query image, we aim to retrieve words that describe its visual content (image annotation), and given a set of textual descriptors, we aim to find images that display those attributes (image search). 
We perform this task within the fashion domain. To achieve this, we exploit the alignment between images and their surrounding natural-language text, as found on the Web. Specifically, we investigate different image representations, such as scale-invariant feature transform (SIFT) features and convolutional neural networks (CNN); different textual representations, such as bag of words (BoW) and semantic word embeddings; and different latent variable alignment models, such as neural networks (NN), canonical correlation analysis (CCA) and bilingual latent Dirichlet allocation (BiLDA). This yields the second set of contributions of this work: 1) we constructed a new benchmark dataset composed of pairs of images and noisy textual descriptions in the fashion domain, as found on the Web. 2) We proposed, performed and assessed the novel task of fashion cross-modal search. 3) We developed representations that bridge the gap between noisy and heterogeneous multimodal content. 4) We performed a systematic empirical comparison of the performance of different latent variable models for connecting cross-modal sources in fashion.
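    Whichever alignment model is used (NN, CCA or BiLDA), cross-modal search in both directions can be carried out once images and words are represented in a common latent space. The sketch below assumes such projected vectors are already available (how they are learned is out of scope here) and ranks candidates by cosine similarity. The vectors, the choice of cosine similarity, and the function names are illustrative assumptions, not the method evaluated in this work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec, candidates):
    """Rank candidate ids by similarity to the query in the shared space.

    Image annotation: query_vec is a projected image, candidates are words.
    Image search: query_vec is a projected text query, candidates are images.
    """
    return sorted(candidates, key=lambda c: cosine(query_vec, candidates[c]), reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-D shared-space vectors, for illustration only.
    words = {"red": [0.9, 0.1], "dress": [0.7, 0.7], "boot": [0.0, 1.0]}
    print(rank_by_cosine([1.0, 0.0], words))  # ['red', 'dress', 'boot']
```

    The same ranking function serves both directions; only the roles of query and candidates swap.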

    Latent Dirichlet allocation for linking user-generated content and e-commerce data

    Automatic linking of online content improves navigation possibilities for end users. We focus on linking user-generated content to other relevant sites. In particular, we study the problem of linking information between different usages of the same language, e.g., colloquial and formal idioms, or the language of consumers versus the language of sellers. The challenge is that the same items are described using very different vocabularies. As a case study, we investigate a new task: linking textual Pinterest.com pins (colloquial) to online webshops (formal). Our key insight is that we can learn associations between formal and informal language by utilizing aligned data and probabilistic modeling. Specifically, we thoroughly evaluate three modeling paradigms based on probabilistic topic modeling: monolingual latent Dirichlet allocation (LDA), bilingual LDA (BiLDA) and a novel multi-idiomatic LDA model (MiLDA), and we compare them to the unigram model with a Dirichlet prior. The results for all three topic models show the usefulness of modeling the hidden thematic structure of the data through topics, as opposed to a linking model based solely on the standard unigram. Moreover, our proposed MiLDA model handles intrinsically multi-idiomatic data by considering the vocabulary shared between aligned document pairs. MiLDA obtains the greatest stability (less variation with changes in parameters) and the highest mean average precision scores in the linking task.
    Published in Information Sciences (Elsevier), 2016. DOI: 10.1016/j.ins.2016.05.047 (http://dx.doi.org/10.1016/j.ins.2016.05.047). © 2016 Elsevier Inc. All rights reserved.
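    The unigram baseline with a Dirichlet prior is commonly implemented as query-likelihood retrieval over Dirichlet-smoothed document language models, p(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ). The minimal sketch below assumes that standard form and also shows how mean average precision (MAP) is computed over the resulting rankings. The abstract does not state μ, so `mu=2000.0` is only a commonly used default. Skipping query terms unseen in the collection is an assumption, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc, coll_counts, coll_len, mu=2000.0):
    """log p(query | doc) under a Dirichlet-smoothed unigram document model."""
    doc_counts = Counter(doc)
    score = 0.0
    for w in query:
        p_wc = coll_counts.get(w, 0) / coll_len
        if p_wc == 0.0:
            continue  # assumption: ignore query terms unseen in the whole collection
        score += math.log((doc_counts[w] + mu * p_wc) / (len(doc) + mu))
    return score


def rank_shops(query, shops, mu=2000.0):
    """Rank target documents (e.g. shop/product texts) by query likelihood."""
    coll_counts = Counter(w for doc in shops.values() for w in doc)
    coll_len = sum(coll_counts.values())
    scores = {
        shop_id: dirichlet_log_likelihood(query, doc, coll_counts, coll_len, mu)
        for shop_id, doc in shops.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def average_precision(ranking, relevant):
    """Mean of precision@i over the ranks i at which relevant items appear."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: one (ranking, relevant_set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical toy data: one colloquial "pin" and two product descriptions.
    shops = {
        "shop_a": "wooden toy train set for kids".split(),
        "shop_b": "stainless steel kitchen knife set".split(),
    }
    ranking = rank_shops("cute toy train".split(), shops, mu=10.0)
    print(ranking)  # ['shop_a', 'shop_b']
    print(mean_average_precision([(ranking, {"shop_a"})]))  # 1.0
```

    A topic-model-based linker would replace only the document model inside the scoring function; the MAP computation stays the same.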

    Are words enough? A study on text-based representations and retrieval models for linking pins to online shops

    User-generated content offers opportunities to learn about people's interests and hobbies. We can leverage this information to help users find interesting shops and to help businesses find interested users. However, as posted on social media sites and blogs, this content is highly noisy and unstructured. In this work, we evaluate different textual representations and retrieval models that aim to make sense of social media data for retail applications. Our task is to link the text of pins (from Pinterest.com) to online shops (formed by clustering Amazon.com products). Our results show that document representations combining latent concepts with single words yield the best performance.
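    One common way to combine latent concepts with single words, assumed here purely for illustration and not necessarily the representation evaluated in this work, is to interpolate a Dirichlet-smoothed word-level estimate with an LDA topic-based estimate: p(w|d) = λ·p_dir(w|d) + (1 − λ)·Σ_k p(w|z=k)·p(z=k|d). The topic parameters are assumed to come from an already trained topic model; `lam`, `mu`, and the function names are hypothetical.

```python
import math


def mixed_word_prob(w, doc_counts, doc_len, p_coll, theta_d, phi, mu=1000.0, lam=0.7):
    """Interpolate a word-level and a topic-level estimate of p(w | d).

    doc_counts: word -> count in d; p_coll: word -> p(w | collection);
    theta_d[k] = p(z=k | d); phi[k]: word -> p(w | z=k).
    """
    # Word-level part: Dirichlet-smoothed unigram estimate.
    p_dir = (doc_counts.get(w, 0) + mu * p_coll.get(w, 0.0)) / (doc_len + mu)
    # Concept-level part: marginalize over the document's topics.
    p_topic = sum(t * phi_k.get(w, 0.0) for t, phi_k in zip(theta_d, phi))
    return lam * p_dir + (1.0 - lam) * p_topic


def query_log_likelihood(query, doc_counts, doc_len, p_coll, theta_d, phi, mu=1000.0, lam=0.7):
    """Sum of log p(w | d) over query terms; assumes every term gets p > 0."""
    return sum(
        math.log(mixed_word_prob(w, doc_counts, doc_len, p_coll, theta_d, phi, mu, lam))
        for w in query
    )
```

    The topic term is what lets a pin word such as a colloquial synonym receive probability from a product text that never contains it, provided both words load on the same topic; setting `lam=1.0` falls back to the purely word-based model.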

    Learning to bridge colloquial and formal language applied to linking and search of e-commerce data

    We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles follow the intuition that certain words are shared between two idioms of the same language, while other words are non-shared, that is, idiom-specific. We demonstrate the ability of our model to learn relations between cross-idiomatic topics in a dataset containing product descriptions and reviews. We evaluate the model intrinsically using perplexity. As an extrinsic evaluation, we then show the utility of the new MiLDA topic model in a recently proposed IR task: linking Pinterest pins (written in colloquial English on the users' side) to online webshops (written in formal English on the retailers' side). We show that our multi-idiomatic model outperforms both the standard monolingual LDA model and the pure bilingual LDA model, in terms of perplexity and of MAP scores in the IR task.
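    Perplexity, the intrinsic measure mentioned above, is the exponential of the negative average per-token log-likelihood of held-out text, so lower is better. The sketch below assumes a standard LDA-style decomposition p(w|d) = Σ_k θ_dk·φ_kw, with θ for the held-out documents already inferred; it does not model MiLDA's split into shared and idiom-specific words, which the abstract does not specify. Names and toy values are hypothetical.

```python
import math


def perplexity(docs, theta, phi):
    """exp(-(sum of per-token log-likelihoods) / (number of tokens)).

    docs[d]: token list; theta[d][k] = p(z=k | d); phi[k]: word -> p(w | z=k).
    """
    log_lik, n_tokens = 0.0, 0
    for tokens, theta_d in zip(docs, theta):
        for w in tokens:
            p = sum(t * phi_k.get(w, 0.0) for t, phi_k in zip(theta_d, phi))
            if p <= 0.0:
                raise ValueError(f"model assigns zero probability to {w!r}")
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

    As a sanity check, a model that spreads probability uniformly over a vocabulary of V words has perplexity V, so one topic with p = 0.25 for each of four words gives a perplexity of 4.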