116 research outputs found

    A Machine Learning Approach to Predicting Alcohol Consumption in Adolescents From Historical Text Messaging Data

    Techniques based on artificial neural networks represent the current state-of-the-art in machine learning due to the availability of improved hardware and large data sets. Here we employ doc2vec, an unsupervised neural network, to capture the semantic content of text messages sent by adolescents during high school, and encode this semantic content as numeric vectors. These vectors effectively condense the text message data into highly leverageable inputs to a logistic regression classifier in a matter of hours, as compared to the tedious and often quite lengthy task of manually coding data. Using our machine learning approach, we are able to train a logistic regression model to predict adolescents' engagement in substance abuse during distinct life phases with accuracy ranging from 76.5% to 88.1%. We show the effects of grade level and text message aggregation strategy on the efficacy of document embedding generation with doc2vec. Additional examination of the vectorizations for specific terms extracted from the text message data adds quantitative depth to this analysis. We demonstrate the ability of the method used herein to overcome traditional natural language processing concerns related to unconventional orthography. These results suggest that the approach described in this thesis is a competitive and efficient alternative to existing methodologies for predicting substance abuse behaviors. This work reveals the potential for the application of machine learning-based manipulation of text messaging data to development of automatic intervention strategies against substance abuse and other adolescent challenges.

    A Network Science and Document Similarity based Hybrid Job Recommendation System

    Job recommendation systems mainly use different sources of data to provide better content to the end user. Developing a well-performing system requires complex hybrid approaches to representing similarity, based on the content of job postings and resumes as well as the interactions between them. We develop an efficient hybrid network-based job recommendation system that uses the Personalized PageRank algorithm to rank vacancies for users based on the similarity between resumes and job posts as textual documents, along with users' previous interactions with vacancies. Our approach achieved a recall of 50% and generated more job applications during the online A/B test than previous algorithms.
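
To illustrate the ranking step this abstract mentions, here is a minimal sketch of Personalized PageRank computed by power iteration. It is not the authors' implementation: the toy graph, the node names (`user_1`, `job_a`, and so on), the damping factor `alpha`, and the helper `undirected` are all assumptions made for the example. Edges stand in for either a past interaction or a resume-to-posting text similarity above some threshold.

```python
def undirected(edges):
    """Build an adjacency dict from (node, node) pairs, adding both directions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def personalized_pagerank(graph, seeds, alpha=0.85, iterations=100):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of neighbour nodes.
    seeds: dict mapping seed nodes to restart weights (normalised here).
    alpha: probability of following an edge instead of restarting.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Every node first receives its share of the restart mass.
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                share = alpha * rank[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # A node without edges sends its mass back to the seeds.
                for m in nodes:
                    nxt[m] += alpha * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical toy data: interactions and similarity links.
graph = undirected([
    ("user_1", "job_a"),
    ("user_2", "job_a"),
    ("user_2", "job_b"),
    ("job_b", "job_c"),
])

# Rank from the point of view of user_1.
scores = personalized_pagerank(graph, seeds={"user_1": 1.0})
ranked_jobs = sorted(
    (n for n in scores if n.startswith("job_")),
    key=scores.get,
    reverse=True,
)
print(ranked_jobs)
```

Restarting only at the target user's node is what makes the ranking personalized: vacancies close to that user in the graph accumulate more score than distant ones.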

    Discriminative feature learning for multimodal classification

    The purpose of this thesis is to tackle two related topics: multimodal classification and objective functions to improve the discriminative power of features. First, I worked on image and text classification tasks and performed many experiments to show the effectiveness of different approaches available in the literature. Then, I introduced a novel methodology that classifies multimodal documents using single-modal classifiers by merging textual and visual information into images, and a novel loss function to improve separability between samples of a dataset. Results show that exploiting multimodal data improves performance on classification tasks compared with traditional single-modality methods. Moreover, the introduced GIT loss function enhances the discriminative power of features, lowering intra-class distance and raising inter-class distance between samples of a multiclass dataset.
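
The two quantities this abstract refers to, intra-class and inter-class distance, can be measured as in the rough sketch below. This is not the GIT loss itself, whose formula the abstract does not give. The centroid-based definitions, the function names, and the toy 2-D features are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def class_distances(samples):
    """Return (mean intra-class distance, mean inter-class distance).

    samples: dict mapping a class label to a list of feature vectors.
    Intra-class: average distance from each sample to its class centroid.
    Inter-class: average distance between every pair of class centroids.
    """
    centroids = {label: centroid(pts) for label, pts in samples.items()}
    intra_terms = [
        math.dist(p, centroids[label])
        for label, pts in samples.items()
        for p in pts
    ]
    inter_terms = [
        math.dist(centroids[a], centroids[b])
        for a, b in combinations(centroids, 2)
    ]
    intra = sum(intra_terms) / len(intra_terms)
    inter = sum(inter_terms) / len(inter_terms)
    return intra, inter


# Hypothetical features before and after more discriminative training.
loose = {"cat": [(0.0, 0.0), (2.0, 0.0)], "dog": [(3.0, 0.0), (5.0, 0.0)]}
tight = {"cat": [(0.5, 0.0), (1.5, 0.0)], "dog": [(5.5, 0.0), (6.5, 0.0)]}

loose_intra, loose_inter = class_distances(loose)
tight_intra, tight_inter = class_distances(tight)
print(loose_intra, loose_inter)
print(tight_intra, tight_inter)
```

Under these definitions, features count as more discriminative when the first number falls and the second rises, which is the direction the abstract describes.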

    Characterizing and Predicting Early Reviewers for Effective Product Marketing on E-Commerce Websites

    Online reviews have become an important source of information for users before making an informed purchase decision. Early reviews of a product tend to have a high impact on the subsequent product sales. In this paper, we take the initiative to study the behavior characteristics of early reviewers through their posted reviews on two real-world large e-commerce platforms, i.e., Amazon and Yelp. Specifically, we divide product lifetime into three consecutive stages, namely early, majority and laggards. A user who has posted a review in the early stage is considered an early reviewer. We quantitatively characterize early reviewers based on their rating behaviors, the helpfulness scores received from others and the correlation of their reviews with product popularity. We have found that (1) an early reviewer tends to assign a higher average rating score; and (2) an early reviewer tends to post more helpful reviews. Our analysis of product reviews also indicates that early reviewers' ratings and their received helpfulness scores are likely to influence product popularity. By viewing the review posting process as a multiplayer competition game, we propose a novel margin-based embedding model for early reviewer prediction. Extensive experiments on two different e-commerce datasets have shown that our proposed approach outperforms a number of competitive baselines.

    A Cascade Framework for Privacy-Preserving Point-of-Interest Recommender System

    Point-of-interest (POI) recommender systems (RSes) have gained significant popularity in recent years due to the prosperity of location-based social networks (LBSN). However, in the interest of personalization services, various kinds of sensitive contextual information are collected, causing potential privacy concerns. This paper proposes a cascaded privacy-preserving POI recommendation (CRS) framework that protects contextual information such as user comments and locations. We demonstrate a minimized trade-off between the privacy-preserving feature and prediction accuracy by applying a semi-decentralized model to real-world datasets.