6 research outputs found

    Gender detection of Twitter users based on multiple information sources

    Twitter provides a simple way for users to express feelings, ideas and opinions, makes user-generated content and associated metadata available to the community, and provides easy-to-use web and application programming interfaces to access data. User profile information is important for many studies, but essential information, such as gender and age, is not provided when accessing a Twitter account. However, clues about the user profile, such as age, gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the user name, screen name, description and picture, or from user-generated content. We performed experiments using an English labelled dataset containing 6.5 M tweets from 65 K users, and a Portuguese labelled dataset containing 5.8 M tweets from 58 K users. We created four distinct classifiers, trained using a supervised approach, each one considering a group of features extracted from one of four different sources: user name and screen name, user description, content of the tweets, and profile picture. Features related to activity, such as number of following and number of followers, were discarded, since these features were found not to be indicative of gender. A final classifier that combines the predictions of the four individual classifiers achieves the best performance, corresponding to 93.2% accuracy for English and 96.9% accuracy for Portuguese data.

    Exploring Footedness, Throwing Arm, and Handedness as Predictors of Eyedness Using Cluster Analysis and Machine Learning: Implications for the Origins of Behavioural Asymmetries

    Behavioural asymmetries displayed by individuals, such as hand preference and foot preference, tend to be lateralized in the same direction (left or right). This may be because their co-ordination conveys functional benefits for a variety of motor behaviours. To explore the potential functional relationship between key motor asymmetries, we examined whether footedness, handedness, or throwing arm was the strongest predictor of eyedness. Behavioural asymmetries were measured by self-report in 578 left-handed and 612 right-handed individuals. Cluster analysis of the asymmetries revealed four handedness groups: consistent right-handers, left-eyed right-handers, consistent left-handers, and inconsistent left-handers (who were left-handed but right-lateralized for footedness, throwing and eyedness). Supervised machine learning models showed the importance of footedness, in addition to handedness, in determining eyedness. In right-handers, handedness was the best predictor of eyedness, followed closely by footedness, and for left-handers it was footedness. Overall, predictors were more informative in predicting eyedness for individuals with consistent lateral preferences. Implications of the findings in relation to the origins and genetics of handedness and sports training are discussed. Findings are related to fighting theories of handedness and to bipedalism, which evolved after manual dexterity, and which may have led to some humans being right-lateralized for ballistic movements and left-lateralized for hand dexterity.

    Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand

    K-means clustering is designed to divide samples into subsets so as to maximize intra-subset similarity and inter-subset dissimilarity, where the similarity measures the relationship between two samples. As an unsupervised learning technique, K-means is considered one of the most used clustering algorithms and has been applied in a variety of areas such as artificial intelligence, data mining, biology, psychology, marketing, and medicine. The K-means clustering algorithm is not robust, and its clustering result depends on the initialization, the similarity measure, and the predefined cluster number. Previous research focused on solving some of these issues but not on solving them in a unified framework. However, fixing one of these issues does not guarantee the best performance. Improving the K-means clustering algorithm, one of the most famous and widely used clustering algorithms, by solving its issues simultaneously is challenging and significant. This thesis conducts extensive research on the K-means clustering algorithm with the aim of improving it. First, we propose the Initialization-Similarity (IS) clustering algorithm to solve the issues of the initialization and the similarity measure of the K-means clustering algorithm in a unified way. Specifically, we propose to fix the initialization of the clustering by using sum-of-norms (SON), which outputs a new representation of the original samples, and to learn the similarity matrix based on the data distribution. Furthermore, the derived new representation is used to conduct K-means clustering. Second, we propose a Joint Feature Selection with Dynamic Spectral (FSDS) clustering algorithm to solve the issues of the cluster number determination, the similarity measure, and the robustness of the clustering by selecting effective features and reducing the influence of outliers simultaneously. Specifically, we propose to learn the similarity matrix based on the data distribution and to add a rank constraint on the Laplacian matrix of the learned similarity matrix to automatically output the cluster number. Furthermore, the proposed algorithm employs the L2,1-norm as the sparse constraint on the regularization term and the loss function to remove the redundant features and reduce the influence of outliers, respectively. Third, we propose a Joint Robust Multi-view (JRM) spectral clustering algorithm that conducts clustering for multi-view data while solving the initialization issue, the cluster number determination, the similarity measure learning, the removal of the redundant features, and the reduction of outlier influence in a unified way. Finally, the proposed algorithms outperformed the state-of-the-art clustering algorithms on real data sets. Moreover, we theoretically prove the convergence of the proposed optimization methods for the proposed objective functions.

    The Impact of Consumer Perceptions of Tanking on National Basketball Association Attendance

    This dissertation studies the impact of consumer perceptions of tanking on National Basketball Association (NBA) attendance. The prevalence of tanking in the NBA raised concerns that some teams were purposely avoiding winning games in order to improve their draft position. The majority of previous studies on tanking have focused on developing empirical evidence of the existence of tanking in sport. Yet no study has systematically explored the impact of perceived tanking behavior on consumer demand for sport. As tanking teams rarely reveal their tanking strategy to the public, fans may not correctly identify tanking behavior in sport, and thus are likely to rely on their perceptions of tanking to make attendance decisions. The current dissertation employs tanking discussions on the social media platform Twitter along with data mining tools to quantify consumer perceptions of tanking. Econometric models are then utilized to analyze the effect of the perceived tanking behavior on demand for NBA games. The estimation results provide robust evidence that increasing awareness of tanking by home teams hurts NBA attendance in both the short and long term. This dissertation also reveals that more negative attitudes toward visiting teams’ tanking behavior can undermine consumer interest in attending NBA games. These findings offer important managerial implications on the urgency of restraining tanking behavior as well as the importance of maintaining integrity in sports competitions.