    Nationality Classification Using Name Embeddings

    Nationality identification unlocks important demographic information, with many applications in biomedical and sociological research. Existing name-based nationality classifiers use name substrings as features and are trained on small, unrepresentative sets of labeled names, typically extracted from Wikipedia. As a result, these methods achieve limited performance and cannot support fine-grained classification. We exploit the phenomenon of homophily in communication patterns to learn name embeddings, a new representation that encodes gender, ethnicity, and nationality and is readily applicable to building classifiers and other systems. Through our analysis of 57M contact lists from a major Internet company, we are able to design a fine-grained nationality classifier covering 39 groups representing over 90% of the world population. In an evaluation against other published systems over 13 common classes, our F1 score (0.795) is substantially better than that of our closest competitor, Ethnea (0.580). To the best of our knowledge, this is the most accurate, fine-grained nationality classifier available. As a social media application, we apply our classifiers to the followers of major Twitter celebrities over six different domains. We demonstrate stark differences in the ethnicities of the followers of Trump and Obama, and in the sports and entertainment favored by different groups. Finally, we identify an anomalous political figure whose presumably inflated following appears largely incapable of reading the language he posts in.
    Comment: 10 pages, 9 figures, 4 tables, accepted by CIKM 2017. Demo and free API: www.name-prism.co

    Integrating Multiple Sketch Recognition Methods to Improve Accuracy and Speed

    Sketch recognition is the computer understanding of hand-drawn diagrams. Recognizing sketches instantaneously is necessary to build beautiful interfaces with real-time feedback. There are various techniques to quickly recognize sketches into ten or twenty classes. However, for much larger datasets of sketches from a large number of classes, these existing techniques can take an extended period of time to accurately classify an incoming sketch and require significant computational overhead. Thus, to make classification of large datasets feasible, we propose using multiple stages of recognition. In the initial stage, gesture-based feature values are calculated and the trained model is used to classify the incoming sketch. Sketches with an accuracy below a threshold value go through a second stage of geometric recognition techniques. In this second, geometric stage, the sketch is segmented and sent to shape-specific recognizers. The sketches are matched against predefined shape descriptions, and confidence values are calculated. The system outputs a list of classes that the sketch could be classified as, along with the accuracy and precision for each sketch. This process both significantly reduces the time taken to classify such huge datasets of sketches and increases both the accuracy and precision of the recognition.
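
    The two-stage flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TwoStageRecognizer`, the 0.8 threshold, and both toy stage functions are assumptions made up for illustration. It only shows the control flow, in which a fast first stage answers when its top score clears a threshold and a slower geometric stage runs otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Sketch = Sequence[Point]
Scores = Dict[str, float]  # class label -> confidence in [0, 1]


@dataclass
class TwoStageRecognizer:
    """Hypothetical cascade: cheap stage first, costly stage on low confidence."""
    stage_one: Callable[[Sketch], Scores]  # stands in for the gesture-feature model
    stage_two: Callable[[Sketch], Scores]  # stands in for the geometric recognizers
    threshold: float = 0.8                 # assumed value, not from the abstract

    def classify(self, sketch: Sketch) -> List[Tuple[str, float]]:
        scores = self.stage_one(sketch)
        # Fall back to the second stage when the first is not confident enough.
        if not scores or max(scores.values()) < self.threshold:
            scores = self.stage_two(sketch)
        # Return candidate classes ranked by confidence, best first.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def toy_gesture_stage(sketch: Sketch) -> Scores:
    # Toy feature: a stroke that ends where it starts is treated as closed.
    if len(sketch) > 2 and sketch[0] == sketch[-1]:
        return {"circle": 0.9, "line": 0.1}
    return {"line": 0.5, "arrow": 0.5}


def toy_geometric_stage(sketch: Sketch) -> Scores:
    # Placeholder for segmentation plus shape-specific matching.
    return {"arrow": 0.7, "line": 0.3}


recognizer = TwoStageRecognizer(toy_gesture_stage, toy_geometric_stage)
closed_result = recognizer.classify([(0.0, 0.0), (1.0, 1.0), (0.0, 0.0)])
open_result = recognizer.classify([(0.0, 0.0), (1.0, 1.0)])
```

    In this toy setup the closed stroke is settled by the first stage alone, while the open stroke falls below the threshold and is re-scored by the second stage. That is the mechanism by which a cascade can save time on easy inputs.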

    Proceedings of the Graduate Student Symposium of the 7th International Conference on the Theory and Application of Diagrams, July 5 2012

    Proceedings of the Graduate Student Symposium of the 7th International Conference on the Theory and Application of Diagrams (Diagrams 2012), held at the University of Kent on July 5, 2012. Dr. Nathaniel Miller, professor in the School of Mathematical Sciences at UNC, served on the symposium organizing committee.

    The relationship sabotage scale: an evaluation of factor analyses and constructive validity

    Background: Some individuals are no longer entering romantic relationships, others move through relationships too quickly searching for “the one” and making quick assessments of their romantic partners, while others stay in their relationships but “check out” or do not work on their issues. These are conclusions from two studies: (1) an interview with psychologists who specialise in relationship therapy, and (2) an analysis of individuals’ lived experiences of relationships. The concept of relationship sabotage can explain these phenomena. However, presently, there is no instrument to conceptualise and empirically measure how people continue to employ self-defeating attitudes and behaviours in (and out of) relationships to impede success, or withdraw effort, and justify failure. Methods and Results: A series of three studies (involving a total of 1365 English-speaking individuals of diverse gender orientation, sexual orientation, and cultural background, with relationship sabotage experience) was conceptualised for the current project to fill the need for scale development and to build empirical evidence on the topic of self-sabotage in romantic relationships. The scale was developed over two studies using exploratory factor analysis and one-congeneric model analyses. The third study, using confirmatory factor analysis, confirmed the final structure for the Relationship Sabotage Scale (RSS), which contains 12 items and three factors: defensiveness, trust difficulty, and lack of relationship skills. Constructive validity analyses were also conducted. Conclusion: The RSS is a brief scale that provides conclusive information about individual patterns in relationships. Findings using this scale can offer explanations regarding the reasons that individuals engage in destructive behaviours from one relationship to the next.
    Investigations should continue to test a model for sabotage in romantic relationships using the developed scale and other factors such as relationship differences and insecure attachment. More specifically, this measure can be used to understand mediator constructs of relational outcomes within the attachment framework to explain relationship dissolution and work towards relationship maintenance.

    Hybrid semantic-document models

    This thesis presents the concept of hybrid semantic-document models to aid information management when using standards for complex technical domains such as military data communication. These standards are traditionally text-based documents for human interpretation, but prose sections can often be ambiguous and can lead to discrepancies and subsequent implementation problems. Many organisations produce semantic representations of the material to ensure common understanding and to exploit computer-aided development. In developing these semantic representations, no relationship is maintained to the original prose. Maintaining relationships between the original prose and the semantic model has key benefits, including assessing conformance at a semantic level, and enabling original content authors to explicitly define their intentions, thus reducing ambiguity and facilitating computer-aided functionality. Through the use of a case study method based on the military standard MIL-STD-6016C, a framework of relationships is proposed. These relationships can integrate with common document modelling techniques and provide the necessary functionality to allow semantic content to be mapped into document views. These relationships are then generalised for applicability to a wider context. Additionally, this framework is coupled with a templating approach which, for repeating sections, can improve consistency and further enhance quality. A reflective approach to model-driven web rendering is presented and evaluated. This reflective approach uses self-inspection at runtime to read directly from the model, thus eliminating the need for any generative processes which result in data duplication across sources used for different purposes.
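
    The idea of rendering by runtime self-inspection, rather than by generating a separate copy of the data, can be illustrated with a minimal Python sketch. It is not the thesis's framework: the `DataWord` class, its fields, and the `render_html` function are hypothetical names invented here. The renderer discovers the model's fields at call time through reflection, so the model stays the single source of the displayed values.

```python
from dataclasses import dataclass, fields, is_dataclass
from html import escape


@dataclass
class DataWord:
    """Hypothetical semantic-model element; not taken from any real standard."""
    name: str
    bit_length: int


def render_html(model: object) -> str:
    """Build an HTML table by inspecting the model instance at runtime."""
    if not is_dataclass(model) or isinstance(model, type):
        raise TypeError("render_html expects a dataclass instance")
    rows = "".join(
        # fields() reflects over the model, so no per-class template is needed.
        f"<tr><th>{escape(f.name)}</th>"
        f"<td>{escape(str(getattr(model, f.name)))}</td></tr>"
        for f in fields(model)
    )
    return f"<table>{rows}</table>"


word = DataWord(name="ALTITUDE", bit_length=13)
page = render_html(word)
```

    Because the view is computed from the live object, adding a field to `DataWord` changes the rendered table without any regeneration step.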