Nationality Classification Using Name Embeddings
Nationality identification unlocks important demographic information, with
many applications in biomedical and sociological research. Existing name-based
nationality classifiers use name substrings as features and are trained on
small, unrepresentative sets of labeled names, typically extracted from
Wikipedia. As a result, these methods achieve limited performance and cannot
support fine-grained classification.
We exploit the phenomenon of homophily in communication patterns to learn name
embeddings, a new representation that encodes gender, ethnicity, and
nationality, and is readily applicable to building classifiers and other
systems. Through our analysis of 57M contact lists from a major Internet
company, we are able to design a fine-grained nationality classifier covering
39 groups representing over 90% of the world population. In an evaluation
against other published systems over 13 common classes, our F1 score (0.795) is
substantially better than that of our closest competitor, Ethnea (0.580). To the best of
our knowledge, this is the most accurate, fine-grained nationality classifier
available.
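The homophily idea above can be illustrated with a toy sketch: embed each name as its co-occurrence counts over contact lists, then classify by nearest labelled neighbour. All names, contact lists, and labels below are invented for the example; the paper learns its embeddings at vastly larger scale with a proper embedding model.

```python
# Toy sketch of homophily-based name embeddings: names that co-occur in
# contact lists get similar co-occurrence vectors. Illustrative only.
from collections import defaultdict
from math import sqrt

# Each contact list is treated as a bag of co-occurring names (homophily:
# people tend to communicate with others of similar nationality).
contact_lists = [
    ["hiroshi", "yuki", "kenji"],
    ["yuki", "kenji", "sato"],
    ["marco", "giulia", "luca"],
    ["giulia", "luca", "rossi"],
]

# Embed each name as counts of the names it appears alongside.
cooc = defaultdict(lambda: defaultdict(int))
for lst in contact_lists:
    for a in lst:
        for b in lst:
            if a != b:
                cooc[a][b] += 1

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Nearest-labelled-neighbour classification over the embeddings.
labeled = {"hiroshi": "Japanese", "marco": "Italian"}

def classify(name):
    return max(labeled, key=lambda seed: cosine(cooc[name], cooc[seed]))

print(labeled[classify("kenji")])   # expected: Japanese
print(labeled[classify("giulia")])  # expected: Italian
```

A production system would replace the raw co-occurrence vectors with dense embeddings trained over millions of lists, but the nearest-neighbour intuition is the same.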
As a social media application, we apply our classifiers to the followers of
major Twitter celebrities over six different domains. We demonstrate stark
differences in the ethnicities of the followers of Trump and Obama, and in the
sports and entertainments favored by different groups. Finally, we identify an
anomalous political figure whose presumably inflated following appears largely
incapable of reading the language he posts in.
Comment: 10 pages, 9 figures, 4 tables, accepted by CIKM 2017. Demo and free
API: www.name-prism.co
Integrating Multiple Sketch Recognition Methods to Improve Accuracy and Speed
Sketch recognition is the computer understanding of hand-drawn diagrams. Recognizing sketches instantaneously is necessary to build beautiful interfaces with real-time feedback. There are various techniques to quickly recognize sketches into ten or twenty classes. However, for much larger datasets of sketches from a large number of classes, these existing techniques can take an extended period of time to accurately classify an incoming sketch and require significant computational overhead. Thus, to make classification of large datasets feasible, we propose using multiple stages of recognition.
In the initial stage, gesture-based feature values are calculated and the trained model is used to classify the incoming sketch. Sketches classified with an accuracy below a threshold value go through a second stage of geometric recognition techniques. In this second, geometric stage, the sketch is segmented and sent to shape-specific recognizers. The sketches are matched against predefined shape descriptions, and confidence values are calculated. The system outputs a list of classes that the sketch could be classified as, along with the accuracy and precision for each sketch. This process significantly reduces the time taken to classify such large datasets of sketches and increases both the accuracy and precision of the recognition.
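The cascade described above can be sketched as follows. The feature extraction, threshold, and shape recognizers here are invented placeholders, not the authors' implementations; the point is the control flow of the fast first stage deferring to the slower geometric stage.

```python
# Minimal sketch of a two-stage recognition cascade: a cheap gesture-based
# classifier handles confident cases; the rest go to shape-specific
# geometric recognizers. All recognizers below are illustrative stubs.

THRESHOLD = 0.8

def gesture_stage(sketch):
    """Stage 1: cheap gesture features -> (label, confidence)."""
    # Placeholder: a real system would compute gesture features
    # (e.g. Rubine-style) and query a trained model.
    if len(sketch["points"]) < 10:
        return "dot", 0.95      # confident: trivially small stroke
    return "unknown", 0.30      # low confidence: defer to stage 2

# Hypothetical shape-specific recognizers keyed by shape name.
SHAPE_RECOGNIZERS = {
    "circle": lambda s: 0.9 if s.get("closed") else 0.0,
    "line":   lambda s: 0.7 if not s.get("closed") else 0.1,
}

def geometric_stage(sketch):
    """Stage 2: match the sketch against predefined shape descriptions."""
    candidates = [(shape, rec(sketch))
                  for shape, rec in SHAPE_RECOGNIZERS.items()
                  if rec(sketch) > 0]
    return sorted(candidates, key=lambda c: -c[1])

def classify(sketch):
    label, conf = gesture_stage(sketch)
    if conf >= THRESHOLD:
        return [(label, conf)]          # fast path: stage 1 suffices
    return geometric_stage(sketch)      # slow path: geometric matching

print(classify({"points": [(0, 0)] * 5}))                   # fast path
print(classify({"points": [(0, 0)] * 50, "closed": True}))  # cascaded
```

The threshold trades speed for precision: raising it sends more sketches through the expensive geometric stage.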
Proceedings of the Graduate Student Symposium of the 7th International Conference on the Theory and Application of Diagrams, July 5 2012
Proceedings of the Graduate Student Symposium of the 7th International Conference on the Theory and Application of Diagrams (Diagrams 2012), held at the University of Kent on July 5, 2012. Dr. Nathaniel Miller, professor in the School of Mathematical Sciences at UNC, served on the symposium organizing committee.
The relationship sabotage scale: an evaluation of factor analyses and constructive validity
Background: Some individuals are no longer entering romantic relationships, others move through relationships too quickly searching for “the one” and making quick assessments of their romantic partners, while others stay in their relationships but “check out” or do not work on their issues. These are conclusions from two studies: (1) an interview with psychologists who specialise in relationship therapy, and (2) an analysis of individuals’ lived experiences of relationships. The concept of relationship sabotage can explain these phenomena. However, presently, there is no instrument to conceptualise and empirically measure how people continue to employ self-defeating attitudes and behaviors in (and out) of relationships to impede success, or withdraw effort, and justify failure.
Methods and Results: A series of three studies (involving a total of 1365 English speaking individuals of diverse gender orientation, sexual orientation, and cultural background, with relationship sabotage experience) were conceptualized for the current project to fill the need for scale development and to build empirical evidence on the topic of self-sabotage in romantic relationships. The scale was developed over two studies using exploratory factor analysis and one-congeneric model analyses. The third study, using confirmatory factor analysis, confirmed the final structure for the Relationship Sabotage Scale (RSS), which contains 12 items and three factors: defensiveness, trust difficulty, and lack of relationship skills. Constructive validity analyses were also conducted.
Conclusion: The RSS is a brief scale that provides conclusive information about individual patterns in relationships. Findings using this scale can offer explanations regarding the reasons that individuals engage in destructive behaviours from one relationship to the next. Investigations should continue to test a model for sabotage in romantic relationships using the developed scale and other factors such as relationship differences and insecure attachment. More specifically, this measure can be used to understand mediator constructs of relational outcomes within the attachment framework to explain relationship dissolution and work towards relationship maintenance.
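Scoring a 12-item, three-factor instrument like the RSS can be sketched as below. The item-to-factor assignments and the sample responses are invented for the example; the published scale defines the actual item wording and factor structure.

```python
# Illustrative subscale scoring for a 12-item, three-factor scale.
# Item-to-factor mapping here is hypothetical, not the published RSS key.

FACTORS = {
    "defensiveness":               [1, 2, 3, 4],
    "trust_difficulty":            [5, 6, 7, 8],
    "lack_of_relationship_skills": [9, 10, 11, 12],
}

def score(responses):
    """responses: dict mapping item number -> Likert rating (e.g. 1-7).
    Returns the mean rating per factor."""
    return {
        factor: sum(responses[i] for i in items) / len(items)
        for factor, items in FACTORS.items()
    }

# Invented example respondent.
responses = {i: r for i, r in enumerate(
    [5, 6, 4, 5, 2, 3, 2, 1, 4, 4, 5, 3], start=1)}
print(score(responses))  # mean per subscale: 5.0, 2.0, 4.0
```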
Where are you talking about? Advances and Challenges of Geographic Analysis of Text with Application to Disease Monitoring
The Natural Language Processing task we focus on in this thesis is Geoparsing. Geoparsing is the process of extraction and grounding of toponyms (place names). Consider this sentence: "The victims of the Spanish earthquake off the coast of Malaga were of American and Mexican origin." Four toponyms will be extracted (called Geotagging) and grounded to their geographic coordinates (called Toponym Resolution). However, our research goes further than any previous work by showing how to distinguish the literal place(s) of the event (Spain, Malaga) from other linguistic types/uses such as nationalities (Mexican, American), improving downstream task accuracy. We consolidate and extend the Standard Evaluation Framework, discuss key research problems, then present concrete solutions in order to advance each stage of geoparsing. For geotagging, as well as training a SOTA neural Location-NER tagger, we simplify Metonymy Resolution with a novel minimalist feature extraction combined with an LSTM-based classifier, matching SOTA results. For toponym resolution, we deploy the latest deep learning methods to achieve SOTA performance by augmenting neural models with hitherto unused geographic features called Map Vectors. With each research project, we provide high-quality datasets and system prototypes, further building resources in this field. We then show how these geoparsing advances coupled with our proposed Intra-Document Analysis can be used to associate news articles with locations in order to monitor the spread of public health threats. To this end, we evaluate our research contributions with production data from a real-time downstream application to improve geolocation of news events for disease monitoring. 
The data was made available to us by the Joint Research Centre (JRC), which operates one such system called MediSys that processes incoming news articles in order to monitor threats to public health and make these available to a variety of governmental, business and non-profit organisations. We also discuss steps towards an end-to-end, automated news monitoring system and make actionable recommendations for future work. In summary, the thesis aims are twofold: (1) Generate original geoparsing research aimed at advancing each stage of the pipeline by addressing pertinent challenges with concrete solutions and actionable proposals. (2) Demonstrate how this research can be applied to news event monitoring to increase the efficacy of existing biosurveillance systems, e.g. the European Commission’s MediSys. I was generously funded by the DREAM CDT, which was funded by NERC of UKRI.
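The geotagging-then-grounding pipeline can be illustrated on the thesis's own example sentence. The tiny gazetteer and demonym lexicon below are invented stand-ins for the trained NER and metonymy-resolution models, and simple token lookup is deliberately naive.

```python
# Toy geoparsing sketch: extract toponyms (geotagging), ground literal
# places to coordinates (toponym resolution), and flag nationality
# (demonym) uses. Gazetteer and demonym lists are illustrative only.

GAZETTEER = {"Spain": (40.46, -3.75), "Malaga": (36.72, -4.42)}
DEMONYMS = {"Spanish": "Spain", "American": "USA", "Mexican": "Mexico"}

def geoparse(text):
    tokens = text.replace(".", "").replace(",", "").split()
    literal, demonyms = [], []
    for tok in tokens:
        if tok in GAZETTEER:                   # literal place: ground it
            literal.append((tok, GAZETTEER[tok]))
        elif tok in DEMONYMS:                  # nationality use: flag it
            demonyms.append((tok, DEMONYMS[tok]))
    return {"literal": literal, "demonyms": demonyms}

sentence = ("The victims of the Spanish earthquake off the coast of "
            "Malaga were of American and Mexican origin.")
result = geoparse(sentence)
print(result["literal"])   # only "Malaga" is grounded by lexicon lookup
print(result["demonyms"])  # "Spanish", "American", "Mexican" flagged
```

Note that lexicon lookup wrongly lumps "Spanish" (which here modifies the event and refers to the literal place, Spain) with the genuine nationality uses "American" and "Mexican"; resolving exactly that distinction is the metonymy-resolution problem the thesis's LSTM-based classifier addresses.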
Hybrid semantic-document models
This thesis presents the concept of hybrid semantic-document models to aid information management when using standards for complex technical domains such as military data communication. These standards are traditionally text based documents for human interpretation, but prose sections can often be ambiguous and can lead to discrepancies and subsequent implementation problems. Many organisations produce semantic representations of the material to ensure common understanding and to exploit computer aided development. In developing these semantic representations, no relationship is maintained to the original prose. Maintaining relationships between the original prose and the semantic model has key benefits, including assessing conformance at a semantic level, and enabling original content authors to explicitly define their intentions, thus reducing ambiguity and facilitating computer aided functionality.
Through the use of a case study method based on the military standard MIL-STD-6016C, a framework of relationships is proposed. These relationships can integrate with common document modelling techniques and provide the necessary functionality to allow semantic content to be mapped into document views. These relationships are then generalised for applicability to a wider context. Additionally, this framework is coupled with a templating approach which, for repeating sections, can improve consistency and further enhance quality. A reflective approach to model-driven web rendering is presented and evaluated. This reflective approach uses self-inspection at runtime to read directly from the model, thus eliminating the need for generative processes that result in data duplication across sources used for different purposes.
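The two key ideas above, prose-to-semantics traceability and reflective rendering, can be sketched as follows. Class names, fields, and the sample standard text are illustrative; they are not taken from MIL-STD-6016C or the thesis's actual framework.

```python
# Minimal sketch of a hybrid semantic-document model: each semantic
# element keeps explicit back-links to the prose sections that define it,
# and a reflective renderer inspects the model at runtime instead of
# generating a duplicated document artefact.

from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class ProseSection:
    ref: str          # e.g. a paragraph number in the standard
    text: str

@dataclass
class SemanticElement:
    name: str
    datatype: str
    # Traceability: the prose that this semantic element formalises.
    sources: List[ProseSection] = field(default_factory=list)

def render(element):
    """Reflective rendering: read the dataclass fields directly at
    runtime, so the document view never drifts from the model."""
    lines = []
    for f in fields(element):
        value = getattr(element, f.name)
        if f.name == "sources":
            value = ", ".join(s.ref for s in value)
        lines.append(f"{f.name}: {value}")
    return "\n".join(lines)

# Invented example: a prose requirement linked to its semantic element.
para = ProseSection("4.2.1", "Track number shall be a 5-character field.")
elem = SemanticElement("TrackNumber", "string[5]", [para])
print(render(elem))
```

Because the renderer reads the model via reflection, adding a field to `SemanticElement` automatically appears in every document view, which is precisely the duplication-avoidance argument the abstract makes.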