1,324 research outputs found

    Fake Opinion Detection: How Similar are Crowdsourced Datasets to Real Data?

    [EN] Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models trained this way are generally successful at discriminating between 'genuine' online reviews and the crowdsourced deceptive reviews. It has been argued that the deceptive reviews obtained via crowdsourcing are very different from real fake reviews, but the claim has never been properly tested. In this paper, we compare (false) crowdsourced reviews with a set of 'real' fake reviews published online. We evaluate their degree of similarity and their usefulness in training models for the detection of untrustworthy reviews. We find that the deceptive reviews collected via crowdsourcing are significantly different from the fake reviews published online. In the case of the artificially produced deceptive texts, it turns out that their domain similarity with the targets affects the models' performance much more than their untruthfulness. This suggests that the use of crowdsourced datasets for opinion spam detection may not result in models applicable to the real task of detecting deceptive reviews. As an alternative method to create large datasets for the fake review detection task, we propose methods based on the probabilistic annotation of unlabeled texts, relying on the use of meta-information generally available on e-commerce sites. Such methods are independent of the content of the reviews and allow reliable models to be trained for the detection of fake reviews.
    Leticia Cagnina thanks CONICET for the continued financial support. This work was funded by MINECO/FEDER (Grant No. SomEMBED TIN2015-71147-C2-1-P).
    The work of Paolo Rosso was partially funded by the MISMIS-FAKEnHATE Spanish MICINN research project (PGC2018-096212-B-C31). Massimo Poesio was in part supported by the UK Economic and Social Research Council (Grant Number ES/M010236/1).
    Fornaciari, T.; Cagnina, L.; Rosso, P.; Poesio, M. (2020). Fake Opinion Detection: How Similar are Crowdsourced Datasets to Real Data? Language Resources and Evaluation. 54(4):1019-1058. https://doi.org/10.1007/s10579-020-09486-5

    Reading the news through its structure: new hybrid connectivity based approaches

    In this thesis a solution to the problem of identifying the structure of news published by online newspapers is presented. This problem requires new approaches and algorithms capable of dealing with the massive number of online publications in existence (a number that will grow in the future). The fact that news documents present a high degree of interconnection makes this an interesting and hard problem to solve. The identification of the structure of the news is accomplished both by descriptive methods that expose the dimensionality of the relations between different news items, and by clustering the news into topic groups. To achieve this analysis, this integrated whole was studied using different perspectives and approaches. In the identification of news clusters and structure, and after a preparatory data collection phase in which several online newspapers from different parts of the globe were collected, two newspapers were chosen in particular: the Portuguese daily newspaper Público and the British newspaper The Guardian. In the first case, it was shown how information theory (namely variation of information) combined with adaptive networks was able to identify topic clusters in the news published by the Portuguese online newspaper Público. In the second case, the structure of news published by the British newspaper The Guardian is revealed through the construction of time series of news clustered by a k-means process. After this approach, an unsupervised algorithm was developed that filters out irrelevant news published online by taking into consideration the connectivity of the news labels entered by the journalists. This novel hybrid technique is based on Q-analysis for the construction of the filtered network, followed by a clustering technique to identify the topical clusters. Presently this work uses a modularity optimisation clustering technique, but this step is general enough that other hybrid approaches can be used without losing generality. A novel second-order swarm intelligence algorithm based on Ant Colony Systems was developed for the travelling salesman problem that is consistently better than the traditional benchmarks. This algorithm is used to construct Hamiltonian paths over the published news, using the eccentricity of the different documents as a measure of distance. This approach allows for easy navigation between published stories that is dependent on the connectivity of the underlying structure. The results presented in this work show the importance of treating topic detection in large corpora as a multitude of relations and connectivities that are not in a static state. They also influence the way of looking at multi-dimensional ensembles, by showing that the inclusion of the high-dimension connectivities gives better results in solving a particular problem, as was the case in the clustering problem of the news published online.
    In this work we solve the problem of identifying the structure of the news published online by newspapers and news agencies. This problem requires new approaches and algorithms capable of dealing with the growing number of online publications (which is expected to continue growing in the future). This fact, together with the high degree of interconnection that news items present, makes this an interesting and difficult problem to solve. The identification of the structure of the news system was achieved both through descriptive methods that expose the dimension of the relations existing between the different news items, and through algorithms that cluster them into topics. To reach this objective it was necessary to study this complex system from different perspectives and approaches. After a preparatory phase for the data corpus, in which several newspapers published online were collected, two newspapers in particular were chosen: Público and The Guardian. The choice of newspapers in different languages stems from the wish to find analysis strategies that are independent of prior knowledge about these systems. In a first analysis, an approach based on adaptive networks and information theory (namely variation of information) is employed to identify news topics published in the Portuguese newspaper Público. In a second approach we analyse the structure of the news published by the British newspaper The Guardian through the construction of time series of news. These were then clustered through a k-means process. In addition, an algorithm was developed that filters out, in an unsupervised way, irrelevant news items that have low connectivity to the remaining news, through the use of Q-analysis followed by a clustering process. Presently this method uses modularity optimisation, but the technique is sufficiently general that other hybrid approaches can be used without loss of generality of the method. A new algorithm based on ant colony systems was also developed for solving the travelling salesman problem, which consistently presents better results than the traditional benchmarks. This algorithm was applied to the construction of Hamiltonian paths over the published news, using the eccentricity obtained from the connectivity of the studied system as a measure of distance between news items. This approach made it possible to build a navigation system between the published news that depends on the connectivity observed in the news structure found. The results presented in this work show the importance of analysing complex systems in their multitude of relations and connectivities, which are not static and which influence the way multi-dimensional systems are traditionally viewed. It is shown that including these extra dimensions produces better results in solving the problem of identifying the structure underlying online news publication.
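
The abstract above names variation of information as the measure used to identify topic clusters, but gives no details of how it is applied. As a rough illustration only (not the thesis's implementation), the standard definition VI(A, B) = H(A|B) + H(B|A) for two clusterings of the same items can be sketched as follows; the function name and the list-of-labels input format are assumptions made for this example.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two clusterings.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings assign to item i. This input format is an assumption
    made for illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n == 0:
        return 0.0

    count_a = Counter(labels_a)                    # cluster sizes in A
    count_b = Counter(labels_b)                    # cluster sizes in B
    count_ab = Counter(zip(labels_a, labels_b))    # overlap sizes

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Contribution to H(A|B) + H(B|A) from this pair of clusters.
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi
```

The value is zero when the two clusterings are identical up to relabelling and grows as they disagree, which is what makes it usable as a distance between candidate topic partitions.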

    Extracting product development intelligence from web reviews

    Product development managers are constantly challenged to learn what the consumer product experience really is, and to learn specifically how the product is performing in the field. Traditionally, they have utilized methods such as prototype testing, customer quality monitoring instruments, field testing methods with sample customers, and independent assessment companies. These methods are limited in that (i) the number of customer evaluations is small, and (ii) the methods are driven by a restrictive structured format. Today the web has created a new source of product intelligence: unsolicited reviews from actual product users that are posted across hundreds of websites. The basic hypothesis of this research is that web reviews contain a significant amount of information that is of value to the product design community. This research developed the DFOC (Design - Feature - Opinion - Cause Relationship) method for integrating the evaluation of unstructured web reviews into the structured product design process. The key data element in this research is a Web review and its associated opinion polarity (positive, negative, or neutral). Hundreds of Web reviews are collected to form a review database representing a population of customers. The DFOC method (a) identifies a set of design features that are of interest to the product design community, (b) mines the Web review database to identify which features are of significance to customer evaluations, (c) extracts and estimates the sentiment or opinion of the set of significant features, and (d) identifies the likely cause of the customer opinion. To support the DFOC method we develop an association-rule-based opinion mining procedure for capturing and extracting noun-verb-adjective relationships in the Web review database. This procedure exploits existing opinion mining methods to deconstruct the Web reviews and capture feature-opinion pair polarity. A Design Level Information Quality (DLIQ) measure, which evaluates three components, (a) Content, (b) Complexity and (c) Relevancy, is introduced. DLIQ is indicative of the content, complexity and relevancy of the design contextual information that can be extracted from an analysis of Web reviews for a given product. Application of this measure confirms the hypothesis that significant levels of quality design information can be efficiently extracted from Web reviews for a wide variety of product types. Application of the DFOC method and the DLIQ measure to a wide variety of product classes (electronic, automobile, service domain) is demonstrated. Specifically, Web review databases for ten products/services are created from real data. Validation occurs by analyzing and presenting the extracted product design information. Examples of extracted features and feature-cause associations for negative polarity opinions are shown along with the observed significance.

    Cross-lingual Annotation Projection for Semantic Roles

    This article considers the task of automatically inducing role-semantic annotations in the FrameNet paradigm for new languages. We propose a general framework that is based on annotation projection, phrased as a graph optimization problem. It is relatively inexpensive and has the potential to reduce the human effort involved in creating role-semantic resources. Within this framework, we present projection models that exploit lexical and syntactic information. We provide an experimental evaluation on an English-German parallel corpus which demonstrates the feasibility of inducing high-precision German semantic role annotation both for manually and automatically annotated English data.

    Modelling and Design of Resilient Networks under Challenges

    Communication networks, in particular the Internet, face a variety of challenges that can disrupt our daily lives, resulting in the loss of human lives and significant financial costs in the worst cases. We define challenges as external events that trigger faults that eventually result in service failures. Understanding these challenges is therefore essential for improving current networks and for designing Future Internet architectures. This dissertation presents a taxonomy of challenges that can help evaluate design choices for the current and Future Internet. Graph models to analyse critical infrastructures are examined and a multilevel graph model is developed to study interdependencies between different networks. Furthermore, graph-theoretic heuristic optimisation algorithms are developed. These heuristic algorithms add links to increase the resilience of networks in the least costly manner and they are computationally less expensive than an exhaustive search algorithm. The performance of networks under random failures, targeted attacks, and correlated area-based challenges is evaluated by the challenge simulation module that we developed. The GpENI Future Internet testbed is used to conduct experiments to evaluate the performance of the heuristic algorithms developed.

    An ontology of ethnicity based upon personal names: with implications for neighbourhood profiling

    Understanding of the nature and detailed composition of ethnic groups remains key to a vast swathe of social science and human natural science. Yet ethnic origin is not easy to define, much less measure, and ascribing ethnic origins is one of the most contested and unstable research concepts of the last decade, not only in the social sciences, but also in human biology and medicine. As a result, much research remains hamstrung by the quality and availability of ethnicity classifications, constraining the meaningful subdivision of populations. This PhD thesis develops an alternative ontology of ethnicity, using personal names to ascribe population ethnicity, at very fine geographical levels, and using a very detailed typology of ethnic groups optimised for the UK population. The outcome is an improved methodology for classifying population registers, as well as small areas, into cultural, ethnic and linguistic groups (CEL). This in turn makes possible the creation of much more detailed, frequently updatable representations of the ethnic kaleidoscope of UK cities, and can be further applied to other countries. The thesis includes a review of the literature on ethnicity measurement and name analysis, and their applications in ethnic inequalities and geographical research. It presents the development of the new name-to-ethnicity classification methodology using both a heuristic and an automated and integrated approach. It is based on the UK Electoral Register as well as several health registers in London. Furthermore, a validation of the proposed name-based classification using different datasets is offered, as well as examples of applications in profiling neighbourhoods by ethnicity, in particular the measurement of residential segregation in London. The main study area is London, UK.

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    No abstract available

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.