
    Traditional versus facebook-based surveys: Evaluation of biases in self-reported demographic and psychometric information

    Background: Social media in scientific research offers a unique digital observatory of human behaviours and hence great opportunities to conduct research at large scale, answering complex sociodemographic questions. We focus on the identification and assessment of biases in social-media-administered surveys.
    Objective: This study aims to shed light on population, self-selection, and behavioural biases by empirically comparing the consistency of self-reported information, including demographic and psychometric attributes, collected through traditionally administered versus social-media-administered questionnaires.
    Methods: We engaged a demographically representative cohort of young adults in Italy (approximately 4,000 participants) in a traditionally administered online survey and, after one year, invited them to use our ad hoc Facebook application (988 accepted), where they filled in part of the initial survey. We assessed the statistically significant differences indicating population, self-selection, and behavioural biases due to the different context in which the questionnaire was administered.
    Results: Our findings suggest that surveys administered on Facebook do not exhibit major biases with respect to traditionally administered surveys in terms of either demographics or personality traits. Loyalty, authority, and social-binding values were higher on the Facebook platform, probably due to the platform's intrinsic social character.
    Conclusions: We conclude that Facebook apps are valid research tools for administering demographic and psychometric surveys, provided that the entailed biases are taken into consideration.
    Contribution: We contribute to the characterisation of Facebook apps as a valid scientific tool for administering demographic and psychometric surveys, and to the assessment of population, self-selection, and behavioural biases in the collected data.
    Authors: Kalimeri, Kyriaki (Institute for Scientific Interchange Foundation, Italy); Beiro, Mariano Gastón (CONICET, Instituto de Tecnologías y Ciencias de la Ingeniería "Hilario Fernández Long", Universidad de Buenos Aires, Facultad de Ingeniería, Argentina); Bonanomi, Andrea (Università Cattolica del Sacro Cuore, Italy); Rosina, Alessandro (Università Cattolica del Sacro Cuore, Italy); Cattuto, Ciro (ISI Foundation, Italy)
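    The consistency check between the two cohorts can be illustrated with a simple significance test. The sketch below is not the study's actual analysis; it shows a Pearson chi-square test of homogeneity on an invented demographic contingency table (hypothetical gender counts for each cohort), the kind of comparison the abstract describes.

    ```python
    # Pearson chi-square test of homogeneity between two survey cohorts.
    # All counts below are invented for illustration only.

    def chi_square(table):
        """Pearson chi-square statistic for an R x K contingency table."""
        rows = [sum(r) for r in table]
        cols = [sum(c) for c in zip(*table)]
        total = sum(rows)
        stat = 0.0
        for i, row in enumerate(table):
            for j, obs in enumerate(row):
                exp = rows[i] * cols[j] / total  # expected count under homogeneity
                stat += (obs - exp) ** 2 / exp
        return stat

    # Hypothetical counts: [male, female] in each cohort.
    traditional = [2050, 1950]   # ~4,000 traditional-survey participants
    facebook = [495, 493]        # 988 Facebook-app participants
    stat = chi_square([traditional, facebook])

    # With 1 degree of freedom the 5% critical value is about 3.84;
    # a statistic below it indicates no significant demographic shift.
    print(stat < 3.84)
    ```

    The same test applies to any categorical attribute (age band, region, education level); only the table changes.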

    ‘For good measure’: data gaps in a big data world

    Policy and data scientists have paid ample attention to the amount of data being collected and to the challenge for policymakers of using it effectively. Far less attention, however, has been paid to the quality and coverage of these data, specifically as they pertain to minority groups. The paper argues that while there is seemingly more data for policymakers to draw on, the quality of the data, combined with potential known or unknown data gaps, limits government's ability to create inclusive policies. In this context, the paper defines primary, secondary, and unknown data gaps, covering scenarios in which data are knowingly or unknowingly missing and how that is potentially compensated for through alternative measures. Based on a review of the literature from various fields and a variety of examples highlighted throughout the paper, we conclude that the big data movement, combined with more sophisticated methods in recent years, has opened up new opportunities for government to use existing data in different ways and to fill data gaps through innovative techniques. Focusing specifically on the representativeness of such data, however, shows that data gaps affect the economic opportunities, social mobility, and democratic participation of marginalized groups. The big data movement in policy may thus create new forms of inequality that are harder to detect and whose impact is more difficult to predict.

    On Detecting and Removing Superficial Redundancy in Vector Databases

    A mathematical model is proposed to obtain an automated tool for removing unnecessary data, computing the level of redundancy, and recovering the original and filtered database at any point in the process, in a vector database. This type of database can be modelled as a directed graph, so the database is characterized by an adjacency matrix and a record is no longer a row but a matrix. The problem of cleaning redundancies is then addressed from a theoretical point of view: superficial redundancy is measured and filtered using the 1-norm of a matrix. Algorithms are implemented in Python and MapReduce, and a case study on a real cybersecurity database is performed.
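    The core idea can be sketched in a few lines. This is a rough illustration of the stated approach, not the paper's actual algorithm: each record is the adjacency matrix of a small directed graph, and two records are flagged as superficially redundant when the 1-norm (maximum absolute column sum) of their difference falls within a tolerance.

    ```python
    # Flagging superficially redundant records via the matrix 1-norm.
    # Each record is the adjacency matrix of a small directed graph.

    def one_norm(m):
        """Matrix 1-norm: maximum absolute column sum."""
        return max(sum(abs(m[i][j]) for i in range(len(m)))
                   for j in range(len(m[0])))

    def is_redundant(a, b, tol=0):
        """Records are superficially redundant if the 1-norm of their
        difference matrix is within the tolerance."""
        diff = [[a[i][j] - b[i][j] for j in range(len(a[0]))]
                for i in range(len(a))]
        return one_norm(diff) <= tol

    r1 = [[0, 1], [1, 0]]
    r2 = [[0, 1], [1, 0]]   # duplicate of r1 -> redundant
    r3 = [[0, 1], [0, 0]]   # differs by one edge -> not redundant
    print(is_redundant(r1, r2))  # True
    print(is_redundant(r1, r3))  # False
    ```

    A nonzero tolerance would treat near-duplicate records (differing by a few edges) as redundant as well.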

    Big data CRM

    Big Data and CRM are two relatively new concepts that have seen substantial growth in global business over the past few years. Each concept in its own right has had a profound impact on developments and changes both in marketing and in every function of the enterprise. Big Data refers to the way large quantities of data are collected, while CRM is a complex system of company activities aimed at building closer relationships with customers.

    Diseño y Desarrollo de un Sistema de Información para la Gestión de Información sobre Cáncer de Mama

    Diagnosis, treatment, and research of diseases as complex as breast cancer are increasingly difficult tasks due to the large quantity and diversity of the data involved and the need to relate them properly in order to draw relevant conclusions. Clinical data generation must be accompanied by efficient data management, so the use of advanced information-system technologies is essential to ensure the correct storage, management, and exploitation of data. Following a deep study of the domain and of the technologies used to store and manage clinical and biological data about the disease, the main goal of this thesis is to provide a methodological basis for designing and implementing software systems that manage breast cancer data in a trustworthy and efficient way. Using conceptual modelling techniques in an environment where their use is not as common as it should be makes it possible to create information systems perfectly adapted to the studied domain. Under this approach, the thesis carries out several tasks, among them: conceptual modelling of the breast cancer diagnosis, treatment, and research domain; the design of archetypes under the ISO 13606 standard to allow interoperability between systems; the integration of breast cancer data from different sources into a unified database; and the design of a prototype tool for managing and analysing clinical and gene-expression data. To validate the proposal, a validation process was carried out in a real environment, the INCLIVA Research Foundation in Valencia, where medical and biological researchers used the proposed solution and assessed its efficiency.
    Burriel Coll, V. (2017). Diseño y Desarrollo de un Sistema de Información para la Gestión de Información sobre Cáncer de Mama [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86158

    Probing the Limits of Social Data: Biases, Methods, and Domain Knowledge

    Online social data has been hailed as providing unprecedented insights into human phenomena due to its ability to capture human behavior at a scale and level of detail, both in breadth and depth, that is hard to achieve through conventional data collection techniques. This has led to numerous studies that leverage online social data to model or gain insights about real-world phenomena, as well as to inform system or method design for performance gains or for providing personalized services. Alas, regardless of how large, detailed, or varied the online social data is, there are limits to what can be discerned from it about real-world, or even media- or application-specific, phenomena. This thesis investigates four instances of such limits that are related to the properties both of the working data sets and of the methods used to acquire and leverage them: (1) online social media biases, (2) assessing and (3) reducing data collection biases, and (4) methods' sensitivity to data biases and variability. For each of them, we conduct a separate case study that enables us to systematically devise and apply consistent methodologies to collect, process, compare, or assess different data sets and dedicated methods. The main contributions of this thesis are: (i) To gain insights into media-specific biases, we run a comparative study juxtaposing social and mainstream media coverage of domain-specific news events over a period of 17 months. To this end, we introduce a generic methodology for comparing news agendas online based on a comparison of spikes of coverage. We expose significant differences in the type of events covered by the two media. (ii) To assess possible biases across data collections, we run a transversal study that systematically assembles and examines 26 distinct data sets of social media posts during a variety of crisis events spanning a two-year period.
    While we find patterns and consistencies, we also uncover substantial variability across different event data sets, highlighting the pitfalls of generalizing findings from one data set to another. (iii) To improve data collections, we introduce a method that increases the recall of social media samples while preserving the original distribution of message types and sources. To locate and monitor domain-specific events, this method constructs and applies a domain-specific yet generic lexicon, automatically learning event-specific terms and adapting the lexicon to the targeted event. The resulting improvements also show that only a fraction of the relevant data is currently mined. (iv) To test the methods' sensitivity to data biases and variability, we run an empirical evaluation on 6 real-world data sets, dissecting the impact of user and item attributes on the performance of recommendation approaches that leverage distinct social cues: explicit social links vs. implicit interest affinity. We show performance variations not only across data sets but also within each data set, across different classes of users or items, suggesting that global metrics are often unsuited for assessing the performance of recommendation systems. The overarching goal of this thesis is to contribute a practical perspective to the body of research that aims to quantify biases, to devise better methods to collect and model social data, and to evaluate such methods in context.
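    The recall-improvement idea in contribution (iii) can be sketched in miniature. This toy example is not the thesis method: it starts from a generic seed term, finds messages the seed already matches, and promotes the most frequent co-occurring term into the lexicon, so that event-specific messages missed by the seed are also captured. All messages and terms are invented.

    ```python
    from collections import Counter

    def expand_lexicon(messages, seeds, top_n=1):
        """Add the top co-occurring non-seed terms to the seed lexicon."""
        matched = [m for m in messages
                   if any(s in m.lower().split() for s in seeds)]
        counts = Counter(w for m in matched for w in m.lower().split()
                         if w not in seeds)
        return set(seeds) | {w for w, _ in counts.most_common(top_n)}

    msgs = [
        "Flood warning issued evacuation begins",
        "Flood waters rise evacuation ordered",
        "Evacuation shelters open downtown",   # missed by the seed alone
    ]
    lex = expand_lexicon(msgs, {"flood"})

    recall_before = sum("flood" in m.lower().split() for m in msgs) / len(msgs)
    recall_after = sum(any(w in m.lower().split() for w in lex)
                       for m in msgs) / len(msgs)
    print(recall_before, recall_after)  # recall rises from 2/3 to 1.0
    ```

    A real adaptive lexicon would also filter stop words and re-run the expansion as the event stream evolves, rather than doing a single pass.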

    Automated Detection of Financial Events in News Text

    Today’s financial markets are inextricably linked with financial events like acquisitions, profit announcements, or product launches. Information extracted from news messages that report on such events could hence be beneficial for financial decision making. The ubiquity of news, however, makes manual analysis impossible, and due to the unstructured nature of text, the (semi-)automatic extraction and application of financial events remains a non-trivial task. Therefore, the studies composing this dissertation investigate 1) how to accurately identify financial events in news text, and 2) how to effectively use such extracted events in financial applications. Based on a detailed evaluation of current event extraction systems, this thesis presents a competitive, knowledge-driven, semi-automatic system for financial event extraction from text. A novel pattern language, which makes clever use of the system’s underlying knowledge base, allows for the definition of simple, yet expressive event extraction rules that can be applied to natural language texts. The system’s knowledge-driven internals remain synchronized with the latest market developments through the accompanying event-triggered update language for knowledge bases, enabling the definition of update rules. Additional research covered by this dissertation investigates the practical applicability of extracted events. In automated stock trading experiments, the best performing trading rules do not only make use of traditional numerical signals, but also employ news-based event signals. Moreover, when cleaning stock data from disruptions caused by financial events, financial risk analyses yield more accurate results. These results suggest that events detected in news can be used advantageously as supplementary parameters in financial applications
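    The dissertation's pattern language is knowledge-driven and far more expressive than what fits here; the toy sketch below only illustrates the underlying idea of a rule that maps a lexical pattern in news text to a structured financial event. The rule, company names, and sentence are all invented.

    ```python
    import re

    # A single toy extraction rule for acquisition events: capture two
    # capitalized name spans joined by an acquisition trigger verb.
    ACQUISITION = re.compile(
        r"(?P<buyer>[A-Z][\w&]*(?:\s[A-Z][\w&]*)*)\s+"
        r"(?:acquires|buys|takes over)\s+"
        r"(?P<target>[A-Z][\w&]*(?:\s[A-Z][\w&]*)*)"
    )

    def extract_events(text):
        """Return (buyer, target) pairs for acquisition-like sentences."""
        return [(m.group("buyer"), m.group("target"))
                for m in ACQUISITION.finditer(text)]

    print(extract_events("Alpha Corp acquires Beta Systems in a cash deal."))
    # -> [('Alpha Corp', 'Beta Systems')]
    ```

    A knowledge-driven system replaces the crude capitalized-span heuristic with lookups against a knowledge base of known companies, which is what keeps such rules both simple and precise.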

    Reinforcement Learning Approach for Autonomous UAV Navigation in 3D Space

    In the last two decades, the rapid development of unmanned aerial vehicles (UAVs) has resulted in their use in a wide range of applications. Miniaturization and cost reduction of electrical components have led to their commercialization, and today they can be utilized for various tasks in unknown environments. Finding the optimal path based on the start and target pose information is one of the most complex demands on any intelligent UAV system. As this problem requires a high level of adaptability and learning capability from the UAV, a framework based on reinforcement learning is proposed for the localization and navigation tasks. In this paper, a Q-learning algorithm for the autonomous navigation of the UAV in 3D space is implemented. To test the proposed methodology for intelligent UAV control, a simulation is conducted in the ROS-Gazebo environment. The obtained simulation results show that the UAV can reach the target pose autonomously and efficiently.
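    The learning loop behind such a navigator can be shown in tabular form. This is only a minimal sketch on a discrete 4x4x4 grid with invented rewards; the paper's setup (continuous UAV dynamics in ROS-Gazebo) is far richer.

    ```python
    import random

    SIZE = 4
    GOAL = (3, 3, 3)
    # Six axis-aligned moves in 3D space.
    ACTIONS = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]

    def step(state, action):
        """Apply a move, clipping to the grid; +1 at the goal, small cost otherwise."""
        nxt = tuple(min(SIZE - 1, max(0, s + a)) for s, a in zip(state, action))
        return nxt, (1.0 if nxt == GOAL else -0.01)

    def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
        random.seed(seed)
        q = {}  # (state, action_index) -> estimated value
        for _ in range(episodes):
            s = (0, 0, 0)
            for _ in range(50):  # cap episode length
                if random.random() < eps:               # explore
                    a = random.randrange(len(ACTIONS))
                else:                                   # exploit
                    a = max(range(len(ACTIONS)), key=lambda i: q.get((s, i), 0.0))
                nxt, r = step(s, ACTIONS[a])
                best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
                q[(s, a)] = q.get((s, a), 0.0) + alpha * (
                    r + gamma * best_next - q.get((s, a), 0.0))
                s = nxt
                if s == GOAL:   # terminal state ends the episode
                    break
        return q

    def greedy_path(q, start=(0, 0, 0), limit=30):
        """Follow the learned policy greedily from start toward the goal."""
        s, path = start, [start]
        while s != GOAL and len(path) < limit:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((s, i), 0.0))
            s, _ = step(s, ACTIONS[a])
            path.append(s)
        return path

    q = train()
    print(greedy_path(q)[-1])  # the greedy policy should end at the goal
    ```

    Swapping the toy `step` function for a simulator interface (e.g. a ROS-Gazebo wrapper) is the usual way such a tabular sketch is scaled up, at which point function approximation typically replaces the dictionary Q-table.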