    Taxonomy for Humans or Computers? Cognitive Pragmatics for Big Data

    Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated together. The next step is to incorporate an awareness of the pitfalls of aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs and synthesis of precision and ambiguity as virtues of scientific language and communication systems then offers a productive next step for realizing sound, big biodiversity data services.

    Comprehensive Characterization of the Transmitted/Founder env Genes From a Single MSM Cohort in China

    Background: The population of men who have sex with men (MSM) has become one of the major risk groups for HIV-1 infection in China. However, the epidemiological patterns, function of the env genes, and autologous and heterologous neutralization activity in the same MSM population have not been systematically characterized. Methods: The env gene sequences were obtained by single genome amplification. The time to the most recent common ancestor was estimated for each genotype using the Bayesian Markov Chain Monte Carlo approach. Coreceptor usage was determined in NP-2 cells. Neutralization was analyzed using Env pseudoviruses in TZM-bl cells. Results: We obtained 547 full-length env gene sequences by single genome amplification from 30 acute/early HIV-1–infected individuals in the Beijing MSM cohort. Three genotypes (subtype B, CRF01_AE, and CRF07_BC) were identified, and 20% of the individuals were infected with multiple transmitted/founder (T/F) viruses. The tight clustering of the MSM sequences, regardless of geographic origin, indicated nearly exclusive transmission within the MSM population and a limited number of introductions. The time to the most recent common ancestor for each genotype was 10–15 years after each was first introduced in China. Disparate coreceptor usage preferences among the 3 genotypes might lead to changes in the percentage of different genotypes in the MSM population over time. The genotype-matched and genotype-mismatched neutralization activity varied among the 3 genotypes. Conclusions: The identification of unique characteristics for transmission, coreceptor usage, neutralization profile, and epidemic patterns of HIV-1 is critical for a better understanding of transmission mechanisms, development of preventive strategies, and evaluation of vaccine efficacy in the MSM population in China.

    Toward a Robust Diversity-Based Model to Detect Changes of Context

    Being able to automatically and quickly understand the user context during a session is a key issue for recommender systems. As a first step toward achieving that goal, we propose a model that observes in real time the diversity brought by each item relative to a short sequence of consultations, corresponding to the recent user history. Our model has constant time complexity, and is generic since it can apply to any type of item within an online service (e.g. profiles, products, music tracks) and any application domain (e-commerce, social network, music streaming), as long as we have partial item descriptions. The observation of the diversity level over time allows us to detect implicit changes. In the long term, we plan to characterize the context, i.e. to find common features among a contiguous sub-sequence of items between two changes of context determined by our model. This will allow us to make context-aware and privacy-preserving recommendations, and to explain them to users. As this is ongoing research, the first step consists here in studying the robustness of our model while detecting changes of context. In order to do so, we use a music corpus of 100 users and more than 210,000 consultations (number of songs played in the global history). We validate the relevance of our detections by finding connections between changes of context and events, such as ends of session. Of course, these events are a subset of the possible changes of context, since there might be several contexts within a session. We altered the quality of our corpus in several ways, so as to test the performance of our model when confronted with sparsity and different types of items. The results show that our model is robust and constitutes a promising approach. Comment: 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2015), Nov 2015, Vietri sul Mare, Italy

    Translating Video Recordings of Mobile App Usages into Replayable Scenarios

    Screen recordings of mobile applications are easy to obtain and capture a wealth of information pertinent to software developers (e.g., bugs or feature requests), making them a popular mechanism for crowdsourced app feedback. Thus, these videos are becoming a common artifact that developers must manage. In light of unique mobile development constraints, including swift release cycles and rapidly evolving platforms, automated techniques for analyzing all types of rich software artifacts provide benefit to mobile developers. Unfortunately, automatically analyzing screen recordings presents serious challenges, due to their graphical nature, compared to other types of (textual) artifacts. To address these challenges, this paper introduces V2S, a lightweight, automated approach for translating video recordings of Android app usages into replayable scenarios. V2S is based primarily on computer vision techniques and adapts recent solutions for object detection and image classification to detect and classify user actions captured in a video, and convert these into a replayable test scenario. We performed an extensive evaluation of V2S involving 175 videos depicting 3,534 GUI-based actions collected from users exercising features and reproducing bugs from over 80 popular Android apps. Our results illustrate that V2S can accurately replay scenarios from screen recordings, and is capable of reproducing approximately 89% of our collected videos with minimal overhead. A case study with three industrial partners illustrates the potential usefulness of V2S from the viewpoint of developers. Comment: In proceedings of the 42nd International Conference on Software Engineering (ICSE'20), 13 pages

    From Statistical to Geolinguistic Data: Mapping and Measuring Linguistic Diversity

    The aim of this paper is to describe a new methodology for mapping and measuring linguistic diversity in a territory. The three methods that have been created by the Centro di eccellenza della ricerca Osservatorio linguistico permanente dell’italiano diffuso fra stranieri e delle lingue immigrate in Italia at the Università per Stranieri di Siena are the following:
    - the Toscane favelle model, a procedural application which passes from quantitative statistical data to a demolinguistic paradigm;
    - the Monterotondo-Mentana model, in which surveys of quantitative and qualitative data are carried out using traditional tools (questionnaires, audio and video recordings) as well as advanced technologies;
    - the Esquilino model, in which digital maps are created that present the distribution of the immigrant languages through the presence of signs in the linguistic landscape.
    The final objective is to combine the data surveyed by the three methods in order to have a “speaking” territory, in which each point surveyed identifies the languages spoken and the various linguistic manifestations. Keywords: Language Contact, Linguistic Diversity, Immigrant Languages, Geolinguistic Data, New Methodologies in Sociolinguistic Research

    MarinEye - A tool for marine monitoring

    This work presents an autonomous system for integrated physical-chemical and biological marine monitoring: the MarinEye system. It comprises a set of sensors providing diverse and relevant information for oceanic environment characterization and marine biology studies. It consists of a physical-chemical water properties sensor suite, a water filtration and sampling system for DNA collection, a plankton imaging system, and an acoustic biomass assessment system. The MarinEye system has onboard computational and logging capabilities, allowing either autonomous operation or integration in other marine observing systems (such as observatories or robotic vehicles). It was designed to collect integrated multi-trophic monitoring data. Its validation in an operational environment at 3 marine observatories on the coast of Portugal (RAIA, BerlengasWatch and Cascais) is also discussed.