
    Dataset search: a survey

    Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently released a beta search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user's data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward. Comment: 20 pages, 153 references.

    The broadcast marketplace: Designing a more efficient local marketplace for goods and services

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 71-74). Today's online marketplaces for goods and services are imperfect. Participants make an initial post expressing their intention to buy or sell an object, but all offers on this post are private. These offers can be seen as expressions of other participants' intentions to buy or sell the same item. What if these offers were as public as the initial post? Would this decrease market friction and enable participants to close transactions more efficiently? What if every post and offer were tagged with a location, enabling a real-time proximal picture of supply and demand? In this thesis, we explore a different kind of marketplace, a broadcast marketplace, where a combination of public posts, proximal awareness and mobility decreases the friction of information flow and facilitates efficiency. This thesis explores the design, implementation and deployment of a system which enables users to efficiently view, understand and act upon this proximal picture of supply and demand. To test the viability of the broadcast marketplace we deployed Peddl, an implementation of the idea, in the MIT and Cambridge, MA community. Over the course of the trial we collected data on 5,839 unique visitors and 805 registered users, who made 726 posts totaling $234,913 in value. From this data we show that the additional transparency of supply and demand afforded by our design results in increased marketplace activity. Matthew Blackshaw. S.M.
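    The core query such a system needs, given location-tagged public posts, is a proximal view of supply and demand around the user. The sketch below is a minimal illustration of that idea, not Peddl's actual implementation; the Post structure, the coordinates and the search radius are assumptions made for the example.

```python
"""Minimal sketch (not Peddl's implementation) of a proximal view over
location-tagged public posts in a broadcast marketplace."""
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

@dataclass
class Post:
    kind: str        # "sell" or "buy" -- offers are as public as the post itself
    item: str
    price: float
    lat: float
    lon: float

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearby_posts(posts, lat, lon, radius_km=1.0):
    """The proximal picture: all public buy/sell intentions within radius_km."""
    return [p for p in posts if distance_km(lat, lon, p.lat, p.lon) <= radius_km]

posts = [
    Post("sell", "desk lamp", 15.0, 42.3601, -71.0942),  # near MIT
    Post("buy", "bicycle", 80.0, 42.3736, -71.1097),     # elsewhere in Cambridge
]
print([p.item for p in nearby_posts(posts, 42.3601, -71.0942, radius_km=1.0)])
```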

    Authenticated Outlier Mining for Outsourced Databases

    The Data-Mining-as-a-Service (DMaS) paradigm is becoming the focus of research, as it allows the data owner (client) who lacks expertise and/or computational resources to outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises some issues about result integrity: how can the client verify that the mining results returned by the server are both sound and complete? In this paper, we focus on outlier mining, an important mining task. Previous verification techniques use an authenticated data structure (ADS) for correctness authentication, which may incur considerable space and communication cost. In this paper, we propose a novel solution that returns a probabilistic result integrity guarantee at much cheaper verification cost. The key idea is to insert a set of artificial records (ARs) into the dataset, from which a set of artificial outliers (AOs) and artificial non-outliers (ANOs) is constructed. The AOs and ANOs are used by the client to detect any incomplete and/or incorrect mining results with a probabilistic guarantee. The main challenge that we address is how to construct ARs so that they do not change the (non-)outlierness of the original records, while guaranteeing that the client can identify ANOs and AOs without executing the mining itself. Furthermore, we build a strategic game and show that a Nash equilibrium exists only when the server returns correct outliers. Our implementation and experiments demonstrate that our verification solution is efficient and lightweight.
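    The following sketch illustrates the general shape of the client-side check described above, assuming the client keeps the identifiers of the injected AOs and ANOs. The function names, the set-based record identifiers and the simple independent-drop model behind the probability calculation are illustrative assumptions, not the paper's actual construction or bound.

```python
"""Minimal sketch of client-side verification with artificial outliers (AOs)
and artificial non-outliers (ANOs). The AR construction, the mining step and
the paper's exact probabilistic bound are not reproduced here; the guarantee
below assumes the server drops outliers independently at random."""

def verify_outlier_result(returned_outliers, ao_ids, ano_ids):
    """Check the server's claimed outlier set (all arguments are sets of ids)."""
    missing_aos = ao_ids - returned_outliers   # evidence of an incomplete result
    leaked_anos = ano_ids & returned_outliers  # evidence of an incorrect result
    return {
        "complete": not missing_aos,
        "sound": not leaked_anos,
        "missing_aos": missing_aos,
        "leaked_anos": leaked_anos,
    }

def detection_probability(num_aos, drop_fraction):
    """Probability of catching a server that silently drops each outlier
    independently with probability drop_fraction."""
    return 1.0 - (1.0 - drop_fraction) ** num_aos

if __name__ == "__main__":
    result = verify_outlier_result({1, 2, 3, 101, 102}, {101, 102}, {201})
    print(result["complete"], result["sound"])          # True True
    print(round(detection_probability(20, 0.10), 2))    # ~0.88 with 20 AOs
```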

    Barley sodium content is regulated by natural variants of the Na+ transporter HvHKT1;5

    During plant growth, sodium (Na+) in the soil is transported via the xylem from the root to the shoot. While excess Na+ is toxic to most plants, non-toxic concentrations have been shown to improve crop yields under certain conditions, such as when soil K+ is low. We quantified grain Na+ across a barley genome-wide association study panel grown under non-saline conditions and identified variants of a Class 1 HIGH-AFFINITY-POTASSIUM-TRANSPORTER (HvHKT1;5)-encoding gene responsible for Na+ content variation under these conditions. A leucine to proline substitution at position 189 (L189P) in HvHKT1;5 disturbs its characteristic plasma membrane localisation and disrupts Na+ transport. Under low and moderate soil Na+, genotypes containing HvHKT1;5P189 accumulate high concentrations of Na+ but exhibit no evidence of toxicity. As the frequency of HvHKT1;5P189 increases significantly in cultivated European germplasm, we cautiously speculate that this non-functional variant may enhance yield potential in non-saline environments, possibly by offsetting limitations of low available K+.

    Establishing the digital chain of evidence in biometric systems

    Traditionally, a chain of evidence or chain of custody refers to the chronological documentation, or paper trail, showing the seizure, custody, control, transfer, analysis, and disposition of evidence, physical or electronic. Whether in the criminal justice system, military applications, or natural disasters, ensuring the accuracy and integrity of such chains is of paramount importance. Intentional or unintentional alteration, tampering, or fabrication of digital evidence can lead to undesirable effects. We find that, despite the consequences at stake, historically no unique protocol or standardized procedure exists for establishing such chains. Current practices rely on traditional paper trails and handwritten signatures as the foundation of chains of evidence. Copying, fabricating or deleting electronic data is easier than ever, and establishing equivalent digital chains of evidence has become both necessary and desirable. We propose to consider a chain of digital evidence as a multi-component validation problem: it ensures the security of access control, confidentiality, integrity, and non-repudiation of origin. Our framework includes techniques from cryptography, keystroke analysis, digital watermarking, and hardware source identification. The work offers contributions to many of the fields used in the formation of the framework. Related to biometric watermarking, we provide a means for watermarking iris images without significantly impacting biometric performance. Specific to hardware fingerprinting, we establish the ability to verify the source of an image captured by biometric sensing devices such as fingerprint sensors and iris cameras. Related to keystroke dynamics, we establish that user stimulus familiarity is a driver of classification performance. Finally, example applications of the framework are demonstrated with data collected in crime scene investigations, people screening activities at ports of entry, naval maritime interdiction operations, and mass fatality incident disaster responses.
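    As a small illustration of the integrity and tamper-evidence requirements mentioned above, the sketch below hash-chains custody events so that altering any earlier entry invalidates every later one. It is a minimal sketch under assumed event fields, not the dissertation's framework, and real non-repudiation of origin would additionally require asymmetric signatures (e.g. RSA or ECDSA) rather than a bare hash chain.

```python
"""Illustrative hash-chained custody log capturing only the integrity aspect
of a digital chain of evidence. Access control, biometric watermarking,
keystroke dynamics and hardware fingerprinting are not modelled here."""
import hashlib
import json
import time

def append_custody_event(chain, actor, action, evidence_id):
    """Append an event whose hash covers the previous entry, so later
    tampering with any earlier record breaks the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    event = {
        "actor": actor,
        "action": action,          # e.g. "seized", "transferred", "analysed"
        "evidence_id": evidence_id,
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    payload = json.dumps(event, sort_keys=True).encode()
    event["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(event)
    return event

def verify_chain(chain):
    """Recompute every link; returns False if any entry was altered."""
    prev_hash = "0" * 64
    for event in chain:
        body = {k: v for k, v in event.items() if k != "hash"}
        if body["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != event["hash"]:
            return False
        prev_hash = event["hash"]
    return True

chain = []
append_custody_event(chain, "officer-17", "seized", "iris-image-042")
append_custody_event(chain, "lab-03", "analysed", "iris-image-042")
assert verify_chain(chain)
```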

    Maximizing Adherence and Gaining New Information For Your Chronic Obstructive Pulmonary Disease (MAGNIFY COPD): Study Protocol for the Pragmatic, Cluster Randomized Trial Evaluating the Impact of Dual Bronchodilator with Add-On Sensor and Electronic Monitoring on Clinical Outcomes

    Background: Poor treatment adherence in COPD patients is associated with poor clinical outcomes and increased healthcare burden. Personalized approaches to adherence management, supported by technology-based interventions, may offer benefits to patients and providers but are currently unproven in terms of clinical outcomes, as opposed to adherence outcomes. Methods: Maximizing Adherence and Gaining New Information For Your COPD (MAGNIFY COPD), a pragmatic cluster randomized trial, aims to evaluate the impact of an adherence technology package (interventional package) comprising an adherence review, ongoing provision of a dual bronchodilator with an add-on inhaler sensor device, and a connected mobile application. The study will compare time to treatment failure and other clinical outcomes versus usual care in patients identified as being at high risk of exacerbations and with historically poor treatment adherence, as measured by prescription collection for mono/dual therapy over one year (1312 patients). Treatment failure is defined as the first occurrence of one of the following: (1) moderate/severe COPD exacerbation, (2) prescription of triple therapy (inhaled corticosteroid/long-acting β2-agonist/long-acting muscarinic antagonist [ICS/LABA/LAMA]), (3) prescription of additional chronic therapy for COPD, or (4) respiratory-related death. Adherence, moderate/severe exacerbations, respiratory-related healthcare resource utilization and costs, and intervention package acceptance rate will also be assessed. Eligible primary care practices (N=176) participating in the Optimum Patient Care Quality Improvement Program will be randomized (1:1) to either the adherence support cluster arm (suitable patients already receiving or initiated on Ultibro® Breezhaler® [indacaterol/glycopyrronium] will be offered the interventional package) or the control cluster arm (suitable patients continue to receive usual clinical care). Patients will be identified and outcomes collected from anonymized electronic medical records within the Optimum Patient Care Research Database. On study completion, electronic medical record data will be re-extracted to analyze outcomes in both study groups. Registration Number: ISRCTN10567920. Conclusion: MAGNIFY will explore the patient benefits of technology-based interventions for electronic adherence monitoring.
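    Because the primary endpoint above is a composite (the first occurrence of any of the four listed events), a small sketch of how time to treatment failure could be derived from dated event records may help; the record layout and event labels below are hypothetical simplifications, not the Optimum Patient Care Research Database schema.

```python
"""Minimal sketch of the composite treatment-failure endpoint: days from an
index date to the earliest of the four defined failure events."""
from datetime import date

FAILURE_EVENTS = {
    "moderate_severe_exacerbation",
    "triple_therapy_prescription",      # ICS/LABA/LAMA
    "additional_chronic_copd_therapy",
    "respiratory_related_death",
}

def time_to_treatment_failure(index_date, events):
    """events: list of (date, event_type) pairs. Returns days to the first
    failure event, or None if the patient is censored without failure."""
    failure_dates = [d for d, kind in events if kind in FAILURE_EVENTS and d >= index_date]
    if not failure_dates:
        return None
    return (min(failure_dates) - index_date).days

events = [
    (date(2021, 3, 4), "routine_review"),
    (date(2021, 6, 9), "moderate_severe_exacerbation"),
    (date(2021, 8, 1), "triple_therapy_prescription"),
]
print(time_to_treatment_failure(date(2021, 1, 1), events))  # 159
```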

    Connected Information Management

    Society is currently inundated with more information than ever, making efficient management a necessity. Alas, most current information management suffers from several kinds of disconnectedness: applications partition data into segregated islands; small notes do not fit into traditional application categories; navigating the data is different for each kind of data; and data is available either on a certain computer or only online, but rarely both. Connected information management (CoIM) is an approach to information management that avoids these kinds of disconnectedness. The core idea of CoIM is to keep all information in a central repository, with generic means for organization such as tagging. The heterogeneity of data is taken into account by offering specialized editors. The central repository eliminates the islands of application-specific data and is formally grounded by a CoIM model. The foundation for structured data is an RDF repository. The RDF editing meta-model (REMM) enables form-based editing of this data, similar to database applications such as MS Access. Further kinds of data are supported by extending RDF, as follows. Wiki text is stored as RDF and can both contain structured text and be combined with structured data. Files are also supported by the CoIM model and are kept externally. Notes can be quickly captured and annotated with metadata. Generic means for organization and navigation apply to all kinds of data. Ubiquitous availability of data is ensured via two CoIM implementations, the web application HYENA/Web and the desktop application HYENA/Eclipse. All data can be synchronized between these applications. The applications were used to validate the CoIM ideas.
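    To make the central-repository-plus-tagging idea concrete, the sketch below stores a quickly captured note as RDF triples and retrieves it with a generic tag query using rdflib. The ex: vocabulary (ex:Note, ex:content, ex:tag) is invented for the example; HYENA's actual REMM model and schema are not reproduced here.

```python
"""Hedged sketch of the CoIM idea: one RDF repository, generic tagging,
generic navigation. Vocabulary is made up for illustration."""
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/coim/")

g = Graph()
g.bind("ex", EX)

# A quickly captured note, stored as plain RDF triples alongside any other
# structured data in the same repository.
note = EX["note-1"]
g.add((note, RDF.type, EX.Note))
g.add((note, EX.content, Literal("Read the REMM paper before Friday")))
g.add((note, EX.tag, Literal("todo")))
g.add((note, EX.tag, Literal("research")))

# Generic navigation: the same tag query works for notes, files or wiki
# pages because everything lives in one repository.
results = g.query(
    "SELECT ?thing WHERE { ?thing ex:tag 'research' . }",
    initNs={"ex": EX},
)
for row in results:
    print(row.thing)
```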

    RNA-Seq: Yellow Nutsedge (Cyperus esculentus) transcriptome analysis of lipid-accumulating tubers from early to late developmental stages

    Thanks to the high amounts of starch and oil amassed in the parenchyma of its tubers, yellow nutsedge (Cyperus esculentus) stands as a unique plant species with regard to nutrient biosynthesis and accumulation in underground organs. In recent decades, understanding of the enzymatic processes in the lipid, starch and sugar pathways has greatly improved. Nevertheless, the underlying mechanisms of carbon allocation in sink tissues are still obscure, and the study of yellow nutsedge may provide new insights. Furthermore, in the global context of a still rising need for vegetable oils, Cyperus esculentus appears to be a promising candidate for the introduction of novel high-yield oil species. We present the first in-depth analysis of the yellow nutsedge tuber transcriptome, conducted using Roche 454 sequencing and targeting two developmental stages that coincide with (i) the beginning of oil accumulation, (ii) an important increase in starch content, and (iii) a substantial drop in sugar amount. De novo assembly led to a reference transcriptome of 37k transcripts, which underwent extensive functional and biological pathway annotation, leaving only 7% of sequences completely unknown. A set of 186 differentially expressed genes (DEGs) was cross-confirmed by three different R packages. To cover the most important changes, top-30 rankings of up- and down-regulated genes were investigated. Apart from a pronounced up-regulation of the WRI1 transcription factor (27-fold), no enzyme related to the lipid, starch or sugar pathways was found. Instead, massive changes in growth activity and stress response were observed. Analysis of expression at individual stages showed that several lipid, sugar and starch genes are in fact abundant but undergo changes of lower intensity, hence not visible in the top-30 lists. A private and user-friendly web interface has been developed that compiles all the data and results generated through this study, providing convenient access for additional investigations, along with directions for further work.
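    The cross-confirmation and ranking steps described above amount to intersecting the DEG calls from the three tools and sorting the shared genes by fold change. The sketch below illustrates that logic with made-up fold-change values (the WRI1 entry reflects the reported ~27-fold induction, i.e. a log2 fold change of about 4.75); it is not the study's actual R-based pipeline.

```python
"""Illustrative sketch: cross-confirming DEGs called by three tools and
ranking the top up/down-regulated genes. Dictionaries map gene -> log2
fold change and stand in for the three R packages' outputs."""

def cross_confirmed_degs(calls_a, calls_b, calls_c):
    """Keep only genes reported as differentially expressed by all three tools."""
    shared = set(calls_a) & set(calls_b) & set(calls_c)
    # Use one tool's fold change for ranking; a real analysis might average them.
    return {gene: calls_a[gene] for gene in shared}

def top_n(degs, n=30):
    """Return the n most up-regulated and n most down-regulated genes."""
    ranked = sorted(degs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n], ranked[-n:][::-1]

a = {"WRI1": 4.75, "GENE2": -2.1, "GENE3": 1.2}
b = {"WRI1": 4.60, "GENE2": -1.9, "GENE4": 0.8}
c = {"WRI1": 4.80, "GENE2": -2.3, "GENE3": 1.1}
confirmed = cross_confirmed_degs(a, b, c)
up, down = top_n(confirmed, n=1)
print(up, down)   # [('WRI1', 4.75)] [('GENE2', -2.1)]
```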