
    Overview of the gene ontology task at BioCreative IV

    Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered one of the bottlenecks in literature curation. There is a growing need for semi-automated or fully automated GO curation techniques that will help database curators rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven useful for assisting real-world GO curation. The shortage of sentence-level training data, and of opportunities for interaction between text-mining developers and GO curators, has limited advances in algorithm development and their use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade, although much progress is still needed for computer-assisted GO curation. Future work should focus on addressing the remaining technical challenges to improve automatic GO concept recognition and on incorporating the practical benefits of text-mining tools into real-world GO annotation.
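As a rough illustration of the concept-recognition subtask, a naive dictionary-lookup baseline, matching GO term names verbatim inside an evidence passage, might look like the sketch below. The GO IDs, term names, and sample passage are illustrative examples, not taken from the task data, and real systems go well beyond exact string matching:

```python
import re

# Toy GO-term dictionary; IDs and names are illustrative examples.
go_terms = {
    "GO:0006915": "apoptotic process",
    "GO:0008283": "cell population proliferation",
    "GO:0006281": "DNA repair",
}

def match_go_terms(passage, terms=go_terms):
    """Naive dictionary-lookup baseline for GO concept recognition:
    case-insensitive, whole-phrase matching of GO term names in a passage."""
    lowered = passage.lower()
    found = []
    for go_id, name in terms.items():
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
            found.append(go_id)
    return found

passage = "Loss of BRCA1 impairs DNA repair and promotes the apoptotic process."
print(match_go_terms(passage))
```

A baseline like this recovers only terms mentioned verbatim; much of the difficulty reported by the task's participants lies in the synonymy and paraphrasing that such lookup misses.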

    The Case Against Cosmology

    It is argued that some of the recent claims for cosmology are grossly overblown. Cosmology rests on a very small database: it suffers from many fundamental difficulties as a science (if it is a science at all), whilst observations of distant phenomena are difficult to make and harder to interpret. It is suggested that cosmological inferences should be tentatively made and sceptically received.

    The impact of technology on data collection: Case studies in privacy and economics

    Technological advancement can often act as a catalyst for scientific paradigm shifts. Today, the ability to collect and process large amounts of data about individuals is arguably a paradigm-shift-enabling technology in action. One manifestation of this technology within the sciences is the ability to study historically qualitative fields with a more granular quantitative lens than ever before. Despite its potential, wide adoption of this technology is accompanied by some risks. In this thesis, I present two case studies. The first focuses on the impact of machine learning in a cheapest-wins motor insurance market, using a competition-based data collection mechanism. Pricing models in the insurance industry are changing from statistical methods to machine learning. In this game, close to 2000 participants, acting as insurance companies, trained and submitted pricing models to compete for profit using real motor insurance policies, with a roughly equal split between legacy and advanced models. With this trend towards machine learning in motion, preliminary analysis of the results suggests that future markets might realise cheaper prices for consumers. Additionally, legacy models competing against modern algorithms may experience a reduction in earning stability, accelerating machine learning adoption. Overall, the results of this field experiment demonstrate the potential for digital competition-based studies of markets in the future. The second case study examines the privacy risks of data collection technologies. Despite a large body of research on the re-identification of anonymous data, the question remains: if a dataset were big enough, would records become anonymous by being "lost in the crowd"? Using 3 months of location data, we show that the risk of re-identification decreases slowly with dataset size. This risk is modelled and extrapolated to larger populations: among 60 million people, 93% of individuals remain uniquely identifiable from 4 points of auxiliary information. These results show that the privacy of individuals is very unlikely to be preserved even in country-scale location datasets, and that alternative paradigms of data sharing are still required.
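The unicity measurement underlying this kind of result can be sketched on synthetic data. All parameters and the trace model below are invented for illustration, at a far smaller scale than the real dataset, and the thesis's actual modelling and extrapolation are more involved:

```python
import random

random.seed(0)

NUM_PEOPLE = 1000      # toy population (the real study extrapolates to 60M)
NUM_CELLS = 10_000     # distinct location/time cells
TRACE_LEN = 30         # observed points per person
AUX_POINTS = 4         # points of auxiliary information known to an attacker

# Synthetic traces: each person visits a random set of cells.
traces = [
    frozenset(random.randrange(NUM_CELLS) for _ in range(TRACE_LEN))
    for _ in range(NUM_PEOPLE)
]

def is_unique(person_idx, traces, k=AUX_POINTS):
    """True if k random points from this person's trace match nobody else."""
    trace = sorted(traces[person_idx])
    aux = set(random.sample(trace, min(k, len(trace))))
    matches = sum(1 for t in traces if aux <= t)  # the person always matches
    return matches == 1

unicity = sum(is_unique(i, traces) for i in range(NUM_PEOPLE)) / NUM_PEOPLE
print(f"Fraction uniquely identified by {AUX_POINTS} points: {unicity:.2f}")
```

With sparse traces over many cells, almost everyone is unique given 4 points; the thesis's contribution is showing how slowly that fraction falls as the population grows.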

    Why and How Should We Account For the Environment?

    As a guide to economic policy, many countries nowadays have a system of national accounts. The basic system was developed in the post-World War II period and was the outcome of a truly international effort. Involving the United Nations and other international organizations, it has been very successful as an international bookkeeping system, nowadays accepted and introduced by all developed and many developing countries. National accounts are compiled because policy-makers wish to have an overview of the economic performance of their country. The best-known indicator for this is Gross Domestic Product; other important indicators cover industrial production, investment, consumption and trade. Basically, we are dealing with a system in which the quantity of goods is measured in physical units and valued at market prices. The rise of public sector administration has complicated matters because of missing market prices, but good approximations have been developed here. A fundamental problem arose with the wish to include nature in this accounting system. The idea was quite clear: to lend support to policy-making when natural functions are included and/or affected. However, a problem that did not go away was which properties to attribute to nature or its functions. The present paper aims to show, first, that environmental accounts are a necessary prerequisite for environmental policy, and second, what kind of environmental accounting system is the most preferable one. In principle, economists have developed two different approaches to account for the environment. One approach is based on the vision that the environment should be valued in monetary terms. The other is to relate the environment, measured in physical units, to economic variables. The question then is which kind of environmental accounting system should be the preferred one.
    In a certain sense, the discussions in the Netherlands concerning environmental accounting mirror the discussions that have taken place internationally. In the Netherlands, however, the discussion got a particular twist: two systems have been discussed and developed in statistical bureaus. In essence, the Dutch discussion reflects the debate between the two main strands of thought. The decisive difference between them goes back to the question: "Is it possible to value natural functions in monetary units?" If yes, it is possible to calculate something like a Green National Income (GNI), as first proposed by the Dutch national accountant Roefie Hueting (1969, 1974). His operationalization of the basic idea was to value all environmental damages in monetary units and then to subtract these numbers from the net national income (NNI). He called the resulting figure the Sustainable National Income (SNI). Only if the difference between NNI and SNI is zero is the economy environmentally sustainable. If it is not possible to value nature in monetary terms, it is impossible to calculate a Green National Income. During the 1990s, the Dutch national accountant Steven Keuning developed an alternative system (the National Accounting Matrix including Environmental Accounts, the so-called NAMEA system), which relates quantities of emissions measured in physical units to figures of the conventional accounting system, e.g. CO2 emissions to GDP. The question then is which system should be preferred to inform policy-makers and the public about the state of the economy with respect to the environment. The paper goes into these issues and comes to a conclusion.
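The arithmetic behind the two approaches can be contrasted in a toy calculation. Every number below is hypothetical, chosen purely for illustration:

```python
# Hueting-style monetary approach: subtract monetized environmental
# damage from net national income (NNI) to get a Sustainable National
# Income (SNI). All figures are hypothetical, in billions.
nni = 800.0                      # net national income
environmental_damage = 55.0      # monetized environmental damage
sni = nni - environmental_damage
sustainability_gap = nni - sni   # zero only for a sustainable economy

# Keuning-style NAMEA approach: keep the environment in physical units
# and relate it to an economic aggregate instead of monetizing it.
co2_emissions = 160.0            # megatonnes of CO2
gdp = 850.0                      # billions
emission_intensity = co2_emissions / gdp  # Mt per billion of GDP

print(f"SNI = {sni:.1f}, sustainability gap = {sustainability_gap:.1f}")
print(f"CO2 intensity = {emission_intensity:.3f} Mt per billion of GDP")
```

The contrast is visible even at this scale: the first approach forces a single monetary number (and hence a valuation of nature), while the second reports a physical ratio and leaves the valuation to the reader.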

    Assessing the odds of achieving the MDGs

    How many countries are on target to achieve the Millennium Development Goals by 2015? How many countries are off target, and how far are they from the goals? And what factors are essential for improving the odds that off-target countries can reach the goals? This paper examines these questions and takes a closer look at the diversity of country progress. The authors argue that the answers from the available data are surprisingly hopeful. In particular, two-thirds of developing countries are on target or close to being on target for all the Millennium Development Goals. Among developing countries that are falling short, the average gap of the top half is about 10 percent. For those countries that are on target, or close to it, solid economic growth and good policies and institutions have been the key factors in their success. With improved policies and faster growth, many countries that are close to being on target could still achieve the targets in 2015 or soon after.

    Health Care Informatics Support of a Simulated Study

    The objective of this project is to assess the value of REDCap (Harris, 2009) by conducting a simulated breast cancer clinical trial and demonstration. REDCap is a free, secure, web-based application designed to support data capture for research studies. To assess REDCap's value, we conducted a simulation of a clinical trial designed to compare the use of two new technologies for breast cancer diagnosis and treatment with current best-practice breast cancer diagnosis and treatment. We call the trial Real-Time Operating Room BC Diagnostic Treatment (RORBCDT). The RORBCDT clinical trial is designed to assess the value of a new breast cancer operating-room diagnostic technology, intra-operative diagnosis of the sentinel lymph node (Sentimag) (Keshtgar, n.d.), and a real-time treatment option, intra-operative radiotherapy (IORT) (Williams, 2011). The Sentimag is used to stage certain cancers to determine their degree of spread to lymph nodes. If the diagnosis is positive, then the new treatment device (IORT) is used to treat the remaining cancerous tissue. The clinical trial simulation consists of several steps: (1) design the clinical trial; (2) create the REDCap project environment to conduct the trial; (3) recruit and train fictitious patients, providers and the project team; (4) execute the simulated trial; and (5) assess the value of REDCap in conducting the simulation. Note: this entire exercise is a SIMULATION; no actual patients, physicians or devices were used. Rather, a simulated clinical trial was performed with trained volunteers pretending to be patients, physicians and research staff. Consequently, the assessment of REDCap and all results, conclusions and observations are based on a SIMULATION. The simulated clinical trial, RORBCDT, is a two-arm clinical trial designed to test the efficacy of the real-time diagnostic (Sentimag) and treatment (IORT) devices. In one arm of the trial, breast cancer surgery patients (volunteers, not real patients) experience a simulated best-practice operation and therapy. In the other arm, breast cancer surgery patients (volunteers, not real patients) experience a simulated real-time diagnosis and treatment using the intra-operative devices. This research seeks to execute a fictitious clinical trial to assess the value of REDCap. We anticipate that this simulation, and those similar to it, will help health care researchers become familiar with REDCap and assess its value and use in conducting health research.
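A two-arm trial like the simulated RORBCDT needs its (fictitious) patients split evenly between arms as they enrol. A minimal sketch of permuted-block randomization is below; this is only an illustration with invented arm labels and patient counts, not the project's actual procedure (REDCap provides its own randomization module):

```python
import random

def permuted_block_assignments(n_patients, block_size=4, seed=7):
    """Assign fictitious patients to the two arms of a simulated two-arm
    trial using permuted blocks, so the arms stay balanced as enrolment
    proceeds. Toy illustration only."""
    assert block_size % 2 == 0, "block size must split evenly between 2 arms"
    rng = random.Random(seed)
    arms = []
    while len(arms) < n_patients:
        # Each block holds an equal number of each arm, in random order.
        block = (["intra-operative"] * (block_size // 2)
                 + ["best-practice"] * (block_size // 2))
        rng.shuffle(block)
        arms.extend(block)
    return arms[:n_patients]

assignments = permuted_block_assignments(20)
print(assignments.count("intra-operative"), assignments.count("best-practice"))
```

Because 20 fictitious patients fill five complete blocks of four, the two arms end up exactly balanced at 10 apiece.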

    How Far Have We Come? Foundation CEOs on Progress and Impact

    The performance of major U.S. foundations is much discussed and debated. It is also very difficult to gauge. The past decade or so has seen increased interest and effort related to the question of how foundations are doing, and how they might do better. These questions are not new: the earliest major American philanthropists were interested in answering them. But recent years have seen an uptick in at least the discussion of these issues. Indeed, our organization, the Center for Effective Philanthropy (CEP), has focused much energy on this issue, and we have noted how uniquely challenging assessing foundation performance can be. Among the challenges are the difficulty of drawing a causal link between what a foundation funds and change on the ground, the extended time horizons associated with making progress on the difficult issues foundations often address, and the fact that information from different program areas cannot be easily aggregated using some common measure. There is no universal measure, no easy analog to return on investment, for foundations. So what conclusions do foundation leaders draw about their success? Brest and others suggest that "philanthropy remains an underperformer in achieving social outcomes." Do foundation CEOs agree? How much progress do they believe foundations have made? In January 2013, we sent surveys to 472 full-time CEOs leading U.S.-based foundations that give at least $5 million annually in grants; 211 CEOs completed the survey, for a 45 percent response rate. The survey was designed to collect data on CEOs' understanding of progress and their attitudes and practices in relation to foundation impact. This research was not meant to serve as an objective evaluation of how much progress foundations have made through their work.