112 research outputs found

    Community content building for evolutionary biology: Lessons learned from LepTree and Encyclopedia of Life

    Online resources to aid large-scale ecological and evolutionary biology are beginning to take root, only a decade behind fields such as genomics and molecular biology. One barrier has been a long tradition, in evolutionary biology at least, of work by individuals on the order of a few hundred species rather than the thousands or hundreds of thousands necessary to understand the general evolutionary or ecological processes that explain species characteristics and distributions. Advances in collaborative and semantic software offer promise – it should be possible to develop high-quality online species-level datasets for comparative analyses and even to integrate, via machine reasoning, across highly customized datasets. In this talk we will compare and contrast two approaches to assembling the data

    A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning

    The Rashomon effect occurs when many different explanations exist for the same phenomenon. In machine learning, Leo Breiman used this term to characterize problems where many accurate-but-different models exist to describe the same data. In this work, we study how the Rashomon effect can be useful for understanding the relationship between training and test performance, and the possibility that simple-yet-accurate models exist for many problems. We consider the Rashomon set - the set of almost-equally-accurate models for a given problem - and study its properties and the types of models it could contain. We present the Rashomon ratio as a new measure related to simplicity of model classes, which is the ratio of the volume of the set of accurate models to the volume of the hypothesis space; the Rashomon ratio is different from standard complexity measures from statistical learning theory. For a hierarchy of hypothesis spaces, the Rashomon ratio can help modelers to navigate the trade-off between simplicity and accuracy. In particular, we find empirically that a plot of empirical risk vs. Rashomon ratio forms a characteristic Γ-shaped Rashomon curve, whose elbow seems to be a reliable model selection criterion. When the Rashomon set is large, models that are accurate - but that also have various other useful properties - can often be obtained. These models might obey various constraints such as interpretability, fairness, or monotonicity. Comment: Revisited sections 3, 4, 5, 6, 7, and
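The Rashomon ratio described in this abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' method: it assumes a small finite hypothesis space of one-dimensional threshold classifiers, so that "volume" reduces to a simple count, and it uses 0-1 loss as the empirical risk. The function names, the toy data, and the tolerance `epsilon` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, binary label)


def empirical_risk(threshold: float, data: Sequence[Sample]) -> float:
    """0-1 loss of the classifier that predicts 1 iff x >= threshold."""
    errors = sum(1 for x, y in data if int(x >= threshold) != y)
    return errors / len(data)


def rashomon_ratio(
    thresholds: Sequence[float], data: Sequence[Sample], epsilon: float
) -> Tuple[float, List[float]]:
    """Fraction of the (finite) hypothesis space whose empirical risk is
    within epsilon of the best risk, plus the members of that set."""
    risks = [empirical_risk(t, data) for t in thresholds]
    best = min(risks)
    members = [t for t, r in zip(thresholds, risks) if r <= best + epsilon]
    return len(members) / len(thresholds), members


# Hypothetical toy dataset with one noisy label at x = 0.7.
data = [(0.1, 0), (0.2, 0), (0.35, 0), (0.4, 1),
        (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1)]

# Hypothesis space: eleven thresholds 0.0, 0.1, ..., 1.0.
thresholds = [i / 10 for i in range(11)]

ratio, members = rashomon_ratio(thresholds, data, epsilon=0.125)
print(f"Rashomon ratio: {ratio:.3f}, Rashomon set: {members}")
```

Setting `epsilon` to 0 keeps only the best-performing thresholds, while larger values admit more of the hypothesis space. For continuous hypothesis spaces the count would have to be replaced by a volume estimate, for example by sampling.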

    Facilitating an Educational Development Initiative Focused on Reading Comprehension Instruction: Exploring University Professors' Experiences and Beliefs

    This qualitative exploratory case study focused on the experiences of university professors as they implemented reading comprehension instruction in their discipline-specific first- and second-year courses within the context of an educational development initiative. During 3 individual interviews, a pre-instructional dialogue, and 2 group sessions across 1 academic year, 5 professors reflected on their beliefs about reading and teaching as they engaged with planning and implementation of reading comprehension instruction. Collectively, participants appeared to plan comprehension instruction in ways consistent with their beliefs about academic reading, teaching first- and second-year students, and prior instructional approaches, and cited learning that challenged, confirmed, and/or intensified their pre-existing beliefs. Participants also suggested that a variety of formats for interaction and information dissemination during the educational development initiative were valuable in that they allowed for flexible facilitation. The study may offer insights into reading comprehension and its instruction within university courses as well as personalized educational development for university professors. Participants’ beliefs, experiences, and meaning-making processes are positioned as influences on learning, and participants’ investments of self during educational development are emphasized. Implications for theory include the importance of acknowledging and honouring the complexities of professors’ investments of self in the design and facilitation of initiatives. Related implications for practice include exploration of professors’ beliefs, demonstrated respect and consideration, and responsive communication. Recommendations for future research include extension of the study’s scope and lines of inquiry

    A Path to Simpler Models Starts With Noise

    The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets. Comment: NeurIPS 202

    An Analysis of Federal Policy on Public Access to Scientific Research Data

    The 2013 Office of Science and Technology Policy (OSTP) Memo on federally funded research directed agencies with research and development budgets above $100 million to develop and release plans to increase and broaden access to research results, both published literature and data. The agency responses have generated discussion and interest but have yet to be analyzed and compared. In this paper, we examine how 19 federal agencies responded to the memo, written by John Holdren, on issues of scientific data and the extent of their compliance with the directives outlined in the memo. We present a varied picture of the readiness of federal science agencies to comply with the memo through a comparative analysis and close reading of the contents of these responses. While some agencies, particularly those with a long history of supporting and conducting science, scored well, other responses indicate that some agencies have only taken a few steps towards implementing policies that comply with the memo. These results are of interest to the data curation community as they reveal how different agencies across the federal government approach their responsibilities for research data management, and how new policies and requirements might continue to affect scientists and research communities. The authors wish to acknowledge the USDA National Agricultural Library for supporting this work through a Cooperative Agreement

    Facilitating individual growth and development

    viii, 74 p.; 29 cm. Both regular and special educators are continually involved in the process of change to better meet the needs of a range of learners. Effective leadership to facilitate and support the process of growth and development is a necessary component of teacher growth. Through this project, a small group of special and regular classroom teachers met to develop personal growth goals in the area of special education and to work together to assist each other in meeting their individual goals related to special education. This opportunity allowed participants to share their findings and frustrations and, in a collegial fashion, support each other in the process of change. The goal of the writer was to develop her own leadership skills and abilities in facilitating teachers' individual growth and development in the area of special education. Through ongoing reflection, group members continually revisited their individual goals and grew through the process of shared reflection. Through the sharing of ideas, each group member was able to incorporate their individual understandings into their own classroom experience. The overall goal was to cause teachers to reflect on and modify their teaching practices. Through this process, each member shared their stories and reflections and grew as educators. The process of shared reflection encouraged each member to look deeper at what they were doing as educators of students with special needs and further develop the learning opportunities they offered their students. Through this qualitative research project, the writer attempts to share some of the stories, experiences and understandings that demonstrate that growth and development occurred during the course of the project

    TraitBank : practical semantics for organism attribute data

    © IOS Press and The Author(s), 2016. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Semantic Web 7 (2016): 577-588, doi:10.3233/SW-150190. Encyclopedia of Life (EOL) has developed TraitBank (http://eol.org/traitbank), a new repository for organism attribute (trait) data. TraitBank aggregates, manages and serves attribute data for organisms across the tree of life, including life history characteristics, habitats, distributions, ecological relationships and other data types. We describe how TraitBank ingests and manages these data in a way that leverages EOL’s existing infrastructure and semantic annotations to facilitate reasoning across the TraitBank corpus and interoperability with other resources. We also discuss TraitBank’s impact on users and collaborators and the challenges and benefits of our lightweight, scalable approach to the integration of biodiversity data. Support for TraitBank was provided by the Alfred P. Sloan Foundation, the Smithsonian Institution, the Marine Biological Laboratory, and the John D. and Catherine T. MacArthur Foundation

    A Discussion of Value Metrics for Data Repositories in Earth and Environmental Sciences

    Despite growing recognition of the importance of public data to the modern economy and to scientific progress, long-term investment in the repositories that manage and disseminate scientific data in easily accessible ways remains elusive. Repositories are asked to demonstrate that there is a net value of their data and services to justify continued funding or attract new funding sources. Here, representatives from a number of environmental and Earth science repositories evaluate approaches for assessing the costs and benefits of publishing scientific data in their repositories, identifying various metrics that repositories typically use to report on the impact and value of their data products and services, plus additional metrics that would be useful but are not typically measured. We rated each metric by (a) the difficulty of implementation by our specific repositories and (b) its importance for value determination. As managers of environmental data repositories, we find that some of the most easily obtainable data-use metrics (such as data downloads and page views) may be less indicative of value than metrics that relate to discoverability and broader use. Other intangible but equally important metrics (e.g., laws or regulations impacted, lives saved, new proposals generated) will require considerable additional research to describe and develop, plus resources to implement at scale. As value can only be determined from the point of view of a stakeholder, it is likely that multiple sets of metrics will be needed, tailored to specific stakeholder needs. Moreover, economically based analyses or the use of specialists in the field are expensive and can happen only as resources permit

    Agricultural data management and sharing: Best practices and case study

    Agricultural data are crucial to many aspects of production, commerce, and research involved in feeding the global community. However, in most agricultural research disciplines, standard best practices for data management and publication do not exist. Here we propose a set of best practices in the areas of peer review, minimal dataset development, data repositories, citizen science initiatives, and support for best data management. We illustrate some of these best practices with a case study in dairy agroecosystems research. While many common, and increasingly disparate, data management and publication practices are entrenched in agricultural disciplines, opportunities are readily available for promoting and adopting best practices that better enable and enhance data-intensive agricultural research and production