81 research outputs found

    Utilizing AI/ML methods for measuring data quality

    High-quality data is crucial for trusted data-based decisions. A considerable part of current data quality measurement approaches involves expensive, expert, and time-consuming work that requires manual effort to achieve adequate results. Furthermore, these approaches are prone to error and do not take full advantage of the potential of AI. A possible solution is to explore state-of-the-art ML-based methods that use the potential of AI to overcome these issues. A significant part of the thesis deals with data quality theory, which provides a comprehensive insight into the field. Four state-of-the-art ML-based methods were identified in the existing literature, and one novel method based on Autoencoders (AE) was proposed. Experiments with AE and Association Rule Mining using NLP were conducted. The proposed AE-based methods proved able to detect potential data quality defects in real-world datasets. The Association Rule Mining approach was able to extract business rules for a given business question but required significant preprocessing effort. Alternative non-AI methods were also analyzed, but they required reliance on expert and domain knowledge.
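The AE-based idea described above can be illustrated with a minimal sketch: fit an autoencoder on rows assumed to be mostly clean, then flag rows whose reconstruction error is unusually high as potential data quality defects. This is not the thesis's implementation; as an assumption, it stands in a linear autoencoder (equivalent to PCA, fitted in closed form via SVD) for the neural network, and the dataset, column dependencies, and threshold are invented for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    # Center the data and take the top-k principal directions via SVD.
    # V (k x d) plays the role of the encoder/decoder weights.
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruction_error(X, mu, V):
    # Encode into the k-dimensional code, decode back, and return
    # the per-row squared reconstruction error.
    Z = (X - mu) @ V.T
    X_hat = Z @ V + mu
    return ((X - X_hat) ** 2).sum(axis=1)

# Synthetic "clean" data with two hidden business rules the AE can learn:
# column 3 = column 0 + column 1, and column 4 = 2 * column 2.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 5))
clean[:, 3] = clean[:, 0] + clean[:, 1]
clean[:, 4] = 2.0 * clean[:, 2]

mu, V = fit_linear_autoencoder(clean, k=3)

good_row = clean[:1]
bad_row = good_row.copy()
bad_row[0, 3] = 100.0  # violates the rule col3 = col0 + col1

# A row that breaks a learned dependency reconstructs poorly, so it is
# flagged as a potential data quality defect.
flagged = reconstruction_error(bad_row, mu, V) > reconstruction_error(good_row, mu, V) + 1.0
```

In a neural AE the fitting step would be gradient-based and the threshold chosen from the error distribution on held-out clean data, but the detection principle, flagging rows with anomalously high reconstruction error, is the same.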

    Inside the sequence universe: the amazing life of data and the people who look after them

    This thesis provides an ethnographic exploration of two large nucleotide sequence databases, the European Molecular Biology Laboratory Bank, UK, and GenBank, US. It describes and analyses their complex bioinformatic environments as well as their material-discursive environments – the objects, narratives and practices that recursively constitute these databases. In doing so, it unravels a rich bioinformational ecology – the "sequence universe". Here, mosquitoes have mumps, the louse is "huge", and self-styled information plumbers patch up high-throughput data pipelines while data curators battle the indiscriminate coming-to-life caused by metagenomics. Given the intensification of data production, the biosciences have reached a point where concerns have squarely turned to fundamental questions about how to know within and between all that data. This thesis assembles a database imaginary, recovering inventive terms of scholarly engagement with bioinformational databases and data, terms that remain critical without necessarily reverting to a database logic. Science studies and related disciplines, investigating illustrious projects like the UK Biobank, have developed a sustained critique of the perceived conflation of bodies and data. This thesis argues that these accounts forego an engagement with the database sui generis, as a situated arrangement of people, things, routines and spaces. It shows that databases have histories and continue established practices of collecting and curating. At the same time, it maps entanglements of the databases with experiments and discovery, thereby demonstrating the vibrancy of data. Focusing on the question of what happens at these databases, the thesis follows data curators and programmers but also database records and the entities documented by them, such as uncultured bacteria. It contextualises ethnographic findings within the literature on the sociology and philosophy of science and technology while also making references to works of art and literature in order to bring into relief the boundary-defying scope of the issues raised.

    Open Pedagogy Approaches: Faculty, Library, and Student Collaborations

    Open Pedagogy Approaches: Faculty, Library, and Student Collaborations is a collection of case studies from higher education institutions across the United States. An open educational resource (OER) in its own right, it offers a diverse compilation of OER and open pedagogy projects grounded in faculty, library, and student collaborations. Open Pedagogy Approaches provides ideas, practical tips, and inspiration for educators willing to explore the power of open, whether that involves a small innovation or a large-scale initiative. Particularly during this pandemic, as they struggle against publisher limitations to offer traditional print texts in e-format, libraries are a natural partner in the creation and facilitation of open educational resources and practices. "Going open" offers innovative alternatives that can equitably shift the culture of student access and empowerment in learning.
    List of chapters:
    Editor's Preface / Alexis Clifton
    Foreword / Robin DeRosa
    Introduction / Kimberly Davies Hoffman, Robert Berkman, Deborah Rossen-Knill, Kristen Totleben, Eileen Daly-Boas, Alexis Clifton, Moriana Garcia, Lev Earle, and Joe Easterly
    Evolving into the Open: A Framework for Collaborative Design of Renewable Assignments / Stacy Katz and Jennifer Van Allen
    Informed Open Pedagogy and Information Literacy Instruction in Student-Authored Open Projects / Cynthia Mari Orozco
    Approaching Open Pedagogy in Community and Collaboration / Caroline Sinkinson and Amanda McAndrew
    Open Pedagogy Big and Small: Comparing Open Pedagogy Efforts in Large and Small Higher Education Settings / Shanna Hollich and Jacob Moore
    Adapting Open Educational Course Materials in Undergraduate General Psychology: A Faculty-Librarian-Student Partnership / Dennis E. Schell, Dorinne E. Banks, and Neringa Liutkaite
    Reading British Modernist Texts: A Case in Open Pedagogy / Mantra Roy, Joe Easterly, and Bette London
    Humanities in the Open: The Challenges of Creating an Open Literature Anthology / Christian Beck, Lily J. Dubach, Sarah A. Norris, and John Venecek
    A 2-for-1 Deal: Earn Your AA While Learning About Information Literacy Using OER / Mary Lee Cunill, Sheri Brown, and Tia Esposito
    Mathematics Courses and the Ohio Open Ed Collaborative: Collaborative Course Content Building for Statewide Use / Daniel Dotson, Anna Davis, Amanda L. Folk, Shanna Jaggars, Marcos D. Rivera, and Kaity Prieto
    Library Support for Scaffolding OER-enabled Pedagogy in a General Education Science Course / Lindsey Gumb and Heather Miceli
    Sharing the End of the World: Students' Perceptions of Their Self-Efficacy in the Creation of Open Access Digital Learning Objects / Sarah Hutton, Lisa Di Valentino, and Paul Musgrave
    Teaching Wikipedia: A Model for Critical Engagement with Open Information / Amanda Koziura, Jennifer M. Starkey, and Einav Rabinovitch-Fox
    "And Still We Rise": Open Pedagogy and Black History at a Rural Comprehensive State College / Joshua F. Beatty, Timothy C. Hartnett, Debra Kimok, and John McMahon
    Building a Collection of Openly Licensed Student-Developed Videos / Ashley Shea
    Whose History?: Expanding Place-Based Initiatives Through Open Collaboration / Sean D. Visintainer, Stephanie Anckle, and Kristen Weischedel
    Scholarly Bridges: SciComm Skill-Building with Student-Created Open Educational Resources / Carrie Baldwin-SoRelle and Jennifer M. Swann
    Harnessing the Power of Student-Created Content: Faculty and Librarians Collaborating in the Open Educational Environment / Bryan James McGeary, Ashwini Ganeshan, and Christopher S. Guder
    Open Pedagogical Practices to Train Undergraduates in the Research Process: A Case Study in Course Design and Co-Teaching Strategies / Stephanie N. Lewis, Anne M. Brown, and Amanda B. MacDonald
    Open Pedagogical Design for Graduate Student Internships, A New Collaborative Model / Laurie N. Taylor and Brian Keith
    Adventures in a Connectivist MOOC on Open Learning / Susan J. Erickson
    Invitation to Innovation: Transforming the Argument-Based Research Paper to Multimodal Project / Denise G. Malloy and Sarah Siddiqui
    "What If We Were To Go?": Undergraduates Simulate the Building of an NGO From Theory To Practice / Kimberly Davies Hoffman, Rose-Marie Chierici, and Amanda Spence

    Systematic Analysis of the Factors Contributing to the Variation and Change of the Microbiome

    Understanding changes and trends in biomedical knowledge is crucial for individuals, groups, and institutions, as biomedicine improves people's lives, supports national economies, and facilitates innovation. However, as knowledge changes, what evidence illustrates those changes? In the case of the microbiome, a multi-dimensional concept from biomedicine, there are significant increases in publications, citations, funding, collaborations, and other explanatory variables or contextual factors. What is observed in the microbiome, or in any historical evolution of a scientific field or of scientific knowledge, is that these changes are related to changes in knowledge; what is not understood is how to measure and track changes in knowledge. This investigation highlights how contextual factors from the language and social context of the microbiome are related to changes in the usage, meaning, and scientific knowledge of the microbiome. Two interconnected studies integrating qualitative and quantitative evidence to examine the variation and change of the microbiome are presented. First, the concepts microbiome, metagenome, and metabolome are compared to determine the boundaries of the microbiome concept in relation to other concepts whose conceptual boundaries have been cited as overlapping. A collection of publications for each concept, or corpus, is presented, with a focus on how to create, collect, curate, and analyze large data collections. This study concludes with suggestions on how to analyze biomedical concepts using a hybrid approach that combines results from the larger language context and individual words. Second, the results of a systematic review that describes the variation and change of microbiome research, funding, and knowledge are examined. A corpus of approximately 28,000 articles on the microbiome is characterized, and a spectrum of microbiome interpretations is suggested based on differences related to context. The collective results suggest the microbiome is a separate concept from the metagenome and metabolome, and that the variation and change of the microbiome concept were influenced by contextual factors. These results provide insight into how concepts with extensive resources behave within biomedicine and suggest the microbiome is possibly representative of conceptual change, or a preview of new dynamics within science that are expected in the future.

    Out of cite, out of mind: the current state of practice, policy, and technology for the citation of data

    PREFACE
    The growth in the capacity of the research community to collect and distribute data presents huge opportunities. It is already transforming old methods of scientific research and permitting the creation of new ones. However, the exploitation of these opportunities depends upon more than computing power, storage, and network connectivity. Among the promises of our growing universe of online digital data are the ability to integrate data into new forms of scholarly publishing to allow peer-examination and review of conclusions or analysis of experimental and observational data, and the ability for subsequent researchers to make new analyses of the same data, including their combination with other data sets and uses that may have been unanticipated by the original producer or collector. The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as other functions such as attribution of credit and establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters? The traditions and good practices for maintaining the scholarly record by proper references to a work are well established and understood with regard to journal articles and other literature, but attributing credit by bibliographic references to data is not yet so broadly implemented.

    At the crossroads of big science, open science, and technology transfer

    Big science infrastructures are confronting increasing demands for public accountability, not only for their contribution to scientific discovery but also for their capacity to generate secondary economic value. To build and operate their sophisticated infrastructures, big science centres often generate frontier technologies by designing and building technical solutions to complex and unprecedented engineering problems. In parallel, the previous decade has seen the disruption of rapid technological changes impacting the way science is done and shared, which has led to the coining of the concept of Open Science (OS). Governments are quickly moving towards the OS paradigm and asking big science centres to "open up" the scientific process. Yet these two forces run in opposition, as the commercialization of scientific outputs usually requires significant financial investments, and companies are willing to bear this cost only if they can protect the innovation from imitation or unfair competition. This PhD dissertation aims at understanding how new applications of ICT are affecting primary research outcomes and the resultant technology transfer in the context of big science and OS. It attempts to uncover the tensions between these two normative forces and to identify the mechanisms employed to overcome them. The dissertation comprises four separate studies: 1) A mixed-method study combining two large-scale global online surveys of research scientists (2016, 2018) with two case studies of the high energy physics and molecular biology scientific communities, assessing the explanatory factors behind scientific data-sharing practices; 2) A case study of Open Targets, an information infrastructure based upon data commons, where the European Molecular Biology Laboratory-EBI and pharmaceutical companies collaborate and share scientific data and technological tools to accelerate drug discovery; 3) A study of a unique dataset of 170 projects funded under ATTRACT, a novel policy instrument of the European Commission led by European big science infrastructures, which aims to understand the nature of the serendipitous process behind transitioning big science technologies to previously unanticipated commercial applications; and 4) a case study of White Rabbit technology, a sophisticated open-source hardware developed at the European Council for Nuclear Research (CERN) in collaboration with an extensive ecosystem of companies.

    Literacy for digital futures: Mind, body, text

    The unprecedented rate of global, technological, and societal change calls for a radical, new understanding of literacy. This book offers a nuanced framework for making sense of literacy by addressing knowledge as contextualised, embodied, multimodal, and digitally mediated. In today's world of technological breakthroughs, social shifts, and rapid changes to the educational landscape, literacy can no longer be understood through established curricula and static text structures. To prepare teachers, scholars, and researchers for the digital future, the book is organised around three themes – Mind and Materiality, Body and Senses, and Texts and Digital Semiotics – to shape readers' understanding of literacy. Opening up new interdisciplinary themes, Mills, Unsworth, and Scholes confront emerging issues for next-generation digital literacy practices. The volume helps new and established researchers rethink dynamic changes in the materiality of texts and their implications for the mind and body, and features recommendations for educational and professional practice.

    Social context of creativity

    This thesis analyses the long-distance control of the environmentally-situated imagination, in both spatial and temporal dimensions. Central to the project is what I call the extended social brain hypothesis. Grounded in the Peircean conception of 'pragmaticism', this re-introduces technical intelligence to Dunbar's social brain: conceptually, through Clark's 'extended mind' philosophy, and materially, through Callon's 'actor–network theory'. I claim that there is no subjectivity without intersubjectivity. That is to say: as an evolutionary matter, it was necessary for the empathic capacities to evolve before the sense of self we identify as human could emerge. Intersubjectivity is critical to human communication because of its role in interpreting intention. While the idea that human communication requires three levels of intentionality carries analytical weight, I argue that the inflationary trajectory is wrong as an evolutionary matter; the trend is instead towards increasing powers of individuation. The capacity for tool-use is emphasized less under the social brain hypothesis, but the importance of digital manipulation needs to be reasserted as part of a mature ontology. These claims are modulated to substantiate the work-maker, a socially situated (and embodied) creative agent who draws together Peircean notions of epistemology, phenomenology and oral performance.

    SKR1BL
