    Augmented Behavioral Annotation Tools, with Application to Multimodal Datasets and Models: A Systematic Review

    Annotation tools are an essential component in the creation of datasets for machine learning purposes. Annotation tools have evolved greatly since the turn of the century, and now commonly include collaborative features to divide labor efficiently, as well as automation employed to amplify human efforts. Recent developments in machine learning models, such as Transformers, allow for training upon very large and sophisticated multimodal datasets and enable generalization across domains of knowledge. These models also herald an increasing emphasis on prompt engineering to provide qualitative fine-tuning upon the model itself, adding a novel emerging layer of direct machine learning annotation. These capabilities enable machine intelligence to recognize, predict, and emulate human behavior with much greater accuracy and nuance, shortfalls in which have contributed to algorithmic injustice in previous techniques. However, the scale and complexity of training data required for multimodal models present engineering challenges. Best practices for conducting annotation for large multimodal models in the safest and most ethical, yet efficient, manner have not been established. This paper presents a systematic literature review of crowd and machine learning augmented behavioral annotation methods to distill practices that may have value in multimodal implementations, cross-correlated across disciplines. Research questions were defined to provide an overview of the evolution of augmented behavioral annotation tools in the past, in relation to the present state of the art. (Contains five figures and four tables.)

    Endogenous measures for contextualising large-scale social phenomena: a corpus-based method for mediated public discourse

    This work presents an interdisciplinary methodology for developing endogenous measures of group membership through analysis of pervasive linguistic patterns in public discourse. Focusing on political discourse, this work critiques the conventional approach to the study of political participation, which is premised on decontextualised, exogenous measures to characterise groups. Considering the theoretical and empirical weaknesses of decontextualised approaches to large-scale social phenomena, this work suggests that contextualisation using endogenous measures might provide a complementary perspective to mitigate such weaknesses. This work develops a sociomaterial perspective on political participation in mediated discourse as affiliatory action performed through language. While the affiliatory function of language is often performed consciously (such as statements of identity), this work is concerned with unconscious features (such as patterns in lexis and grammar). This work argues that pervasive patterns in such features that emerge through socialisation are resistant to change and manipulation, and thus might serve as endogenous measures of sociopolitical contexts, and thus of groups. In terms of method, the work takes a corpus-based approach to the analysis of data from the Twitter messaging service whereby patterns in users’ speech are examined statistically in order to trace potential community membership. The method is applied in the US state of Michigan during the second half of 2018—6 November having been the date of midterm (i.e. non-Presidential) elections in the United States. The corpus is assembled from the original posts of 5,889 users, who are nominally geolocalised to 417 municipalities. These users are clustered according to pervasive language features. Comparing the linguistic clusters according to the municipalities they represent finds that there are regular sociodemographic differentials across clusters. 
    This is understood as an indication of social structure, suggesting that endogenous measures derived from pervasive patterns in language may indeed offer a complementary, contextualised perspective on large-scale social phenomena.

    Impact of language skills and system experience on medical information retrieval


    Ab Initio Language Teaching in British Higher Education

    Drawing extensively on the expertise of teachers of German in universities across the UK, this volume offers an overview of recent trends, new pedagogical approaches and practical guidance for teaching at beginners' level in the higher education classroom. At a time when entries for UK school exams in modern foreign languages are decreasing, this book serves the urgent need for research and guidance on ab initio learning and teaching in HE. Using the example of teaching German, it offers theoretical reflections on teaching ab initio and practice-oriented approaches that will be useful for teachers of both German and other languages in higher education. The first chapters assess the role of ab initio provision within the wider context of modern languages departments and language centres. They are followed by sections on teaching methods and innovative approaches in the ab initio classroom that include chapters on the use of music, textbook evaluation, the effective use of a flipped classroom and the contribution of language apps. Finally, the book focuses on the learner in the ab initio context and explores issues around autonomy and learner strengths. The whole builds into a theoretically grounded guide that sketches out perspectives for teaching and learning ab initio languages that will benefit current and future generations of students.

    Inclusive Intelligent Learning Management System Framework - Application of Data Science in Inclusive Education

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. As a disabled student, the author faced higher education with a handicap, and the experience of studying during the COVID-19 confinement periods matched the findings of recent research on the importance of digital accessibility in more e-learning-intensive academic experiences. Narrative and systematic literature reviews provided context on the World Health Organization’s International Classification of Functioning, Disability and Health, the legal and standards framework, and the state of the art in information and communication technology. An assessment of Portuguese higher education institutions’ websites showed that only outlying institutions had implemented websites that were near perfect in terms of accessibility. A gap was therefore identified between how accessible Portuguese higher education websites are and the needs of all students, including those with disabilities, and even the minimum legal accessibility requirements for digital products and services provided by public or publicly funded organizations. Identifying a problem in society and exploring the scientific knowledge base for context and the state of the art formed the first stage of the Design Science Research (DSR) methodology, which was followed by development and validation cycles of an Inclusive Intelligent Learning Management System Framework. The framework blends contributions from various Data Science fields with interface design compliant with accessibility guidelines and an accessibility compliance assessment at content upload. Validation was provided by a focus group whose inputs were considered for the version presented in this dissertation.
    Since it was not the purpose of the research to deliver a complete implementation of the framework, and consistent data to make all the modules interact with each other was lacking, the most relevant modules were tested with open data as a proof of concept. The rigor cycle of DSR started with the inclusion of the previous thesis in the Atlântica University Institute Scientific Repository and is to be completed with the publication of this thesis and of the findings of the already started PhD in relevant journals and conferences.

    Foundation Models and Fair Use

    Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.

    BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval

    Efficient information retrieval (IR) from building information models (BIMs) poses significant challenges due to the necessity for deep BIM knowledge or extensive engineering efforts for automation. We introduce BIM-GPT, a prompt-based virtual assistant (VA) framework integrating BIM and generative pre-trained transformer (GPT) technologies to support NL-based IR. A prompt manager and dynamic template generate prompts for GPT models, enabling interpretation of NL queries, summarization of retrieved information, and answering BIM-related questions. In tests on a BIM IR dataset, our approach achieved 83.5% and 99.5% accuracy rates for classifying NL queries with no data and 2% data incorporated in prompts, respectively. Additionally, we validated the functionality of BIM-GPT through a VA prototype for a hospital building. This research contributes to the development of effective and versatile VAs for BIM IR in the construction industry, significantly enhancing BIM accessibility and reducing engineering efforts and training data requirements for processing NL queries. Comment: 35 pages, 15 figures.

    Bridging Systems: Open Problems for Countering Destructive Divisiveness across Ranking, Recommenders, and Governance

    Divisiveness appears to be increasing in much of the world, leading to concern about political violence and a decreasing capacity to collaboratively address large-scale societal challenges. In this working paper we aim to articulate an interdisciplinary research and practice area focused on what we call bridging systems: systems which increase mutual understanding and trust across divides, creating space for productive conflict, deliberation, or cooperation. We give examples of bridging systems across three domains: recommender systems on social media, collective response systems, and human-facilitated group deliberation. We argue that these examples can be more meaningfully understood as processes for attention-allocation (as opposed to "content distribution" or "amplification") and develop a corresponding framework to explore similarities - and opportunities for bridging - across these seemingly disparate domains. We focus particularly on the potential of bridging-based ranking to bring the benefits of offline bridging into spaces which are already governed by algorithms. Throughout, we suggest research directions that could improve our capacity to incorporate bridging into a world increasingly mediated by algorithms and artificial intelligence. Comment: 40 pages, 11 figures. See https://bridging.systems for more about this work.

    Interdisciplinarity as a political instrument of governance and its consequences for doctoral training

    UK educational policies exploit interdisciplinarity as a marketing tool in a competitive educational world by building images of prosperous futures for society, the economy, and universities. Following this narrative, interdisciplinary science is promoted as superior to disciplinary forms of research and requires the training of future researchers accordingly, with interdisciplinary doctoral education becoming more established in universities. This emphasis on the growth of interdisciplinary science polarises scholars’ views on the role of academic research between the production of knowledge on the one hand and knowledge as an economic resource on the other. This research asks: what is the rationale behind the perceived value of interdisciplinary research and training, and how does it affect graduate students’ experiences of their PhD? Based on a practice theory perspective, chosen for its suitability in generating insights into how a university’s social life is organised, reproduced and transformed, the doctorate is conceptualised as sets of interconnected practices that are observable as they happen. The study therefore comprised two stages of data collection and analysis: the examination of documents to elucidate educational policy practices, and an educational ethnography of an interdisciplinary doctoral programme. This study found that interdisciplinary doctoral training is hindered by the lack of role models and positive social relationships, which are crucial to the way interdisciplinary students learn. Furthermore, it is argued that interdisciplinarity is sometimes applied to research as a label to fit with funders’ requirements. Specifically, in this case, medical optical imaging is best seen as an interdiscipline as it does not exhibit true interdisciplinary integration.
    Further insights show that while interdisciplinarity is promoted in policy around promises and expectations for a better future, it is in tension with how it is organisationally embedded in higher education. These insights form the basis for a list of practical recommendations for institutions. Overall, interdisciplinary doctoral training was observed to present students with difficulties and to leave policy concerns unaddressed.

    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years. Comment: Updated arXiv metadata to include two authors that were missing from the metadata. PDF has not been changed.