2,551 research outputs found

    2023 Projects Day Booklet

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    The Future of Information Sciences : INFuture2015 : e-Institutions – Openness, Accessibility, and Preservation

    A mixed-method triangular approach to best practices in combating plagiarism and impersonation in online bachelor’s degree programs

    This study examines the phenomenon of plagiarism and impersonation in online course assignments. Technological advancements, coupled with lower costs and accessibility, have made online courses and programs a practical option for higher education students. Unfortunately, increasing online enrollment and advancing technology have given students more opportunity to commit plagiarism and impersonation in online course assignments, potentially compromising the academic integrity of online degree programs. This study examines the various practices and approaches of plagiarism and impersonation made available to students. Using a systematic review of the literature, the researcher compiles a list of 20 best practices in combating plagiarism and impersonation in online course assignments. A Delphi method approach is employed, drawing on the expertise of professors who teach in fully online bachelor’s degree programs. The 20 best practices established through the literature review will be narrowed down to ten via an ordinal ranking questionnaire in a two-round format. The questionnaire is distributed by e-mail; the researcher obtains the e-mail addresses by researching professors who teach in fully online bachelor’s degree programs. The first-round e-mail consists of the consent form, the original set of 20 best practices, and a link to the Qualtrics ranking survey. The second-round e-mail consists of the updated 15 best practices ranked from the initial e-mail and a link to the ranking survey. After the second round is completed, the ten best practices for reducing plagiarism and impersonation in online assignments will emerge. To further validate the ten best practices, the researcher interviews 10 professors who participated in the original Delphi study. The original consent form includes a link for participants to access if they choose to participate in the interview. After the professors’ intent to participate is verified, a consent form will be obtained. The interviews will be conducted and recorded virtually through Zoom. The recordings will be deleted once they are transcribed. This study potentially benefits all online degree programs by establishing the ten best practices for reducing plagiarism and impersonation in online assignments.

    Datasets for Large Language Models: A Comprehensive Survey

    This paper embarks on an exploration into Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as the foundational infrastructure, analogous to a root system that sustains and nurtures the development of LLMs. Consequently, examination of these datasets emerges as a critical topic in research. In order to address the current lack of a comprehensive overview and thorough analysis of LLM datasets, and to gain insights into their current status and future trends, this survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives: (1) Pre-training Corpora; (2) Instruction Fine-tuning Datasets; (3) Preference Datasets; (4) Evaluation Datasets; (5) Traditional Natural Language Processing (NLP) Datasets. The survey sheds light on the prevailing challenges and points out potential avenues for future investigation. Additionally, a comprehensive review of the existing available dataset resources is provided, including statistics from 444 datasets, covering 8 language categories and spanning 32 domains. Information from 20 dimensions is incorporated into the dataset statistics. The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets. We aim to present the entire landscape of LLM text datasets, serving as a comprehensive reference for researchers in this field and contributing to future studies. Related resources are available at: https://github.com/lmmlzn/Awesome-LLMs-Datasets. Comment: 181 pages, 21 figures.

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.

    A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness

    People increasingly use videos on the Web as a source for learning. To support this way of learning, researchers and developers are continuously developing tools, proposing guidelines, analyzing data, and conducting experiments. However, it is still not clear what characteristics a video should have to be an effective learning medium. In this paper, we present a comprehensive review of 257 articles on video-based learning for the period from 2016 to 2021. One of the aims of the review is to identify the video characteristics that have been explored by previous work. Based on our analysis, we suggest a taxonomy which organizes the video characteristics and contextual aspects into eight categories: (1) audio features, (2) visual features, (3) textual features, (4) instructor behavior, (5) learner activities, (6) interactive features (quizzes, etc.), (7) production style, and (8) instructional design. Also, we identify four representative research directions: (1) proposals of tools to support video-based learning, (2) studies with controlled experiments, (3) data analysis studies, and (4) proposals of design guidelines for learning videos. We find that the most explored characteristics are textual features, followed by visual features, learner activities, and interactive features. Text of transcripts, video frames, and images (figures and illustrations) are most frequently used by tools that support learning through videos. Learner activity is heavily explored through log files in data analysis studies, and interactive features have been frequently scrutinized in controlled experiments. We complement our review by contrasting research findings that investigate the impact of video characteristics on learning effectiveness, reporting on tasks and technologies used to develop tools that support learning, and summarizing trends in design guidelines for producing learning videos.