2023 Projects Day Booklet
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems
A mixed-method triangular approach to best practices in combating plagiarism and impersonation in online bachelor’s degree programs
This study examines the phenomenon of plagiarism and impersonation in online course assignments. Technological advancements, coupled with lower costs and greater accessibility, have made online courses and programs a practical option for higher-education students. Unfortunately, rising online enrollment and advancing technology have also increased students' opportunities to commit plagiarism and impersonation in online course assignments, potentially compromising the academic integrity of online degree programs. The study examines the various plagiarism and impersonation practices available to students. Through a systematic review of the literature, the researcher compiles a list of 20 best practices for combating plagiarism and impersonation in online course assignments. A Delphi method approach is then employed, drawing on the expertise of professors who teach in fully online bachelor's degree programs. The 20 best practices established through the literature review are narrowed to ten via an ordinal-ranking questionnaire administered in a two-round format. The questionnaires are distributed by e-mail; the researcher obtains the addresses by identifying professors who teach in fully online bachelor's degree programs. The first-round e-mail contains the consent form, the original set of 20 best practices, and a link to the Qualtrics ranking survey. The second-round e-mail contains the 15 best practices retained from the first round and a link to the ranking survey. After the second round, the ten best practices for reducing plagiarism and impersonation in online assignments are established. To further validate the ten best practices, the researcher interviews 10 professors who participated in the original Delphi study.
The original consent form includes a link for participants to access if they choose to take part in the interview. After verifying a professor's intent to participate, a consent form is obtained. The interviews are conducted and recorded virtually via Zoom, and the recordings are deleted once they are transcribed. By establishing these ten best practices for reducing plagiarism and impersonation in online assignments, the study can potentially benefit all online degree programs
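The two-round narrowing step above can be sketched as a simple rank aggregation: each expert submits an ordinal ranking, rank positions are summed per practice, and the lowest totals advance. This is a minimal illustrative sketch; the practice names and expert rankings are hypothetical, not the study's actual items.

```python
from collections import defaultdict

def aggregate_rankings(rankings, keep):
    """Sum each practice's rank positions across experts (lower = better)
    and keep the top `keep` practices for the next Delphi round."""
    totals = defaultdict(int)
    for expert_ranking in rankings:
        for position, practice in enumerate(expert_ranking, start=1):
            totals[practice] += position
    return sorted(totals, key=totals.get)[:keep]

# Three experts each rank four candidate practices, best to worst.
round_one = [
    ["proctoring", "plagiarism-checker", "oral-defense", "honor-code"],
    ["plagiarism-checker", "proctoring", "honor-code", "oral-defense"],
    ["proctoring", "oral-defense", "plagiarism-checker", "honor-code"],
]
shortlist = aggregate_rankings(round_one, keep=2)
```

In a real Delphi run, the `keep` parameter would step from 20 down to 15 and then to 10 across rounds, with the shortlist redistributed to the panel between rounds.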
Datasets for Large Language Models: A Comprehensive Survey
This paper embarks on an exploration into the Large Language Model (LLM)
datasets, which play a crucial role in the remarkable advancements of LLMs. The
datasets serve as the foundational infrastructure analogous to a root system
that sustains and nurtures the development of LLMs. Consequently, examination
of these datasets emerges as a critical topic in research. In order to address
the current lack of a comprehensive overview and thorough analysis of LLM
datasets, and to gain insights into their current status and future trends,
this survey consolidates and categorizes the fundamental aspects of LLM
datasets from five perspectives: (1) Pre-training Corpora; (2) Instruction
Fine-tuning Datasets; (3) Preference Datasets; (4) Evaluation Datasets; (5)
Traditional Natural Language Processing (NLP) Datasets. The survey sheds light
on the prevailing challenges and points out potential avenues for future
investigation. Additionally, a comprehensive review of the existing available
dataset resources is also provided, including statistics from 444 datasets,
covering 8 language categories and spanning 32 domains. Information from 20
dimensions is incorporated into the dataset statistics. The total data size
surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for
other datasets. We aim to present the entire landscape of LLM text datasets,
serving as a comprehensive reference for researchers in this field and
contributing to future studies. Related resources are available at:
https://github.com/lmmlzn/Awesome-LLMs-Datasets.
Comment: 181 pages, 21 figures
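The survey's five-way categorization can be pictured as a simple grouping over a dataset catalog. This is an illustrative sketch only; the dataset names below are hypothetical placeholders, not entries from the survey.

```python
# The five perspectives used by the survey to organize LLM datasets.
DATASET_CATEGORIES = (
    "pre-training corpus",
    "instruction fine-tuning",
    "preference",
    "evaluation",
    "traditional NLP",
)

def categorize(catalog):
    """Group (name, category) dataset records by survey category."""
    grouped = {c: [] for c in DATASET_CATEGORIES}
    for name, category in catalog:
        grouped[category].append(name)
    return grouped

# Hypothetical catalog entries for illustration.
catalog = [
    ("commoncrawl-subset", "pre-training corpus"),
    ("instruct-pairs-v1", "instruction fine-tuning"),
    ("pairwise-prefs", "preference"),
]
by_category = categorize(catalog)
```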
A comparison of statistical machine learning methods in heartbeat detection and classification
In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Toward that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Analysis of electrocardiogram (ECG) signals is therefore an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features, together with a few samples from the ECG signal, as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier for a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms
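The idea of classifying beats from interval- and amplitude-based features can be sketched with a minimal nearest-centroid classifier. This is a hedged illustration, not the paper's actual pipeline, and the feature values below are synthetic placeholders rather than MIT-BIH data (a premature ventricular beat typically shows a short pre-beat RR interval and a compensatory long post-beat interval).

```python
def centroid(rows):
    """Per-class mean feature vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def classify(beat, centroids):
    """Assign the beat to the class whose centroid is nearest (Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(beat, centroids[label]))

# Each feature vector: [pre-RR interval (s), post-RR interval (s), R amplitude].
# Synthetic training examples for two beat classes.
train = {
    "normal": [[0.80, 0.80, 1.0], [0.82, 0.79, 1.1], [0.78, 0.81, 0.9]],
    "VEB":    [[0.50, 1.10, 1.4], [0.52, 1.08, 1.5], [0.48, 1.12, 1.3]],
}
centroids = {label: centroid(rows) for label, rows in train.items()}
label = classify([0.51, 1.09, 1.45], centroids)
```

A real system would use many more features and a stronger classifier, but the structure is the same: extract a fixed-length feature vector per beat, then score it against learned class models.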
A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness
People increasingly use videos on the Web as a source for learning. To
support this way of learning, researchers and developers are continuously
developing tools, proposing guidelines, analyzing data, and conducting
experiments. However, it is still not clear what characteristics a video should
have to be an effective learning medium. In this paper, we present a
comprehensive review of 257 articles on video-based learning for the period
from 2016 to 2021. One of the aims of the review is to identify the video
characteristics that have been explored by previous work. Based on our
analysis, we suggest a taxonomy which organizes the video characteristics and
contextual aspects into eight categories: (1) audio features, (2) visual
features, (3) textual features, (4) instructor behavior, (5) learner
activities, (6) interactive features (quizzes, etc.), (7) production style, and
(8) instructional design. Also, we identify four representative research
directions: (1) proposals of tools to support video-based learning, (2) studies
with controlled experiments, (3) data analysis studies, and (4) proposals of
design guidelines for learning videos. We find that the most explored
characteristics are textual features followed by visual features, learner
activities, and interactive features. Text of transcripts, video frames, and
images (figures and illustrations) are most frequently used by tools that
support learning through videos. The learner activity is heavily explored
through log files in data analysis studies, and interactive features have been
frequently scrutinized in controlled experiments. We complement our review by
contrasting research findings that investigate the impact of video
characteristics on the learning effectiveness, report on tasks and technologies
used to develop tools that support learning, and summarize trends in design
guidelines for producing learning videos
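The review's finding about the most explored characteristics amounts to a tally over the eight-category taxonomy. A minimal sketch, assuming each reviewed article is coded with the taxonomy categories it explores (the article codings below are hypothetical):

```python
from collections import Counter

# The review's eight-category taxonomy of video characteristics.
TAXONOMY = [
    "audio features", "visual features", "textual features",
    "instructor behavior", "learner activities",
    "interactive features", "production style", "instructional design",
]

def most_explored(articles):
    """Rank taxonomy categories by how many articles explore them."""
    counts = Counter()
    for categories in articles:
        counts.update(set(categories) & set(TAXONOMY))
    return [category for category, _ in counts.most_common()]

# Hypothetical codings of three reviewed articles.
articles = [
    ["textual features", "visual features"],
    ["textual features", "learner activities"],
    ["interactive features", "textual features"],
]
ranking = most_explored(articles)
```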