27 research outputs found

An Empirical Analysis of Predictive Machine Learning Algorithms on High-Dimensional Microarray Cancer Data

Author: Bill Jo A
Publication venue: RIT Scholar Works
Publication date: 01/07/2015
Field of study

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space p is much larger than the number of observations n. Seven gene-expression microarray cancer datasets, where the ratio κ = n/p is less than one, were chosen for evaluation. The statistical and computational challenges inherent in this type of high-dimensional low sample size (HDLSS) data were explored. The capability and performance of a diverse set of machine learning algorithms are presented and compared. The sparsity and collinearity of the data, in conjunction with the complexity of the algorithms studied, demanded rigorous and careful tuning of the hyperparameters and regularization parameters. This required investigating several extensions of cross-validation, with the aim of achieving the best predictive performance. For the techniques evaluated in this thesis, regularization or kernelization, and often both, produced lower classification error rates than randomized ensembles for all datasets used in this research. However, no one technique evaluated for classifying HDLSS microarray cancer data emerged as the universally best technique for predicting the generalization error. [1]

From the empirical analysis performed in this thesis, the following fundamentals emerged as instrumental in consistently producing lower error rates when estimating the generalization error in this HDLSS microarray cancer data:
• Thoroughly investigate and understand the data.
• Stratify during all sampling due to the uneven classes and extreme sparsity of this data.
• Perform 3 to 5 replicates of stratified cross-validation, implementing an adaptive K-fold, to determine the optimal tuning parameters.
• To estimate the generalization error in HDLSS data, replication is paramount. Replicate R=500 or R=1000 times with training and test sets of 2/3 and 1/3, respectively, to get the best generalization error estimate.
• Whenever possible, obtain an independent validation dataset.
• Seed the data for a fair and unbiased comparison among techniques.
• Define a methodology or standard set of process protocols to apply to machine learning research. This would prove very beneficial in ensuring reproducibility and would enable better comparisons among techniques.

[1] A predominant portion of this research was published in the Serdica Journal of Computing (Volume 8, Number 2, 2014) as proceedings from the 2014 Flint International Statistical Conference at Kettering University, Michigan, USA.
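
The replicated, stratified 2/3 and 1/3 hold-out procedure recommended in the abstract above can be illustrated with a minimal Python sketch. It is not the thesis's code: the nearest-centroid classifier is only a stand-in learner, the data are synthetic Gaussian draws, and every function name here (stratified_split, nearest_centroid_error, replicated_holdout_error, make_toy_data) is hypothetical. Only the ideas of stratifying each split, seeding the run, and averaging over R replicates come from the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Split indices into (train, test), sampling each class separately."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        indices = indices[:]
        rng.shuffle(indices)
        # Keep at least one test sample per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


def nearest_centroid_error(X, y, train, test):
    """Test error of a nearest-centroid classifier (stand-in learner)."""
    sums, counts = {}, defaultdict(int)
    for i in train:
        label = y[i]
        if label not in sums:
            sums[label] = [0.0] * len(X[i])
        sums[label] = [s + v for s, v in zip(sums[label], X[i])]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Assign x to the class whose centroid is closest (squared distance).
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    wrong = sum(predict(X[i]) != y[i] for i in test)
    return wrong / len(test)


def replicated_holdout_error(X, y, replicates=500, test_fraction=1 / 3, seed=0):
    """Average test error over `replicates` stratified random splits."""
    rng = random.Random(seed)  # seeded so techniques can be compared fairly
    errors = []
    for _ in range(replicates):
        train, test = stratified_split(y, test_fraction, rng)
        errors.append(nearest_centroid_error(X, y, train, test))
    return sum(errors) / len(errors)


def make_toy_data(n0, n1, p, shift, rng):
    """Synthetic two-class data with uneven classes; purely illustrative."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n0)]
    X += [[rng.gauss(shift, 1.0) for _ in range(p)] for _ in range(n1)]
    y = [0] * n0 + [1] * n1
    return X, y


if __name__ == "__main__":
    # n = 30 observations, p = 100 features, so n/p < 1 as in HDLSS data.
    X, y = make_toy_data(20, 10, 100, 1.0, random.Random(42))
    estimate = replicated_holdout_error(X, y, replicates=500, seed=7)
    print(f"Mean test error over 500 replicates: {estimate:.3f}")
```

Averaging over many random splits reduces the dependence of the estimate on any single partition, which matters most when n is small. Tuning hyperparameters by replicated stratified cross-validation, as the abstract also recommends, would sit inside each training set and is omitted here for brevity.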

RIT Scholar Works

Measuring academic stress ‘in the wild’ with wearable sensors: removal of noise from wearable sensor data using FIR filters

Author: Dobbins Chelsea
Fairclough Stephen
Harris Benjamin
Lisboa Paulo
Publication venue
Publication date: 01/01/2017
Field of study

University of Queensland eSpace

Semantic Approaches for Knowledge Discovery and Retrieval in Biomedicine

Author: Wilkowski Bartlomiej
Publication venue: Technical University of Denmark
Publication date: 01/01/2011
Field of study

Online Research Database In Technology

Handbook of Digital Face Manipulation and Detection

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/02/2022
Field of study

This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic and address readers wishing to gain a brief overview of the state of the art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area.

Directory of Open Access Books (DOAB)

Adopting Automated Bug Assignment in Practice: A Longitudinal Case Study at Ericsson

Author: Bartalos Béla
Borg Markus
Engström Emelie
Jonsson Leif
Szabó Attila
Publication venue
Publication date: 19/09/2022
Field of study

The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR's first bug assignment without human intervention happened in April 2019. Our study evaluates the adoption of TRR within its industrial context at Ericsson. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. We conduct an industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. TRR is now an incorporated part of the bug assignment process. Considering the abstraction levels of the telecommunications stack, high-level modules are more positive while low-level modules experienced some drawbacks. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed TRs are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate compared to similar endeavors reported from other companies. We primarily attribute the difference to the very large size of the organization and the complex products. Key facilitators in the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis. Comment: Under review

arXiv.org e-Print Archive

Handbook of Digital Face Manipulation and Detection

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

OAPEN Library

Predicting Academic Performance: A Systematic Literature Review

Author: Abdulwahhab R. S.
Aggarwal H.W.
Agudo-Peregrina Ángel F
al Rifaie Mohammad Majid
Allan Fransiskus
Almuniri Ismail
Ashenafi Michael Mogessie
Aziz Fatihah
Baker Ryan SJD
Barba-Guamán L.
Bayer Jaroslav
Blei David M.
Bydžovská Hana
Cengiz Nihat
Chan L.
Chaturvedi R.
Chen YY
Choi D.S.
Chunqiao Mi
Collura Michael A
Corsatea B. M.
Cribbs Jennifer D
Deliz José R
DeMonbrun R.M.
Dávila Saylisse
Edmundo
Evale Digna S
Fincher Sally
Gil-Herrera Eleazar
Gray Geraldine
Güner Necdet
Haig Thomas
Han M.
Ho Chia-Lin
Hornik Kurt
Howell Larry L
Hu Qian
Huang Shaobo
Huang Yun
Imbrie PK
Jiang Suhang
Jove E.
Kai Shimin
Kaur P.
Kentli Fulya Damla
Kuehn M.
Kumar A. Dinesh
Kumar Mukesh
Luo Jingyi
Luo Ling
Manoharan J James
Mashiloane Lebogang
Mayilvaganan M
Mhetre V.
Moradi F.
Morsy S.
Nedungadi Prema
Ninrutsirikun U.
Paimin Aini Nazura
Palmer Stuart
Pandey Mrinal
Papamitsiou Zacharoula
Pardos Zachary A
Patrick A Borrego
Pushpa S.K.
Raman D.R.
Ramanathan L.
Ramirez Nichole
Raura G.
Raymond Ting Siu-Man
Reid Kenneth
Reisberg Rachelle
Ren Zhiyun
Rhodes Nicholas
Ringenberg Jeff
Sadati S.
Sadler William E
Schar Mark
Sievert Carson
Sivasakthi M.
Sorby Sheryl
Strawderman Lesley
Sugiharti E.
Tieu Hoang
Tomkins Sabina
Tsalatsanis Athanasios
Uswatun Annisa
Verma S.K.
Vihavainen Arto
Vogt Christina
Wang Feng
Wolff Thomas F
Wu Xinhui
Wyk Barend Van
Yang T.-Y.
Yeh Her-Tyan
Zhu Ke
Publication venue: ACM
Publication date: 01/01/2018
Field of study

The ability to predict student performance in a course or program creates opportunities to improve educational outcomes. With effective performance prediction approaches, instructors can allocate resources and instruction more accurately. Research in this area seeks to identify features that can be used to make predictions, to identify algorithms that can improve predictions, and to quantify aspects of student performance. Moreover, research in predicting student performance seeks to determine interrelated features and to identify the underlying reasons why certain features work better than others. This working group report presents a systematic literature review of work in the area of predicting student performance. Our analysis shows a clearly increasing amount of research in this area, as well as an increasing variety of techniques used. At the same time, the review uncovered a number of issues with research quality that drive a need for the community to provide more detailed reporting of methods and results and to increase efforts to validate and replicate work. Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Monash University Research Portal

Compilation of thesis abstracts, September 2009

Author
Publication venue: Monterey, California; Naval Postgraduate School
Publication date: 01/09/2009
Field of study

NPS Class of September 2009. This quarter’s Compilation of Abstracts summarizes cutting-edge, security-related research conducted by NPS students and presented as theses, dissertations, and capstone reports. Each expands knowledge in its field. http://archive.org/details/compilationofsis109452751

Calhoun, Institutional Archive of the Naval Postgraduate School

PaLM: Scaling Language Modeling with Pathways

Author: Agrawal Shivani
Austin Jacob
Barham Paul
Barnes Parker
Bosma Maarten
Bradbury James
Catasta Michele
Child Rewon
Chowdhery Aakanksha
Chung Hyung Won
Dai Andrew M.
Dean Jeff
Dev Sunipa
Devlin Jacob
Diaz Mark
Dohan David
Du Nan
Duke Toju
Eck Douglas
Fedus Liam
Fiedel Noah
Firat Orhan
Garcia Xavier
Gehrmann Sebastian
Ghemawat Sanjay
Gur-Ari Guy
Hutchinson Ben
Ippolito Daphne
Isard Michael
Lee Katherine
Levskaya Anselm
Lewkowycz Aitor
Lim Hyeontaek
Luan David
Maynez Joshua
Meier-Hellstern Kathy
Michalewski Henryk
Mishra Gaurav
Misra Vedant
Moreira Erica
Narang Sharan
Omernick Mark
Pellat Marie
Petrov Slav
Pillai Thanumalayan Sankaranarayana
Polozov Oleksandr
Pope Reiner
Prabhakaran Vinodkumar
Rao Abhishek
Reif Emily
Roberts Adam
Robinson Kevin
Saeta Brennan
Schuh Parker
Sepassi Ryan
Shazeer Noam
Shi Kensen
Spiridonov Alexander
Sutton Charles
Tay Yi
Tsvyashchenko Sasha
Wang Xuezhi
Wei Jason
Yin Pengcheng
Zhou Denny
Zhou Zongwei
Zoph Barret
Publication venue
Publication date: 19/04/2022
Field of study

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

arXiv.org e-Print Archive

Entropy in Image Analysis II

Author
Publication venue: 'MDPI AG'
Publication date: 01/05/2021
Field of study

Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for those applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In the process of reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.

Directory of Open Access Books (DOAB)