535 research outputs found

    GAGA: Deciphering Age-path of Generalized Self-paced Regularizer

    Full text link
    Self-paced learning (SPL) is an important machine learning paradigm that mimics the cognitive process of humans and animals. The SPL regime involves a self-paced regularizer and a gradually increasing age parameter; the age parameter plays a key role in SPL, but determining where to optimally terminate its growth remains non-trivial. A natural idea is to compute the solution path w.r.t. the age parameter (i.e., the age-path). However, current age-path algorithms are either limited to the simplest regularizer or lack solid theoretical understanding as well as computational efficiency. To address this challenge, we propose a novel Generalized Age-path Algorithm (GAGA) for SPL with various self-paced regularizers, based on ordinary differential equations (ODEs) and sets control, which can learn the entire solution spectrum w.r.t. a range of age parameters. To the best of our knowledge, GAGA is the first exact path-following algorithm to tackle the age-path for a general self-paced regularizer. Finally, the algorithmic steps for classic SVM and Lasso are described in detail. We demonstrate the performance of GAGA on real-world datasets and find considerable speedup over competing baselines.
    Comment: 33 pages. Published as a conference paper at NeurIPS 202
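The self-paced mechanism described above can be illustrated with the simplest ("hard") regularizer, where a sample participates in training only while its loss is below the age parameter. The following is a toy sketch under hypothetical data and an arbitrary age schedule; it is not GAGA, which instead follows the exact solution path in the age parameter via ODEs.

```python
import numpy as np

# Toy self-paced learning loop with the simplest ("hard") regularizer:
# a sample is kept (v_i = 1) only while its loss is below the age
# parameter lam, which grows each round so harder samples enter later.
# Data, the lam schedule, and the regression task are hypothetical.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)
y[:5] += 5.0                                 # a few gross outliers

w = np.zeros(3)
for lam in [0.5, 1.0, 2.0, 4.0]:             # gradually increasing age
    losses = (X @ w - y) ** 2                # per-sample squared loss
    v = (losses < lam).astype(float)         # hard self-paced weights
    if v.sum() < 3:
        continue
    Xv = X * v[:, None]                      # weighted least squares on
    w = np.linalg.lstsq(Xv.T @ X, Xv.T @ y, rcond=None)[0]  # "easy" set
```

Because the outliers never fall below the age threshold, the final fit recovers the clean-data coefficients while the contaminated samples stay excluded.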

    MetaAge: Meta-Learning Personalized Age Estimators

    Full text link
    Different people age in different ways. Learning a personalized age estimator for each person is a promising direction for age estimation, given that it better models the personalization of aging processes. However, most existing personalized methods suffer from the lack of large-scale datasets because of their demanding requirements: identity labels and enough samples per person to form a long-term aging pattern. In this paper, we aim to learn personalized age estimators without these requirements and propose a meta-learning method named MetaAge for age estimation. Unlike most existing personalized methods, which learn the parameters of a personalized estimator for each person in the training set, our method learns the mapping from identity information to age estimator parameters. Specifically, we introduce a personalized estimator meta-learner, which takes identity features as input and outputs the parameters of customized estimators. In this way, our method learns the meta knowledge without the above requirements and seamlessly transfers the learned meta knowledge to the test set, which enables us to leverage existing large-scale age datasets without any additional annotations. Extensive experimental results on three benchmark datasets, including the MORPH II, ChaLearn LAP 2015, and ChaLearn LAP 2016 databases, demonstrate that MetaAge significantly boosts the performance of existing personalized methods and outperforms state-of-the-art approaches.
    Comment: Accepted by IEEE Transactions on Image Processing (TIP)
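The mapping from identity features to estimator parameters described above is essentially a hypernetwork. A minimal sketch follows, in which the feature dimensions, the linear forms, and the untrained random meta-learner are all illustrative assumptions (the paper's meta-learner is a learned deep network):

```python
import numpy as np

# Toy sketch of the MetaAge idea: a meta-learner maps identity features
# to the parameters of a personalized age estimator, so an unseen
# identity gets its own estimator without per-person training samples.

rng = np.random.default_rng(1)
d_id, d_face = 8, 16          # hypothetical identity/face feature sizes

# Stand-in for a trained meta-learner: identity features -> weights.
M = rng.normal(scale=0.1, size=(d_face, d_id))

def personalized_estimator(id_feat):
    """Generate the weight vector of a linear age estimator."""
    return M @ id_feat

def estimate_age(id_feat, face_feat):
    # Apply the generated per-identity estimator to the face features.
    return float(personalized_estimator(id_feat) @ face_feat)

age = estimate_age(rng.normal(size=d_id), rng.normal(size=d_face))
```

The key property is that no estimator is stored per person: parameters are generated on the fly from identity features, which is what lets the method transfer to identities unseen at training time.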

    Searching for Needles in the Cosmic Haystack

    Get PDF
    Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must utilize parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare, e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification with no consensus on a best practice. This dissertation is focused on identifying and classifying radio pulsar candidates from single pulse searches. First, to identify and classify Dispersed Pulse Groups (DPGs), we developed a supervised machine learning approach that consists of RAPID (a novel peak identification algorithm), feature extraction, and supervised machine learning classification. We tested six algorithms for classification with four imbalance treatments. Results showed that classifiers with imbalance treatments had higher recall values. Overall, classifiers using multiclass RandomForests combined with the Synthetic Minority Oversampling Technique (SMOTE) were the most efficient; they identified additional known pulsars not in the benchmark, with fewer false positives than other classifiers. Second, we developed a parallel single pulse identification method, D-RAPID, and introduced a novel automated multiclass labeling (ALM) technique that we combined with feature selection to improve execution performance. D-RAPID improved execution performance over RAPID by a factor of 5. We also showed that the combination of ALM and feature selection sped up the execution performance of RandomForest by 54% on average with less than a 2% average reduction in classification performance.
Finally, we proposed CoDRIFt, a novel classification algorithm that is distributed for scalability and employs semi-supervised learning to leverage unlabeled data to inform classification. We evaluated and compared CoDRIFt to eleven other classifiers. The results showed that CoDRIFt excelled at classifying candidates in imbalanced benchmarks with a majority of non-pulsar signals (>95%). Furthermore, CoDRIFt models created with very limited sets of labeled data (as few as 22 labeled minority class instances) were able to achieve high recall (mean = 0.98). In comparison to the other algorithms trained on similar sets, CoDRIFt outperformed them all, with recall 2.9% higher than the next best classifier and a 35% average improvement over all eleven classifiers. CoDRIFt is customizable for other problem domains with very large, imbalanced data sets, such as fraud detection and cyber attack detection.
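The SMOTE-style imbalance treatment mentioned above can be sketched as follows. This is a minimal interpolation-based oversampler on synthetic 2-D data, not the dissertation's actual pipeline; the class ratios, data, and function name are hypothetical.

```python
import numpy as np

# Minimal SMOTE-style oversampler: synthesize minority-class points by
# interpolating between a chosen minority sample and one of its k
# nearest minority neighbors, then train any classifier on the
# rebalanced set.

def smote_like(X_min, n_new, k=3, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest minority neighbors
        j = rng.choice(nn)
        gap = rng.random()                 # random point on the segment
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(995, 2))   # ~99.5% "non-pulsar"
X_min = rng.normal(3.0, 0.5, size=(5, 2))     # rare "pulsar" class
X_new = smote_like(X_min, n_new=990, rng=rng)
X_bal = np.vstack([X_maj, X_min, X_new])      # rebalanced training set
y_bal = np.r_[np.zeros(995), np.ones(995)]
```

Because synthetic points are convex combinations of real minority samples, they stay inside the minority region rather than duplicating existing points, which is what distinguishes SMOTE from plain random oversampling.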

    Computing Competencies for Undergraduate Data Science Curricula: ACM Data Science Task Force

    Get PDF
    At the August 2017 ACM Education Council meeting, a task force was formed to explore a process to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force would seek to define what the computing/computational contributions are to this new field, and provide guidance on computing-specific competencies in data science for departments offering such programs of study at the undergraduate level. There are many stakeholders in the discussion of data science – these include colleges and universities that (hope to) offer data science programs, employers who hope to hire a workforce with knowledge and experience in data science, as well as individuals and professional societies representing the fields of computing, statistics, machine learning, computational biology, computational social sciences, digital humanities, and others. There is a shared desire to form a broad interdisciplinary definition of data science and to develop curriculum guidance for degree programs in data science. This volume builds upon the important work of other groups who have published guidelines for data science education. There is a need to acknowledge the definition and description of the individual contributions to this interdisciplinary field. For instance, those interested in the business context for these concepts generally use the term “analytics”; in some cases, the abbreviation DSA appears, meaning Data Science and Analytics. This volume is the third draft articulation of computing-focused competencies for data science. It recognizes the inherent interdisciplinarity of data science and situates computing-specific competencies within the broader interdisciplinary space.

    A Mixed-Methods Study of Students’ Success and Persistence in Biology

    Get PDF
    Undergraduate success and persistence in Science, Technology, Engineering, and Mathematics (STEM) fields is of critical importance to the United States (U.S.) maintaining its position as the world leader in scientific innovation. While the total number of undergraduate degrees awarded annually has nearly tripled over the past 40 years, the same cannot be said for the proportion of degrees in STEM fields. The U.S. share of the world’s STEM graduates is sharply declining; on average, less than 40% of incoming college freshmen elect to pursue a degree in a STEM field each year, with more than half of those individuals declaring a major in the biological sciences or a closely related area (e.g., premedicine, pre-health, or nursing). Research indicates a need to promote success and persistence among undergraduates in STEM fields. In an effort to address this call, much research has employed a variety of empirically validated instructional strategies designed to promote undergraduate success and persistence in the biological sciences. Although of integral importance, such studies have often not extensively explored the impact of motivational and attitudinal factors in tandem with demographic and educational characteristics, especially in the field of biology. The current study used quantitative methods with a quasi-experimental design to examine the impact of motivational and attitudinal factors, along with demographic and secondary characteristics, on students’ success and persistence in biology among students enrolled in two introductory biology courses (Principles of Biology and Organismal Biology) at a mid-size research and teaching university. Additionally, the study examined the extent to which such factors differentially predict success and persistence among underrepresented minority and first-generation students within the aforementioned cohort.
A second component of the study used qualitative inquiry and thematic data analysis techniques to explore the persistence of both average and below-average performing students in biology by examining their experiences in the biology program. Analyses examining student success found that motivational factors were equally important predictors of success among all student types. The top demographic predictors of success were index score (a combination of high school GPA, SAT, and ACT scores), minority status, and first-generation status, uniquely explaining 4.7%, 3.0%, and 1% of the variance in students’ course grade, respectively. The attitudinal predictors of students’ success were students’ ability to apply knowledge to solve biology-specific tasks and enjoyment of the biology major, each explaining 1.0% of the variance in students’ final course grade. Among the underrepresented minority students, dual enrollment in an active learning-based supplemental instruction course explained 1.1% of the variance. Analyses examining predictors of persistence in biology found that self-efficacy and grade motivation were the important motivational factors predicting students’ persistence. The strategies employed by students to solve biology problems were the only attitudinal factor important for persistence in biology. Students’ final percent course grade in the introductory biology courses also emerged as a significant predictor of persistence in biology. Interestingly, first-generation students were more likely to persist in biology than continuing-generation students, while minority students were less likely to persist than non-minority students. The qualitative aspect of this study involved 12 participants; among these, 10 had persisted in biology while 2 had switched from biology to other majors.
The four most important factors highlighted by the participants were: challenges associated with transitioning from high school to college, instructional aspects of the introductory biology courses, effects of participants’ social interactions, and aspects of competition and weeding out in the introductory biology courses. The results and findings from this study suggest several things. First, developing and nurturing proper motivation and positive attitudes in post-secondary classrooms, while factoring in the motivational and attitudinal factors that are important for underrepresented minority and first-generation students’ success and persistence, may be a step forward in addressing the critical problem of success in STEM fields in general. Second, meaningful engagement of students in solving biology-related problems appears to be an essential task for educators leading first-semester biology experiences. Third, approaches geared toward increasing student success in introductory courses appear essential to students’ persistence in specific majors. Finally, the findings suggest that barriers to students’ success and persistence in biology may be reduced by sufficiently aligning high school preparation with college-level expectations of what high school graduates entering college need to know and be able to do.

    Machine Learning in Sensors and Imaging

    Get PDF
    Machine learning is extending its applications in various fields, such as image processing, the Internet of Things, user interfaces, big data, manufacturing, management, etc. As data are required to build machine learning networks, sensors are among the most important enabling technologies. In addition, machine learning networks can contribute to improving sensor performance and to creating new sensor applications. This Special Issue addresses all types of machine learning applications related to sensors and imaging. It covers computer vision-based control, activity recognition, fuzzy label classification, failure classification, motor temperature estimation, camera calibration for intelligent vehicles, error detection, color prior models, compressive sensing, wildfire risk assessment, shelf auditing, forest growing stem volume estimation, road management, image denoising, and touchscreens.

    Methods to Improve the Prediction Accuracy and Performance of Ensemble Models

    Get PDF
    The application of ensemble predictive models has been an important research area in medical diagnostics, engineering diagnostics, and related smart devices and technologies. Most current predictive models are complex and not reliable, despite numerous past efforts by the research community. The performance accuracy of predictive models has not always been realised, due to many factors such as complexity and class imbalance. Therefore, there is a need to improve the predictive accuracy of current ensemble models and to enhance their applications and reliability as non-invasive predictive tools. The research work presented in this thesis adopted a pragmatic, phased approach to propose and develop new ensemble models using multiple methods, validating the methods through rigorous testing and implementation in different phases. The first phase comprises empirical investigations on standalone and ensemble algorithms, carried out to ascertain the performance effects of classifier complexity and simplicity. The second phase comprises an improved ensemble model based on the integration of the Extended Kalman Filter (EKF), Radial Basis Function Network (RBFN), and AdaBoost algorithms. The third phase comprises an extended model based on early-stopping concepts, the AdaBoost algorithm, and the statistical performance of the training samples, designed to minimize overfitting of the proposed model. The fourth phase comprises an enhanced analytical multivariate logistic regression predictive model developed to minimize complexity and improve the prediction accuracy of the logistic regression model. To facilitate the practical application of the proposed models, an ensemble non-invasive analytical tool is proposed and developed. The tool bridges the gap between theoretical concepts and their practical application in predicting breast cancer survivability.
The empirical findings suggested that: (1) increasing the complexity and topology of algorithms does not necessarily lead to better algorithmic performance; (2) boosting by resampling performs slightly better than boosting by reweighting; (3) the proposed ensemble EKF-RBFN-AdaBoost model achieved better prediction accuracy than several established ensemble models; (4) the proposed early-stopped model converges faster and minimizes overfitting better compared with other models; (5) the proposed multivariate logistic regression concept minimizes model complexity; and (6) the proposed analytical non-invasive tool performed comparatively better than many of the benchmark analytical tools used in predicting breast cancer and diabetic ailments. The research contributions to ensemble practice are: (1) the integration and development of the EKF, RBFN, and AdaBoost algorithms as an ensemble model; (2) the development and validation of an ensemble model based on early-stopping concepts, AdaBoost, and statistical properties of the training samples; (3) the development and validation of a predictive logistic regression model for breast cancer; and (4) the development and validation of non-invasive breast cancer analytic tools based on the predictive models proposed and developed in this thesis. To validate the prediction accuracy of the ensemble models, the proposed models were applied to modelling breast cancer survivability and diabetes diagnostic tasks. In comparison with other established models, the simulation results showed improved predictive accuracy. The thesis outlines the benefits of the proposed models and proposes new directions for future work that could further extend and improve the models discussed here.
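The early-stopping idea from the third phase (halt boosting once held-out error stops improving, and keep the best prefix of the ensemble) can be sketched with plain AdaBoost over decision stumps. This is an illustrative simplification, not the thesis's EKF-RBFN-AdaBoost model; the demo data and all parameter choices are hypothetical.

```python
import numpy as np

def stump_predict(X, feat, thr, sign):
    # A decision stump: predict +/-1 by thresholding one feature.
    return sign * np.where(X[:, feat] > thr, 1.0, -1.0)

def fit_stump(X, y, w):
    # Exhaustively pick the stump with the lowest weighted error.
    best = (np.inf, 0, 0.0, 1)
    for feat in range(X.shape[1]):
        for thr in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = w[stump_predict(X, feat, thr, sign) != y].sum()
                if err < best[0]:
                    best = (err, feat, thr, sign)
    return best

def adaboost_early_stop(X, y, Xv, yv, rounds=50, patience=5):
    # AdaBoost that stops once validation error has not improved for
    # `patience` rounds, returning the best prefix of the ensemble.
    w = np.full(len(y), 1.0 / len(y))
    Fv = np.zeros(len(yv))
    best_err, best_len, since_best, model = np.inf, 0, 0, []
    for _ in range(rounds):
        err, feat, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)
        w *= np.exp(-alpha * y * stump_predict(X, feat, thr, sign))
        w /= w.sum()
        model.append((alpha, feat, thr, sign))
        Fv += alpha * stump_predict(Xv, feat, thr, sign)
        val_err = float(np.mean(np.sign(Fv) != yv))
        if val_err < best_err:
            best_err, best_len, since_best = val_err, len(model), 0
        else:
            since_best += 1
            if since_best >= patience:  # early stop to curb overfitting
                break
    return model[:best_len], best_err

# Hypothetical demo: the label depends on the sign of the first feature.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
model, val_err = adaboost_early_stop(X[:200], y[:200], X[200:], y[200:])
```

Truncating back to the best-scoring prefix, rather than keeping every round up to the stop, is what keeps the returned ensemble from including the rounds that only fit noise.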

    Hope College Abstracts: 20th Annual Celebration of Undergraduate Research and Creative Activity

    Get PDF
    The 20th Annual Celebration of Undergraduate Research and Creative Activity was held online on April 30, 2021. The event featured student-faculty collaborative research projects presented remotely due to the COVID-19 pandemic. This program is a record of those projects from the 2020-2021 academic year.

    Event-Based Measurement and Mean Annual Flux Assessment of Suspended Sediment in Meso Scale Catchments

    Get PDF
    Human interventions have fundamentally altered the sediment budget of fluvial systems, with negative ecological and economic consequences. Agriculture and deforestation have more than doubled the extent of natural erosion, while large dams trap at least 25% of globally transported sediment. Sediment is a transport vector for pollutants and nutrients. Large fractions of the mean annual load are transported during short and often insufficiently sampled periods of high discharge. Accurate estimates of the mean annual flux of suspended sediment (total suspended solids) and particle-bound substances are needed for meso-scale catchments (50–250 km²), both for management planning and for plausibility checks of area-wide substance input models. To capture flood-event loads of suspended solids and phosphorus, large-volume samplers (LVS, sample volume = 1 m³) were installed for flow-proportional sampling in three meso-scale catchments with differing climatic, pedological, and land-use conditions. A novel approach for constructing concentration rating curves was developed from the highest discharge measured during the sampling period and the mean event concentration. This allowed loads to be calculated and compared with calculations based on grab samples. The LVS proved robust to changing discharge conditions at two sites, but failed to capture flood events at the third site due to technical deficiencies. At this site, the sediment body of the downstream reservoir was used as a reference. The calculations based on grab samples are systematically too low; the error is governed by the specific sediment load, the degree of particle association, and the flashiness of discharge.
An existing flow flashiness index (FFI) was adapted into a load flashiness index (LFI) to combine, in a single metric, the brevity of high-discharge periods and their importance for the total load. Sampling with the LVS and application of the rating curves are recommended for load-flashy catchments and parameters, for limiting analysis costs, and for cases in which in-situ measured proxy parameters are not adequate. Load measurements and load modelling revealed deficits in the substance input model, largely attributable to the erosion input pathway and its static treatment of catchment connectivity. Contrary to current common practice, sampling strategies must be adapted to load flashiness in order to verify input models meaningfully. In addition, a new low-maintenance method for autonomous, contactless reflectance measurement was developed and compared with turbidity measurements and rating-curve estimates. During several measurement campaigns on a reservoir, reflectance measurements achieved a high goodness of fit for total phosphorus and suspended solids in a partial least squares regression. Continuous measurements at the river were limited by light conditions that vary over the day and the year, and they could not improve load estimation in eleven validation measurements. Turbidity measurements and rating curves based on grab samples likewise failed to reproduce event loads adequately; LVS-based rating curves achieved the highest accuracy. Reflectance measurements offer advantages in their low maintenance requirements and the simultaneous determination of several parameters. Their potential should be tested further with artificial lighting and at larger water bodies with lower flashiness and optically deep conditions.
Optical measurement methods can provide important information, but they require special expertise and calibration measurements spanning the full measuring range. Explicit sampling of flood events can provide these data and at the same time improve load estimates, which is why it must always be integrated into sampling programs. While the LFI can be used to prioritize sites, LVS sampling and LVS-derived loads provide a meaningful yet practicable observation scale, which is needed to reproduce particulate load processes in meso-scale catchments. The broad implementation of this sampling technique should therefore be examined.
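The rating-curve approach the abstract describes can be illustrated with the common power-law form, fitted in log space and then integrated over a discharge series to estimate the load. This is a toy stand-in: the thesis instead builds its key curves from each event's peak discharge and mean event concentration, sampled with the LVS, and the data below are synthetic.

```python
import numpy as np

# Illustrative sediment rating curve: fit a power law C = a * Q**b in
# log space from paired discharge/concentration samples, then integrate
# C(Q) * Q over a discharge time series to estimate the load.

rng = np.random.default_rng(2)
Q_obs = rng.uniform(0.5, 20.0, size=30)                        # m^3/s
C_obs = 5.0 * Q_obs ** 1.3 * np.exp(rng.normal(0.0, 0.1, 30))  # mg/L

b, log_a = np.polyfit(np.log(Q_obs), np.log(C_obs), 1)
a = float(np.exp(log_a))

def load_estimate(Q_series, dt=3600.0):
    """Estimated load in kg: C [mg/L] * Q [m^3/s] * dt [s] * 1e-3."""
    return float(np.sum(a * Q_series ** b * Q_series * dt) * 1e-3)

# Hypothetical hourly discharge series for one year.
annual_kg = load_estimate(rng.uniform(0.5, 20.0, size=24 * 365))
```

Because concentration typically rises faster than linearly with discharge (b > 1 here), the short high-flow periods dominate the annual load, which is exactly why grab-sample schedules that miss floods underestimate it systematically.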