ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment
Recruiting the right people for a given position is critical for any company
or organization. Manually screening large numbers of resumes to select
suitable candidates is exhausting and time-consuming, yet there is no public
tool that can be directly used for automatic resume quality assessment (RQA).
This motivates us to develop a method for automatic RQA.
Since there is also no public dataset for model training and evaluation, we
build a dataset for RQA by collecting around 10K resumes, which are provided by
a private resume management company. By investigating the dataset, we identify
some factors or features that could be useful to discriminate good resumes from
bad ones, e.g., the consistency between different parts of a resume. Then a
neural-network model is designed to predict the quality of each resume, where
some text processing techniques are incorporated. To deal with the label
deficiency issue in the dataset, we propose several variants of the model by
either utilizing a pair/triplet-based loss or introducing a semi-supervised
learning technique to make use of the abundant unlabeled data.
Both the presented baseline model and its variants are general and easy to
implement. Various popular criteria including the receiver operating
characteristic (ROC) curve, F-measure and ranking-based average precision (AP)
are adopted for model evaluation. We compare the different variants with our
baseline model. Since there is no public algorithm for RQA, we further compare
our results with those obtained from a website that can score a resume.
Experimental results in terms of different criteria demonstrate the
effectiveness of the proposed method. We foresee that our approach could
transform future human-resources management.
Comment: ICD
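The triplet-based loss mentioned above can be illustrated with a minimal sketch. This is not the paper's model; the embeddings and margin below are made-up values, and the distance is plain Euclidean:

```python
def euclidean(u, v):
    # Euclidean distance between two embedding vectors (plain lists here).
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Pull the anchor toward the positive example (another good resume)
    # and push it away from the negative (a bad resume), up to the margin.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D resume embeddings, purely for illustration.
good_a = [0.9, 0.8]
good_b = [1.0, 0.7]
bad    = [0.1, 0.2]
loss = triplet_loss(good_a, good_b, bad)
```

In a trained model the loss drives good resumes to cluster together while bad ones are pushed at least `margin` away, which is how pairwise/triplet supervision can compensate for scarce per-resume labels.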
Deep-Learning for Classification of Colorectal Polyps on Whole-Slide Images
Histopathological characterization of colorectal polyps is an important
principle for determining the risk of colorectal cancer and future rates of
surveillance for patients. This characterization is time-intensive, requires
years of specialized training, and suffers from significant inter-observer and
intra-observer variability. In this work, we built an automatic
image-understanding method that can accurately classify different types of
colorectal polyps in whole-slide histology images to help pathologists with
histopathological characterization and diagnosis of colorectal polyps. The
proposed image-understanding method is based on deep-learning techniques, which
rely on numerous levels of abstraction for data representation and have shown
state-of-the-art results for various image analysis tasks. Our
image-understanding method covers all five polyp types (hyperplastic polyp,
sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and
tubulovillous/villous adenoma) that are included in the US multi-society task
force guidelines for colorectal cancer risk assessment and surveillance, and
encompasses the most common occurrences of colorectal polyps. Our evaluation on
239 independent test samples shows our proposed method can identify the types
of colorectal polyps in whole-slide images with a high efficacy (accuracy:
93.0%, precision: 89.7%, recall: 88.3%, F1 score: 88.8%). The method presented
in this paper can reduce the cognitive burden on pathologists and improve
their accuracy and efficiency in the histopathological characterization of
colorectal polyps, and in subsequent risk assessment and follow-up
recommendations.
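The reported precision, recall, and F1 score are standard classification metrics; for a multi-class problem such as the five polyp types they are typically computed per class and then averaged. A minimal sketch, with made-up confusion counts rather than the paper's results:

```python
def precision(tp, fp):
    # Of all slides predicted as this polyp type, the fraction that were correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all slides truly of this polyp type, the fraction that were found.
    return tp / (tp + fn)

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def macro_f1(per_class):
    # per_class: one (tp, fp, fn) tuple per polyp type; unweighted average.
    return sum(f1(*c) for c in per_class) / len(per_class)
```

Whether the paper macro- or micro-averages across the five classes is not stated in the abstract; the macro version above is one common choice.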
Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose.
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manually looking up ASA24 foods in NDSR is time-consuming but currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common between ASA24 and the NCC database. Database matching algorithms were developed to match an NCC food to each ASA24 food using either nutrients alone ("Nutrient-Only") or nutrients together with food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared to the manual curation. Among the machine learning models, the XGB-Regressor model performed best on held-out test data (R2 = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R2 = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.
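The "Nutrient + Text" matching idea can be sketched as a nearest-neighbor search that combines nutrient-vector distance with word overlap between food descriptions. The scoring function, weights, and food records below are all hypothetical, not the paper's algorithm:

```python
def jaccard(desc_a, desc_b):
    # Word-set overlap between two food descriptions.
    wa, wb = set(desc_a.lower().split()), set(desc_b.lower().split())
    return len(wa & wb) / len(wa | wb)

def euclid(u, v):
    # Distance between nutrient vectors (same shared nutrients, same order).
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def best_match(query_vec, query_desc, candidates, text_weight=1.0):
    # candidates: list of (name, nutrient_vector, description).
    # Lower score is better: nutrient distance minus a bonus for text overlap.
    def score(c):
        name, vec, desc = c
        return euclid(query_vec, vec) - text_weight * jaccard(query_desc, desc)
    return min(candidates, key=score)[0]

# Made-up nutrient vectors (e.g. protein, fat per 100 g) for illustration.
ncc_foods = [
    ("NCC whole milk", [3.3, 8.1], "milk whole fluid"),
    ("NCC soy milk",   [3.0, 4.0], "milk soy"),
]
match = best_match([3.2, 8.0], "milk whole", ncc_foods)
```

Once an NCC food is matched, its lactose value would be assigned to the ASA24 food; the Nutrient-Only variant corresponds to setting `text_weight` to zero.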
Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark
In many medical studies, an outcome of interest is not only whether an event occurred but when it occurred; an example of this is Alzheimer’s disease (AD). Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop AD is highly important for AD treatment. Previous studies suggest that not all MCI patients will convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimer’s patients have been generated. Building a computational model that can predict conversion from MCI to AD can be highly beneficial for early intervention and treatment planning. This work presents a big data model that combines machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework considers one of the most widely used screening assessments for detecting cognitive impairment, the Montreal Cognitive Assessment (MoCA). The MoCA data set was collected from different centers and integrated into our big-data storage framework using the Hadoop Distributed File System (HDFS); the data was then analyzed using the Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model.
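The core operation of a decision tree over a screening score is a threshold split. A minimal pure-Python stump below finds the score cut that best separates short from long times-to-conversion by minimizing within-group squared error; this is a stand-in illustration with invented MoCA scores and conversion times, not the paper's Spark pipeline:

```python
def best_split(scores, times):
    # One-node decision "stump": try each observed score as a threshold and
    # keep the split that minimizes the summed within-group squared error of
    # conversion time. Real trees apply this recursively to each side.
    def sse(ys):
        if not ys:
            return 0.0
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys)

    best = None
    for t in sorted(set(scores)):
        left = [y for x, y in zip(scores, times) if x <= t]
        right = [y for x, y in zip(scores, times) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    return best[1]

# Hypothetical data: lower MoCA scores converting to AD sooner (months).
moca = [20, 21, 22, 27, 28, 29]
months_to_ad = [12, 14, 13, 36, 40, 38]
threshold = best_split(moca, months_to_ad)
```

In the distributed setting described above, Spark evaluates candidate splits like these over HDFS-resident partitions in parallel rather than in a single loop.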
Predicting Outcomes in Investment Treaty Arbitration
Crafting appropriate dispute settlement processes is challenging for any conflict-management system, particularly for politically sensitive international economic law disputes. As the United States negotiates investment treaties with Asian and European countries, the terms of dispute settlement have become contentious. There is a vigorous debate about whether investment treaty arbitration (ITA) is an appropriate dispute settlement mechanism. While some sing the praises of ITA, others offer a spirited critique. Some critics claim that ITA is biased against states, while others suggest ITA is predictable but unfair due to factors like arbitrator identity or venue. Using data from 159 final cases derived from 272 publicly available ITA awards, this Article examines outcomes of ITA cases to explore those concerns. Key descriptive findings demonstrate that states reliably won a greater proportion of cases than investors; and for the subset of cases investors won, the mean award was US$45.6 million with mean investor success rate of 35%. State success rates were roughly similar to respondent-favorable or state-favorable results in whistleblowing, qui tam, and medical-malpractice litigation in U.S. courts. The Article then explores whether ITA outcomes varied depending upon investor identity, state identity, the presence of repeat-player counsel, arbitrator-related, or venue variables. Models using case-based variables always predicted outcomes whereas arbitrator-venue models did not. The results provide initial evidence that the most critical variables for predicting outcomes involved some form of investor identity and the experience of parties’ lawyers. For investor identity, the most robust predictor was whether investors were human beings, with cases brought by people exhibiting greater success than corporations; and when at least one named investor or corporate parent was ranked in the Financial Times 500, investors sometimes secured more favorable outcomes. 
Following Marc Galanter’s scholarship demonstrating that repeat-player lawyers are critical to litigation outcomes, attorney experience also affected ITA outcomes. Investors with experienced counsel were more likely to obtain a damage award against a state, whereas states retaining experienced counsel were only reliably associated with decreased levels of relative investor success. Although there was variation in outcomes, ultimately, the data did not support a conclusion that ITA was completely unpredictable; rather, the results called into question some critiques of ITA and did not prove that ITA is a wholly unacceptable form of dispute settlement. Instead, the results suggest the vital debate about ITA’s future would be well served by focusing on evidence-based insights and reliance on data rather than nonreplicable intuition.
Smart Asset Management for Electric Utilities: Big Data and Future
This paper discusses future challenges in terms of big data and new
technologies. Utilities have been collecting data in large amounts, but the
data are hardly utilized because of their sheer volume and the uncertainty
associated with them. Condition monitoring of assets collects large amounts
of data during daily operations. The question arises: "How do we extract
information from large chunks of data?" The concept of "rich data and poor
information" is being challenged by big data analytics with the advent of
machine-learning techniques. Along with technological advancements like the
Internet of Things (IoT), big data analytics will play an important role for
electric utilities. In this paper, these challenges are addressed with
pathways and guidelines to make current asset management practices smarter
for the future.
Comment: 13 pages, 3 figures, Proceedings of 12th World Congress on
Engineering Asset Management (WCEAM) 201