ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment
Recruiting the right people for a given position is critical for any company
or organization. Manually screening large numbers of resumes to select
suitable candidates is exhausting and time-consuming, yet there is no public
tool that can be directly used for automatic resume quality assessment (RQA).
This motivates us to develop a method for automatic RQA.
Since there is also no public dataset for model training and evaluation, we
build a dataset for RQA by collecting around 10K resumes, which are provided by
a private resume management company. By investigating the dataset, we identify
some factors or features that could be useful to discriminate good resumes from
bad ones, e.g., the consistency between different parts of a resume. Then a
neural-network model is designed to predict the quality of each resume, where
some text processing techniques are incorporated. To deal with the label
deficiency issue in the dataset, we propose several variants of the model by
either utilizing a pair/triplet-based loss or introducing a semi-supervised
learning technique to make use of the abundant unlabeled data.
Both the presented baseline model and its variants are general and easy to
implement. Various popular criteria including the receiver operating
characteristic (ROC) curve, F-measure and ranking-based average precision (AP)
are adopted for model evaluation. We compare the different variants with our
baseline model. Since there is no public algorithm for RQA, we further compare
our results with those obtained from a website that can score a resume.
Experimental results in terms of different criteria demonstrate the
effectiveness of the proposed method. We foresee that our approach could
transform future human-resources management.
Comment: ICD
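The triplet-based loss mentioned above can be illustrated with a minimal sketch. This is not the paper's model; the embeddings and margin below are made-up values, and the distance is plain Euclidean:

```python
def euclidean(u, v):
    # Euclidean distance between two embedding vectors (plain lists here).
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Pull the anchor toward the positive example (another good resume)
    # and push it away from the negative (a bad resume), up to the margin.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D resume embeddings, purely for illustration.
good_a = [0.9, 0.8]
good_b = [1.0, 0.7]
bad    = [0.1, 0.2]
loss = triplet_loss(good_a, good_b, bad)
```

In a trained model the loss drives good resumes to cluster together while bad ones are pushed at least `margin` away, which is how pairwise/triplet supervision can compensate for scarce per-resume labels.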
Deep-Learning for Classification of Colorectal Polyps on Whole-Slide Images
Histopathological characterization of colorectal polyps is an important
principle for determining the risk of colorectal cancer and future rates of
surveillance for patients. This characterization is time-intensive, requires
years of specialized training, and suffers from significant inter-observer and
intra-observer variability. In this work, we built an automatic
image-understanding method that can accurately classify different types of
colorectal polyps in whole-slide histology images to help pathologists with
histopathological characterization and diagnosis of colorectal polyps. The
proposed image-understanding method is based on deep-learning techniques, which
rely on numerous levels of abstraction for data representation and have shown
state-of-the-art results for various image analysis tasks. Our
image-understanding method covers all five polyp types (hyperplastic polyp,
sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and
tubulovillous/villous adenoma) that are included in the US multi-society task
force guidelines for colorectal cancer risk assessment and surveillance, and
encompasses the most common occurrences of colorectal polyps. Our evaluation on
239 independent test samples shows our proposed method can identify the types
of colorectal polyps in whole-slide images with a high efficacy (accuracy:
93.0%, precision: 89.7%, recall: 88.3%, F1 score: 88.8%). The method presented
in this paper can reduce the cognitive burden on pathologists and improve
their accuracy and efficiency in the histopathological characterization of
colorectal polyps, and in subsequent risk assessment and follow-up
recommendations.
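The reported precision, recall, and F1 score are standard classification metrics; for a multi-class problem such as the five polyp types they are typically computed per class and then averaged. A minimal sketch, with made-up confusion counts rather than the paper's results:

```python
def precision(tp, fp):
    # Of all slides predicted as this polyp type, the fraction that were correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all slides truly of this polyp type, the fraction that were found.
    return tp / (tp + fn)

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def macro_f1(per_class):
    # per_class: one (tp, fp, fn) tuple per polyp type; unweighted average.
    return sum(f1(*c) for c in per_class) / len(per_class)
```

Whether the paper macro- or micro-averages across the five classes is not stated in the abstract; the macro version above is one common choice.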
Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose.
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manually looking up ASA24 foods in NDSR is time-consuming but currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common between ASA24 and the NCC database. Database matching algorithms were developed to match an NCC food to each ASA24 food using either nutrients alone ("Nutrient-Only") or nutrients together with food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared to the manual curation. Among the machine learning models, the XGB-Regressor model performed best on held-out test data (R2 = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R2 = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.
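The "Nutrient + Text" matching idea can be sketched as a nearest-neighbor search that combines nutrient-vector distance with word overlap between food descriptions. The scoring function, weights, and food records below are all hypothetical, not the paper's algorithm:

```python
def jaccard(desc_a, desc_b):
    # Word-set overlap between two food descriptions.
    wa, wb = set(desc_a.lower().split()), set(desc_b.lower().split())
    return len(wa & wb) / len(wa | wb)

def euclid(u, v):
    # Distance between nutrient vectors (same shared nutrients, same order).
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def best_match(query_vec, query_desc, candidates, text_weight=1.0):
    # candidates: list of (name, nutrient_vector, description).
    # Lower score is better: nutrient distance minus a bonus for text overlap.
    def score(c):
        name, vec, desc = c
        return euclid(query_vec, vec) - text_weight * jaccard(query_desc, desc)
    return min(candidates, key=score)[0]

# Made-up nutrient vectors (e.g. protein, fat per 100 g) for illustration.
ncc_foods = [
    ("NCC whole milk", [3.3, 8.1], "milk whole fluid"),
    ("NCC soy milk",   [3.0, 4.0], "milk soy"),
]
match = best_match([3.2, 8.0], "milk whole", ncc_foods)
```

Once an NCC food is matched, its lactose value would be assigned to the ASA24 food; the Nutrient-Only variant corresponds to setting `text_weight` to zero.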
Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark
In many medical studies, an outcome of interest is not only whether an event occurred but when it occurred; an example of this is Alzheimer’s disease (AD). Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop AD is highly important for AD treatment. Previous studies suggest that not all MCI patients will convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimer’s patients have been generated. Building a computational model that can predict conversion from MCI to AD can be highly beneficial for early intervention and treatment planning. This work presents a big data model that combines machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework considers one of the most widely used screening assessments for detecting cognitive impairment, the Montreal Cognitive Assessment (MoCA). The MoCA data set was collected from different centers and integrated into our big-data storage framework using the Hadoop Distributed File System (HDFS); the data was then analyzed using the Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model.
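The core operation of a decision tree over a screening score is a threshold split. A minimal pure-Python stump below finds the score cut that best separates short from long times-to-conversion by minimizing within-group squared error; this is a stand-in illustration with invented MoCA scores and conversion times, not the paper's Spark pipeline:

```python
def best_split(scores, times):
    # One-node decision "stump": try each observed score as a threshold and
    # keep the split that minimizes the summed within-group squared error of
    # conversion time. Real trees apply this recursively to each side.
    def sse(ys):
        if not ys:
            return 0.0
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys)

    best = None
    for t in sorted(set(scores)):
        left = [y for x, y in zip(scores, times) if x <= t]
        right = [y for x, y in zip(scores, times) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    return best[1]

# Hypothetical data: lower MoCA scores converting to AD sooner (months).
moca = [20, 21, 22, 27, 28, 29]
months_to_ad = [12, 14, 13, 36, 40, 38]
threshold = best_split(moca, months_to_ad)
```

In the distributed setting described above, Spark evaluates candidate splits like these over HDFS-resident partitions in parallel rather than in a single loop.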
Predicting Outcomes in Investment Treaty Arbitration
Crafting appropriate dispute settlement processes is challenging for any conflict-management system, particularly for politically sensitive international economic law disputes. As the United States negotiates investment treaties with Asian and European countries, the terms of dispute settlement have become contentious. There is a vigorous debate about whether investment treaty arbitration (ITA) is an appropriate dispute settlement mechanism. While some sing the praises of ITA, others offer a spirited critique. Some critics claim that ITA is biased against states, while others suggest ITA is predictable but unfair due to factors like arbitrator identity or venue. Using data from 159 final cases derived from 272 publicly available ITA awards, this Article examines outcomes of ITA cases to explore those concerns. Key descriptive findings demonstrate that states reliably won a greater proportion of cases than investors; and for the subset of cases investors won, the mean award was US$45.6 million with mean investor success rate of 35%. State success rates were roughly similar to respondent-favorable or state-favorable results in whistleblowing, qui tam, and medical-malpractice litigation in U.S. courts. The Article then explores whether ITA outcomes varied depending upon investor identity, state identity, the presence of repeat-player counsel, arbitrator-related, or venue variables. Models using case-based variables always predicted outcomes whereas arbitrator-venue models did not. The results provide initial evidence that the most critical variables for predicting outcomes involved some form of investor identity and the experience of parties’ lawyers. For investor identity, the most robust predictor was whether investors were human beings, with cases brought by people exhibiting greater success than corporations; and when at least one named investor or corporate parent was ranked in the Financial Times 500, investors sometimes secured more favorable outcomes. 
Following Marc Galanter’s scholarship demonstrating that repeat-player lawyers are critical to litigation outcomes, attorney experience also affected ITA outcomes. Investors with experienced counsel were more likely to obtain a damage award against a state, whereas states retaining experienced counsel were only reliably associated with decreased levels of relative investor success. Although there was variation in outcomes, ultimately, the data did not support a conclusion that ITA was completely unpredictable; rather, the results called into question some critiques of ITA and did not prove that ITA is a wholly unacceptable form of dispute settlement. Instead, the results suggest the vital debate about ITA’s future would be well served by focusing on evidence-based insights and reliance on data rather than nonreplicable intuition.
Smart Asset Management for Electric Utilities: Big Data and Future
This paper discusses future challenges in terms of big data and new
technologies. Utilities have been collecting data in large amounts, but the
data are hardly utilized because of their sheer volume and the uncertainty
associated with them. Condition monitoring of assets collects large amounts
of data during daily operations. The question arises: "How do we extract
information from large chunks of data?" The concept of "rich data and poor
information" is being challenged by big data analytics with the advent of
machine-learning techniques. Along with technological advancements like the
Internet of Things (IoT), big data analytics will play an important role for
electric utilities. In this paper, these challenges are addressed with
pathways and guidelines to make current asset management practices smarter
for the future.
Comment: 13 pages, 3 figures, Proceedings of 12th World Congress on
Engineering Asset Management (WCEAM) 201