
    ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment

    Recruitment of appropriate people for certain positions is critical for any company or organization. Manually screening large numbers of resumes to select appropriate candidates can be exhausting and time-consuming. However, there is no public tool that can be directly used for automatic resume quality assessment (RQA). This motivates us to develop a method for automatic RQA. Since there is also no public dataset for model training and evaluation, we build a dataset for RQA by collecting around 10K resumes, provided by a private resume management company. By investigating the dataset, we identify factors or features that could be useful for discriminating good resumes from bad ones, e.g., the consistency between different parts of a resume. A neural-network model is then designed to predict the quality of each resume, incorporating several text processing techniques. To deal with the label deficiency issue in the dataset, we propose several variants of the model that either utilize a pair/triplet-based loss or introduce a semi-supervised learning technique to make use of the abundant unlabeled data. Both the presented baseline model and its variants are general and easy to implement. Various popular criteria, including the receiver operating characteristic (ROC) curve, F-measure, and ranking-based average precision (AP), are adopted for model evaluation. We compare the different variants with our baseline model. Since there is no public algorithm for RQA, we further compare our results with those obtained from a website that scores resumes. Experimental results in terms of different criteria demonstrate the effectiveness of the proposed method. We foresee that our approach could transform future human resources management. Comment: ICD
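    The abstract mentions a pair/triplet-based loss for coping with scarce labels but does not spell out its form. Below is a minimal, hypothetical PyTorch sketch of a pairwise margin ranking setup for scoring resumes; the `ResumeScorer` architecture, feature dimensionality, and margin are assumptions for illustration, not the paper's actual design.

```python
# Hypothetical sketch of a pairwise ranking loss for resume quality scoring.
# Assumes each resume has already been encoded as a fixed-size feature vector.
import torch
import torch.nn as nn

class ResumeScorer(nn.Module):
    """Small feed-forward scorer: resume features -> scalar quality score (assumed architecture)."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def pairwise_ranking_loss(score_good: torch.Tensor,
                          score_bad: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    # Encourage good resumes to score higher than bad ones by at least `margin`.
    return torch.clamp(margin - (score_good - score_bad), min=0).mean()

# Toy usage with random features (10 labeled good/bad pairs, 32-dim features).
model = ResumeScorer(in_dim=32)
good, bad = torch.randn(10, 32), torch.randn(10, 32)
loss = pairwise_ranking_loss(model(good), model(bad))
loss.backward()
```

    A triplet variant would score an anchor, a higher-quality, and a lower-quality resume and apply the same margin constraint to the two score gaps.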

    Deep-Learning for Classification of Colorectal Polyps on Whole-Slide Images

    Histopathological characterization of colorectal polyps is an important principle for determining the risk of colorectal cancer and future rates of surveillance for patients. This characterization is time-intensive, requires years of specialized training, and suffers from significant inter-observer and intra-observer variability. In this work, we built an automatic image-understanding method that can accurately classify different types of colorectal polyps in whole-slide histology images to help pathologists with histopathological characterization and diagnosis of colorectal polyps. The proposed image-understanding method is based on deep-learning techniques, which rely on numerous levels of abstraction for data representation and have shown state-of-the-art results for various image analysis tasks. Our image-understanding method covers all five polyp types (hyperplastic polyp, sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and tubulovillous/villous adenoma) that are included in the US multi-society task force guidelines for colorectal cancer risk assessment and surveillance, and encompasses the most common occurrences of colorectal polyps. Our evaluation on 239 independent test samples shows that our proposed method can identify the types of colorectal polyps in whole-slide images with high efficacy (accuracy: 93.0%, precision: 89.7%, recall: 88.3%, F1 score: 88.8%). The method presented in this paper can reduce the cognitive burden on pathologists and improve their accuracy and efficiency in histopathological characterization of colorectal polyps, and in subsequent risk assessment and follow-up recommendations.
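    As a point of reference for the reported figures, the snippet below shows one hedged way the multi-class accuracy, precision, recall, and F1 could be computed with scikit-learn; the macro averaging and the toy labels are assumptions for illustration, not the paper's evaluation code.

```python
# Minimal sketch of computing accuracy/precision/recall/F1 for a 5-class polyp
# classifier. Labels and predictions here are made up; macro averaging is assumed.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

CLASSES = ["hyperplastic", "sessile_serrated", "traditional_serrated",
           "tubular", "tubulovillous_villous"]

y_true = ["tubular", "hyperplastic", "tubular", "sessile_serrated", "tubular"]
y_pred = ["tubular", "hyperplastic", "tubular", "tubular", "tubular"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=CLASSES, average="macro", zero_division=0)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```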

    Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark

    In many medical studies, an outcome of interest is not only whether an event occurred, but when it occurred; an example of this is Alzheimer’s disease (AD). Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop AD is highly important for AD treatment. Previous studies suggest that not all MCI patients will convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimer’s patients have been generated. Building a computational model that can predict conversion from MCI to AD can be highly beneficial for early intervention and treatment planning for AD. This work presents a big data model that applies machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework considers one of the most widely used screening assessments for detecting cognitive impairment, the Montreal Cognitive Assessment (MoCA). The MoCA data set was collected from different centers and integrated into our large data framework storage using the Hadoop Distributed File System (HDFS); the data was then analyzed using the Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model.
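    A hedged sketch of the kind of pipeline the abstract describes, reading MoCA records from HDFS and fitting a decision tree with Apache Spark's MLlib; the HDFS path, column names, label definition, and tree depth are assumptions for illustration, not details from the study.

```python
# Illustrative PySpark sketch: load MoCA records from HDFS and fit a decision
# tree that classifies conversion to AD. All names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("moca-ad-conversion").getOrCreate()

# Hypothetical CSV of MoCA assessments stored on HDFS.
df = spark.read.csv("hdfs:///data/moca.csv", header=True, inferSchema=True)

# Assemble assumed MoCA sub-scores into a single feature vector.
features = ["visuospatial", "naming", "attention", "language",
            "abstraction", "delayed_recall", "orientation"]
assembler = VectorAssembler(inputCols=features, outputCol="features")
data = assembler.transform(df)

# Assumed binary label: 1 if the participant converted to AD, 0 otherwise.
tree = DecisionTreeClassifier(labelCol="converted_to_ad",
                              featuresCol="features", maxDepth=5)
train, test = data.randomSplit([0.8, 0.2], seed=42)
model = tree.fit(train)
predictions = model.transform(test)
```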

    Predicting Outcomes in Investment Treaty Arbitration

    Crafting appropriate dispute settlement processes is challenging for any conflict-management system, particularly for politically sensitive international economic law disputes. As the United States negotiates investment treaties with Asian and European countries, the terms of dispute settlement have become contentious. There is a vigorous debate about whether investment treaty arbitration (ITA) is an appropriate dispute settlement mechanism. While some sing the praises of ITA, others offer a spirited critique. Some critics claim that ITA is biased against states, while others suggest ITA is predictable but unfair due to factors like arbitrator identity or venue. Using data from 159 final cases derived from 272 publicly available ITA awards, this Article examines outcomes of ITA cases to explore those concerns. Key descriptive findings demonstrate that states reliably won a greater proportion of cases than investors; and for the subset of cases investors won, the mean award was US$45.6 million, with a mean investor success rate of 35%. State success rates were roughly similar to respondent-favorable or state-favorable results in whistleblowing, qui tam, and medical-malpractice litigation in U.S. courts. The Article then explores whether ITA outcomes varied depending upon investor identity, state identity, the presence of repeat-player counsel, arbitrator-related variables, or venue variables. Models using case-based variables always predicted outcomes, whereas arbitrator-venue models did not. The results provide initial evidence that the most critical variables for predicting outcomes involved some form of investor identity and the experience of parties’ lawyers. For investor identity, the most robust predictor was whether investors were human beings, with cases brought by people exhibiting greater success than corporations; and when at least one named investor or corporate parent was ranked in the Financial Times 500, investors sometimes secured more favorable outcomes. Following Marc Galanter’s scholarship demonstrating that repeat-player lawyers are critical to litigation outcomes, attorney experience also affected ITA outcomes. Investors with experienced counsel were more likely to obtain a damage award against a state, whereas states retaining experienced counsel were reliably associated only with decreased levels of relative investor success. Although there was variation in outcomes, ultimately the data did not support a conclusion that ITA was completely unpredictable; rather, the results called into question some critiques of ITA and did not prove that ITA is a wholly unacceptable form of dispute settlement. Instead, the results suggest the vital debate about ITA’s future would be well served by focusing on evidence-based insights and reliance on data rather than nonreplicable intuition.
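    For readers who want a concrete picture of a "case-based variables" model of the kind the Article describes, the sketch below fits a simple logistic regression on invented case-level features; the feature set and data are hypothetical and are not drawn from the Article's dataset.

```python
# Hedged sketch of predicting investor success from case-based variables.
# Hypothetical features per case: investor is a natural person, investor or
# parent in the Financial Times 500, investor counsel's prior ITA cases,
# state counsel's prior ITA cases. Data is invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([
    [1, 0, 3, 0],
    [0, 1, 7, 2],
    [0, 0, 0, 5],
    [1, 1, 4, 1],
    [0, 0, 1, 6],
    [1, 0, 2, 3],
])
y = np.array([1, 1, 0, 1, 0, 0])  # 1 = investor obtained a damage award

model = LogisticRegression().fit(X, y)
# Predicted probability of investor success for a new hypothetical case.
print(model.predict_proba([[1, 0, 5, 1]])[0, 1])
```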

    Smart Asset Management for Electric Utilities: Big Data and Future

    This paper discusses future challenges in terms of big data and new technologies. Utilities have been collecting data in large amounts, but it is hardly utilized because of its sheer volume and the uncertainty associated with it. Condition monitoring of assets collects large amounts of data during daily operations. The question arises: how can information be extracted from such large chunks of data? The concept of "rich data and poor information" is being challenged by big data analytics with the advent of machine learning techniques. Along with technological advancements like the Internet of Things (IoT), big data analytics will play an important role for electric utilities. In this paper, these challenges are addressed with pathways and guidelines to make current asset management practices smarter for the future. Comment: 13 pages, 3 figures, Proceedings of 12th World Congress on Engineering Asset Management (WCEAM) 201
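    As one hedged illustration of extracting information from large volumes of condition-monitoring data with machine learning, the sketch below flags anomalous transformer readings with an isolation forest; the sensor features and data are invented and are not from the paper.

```python
# Minimal, hypothetical sketch: flag anomalous condition-monitoring readings
# for an asset fleet with an isolation forest. Features and data are made up.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Columns: transformer oil temperature (C), load (%), dissolved-gas level (ppm).
normal = rng.normal(loc=[55.0, 70.0, 40.0], scale=[3.0, 8.0, 5.0], size=(500, 3))
faulty = rng.normal(loc=[80.0, 95.0, 150.0], scale=[5.0, 5.0, 20.0], size=(5, 3))
readings = np.vstack([normal, faulty])

detector = IsolationForest(contamination=0.01, random_state=0).fit(readings)
flags = detector.predict(readings)  # -1 = anomalous reading, 1 = normal
print("flagged readings:", int((flags == -1).sum()))
```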