
    Knowledge Base Population using Semantic Label Propagation

    A crucial aspect of a knowledge base population system that extracts new facts from text corpora is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by distant supervision, a method that constructs training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase in precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort. Comment: Submitted to Knowledge-Based Systems, special issue on Knowledge Bases for Natural Language Processing.
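The distant supervision step described above can be sketched as follows. The toy knowledge base and corpus are invented for illustration, and the alignment is reduced to plain substring matching; the paper's actual pipeline is far more involved.

```python
# Sketch of distant supervision: any sentence mentioning both arguments of a
# known fact becomes a (noisy) positive training instance. The knowledge base,
# corpus, and substring matching below are illustrative simplifications.

known_facts = {("Barack Obama", "US"): "born_in"}

corpus = [
    "Barack Obama was born in the US.",
    "Barack Obama visited the US embassy.",  # noisy: mentions both, wrong relation
    "Angela Merkel spoke in Berlin.",
]

def distant_supervision(corpus, known_facts):
    """Pair every sentence containing both fact arguments with that relation."""
    training = []
    for sentence in corpus:
        for (subj, obj), relation in known_facts.items():
            if subj in sentence and obj in sentence:
                training.append((sentence, relation))
    return training

labeled = distant_supervision(corpus, known_facts)
# The second sentence is labeled born_in despite not expressing the relation;
# this is the noise that feature labeling and label propagation must remove.
```

The second training instance illustrates exactly the noise problem the abstract describes: both entities co-occur, but the sentence does not express born_in.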

    Predicción de postulantes que cometerán fraude interno en una compañía con algoritmos de aprendizaje supervisado (Predicting applicants who will commit internal fraud at a company using supervised learning algorithms)

    Internal fraud is a serious problem for companies, since it causes significant monetary losses. Several research studies have proposed improving the personnel selection process using data mining. The present work proposes using applicants’ historical information to predict whether they will commit fraud during their employment at a company. Some models achieve a high overall precision but a higher error rate on the fraud class. After several experiments, around seven variables that contribute most to the model were identified. Some of these variables match those mentioned in the literature on antisocial personality disorder. The best-performing algorithm was a convolutional neural network with an 80% accuracy rate. It is concluded that applicants’ information is valuable for determining whether they will commit internal fraud during their employment at a company.
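The supervised prediction setup can be illustrated with a toy sketch. The paper's best model was a convolutional neural network; a plain logistic regression trained by stochastic gradient descent stands in here, and the applicant features and records are invented.

```python
# Toy sketch of supervised internal-fraud prediction from applicant records.
# The study's best model was a convolutional neural network; a plain logistic
# regression stands in here. Features and data are invented for illustration.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad = p - yi
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b

# Hypothetical applicant features: [prior_disciplinary_flag, job_hopping_score]
X = [[0, 0.1], [0, 0.2], [1, 0.9], [1, 0.8]]
y = [0, 0, 1, 1]  # 1 = later committed internal fraud
w, b = train_logreg(X, y)

def predict(x):
    """Return True if the model flags the applicant as a fraud risk."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5
```

On real, imbalanced data the class-specific error rate the abstract mentions would matter more than raw accuracy, so a recall-oriented threshold would likely replace the 0.5 cutoff.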

    Expert Finding in Disparate Environments

    Providing knowledge workers with access to experts and communities-of-practice is central to expertise sharing, and crucial to effective organizational performance, adaptation, and even survival. However, in complex work environments, it is difficult to know who knows what across heterogeneous groups, disparate locations, and asynchronous work. As such, where expert finding has traditionally been a manual operation, there is increasing interest in policy and technical infrastructure that makes work visible and supports automated tools for locating expertise. Expert finding is a multidisciplinary problem that cuts across knowledge management, organizational analysis, and information retrieval. Recently, a number of expert finders have emerged; however, many tools are limited in that they are extensions of traditional information retrieval systems and exploit artifact information primarily. This thesis explores a new class of expert finders that use organizational context as a basis for assessing expertise and for conferring trust in the system. The hypothesis here is that expertise can be inferred through assessments of work behavior and work derivatives (e.g., artifacts). The Expert Locator, developed within a live organizational environment, is a model-based prototype that exploits organizational work context. The system associates expertise ratings with experts’ signaling behavior and is extensible, so that signaling behavior from multiple activity space contexts can be fused into aggregate retrieval scores. Post-retrieval analysis supports evidence review and personal network browsing, aiding users in both detection and selection. During operational evaluation, the prototype generated high-precision searches across a range of topics, and was sensitive to organizational role, ranking true experts (i.e., authorities) higher than brokers providing referrals. Precision increased with the number of activity spaces used in the model, but varied across queries. The highest-performing queries are characterized by high-specificity terms and low organizational diffusion amongst retrieved experts; essentially, the highest-rated experts are situated within organizational niches.
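The fusion of signaling behavior from multiple activity spaces into an aggregate retrieval score might be sketched as below. The activity-space names, weights, and weighted-sum fusion rule are assumptions for illustration, not the Expert Locator's actual model.

```python
# Sketch of fusing per-activity-space expertise signals into an aggregate
# retrieval score. Space names, weights, and the weighted-sum fusion rule
# are invented assumptions, not the prototype's actual scoring model.

def aggregate_score(signals, weights):
    """Combine per-activity-space expertise ratings into one retrieval score."""
    return sum(weights.get(space, 0.0) * rating
               for space, rating in signals.items())

weights = {"publications": 0.5, "mailing_lists": 0.3, "code_repos": 0.2}

candidates = {
    "alice": {"publications": 0.9, "mailing_lists": 0.4, "code_repos": 0.7},
    "bob":   {"publications": 0.2, "mailing_lists": 0.9, "code_repos": 0.1},
}

ranking = sorted(candidates,
                 key=lambda name: aggregate_score(candidates[name], weights),
                 reverse=True)
```

Adding more activity spaces to the dictionary extends the model, which mirrors the thesis's observation that precision increased with the number of activity spaces used.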

    A hybrid machine learning and text-mining approach for the automated generation of early warnings in construction project management.

    The thesis develops an early warning prediction methodology for project failure prediction by analysing unstructured project documentation. Project management documents contain certain subtle aspects that directly affect or contribute to various Key Performance Indicators (KPIs). Extracting actionable outcomes as early warnings (EWs) from management documents (e.g. minutes and project reports) to prevent or minimise discontinuities such as delays, shortages or amendments is a challenging process. These EWs, if modelled properly, may inform project planners and managers in advance of any impending risks. At present, there are no suitable machine learning techniques to benchmark the identification of such EWs in construction management documents. Extracting semantically crucial information is challenging, since teams communicate through a variety of project management documents. Recognising the hidden signals in these documents without a human interpreter is difficult owing to the highly ambiguous language used; such signals can nevertheless provide decision support that optimises a project’s goals by pre-emptively warning teams. To address this research gap, this work develops a "weak signal" classification methodology for management documents via a two-tier machine learning model. The first-tier model exploits the capability of a probabilistic Naïve Bayes classifier to extract early warnings from construction management text data. In the first step, a database corpus is prepared via qualitative analysis of expert questionnaire responses that indicate relationships between various words and their mappings to EW classes. The second-tier model uses a hybrid Naïve Bayes classifier which evaluates real-world construction management documents to identify the probabilistic relationship of various words against certain EW classes and compares them with the KPIs. The work also reports on a supervised K-Nearest-Neighbour (KNN) TF-IDF methodology to cluster and model various "weak signals" based on their impact on the KPIs. The hybrid Naïve Bayes classifier was trained on a set of documents labelled according to expert-indicated keyword categories. The overall accuracy obtained via a 5-fold cross-validation test was 68.5%, which improved to 71.5% for a class-reduced (6-class) KNN analysis. The weak-signal analysis of the same dataset achieved an overall accuracy of 64%. The results were further analysed with jackknife resampling and showed consistent accuracies of 65.15%, 71.42% and 64.1% respectively.
    PhD in Manufacturing
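A minimal sketch of the first-tier idea follows: a multinomial Naïve Bayes classifier tagging document snippets with early-warning classes. The EW classes, keywords, and training snippets are invented, and the thesis's hybrid classifier and TF-IDF/KNN stage are not reproduced here.

```python
# Minimal multinomial Naive Bayes sketch for tagging construction-document
# snippets with early-warning (EW) classes, in the spirit of the first-tier
# classifier. EW classes, keywords, and snippets are invented for illustration.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (token_list, ew_class) -> (log priors, log likelihoods, vocab)."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in docs:
        word_counts[c].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    v = len(vocab)  # Laplace smoothing over the vocabulary
    likelihood = {c: {w: math.log((word_counts[c][w] + 1) /
                                  (sum(word_counts[c].values()) + v))
                      for w in vocab}
                  for c in class_counts}
    return priors, likelihood, vocab

def classify(tokens, priors, likelihood, vocab):
    """Return the EW class with the highest posterior log-probability."""
    score = lambda c: priors[c] + sum(likelihood[c][w] for w in tokens if w in vocab)
    return max(priors, key=score)

training = [
    (["supplier", "delivery", "slipped"], "delay"),
    (["schedule", "delivery", "late"], "delay"),
    (["steel", "stock", "running", "low"], "shortage"),
    (["material", "stock", "low"], "shortage"),
]
model = train_nb(training)
```

A snippet such as `["delivery", "late"]` would be scored against each EW class and assigned to the one with the highest posterior; the thesis's second tier then relates such class assignments to the project KPIs.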

    CWI Self-evaluation 1999-2004


    From people to entities : typed search in the enterprise and the web

    [no abstract]