Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase in precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances to extend the training set, increasing recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.
Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processing
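The pipeline described in the abstract, distant supervision followed by similarity-based propagation of a few manual labels, can be sketched as below. This is an illustrative sketch only: the function names, the cosine-similarity criterion, the 0.8 threshold, and the assumption that sentence embeddings are precomputed are all choices made here for brevity, not the paper's actual implementation.

```python
# Hedged sketch: distant supervision to harvest noisy candidates, then
# similarity-based label propagation from a few manually labeled seeds.
# All names, thresholds, and toy data here are illustrative assumptions.
import numpy as np

def distant_supervision(sentences, kb_pairs):
    """Collect candidate training sentences: any sentence mentioning both
    entities of a known fact becomes a (noisy) positive instance."""
    return [s for s in sentences
            if any(subj in s and obj in s for subj, obj in kb_pairs)]

def propagate_labels(candidate_vecs, seed_vecs, seed_labels, threshold=0.8):
    """Give each candidate the label of its most similar labeled seed in the
    low-dimensional space, keeping only confident matches (cosine >= threshold)."""
    norm = lambda m: m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = norm(candidate_vecs) @ norm(seed_vecs).T   # cosine similarity matrix
    best = sims.argmax(axis=1)                        # nearest seed per candidate
    return [seed_labels[b] if sims[i, b] >= threshold else None
            for i, b in enumerate(best)]

kb = [("Barack Obama", "US")]
corpus = ["Barack Obama was born in the US.",
          "Barack Obama visited Paris."]
print(distant_supervision(corpus, kb))  # only the first sentence mentions both
```

Propagation extends the cleaned training set while the threshold keeps precision high: candidates far from every labeled seed stay unlabeled rather than adding noise back in.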
Predicting which applicants will commit internal fraud in a company using supervised learning algorithms
Internal fraud is a major problem for companies, causing significant monetary losses. Several studies have proposed improving the personnel selection process using data mining. The present work proposes using applicants’ historical information to predict whether they will commit fraud during their tenure at a company. Models exist with a high precision level but with a higher error rate in detecting fraud. After several experiments, around seven variables that contribute most to the model were identified. Some of these variables match those mentioned in studies on antisocial personality disorder. The best-performing algorithm was a convolutional neural network with an 80% accuracy rate. It is concluded that applicants’ information is valuable for determining whether they will commit internal fraud during their tenure at a company.
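The supervised workflow the abstract describes, training a classifier on historical applicant features to flag fraud risk, can be sketched minimally as follows. The paper's best model is a convolutional network; for brevity this sketch substitutes plain logistic regression, and the feature names and toy data are invented for illustration.

```python
# Minimal sketch of the supervised fraud-prediction workflow. Logistic
# regression stands in for the paper's convolutional network; features
# and data are hypothetical.
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Plain gradient-descent logistic regression: returns weights and bias."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted fraud probability
        grad = p - y                              # gradient of the log-loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, X):
    """Label 1 (fraud risk) when the predicted probability reaches 0.5."""
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(int)

# Toy applicant features (hypothetical): e.g. prior incidents, job changes.
X = np.array([[0., 1.], [0., 0.], [3., 4.], [4., 3.]])
y = np.array([0, 0, 1, 1])
w, b = train_logreg(X, y)
print(predict(w, b, X))
```

The abstract's caveat applies directly here: overall accuracy can look good while the error rate on the rare fraud class stays high, so recall on the positive class matters more than the headline figure.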
 
 
Expert Finding in Disparate Environments
Providing knowledge workers with access to experts and communities-of-practice is central to expertise sharing, and crucial to effective organizational performance, adaptation, and even survival. However, in complex work environments, it is difficult to know who knows what across heterogeneous groups, disparate locations, and asynchronous work. As such, where expert finding has traditionally been a manual operation, there is increasing interest in policy and technical infrastructure that makes work visible and supports automated tools for locating expertise.
Expert finding is a multidisciplinary problem that cross-cuts knowledge management, organizational analysis, and information retrieval. Recently, a number of expert finders have emerged; however, many tools are limited in that they are extensions of traditional information retrieval systems and exploit artifact information primarily. This thesis explores a new class of expert finders that use organizational context as a basis for assessing expertise and for conferring trust in the system. The hypothesis here is that expertise can be inferred through assessments of work behavior and work derivatives (e.g., artifacts).
The Expert Locator, developed within a live organizational environment, is a model-based prototype that exploits organizational work context. The system associates expertise ratings with experts’ signaling behavior and is extensible, so that signaling behavior from multiple activity-space contexts can be fused into aggregate retrieval scores. Post-retrieval analysis supports evidence review and personal network browsing, aiding users in both detection and selection. During operational evaluation, the prototype generated high-precision searches across a range of topics and was sensitive to organizational role, ranking true experts (i.e., authorities) higher than brokers providing referrals. Precision increased with the number of activity spaces used in the model, but varied across queries. The highest-performing queries are characterized by high-specificity terms and low organizational diffusion amongst retrieved experts; essentially, the highest-rated experts are situated within organizational niches.
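The fusion step described above, combining signaling evidence from multiple activity spaces into one retrieval score, might look like the following sketch. The weighted-sum scheme, the activity-space names, and the data are assumptions made here for illustration, not the Expert Locator's actual model.

```python
# Illustrative sketch of fusing per-activity-space expertise signals into an
# aggregate retrieval score. Weights, space names, and ratings are hypothetical.
def aggregate_score(signals, weights):
    """signals: {activity_space: rating}, weights: {activity_space: weight}.
    Returns a weighted sum over whichever activity spaces have evidence."""
    return sum(weights.get(space, 0.0) * rating
               for space, rating in signals.items())

# Hypothetical experts with signals observed in different activity spaces.
experts = {
    "alice": {"code_reviews": 0.9, "mailing_list": 0.7},
    "bob":   {"mailing_list": 0.8},
}
weights = {"code_reviews": 0.6, "mailing_list": 0.4}

ranked = sorted(experts, key=lambda e: aggregate_score(experts[e], weights),
                reverse=True)
print(ranked)  # alice ranks first: evidence from two spaces outweighs one
```

A design like this is extensible in the sense the abstract mentions: adding a new activity space only requires a new weight entry, and experts with evidence in more spaces naturally accumulate higher aggregate scores.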
A hybrid machine learning and text-mining approach for the automated generation of early warnings in construction project management.
The thesis develops an early warning methodology for predicting project failure by analysing unstructured project documentation. Project management documents contain subtle aspects that directly affect or contribute to various Key Performance Indicators (KPIs). Extracting actionable outcomes as early warnings (EWs) from management documents (e.g. minutes and project reports), in order to prevent or minimise discontinuities such as delays, shortages or amendments, is a challenging process. These EWs, if modelled properly, may inform project planners and managers in advance of any impending risks. At present, there are no suitable machine learning techniques to benchmark the identification of such EWs in construction management documents.
Extraction of semantically crucial information is a challenging task, reflected substantially in how teams communicate via various project management documents. Recognising the hidden signals in these documents without a human interpreter is difficult due to the highly ambiguous nature of the language used; these signals can in turn be used to provide decision support that optimises a project’s goals by pre-emptively warning teams. Addressing this research gap, this work develops a “weak signal” classification methodology for management documents via a two-tier machine learning model.
The first-tier model exploits the capability of a probabilistic Naïve Bayes classifier to extract early warnings from construction management text data. In the first step, a database corpus is prepared via a qualitative analysis of expert-completed questionnaire responses that indicate relationships between various words and their mappings to EW classes. The second-tier model uses a Hybrid Naïve Bayes classifier which evaluates real-world construction management documents to identify the probabilistic relationship of various words against certain EW classes and compares them with the KPIs. The work also reports on a supervised K-Nearest-Neighbour (KNN) TF-IDF methodology to cluster and model various “weak signals” based on their impact on the KPIs.
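The core mechanism of the first tier, a Naïve Bayes classifier mapping document text to EW classes, can be sketched as a plain multinomial model. The classes, example sentences, and Laplace-smoothing choice below are illustrative only and do not reproduce the thesis's corpus or its hybrid classifier.

```python
# Minimal multinomial Naive Bayes text classifier, sketching how management
# document snippets might be mapped to early-warning (EW) classes. The
# classes, vocabulary, and smoothing choices are illustrative assumptions.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns per-class log-priors and
    Laplace-smoothed word log-likelihoods."""
    class_docs, word_counts = Counter(), defaultdict(Counter)
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(class_docs.values())
    model = {}
    for label in class_docs:
        n = sum(word_counts[label].values())
        model[label] = (
            math.log(class_docs[label] / total),                      # prior
            {w: math.log((word_counts[label][w] + 1) / (n + len(vocab)))
             for w in vocab},                                         # likelihoods
            math.log(1 / (n + len(vocab))),                           # unseen word
        )
    return model

def classify(model, tokens):
    """Pick the class maximising log-prior + summed word log-likelihoods."""
    def score(label):
        prior, loglik, unseen = model[label]
        return prior + sum(loglik.get(w, unseen) for w in tokens)
    return max(model, key=score)

# Hypothetical labelled snippets mapping wording to EW classes.
docs = [("supplier delayed shipment again".split(), "delay"),
        ("budget amendment requested by client".split(), "amendment")]
model = train_nb(docs)
print(classify(model, "shipment delayed".split()))  # -> delay
```

The expert-fed keyword mappings described above would enter such a model as additional pseudo-documents per class, biasing the word likelihoods toward the expert-indicated vocabulary before any real project documents are scored.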
The Hybrid Naïve Bayes classifier was trained on a set of documents labelled according to expert-guided keyword categories. The overall accuracy obtained via a 5-fold cross-validation test was 68.5%, which improved to 71.5% for a class-reduced (6-class) KNN analysis. The weak-signal analysis of the same dataset yielded an overall accuracy of 64%. The results were further analysed with Jack-Knife resampling and showed consistent accuracies of 65.15%, 71.42% and 64.1% respectively.
PhD in Manufacturing