    Measuring Possible Future Selves: Using Natural Language Processing for Automated Analysis of Posts about Life Concerns

    Individuals hold specific perceptions about their lives: how well they are doing in particular life domains, what ideas they have, and what they want to pursue in the future. These concepts are called possible future selves (PFS), a schema containing people's ideas about who they currently are and who they wish to be in the future. The goal of this research project is to create a program that captures PFS using natural language processing, enabling automated analysis that measures people's perceptions and goals in a particular life domain and assesses how important they consider each part of their PFS. The data used in this study were drawn from Kennard, Willis, Robinson, and Knobloch-Westerwick (2015), in which 214 women, aged 21-35 years, viewed magazine portrayals of women in gender-congruent and gender-incongruent roles. Participants were prompted to write about their PFS with the questions: "Over the past 7 days, how much have you thought about your current life situation and your future? What were your thoughts? How much have you thought about your goals in life and your relationships? What were your thoughts?" The free-text PFS responses were then coded by human coders for mentions of different life domains and for the emotions explicitly expressed in the text. A combination of machine learning techniques was used to demonstrate the robustness of machine learning in predicting PFS: Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN), and decision trees were combined through ensemble learning. Two different training and evaluation methods were compared to find the most effective machine learning approach for analyzing PFS. The machine learning approach predicted PFS with high accuracy, labeling a person's PFS concerns the same way the human coders did in The Allure of Aphrodite. While the models were less accurate on some measures, labeling a present career concern with only around 60% accuracy, for example, they identified a concern about a person's past romantic life with above 95% accuracy. Overall, accuracy on life-domain concerns was around 83%.
    Undergraduate Research Scholarship by the College of Engineering. No embargo. Academic Major: Computer Science and Engineering
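    The abstract does not include the implementation, but the kind of ensemble it describes can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy texts, the binary "career concern" label, and the layer sizes are all assumptions, with standard Keras and scikit-learn APIs standing in for whatever the project actually used.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.tree import DecisionTreeClassifier
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Embedding, LSTM, Conv1D, GlobalMaxPooling1D, Dense

        # Toy stand-ins for the coded responses (the real data are the 214 essays).
        texts = ["I keep thinking about my career goals",
                 "my relationships matter most to me right now"]
        labels = np.array([1, 0])  # hypothetical label: 1 = mentions a career concern

        vocab = {}
        def encode(text, seq_len=40, grow=False):
            """Map words to integer ids, padding/truncating to a fixed length."""
            ids = [vocab.setdefault(w, len(vocab) + 1) if grow else vocab.get(w, 0)
                   for w in text.lower().split()]
            return (ids + [0] * seq_len)[:seq_len]

        X_seq = np.array([encode(t, grow=True) for t in texts])

        def neural(middle):
            m = Sequential([Embedding(len(vocab) + 2, 32)] + middle
                           + [Dense(1, activation="sigmoid")])
            m.compile(optimizer="adam", loss="binary_crossentropy")
            return m

        lstm = neural([LSTM(16)])
        cnn = neural([Conv1D(32, 5, padding="same", activation="relu"),
                      GlobalMaxPooling1D()])
        for m in (lstm, cnn):
            m.fit(X_seq, labels, epochs=3, verbose=0)

        tfidf = TfidfVectorizer()
        tree = DecisionTreeClassifier(max_depth=5).fit(tfidf.fit_transform(texts), labels)

        def ensemble_label(text):
            """Majority vote of the three classifiers for one PFS measure."""
            s = np.array([encode(text)])
            votes = [int(lstm.predict(s, verbose=0)[0, 0] > 0.5),
                     int(cnn.predict(s, verbose=0)[0, 0] > 0.5),
                     int(tree.predict(tfidf.transform([text]))[0])]
            return int(sum(votes) >= 2)

    A majority vote over heterogeneous models is one simple way to combine them; averaging the three predicted probabilities before thresholding is an equally common alternative.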

    Implementation of Knowledge-Based Expert System Using Probabilistic Network Models

    The latest developments in machine learning have enabled intelligent tools that can identify anomalies in a system in real time. These intelligent tools become expert systems when they combine the algorithmic results of root cause analysis with domain knowledge. Truth maintenance, fuzzy logic, and ontology classification are just a few of the many techniques used to build such systems. In most traditional computer programs, the logic is embedded in the code, which makes it difficult for domain experts to retrieve the underlying rule set and make changes; expert systems bridge this gap by making information explicit rather than implicit. In this paper, we present a new approach to developing an expert system that combines decision tree analysis with probabilistic network models such as Bayesian networks. The proposed model facilitates correlating belief probabilities with unseen data through logical flowcharting, the loopy belief propagation algorithm, and decision tree analysis. The performance of the model is measured with evaluation and cross-validation techniques.
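    As a concrete illustration of the core idea, the sketch below builds a two-layer diagnostic network in plain Python and computes the posterior over root causes by direct enumeration. It is a minimal stand-in, not the paper's system: the causes, symptoms, and probabilities are invented, the symptoms are assumed conditionally independent given the cause, and loopy belief propagation would replace enumeration only for networks with cycles.

        # Prior beliefs over root causes (hypothetical values).
        PRIOR = {"disk_failure": 0.02, "network_fault": 0.05, "healthy": 0.93}
        # P(symptom observed | root cause), one entry per cause and symptom.
        LIKELIHOOD = {
            "disk_failure":  {"io_errors": 0.90, "timeouts": 0.30},
            "network_fault": {"io_errors": 0.10, "timeouts": 0.85},
            "healthy":       {"io_errors": 0.01, "timeouts": 0.02},
        }

        def posterior(observed):
            """Posterior over root causes given a dict symptom -> bool."""
            scores = {}
            for cause, prior in PRIOR.items():
                p = prior
                for symptom, present in observed.items():
                    q = LIKELIHOOD[cause][symptom]
                    p *= q if present else (1.0 - q)
                scores[cause] = p
            z = sum(scores.values())        # normalize into a distribution
            return {c: p / z for c, p in scores.items()}

        # Anomaly detector reports I/O errors but no timeouts:
        print(posterior({"io_errors": True, "timeouts": False}))

    Because the probability tables are plain data rather than logic buried in code, a domain expert can inspect and edit the rule set directly, which is exactly the explicitness the abstract argues for.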

    Interpretable multiclass classification by MDL-based rule lists

    Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally allows trading off model complexity against goodness of fit, effectively avoiding overfitting and the need for hyperparameter tuning. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion.
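    The following sketch conveys the greedy MDL idea under strong simplifications: the model cost is a flat per-rule constant and the data cost is plain empirical log-loss, whereas Classy uses more refined codes. It is an assumed reconstruction for intuition, not the Classy implementation.

        import math
        from collections import Counter

        def data_bits(labels):
            """Bits to encode labels under their empirical class distribution."""
            n = len(labels)
            if n == 0:
                return 0.0
            return -sum(c * math.log2(c / n) for c in Counter(labels).values())

        def total_bits(rules, X, y, bits_per_rule=1.0):
            """Two-part MDL score: crude model cost plus data cost, where each
            rule is a predicate that captures the instances it matches first."""
            remaining = list(range(len(X)))
            cost = bits_per_rule * len(rules)
            for name, pred in rules:
                matched = [i for i in remaining if pred(X[i])]
                remaining = [i for i in remaining if not pred(X[i])]
                cost += data_bits([y[i] for i in matched])
            return cost + data_bits([y[i] for i in remaining])  # default rule

        def greedy_rule_list(candidates, X, y):
            """Append the candidate that shrinks the description length the
            most; stop when no candidate compresses the data further."""
            rules, best = [], total_bits([], X, y)
            while True:
                cost, cand = min(((total_bits(rules + [c], X, y), c)
                                  for c in candidates), key=lambda t: t[0])
                if cost >= best:
                    return rules
                rules, best = rules + [cand], cost

        X = [{"age": 25, "income": 40}, {"age": 52, "income": 90},
             {"age": 47, "income": 30}]
        y = ["low", "high", "low"]
        candidates = [("age>40", lambda r: r["age"] > 40),
                      ("income>60", lambda r: r["income"] > 60)]
        print([name for name, _ in greedy_rule_list(candidates, X, y)])

    The stopping condition is what makes the procedure virtually parameter-free: a rule enters the list only if the bits it adds to the model are repaid by bits saved in encoding the labels.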

    Landslide Risk: Economic Valuation in the North-Eastern Zone of Medellin City

    Natural disasters of a geodynamic nature can cause enormous economic and human losses. The economic costs of a landslide disaster include relocating communities and repairing urban infrastructure. However, quantitative risk analyses generally do not take the indirect economic consequences of such an event into account. We propose a probabilistic methodology that considers several hazard and vulnerability scenarios to measure the magnitude of a landslide and quantify its economic costs. With this approach, landslide risk can be evaluated quantitatively, allowing the economic losses from a potential disaster to be calculated in an objective, standardized, and reproducible way that accounts for the uncertainty in building costs in the study zone. The ability to compare different scenarios facilitates urban planning, the optimization of interventions to reduce risk to acceptable levels, and the assessment of economic losses according to the magnitude of the damage. To develop and explain the proposed methodology, a simple case study is presented, located in the north-eastern zone of the city of Medellín. This area has particular geomorphological characteristics and is also characterized by the presence of several buildings in poor structural condition. The methodology yields an estimate of the probable economic losses from earthquake-induced landslides; this estimate shows that structural intervention in the buildings reduces total landslide risk by about 21%. © Published under licence by IOP Publishing Ltd.
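    The calculation at the core of such a methodology can be illustrated with a short sketch: expected annual loss as a sum over hazard scenarios of probability times vulnerability times exposed value, with the uncertainty in building costs propagated by Monte Carlo sampling. The scenario probabilities, loss fractions, and cost figures below are hypothetical placeholders, not the paper's data.

        import random

        random.seed(42)

        # (annual probability of each landslide scenario, fraction of value lost)
        SCENARIOS = [
            (0.010, 0.80),  # large, rare event with severe damage
            (0.050, 0.30),  # moderate event
            (0.200, 0.05),  # small, frequent event
        ]

        def expected_annual_loss(scenarios, cost_mean, cost_sd, n_samples=10_000):
            """Monte Carlo estimate of E[loss] = sum_s P(s) * vulnerability(s) * cost,
            with the replacement cost of the exposed buildings treated as uncertain.
            Sampling is overkill for this linear toy case but accommodates the
            nonlinear damage models used in practice."""
            total = 0.0
            for _ in range(n_samples):
                cost = random.gauss(cost_mean, cost_sd)
                total += sum(p * v * cost for p, v in scenarios)
            return total / n_samples

        before = expected_annual_loss(SCENARIOS, cost_mean=1.0e6, cost_sd=2.0e5)
        # A structural retrofit lowers the vulnerability fractions; re-evaluating
        # with the reduced values quantifies the risk reduction (about 21% in the
        # paper; the 25% factor here is an arbitrary illustration).
        retrofit = [(p, 0.75 * v) for p, v in SCENARIOS]
        after = expected_annual_loss(retrofit, cost_mean=1.0e6, cost_sd=2.0e5)
        print(f"risk reduction: {1 - after / before:.0%}")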

    The Libra Toolkit for Probabilistic Models

    The Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks, Markov networks, dependency networks, and sum-product networks. Compared to other toolkits, Libra places a greater emphasis on learning the structure of tractable models in which exact inference is efficient. It also includes a variety of algorithms for learning graphical models in which inference is potentially intractable, and for performing exact and approximate inference. Libra is released under a 2-clause BSD license to encourage broad use in academia and industry.
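    Libra itself is a command-line toolkit, so the snippet below is not its interface; it is only a tiny pure-Python illustration of why exact inference is cheap in tractable models: marginalizing a chain-structured Bayesian network by eliminating one variable at a time costs a handful of small sums rather than a summation over the full joint.

        P_A = [0.6, 0.4]                   # P(A), hypothetical numbers
        P_B_A = [[0.7, 0.3], [0.2, 0.8]]   # P(B | A)
        P_C_B = [[0.9, 0.1], [0.5, 0.5]]   # P(C | B)

        # Eliminate A: P(B) = sum_a P(A=a) P(B | A=a)
        P_B = [sum(P_A[a] * P_B_A[a][b] for a in range(2)) for b in range(2)]
        # Eliminate B: P(C) = sum_b P(B=b) P(C | B=b)
        P_C = [sum(P_B[b] * P_C_B[b][c] for b in range(2)) for c in range(2)]
        print(P_C)  # exact marginal from two small sums, not a joint over 2^3 states

    For an n-variable chain with k states per variable, this elimination ordering costs O(n * k^2), whereas brute-force enumeration of the joint costs O(k^n); tractable model classes are those whose structure guarantees such an efficient ordering exists.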

    Code Prediction by Feeding Trees to Transformers

    We advance the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. First, we report that the recently proposed Transformer architecture, even out of the box, outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3 system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset as well as on a Facebook-internal Python corpus. Our code and data-preparation pipeline will be made available as open source.
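    One natural way to expose syntax to a sequence model, sketched here as a rough approximation of the idea rather than the paper's exact encoding, is to linearize the abstract syntax tree by a pre-order depth-first traversal with bracket tokens marking subtree boundaries, using Python's standard ast module.

        import ast

        def linearize(node):
            """Pre-order DFS over an AST, emitting node-type tokens plus
            brackets that record where each subtree opens and closes."""
            tokens = [type(node).__name__, "("]
            # Leaf values (identifiers, constants) become tokens of their own.
            if isinstance(node, ast.Name):
                tokens.append(node.id)
            elif isinstance(node, ast.Constant):
                tokens.append(repr(node.value))
            for child in ast.iter_child_nodes(node):
                tokens.extend(linearize(child))
            tokens.append(")")
            return tokens

        tree = ast.parse("x = foo(1) + y")
        print(" ".join(linearize(tree)))

    The resulting token sequence is what a plain Transformer consumes for next-token prediction; the bracket tokens carry the tree shape that a flat source-token stream would lose.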