4,264 research outputs found

    Detecting, Modeling, and Predicting User Temporal Intention

    The content of social media has grown exponentially in recent years, and its role has evolved from narrating life events to actually shaping them. Unfortunately, content posted and shared in social networks is vulnerable and prone to loss or change, rendering the context associated with it (a tweet, post, status, or other) meaningless. There is an inherent value in maintaining the consistency of such social records: in some cases they serve as the first draft of history, since collections of these social posts narrate the pulse of the street during historic events such as protests, riots, elections, wars, and disasters, as shown in this work. The user sharing the resource has an implicit temporal intent: either the state of the resource at the time of sharing, or the current state of the resource at the time of the reader "clicking". In this research, we propose a model to detect and predict the temporal intention of the author upon sharing content in the social network and of the reader upon resolving this content. To build this model, we first examine the three aspects of the problem: the resource, time, and the user. For the resource, we start by analyzing the content on the live web and its persistence. We noticed that a portion of the resources shared in social media disappear, and with further analysis we unraveled a relationship between this disappearance and time. We lose around 11% of the resources after one year of sharing and a steady 7% every following year. With this, we turn to the public archives, and our analysis reveals that not all posted resources are archived; even when they are, an average of 8% per year disappears from the archives, and in some cases the archived content is heavily damaged. These observations show that the archives are not well enough populated to consistently and reliably reconstruct a missing resource as it existed at the time of sharing.
To analyze the concept of time, we devised several experiments to estimate the creation date of the shared resources. We developed Carbon Date, a tool which successfully estimated the correct creation dates for 76% of the test sets. We then wanted to measure if and how the resources change with time after their creation. We conducted a longitudinal study on a data set of very recently published tweet-resource pairs, recording observations hourly. We found that after just one hour, ~4% of the resources had changed by ≥30%, while after a day the rate of change slowed, with ~12% of the resources having changed by ≥40%. For the third and final component of the problem, we conducted user behavioral analysis experiments and built a data set of 1,124 instances manually assigned by test subjects. Temporal intention proved to be a difficult concept for average users to understand. We developed our Temporal Intention Relevancy Model (TIRM) to transform the highly subjective temporal intention problem into the more easily understood idea of relevancy between a tweet and the resource it links to, and change of the resource through time. On our collected data set, TIRM produced a significant 90.27% success rate. Furthermore, we extended TIRM and used it to build a time-based model to predict temporal intention change or steadiness at the time of posting with 77% accuracy. We built a service API around this model to provide predictions, along with a few prototypes. Future tools could implement TIRM to assist users in pushing copies of shared resources into public web archives to ensure the integrity of the historical record. Additional tools could be used to assist the mining of the existing social media corpus by dereferencing the intended version of the shared resource based on the intention strength and the time between the tweeting and the mining.
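The loss figures quoted in this abstract (about 11% of shared resources gone after the first year, then about 7% in each following year) can be turned into a rough estimate of how much is missing after a given number of years. The sketch below is only an illustration: it assumes the yearly losses are additive fractions of the original set, which is one possible reading of the abstract, and the function name `estimated_loss` is hypothetical rather than anything from the work itself.

```python
def estimated_loss(years: int, first_year: float = 0.11, per_year: float = 0.07) -> float:
    """Estimate the fraction of shared resources missing after `years` years.

    Assumption (not stated in the abstract): losses are additive fractions
    of the original set, so the estimate is capped at 1.0.
    """
    if years <= 0:
        return 0.0
    return min(1.0, first_year + per_year * (years - 1))


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"after {n} year(s): ~{estimated_loss(n):.0%} missing")
```

Under this additive reading, roughly a quarter of the shared resources would be missing three years after sharing.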

    The Effect of Tommy John (UCL) Reconstructive Surgery on a Pitcher’s Arm and Career Progression

    Injuries have plagued professional athletes for as long as their sports have existed. The examination of how teams can diminish the side effects of injuries en route to a speedy recovery remains an evolving process and a topic of concern for all. Injury prevention tactics have been implemented by coaching staffs and various training personnel. Major League Baseball (MLB) pitchers are noticing an increase in the number of surgeries performed each year. The tearing of the ulnar collateral ligament (UCL) in the elbow has become a predominant injury among MLB pitchers. Reconstructive surgery, also known as Tommy John surgery, has become a necessity for any pitcher wishing to return to the mound. The goal of this research is to examine the performance of players who elect to undergo Tommy John surgery. A predictive model can only go so far in using factual statistical data to determine the stress on pitchers’ arms. However, the byproducts of teams acquiring this knowledge have a large impact on their decision-making abilities. The research includes analytical techniques to predict future outcomes of MLB pitchers as well as an avenue to provide statistical evidence of the before and after effects on their arms.

    Analysis of triglyceride synthesis unveils a green algal soluble diacylglycerol acyltransferase and provides clues to potential enzymatic components of the chloroplast pathway

    Background: Microalgal triglyceride (TAG) synthesis has attracted considerable attention. Particular emphasis has been placed on characterizing the algal homologs of the canonical rate-limiting enzymes, diacylglycerol acyltransferase (DGAT) and phospholipid:diacylglycerol acyltransferase (PDAT). Less work has been done to analyze homologs from a phylogenetic perspective. In this work, we used HMMER iterative profiling and phylogenetic and functional analyses to determine the number and sequence characteristics of algal DGAT and PDAT, as well as related sequences that constitute their corresponding superfamilies. We included most algae with available genomes, as well as representative eukaryotic and prokaryotic species. Results: Among our main findings, we identified a novel clade of DGAT1-like proteins exclusive to red algae and glaucophyta and a previously uncharacterized subclade of DGAT2 proteins with an unusual number of transmembrane segments. Our analysis also revealed the existence of a novel DGAT exclusive to green algae with moderate similarity to plant soluble DGAT3. The DGAT3 clade shares a most recent ancestor with a group of uncharacterized proteins from cyanobacteria. Subcellular targeting prediction suggests that most green algal DGAT3 proteins are imported to the chloroplast, indicating that the green algal chloroplast might have a soluble pathway for the de novo synthesis of TAGs. Heterologous expression of C. reinhardtii DGAT3 produces an increase in the accumulation of TAG, as evidenced by thin layer chromatography. Conclusions: Our analysis advances the knowledge of complex superfamilies involved in lipid metabolism and provides clues to possible enzymatic players of chloroplast TAG synthesis. Instituto de Investigaciones Bioquímicas de La Plata; Facultad de Ciencias Médica

    Interpretable machine learning models for predicting with missing values

    Machine learning models are often used in situations where model inputs are missing either during training or at the time of prediction. If missing values are not handled appropriately, they can lead to increased bias or to models that are not applicable in practice without imputing the values of the unobserved variables. However, the imputation of missing values is often inadequate and difficult to interpret for complex imputation functions. In this thesis, we focus on predictions in the presence of incomplete data at test time, using interpretable models that allow humans to understand the predictions. Interpretability is especially necessary when important decisions are at stake, such as in healthcare. First, we investigate the situation where variables are missing in recurrent patterns and sample sizes are small per pattern. We propose SPSM, which allows coefficient sharing between a main model and pattern submodels in order to make efficient use of data and to be independent of imputation. To enable interpretability, the model can be expressed as a short description introduced by sparsity. Then, we explore situations where missingness does not occur in patterns and suggest the sparse linear rule model MINTY, which naturally trades off between interpretability and goodness of fit while being sensitive to missing values at test time. To this end, we learn replacement variables, indicating which features in a rule can be used as alternatives when the original feature was not measured, assuming some redundancy in the covariates. Our results show that the proposed interpretable models can be used for prediction with missing values, without depending on imputation. We conclude that more work can be done in evaluating interpretable machine learning models in the context of missing values at test time.

    Predicting injury outcomes in mining industry - a machine learning approach

    The mining industry plays an essential role in the US economy. Mining is known to be one of the most dangerous occupations. Even though there have been efforts to create a safer work environment for miners, a significant number of accidents still occur on mining sites. Mine operators are required to report all accidents, injuries, or illnesses that occur at a mine to the Mine Safety and Health Administration (MSHA). These reports contain several fixed-field entries as well as the narrative of the accident. In this study, we use machine learning models such as Decision Tree (DT), Random Forest (RF), and Deep Neural Network (DNN) to predict the outcome of the accident and the number of days the worker is going to be away from work (DAFW) using the MSHA dataset. These predictive models would be helpful for safety experts in their efforts to create a safer work environment. Predicting days away from work would help the supervisor plan for a temporary replacement. We compare the performance of all the models with the performance of a traditional logistic regression model. We divide the study into two parts. In the first part, we use the structured data (fixed fields) and unstructured data (injury narratives) separately to predict the injury outcome. We use the injury narratives because they provide more information about the accident than the fixed-field entries. We also investigate the use of a synthetic data augmentation technique based on word embedding to tackle the data imbalance problem while predicting the injury outcome from the narratives. Our experimental results show that Random Forest with narratives as the input provides the best F1 score of 0.94. DNN has the lowest root mean squared error (0.62) when predicting DAFW using injury narratives as the input. The F1 score of all the underrepresented classes except one improved after the use of the data augmentation technique.
We use the DNN model to find the features which are most important in determining injury outcome and DAFW. We found that Nature of injury is the most important predictor of injury outcome.
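The two metrics reported in this abstract, the F1 score for injury-outcome classification and the root mean squared error (RMSE) for days away from work, follow standard definitions. The sketch below computes both in plain Python; the labels and numbers in the usage part are made up for illustration and are not taken from the MSHA dataset, and the function names are hypothetical.

```python
import math


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # Illustrative labels only (hypothetical outcome classes).
    truth = ["lost_time", "lost_time", "no_lost_time", "no_lost_time"]
    preds = ["lost_time", "no_lost_time", "no_lost_time", "lost_time"]
    print(f1_score(truth, preds, positive="lost_time"))
    print(rmse([0, 0], [2, 2]))
```

Per-class F1 is the relevant view here because the abstract reports improvements for the underrepresented classes individually.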

    Design Driven Development of a Web-Enabled System for Data Mining in Arthroplasty Registry

    This research was inspired by the work at the Norwegian Arthroplasty Registry, which serves as a national resource for understanding the longevity of implanted prostheses, analyzing risks, and patient outcomes in general. At this moment, the registry has no online system that would enable several user groups to take advantage of the data for clinical, research, and informative purposes. This thesis has contributed a high-fidelity prototype of a desktop application named LeddPOR. The system is dedicated to three user groups: patients, physicians, and researchers. The project was completed in collaboration with three other master's students, comprising a back-end and front-end development team. Knut T. Hufthamer and Sølve Ånneland provided valuable data mining tasks to be incorporated in the prototype, and Arle Farsund Solheim created visualizations that allow interactive data exploration. The project followed the User-Centered Design approach as a method to produce a prototype that would be appreciated by real users. The Design Science Research methodology allowed five iterations, within which prototypes from low to high fidelity took form. The final, fully interactive prototype is intended for physicians, researchers, and patients. There are two dedicated parts: one for hip, and the other for knee. Under those, a number of data mining tasks can be performed at the convenience of the expert user. The sessions can be saved and reviewed according to users' preferences and needs. The patient part of the system offers mainly information, but also some resources such as formerly developed applications supporting post-operative care. During this development, we defined two patient personas, acknowledging their different needs. On the expert side, two personas were created, one for physicians and one for researchers. Usability testing was conducted with both expert and novice users, which suggested a high success rate.
The final System Usability Score (SUS) of 95 points, as well as feedback from evaluation, indicate a potential to develop a product that could be valuable for several user groups. Master's thesis in information science (INFO390, MASV-INF)
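For readers unfamiliar with how a SUS figure such as the 95 points above is obtained, the sketch below applies the standard scoring of the ten-item System Usability Scale questionnaire: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. It assumes the thesis used the standard ten-item form with 1-5 responses, which the abstract does not state; the responses in the usage part are invented for illustration.

```python
def sus_score(responses):
    """Score one respondent's ten SUS answers (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("each response must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical respondent: near-ideal answers on every item.
    print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 4, 2]))
```

A study-level score is typically the mean of the per-respondent scores.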