2,440 research outputs found

    Digital Stylometry: Linking Profiles Across Social Networks

    There is an ever-growing number of users with accounts on multiple social media and networking sites. Consequently, there is increasing interest in matching user accounts and profiles across different social networks in order to create aggregate profiles of users. In this paper, we present models for Digital Stylometry, a method for matching users through stylometry-inspired techniques. We experimented with linguistic, temporal, and combined temporal-linguistic models for matching user accounts, using standard and novel techniques. Using publicly available data, our best model, a combined temporal-linguistic one, correctly matched the accounts of 31% of 5,612 distinct users across Twitter and Facebook. Comment: SocInfo'15, Beijing, China. In proceedings of the 7th International Conference on Social Informatics (SocInfo 2015). Beijing, Chin

    MODELLING PRICE DYNAMICS IN THE HONG KONG PROPERTY MARKET

    The property market in Hong Kong plays an important role in the political, social and economic life of this vibrant city. Understanding the dynamics of the market is essential to guide government policy making and investment decisions. Using data collected between 1993 and 2006, this study investigates the monthly returns, volatilities, and time-varying correlations in the residential, office, and retail property markets in Hong Kong. A vector autoregressive (VAR) model is used to examine the conditional mean, and a multivariate generalized autoregressive conditional heteroscedasticity (MGARCH) model is adopted to analyze the conditional variance. The dynamic conditional correlation (DCC) approach is used to specify the MGARCH model. All of the property types show strong auto- and cross-correlations, which indicates that the sectors relate to each other closely. All three sectors have higher volatilities when major political and economic events occur. The findings reveal the possibility of balancing investment portfolios between the three sectors in the Hong Kong property market. However, exposure to the residential sector may reduce the chance of investment diversification because of the higher correlation of this sector with the other property sectors. Keywords: return, volatility, dynamic conditional correlation.
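
The dynamic conditional correlation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook DCC recursion, Q_t = (1 - a - b) * Q_bar + a * e_{t-1} e_{t-1}' + b * Q_{t-1}, applied to standardized residuals e_t, with Q_t rescaled to a correlation matrix R_t. The function name `dcc_correlations`, the parameter values `a` and `b`, and the toy residuals are illustrative assumptions; the abstract does not state the study's exact specification or its estimated parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def dcc_correlations(std_resid: List[List[float]], a: float = 0.05, b: float = 0.90) -> List[Matrix]:
    """Return one conditional correlation matrix per time step.

    std_resid holds standardized residuals (rows = time, columns = series).
    a and b are fixed here for illustration; in practice they are estimated.
    """
    if a < 0 or b < 0 or a + b >= 1:
        raise ValueError("need a >= 0, b >= 0 and a + b < 1")
    t_len, n = len(std_resid), len(std_resid[0])
    # Q_bar: sample second-moment matrix of the standardized residuals.
    q_bar = [[sum(e[i] * e[j] for e in std_resid) / t_len for j in range(n)] for i in range(n)]
    q = [row[:] for row in q_bar]  # start the recursion at Q_bar
    out: List[Matrix] = []
    for e in std_resid:
        # Rescale Q_t to a correlation matrix R_t (unit diagonal).
        scale = [1.0 / math.sqrt(q[i][i]) for i in range(n)]
        out.append([[q[i][j] * scale[i] * scale[j] for j in range(n)] for i in range(n)])
        # Update Q for the next period using the current residual vector.
        q = [[(1 - a - b) * q_bar[i][j] + a * e[i] * e[j] + b * q[i][j] for j in range(n)]
             for i in range(n)]
    return out


if __name__ == "__main__":
    toy = [[1.0, 0.5], [-1.0, -0.8], [0.5, 1.2], [-0.7, 0.1]]  # made-up residuals
    for r in dcc_correlations(toy):
        print(round(r[0][1], 4))
```

In a full MGARCH workflow the residuals would first be standardized by univariate volatility estimates, and `a` and `b` would be fitted by maximum likelihood rather than fixed.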

    Robustly detecting differential expression in RNA sequencing data using observation weights

    A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of 'sharing of information' across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robus

    Domain Adaptation under Missingness Shift

    Rates of missing data often depend on record-keeping policies and thus may change across times and locations, even when the underlying features are comparatively stable. In this paper, we introduce the problem of Domain Adaptation under Missingness Shift (DAMS). Here, (labeled) source data and (unlabeled) target data would be exchangeable but for different missing data mechanisms. We show that when missing data indicators are available, DAMS can reduce to covariate shift. Focusing on the setting where missing data indicators are absent, we establish the following theoretical results for underreporting completely at random: (i) covariate shift is violated (adaptation is required); (ii) the optimal source predictor can perform worse on the target domain than a constant one; (iii) the optimal target predictor can be identified, even when the missingness rates themselves are not; and (iv) for linear models, a simple analytic adjustment yields consistent estimates of the optimal target parameters. In experiments on synthetic and semi-synthetic data, we demonstrate the promise of our methods when assumptions hold. Finally, we discuss a rich family of future extensions

    Smart charging of electric vehicle: An innovative business model for utility firms

    Evaluating Model Performance in Medical Datasets Over Time

    Machine learning (ML) models deployed in healthcare systems must face data drawn from continually evolving environments. However, researchers proposing such models typically evaluate them in a time-agnostic manner, splitting datasets according to patients sampled randomly throughout the entire study time period. This work proposes the Evaluation on Medical Datasets Over Time (EMDOT) framework, which evaluates the performance of a model class across time. Inspired by the concept of backtesting, EMDOT simulates possible training procedures that practitioners might have been able to execute at each point in time and evaluates the resulting models on all future time points. Evaluating both linear and more complex models on six distinct medical data sources (tabular and imaging), we show that, depending on the dataset, using all historical data may be ideal in many cases, whereas using a window of the most recent data could be advantageous in others. In datasets where models suffer from sudden degradations in performance, we investigate plausible explanations for these shocks. We release the EMDOT package to help facilitate further work in deployment-oriented evaluation over time. Comment: To appear at Conference on Health, Inference, and Learning (CHIL) 2023. arXiv admin note: substantial text overlap with arXiv:2211.0716
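
The backtesting idea described in this abstract can be sketched as follows: at each simulated deployment date, train on either all history or a recent window, then score on every later period. This is an illustrative sketch only and does not use the EMDOT package's API; the names `backtest_splits` and `evaluate_over_time`, the `fit` and `score` callables, and the toy data are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def backtest_splits(time_points: Sequence[int],
                    window: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Return (train_periods, test_periods) pairs, one per simulated deployment date.

    window=None trains on all history; window=k trains on the k most recent periods.
    """
    ordered = sorted(set(time_points))
    splits = []
    for i in range(len(ordered) - 1):  # the last period has no future to test on
        history = ordered[: i + 1]
        train = history if window is None else history[-window:]
        splits.append((train, ordered[i + 1:]))
    return splits


def evaluate_over_time(data_by_period: Dict[int, list],
                       fit: Callable[[list], object],
                       score: Callable[[object, list], float],
                       window: Optional[int] = None) -> Dict[Tuple[int, int], float]:
    """Map (last training period, test period) to the score of the model trained then."""
    results = {}
    for train_periods, test_periods in backtest_splits(list(data_by_period), window):
        train_rows = [row for p in train_periods for row in data_by_period[p]]
        model = fit(train_rows)
        for p in test_periods:
            results[(train_periods[-1], p)] = score(model, data_by_period[p])
    return results


if __name__ == "__main__":
    # Toy drifting outcome; the "model" is just the training mean, scored by mean absolute error.
    data = {2019: [1.0, 1.0], 2020: [3.0, 3.0], 2021: [5.0]}
    mean = lambda rows: sum(rows) / len(rows)
    mae = lambda m, rows: sum(abs(y - m) for y in rows) / len(rows)
    print(evaluate_over_time(data, mean, mae))            # all historical data
    print(evaluate_over_time(data, mean, mae, window=1))  # most recent period only
```

Comparing the two result dictionaries for the same (training date, test date) pair gives the kind of all-history versus recent-window contrast the abstract describes.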

    Enhanced Twitter Sentiment Classification Using Contextual Information

    The rise in popularity and ubiquity of Twitter has made sentiment analysis of tweets an important and well-covered area of research. However, the 140-character limit imposed on tweets makes it hard to use standard linguistic methods for sentiment classification. On the other hand, what tweets lack in structure they make up for with sheer volume and rich metadata. This metadata includes geolocation, temporal and author information. We hypothesize that sentiment is dependent on all these contextual factors. Different locations, times and authors have different emotional valences. In this paper, we explored this hypothesis by utilizing distant supervision to collect millions of labelled tweets from different locations, times and authors. We used this data to analyse the variation of tweet sentiments across different authors, times and locations. Once we explored and understood the relationship between these variables and sentiment, we used a Bayesian approach to combine these variables with more standard linguistic features such as n-grams to create a Twitter sentiment classifier. This combined classifier outperforms the purely linguistic classifier, showing that integrating the rich contextual information available on Twitter into sentiment classification is a promising direction of research. Twitter (Firm

    The cold adapted and temperature sensitive influenza A/Ann Arbor/6/60 virus, the master donor virus for live attenuated influenza vaccines, has multiple defects in replication at the restrictive temperature

    We have previously determined that the temperature sensitive (ts) and attenuated (att) phenotypes of the cold adapted influenza A/Ann Arbor/6/60 strain (MDV-A), the master donor virus for the live attenuated influenza A vaccines (FluMist®), are specified by the five amino acids in the PB1, PB2 and NP gene segments. To understand how these loci control the ts phenotype of MDV-A, replication of MDV-A at the non-permissive temperature (39 °C) was compared with recombinant wild-type A/Ann Arbor/6/60 (rWt). The mRNA and protein synthesis of MDV-A in the infected MDCK cells were not significantly reduced at 39 °C during a single-step replication; however, vRNA synthesis was reduced and the nuclear–cytoplasmic export of viral RNP (vRNP) was blocked. In addition, the virions released from MDV-A infected cells at 39 °C exhibited irregular morphology and had a greatly reduced amount of the M1 protein incorporated. The reduced M1 protein incorporation and vRNP export blockage correlated well with the virus ts phenotype because these defects could be partially alleviated by removing the three ts loci from the PB1 gene. The virions and vRNPs isolated from the MDV-A infected cells contained a higher level of heat shock protein 70 (Hsp70) than those of rWt; however, whether Hsp70 is involved in thermal inhibition of MDV-A replication remains to be determined. Our studies demonstrate that restrictive replication of MDV-A at the non-permissive temperature occurs in multiple steps of the virus replication cycle.