Pin-Align
To date, few tools for aligning protein-protein interaction networks have been suggested. These tools typically find conserved interaction patterns using various local or global alignment algorithms. However, improving the speed, scalability, simplicity, and accuracy of network alignment tools remains an active area of research. In this paper, we introduce Pin-Align, a new tool for local alignment of protein-protein interaction networks. Pin-Align's accuracy is tested on protein interaction networks from IntAct, DIP, and the Stanford Network Database, and the results are compared with other well-known algorithms. It is shown that Pin-Align has higher sensitivity and specificity in terms of KEGG Ortholog groups.
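The evaluation described above can be illustrated with a small sketch. The protein names, reference groups, and the exact scoring protocol below are hypothetical simplifications, not Pin-Align's actual procedure: a candidate aligned pair counts as a true positive when both proteins share a reference ortholog group.

```python
# Hypothetical sketch: scoring predicted aligned protein pairs against
# reference ortholog groups. All data and the scoring rule are illustrative.

def in_same_group(pair, groups):
    """True if both proteins of the pair share a reference ortholog group."""
    return any(pair[0] in g and pair[1] in g for g in groups)

def sensitivity_specificity(predicted_pairs, candidate_pairs, groups):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A 'positive' candidate pair is one whose proteins co-occur in a
    reference ortholog group."""
    tp = fp = tn = fn = 0
    predicted = set(predicted_pairs)
    for pair in candidate_pairs:
        correct = in_same_group(pair, groups)
        chosen = pair in predicted
        if correct and chosen:
            tp += 1
        elif correct:
            fn += 1
        elif chosen:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```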
Automatic Detection of Emotions and Distress in Textual Data
Online data can be analyzed for many purposes, including stock market prediction and business and political planning. Online data can also be used to develop systems for the automatic emotion detection and mental health assessment of users. These systems can serve as complementary measures for monitoring online forums by detecting users who are in need of attention.
In this thesis, we first present a new approach for contextual emotion detection, i.e., emotion detection in short conversations. The approach is based on a neural feature extractor, composed of a recurrent neural network with an attention mechanism, followed by a final classifier, which can be neural or SVM-based. Our experiments showed that an SVM, by providing higher and more robust performance, acts as a better final classifier than a feed-forward neural network.
We then extended our model for emotion detection and created an ensemble approach for the task of distress detection from online data. This extended approach uses several attention-based neural sub-models to extract features and predict class probabilities, which are later used as input features to a Support Vector Machine (SVM) that makes the final classification. Our experiments show that an ensemble approach that draws on different sub-models accessing diverse sources of information can improve classification in the absence of a large annotated dataset.
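The stacking step described above can be sketched minimally: each sub-model emits class probabilities, and these are concatenated into one feature vector for the final classifier. The toy sub-models below are illustrative stand-ins; in the thesis they are attention-based neural networks and the final classifier is an SVM.

```python
# Sketch of stacked-ensemble feature construction. The sub-models here are
# toy functions, not the thesis's attention-based neural networks.

def stack_features(text, sub_models):
    """Concatenate each sub-model's class-probability output into one
    feature vector for the final classifier."""
    features = []
    for model in sub_models:
        features.extend(model(text))
    return features

# Toy sub-models mapping a text to probabilities over two classes
# (distress / no-distress). Purely illustrative heuristics.
keyword_model = lambda t: [0.9, 0.1] if "help" in t else [0.2, 0.8]
length_model = lambda t: [0.7, 0.3] if len(t) > 20 else [0.4, 0.6]

features = stack_features("I really need help today",
                          [keyword_model, length_model])
# 'features' has len(sub_models) * n_classes entries and would be fed
# to the final SVM.
```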
The extended model was evaluated on two shared tasks, CLPsych 2019 and eRisk 2019, which aim at suicide risk assessment and early risk detection of anorexia, respectively. The model ranked first in tasks A and C of CLPsych 2019 (with macro-average F1 scores of 0.481 and 0.268, respectively) and first in the first task of eRisk 2019 in terms of F1 and latency-weighted F1 scores (0.71 and 0.69, respectively).
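The macro-average F1 reported for CLPsych above is the per-class F1 averaged with equal class weights. A plain-Python sketch, with illustrative labels rather than the shared-task data:

```python
# Macro-averaged F1: compute F1 per class, then average with equal weights,
# so minority classes count as much as majority ones.

def macro_f1(y_true, y_pred):
    classes = set(y_true) | set(y_pred)
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```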
Unlocking Accuracy and Fairness in Differentially Private Image Classification
Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold-standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are also believed to exhibit larger performance disparities across subpopulations, raising fairness concerns. The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry. Here we show that pre-trained foundation models fine-tuned with DP can achieve accuracy similar to non-private classifiers, even in the presence of significant distribution shifts between pre-training data and downstream tasks. We achieve private accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks. Furthermore, our private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models. This milestone toward making DP training a practical and reliable technology has the potential to enable machine learning practitioners to train safely on sensitive datasets while protecting individuals' privacy.
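The core mechanism behind DP training of the kind described above is DP-SGD: each per-example gradient is clipped to an L2 norm bound C, and Gaussian noise scaled by sigma * C is added before the parameter update. The sketch below shows that step in isolation; the hyperparameter values are illustrative, not those used in the paper.

```python
# Minimal sketch of one DP-SGD update step (illustrative values only).
import math
import random

def clip(grad, C):
    """Rescale a per-example gradient so its L2 norm is at most C."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, C=1.0, sigma=1.0, lr=0.1, rng=random):
    """One noisy update: average the clipped per-example gradients,
    add N(0, (sigma * C)^2) noise to the sum, and step."""
    n = len(per_example_grads)
    clipped = [clip(g, C) for g in per_example_grads]
    noisy = [
        sum(g[i] for g in clipped) / n + rng.gauss(0.0, sigma * C) / n
        for i in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy)]
```

Clipping bounds each individual's influence on the update, which is what makes the formal privacy accounting possible; the noise then masks any single example's contribution.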
Biological investigation and predictive modelling of foaming in anaerobic digester
Anaerobic digestion (AD) of waste has been identified as a leading technology for greener renewable energy generation as an alternative to fossil fuels. AD reduces waste through biochemical processes, converting it to biogas, which can be used as a source of renewable energy, while the residual bio-solids can be used to enrich the soil. A problem with AD, however, is foaming and the associated biogas loss. Tackling this problem effectively requires identifying and controlling the factors that trigger and promote foaming. In this research, laboratory experiments were first carried out to differentiate the causal and exacerbating factors of foaming. The impact of the identified causal factors, organic loading rate (OLR) and volatile fatty acids (VFA), on foaming occurrence was then monitored and recorded. Further analysis of foaming and non-foaming sludge samples by metabolomics techniques confirmed that OLR and VFA are the prime causes of foaming in AD. In addition, metagenomics analysis showed that the phyla Bacteroidetes and Proteobacteria were predominant, with relative abundances of 30% and 29% respectively, while the phylum Actinobacteria, which includes the most prominent filamentous foam-causing bacteria such as Nocardia amarae and Microthrix parvicella, had a very low and consistent relative abundance of 0.9%, indicating that foaming in the AD studied was not triggered by the presence of filamentous bacteria. Consequently, data-driven models to predict foam formation were developed from the experimental data, with OLR and VFA in the feed as inputs and foaming occurrence as the output. The models were extensively validated and assessed using the mean squared error (MSE), root mean squared error (RMSE), R2, and mean absolute error (MAE). A Levenberg-Marquardt neural network model proved to be the best model for foaming prediction in AD, with RMSE = 5.49, MSE = 30.19, and R2 = 0.9435.
The significance of this study is the development of a parsimonious and effective modelling tool that enables AD operators to proactively avert foaming, as the two model input variables (OLR and VFA) can easily be adjusted through a simple programmable logic controller.
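The validation metrics quoted above are standard regression metrics and can be sketched in plain Python. The inputs below are illustrative, not the study's foaming data:

```python
# Standard regression metrics: MSE, RMSE, MAE, and R2 (coefficient of
# determination). Used here as in the model validation described above.
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # R2 = 1 - SS_res / SS_tot: fraction of variance explained by the model.
    r2 = 1 - sum(e * e for e in errors) / ss_tot if ss_tot else float("nan")
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```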
Privacy Literacy 2.0: A Three-Layered Approach Comprehensive Literature Review
With technological advancement, privacy has become a concept that is difficult to define, understand, and research. Social networking sites, as an example of technological advancement, have blurred the lines between physical and virtual spaces. Sharing and self-disclosure with our networks of people, or at times with strangers, is becoming a socially accepted norm. However, the vast sharing of personal data with others on social networking sites engenders concern over data loss, concern about unintended audiences, and an opportunity for mass surveillance.
Through a dialectical pluralism lens and following the comprehensive literature review methodological framework, the purpose of this study was to map and define what it means to be a privacy-literate citizen. The goal was to inform privacy research and educational practices.
The findings of this study revealed that placing the sole responsibility for managing privacy on the individual user is an inefficient model. Users are guided by unmasked and hidden software practices that they do not fully comprehend. Another finding was the noticeable increase in citizen targeting and liquified surveillance, which are accepted practices in society. Liquified surveillance takes any shape; it is both concrete and discrete; and it happens through complete profile data collection as well as raw data aggregation.
Privacy management, as a research model or management approach, neither prevents data from leaking nor stops surveillance. For privacy to be successful, privacy engineering should include citizens' opinions and require high levels of data transparency before any data-collection software is designed. The implications of this study show that privacy literacy 2.0 is a combination of several interconnected skills, such as knowledge of the law, software, platform architecture, and the psychology of self-disclosure.
Generating Reliable and Responsive Observational Evidence: Reducing Pre-analysis Bias
A growing body of evidence generated from observational data has demonstrated the potential to influence decision-making and improve patient outcomes. For observational evidence to be actionable, however, it must be generated reliably and in a timely manner. Large distributed observational data networks enable research on diverse patient populations at scale and support the development of sound new methods to improve the reproducibility and robustness of real-world evidence. Nevertheless, the problems of generalizability, portability, and scalability persist and compound. As analytical methods only partially address bias, reliable observational research (especially in networks) must address bias at the design stage (i.e., pre-analysis bias), including the strategies for identifying patients of interest and defining comparators.
This thesis synthesizes and enumerates a set of challenges in addressing pre-analysis bias in observational studies and presents mixed-methods approaches and informatics solutions for overcoming a number of those obstacles. We develop frameworks, methods, and tools for scalable and reliable phenotyping, including data source granularity estimation, comprehensive concept set selection, index date specification, and structured-data-based patient review for phenotype evaluation. We also cover potential bias in the unexposed comparator definition, including systematic background rate estimation and interpretation, and the definition and evaluation of the unexposed comparator.
We propose that the use of standardized approaches and methods as described in this thesis not only improves the reliability but also increases the responsiveness of observational evidence. To test this hypothesis, we designed and piloted a Data Consult Service, a service that generates new on-demand evidence at the bedside. We demonstrate that it is feasible to generate reliable evidence to address clinicians' information needs in a robust and timely fashion, and we provide an analysis of the current limitations and the future steps needed to scale such a service.
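One small building block of the background-rate work mentioned above can be sketched under simple assumptions: a crude background incidence rate estimated as events per unit of person-time, conventionally reported per 1,000 person-years. The figures below are hypothetical, and this omits the stratification and calibration a systematic estimation would involve.

```python
# Crude background incidence rate: events divided by person-time at risk,
# scaled to a conventional denominator. Hypothetical numbers only.

def incidence_rate(n_events, person_years, per=1000):
    """Events per `per` person-years of follow-up."""
    return n_events / person_years * per

rate = incidence_rate(n_events=12, person_years=4800)
# 12 events over 4,800 person-years is a rate of 2.5 per 1,000 person-years.
```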