Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
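The first challenge above, the curse of dimensionality, arises because multi-omics integration typically concatenates feature blocks from each modality while the number of samples stays fixed. A minimal sketch (all dimensions are illustrative assumptions, not figures from the review):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort: 100 samples, three omics modalities.
n_samples = 100
genome = rng.random((n_samples, 20000))        # e.g., SNP features
transcriptome = rng.random((n_samples, 15000))  # gene-expression features
proteome = rng.random((n_samples, 5000))        # protein-abundance features

# Naive integration: concatenate feature blocks sample-wise.
X = np.concatenate([genome, transcriptome, proteome], axis=1)
print(X.shape)  # (100, 40000): far more features than samples
```

With 40,000 features and only 100 samples, standard estimators overfit badly, which is why the review's specialized approaches (feature selection, dimensionality reduction, regularization) are needed.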
Common Information and Decentralized Inference with Dependent Observations
Wyner's common information was originally defined for a pair of dependent discrete random variables. This thesis generalizes its definition in two directions: the number of dependent variables can be arbitrary, as can the alphabets of those random variables. New properties are determined for the generalized Wyner's common information of multiple dependent variables. More importantly, a lossy source coding interpretation of Wyner's common information is developed using the Gray-Wyner network. It is established that the common information equals the smallest common message rate when the total rate is arbitrarily close to the rate distortion function with joint decoding, provided the distortions are within some distortion region.
The application of Wyner's common information to inference problems is also explored in the thesis. A central question is under what conditions Wyner's common information captures the entire information about the inference object. Under a simple Bayesian model, it is established that for infinitely exchangeable random variables the common information is asymptotically equal to the information of the inference object. For finitely exchangeable random variables, connections between common information and inference performance metrics are also established.
The problem of decentralized inference is generally intractable with conditionally dependent observations. A promising approach to this problem is to utilize a hierarchical conditional independence model. Using this model, we identify a more general condition under which the distributed detection problem becomes tractable, thereby broadening the classes of distributed detection problems with dependent observations that can be readily solved.
We then develop the sufficiency principle for data reduction for decentralized inference. For parallel networks, the hierarchical conditional independence model is used to obtain conditions such that local sufficiency implies global sufficiency. For tandem networks, the notion of conditional sufficiency is introduced and the related theory and tools are developed. Connections between the sufficiency principle and distributed source coding problems are also explored. Furthermore, we examine the impact of quantization on decentralized data reduction. The conditions under which sufficiency-based data reduction with quantization constraints is optimal are identified. They include the case where the data at decentralized nodes are conditionally independent, as well as a class of problems with conditionally dependent observations that admit a conditional independence structure through the hierarchical conditional independence model.
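For reference, the classical two-variable quantity that the thesis generalizes can be stated as follows; this is the standard definition, not a formula from the abstract itself. Wyner's common information of a pair (X_1, X_2) is the minimum rate of an auxiliary variable W that renders the pair conditionally independent:

```latex
C(X_1, X_2) \;=\; \min_{p(w \mid x_1, x_2)\,:\; X_1 - W - X_2} I(X_1, X_2; W)
```

where X_1 - W - X_2 denotes the Markov chain condition, i.e., X_1 and X_2 are conditionally independent given W.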
Security and Privacy Dimensions in Next Generation DDDAS/Infosymbiotic Systems: A Position Paper
The omnipresent pervasiveness of personal devices will expand the applicability of the Dynamic Data Driven Application Systems (DDDAS) paradigm in innumerable ways. While every single smartphone or wearable device is potentially a sensor with powerful computing and data capabilities, privacy and security in the context of human participants must be addressed to leverage the vast possibilities of dynamic data driven application systems. We propose a security and privacy preserving framework for next generation systems that harnesses the full power of the DDDAS paradigm while (1) ensuring provable privacy guarantees for sensitive data; (2) enabling field-level, intermediate, and central hierarchical feedback-driven analysis for both data volume mitigation and security; and (3) intrinsically addressing uncertainty caused either by measurement error or security-driven data perturbation. These thrusts will form the foundation for secure and private deployments of the large scale hybrid participant-sensor DDDAS systems of the future.
Arbitrarily Strong Utility-Privacy Tradeoff in Multi-Agent Systems
Each agent in a network makes a local observation that is linearly related to a set of public and private parameters. The agents send their observations to a fusion center to allow it to estimate the public parameters. To prevent leakage of the private parameters, each agent first sanitizes its local observation using a local privacy mechanism before transmitting it to the fusion center. We investigate the utility-privacy tradeoff in terms of the Cramér-Rao lower bounds for estimating the public and private parameters. We study the class of privacy mechanisms given by linear compression and noise perturbation, and derive necessary and sufficient conditions for achieving an arbitrarily strong utility-privacy tradeoff in a multi-agent system, both when prior information is available and when it is not. We also provide a method to find the maximum estimation privacy achievable without compromising the utility, and propose an alternating algorithm to optimize the utility-privacy tradeoff in the case where an arbitrarily strong utility-privacy tradeoff is not achievable.
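The linear-compression-plus-noise class of mechanisms studied in this paper can be sketched as follows. All dimensions, matrices, and the noise level below are illustrative assumptions, not values or design choices from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (hypothetical): each agent observes a vector linearly
# related to public parameters theta and private parameters phi.
n_obs, n_pub, n_priv = 6, 2, 2
A = rng.standard_normal((n_obs, n_pub))    # public mixing matrix
B = rng.standard_normal((n_obs, n_priv))   # private mixing matrix
theta = rng.standard_normal(n_pub)         # public parameters (utility)
phi = rng.standard_normal(n_priv)          # private parameters (to protect)
x = A @ theta + B @ phi                    # agent's local observation

# Privacy mechanism: compress to a lower dimension, then add noise.
k = 3                                       # compressed dimension
C = rng.standard_normal((k, n_obs))         # linear compression matrix
sigma = 0.1                                 # perturbation noise level
y = C @ x + sigma * rng.standard_normal(k)  # sanitized observation sent on
print(y.shape)  # (3,)
```

The paper's conditions characterize how C and the noise covariance should be chosen so that the fusion center's Cramér-Rao bound on theta stays small while the bound on phi can be made arbitrarily large; this sketch only shows the shape of the mechanism.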
Role of artificial intelligence in cloud computing, IoT and SDN: Reliability and scalability issues
Information technology is increasingly dominated by artificial intelligence, which plays a key role in providing better services. The inherent strengths of artificial intelligence are driving companies into a modern, decisive, secure, and insight-driven arena to address current and future challenges. Key technologies such as cloud computing, the internet of things (IoT), and software-defined networking (SDN) are emerging as future applications and rendering benefits to society. Integrating artificial intelligence with these technologies at scale brings a new level of efficiency. Data generated by heterogeneous devices are received, exchanged, stored, managed, and analyzed to automate and improve the performance of the overall system and make it more reliable. These new technologies are not free of limitations, however, and their synthesis poses many challenges in terms of scalability and reliability. Therefore, this paper discusses the role of artificial intelligence (AI), along with the issues and opportunities involved in integrating these technologies, from the standpoint of reliability and scalability. The paper puts forward future directions related to scalability and reliability concerns during the integration of the above-mentioned technologies and aims to enable researchers to address the current research gaps.
Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.
Curious Negotiator
In negotiation, the exchange of information is as important as the exchange of offers. The curious negotiator is a multiagent system with three types of agents. Two negotiation agents, each representing an individual, develop consecutive offers, supported by information, whilst requesting information from its opponent. A mediator agent, with experience of prior negotiations, suggests how the negotiation may develop. A failed negotiation is a missed opportunity. An observer agent analyses failures, looking for new opportunities. The integration of negotiation theory and data mining enables the curious negotiator to discover and exploit negotiation opportunities. Trials will be conducted in electronic business.