Co-Occurring Disorders: An Outpatient Latent Class Analysis
Over the past 20 years, researchers and health care practitioners have come to realize that, in addition to having high prevalence rates, individuals with co-occurring disorders do not represent a homogeneous group (Drake et al., 1998, 2001; Lehman et al., 1994, 2000; Mueser et al., 2000). This heterogeneity is essential to consider when evaluating new treatment modalities, and identifying these differences becomes pivotal for treatment approaches and program goals. Research shows that the heterogeneity of treatment populations can be reduced by forming empirically derived homogeneous groups based on multivariate analysis (Ries et al., 1993; Lehman et al., 2000; Mueser et al., 2000).
The purpose of the current study was to address a significant gap in knowledge about the heterogeneity of co-occurring disorders by determining whether homogeneous subgroups exist within an outpatient population presenting for treatment and, if so, how many groups exist and what characterizes group membership. Identifying subgroups provides a mechanism for better understanding the interrelationships among the determinants that contribute to etiology and problem severity at the individual and group levels. Furthermore, in an effort to improve service delivery, empirically derived subgroups hold important clinical implications for treatment models.
This exploratory research was conducted as a retrospective latent class analysis (LCA) seeking a parsimonious model of subgroups among individuals with co-occurring disorders entering an outpatient program. The best-fitting statistical model was one in which the overall sample comprised three subgroups: a three-class model that included alcohol use, illegal drug use, education level, and serious depression was identified as best fitting the data.
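The model-selection step described above (choosing how many latent classes best fit the data) can be illustrated with a small, hypothetical sketch. scikit-learn ships no latent class analysis for categorical indicators, so a GaussianMixture over synthetic continuous scores stands in for the LCA here; the three planted clusters play the role of the three empirically derived subgroups, and the data, variables, and class count are assumptions for illustration only.

```python
# Hypothetical sketch: selecting the number of latent classes by an
# information criterion. A GaussianMixture on synthetic data stands in
# for a categorical-indicator LCA; the loop structure is the point.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for intake indicators (e.g. alcohol use, drug use,
# education level, depression), coded here as continuous scores.
X = np.vstack([
    rng.normal(0, 1, (100, 4)),
    rng.normal(3, 1, (100, 4)),
    rng.normal(-3, 1, (100, 4)),
])

bics = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)          # lower BIC = better-fitting model

best_k = min(bics, key=bics.get)
```

With well-separated synthetic clusters the criterion recovers the planted three-class structure, mirroring the study's result that a three-class model fit best.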
New Statistical Algorithms for the Analysis of Mass Spectrometry Time-Of-Flight Mass Data with Applications in Clinical Diagnostics
Mass spectrometry (MS) based techniques have emerged as a standard for large-scale protein analysis. Ongoing progress in terms of more sensitive machines and improved data analysis algorithms has led to a constant expansion of its fields of application. Recently, MS was introduced into clinical proteomics with the prospect of early disease detection using proteomic pattern matching. Analyzing biological samples (e.g. blood) by mass spectrometry generates mass spectra that represent the components (molecules) contained in a sample as masses together with their respective relative concentrations.
In this work, we are interested in those components that are constant within a group of individuals but differ markedly between individuals of two distinct groups. These distinguishing components, which depend on a particular medical condition, are generally called biomarkers. Since not all biomarkers found by the algorithms are of equal (discriminating) quality, we are only interested in a small biomarker subset that, as a combination, can be used as a fingerprint for a disease. Once a fingerprint for a particular disease (or medical condition) is identified, it can be used in clinical diagnostics to classify unknown spectra.
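The selection idea can be sketched on synthetic data: rank mass bins by how strongly they separate two groups, then keep only a small subset as the fingerprint. The per-bin t-test and all data below are illustrative stand-ins, not the (far more sensitive) methods developed in this thesis.

```python
# Illustrative sketch on hypothetical spectra: 200 mass bins, two
# groups of 30 samples, three bins carrying a true group difference.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_bins = 200
healthy = rng.normal(10, 2, (30, n_bins))
disease = rng.normal(10, 2, (30, n_bins))
disease[:, [5, 42, 137]] += 6        # three informative mass bins

# Rank bins by discriminating power; keep the best few as a fingerprint.
t, p = ttest_ind(healthy, disease, axis=0)
fingerprint = np.argsort(p)[:3]
```

A classifier restricted to the fingerprint bins would then be used to label unknown spectra.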
In this thesis we have developed new algorithms for the automatic extraction of disease-specific fingerprints from mass spectrometry data. Special emphasis has been put on designing methods that are highly sensitive with respect to signal detection. Thanks to our statistically based approach, our methods are able to detect signals, such as hormones, that lie even below the noise level inherent in data acquired by common MS machines.
To provide collaborating groups with access to these new classes of algorithms, we have created a web-based analysis platform that offers all necessary interfaces for data transfer, data analysis, and result inspection.
To prove the platform's practical relevance, it has been used in several clinical studies, two of which are presented in this thesis. These studies showed that our platform is superior to commercial systems with respect to fingerprint identification. As an outcome of these studies, several fingerprints for different cancer types (bladder, kidney, testicle, pancreas, colon, and thyroid) have been detected and validated. Indeed, the clinical partners emphasize that these results would have been impossible to obtain with a less sensitive analysis tool (such as the currently available systems).
In addition to the issue of reliably finding and handling signals in noise, we faced the problem of handling very large amounts of data: an average dataset for one individual is about 2.5 gigabytes in size, and we have data from hundreds to thousands of persons. To cope with these large datasets, we developed a new framework for a heterogeneous (quasi) ad-hoc Grid: an infrastructure that allows the integration of thousands of computing resources (e.g. desktop computers, computing clusters, or specialized hardware such as IBM's Cell processor in a PlayStation 3).
Security and Privacy in Network Naming
Security and privacy are now at the forefront of modern concerns and drive a significant part of the debate on the digital society. One aspect with significant bearing on both topics is the naming of resources in the network, because it directly impacts how networks work, how security mechanisms are implemented, and what the privacy implications of metadata disclosure are. This issue is further exacerbated by interoperability mechanisms that make this information increasingly available beyond its intended scope.
This work focuses on the implications of naming for security and privacy in the namespaces used by network protocols, in particular on the implementation of solutions that provide additional security through naming policies or that increase privacy. To achieve this, different techniques are used either to embed security information in existing namespaces or to minimise privacy exposure. The former allows bootstrapping secure transport protocols on top of insecure discovery protocols, while the latter introduces privacy policies as part of name assignment and resolution.
The main vehicle for the implementation of these solutions is general-purpose protocols and services; however, there is a strong parallel with ongoing research topics that leverage name resolution systems for interoperability, such as the Internet of Things (IoT) and Information Centric Networks (ICN), where these approaches are also applicable.
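One way to make the first idea concrete is a self-certifying name, where the name itself commits to the owner's public key, so a peer found through an insecure discovery protocol can be authenticated afterwards. This is a hypothetical sketch of that general technique, not the thesis's actual scheme; the label, hash truncation, and `.local` suffix are illustrative assumptions.

```python
# Hypothetical sketch: embed security information (a key fingerprint)
# directly in a name, so discovery over an insecure protocol can still
# bootstrap an authenticated connection.
import hashlib

def self_certifying_name(public_key: bytes, label: str = "printer") -> str:
    # The name commits to the key via a truncated SHA-256 fingerprint.
    digest = hashlib.sha256(public_key).hexdigest()[:16]
    return f"{label}-{digest}.local"

def verify(name: str, public_key: bytes) -> bool:
    # Recompute the fingerprint from the presented key and compare it
    # against the fingerprint carried inside the name.
    claimed = name.split("-")[-1].split(".")[0]
    return claimed == hashlib.sha256(public_key).hexdigest()[:16]

key = b"-----BEGIN PUBLIC KEY----- example key material"
name = self_certifying_name(key)
```

A peer that learns `name` via an insecure discovery protocol can later demand the key, check `verify(name, key)`, and only then establish a secure transport on top.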
From Dataveillance to Data Economy: Firm View on Data Protection
The increasing availability of electronic records and the expanded reliance on online communications and services have made available a huge amount of data about people's behaviours, characteristics, and preferences. Advancements in data processing technology, known as big data, offer opportunities to increase organisational efficiency and competitiveness. Analytically sophisticated companies excel in their ability to extract value from the analysis of digital data. However, in order to exploit the potential economic benefits produced by big data and analytics, issues of data privacy and information security need to be addressed. In Europe, organisations processing personal data are required to implement basic data protection principles, which are considered difficult to implement in big data environments. Little is known in the privacy studies literature about how companies manage the trade-off between data usage and data protection. This study contributes to exploring the corporate data privacy environment by focusing on the interrelationship between the data protection legal regime, the application of big data analytics to achieve corporate objectives, and the creation of an organisational privacy culture. It also draws insights from surveillance studies, particularly the idea of dataveillance, to identify potential limitations of the current legal privacy regime. The findings from the analysis of survey data show that big data and data protection support each other, but also that frictions can emerge around data collection and data fusion. The demand for the integration of different data sources poses challenges to the implementation of data protection principles. However, this study finds no evidence that data protection laws prevent data gathering. Implications relevant to the debate on the reform of European data protection law are also drawn from these findings.
Computing and estimating information leakage with a quantitative point-to-point information flow model
Information leakage occurs when a system exposes its secret information to an unauthorised entity. Information flow analysis is concerned with tracking flows of information through systems to determine whether they process information securely or leak information.
We present a novel information flow model that permits an arbitrary amount of secret and publicly-observable information to occur at any point and in any order in a system. This is an improvement over previous models, which generally assume that systems process a single piece of secret information present before execution and produce a single piece of publicly-observable information upon termination. Our model precisely quantifies the information leakage from secret to publicly-observable values at user-defined points - hence, a "point-to-point" model - using the information-theoretic measures of mutual information and min-entropy leakage; it is ideal for analysing systems of low to moderate complexity.
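On a toy channel, both measures can be computed directly from the prior over secrets and the conditional distribution of observables. The sketch below is purely illustrative (the prior and channel matrix are invented numbers, and the model in this thesis handles far richer systems): mutual information averages the information gained per observation, while min-entropy leakage compares an attacker's best guess before and after observing the output.

```python
# Toy channel: secret X (row) produces observable Y (column) with the
# probabilities in C. How much does Y reveal about X?
import math

prior = [0.5, 0.5]            # uniform prior over two secret values
C = [[0.9, 0.1],              # P(Y | X): rows are secrets
     [0.2, 0.8]]

def mutual_information(prior, C):
    p_y = [sum(prior[x] * C[x][y] for x in range(len(prior)))
           for y in range(len(C[0]))]
    mi = 0.0
    for x, px in enumerate(prior):
        for y, py in enumerate(p_y):
            joint = px * C[x][y]
            if joint > 0:
                mi += joint * math.log2(joint / (px * py))
    return mi

def min_entropy_leakage(prior, C):
    v_prior = max(prior)                              # best blind guess
    v_post = sum(max(prior[x] * C[x][y] for x in range(len(prior)))
                 for y in range(len(C[0])))           # best guess given Y
    return math.log2(v_post / v_prior)

mi = mutual_information(prior, C)
mel = min_entropy_leakage(prior, C)
```

A noiseless channel (identity matrix) would leak the full 1 bit under both measures; the noisy channel above leaks strictly less.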
We also present a relaxed version of our information flow model that estimates, rather than computes, the measures of mutual information and min-entropy leakage by sampling a system. We use statistical techniques to bound the accuracy of the estimates this model provides. We demonstrate that our relaxed model is better suited to analysing complex systems by implementing it in a quantitative information flow analysis tool for Java programs.
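The estimation idea can be sketched on a toy black-box system: sample secret/observable pairs, build the empirical joint distribution, and plug it into the mutual-information formula. The `system` function, prior, and sample count below are hypothetical, and the simple plug-in estimator shown here lacks the statistical accuracy bounds the relaxed model provides.

```python
# Hypothetical sketch: estimate leakage by sampling a black box rather
# than enumerating its channel matrix. Accuracy improves with samples.
import math
import random
from collections import Counter

def system(secret):
    # Toy system under analysis: leaks its secret through a noisy output.
    return secret if random.random() < 0.9 else 1 - secret

random.seed(0)
n = 20_000
samples = Counter()
for _ in range(n):
    x = random.randint(0, 1)          # draw a secret from a uniform prior
    samples[(x, system(x))] += 1

def empirical_mi(samples, n):
    # Plug-in estimator over the empirical joint distribution.
    px, py = Counter(), Counter()
    for (x, y), c in samples.items():
        px[x] += c
        py[y] += c
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in samples.items())

mi_hat = empirical_mi(samples, n)
```

For this toy channel the true mutual information is 1 - H(0.1) ≈ 0.53 bits, and the estimate converges towards it as n grows.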
Approximate Data Analytics Systems
Today, most modern online services make use of big data analytics systems to extract useful information from raw digital data. The data normally arrives as a continuous stream at high speed and in huge volumes, and the cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical because the data grows exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications can tolerate an approximate, rather than an exact, output. Unlike traditional computing, approximate computing trades lower accuracy for lower latency by computing over a partial subset instead of the entire input data. Unfortunately, advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed, large-scale stream data to achieve low latency and efficient utilization of resources.
To achieve these goals, we have designed and built the following approximate data analytics systems:
• StreamApprox—a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and has an ability to adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.
• IncApprox—a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high-throughput and low-latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
• PrivApprox—a data stream analytics system for privacy-preserving and approximate computing. This system supports high utility and low-latency data analytics and preserves user’s privacy at the same time. The system is based on the combination of privacy-preserving data analytics and approximate computing.
• ApproxJoin—an approximate distributed join system. This system improves the performance of joins, which are critical but expensive operations in big data systems. In this system, we employed a sketching technique (a Bloom filter) to avoid shuffling non-joinable data items through the network, and we proposed a novel sampling mechanism that executes during the join to obtain an unbiased, representative sample of the join output.

Our evaluation, based on micro-benchmarks and real-world case studies, shows that these systems achieve significant performance speedups compared to state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems allow users to systematically trade off accuracy against throughput/latency, and they require no or only minor modifications to existing applications.
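The per-stratum reservoir idea behind StreamApprox can be sketched in simplified form: keep one fixed-size reservoir per stratum (e.g. per sensor or sub-stream) so that fast strata cannot crowd out slow ones. This uses classic Algorithm R within each stratum, whereas the actual system uses an online adaptive variant; the stratum names and sizes are hypothetical.

```python
# Simplified stratified reservoir sampling: a uniform sample per
# stratum, maintained online over an unbounded stream.
import random
from collections import defaultdict

class StratifiedReservoir:
    def __init__(self, per_stratum_size, seed=0):
        self.k = per_stratum_size
        self.rng = random.Random(seed)
        self.reservoirs = defaultdict(list)
        self.seen = defaultdict(int)

    def add(self, stratum, item):
        # Algorithm R within each stratum: keep item i with prob. k/i.
        self.seen[stratum] += 1
        r = self.reservoirs[stratum]
        if len(r) < self.k:
            r.append(item)
        else:
            j = self.rng.randrange(self.seen[stratum])
            if j < self.k:
                r[j] = item

sampler = StratifiedReservoir(per_stratum_size=5)
for i in range(1000):
    # A skewed stream: "sensor-A" emits 9x more items than "sensor-B".
    sampler.add("sensor-A" if i % 10 else "sensor-B", i)
```

Even though sensor-A dominates the stream, sensor-B retains its own full-size uniform sample, which is the property that keeps the approximate aggregates' error bounded per stratum.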
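The Bloom-filter pre-filtering in ApproxJoin can be illustrated with a minimal hand-rolled filter (the parameters and keys below are hypothetical): build the filter over one side's join keys, then drop the other side's rows that cannot join before any shuffle. False positives may let a few extra rows through; false negatives cannot occur, so no joinable row is lost.

```python
# Minimal Bloom filter used to skip shuffling non-joinable rows.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # k independent hash positions derived from salted SHA-256.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits >> pos & 1 for pos in self._positions(key))

left_keys = {"u1", "u2", "u3"}        # join keys present on the left side
bf = BloomFilter()
for key in left_keys:
    bf.add(key)

right = [("u1", 10), ("u9", 99), ("u3", 30)]
# Only rows that might join are shipped over the network.
shippable = [row for row in right if bf.might_contain(row[0])]
```

The filter itself is tiny compared to the data, so broadcasting it costs far less than shuffling the rows it eliminates.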
TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS
With the exponential growth of network-based applications globally, there has been a transformation in organizations' business models. Furthermore, cost reductions in both computational devices and internet access have led people to become more technology dependent. Consequently, due to the inordinate use of computer networks, new risks have emerged, making the process of improving the speed and accuracy of security mechanisms crucial. Although abundant new security tools have been developed, the rapid growth of malicious activities continues to be a pressing issue, as ever-evolving attacks continue to create severe threats to network security. Classical security techniques (for instance, firewalls) are used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such intrusive network activities. Machine Learning is one of the practical approaches to intrusion detection: it learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature. Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. A number of such datasets (DARPA, KDDCUP, and NSL-KDD) have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. Based on several studies, many such datasets are outworn and unreliable to use.
Furthermore, some of these datasets suffer from a lack of traffic diversity and volume, do not cover the variety of attacks, have anonymized packet information and payloads that cannot reflect current trends, or lack feature sets and metadata. This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensions (namely, feature selection, sensitivity to hyper-parameter selection, and class imbalance problems) that are inherent to intrusion detection. It also produces a new, reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal traffic and different classes of attacks, and reflects current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation summarizing the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issues of detection accuracy and false alarm rate in intrusion detection systems.
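A hedged sketch of such a stacking ensemble, using scikit-learn with synthetic, class-imbalanced data standing in for the GTCS flow records: the base learners shown are illustrative choices with different learning paradigms, not necessarily the classifiers evaluated in the thesis, and the meta-learner is a plain logistic regression.

```python
# Illustrative stacking ensemble for intrusion detection: heterogeneous
# base learners, a logistic-regression meta-learner, imbalanced classes
# (mostly benign flows, few attacks) as in real network traffic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for flow features; 10% "attack" class.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
accuracy = stack.score(X_te, y_te)
```

With imbalanced traffic, accuracy alone is misleading (always predicting "benign" already scores 0.9 here), which is why the thesis also targets the false alarm rate as an explicit objective.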
Confidential Data-Outsourcing and Self-Optimizing P2P-Networks: Coping with the Challenges of Multi-Party Systems
This work addresses the inherent lack of control and trust in multi-party systems using the examples of the Database-as-a-Service (DaaS) scenario and public Distributed Hash Tables (DHTs). In the DaaS field, it is shown how confidential information in a database can be protected while still allowing the external storage provider to process incoming queries. For public DHTs, it is shown how these highly dynamic systems can be managed by facilitating monitoring, simulation, and self-adaptation.