
8 research outputs found

Covariance and PCA for Categorical Variables

Author: Niitsuma Hirotaka
Okada Takashi
Publication date: 28/11/2007

Covariances between categorical variables are defined using a regular simplex expression for categories. The method follows Gini's definition of variance and gives the covariance as the solution of simultaneous equations. The calculated results give reasonable values for test data. A method of principal component analysis (RS-PCA) is also proposed using regular simplex expressions, which allows easy interpretation of the principal components. The proposed methods are applied to a variable selection problem for the categorical USCensus1990 data, and they give an appropriate criterion for the variable selection problem of categorical data.
Comment: 12 pages, 5 figures
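As a rough illustration of the Gini-style variance this abstract refers to, the sketch below computes a variance for a single categorical variable as half the mean squared pairwise difference, assuming the difference between two observations is 1 when their categories differ and 0 when they match (the distance between distinct vertices of a unit-edge regular simplex). The function names are hypothetical, and this is not the paper's covariance computation, which solves simultaneous equations that the abstract does not spell out.

```python
from collections import Counter


def gini_variance_pairwise(values):
    """Half the mean squared pairwise difference, with a 0/1 category distance."""
    n = len(values)
    # Count ordered pairs (a, b) whose categories differ.
    differing = sum(1 for a in values for b in values if a != b)
    return differing / (2 * n * n)


def gini_variance_closed_form(values):
    """Equivalent closed form: (1 - sum of squared category proportions) / 2."""
    n = len(values)
    counts = Counter(values)
    return 0.5 * (1.0 - sum((c / n) ** 2 for c in counts.values()))


if __name__ == "__main__":
    sample = ["a", "a", "b", "c"]  # made-up illustrative data
    print(gini_variance_pairwise(sample))     # 0.3125
    print(gini_variance_closed_form(sample))  # 0.3125
```

Both functions return 0 when every observation falls in the same category, and the value grows as the observations spread more evenly across categories.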

arXiv.org e-Print Archive

CiteSeerX

Potential risk factors associated with human encephalitis: application of canonical correlation analysis

Author: A Flower
A Nicolosi
A Razavi
A Starza-Smith
AD Aygun
CA Glaser
Christopher Meaney
CW Gini
H Hotelling
H Kolski
H Niitsuma
I González
J Granerod
JE Resznicek
Jemila S Hamid
Joseph Beyene
Julia Granerod
K Mardia
K McGarigal
KL Davison
L Ridderstolpe
M Cizman
M Koskiniemi
M Studahl
Natasha S Crowcroft
P Cinque
P Lewis
R Darlington
R Development Core Team
RJ Light
RT Johnson
T Lee
T Okada
U Menzel
UK Misra
W Cooley
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Infection of the CNS is considered to be the major cause of encephalitis, and more than 100 different pathogens have been recognized as causative agents. Despite being identified worldwide as an important public health concern, studies on encephalitis are very few and often focus on particular types of encephalitis with respect to causative agents (e.g. West Nile, Japanese, etc.). Moreover, a number of other infectious and non-infectious conditions present with similar symptoms, and distinguishing encephalitis from these conditions continues to be a challenging task.

Methods: We used canonical correlation analysis (CCA) to assess associations between a set of exposure variables and a set of symptom and diagnostic variables in human encephalitis. The data consist of 208 confirmed cases of encephalitis from a prospective multicenter study conducted in the United Kingdom. We used a covariance matrix based on Gini's measure of similarity and used permutation-based approaches to test the significance of the canonical variates.

Results: Weak pair-wise correlation exists between the risk factor (exposure and demographic) variables and the symptom/laboratory variables. However, the first canonical variate from CCA revealed strong multivariate correlation (ρ = 0.71, se = 0.03, p = 0.013) between the two sets. We found a moderate correlation (ρ = 0.54, se = 0.02) between the variables in the second canonical variate; however, the value is not statistically significant (p = 0.68). Our results also show that a very small amount of the variation in the symptom sets is explained by the exposure variables. This indicates that host factors, rather than environmental factors, might be important for understanding the etiology of encephalitis and for facilitating early diagnosis and treatment of encephalitis patients.

Conclusions: There is no standard laboratory diagnostic strategy for the investigation of encephalitis, and even experienced physicians are often uncertain about the cause, appropriate therapy and prognosis of encephalitis. Exploration of human encephalitis data using advanced multivariate statistical modelling approaches that can capture the inherent complexity in the data is, therefore, crucial in understanding the causes of human encephalitis. Moreover, application of multivariate exploratory techniques will generate clinically important hypotheses and offer useful insight into the number and nature of variables worthy of further consideration in a confirmatory statistical analysis.
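The permutation-based significance testing mentioned in the Methods can be sketched in a simplified form. The example below is an assumption-laden stand-in: it tests an ordinary Pearson correlation between two single variables rather than a canonical correlation between two variable sets, and the function names, default permutation count and data are hypothetical. The idea carried over is only that shuffling one side breaks any real association, so the shuffled statistics approximate the null distribution.

```python
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    x = list(range(20))           # made-up illustrative data
    y = [2 * v + 1 for v in x]    # perfectly linear in x
    print(permutation_p_value(x, y))
```

For a canonical correlation, the same loop would permute the rows of one variable set and recompute the canonical correlation each time; that step is omitted here to keep the sketch dependency-free.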

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Machine Learning Methods for Social Signal Processing

Author: Nicolaou Mihalis
Pavlovic V.
Rudovic O.
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2017

Goldsmiths Research Online

Crossref

Partial Least Squares and Principal Component Analysis with Non-metric Variables for Composite Indices

Author: Yoon Jisu
Publication date: 24/04/2015

A composite index is an aggregated variable composed of individual indicators and weights, where the weights represent the relative importance of each indicator. Composite indices are often used to describe latent phenomena or to summarize complex information in a small number of variables. Choosing appropriate weights for the variables that form a composite index is of great importance. Principal component analysis (PCA) is a popular approach for deriving weights, but it is unsuitable when the informative variation accounts for only small variances of the variables in a composite index. This study therefore proposes applying Partial Least Squares (PLS), which exploits the relationship between target variables and the variables in a composite index. Our simulation study shows that PLS performs as well as PCA or substantially outperforms it. In addition, the variables in a composite index are often non-metric in practice. Such variables require special procedures before PCA or PLS can be applied. This study examines several PCA and PLS algorithms for non-metric variables from the existing literature and compares them through extensive simulation studies in order to give recommendations for practice. Dummy coding often shows satisfactory performance compared with more complicated methods. As applications, we consider wealth, globalization, gender equality and corruption, applying PCA- and PLS-based composite indices. PLS produces composite indices tailored to the respective target variables, which often performed better than PCA. A comparison of PCA and PLS weights and coefficients shows which variables are particularly relevant for the respective target variables
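Dummy coding, which this abstract reports as often performing satisfactorily, can be illustrated with a minimal sketch. It assumes the usual convention of one 0/1 column per category with one reference category dropped; the function name, the choice of the alphabetically first category as the default reference, and the data are hypothetical and not taken from the thesis.

```python
def dummy_code(values, reference=None):
    """Return (column_names, rows) of 0/1 indicators, omitting the reference category."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumed default: alphabetically first category
    columns = [c for c in categories if c != reference]
    # Each row holds one indicator per non-reference category.
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


if __name__ == "__main__":
    levels = ["low", "high", "mid", "low"]  # made-up illustrative data
    names, coded = dummy_code(levels)
    print(names)
    for row in coded:
        print(row)
```

The resulting numeric columns can then be passed to a standard PCA or PLS routine alongside the metric variables; observations in the reference category are coded as all zeros.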

Georg-August-University Göttingen

On the Viability of Quantitative Assessment Methods in Software Engineering and Software Services

Author: Lucente Joseph D.
Publication venue: Digital Commons @ DU
Publication date: 01/06/2015

IT help desk operations are expensive. Costs associated with IT operations present challenges to profit goals. Help desk managers need a way to plan staffing levels so that labor costs are minimized while problems are resolved efficiently. An incident prediction method is needed for planning staffing levels. A solution to this problem is valuable to an IT service provider since software failures are inevitable and their timing is difficult to predict. In this research, a cost model for help desk operations is developed. The cost model relates predicted incidents to labor costs using real help desk data. Incidents are predicted using software reliability growth models. Cluster analysis is used to group products with similar help desk incident characteristics. Principal Components Analysis is used to determine one product per cluster for the prediction of incidents for all members of the cluster. Incident prediction accuracy is demonstrated using cluster representatives, and is done so successfully for all clusters with accuracy comparable to making predictions for each product in the portfolio. Linear regression is used with cost data for the resolution of incidents to relate incident predictions to help desk labor costs. Following a series of four pilot studies, the cost model is validated by successfully demonstrating cost prediction accuracy for one-month prediction intervals over a 22-month period
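The regression step described in this abstract, relating incident counts to labor cost, might look like the following minimal sketch. It assumes a single predictor and ordinary least squares; the function names and all numbers are made up for illustration and do not come from the dissertation's help desk data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return intercept, slope


def predict_cost(intercept, slope, predicted_incidents):
    """Translate a predicted incident count into an estimated labor cost."""
    return intercept + slope * predicted_incidents


if __name__ == "__main__":
    # Hypothetical monthly incident counts and labor costs.
    incidents = [120, 150, 90, 200, 170]
    labor_cost = [6100, 7400, 4700, 9900, 8600]
    b0, b1 = fit_line(incidents, labor_cost)
    print(predict_cost(b0, b1, 160))
```

In this framing the predicted incident count would come from a separate model (the abstract names software reliability growth models), and the fitted line only converts that count into a cost estimate.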

University of Denver

The Application of Data Mining Techniques to Learning Analytics and Its Implications for Interventions with Small Class Sizes

Author: Wakelam Edward
Publication date: 12/05/2020

There has been significant progress in the development of techniques to deliver effective technology enhanced learning systems in education, with substantial progress in the field of learning analytics. These analyses are able to support academics in the identification of students at risk of failure or withdrawal. The early identification of students at risk is critical to giving academic staff and institutions the opportunity to make timely interventions. This thesis considers established machine learning techniques, as well as a novel method, for the prediction of student outcomes and the support of interventions, including the presentation of a variety of predictive analyses and of a live experiment. It reviews the status of technology enhanced learning systems and the associated institutional obstacles to their implementation and deployment. Many courses are comprised of relatively small student cohorts, with institutional privacy protocols limiting the data readily available for analysis. It appears that very little research attention has been devoted to this area of analysis and prediction. I present an experiment conducted on a final year university module, with a student cohort of 23, where the data available for prediction is limited to lecture/tutorial attendance, virtual learning environment accesses and intermediate assessments. I apply and compare a variety of machine learning analyses to assess and predict student performance, applied at appropriate points during module delivery. Despite some mixed results, I found potential for predicting student performance in small student cohorts with very limited student attributes, with accuracies comparing favourably with published results using large cohorts and significantly more attributes. I propose that the analyses will be useful to support module leaders in identifying opportunities to make timely academic interventions. Student data may include a combination of nominal and numeric data. 
A large variety of techniques is available to analyse numeric data; however, fewer techniques are applicable to nominal data. I summarise the results of what I believe to be a novel technique to analyse nominal data by making a systematic comparison of data pairs. In this thesis I have surveyed existing intelligent learning/training systems and explored the contemporary AI techniques which appear to offer the most promising contributions to the prediction of student attainment. I have researched and catalogued the organisational and non-technological challenges to be addressed for successful system development and implementation, and proposed a set of critical success criteria to apply. This dissertation is supported by published work

University of Hertfordshire Research Archive

Representações euclidianas de dados : uma abordagem para variáveis heterogéneas [Euclidean representations of data: an approach for heterogeneous variables]

Author: Dória Isabel Maria Tudela Reimão Pinto de França, 1952-
Publication date: 01/01/2008

Doctoral thesis, Medicine (Biomathematics), Universidade de Lisboa, Faculdade de Medicina, 2009. Available in the document

Universidade de Lisboa: Repositório.UL