4,762 research outputs found

    Big Data Analysis for PV Applications

    With increasing photovoltaic (PV) installations, large amounts of time series data from utility-scale PV systems, such as meteorological data and string-level measurements, are collected [1, 2]. Due to fluctuations in irradiance and temperature, PV data is highly stochastic. Spatio-temporal differences with potential time-lagged correlation are also exhibited, because wind direction affects cloud movement [3]. When these variations are coupled with different types of PV systems in terms of power output and wiring configuration, as well as localised PV effects like partial shading and module mismatches, lengthy time series data from solar systems become highly multi-dimensional and challenging to process. In addition, these raw datasets can rarely be used directly because of the noise and irrelevant information that may be embedded in them. Moreover, it is challenging to operate directly on the raw datasets, especially when it comes to visualizing and analyzing these data. On this point, the Pareto principle, better known as the 80/20 rule, commonly applies: researchers and solar engineers often spend most of their time collecting, cleaning, filtering, reducing and formatting the data. In this work, a data analytics algorithm is applied to mitigate some of the complexities and make sense of the large time series data in PV systems. Each time series is treated as an individual entity which can be characterized by a set of generic or application-specific features. This reduces the dimension of the data, i.e., from hundreds of samples in a time series to a few descriptive features. It is also easier to visualize big time series data in the feature space than with traditional time series visualization methods, such as the spaghetti plot and horizon plot, which are informative but not very scalable.
The time series data is processed to extract features through clustering and to identify correspondence between specific measurements and the geographical location of the PV systems. This characterisation of the time series data can be used for several PV applications, namely (1) PV fault identification, (2) PV network design and (3) PV type pre-design for PV installation in locations with different geographical attributes.
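
The reduction from hundreds of samples to a few descriptive features can be illustrated with a minimal sketch. The abstract does not name the features or data used, so the feature set below (mean, standard deviation, peak, lag-1 autocorrelation), the function name `extract_features` and the two synthetic daily profiles are all assumptions made for illustration only.

```python
import math
import statistics


def extract_features(series):
    """Summarise one time series as a small dict of descriptive features."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Lag-1 autocorrelation: how strongly each sample resembles its predecessor.
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    lag1 = num / den if den > 0 else 0.0
    return {"mean": mean, "std": std, "peak": max(series), "lag1_autocorr": lag1}


# Two hypothetical daily power profiles with 288 samples each (one per 5 minutes).
clear_day = [max(0.0, math.sin(math.pi * i / 287)) for i in range(288)]
# "Cloudy" profile: same shape with a deterministic on/off shading pattern.
cloudy_day = [v * (0.3 if (i // 6) % 2 else 1.0) for i, v in enumerate(clear_day)]

features_clear = extract_features(clear_day)
features_cloudy = extract_features(cloudy_day)
```

Each 288-sample series is now a point in a four-dimensional feature space, where the shaded profile separates from the clear one through its lower mean and lower lag-1 autocorrelation. Feature vectors of this kind could then be passed to any clustering routine.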

    Compositional analysis of archaeological glasses

    At CoDaWork'03 we presented work on the analysis of archaeological glass compositional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximate completely compositional data if the main component, silica, is included. We suggested that what has been termed 'crude' principal component analysis (PCA) of standardized data often identified interpretable pattern in the data more readily than analyses based on log-ratio transformed data (LRA). The fundamental problem is that, in LRA, minor oxides with high relative variation, which may not be structure carrying, can dominate an analysis and obscure pattern associated with variables present at higher absolute levels. We investigate this further using sub-compositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a 'recipe' consisting of two 'ingredients', sand and a source of soda. Our analysis focuses on the sub-composition of components associated with the sand source. A 'crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of different recipes being used at different periods, reflected in absolute differences in the composition. LRA can be undertaken either by normalizing the data or by defining a 'residual'. In either case, after some 'tuning', these groups are recovered. The results from the normalized LRA are differently interpreted as showing that the source of sand used to make the glass differed. These results are complementary. One relates to the recipe used. The other relates to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios.
Our analysis suggests that absolute differences can also be informative.
Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Patronat de l’Escola Politècnica Superior de la Universitat de Girona; Fundació privada: Girona, Universitat i Futur; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Consell Social de la Universitat de Girona; Ministerio de Ciencia i Tecnología

    Distributional equivalence and subcompositional coherence in the analysis of contingency tables, ratio-scale measurements and compositional data

    We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
    Keywords: association models, biplot, compositional data, contingency tables, correspondence analysis, distributional equivalence, log-ratio transformation, ratio-scale data, singular value decomposition
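
The statement that analysing ratios suffices for subcompositional coherence can be checked numerically. The sketch below is only an illustration of that one property, not the weighted log-ratio method of the paper; the four-part composition and the helper name `close` are hypothetical.

```python
import math


def close(parts):
    """Rescale positive parts so they sum to 1 (the closure operation)."""
    total = sum(parts)
    return [p / total for p in parts]


# Hypothetical 4-part composition (values are illustrative only).
full = close([60.0, 25.0, 10.0, 5.0])
# Subcomposition: drop the first part and re-close the remaining three.
sub = close(full[1:])

# The proportion of the second part changes once the subcomposition is closed,
prop_full, prop_sub = full[1], sub[0]
# but the log-ratio between the second and third parts does not.
logratio_full = math.log(full[1] / full[2])
logratio_sub = math.log(sub[0] / sub[1])
```

Here the proportion moves from 0.25 to 0.625 while the log-ratio stays at log(2.5), which is why results based on ratios do not depend on which other parts happen to be included.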

    Formulation and statistical evaluation of a ready-to-drink whey based orange beverage and its storage stability

    A value-added functional beverage is formulated utilizing unprocessed liquid whey. Whey has excellent nutritional qualities and a bland flavor; it is easy to digest and has a unique functionality in a beverage system. The ready-to-drink beverage is formulated with concentrated whey and orange juice, along with an adequate amount of sugar, stabilizer, citric acid and flavor. Orange juice is used since the acidic flavor of whey is compatible with citrus flavors, particularly orange. The health and nutrition benefits of orange add further value to the formulated beverage. Nine blend formulations are prepared by varying the dry matter of whey, the fruit juice and the sugar content. Based on a statistical analysis of the sensory evaluation of the drinks, the optimal formulation is found to have a 3:2 ratio of concentrated liquid whey to orange juice, followed by the addition of 8% sugar (w/v) and 0.1% stabilizer (w/v). A shelf-life study of the final product is carried out at both room temperature (30 ± 2 °C) and refrigeration temperature (7 ± 1 °C), with and without the addition of preservatives. The product remains in good condition for up to eleven days at room temperature and up to three months under refrigeration with the addition of 150 ppm of sodium benzoate.

    Biplots of fuzzy coded data

    A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
    Keywords: defuzzification, fuzzy coding, indicator matrix, measure of fit, multivariate data, multiple correspondence analysis, principal component analysis
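
Fuzzy coding and defuzzification of a single continuous value can be sketched as follows. The abstract does not specify the membership functions, so this assumes three categories with triangular membership functions hinged at a low, middle and high value; the function names, the hinge values and the example temperature are hypothetical. In the paper the defuzzification is applied to the estimates produced by fuzzy multiple correspondence analysis, which this sketch does not attempt.

```python
def fuzzy_code(x, low, mid, high):
    """Code x (low <= x <= high) into three fuzzy categories using triangular
    membership functions; the memberships are non-negative and sum to 1."""
    if x <= mid:
        m_mid = (x - low) / (mid - low)
        return [1.0 - m_mid, m_mid, 0.0]
    m_high = (x - mid) / (high - mid)
    return [0.0, 1.0 - m_high, m_high]


def defuzzify(memberships, low, mid, high):
    """Recover a value as the membership-weighted average of the hinge points."""
    return sum(m * h for m, h in zip(memberships, (low, mid, high)))


# Hypothetical variable with hinges at its minimum, median and maximum.
hinges = (0.0, 12.0, 30.0)
coded = fuzzy_code(16.5, *hinges)       # memberships in (low, middle, high)
recovered = defuzzify(coded, *hinges)   # maps the memberships back to a value
```

With these hinges the value 16.5 is coded as memberships 0, 0.75 and 0.25, and the weighted average of the hinges returns 16.5 exactly. A crisp indicator coding would instead assign the value wholly to one category and lose that information.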

    Genetic Diversity of Selected Upland Rice Genotypes (Oryza sativa L.) for Grain Yield and Related Traits

    Seventy-seven upland rice genotypes, including popular cultivars in Nigeria and introduced varieties selected from across the rice-growing regions of the world, were evaluated under optimal upland ecology. These genotypes were characterised for 10 traits, and the quantitative data were subjected to Pearson correlation analysis, principal component analysis (PCA) and cluster analysis to determine the level of diversity and the degree of association between grain yield and its related component traits. Yield and most related component traits exhibited a higher phenotypic coefficient of variation (PCV) than growth parameters. Yield had the highest PCV (41.72%), while all other parameters had a low to moderate genotypic coefficient of variation (GCV). Genetic advance (GA) ranged from 9.88% for plant height at maturity to 41.08% for yield. High heritability estimates were recorded for 1000-grain weight (88.71%), days to 50% flowering (86.67%) and days to 85% maturity (71.98%). Furthermore, grain yield showed a significant positive correlation with days to 50% flowering and number of panicles per m². Three cluster groups were obtained based on UPGMA, and the first three principal components explained about 64.55% of the total variation among the 10 characters. The PCA results suggest that characters such as grain yield, days to flowering, leaf area and plant height at maturity were the principal discriminatory traits for this rice germplasm, indicating that selection in favour of these traits might be effective in this population and environment.