158 research outputs found

    Some statistical models for high-dimensional data

    A treelet transform analysis to relate nutrient patterns to the risk of hormonal receptor-defined breast cancer in the European Prospective Investigation into Cancer and Nutrition (EPIC)

    Objective: Pattern analysis has emerged as a tool to depict the role of multiple nutrients/foods in relation to health outcomes. The present study aimed to extract nutrient patterns with respect to breast cancer (BC) aetiology. Design: Nutrient patterns were derived with the treelet transform (TT) and related to BC risk. TT was applied to twenty-three log-transformed nutrient densities from dietary questionnaires. Hazard ratios (HR) and 95 % confidence intervals computed using Cox proportional hazards models quantified the association between quintiles of nutrient pattern scores and risk of overall BC, and by hormonal receptor and menopausal status. Principal component analysis was applied for comparison. Setting: The European Prospective Investigation into Cancer and Nutrition (EPIC). Subjects: Women (n 334 850) from the EPIC study. Results: The first TT component (TC1) highlighted a pattern rich in nutrients found in animal foods, loading on cholesterol, protein, retinol and vitamins B12 and D, while the second TT component (TC2) reflected a diet rich in β-carotene, riboflavin, thiamin, vitamins C and B6, fibre, Fe, Ca, K, Mg, P and folate. While TC1 was not associated with BC risk, TC2 was inversely associated with overall BC risk (HR Q5 v. Q1 = 0·89, 95 % CI 0·83, 0·95, P trend < 0·01) and showed a significantly lower risk for oestrogen receptor-positive (HR Q5 v. Q1 = 0·89, 95 % CI 0·81, 0·98, P trend = 0·02) and progesterone receptor-positive tumours (HR Q5 v. Q1 = 0·87, 95 % CI 0·77, 0·98, P trend < 0·01). Conclusions: TT produces readily interpretable sparse components that explain amounts of variation similar to principal component analysis. Our results suggest that participants with a nutrient pattern high in micronutrients found in vegetables, fruits and cereals had a lower risk of BC.
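    The workflow described above (derive pattern scores from log-transformed nutrient densities, then relate score quintiles to BC risk with Cox models) can be sketched as follows. This is only an illustrative approximation, not the EPIC analysis: the treelet transform has no mainstream Python implementation, so PCA (the comparison method mentioned in the abstract) stands in for it, and the file and column names are hypothetical.

        # Illustrative sketch, not the EPIC analysis: derive nutrient-pattern scores and
        # relate score quintiles to breast cancer risk with a Cox proportional hazards model.
        # File and column names ("cohort.csv", "follow_up_years", "bc_event", "nutr_*") are hypothetical.
        import numpy as np
        import pandas as pd
        from sklearn.decomposition import PCA          # stand-in for the treelet transform
        from lifelines import CoxPHFitter

        df = pd.read_csv("cohort.csv")
        nutrients = [c for c in df.columns if c.startswith("nutr_")]   # 23 nutrient densities

        # Log-transform and standardize the nutrient densities (assumes strictly positive values),
        # then extract pattern scores.
        X = np.log(df[nutrients])
        scores = PCA(n_components=2).fit_transform((X - X.mean()) / X.std())
        df["tc2_quintile"] = pd.qcut(scores[:, 1], 5, labels=False)    # quintiles coded 0..4

        # Cox PH model: hazard of BC by quintile of the second pattern score.
        cph = CoxPHFitter()
        cph.fit(df[["follow_up_years", "bc_event", "tc2_quintile"]],
                duration_col="follow_up_years", event_col="bc_event")
        cph.print_summary()   # the coefficient on the quintile term gives a trend-type HR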

    Economic Reforms and Human Development: Evidence from Transition Economies

    Do market-oriented economic reforms result in higher levels of human well-being? This article studies the impact of macro-level institutional and infrastructure reforms on the economic, educational and health dimensions of human well-being in 25 transition economies. We use panel data econometrics based on the LSDVC technique to analyse the effects of market-oriented reforms on the human development index (HDI), as a measure of human well-being, from 1992 to 2007. The results show the complexity of reform impacts in transition countries. They indicate that institutional and economic reforms led to positive economic effects and significant impacts on other dimensions of human development. We also find some positive economic impacts from infrastructure sector reforms. However, not every reform measure appears to generate positive impacts: large-scale privatizations show negative effects on health and economic outcomes. The overall results underline the importance of the interaction among different reform measures and of their combined effect on human development.
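    The estimation described above (a panel of 25 countries over 1992-2007, with the HDI regressed on reform measures) can be sketched roughly as below. The article's LSDVC (bias-corrected least squares dummy variable) estimator has no standard Python implementation, so this hedged sketch substitutes an uncorrected LSDV-style fixed-effects regression from the linearmodels package; the data file and variable names are hypothetical.

        # Rough sketch, not the paper's estimator: a dynamic fixed-effects (LSDV-style)
        # panel regression of HDI on reform measures. LSDVC's bias correction is omitted.
        # "transition_panel.csv" and the variable names are hypothetical.
        import pandas as pd
        from linearmodels.panel import PanelOLS

        panel = pd.read_csv("transition_panel.csv")            # 25 countries x 1992-2007
        panel = panel.set_index(["country", "year"]).sort_index()

        # Dynamic specification: lagged HDI plus reform indicators, country fixed effects.
        panel["hdi_lag"] = panel.groupby(level="country")["hdi"].shift(1)
        panel = panel.dropna(subset=["hdi_lag"])

        model = PanelOLS(
            dependent=panel["hdi"],
            exog=panel[["hdi_lag", "institutional_reform", "infrastructure_reform", "privatization"]],
            entity_effects=True,
        )
        print(model.fit(cov_type="clustered", cluster_entity=True))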

    Study of gene expression representation with Treelets and hierarchical clustering algorithms

    Since the mid-1990s, the field of genomic signal processing has exploded due to the development of DNA microarray technology, which made it possible to measure the mRNA expression of thousands of genes in parallel. Researchers have developed a vast body of knowledge on classification methods. However, microarray data are characterized by extremely high dimensionality and a comparatively small number of data points, which makes microarray data analysis quite unique. In this work we have developed various hierarchical clustering algorithms in order to improve the microarray classification task. First, the original feature set of gene expression values is enriched with new features that are linear combinations of the original ones. These new features are called metagenes and are produced by the different proposed hierarchical clustering algorithms. To demonstrate the utility of this methodology for classifying microarray datasets, the building of a reliable classifier via a feature selection process is introduced. The methodology has been tested on three public cancer datasets: Colon, Leukemia and Lymphoma. The proposed method obtains better classification results than when this enrichment is not performed, confirming the utility of metagene generation for improving the final classifier. Second, a new technique has been developed that uses hierarchical clustering to reduce the huge microarray datasets, removing from the outset the genes that are not relevant for the cancer classification task. Experimental results of this method, applied to one public database, are also presented and analyzed, demonstrating the utility of this new approach.
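    The metagene construction described above (new features formed as linear combinations of hierarchically clustered genes) can be illustrated with the short sketch below. It is a hedged approximation rather than the thesis's algorithms: the clustering settings, the use of cluster means as the linear combinations, and the data file name are all assumptions.

        # Hedged sketch of the metagene idea: cluster genes by correlation distance, form each
        # metagene as the mean expression of one gene cluster, and append the metagenes to the
        # original feature set. "expression.csv" (rows = samples, columns = genes) is hypothetical.
        import numpy as np
        import pandas as pd
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import pdist

        expr = pd.read_csv("expression.csv", index_col=0)

        # Hierarchical clustering of genes (columns) with correlation distance and average linkage.
        dist = pdist(expr.values.T, metric="correlation")
        Z = linkage(dist, method="average")
        labels = fcluster(Z, t=50, criterion="maxclust")      # cut the tree into 50 gene clusters

        # Each metagene is a linear combination (here, the mean) of the genes in one cluster.
        metagenes = pd.DataFrame(
            {f"metagene_{k}": expr.iloc[:, labels == k].mean(axis=1) for k in np.unique(labels)}
        )
        enriched = pd.concat([expr, metagenes], axis=1)       # enriched feature set for classification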

    Ray tracing techniques for computer games and isosurface visualization

    Ray tracing is a powerful image synthesis technique that has been used for high-quality offline rendering for decades. In recent years, this technique has become more important for real-time applications, but it still plays only a minor role in many areas. Among the reasons are that ray tracing is compute intensive and has to rely on preprocessed data structures to achieve fast performance. This dissertation investigates methods to broaden the applicability of ray tracing and is divided into two parts. The first part explores the opportunities offered by ray tracing based game technology in the context of current and expected future performance levels. In this regard, novel methods are developed to efficiently support certain kinds of dynamic scenes while avoiding the burden of fully recomputing the required data structures. Furthermore, today's ray tracing performance levels are below what is needed for 3D games. Therefore, the multi-core CPU of the Playstation 3 is investigated, and an optimized ray tracing architecture is presented to take steps towards the required performance. In part two, the focus shifts to isosurface ray tracing. Isosurfaces are particularly important for understanding the distribution of certain values in volumetric data. Since the structure of volumetric data sets is diverse, optimized algorithms and data structures are developed for rectilinear as well as unstructured data sets, allowing for real-time rendering of isosurfaces including advanced shading and visualization effects. This also includes techniques for out-of-core and time-varying data sets.
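    As a concrete illustration of the primitive discussed above, the sketch below computes the closest ray/triangle intersection with the Möller-Trumbore test over a plain list of triangles. It is only an illustrative baseline, not code from the dissertation: the brute-force loop over all triangles is exactly the cost that the preprocessed acceleration structures mentioned above are built to avoid.

        # Illustrative baseline: closest-hit ray tracing over a triangle list (Moller-Trumbore).
        # A real renderer replaces the O(n) loop in closest_hit with an acceleration structure.
        import numpy as np

        def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-8):
            """Return the hit distance t along the ray, or None if there is no intersection."""
            e1, e2 = v1 - v0, v2 - v0
            p = np.cross(direction, e2)
            det = np.dot(e1, p)
            if abs(det) < eps:                      # ray is parallel to the triangle plane
                return None
            inv_det = 1.0 / det
            s = origin - v0
            u = np.dot(s, p) * inv_det
            if u < 0.0 or u > 1.0:
                return None
            q = np.cross(s, e1)
            v = np.dot(direction, q) * inv_det
            if v < 0.0 or u + v > 1.0:
                return None
            t = np.dot(e2, q) * inv_det
            return t if t > eps else None

        def closest_hit(origin, direction, triangles):
            """Brute-force O(n) closest hit; a BVH or kd-tree reduces this to roughly O(log n)."""
            best = None
            for v0, v1, v2 in triangles:
                t = intersect_triangle(origin, direction, v0, v1, v2)
                if t is not None and (best is None or t < best):
                    best = t
            return best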

    Dependency reordering features for Japanese-English phrase-based translation

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 101-106). Translating Japanese into English is very challenging because of the vast difference in word order between the two languages. For example, the main verb is always at the very end of a Japanese sentence, whereas it comes near the beginning of an English sentence. In this thesis, we develop a Japanese-to-English translation system capable of performing the long-distance reordering necessary to fluently translate Japanese into English. Our system uses novel feature functions, based on a dependency parse of the input Japanese sentence, which identify candidate translations that put dependency relationships into correct English order. For example, one feature identifies translations that put verbs before their objects. The weights for these feature functions are discriminatively trained, and so can be used for any language pair. In our Japanese-to-English system, they improve the BLEU score from 27.96 to 28.54, and we show clear improvements in subjective quality. We also experiment with a well-known technique of training the translation system on a Japanese training corpus that has been reordered into an English-like word order. Impressive results can be achieved by naively reordering each Japanese sentence into reverse order. Translating these reversed sentences with the dependency-parse-based feature functions gives further improvement. Finally, we evaluate our translation systems with human judgment, BLEU score, and METEOR score. We compare these metrics on the corpus and sentence level and examine how well they capture improvements in translation word order. By Jason Edward Katz-Brown. M.Eng.
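    The two reordering ideas mentioned in the abstract (naive reversal of the Japanese sentence, and dependency-based features that reward verb-before-object order in candidate translations) can be sketched as toy functions. These are hedged illustrations only, not the thesis's actual feature functions; the data structures and names are assumptions.

        # Toy sketches of the reordering ideas above (not the thesis's feature functions).

        def reverse_order(japanese_tokens):
            """Baseline preprocessing: present the Japanese sentence in reverse token order."""
            return list(reversed(japanese_tokens))

        def verb_before_object_feature(dependencies, alignment):
            """
            dependencies: list of (verb_index, object_index) pairs from the Japanese parse.
            alignment:    dict mapping a Japanese token index to its English token index.
            Returns the number of verb-object pairs realised in verb-first (English) order.
            """
            score = 0
            for verb_j, obj_j in dependencies:
                if verb_j in alignment and obj_j in alignment and alignment[verb_j] < alignment[obj_j]:
                    score += 1
            return score

        # Toy usage: "watashi wa ringo o tabeta" has object "ringo" (index 2) and verb "tabeta"
        # (index 4), aligned to "I ate an apple" where "ate" (index 1) precedes "apple" (index 3).
        print(verb_before_object_feature([(4, 2)], {0: 0, 2: 3, 4: 1}))   # -> 1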

    Higher Performance Traversal and Construction of Tree-Based Raytracing Acceleration Structures

    Ray tracing is an important computational primitive used in different algorithms including collision detection, line-of-sight computations, ray tracing-based sound propagation, and most prominently light transport algorithms. It computes the closest intersections for a given set of rays and geometry. The geometry is usually modeled with a set of geometric primitives such as triangles or quadrangles which define a scene. An efficient ray tracing implementation needs to rely on an acceleration structure to decouple ray tracing complexity from scene complexity as far as possible. The most common ray tracing acceleration structures are kd-trees and bounding volume hierarchies (BVHs), which have an O(log n) ray tracing complexity in the number of scene primitives. Both structures offer similar ray tracing performance in practice. This thesis presents theoretical insights and practical approaches for higher quality, improved graphics processing unit (GPU) ray tracing performance, and faster construction of BVHs and kd-trees, with the focus on BVHs.
    The chosen construction strategy for BVHs and kd-trees has a significant impact on final ray tracing performance. The most common measure for the quality of BVHs and kd-trees is the surface area metric (SAM). Using assumptions on the distribution of ray origins and directions, the SAM approximates the cost of traversing an acceleration structure without having to trace a single ray. High-quality construction algorithms aim at reducing the SAM cost. The most widespread high-quality greedy plane-sweep algorithm applies the surface area heuristic (SAH), which is a simplification of the SAM. Advances in research on quality metrics for BVHs have shown that greedy SAH-based plane-sweep builders often construct BVHs with superior traversal performance, despite the fact that their SAM costs are higher than those of BVHs created by more sophisticated builders. Motivated by this observation, we examine different construction algorithms that use the SAM cost of temporarily constructed SAH-built BVHs to guide the construction to higher quality BVHs. An extensive evaluation reveals that the resulting BVHs indeed achieve significantly higher trace performance for primary and secondary diffuse rays compared to BVHs constructed with standard plane-sweeping. Compared to the Spatial-BVH, a kd-tree/BVH hybrid, we still achieve an acceptable increase in performance. We show that the proposed algorithm has subquadratic computational complexity in the number of primitives, which renders it usable in practical applications.
    An alternative to the plane-sweep BVH builder is agglomerative clustering, which constructs BVHs in a bottom-up fashion. It clusters primitives with a SAM-inspired heuristic and gives BVHs of mixed quality compared to standard plane-sweeping construction. While related work focused only on the construction speed of this algorithm, we examine clustering heuristics that aim at higher hierarchy quality. We propose a fully SAM-based clustering heuristic which on average produces better performing BVHs than original agglomerative clustering.
    The definitions of the SAM and SAH are based on assumptions on the distribution of ray origins and directions, which define a conditional geometric probability for intersecting nodes in kd-trees and BVHs. We analyze this probability definition and show that the assumptions allow for an alternative one. Unlike the conventional probability, our definition accounts for directional variation in the likelihood of intersecting objects from different directions. While the new probability does not result in improved practical tracing performance, it provides an interesting insight on the conventional probability: we show that the conventional probability function is directly linked to our examined probability function and can be interpreted as covertly accounting for directional variation.
    The path tracing light transport algorithm can require tracing billions of rays. Thus, it can pay off to construct high quality acceleration structures to reduce the ray tracing cost of each ray. At the same time, the arising number of trace operations offers a tremendous amount of data parallelism. With CPUs moving towards many-core architectures and GPUs becoming more general purpose architectures, path tracing can now be well parallelized on commodity hardware. While parallelization is trivial in theory, properties of real hardware make efficient parallelization difficult, especially when tracing so-called incoherent rays. These rays cause execution flow divergence, which reduces the efficiency of SIMD-based parallelism, and incoherent memory accesses, which reduce memory read efficiency. We investigate how different BVH and node memory layouts, as well as storing the BVH in different memory areas, impact the ray tracing performance of a GPU path tracer. We also optimize the BVH layout using information gathered in a pre-processing pass by applying a number of different BVH reordering techniques. This results in increased ray tracing performance.
    Our final contribution is in the field of fast high quality BVH and kd-tree construction. Increased quality usually comes at the cost of higher construction time. To reduce construction time, several algorithms have been proposed to construct acceleration structures in parallel on GPUs. These are able to perform full rebuilds in real time for moderate scene sizes if all data completely fits into GPU memory. However, the sheer amount of data arising from the geometric detail used in production rendering makes construction on GPUs infeasible due to GPU memory limitations. Existing out-of-core GPU approaches perform hybrid bottom-up top-down construction, which suffers from reduced acceleration structure quality in the critical upper levels of the tree. We present an out-of-core multi-GPU approach for full top-down SAH-based BVH and kd-tree construction, which is designed to work on larger scenes than conventional approaches and yields high quality trees. The algorithm is evaluated for scenes consisting of up to 1 billion triangles, and performance scales with an increasing number of GPUs.
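    The surface area heuristic at the centre of the abstract above can be illustrated with a short sketch: score one candidate partition of primitive bounding boxes by weighting each child's expected intersection work with its surface area relative to the parent. This is a hedged, simplified illustration rather than the thesis's builder; the traversal and intersection cost constants are placeholders.

        # Hedged sketch of the surface area heuristic (SAH) for top-down BVH construction.
        # Cost constants are illustrative placeholders, not tuned values from the thesis.
        import numpy as np

        def surface_area(lo, hi):
            """Surface area of an axis-aligned bounding box given its min/max corners."""
            d = np.maximum(hi - lo, 0.0)
            return 2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])

        def union_bounds(boxes):
            """Enclosing AABB of a list of (lo, hi) boxes."""
            lo = np.min([b[0] for b in boxes], axis=0)
            hi = np.max([b[1] for b in boxes], axis=0)
            return lo, hi

        def sah_cost(left, right, c_trav=1.0, c_isect=1.0):
            """SAH cost of splitting a node's primitive boxes into 'left' and 'right'."""
            parent_sa = surface_area(*union_bounds(left + right))
            sa_l = surface_area(*union_bounds(left)) if left else 0.0
            sa_r = surface_area(*union_bounds(right)) if right else 0.0
            # A child is hit with probability proportional to its surface area.
            return c_trav + c_isect * (sa_l / parent_sa * len(left) + sa_r / parent_sa * len(right))

        # A plane-sweep builder evaluates sah_cost for every candidate split position along each
        # axis and keeps the partition with the lowest cost (or creates a leaf if no split is cheaper).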