172 research outputs found

    Exploring New Forms of Random Projections for Prediction and Dimensionality Reduction in Big-Data Regimes

    The story of this work is dimensionality reduction. Dimensionality reduction takes as input a point-set P of n points in R^d, where d is typically large, and attempts to find a lower-dimensional representation of that dataset in order to ease the processing burden on downstream algorithms. In today’s landscape of machine learning, researchers and practitioners work with datasets that have a very large number of samples, high-dimensional samples, or both. Dimensionality reduction is therefore applied as a pre-processing technique, primarily to overcome the curse of dimensionality. Generally, dimensionality reduction reduces the time and storage space required to process the point-set, removes multicollinearity and redundancy in datasets whose features depend on one another, and may enable simple 2-D and 3-D visualizations that make the relationships in the data easy for humans to comprehend. Dimensionality reduction methods come in many shapes and sizes. Principal Component Analysis (PCA), Multi-dimensional Scaling, Isomap, and Locally Linear Embedding are among the most commonly used methods in this family of algorithms. However, the choice of method proves critical in many applications: there is no one-size-fits-all solution, and special care must be taken for different datasets and tasks. Furthermore, the aforementioned popular methods are data-dependent and commonly rely on computing either the kernel (Gram) matrix or the covariance matrix of the dataset. These matrices scale with the number of samples and the number of data dimensions, respectively, and are consequently poor choices for today’s big-data applications. 
Therefore, it is pertinent to develop new dimensionality reduction methods that can be applied efficiently to large and high-dimensional datasets, either by reducing the dependency on the data or by side-stepping it altogether. Such methods should also perform on par with, or better than, traditional methods such as PCA. To achieve this goal, we turn to random projections: a simple, efficient, and data-independent method for stably embedding a point-set P of n points in R^d into R^k, where d is typically large and k is on the order of log n. Random projections have a long and successful history of use in the dimensionality reduction literature. In this work, we build on the ideas of random projection theory and extend the framework into a new setup of random projections for large, high-dimensional datasets, with performance comparable to state-of-the-art data-dependent and nonlinear methods. Furthermore, we study the use of random projections in tasks other than dimensionality reduction, including prediction, and show the competitive performance of such methods in small-dataset regimes.
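
As an illustration of the data-independent embedding described in the abstract above, here is a minimal sketch of a linear Gaussian random projection. It is not the method proposed in the work: the function names, the 1/sqrt(k) scaling, and the sizes n, d, and k in `demo` are assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries.

    The matrix is drawn without looking at the data, which is what
    makes the embedding data-independent.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(points, matrix):
    """Map each point in R^d to R^k by multiplying with the matrix."""
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix]
            for p in points]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def demo(n=20, d=500, k=200, seed=0):
    """Embed n synthetic points and report distance distortion.

    Returns the smallest and largest ratio of embedded distance to
    original distance over all pairs; values near 1 indicate that
    pairwise distances are approximately preserved.
    """
    rng = random.Random(seed + 1)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    matrix = random_projection_matrix(d, k, seed)
    embedded = project(points, matrix)
    ratios = [
        distance(embedded[i], embedded[j]) / distance(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return min(ratios), max(ratios)


if __name__ == "__main__":
    low, high = demo()
    print(f"distance ratio range: {low:.3f} to {high:.3f}")
```

The synthetic data in `demo` is hypothetical; in practice k would be chosen from n and the distortion that can be tolerated, rather than fixed by hand as it is here.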

    Predicting Factors for the Communication between Hydatid Cyst and Biliary Tract

    Background: Communication between a hydatid cyst and the biliary tract increases morbidity and mortality rates. Therefore, early detection and treatment are vital. Methods: Of 96 patients who underwent hydatid cyst surgery, 12 were excluded. The characteristics, size, location, and position of the cyst, the cyst wall thickness, the involved lobe, cyst rupture, liver abscess, and the size of the intrahepatic and extrahepatic bile ducts were identified through computed tomography scanning. Age, gender, icterus, white blood cell (WBC) count, total, direct, and indirect bilirubin, alkaline phosphatase, alanine aminotransferase, and aspartate aminotransferase were recorded. Results: In 21 patients (13 men and 8 women), there was communication between the hydatid cyst and the biliary tract; of these, 14 patients had icterus. There were significant differences in cyst size, levels of liver enzymes, bilirubin, and alkaline phosphatase, and WBC count between communicated and non-communicated hydatid cysts (P = 0.001). There were no significant differences between the two groups in terms of age, gender, location of the cysts in the liver, and the thickness of the liver. Conclusions: Only cyst size and bilirubin level were predictive factors for communication between the hydatid cyst and the biliary tract.

    Ensembles of Random Projections for Nonlinear Dimensionality Reduction

    Dimensionality reduction methods are widely used in information processing systems to better understand the underlying structures of datasets, and to improve the efficiency of algorithms for big data applications. Methods such as linear random projections have proven to be simple and highly efficient in this regard; however, there is limited theoretical and experimental analysis for nonlinear random projections. In this study, we review the theoretical framework for random projections and nonlinear rectified random projections, and introduce an ensemble of nonlinear maximum random projections. We empirically evaluate the embedding performance on 3 commonly used natural datasets and compare with linear random projections and traditional techniques such as PCA, highlighting the superior generalization performance and stable embedding of the proposed method.
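
To make the terms in the abstract above concrete, here is a rough sketch of a linear random projection, a rectified variant, and a max-based ensemble. The abstract does not define these constructions, so everything here is an assumption: the rectification is taken to be an elementwise max(0, z), and the ensemble is taken to be an elementwise maximum over several independent linear projections. This is one possible reading, not the paper's definition, and all function names are hypothetical.

```python
import math
import random


def gaussian_matrix(k, d, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def linear_rp(x, matrix):
    """Linear random projection: z = R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def rectified_rp(x, matrix):
    """Assumed rectified variant: elementwise max(0, R x)."""
    return [max(0.0, z) for z in linear_rp(x, matrix)]


def max_ensemble_rp(x, matrices):
    """Assumed ensemble variant: for each output coordinate, keep the
    largest value across several independent linear projections."""
    outputs = [linear_rp(x, m) for m in matrices]
    return [max(column) for column in zip(*outputs)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, members = 10, 4, 3  # illustrative sizes only
    matrices = [gaussian_matrix(k, d, rng) for _ in range(members)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    print(linear_rp(x, matrices[0]))
    print(rectified_rp(x, matrices[0]))
    print(max_ensemble_rp(x, matrices))
```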