11 research outputs found

    Novel neural architectures & algorithms for efficient inference

    In the last decade, the machine learning universe embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers. These models have empowered many applications, such as ChatGPT and Imagen, and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various costs: large model size, compute-intensive training, increased inference latency, higher working memory, and so on. This thesis aims to improve the resource efficiency of neural architectures, i.e., to significantly reduce the computational, storage, and energy consumption of a DNN without any significant loss in performance. Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near-SOTA performance. We divide this thesis into two dimensions: Efficient Low Complexity Models and Input Hardness Adaptive Models.

Along the first dimension, Efficient Low Complexity Models, we improve DNN performance by addressing instabilities in existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts. (a) Efficient Low Complexity RNNs: We improve RNN resource efficiency by addressing poor gradients, noise amplification, and the shortcomings of Backward Propagation Through Time (BPTT) training. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during training; to do so, we present Incremental Recurrent Neural Networks (iRNNs) that keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs that mitigate the noise propagation issue in RNNs by modulating the time constants in the ODE-based transition function. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose the Forward Propagation Through Time (FPTT) algorithm for training RNNs and show that FPTT yields significant gains compared to the more conventional BPTT scheme.

(b) Efficient Low Complexity CNNs: Next, we improve CNN architectures by reducing their resource usage. CNNs require greater depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and cheaper low-compute pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) with a better compute-performance trade-off. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization; we refer to this scheme as distributionally constrained learning (DCL).
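
As a rough illustration of the ODE view of recurrence referred to in part (a), the minimal NumPy sketch below performs one explicit Euler step of a generic ODE-based hidden-state update; the step size, tanh nonlinearity, and dimensions are arbitrary placeholders, and this is not the iRNN or Time Adaptive RNN architecture itself.

    # Generic ODE-style recurrent update (explicit Euler step); illustrative only.
    import numpy as np

    def ode_rnn_step(h, x, W, U, b, eps=0.1):
        """One Euler step of dh/dt = tanh(W h + U x + b): h <- h + eps * dh/dt."""
        dh = np.tanh(W @ h + U @ x + b)
        return h + eps * dh

    rng = np.random.default_rng(0)
    d_h, d_x, T = 8, 4, 20
    W = rng.normal(scale=0.1, size=(d_h, d_h))   # recurrent weights
    U = rng.normal(scale=0.1, size=(d_h, d_x))   # input weights
    b = np.zeros(d_h)

    h = np.zeros(d_h)
    for t in range(T):                           # toy input sequence
        x_t = rng.normal(size=d_x)
        h = ode_rnn_step(h, x_t, W, U, b)        # state evolves by small increments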

In the second dimension, Input Hardness Adaptive Models, we introduce the notion of the hardness of an input relative to an architecture. In the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, to every input; it inherently assumes that all examples are equally hard for a model. In this dimension, we challenge this assumption, reasoning that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers wherein a low-capacity network handles simple inputs while abstaining from a prediction on complex inputs. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model, and we design various architectures that adhere to this hybrid inference style. Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by cleverly discarding hard inputs during the distillation procedure. Finally, we conclude the thesis by sketching out several interesting future research directions that emerge as extensions of the ideas explored in this work.
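
To make the hybrid inference style concrete, here is a minimal, hypothetical sketch in which a low-capacity model predicts only when its confidence clears a threshold and defers hard inputs to a high-capacity expert; the two toy models, the max-probability proxy for hardness, and the 0.9 threshold are illustrative stand-ins rather than the specific architectures studied in the thesis.

    # Hypothetical hardness-adaptive routing between a cheap model and an expert.
    import numpy as np

    def hybrid_predict(x, small_model, large_model, threshold=0.9):
        probs = small_model(x)                      # cheap model's class probabilities
        if probs.max() >= threshold:                # confident => treat the input as "easy"
            return int(np.argmax(probs)), "small"
        return int(np.argmax(large_model(x))), "large"  # abstain and defer to the expert

    # Toy stand-ins for the two models (binary classification).
    small_model = lambda x: np.array([0.95, 0.05]) if x.sum() > 0 else np.array([0.55, 0.45])
    large_model = lambda x: np.array([0.10, 0.90])

    print(hybrid_predict(np.array([1.0, 1.0]), small_model, large_model))    # easy input  -> (0, 'small')
    print(hybrid_predict(np.array([-1.0, -1.0]), small_model, large_model))  # hard input  -> (1, 'large')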

    Behavior quantification as the missing link between fields: Tools for digital psychiatry and their role in the future of neurobiology

    The great behavioral heterogeneity observed between individuals with the same psychiatric disorder, and even within one individual over time, complicates both clinical practice and biomedical research. However, modern technologies present an exciting opportunity to improve behavioral characterization. Data from existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer readings, open avenues of novel questioning that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge, one that will necessitate new data processing tools, new machine learning techniques, and ultimately a shift in how interdisciplinary work is conducted. In my thesis, I detail research projects that take different perspectives on digital psychiatry, subsequently tying the ideas together with a concluding discussion on the future of the field. I also provide software infrastructure where relevant, with extensive documentation. Major contributions include scientific arguments and proof-of-concept results for daily free-form audio journals as an underappreciated psychiatry research datatype, as well as novel stability theorems and pilot empirical success for a proposed multi-area recurrent neural network architecture.

    Information Theory and Machine Learning

    The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, to produce transferable learning results, to use computational resources efficiently, to converge quickly in online settings, to have performance guarantees, to satisfy fairness or privacy constraints, to incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.

    Essays in time series econometrics and machine learning

    This dissertation collects three works developed on the broad topic of time series analysis, with a specific focus on machine learning, non- and semi-parametric methods, and regularization. In particular, the discussion takes an econometric perspective on the three key problems of estimation, forecasting, and inference. Chapter 1 develops a new machine learning approach to forecasting economic time series, focusing on US GDP growth, within an environment consisting of many series with observations sampled at different frequencies. We introduce a method based on a reservoir computing approach which, broadly speaking, leverages the universal approximation properties of nonlinear state-space models with random coefficient matrices. Our proposed scheme is computationally efficient, empirically effective (reaching or surpassing state-of-the-art forecasting performance), and straightforward to implement even when there are many different data frequencies.

Chapter 2 deals instead with the important question of regularization in the estimation of linear time series models. Vector autoregressive models (VARs) are a fundamental benchmark and foundational analytical tool of modern econometrics. Yet, even in moderate data environments with a few dozen series, estimation of VARs can be severely impacted by efficiency issues, that is, too many parameters need to be recovered relative to the sample size. This is true even in settings that do not fall within the category of high-dimensional processes. Drawing a comparison with Bayesian methods, I propose to apply anisotropic ridge regression as an estimation procedure in order to effectively exploit prior information or beliefs about the structure of the VAR model. The theory for inference on impulse response functions and for cross-validation is developed, and in simulations I find that the trade-off of ridge penalization can be positive whenever one is correctly informed about the nature of the underlying data generating process.

Finally, in Chapter 3 I provide a semi-nonparametric approach for the estimation of impulse responses of nonlinear autoregressive models. Impulse response functions (IRFs) are widely studied objects in macroeconometrics because they quantify the response of a model economy to an unforeseen shock. For example, central banks are often interested in studying the potential effects of credibly exogenous changes in monetary policy over short and long horizons. If one also wants to incorporate nonlinear relationships in a model, I prove that estimating the linear and nonlinear (functional) autoregressive coefficients with a semi-nonparametric series approach is a uniformly consistent strategy. In turn, this allows the construction of asymptotically consistent nonlinear IRF estimates, meaning that IRFs can be correctly recovered in large samples. The empirical applications I provide showcase the potential impact of nonlinear IRFs on policy: comparing pointwise linear and nonlinear estimates suggests that linear models can underestimate, to varying degrees, the negative effects of contractionary monetary policy. This, in turn, provides evidence that proper estimation of nonlinear interactions may lead to better quantitative analysis of macroeconomic dynamics.
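
As a rough sketch of the kind of estimator Chapter 2 refers to, the NumPy snippet below fits a VAR(1) by ridge regression with a coefficient-specific (anisotropic) penalty matrix, A_hat = ((X'X + Lambda)^{-1} X'Y)'; the simulated data and the particular penalty weights are arbitrary placeholders, not the prior structure actually developed in the dissertation.

    # Anisotropic ridge for a VAR(1): one penalty weight per lagged regressor.
    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 200, 3
    A_true = np.array([[0.5, 0.1, 0.0],
                       [0.0, 0.4, 0.1],
                       [0.0, 0.0, 0.3]])
    Y = np.zeros((T, k))
    for t in range(1, T):                               # simulate a stable VAR(1)
        Y[t] = A_true @ Y[t - 1] + rng.normal(scale=0.5, size=k)

    X, Y1 = Y[:-1], Y[1:]                               # lagged regressors and targets
    Lam = np.diag([0.1, 1.0, 10.0])                     # anisotropic penalty (placeholder weights)
    A_hat = np.linalg.solve(X.T @ X + Lam, X.T @ Y1).T  # (X'X + Lambda)^{-1} X'Y, equation by equation
    print(np.round(A_hat, 2))                           # compare with A_true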

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Collected Papers (on Neutrosophic Theory and Applications), Volume VIII

    This eighth volume of Collected Papers includes 75 papers comprising 973 pages on (theoretic and applied) neutrosophics, written between 2010 and 2022 by the author alone or in collaboration with the following 102 co-authors (alphabetically ordered) from 24 countries: Mohamed Abdel-Basset, Abduallah Gamal, Firoz Ahmad, Ahmad Yusuf Adhami, Ahmed B. Al-Nafee, Ali Hassan, Mumtaz Ali, Akbar Rezaei, Assia Bakali, Ayoub Bahnasse, Azeddine Elhassouny, Durga Banerjee, Romualdas Bausys, Mircea Boșcoianu, Traian Alexandru Buda, Bui Cong Cuong, Emilia Calefariu, Ahmet Çevik, Chang Su Kim, Victor Christianto, Dae Wan Kim, Daud Ahmad, Arindam Dey, Partha Pratim Dey, Mamouni Dhar, H. A. Elagamy, Ahmed K. Essa, Sudipta Gayen, Bibhas C. Giri, Daniela Gîfu, Noel Batista Hernández, Hojjatollah Farahani, Huda E. Khalid, Irfan Deli, Saeid Jafari, Tèmítópé Gbóláhàn Jaíyéolá, Sripati Jha, Sudan Jha, Ilanthenral Kandasamy, W.B. Vasantha Kandasamy, Darjan Karabašević, M. Karthika, Kawther F. Alhasan, Giruta Kazakeviciute-Januskeviciene, Qaisar Khan, Kishore Kumar P K, Prem Kumar Singh, Ranjan Kumar, Maikel Leyva-Vázquez, Mahmoud Ismail, Tahir Mahmood, Hafsa Masood Malik, Mohammad Abobala, Mai Mohamed, Gunasekaran Manogaran, Seema Mehra, Kalyan Mondal, Mohamed Talea, Mullai Murugappan, Muhammad Akram, Muhammad Aslam Malik, Muhammad Khalid Mahmood, Nivetha Martin, Durga Nagarajan, Nguyen Van Dinh, Nguyen Xuan Thao, Lewis Nkenyereya, Jagan M. Obbineni, M. Parimala, S. K. Patro, Peide Liu, Pham Hong Phong, Surapati Pramanik, Gyanendra Prasad Joshi, Quek Shio Gai, R. Radha, A.A. Salama, S. Satham Hussain, Mehmet Șahin, Said Broumi, Ganeshsree Selvachandran, Selvaraj Ganesan, Shahbaz Ali, Shouzhen Zeng, Manjeet Singh, A. Stanis Arul Mary, Dragiša Stanujkić, Yusuf Șubaș, Rui-Pu Tan, Mirela Teodorescu, Selçuk Topal, Zenonas Turskis, Vakkas Uluçay, Norberto Valcárcel Izquierdo, V. Venkateswara Rao, Volkan Duran, Ying Li, Young Bae Jun, Wadei F. Al-Omeri, Jian-qiang Wang, Lihshing Leigh Wang, Edmundas Kazimieras Zavadskas.