531 research outputs found

    Identification and estimation in panel models with overspecified number of groups

    In this thesis, we provide a simple approach to identify and estimate group structure in panel models by adapting the M-estimation method. We consider both linear and nonlinear panel models in which the regression coefficients are heterogeneous across groups but homogeneous within a group, and the group membership is unknown to researchers. The main result of the thesis is that, under certain assumptions, our approach provides a uniformly consistent estimator of the group parameters as long as the number of groups used in estimation is not smaller than the true number of groups. We also show that, with probability approaching one, our method may partition some true groups into further subgroups but cannot mix individuals from different groups. When the true number of groups is used in estimation, all individuals are categorized correctly with probability approaching one, and we establish the limiting distribution of the estimates of the group parameters. In addition, we provide an information criterion to choose the number of groups and establish its consistency under some mild conditions. Monte Carlo simulations are conducted to examine the finite sample performance of our proposed method, and the simulation findings confirm our theoretical results. An application to labor force participation also highlights the necessity of taking individual heterogeneity and group heterogeneity into account.

    Tianran

    Tianran Qian is a 27-year-old New York University student majoring in Interactive Communications (ITP). She is from a wealthy and renowned family in Hangzhou, China. Tianran’s parents divorced when she was eight years old. Her father was busy running his business, so she grew up with her grandfather. Tianran’s grandfather was a respected poet in China, and he told Tianran that she was the only heir in her generation with the ability to understand and practice traditional Chinese culture. With her grandfather’s guidance and influence, she embraced traditional Chinese culture and wanted to find a way to share it with the world. Tianran comes to the United States to pursue her dream. She tries hard to keep up with her studies and to take care of herself in New York City, where she lives thousands of miles away from home. Twenty-four hours a day are never enough for her. She keeps busy all day with schoolwork, but she also involves herself in many Chinese cultural organizations and their events. Tianran knows she can fulfill her grandfather’s mission only by working hard. In the documentary, we see her looking fabulous in a Chinese cheongsam on stage hosting a Chinese opera in New York City, but we also see her in the subway looking exhausted, all proof of how hard she works. It is Christmas time in New York City. The city is decorated with beautiful warm lights and Christmas trees. The strong holiday atmosphere makes Tianran really miss her family in China. She comes back from school to her 34th Street Manhattan apartment at midnight and calls her dad on FaceTime. The bad Internet connection makes the call difficult. She calls her dad several times; however, she can only hear him busy talking business with his employees over the phone. Finally, her dad hangs up, and Tianran cries. The next morning, as usual, she dresses up nicely and heads to another place to pursue her dreams.

    Estimation of the three key parameters and the lead time distribution in lung cancer screening.

    This dissertation contains three research projects on probability modeling of cancer screening. Cancer screening is the primary technique for early detection; its goal is to catch the disease before clinical symptoms appear. In these projects, the three key parameters and the lead time distribution were estimated to provide a statistical point of view on the effectiveness of cancer screening programs. In the first project, a cancer screening probability model was used to analyze the computed tomography (CT) scan group in the National Lung Screening Trial (NLST) data. The three key parameters were estimated using a Bayesian approach and Markov chain Monte Carlo simulations. The sensitivity of lung cancer screening using CT scans is much higher than that of screening using X-rays. The transition probability from the disease-free state to the preclinical state peaks around age 70 for both genders. The posterior mean sojourn time is around 1.5 years for all groups. The second project deals with estimation of the lead time distribution. Since the lead time is unobservable, the effectiveness of screening exams with regard to survival benefits becomes a major concern. In this study, estimates of the projected lead time were presented using the NLST CT arm data. Simulation results show that the probability of no early detection increases monotonically as the screening interval increases, for both genders. The mean lead time appears longer for women than for men. Previous studies assumed that a person has no screening history before entering the study. However, participants in screening programs are usually older, and they may already have had at least one prior screening exam and appear healthy. In the third project, we extended the previously developed lead time distribution to account for an individual's screening history and to see how much this history affects the lead time. We ran simulations for each combination of initial screening ages, sensitivities, mean sojourn times, current ages, and past and future screening schedules. We also applied the newly developed lead time distribution to the NLST data.

    Compound identification using penalized linear regression.

    In this study, we propose a new method for compound identification using penalized linear regression. Compound identification is often achieved by matching experimental mass spectra to the mass spectra stored in a reference library, based on mass spectral similarity. In the linear regression setting, the response variable is an experimental mass spectrum (i.e., the query) and the compounds in the reference library are the independent variables. However, the number of compounds in the reference library is much larger than the range of m/z values, so the data are high dimensional and the regression suffers from singularity. For this reason, we use penalized linear regression such as ridge regression and the Lasso. Furthermore, we also propose two-step approaches that use the dot product and Pearson’s correlation along with the penalized linear regression.
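
    The regression setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy spectra, the penalty value `lam`, the helper names, and the choice to rank compounds by the size of their ridge coefficient are all assumptions made for illustration. The toy library has three compounds but only two m/z bins, so ordinary least squares would be singular while the ridge system remains solvable.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_coefficients(spectra, query, lam):
    """Ridge estimate: solve (A^T A + lam * I) b = A^T y.

    Each library spectrum is one column of A; y is the query spectrum.
    """
    k = len(spectra)
    gram = [
        [
            sum(a * b for a, b in zip(spectra[i], spectra[j]))
            + (lam if i == j else 0.0)  # ridge penalty on the diagonal
            for j in range(k)
        ]
        for i in range(k)
    ]
    rhs = [sum(a * y for a, y in zip(spectra[i], query)) for i in range(k)]
    return solve(gram, rhs)


def rank_compounds(library, query, lam=0.1):
    """Rank library compounds by ridge coefficient (an assumed scoring rule)."""
    names = list(library)
    coefs = ridge_coefficients([library[n] for n in names], query, lam)
    return sorted(zip(names, coefs), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy library: intensities over two m/z bins.
library = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
query = [0.9, 0.1]  # hypothetical experimental spectrum, closest to A

ranking = rank_compounds(library, query)
print(ranking)
```

    With this toy query, compound A receives the largest coefficient. A two-step variant in the spirit of the abstract could first shortlist compounds by dot product or Pearson's correlation and then run the penalized regression on the shortlist only.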

    Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory

    When data are extraordinarily large or physically stored in different locations, the distributed nearest neighbor (NN) classifier is an attractive tool for classification. We propose a novel distributed adaptive NN classifier for which the number of nearest neighbors is a tuning parameter stochastically chosen by a data-driven criterion. An early stopping rule is proposed for the search for the optimal tuning parameter, which not only speeds up the computation but also improves the finite sample performance of the proposed algorithm. The convergence rate of the excess risk of the distributed adaptive NN classifier is investigated under various sub-sample size compositions. In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate. The effectiveness of the proposed approach is demonstrated through simulation studies as well as an empirical application to a real-world dataset.
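
    The general idea of a distributed NN classifier with a data-driven choice of k can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm from the paper: the leave-one-out selection criterion, the `patience`-based early stopping rule, the majority vote across shards, and the toy one-dimensional data are all hypothetical choices, since the abstract does not specify them.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_error(train, k):
    """Leave-one-out misclassification rate of k-NN on a single shard."""
    errors = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        errors += knn_predict(rest, x, k) != y
    return errors / len(train)


def choose_k(train, candidates=(1, 3, 5, 7), patience=1):
    """Pick k by leave-one-out error (assumed criterion).

    The search stops early once the error has failed to improve
    more than `patience` times in a row.
    """
    best_k, best_err, misses = None, float("inf"), 0
    for k in candidates:
        if k >= len(train):
            break
        err = loo_error(train, k)
        if err < best_err:
            best_k, best_err, misses = k, err, 0
        else:
            misses += 1
            if misses > patience:
                break
    return best_k


def distributed_predict(shards, x):
    """Each shard classifies x with its own k; the shard votes are then pooled."""
    votes = [knn_predict(shard, x, choose_k(shard)) for shard in shards]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical shards: (feature, label) pairs held in different locations.
shards = [
    [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)],
    [(0.05, 0), (0.25, 0), (0.4, 0), (0.6, 1), (0.75, 1), (0.95, 1)],
    [(0.12, 0), (0.22, 0), (0.33, 0), (0.66, 1), (0.77, 1), (0.88, 1)],
]

print(distributed_predict(shards, 0.2), distributed_predict(shards, 0.8))
```

    Only the predicted labels leave each shard, so the raw data never has to be moved to a central location, which is the practical appeal of the distributed setting.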