Inaccurate age and sex data in the Census PUMS files: evidence and implications
We discover and document errors in public use microdata samples ("PUMS files") of the 2000 Census, the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey. For women and men ages 65 and older, age- and sex-specific population estimates generated from the PUMS files differ by as much as 15% from counts in published data tables. Moreover, an analysis of labor force participation and marriage rates suggests the PUMS samples are not representative of the population at individual ages for those ages 65 and over. PUMS files substantially underestimate labor force participation of those near retirement ages and overestimate labor force participation rates of those at older ages. These problems were an unintentional by-product of the misapplication of a newer generation of disclosure avoidance procedures carried out on the data. The resulting errors in the public use data could significantly impact studies of people ages 65 and older, particularly analyses of variables that are expected to change by age.
Keywords: Census; Population; Labor supply
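The comparison described in this abstract (weighted microdata estimates against published counts, by single year of age and sex) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record layout (`age`, `sex`, `weight`), the toy records, and the published count are all hypothetical.

```python
def weighted_estimate(records, age, sex):
    """Sum person weights for one age-sex cell of a microdata sample."""
    return sum(r["weight"] for r in records if r["age"] == age and r["sex"] == sex)


def percent_difference(estimate, published):
    """Signed percent gap between a microdata estimate and a published count."""
    return 100.0 * (estimate - published) / published


# Hypothetical toy records; real PUMS files carry many more variables.
records = [
    {"age": 65, "sex": "F", "weight": 100.0},
    {"age": 65, "sex": "F", "weight": 130.0},
    {"age": 66, "sex": "F", "weight": 90.0},
]

published_count = 200.0  # hypothetical published table value for women aged 65
estimate = weighted_estimate(records, age=65, sex="F")
print(f"estimate={estimate:.0f}, gap={percent_difference(estimate, published_count):+.1f}%")
```

Repeating the calculation for every age-sex cell would show whether the gaps cluster at ages 65 and older, as the abstract reports.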
Inaccurate Age and Sex Data in the Census PUMS Files: Evidence and Implications
We discover and document errors in public use microdata samples ("PUMS files") of the 2000 Census, the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey. For women and men ages 65 and older, age- and sex-specific population estimates generated from the PUMS files differ by as much as 15% from counts in published data tables. Moreover, an analysis of labor force participation and marriage rates suggests the PUMS samples are not representative of the population at individual ages for those ages 65 and over. PUMS files substantially underestimate labor force participation of those near retirement ages and overestimate labor force participation rates of those at older ages. These problems were an unintentional by-product of the misapplication of a newer generation of disclosure avoidance procedures carried out on the data. The resulting errors in the public use data could significantly impact studies of people ages 65 and older, particularly analyses of variables that are expected to change by age.
Keywords: Current Population Survey, American Community Survey, Census, disclosure avoidance, aging, data, sex, labor force participation, marriage
Discovery of a Visual T-Dwarf Triple System and Binarity at the L/T Transition
We present new high contrast imaging of 8 L/T transition brown dwarfs using
the NIRC2 camera on the Keck II telescope. One of our targets, the T3.5 dwarf
2MASS J08381155 + 1511155, was resolved into a hierarchical triple with projected
separations of 2.5+/-0.5 AU and 27+/-5 AU for the BC and A(BC) components
respectively. Resolved OSIRIS spectroscopy of the A(BC) components confirms that
all system members are T dwarfs. The system therefore constitutes the first
triple T-dwarf system ever reported. Using resolved photometry to model the
integrated-light spectrum, we infer spectral types of T3, T3, and T4.5 for the
A, B, and C components respectively. The uniformly brighter primary has a bluer
J-Ks color than the next faintest component, which may reflect a sensitive
dependence of the L/T transition temperature on gravity, or alternatively
divergent cloud properties amongst components. Relying on empirical trends and
evolutionary models we infer a total system mass of 0.034-0.104 Msun for the BC
components at ages of 0.3-3 Gyr, which would imply a period of 12-21 yr
assuming the system semi-major axis to be similar to its projection. We also
infer differences in effective temperatures and surface gravities between
components of no more than ~150 K and ~0.1 dex. Given the similar physical
properties of the components, the 2M0838+15 system provides a controlled sample
for constraining the relative roles of effective temperature, surface gravity,
and dust clouds in the poorly understood L/T transition regime. Combining our
imaging survey results with previous work we find an observed binary fraction
of 4/18 or 22_{-8}^{+10}% for unresolved spectral types of L9-T4 at separations
>~0.1 arcsec. This translates into a volume-corrected frequency of
13_{-6}^{+7}%, which is similar to values of ~9-12% reported outside the
transition. (ABRIDGED)
Comment: Accepted for publication in the Astrophysical Journal. 23 pages, 12 figures
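The quoted 12-21 yr period for the BC pair is consistent with Kepler's third law in solar units, P^2 = a^3 / M, with P in years, a in AU, and M in solar masses. The check below is a minimal sketch that, like the abstract, assumes the semi-major axis equals the projected separation of 2.5 AU; the function name is illustrative.

```python
import math


def orbital_period_years(semi_major_axis_au: float, total_mass_msun: float) -> float:
    """Kepler's third law in solar units: P [yr] = sqrt(a^3 [AU] / M [Msun])."""
    return math.sqrt(semi_major_axis_au ** 3 / total_mass_msun)


a_bc = 2.5  # AU, projected BC separation taken as the semi-major axis
for mass in (0.104, 0.034):  # Msun, bounds of the quoted BC system mass
    period = orbital_period_years(a_bc, mass)
    print(f"M = {mass} Msun -> P = {period:.1f} yr")
```

The upper mass bound gives the shorter period and the lower bound the longer one, bracketing the range stated in the abstract.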
Inaccurate age and sex data in the Census PUMS files: Evidence and Implications
We discover and document errors in public use microdata samples ("PUMS files") of the 2000 Census, the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey. For women and men ages 65 and older, age- and sex-specific population estimates generated from the PUMS files differ by as much as 15% from counts in published data tables. Moreover, an analysis of labor force participation and marriage rates suggests the PUMS samples are not representative of the population at individual ages for those ages 65 and over. PUMS files substantially underestimate labor force participation of those near retirement ages and overestimate labor force participation rates of those at older ages. These problems were an unintentional by-product of the misapplication of a newer generation of disclosure avoidance procedures carried out on the data. The resulting errors in the public use data could significantly impact studies of people ages 65 and older, particularly analyses of variables that are expected to change by age.
Inaccurate age and sex data in the Census PUMS files: Evidence and Implications
We discover and document errors in public use microdata samples ("PUMS files") of the 2000 Census, the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey. For women and men ages 65 and older, age- and sex-specific population estimates generated from the PUMS files differ by as much as 15% from counts in published data tables. Moreover, an analysis of labor force participation and marriage rates suggests the PUMS samples are not representative of the population at individual ages for those ages 65 and over. PUMS files substantially underestimate labor force participation of those near retirement ages and overestimate labor force participation rates of those at older ages. These problems were an unintentional by-product of the misapplication of a newer generation of disclosure avoidance procedures carried out on the data. The resulting errors in the public use data could significantly impact studies of people ages 65 and older, particularly analyses of variables that are expected to change by age.
Building a repository for record linkage
ICPSR is building LinkageLibrary, a repository and community space for researchers involved in linking and combining datasets, as a collaboration between social, statistical, and computer scientists. Unlike surveys or experiments where causal and outcome variables are measured in tandem, it is often necessary when working with organic, non-design data to link to other measures. This makes linkage methodologies particularly important when conducting analyses using administrative data. A common benchmarking repository of linkage methodologies will propel the field to the next level of rigor by facilitating comparison of different algorithms, understanding which types of algorithms work best under different conditions and problem domains, promoting transparency and replicability of research, and encouraging proper citation of methodological contributions and their resulting datasets. It will bring together the diverse scholarly communities (e.g., computer scientists, statisticians, and social, behavioral, economic, and health (SBEH) scientists) who are currently addressing these challenges in disparate ways that do not build on one another’s work. Improving linkage methodologies is critical to the production of representative samples, and thus to unbiased estimates of a wide variety of social and economic phenomena. The repository will accelerate the development of new record linkage algorithms and evaluation methods, improve the reproducibility of analyses conducted on integrated data, allow comparisons on the same and different data, and move forward the provision of privacy-aware integrated data. The presentation will focus on lessons learned while building the repository and the community, and introduce the LinkageLibrary website.
Resolving a One-Year Ecesis Interval for Alaska Paper Birch: Dating a Rockfall Event, Wishbone Hill, Southcentral Alaska
Numerous large boulders at the base of Wishbone Hill, northeast of Anchorage, Alaska, suggest a historic rockfall event and potential for future surface instability, putting lives and property at risk. The source of the rockfall boulders is an exposed syncline with a cliff face composed of conglomerate. The age of trees growing atop boulders provides a minimum exposure age of those boulders and, thus, the rockfall event. To determine when the rockfall occurred, we dated trees growing atop the boulders using tree-ring samples collected from 30 Alaska paper birch trees. After mounting and polishing, each tree-ring sample was dot-counted, and tree-ring widths were measured using Measure J2X software to generate a master chronology (1938-2017). To estimate the youngest age for the rockfall event, we recorded pith-year for each sample. For samples lacking a pith (n=21), we used pith indicators to match existing rings to diagrams of corresponding ring widths, projecting approximate pith for each sample. All samples were corrected for sampling height (mean=0.8m) using a low-estimate growth rate (0.6m/yr). The oldest birch tree sampled included pith and, with height correction, we estimate a germination year of 1936. When using first-year growth as an event’s temporal marker, accounting for the ecesis interval, the time between the availability of a new surface (i.e., boulders) and germination, provides a more representative date of the event than using the pith/germination date alone. Considering birch ecesis and primary observations recorded in 1935, we propose that the rockfall event most likely occurred in 1934-1935. This finding suggests an ecesis interval as low as one year for Alaska paper birch in fresh rockfall areas. The risk of another destabilizing event may prompt those utilizing this area for recreational and residential purposes to reconsider future use.
Reconciling Parent-Child Relationships across US Administrative Datasets
Introduction
Population data capture children, parents, relatives, and others moving in and out of households. The U.S. has seen falling marriage rates, and increases in multigenerational households and complex families, young children living with grandparents, and adult children living with parents. Robust parent-child linkages are critical to understand these demographic shifts.
Objectives and Approach
We construct and validate parent-child linkages over a century to observe how U.S. households are changing over time. The three largest person-based datafiles in the U.S. are the decennial censuses, the Social Security Administration transaction file, and individual tax returns from the Internal Revenue Service. These sources operationalize relationships differently, capture data at various frequencies, and gather the data for unique purposes. We use probabilistic matching to observe and reconcile parent-child relationships across these sources. The data include a variety of personal identifiers including name, date of birth, parents’ names, address, and place of birth that support matching and validation.
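The abstract does not say which probabilistic matching model is used. As one common possibility, a Fellegi-Sunter style agreement weight over the listed identifiers can be sketched as below. The field names and the m- and u-probabilities are hypothetical placeholders, not values from the study.

```python
import math

# Hypothetical (m, u) pairs per field:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "place_of_birth": (0.90, 0.05),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios over fields present in both records."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


census = {"name": "ann lee", "date_of_birth": "1950-01-02", "place_of_birth": "ohio"}
tax = {"name": "ann lee", "date_of_birth": "1950-01-02", "place_of_birth": None}
print(round(match_weight(census, tax), 2))
```

Pairs scoring above an upper threshold would be accepted as links and those below a lower threshold rejected; the thresholds, like the probabilities, would need to be estimated from the data.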
Results
We find that understanding the content, consistency, and coverage of the files before matching is critical for high quality linkages. The representativeness of the parent-child relationship file improves over time, with the weakest coverage for the Greatest Generation and the strongest coverage for Millennials. Coverage varies by source: tax data underrepresent non-white children and have duplicate records for SSNs, while names and dates of birth are missing from Census data. Multiple match rates differ among demographic groups and over time. In the matching process, the blocking variables rely on common variables across the population datasets. Our approach provides robust entity resolution for women, despite married-maiden name changes. We describe challenges due to data problems in old census records and validation changes in social security data.
Conclusion/Implications
We conduct a successful reconciliation of parent-child relationships in U.S. population level files. The project supports operational and research uses, such as the 2020 Census. We will extend this work using graph matching and will expand the method to validate other relationship links including spouses and siblings.
- …