576 research outputs found

    A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

    Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.

    Fate of the esophagogastric anastomosis

    Objective: The study objective was to evaluate histopathology of the esophagogastric anastomosis after esophagectomy, determine time trends of histologic changes, and identify factors influencing those findings. Methods: A total of 231 patients underwent 468 upper gastrointestinal endoscopies with anastomotic biopsy a median of 3.5 years after esophagectomy. Mean age was 59 ± 12 years, 74% (171) were male, and 96% (222) were white. Seventy-eight percent (179) had esophagectomy for cancer, 13% (30) had chemoradiotherapy, and 13% (30) had prior esophageal surgery. The anastomosis was 20 ± 2.0 cm from the incisors. Anti-reflux medications were used in 59% of patients (276/468) at esophagoscopy. Histopathology was graded as normal (0), consistent with reflux (1), cardia mucosa (2), intestinal metaplasia (3), and dysplasia (4). Repeated-measures nonlinear time-trend analysis and multivariable analyses were used. Results: Grades 0 and 1 were constant, 5% and 92% at 10 years, respectively. Anti-reflux medication, induction therapy, and higher anastomosis were predictive of less grade 1 histopathology. Grades 2 and 3 increased with time: 12% and 33% at 5 years and 4% and 16% at 10 years, respectively. No variable was predictive of grade 2 or 3 (P > .15) except passage of time. No patient’s condition progressed to dysplasia or cancer. Conclusions: The esophagogastric anastomosis is subject to gastroesophageal reflux. To minimize histopathologic changes of reflux, the anastomosis should be constructed as high as possible (closer to incisors) and anti-reflux medications prescribed. Surveillance endoscopy, if performed, will document a time-related progression of reflux-related histopathologic changes. However, during surveillance, intestinal metaplasia is uncommon and progression to cancer rare.

    When Do Neural Nets Outperform Boosted Trees on Tabular Data?

    Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. To this end, we conduct the largest tabular data analysis to date, comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than choosing between NNs and GBDTs. A remarkable exception is the recently-proposed prior-data fitted network, TabPFN: although it is effectively limited to training sets of size 3000, we find that it outperforms all other algorithms on average, even when randomly sampling 3000 training datapoints. Next, we analyze dozens of metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed or heavy-tailed feature distributions and other forms of dataset irregularities. Our insights act as a guide for practitioners to determine which techniques may work best on their dataset. Finally, with the goal of accelerating tabular data research, we release the TabZilla Benchmark Suite: a collection of the 36 'hardest' of the datasets we study. Our benchmark suite, codebase, and all raw results are available at https://github.com/naszilla/tabzilla. Comment: NeurIPS Datasets and Benchmarks Track 202

    Esophageal submucosa: The watershed for esophageal cancer

    Objectives: Submucosal esophageal cancers (pT1b) are considered superficial, implying good survival. However, some are advanced, metastasizing to regional lymph nodes. Interplay of cancer characteristics and lymphatic anatomy may create a watershed, demarcating low-risk from high-risk cancers. Therefore, we characterized submucosal cancers according to depth of invasion and identified those with high likelihood of lymph node metastases and poor survival. Methods: From 1983 to 2010, 120 patients underwent esophagectomy for submucosal cancers at Cleveland Clinic. Correlations were sought among cancer characteristics (location, dimensions, histopathologic cell type, histologic grade, and lymphovascular invasion [LVI]), and their associations with lymph node metastasis were identified by logistic regression. Associations with mortality were identified by Cox regression. Results: As submucosal invasion increased, cancer length (P < .001), width (P < .001), area (P < .001), LVI (P = .007), and grade (P = .05) increased. Invasion of the deep submucosa (P < .001) and LVI (P = .06) predicted lymph node metastases: 45% (23/51) of deep versus 10% (3/29) of middle-third and 7.5% (3/40) of inner-third cancers had lymph node metastases, as did 46% (12/26) with LVI versus 18% (17/94) without. Older age and lymph node metastases predicted worse 5-year survival: 94% for younger pN0 patients, 62% for older pN0 patients, and 36% for pN1-2 patients regardless of age. Conclusions: Submucosal cancer characteristics and lymphatic anatomy create a watershed for regional lymph node metastases in the deep submucosa. This previously unrecognized divide distinguishes superficial submucosal cancers with good survival from deep submucosal cancers with poor survival. Aggressive therapy of more superficial cancers is critical before submucosal invasion occurs.

    The Nuclear Network: Multiplex Network Analysis for Interconnected Systems

    States facing the decision to develop a nuclear weapons program do so within a broader context of their relationships with other countries. How these diplomatic, economic, and strategic relationships impact proliferation decisions, however, remains under-specified. Adding to the existing empirical literature that attempts to model state proliferation decisions, this article introduces the first quantitative heterogeneous network analysis of how networks of conflict, alliances, trade, and nuclear cooperation interact to spur or deter nuclear proliferation. Using a multiplex network model, we conceptualize states as nodes linked by different modes of interaction represented on individual network layers. Node strength is used to quantify factors correlated with nuclear proliferation and these are combined in a weighted sum across layers to provide a metric characterizing the proliferation behavior of the state. This multiplex network modeling approach provides a means for identifying states with the highest relative likelihood of proliferation—based only on their relationships to other states. This work demonstrates that latent conflict and nuclear cooperation are positively correlated with proliferation, while an increased trade dependence suggests a decreased proliferation likelihood. A case study on Iran’s controversial nuclear program and past nuclear activity is also provided. These findings have clear, policy-relevant conclusions related to alliance posture, sanctions policy, and nuclear assistance. Abstract ©The Authors
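
    The weighted sum of per-layer node strengths described in this abstract can be illustrated with a minimal sketch. The states, edge weights, and layer weights below are hypothetical toy values, not data or coefficients from the article; only the signs follow the abstract (conflict positive, trade dependence negative).

```python
# Toy multiplex network: each layer maps an undirected edge (pair of states)
# to an edge weight. All values here are hypothetical, for illustration only.
layers = {
    "conflict": {("A", "B"): 2.0, ("B", "C"): 1.0},
    "trade": {("A", "C"): 3.0, ("B", "C"): 0.5},
}

# Hypothetical layer weights: positive for conflict, negative for trade,
# mirroring the direction of the correlations reported in the abstract.
layer_weights = {"conflict": 1.0, "trade": -0.5}


def node_strength(edges, node):
    """Sum of the weights of all edges in one layer that touch `node`."""
    return sum(w for (u, v), w in edges.items() if node in (u, v))


def multiplex_score(node):
    """Weighted sum of the node's strength across all layers."""
    return sum(
        layer_weights[name] * node_strength(edges, node)
        for name, edges in layers.items()
    )


scores = {state: multiplex_score(state) for state in ("A", "B", "C")}
print(scores)
```

    In this toy example, state B has the highest score because it carries the most conflict strength and the least trade strength.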

    Comparison of the ICare® rebound tonometer with the Goldmann tonometer in a normal population

    The aim of this study was to evaluate the accuracy of measurement of intraocular pressure (IOP) using a new induction/impact rebound tonometer (ICare) in comparison with the Goldmann applanation tonometer (AT). The left eyes of 46 university students were assessed with the two tonometers, with induction tonometry being performed first. The ICare was handled by an optometrist and the Goldmann tonometer by an ophthalmologist. In this study, statistically significant differences were found when comparing the ICare rebound tonometer with applanation tonometry (AT) (p < 0.05). The mean difference between the two tonometers was 1.34 +/- 2.03 mmHg (mean +/- S.D.) and the 95% limits of agreement were +/-3.98 mmHg. A frequency distribution of the differences demonstrated that in more than 80% of cases the IOP readings differed by <3 mmHg between the ICare and the AT. In the present population the ICare overestimates the IOP value by 1.34 mmHg on average when compared with Goldmann tonometer. Nevertheless, the ICare tonometer may be helpful as a screening tool when Goldmann applanation tonometry is not applicable or not recommended, as it is able to estimate IOP within a range of +/-3.00 mmHg in more than 80% of the population.
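
    The reported limits of agreement can be checked against the reported standard deviation. The sketch below assumes the +/-3.98 mmHg figure is the usual 1.96 x S.D. half-width; the abstract does not state the formula, so this is an assumption, and the function name is illustrative.

```python
# Summary statistics taken from the abstract (ICare minus Goldmann, in mmHg).
mean_diff = 1.34  # mean difference between the two tonometers
sd_diff = 2.03    # standard deviation of the differences


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Return (half_width, lower, upper) for 95% limits of agreement.

    Assumes approximately normally distributed differences, so that about
    95% of them fall within mean_diff +/- z * sd_diff.
    """
    half_width = z * sd_diff
    return half_width, mean_diff - half_width, mean_diff + half_width


half_width, lower, upper = limits_of_agreement(mean_diff, sd_diff)
print(f"half-width: +/-{half_width:.2f} mmHg")
print(f"interval around the mean difference: {lower:.2f} to {upper:.2f} mmHg")
```

    Under that assumption the half-width comes out at 3.98 mmHg, which matches the figure quoted in the abstract.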

    Low Energy Light Yield of Fast Plastic Scintillators

    Compact neutron imagers using double-scatter kinematic reconstruction are being designed for localization and characterization of special nuclear material. These neutron imaging systems rely on scintillators with a rapid prompt temporal response as the detection medium. As n-p elastic scattering is the primary mechanism for light generation by fast neutron interactions in organic scintillators, proton light yield data are needed for accurate assessment of scintillator performance. The proton light yield of a series of commercial fast plastic organic scintillators (EJ-200, EJ-204, and EJ-208) was measured via a double time-of-flight technique at the 88-Inch Cyclotron at Lawrence Berkeley National Laboratory. Using a tunable deuteron breakup neutron source, target scintillators housed in a dual photomultiplier tube configuration, and an array of pulse-shape-discriminating observation scintillators, the fast plastic scintillator light yield was measured over a broad and continuous energy range down to proton recoil energies of approximately 50 keV. This work provides key input to event reconstruction algorithms required for utilization of these materials in emerging neutron imaging modalities. Comment: 15 pages, 6 figures

    Statistical properties of $^{243}$Pu, and $^{242}$Pu(n,$\gamma$) cross section calculation

    The level density and gamma-ray strength function (gammaSF) of 243Pu have been measured in the quasi-continuum using the Oslo method. Excited states in 243Pu were populated using the 242Pu(d,p) reaction. The level density closely follows the constant-temperature level density formula for excitation energies above the pairing gap. The gammaSF displays a double-humped resonance at low energy as also seen in previous investigations of actinide isotopes. The structure is interpreted as the scissors resonance and has a centroid of $\omega_{SR}=2.42(5)$ MeV and a total strength of $B_{SR}=10.1(15)\,\mu_N^2$, which is in excellent agreement with sum-rule estimates. The measured level density and gammaSF were used to calculate the 242Pu(n,gamma) cross section in a neutron energy range for which there were previously no measured data. Comment: 9 pages, 8 figures