Optimizing the C4.5 Decision Tree Algorithm using MSD-Splitting
We propose an optimization of Dr. Ross Quinlan’s C4.5 decision tree algorithm, used for data mining and classification. We will show that by discretizing and binning a data set’s continuous attributes into four groups using our novel technique called MSD-Splitting, we can significantly improve both the algorithm’s accuracy and efficiency, especially when applied to large data sets. We applied both the standard C4.5 algorithm and our optimized C4.5 algorithm to two data sets obtained from UC Irvine’s Machine Learning Repository: Census Income and Heart Disease. In our initial model, we discretized continuous attributes by splitting them into two groups at the point with the minimum expected information requirement, in accordance with the standard C4.5 algorithm. Using five-fold cross-validation, we calculated the average accuracy of our initial model for each data set. Our initial model yielded a 75.72% average accuracy across both data sets. The average execution time of our initial model was 1,541.57 s for the Census Income data set and 50.54 s for the Heart Disease data set. We then optimized our model by applying MSD-Splitting, which discretizes continuous attributes by splitting them into four groups using the mean and the two values one standard deviation away from the mean as split points. The accuracy of our model improved by an average of 5.11% across both data sets, while the average execution time decreased by 96.72% for the larger Census Income data set and 46.38% for the Heart Disease data set.
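The binning step described in this abstract can be sketched as follows. This is only an illustration of the stated idea (split points at the mean and at one standard deviation on either side), not the authors' implementation: the function names, the sample values, the use of the population standard deviation, and the rule that a value equal to a split point falls into the upper bin are all assumptions, since the abstract does not specify them.

```python
import statistics


def msd_split_points(values):
    """Return the three split points: mean - sd, mean, mean + sd."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; the abstract does not
    # say whether the sample or population form is used.
    sd = statistics.pstdev(values)
    return (mean - sd, mean, mean + sd)


def msd_bin(value, split_points):
    """Map a continuous value to one of four bins, indexed 0 to 3."""
    # Assumption: a value equal to a split point goes to the upper bin.
    return sum(value >= point for point in split_points)


# Hypothetical values for one continuous attribute (e.g. an age column).
ages = [22, 25, 31, 38, 40, 47, 52, 58, 63, 74]
points = msd_split_points(ages)
bins = [msd_bin(age, points) for age in ages]
```

Because the three split points come from a single pass over the attribute, no per-candidate information calculation is needed for continuous attributes, which is consistent with the reduced execution time the abstract reports.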
Cation-Size-Dependent DNA Adsorption Kinetics and Packing Density on Gold Nanoparticles: An Opposite Trend
This document is the Accepted Manuscript version of a Published Work that appeared in final form in Langmuir, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see Liu, B., Kelly, E. Y., & Liu, J. (2014). Cation-Size-Dependent DNA Adsorption Kinetics and Packing Density on Gold Nanoparticles: An Opposite Trend. Langmuir, 30(44), 13228–13234. https://doi.org/10.1021/la503188h
The properties of DNA are strongly influenced by counterions. Packing a dense layer of DNA onto a gold nanoparticle (AuNP) generates an interesting colloidal system with many novel physical properties such as a sharp melting transition, protection of DNA against nucleases, and enhanced complementary DNA binding affinity. In this work, the effect of monovalent cation size is studied. First, for free AuNPs without DNA, larger group 1A cations are more efficient in inducing their aggregation. The same trend is observed with group 2A metals using AuNPs capped by various self-assembled monolayers. After establishing the salt range to maintain AuNP stability, the DNA adsorption kinetics is also found to be faster with the larger Cs+ compared to the smaller Li+. This is attributed to the easier dehydration of Cs+, and dehydrated Cs+ might condense on the AuNP surface to reduce the electrostatic repulsion effectively. However, after a long incubation time with a high salt concentration, Li+ allows ∼30% more DNA packing compared to Cs+. Therefore, Li+ is more effective in reducing the charge repulsion among DNA, and Cs+ is more effective in screening the AuNP surface charge. This work suggests that physicochemical information at the bio/nanointerface can be obtained by using counterions as probes.
University of Waterloo; Canadian Foundation for Innovation; Natural Sciences and Engineering Research Council; Ontario Ministry of Research and Innovation
Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making
ML decision-aid systems are increasingly common on the web, but their
successful integration relies on people trusting them appropriately: they
should use the system to fill in gaps in their ability, but recognize signals
that the system might be incorrect. We measured how people's trust in ML
recommendations differs by expertise and with more system information through a
task-based study of 175 adults. We used two tasks that are difficult for
humans: comparing large crowd sizes and identifying similar-looking animals.
Our results provide three key insights: (1) People trust incorrect ML
recommendations for tasks that they perform correctly the majority of the time,
even if they have high prior knowledge about ML or are given information
indicating the system is not confident in its prediction; (2) Four different
types of system information all increased people's trust in recommendations;
and (3) Math and logic skills may be as important as ML for decision-makers
working with ML recommendations.
Evidence of Ridge Propagation in the Eastern Gulf of Mexico from Integrated Analysis of Potential Fields and Seismic Data
Integrated analysis of gravity, magnetic, and seismic data reveals two phases of spreading in the eastern Gulf of Mexico (GOM) including two distinct spreading centers, suggesting a major ridge reorganization during the opening of the eastern part of the GOM. Ridge propagation between the two spreading episodes explains the following observations: (1) the drastic asymmetry in the oceanic domain of northeastern GOM, (2) the presence of two distinct crustal zones with dramatically different thickness and physical properties, and (3) the observed seismicity within the oceanic domain that is not aligned with any known tectonic structure. The initial Late Jurassic spreading center (~160 Ma) resulted in a thin (~5 km) and uniform oceanic crust with a fast compressional velocity (7 km/s). Based on our analysis, the estimated full spreading rate of this older spreading event is less than 1 cm/yr. The spreading regime changed in the Early Cretaceous around 150 Ma, resulting in a propagation (i.e., jump) of the spreading center. The new spreading episode was characterized by a change in spreading direction and increased magma supply as it produced thicker (up to 9 km) oceanic crust with a typical two-layered structure. Despite the increase in magmatic material, the full rate of this younger spreading event estimated from our analysis is only slightly faster (1.1 cm/yr assuming that spreading ceased at 137 Ma). The latter conclusion is consistent with the morphology of the spreading centers mapped by seismic data. Our analysis shows that recent deep crustal earthquakes in the middle of the Gulf of Mexico are aligned with the boundary between the two identified distinct oceanic zones, referred to here as a pseudofault.
A View from the Start: A Review of Inhibitory Control Training in Early Childhood
Young children’s capacity to monitor and control their thoughts and behaviors is influenced largely by inhibitory control, which grows rapidly during this age due to brain maturation. This capacity has important implications for children’s development, including academic and social outcomes, and has been shown to be influenced by culture and exposure to adverse life events such as poverty. Research suggests that this capacity, importantly, may be largely trainable, with appropriate training programs.
A review of existing methods used to assess demand for integrated education in Northern Ireland
The education system in Northern Ireland (NI) is complex with the diversity of management structures reflecting religious affiliation and academic selection. Within the system, integrated education provides a mechanism to promote reconciliation among divided communities. Integrated education has been aided by legislation—most recently, the Integrated Education Act (NI) 2022, which places responsibility on the Department of Education and the Education Authority to encourage, facilitate and support integrated education. However, there is no standardised or agreed operational methodology on assessing demand for this. This study aims to examine the current approaches to assessing demand for integrated education in NI by collating existing evidence from key stakeholders and reviewing academic literature. Publicly available information was synthesised from the websites of key stakeholders, and a rapid literature review was conducted to identify methods used in NI and internationally to ascertain demand for education provision. The literature review returned limited results, and the review of key stakeholders' websites illustrated that although existing methods used in NI monitor support in principle for integrated education, they do not capture the full range of factors considered by parents when selecting a school. As a result, the findings indicate a mismatch between articulated preferences for integrated education, the availability of places in integrated schools, and the uptake of these. This study concludes that although existing methods provide part of the evidence jigsaw necessary to assess demand, alternative approaches must be considered to acknowledge the existing complexities within the education system and wider societal structures in NI
Bayesian models for pooling microarray studies with multiple sources of replications
BACKGROUND: Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a method to integrate multiple independent studies efficiently. RESULTS: We introduce a Bayesian hierarchical model to pool cDNA microarray data across multiple independent studies to identify highly expressed genes. Each study has multiple sources of variation, i.e. replicate slides within repeated identical experiments. Our model produces the gene-specific posterior probability of differential expression, which provides a direct method for ranking genes, and provides Bayesian estimates of false discovery rates (FDR). In simulations combining two and five independent studies, with fixed FDR levels, we observed large increases in the number of discovered genes in pooled versus individual analyses. When the number of output genes is fixed (e.g., top 100), the pooled model found appreciably more truly differentially expressed genes than the individual studies. We were also able to identify more differentially expressed genes from pooling two independent studies in Bacillus subtilis than from each individual data set. Finally, we observed that in our simulation studies our Bayesian FDR estimates tracked the true FDRs very well. CONCLUSION: Our method provides a cohesive framework for combining multiple but not identical microarray studies with several sources of replication, with data produced from the same platform. We assume that each study contains only two conditions: an experimental and a control sample. We demonstrated our model's suitability for a small number of studies that have been either pre-scaled or have no outliers
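The abstract does not state how the Bayesian FDR estimate is formed from the gene-specific posterior probabilities. One commonly used estimator, shown here only as an illustrative sketch and not as the authors' method, averages the posterior probability of no differential expression over the genes selected at a given cutoff. The function name, the cutoff, and the probabilities below are hypothetical.

```python
def bayesian_fdr(posterior_probs, cutoff):
    """Estimate the FDR among genes whose posterior probability of
    differential expression is at least `cutoff`.

    Returns (estimated_fdr, number_of_selected_genes).
    """
    selected = [p for p in posterior_probs if p >= cutoff]
    if not selected:
        # No genes selected: report zero discoveries and zero FDR.
        return 0.0, 0
    # Each selected gene contributes (1 - p), its posterior probability
    # of being a false discovery; the estimate is the average.
    fdr = sum(1.0 - p for p in selected) / len(selected)
    return fdr, len(selected)


# Hypothetical posterior probabilities for five genes.
probs = [0.99, 0.95, 0.90, 0.60, 0.20]

# Ranking genes by posterior probability, highest first.
ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

fdr, n_selected = bayesian_fdr(probs, cutoff=0.90)
```

Under this estimator, lowering the cutoff admits more genes but can only raise or hold the estimated FDR, which is the trade-off behind comparing analyses at fixed FDR levels versus at a fixed number of output genes.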
Time is of the essence: an observational time-motion study of internal medicine residents while they are on duty
Background: The effects of changes to resident physician duty hours need to be measurable. This time-motion study was done to record internal medicine residents’ workflow while on duty and to determine the feasibility of capturing detailed data using a mobile electronic tool. Methods: Junior and senior residents were shadowed by a single observer during six-hour blocks of time, covering all seven days. Activities were recorded in real time. Eighty-nine activities grouped into nine categories were determined a priori. Results: A total of 17,714 events were recorded, encompassing 516 hours of observation. Time was apportioned in the following categories: Direct Patient Care (22%), Communication (19%), Personal tasks (15%), Documentation (14%), Education (13%), Indirect care (11%), Transit (6%), Administration (0.6%), and Non-physician tasks (0.4%). Nineteen percent of the education time was spent in self-directed learning activities. Only 9% of the total on-duty time was spent in the presence of patients. Sixty-five percent of communication time was devoted to information transfer. A total of 968 interruptions were recorded, which took on average 93.5 seconds each to service. Conclusion: Detailed recording of residents’ workflow is feasible and can now lead to the measurement of the effects of future changes to residency training. Education activities accounted for 13% of on-duty time.