
    Assessing the long-term effects of conditional cash transfers on human capital: evidence from Colombia

    Conditional cash transfers are programs under which poor families get a stipend provided they keep their children in school and take them for health checks. Although there is significant evidence showing that they have positive impacts on school participation, little is known about the long-term impacts of the programs on human capital. This paper investigates whether cohorts of children from poor households that benefited for up to nine years from Familias en Acción, a conditional cash transfer program in Colombia, attained more school and performed better on academic tests at the end of high school. Identification of program impacts is derived from two different strategies: matching techniques with household surveys, and a regression discontinuity design using a census of the poor and administrative records of the program. The authors show that, on average, participant children are 4 to 8 percentage points more likely than nonparticipant children to finish high school, particularly girls and beneficiaries in rural areas. Regarding long-term impact on test scores, the analysis shows that program recipients who graduate from high school seem to perform at the same level as equally poor non-recipient graduates, even after correcting for possible selection bias when low-performing students enter school in the treatment group. Although the positive impacts on high school graduation may improve the employment and earning prospects of participants, the lack of positive effects on test scores raises the need to further explore policy actions to couple the program's objective of increasing human capital with enhanced learning.
    Keywords: Education For All, Tertiary Education, Primary Education, Secondary Education, Teaching and Learning
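    The first identification strategy mentioned above relies on matching with household surveys. As an illustration only, the sketch below shows nearest-neighbour propensity-score matching on synthetic data; the covariates, column names, and numbers are hypothetical and do not reproduce the paper's actual survey variables or estimator.

```python
# Minimal sketch of propensity-score matching for a binary treatment
# (program participation) and a binary outcome (high-school completion).
# All data and column names below are synthetic and illustrative only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(12, 18, n),
    "rural": rng.integers(0, 2, n),
    "household_size": rng.integers(2, 10, n),
})
# Synthetic treatment assignment and outcome, for illustration only.
p_treat = 1 / (1 + np.exp(-(0.3 * df["rural"] - 0.05 * df["household_size"])))
df["treated"] = rng.binomial(1, p_treat)
df["finished_hs"] = rng.binomial(1, 0.5 + 0.06 * df["treated"])

# 1) Estimate propensity scores from observed covariates.
X = df[["age", "rural", "household_size"]]
df["ps"] = LogisticRegression(max_iter=1000).fit(X, df["treated"]).predict_proba(X)[:, 1]

# 2) Match each treated unit to its nearest control on the propensity score.
treated = df[df["treated"] == 1]
control = df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["ps"]])
_, idx = nn.kneighbors(treated[["ps"]])
matched_controls = control.iloc[idx.ravel()]

# 3) Average treatment effect on the treated: difference in completion rates.
att = treated["finished_hs"].mean() - matched_controls["finished_hs"].mean()
print(f"ATT (percentage points): {100 * att:.1f}")
```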

    A computational framework for data-driven infrastructure engineering using advanced statistical learning, prediction, and curing

    Over the past few decades, in most science and engineering fields, data-driven research has become a promising next-generation research paradigm thanks to noticeable advances in computing power and the accumulation of valuable databases. Despite this progress, the leveraging of these databases is still in its infancy. To address this issue, this dissertation carries out the following studies using advanced statistical methods. The first study develops a computational framework for collecting and transforming data obtained from heterogeneous Federal Aviation Administration databases and builds a flexible predictive model using a generalized additive model (GAM) to predict runway incursions (RI) over 15 years at the top 36 major US airports. Results show that GAM is a powerful method for RI prediction, with high prediction accuracy. A direct search for the best predictor variables appears to be superior to the variable selection approach based on principal component analysis, and the predictive power of GAM turns out to be comparable to that of an artificial neural network (ANN). The second study builds an accurate predictive model based on earthquake engineering databases. As in the previous study, GAM is adopted as the predictive model, and the results show promising predictive power when applied to existing reinforced concrete shear wall databases. The primary objective of the third study is to suggest an efficient predictor variable selection method and quantify the relative importance of predictor variables using field survey pavement data and simulated airport pavement data. Results show that the direct search method always finds the best predictor model, although its run time grows with the size of the data and the dimensionality of the variables. The results also show that not all variables are necessary for the best prediction and identify the relative importance of the variables selected for the GAM model. The fourth study deals with the impact of fractional hot-deck imputation (FHDI) on statistical and machine learning prediction using practical engineering databases. Multiple response rates and internal parameters (i.e., category number and donor number) are investigated regarding the behavior and impact of FHDI on prediction models. GAM, ANN, support vector machines, and extremely randomized trees are adopted as predictive models. Results show that FHDI has a positive impact on prediction for engineering databases, and optimal internal parameters are suggested to achieve better prediction accuracy. The last study offers a systematic computational framework including data collection, transformation, and squashing to develop a prediction model for the structural behavior of a target bridge. Missing values in the bridge data are cured using the FHDI method to avoid inaccurate analysis due to bias and sparseness in the data; results show that applying FHDI improves prediction performance. Overall, this dissertation provides a notable computational framework for data processing, suggests a seamless data curing method, and offers advanced statistical predictive models across multiple projects. This research approach will help researchers investigate their databases with better understanding and build statistical models with high accuracy according to their knowledge of the data.
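    The abstract repeatedly compares a GAM against an ANN as predictive models. The sketch below illustrates that comparison on synthetic airport-like features, assuming pyGAM and scikit-learn as the tooling; the dissertation's actual variables, data, and implementation are not reproduced here.

```python
# Minimal sketch: fit a GAM with one smooth term per predictor and compare
# its test accuracy against a small neural network. Feature names and data
# are synthetic, illustrative assumptions only.
import numpy as np
from pygam import LinearGAM, s
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n = 500
operations = rng.uniform(50_000, 900_000, n)   # hypothetical annual operations
runways = rng.integers(1, 7, n)                # hypothetical runway count
weather_index = rng.uniform(0, 1, n)           # hypothetical adverse-weather index
y = 1e-5 * operations + 2 * runways + 5 * weather_index + rng.normal(0, 1, n)

X = np.column_stack([operations, runways, weather_index])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GAM: one smooth term per predictor, grid search over smoothing strength.
gam = LinearGAM(s(0) + s(1) + s(2)).gridsearch(X_train, y_train)
# ANN baseline for comparison.
ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                   random_state=0).fit(X_train, y_train)

print("GAM test R^2:", round(r2_score(y_test, gam.predict(X_test)), 3))
print("ANN test R^2:", round(r2_score(y_test, ann.predict(X_test)), 3))
```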

    Assessing the Long-term Effects of Conditional Cash Transfers on Human Capital: Evidence from Colombia

    Conditional Cash Transfers (CCT) are programs under which poor families get a stipend provided they keep their children in school and take them for health checks. While there is significant evidence showing that they have positive impacts on school participation, little is known about their long-term impacts on human capital. In this paper we investigate whether cohorts of children from poor households that benefited for up to nine years from Familias en Acción, a CCT in Colombia, attained more school and performed better in academic tests at the end of high school. Identification of program impacts is derived from two different strategies: matching techniques with household surveys, and a regression discontinuity design using a census of the poor and administrative records of the program. We show that, on average, participant children are 4 to 8 percentage points more likely than nonparticipant children to finish high school, particularly girls and beneficiaries in rural areas. Regarding long-term impact on test scores, the analysis shows that program recipients who graduate from high school seem to perform at the same level as equally poor non-recipient graduates, even after correcting for possible selection bias when low-performing students enter school in the treatment group. Even though the positive impacts on high school graduation may improve the employment and earning prospects of participants, the lack of positive effects on test scores raises the need to further explore policy actions to couple the CCT's objective of increasing human capital with enhanced learning.
    Keywords: Conditional Cash Transfers, school completion, academic achievement, learning outcomes
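    The second identification strategy named in the abstract is a regression discontinuity design around an eligibility cutoff in a census of the poor. The sketch below illustrates a sharp, local-linear discontinuity estimate on synthetic data; the score variable, cutoff, bandwidth, and outcome are all hypothetical and are not the program's actual targeting rules.

```python
# Minimal sketch of a sharp regression-discontinuity estimate around a
# poverty-index eligibility cutoff. Data, cutoff, and bandwidth are
# illustrative assumptions only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
score = rng.uniform(0, 100, n)            # hypothetical poverty index
cutoff = 40.0
treated = (score < cutoff).astype(int)    # eligible below the cutoff
# Synthetic outcome: smooth in the score plus a 6 pp jump at the cutoff.
finished_hs = rng.binomial(1, 0.8 - 0.004 * score + 0.06 * treated)

# Local linear regression within a bandwidth, separate slopes on each side.
bw = 10.0
in_bw = np.abs(score - cutoff) <= bw
centered = score[in_bw] - cutoff
X = sm.add_constant(np.column_stack([
    treated[in_bw],                       # discontinuity indicator
    centered,                             # running variable
    centered * treated[in_bw],            # slope change at the cutoff
]))
fit = sm.OLS(finished_hs[in_bw], X).fit()
print(f"Estimated jump at the cutoff: {100 * fit.params[1]:.1f} pp")
```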

    Implementation and optimization of algorithms for the analysis of Biomedical Big Data

    Big Data analytics poses many challenges to the research community, which has to handle several computational problems related to the vast amount of data. Interest in biomedical data is growing, with the aim of achieving so-called personalized medicine, in which therapy plans are designed around the specific genotype and phenotype of an individual patient; algorithm optimization plays a key role to this end. In this work we discuss several topics related to biomedical Big Data analytics, with special attention to numerical issues and the algorithmic solutions related to them. We introduce a novel feature selection algorithm tailored to omics datasets, proving its efficiency on synthetic and real high-throughput genomic datasets. We tested our algorithm against other state-of-the-art methods, obtaining better or comparable results. We also implemented and optimized different types of deep learning models, testing their efficiency on biomedical image processing tasks. Three novel frameworks for the development of deep learning neural network models are discussed and used to describe the numerical improvements proposed on various topics. In the first implementation we optimize two super-resolution models, showing their results on NMR images and proving their efficiency in generalization tasks without retraining. The second optimization involves a state-of-the-art object detection neural network architecture, obtaining a significant speedup in computational performance. In the third application we address the femur head segmentation problem on CT images using deep learning algorithms. The last section of this work involves the implementation of a novel biomedical database obtained by harmonizing multiple data sources, which provides network-like relationships between biomedical entities. Data related to diseases and other biological entities were mined using web-scraping methods, and a novel natural language processing pipeline was designed to maximize the overlap between the different data sources involved in this project.
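    The abstract does not spell out the proposed feature-selection algorithm, so the sketch below only illustrates the general task on a synthetic, omics-like matrix: rank features with a generic univariate filter and cross-validate a classifier on the selected subset. All parameters are illustrative; the thesis's own method is not reproduced.

```python
# Generic baseline for feature selection on a high-dimensional,
# omics-like dataset (many features, few informative ones).
# Illustrative only; not the thesis's algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 200 samples, 5000 features, 20 of them informative.
X, y = make_classification(n_samples=200, n_features=5000,
                           n_informative=20, random_state=0)

pipe = make_pipeline(
    SelectKBest(f_classif, k=50),          # univariate filter selection
    LogisticRegression(max_iter=5000),     # classifier on the selected subset
)
scores = cross_val_score(pipe, X, y, cv=5)
print("Mean CV accuracy with 50 selected features:", scores.mean().round(3))
```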

    A Deep Understanding of Structural and Functional Behavior of Tabular and Graphical Modules in Technical Documents

    The rapid increase in published research papers in recent years has escalated the need for automated ways to process and understand them. Successful recognition of the information contained in technical documents depends on understanding the document's individual modalities. These modalities include tables, graphics, and diagrams, as defined in Bourbakis' pioneering work. However, the depth of understanding is correlated with the efficiency of detection and recognition. In this work, a novel methodology is proposed for the automatic processing and understanding of table and graphics images in technical documents. Previous attempts at table and graphics understanding retrieve only superficial knowledge such as table contents and axis values. Here, the focus is on capturing the internal associations and relations between the data extracted from each figure. The proposed methodology is divided into the following steps: 1) figure detection, 2) figure recognition, and 3) figure understanding, where by figures we mean tables, graphics, and diagrams. More specifically, we evaluate different heuristic and learning methods for classifying table and graphics images as part of the detection module. Table recognition and deep understanding include the extraction of the knowledge illustrated in a table image along with the deeper associations between the table variables. The graphics recognition module follows a clustering-based approach to recognize middle points. Middle points are 2D points where the direction of the curves changes; they delimit the straight line segments that construct the graphics curves. We use these detected middle points to understand various features of each line segment and the associations between them. Additionally, we convert the extracted internal tabular associations and the captured structural and functional behavior of the curves into a common, and at the same time unique, form of representation: Stochastic Petri net (SPN) graphs. The use of SPN graphs allows different document modalities to be merged through the functions that describe them, without any prior knowledge of what these functions are. Finally, we achieve a higher level of document understanding through the synergistic merging of the SPN graphs extracted from the table and graphics modalities. We provide results from every step of the document modality understanding methodologies and the synergistic merging as proof of concept for this research.
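    The graphics-recognition step detects "middle points" where the direction of a curve changes and uses them to delimit straight segments. A minimal sketch of one possible approach is shown below, using a turning-angle threshold followed by DBSCAN to merge nearby candidates; the thesis's actual clustering procedure, thresholds, and data are not specified in the abstract.

```python
# Hedged sketch: detect direction-change ("middle") points on a sampled curve
# by thresholding turning angles, then merge nearby candidates with DBSCAN.
# The curve and parameters are synthetic and illustrative only.
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic piecewise-linear curve: three straight segments, two corners.
seg1 = np.column_stack([np.linspace(0, 10, 50), np.zeros(50)])
seg2 = np.column_stack([np.full(50, 10.0), np.linspace(0, 5, 50)])
seg3 = np.column_stack([np.linspace(10, 20, 50), np.full(50, 5.0)])
curve = np.vstack([seg1, seg2[1:], seg3[1:]])   # drop duplicated joints

# Turning angle at each interior sample.
d = np.diff(curve, axis=0)
angles = np.arctan2(d[:, 1], d[:, 0])
turn = np.abs(np.diff(angles))
candidates = curve[1:-1][turn > np.radians(20)]  # sharp direction changes

# Merge candidate points that belong to the same corner.
labels = DBSCAN(eps=0.5, min_samples=1).fit_predict(candidates)
middle_points = np.array([candidates[labels == k].mean(axis=0)
                          for k in np.unique(labels)])
print("Detected middle points:\n", middle_points)
```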

    Analysis of SHRP2 Data to Understand Normal and Abnormal Driving Behavior in Work Zones

    This research project used the Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) to improve highway safety by using statistical descriptions of normal driving behavior to identify abnormal driving behaviors in work zones. SHRP2 data used in these analyses included 50 safety-critical events (SCEs) from work zones and 444 baseline events selected on a matched case-control design.

    Principal components analysis (PCA) was used to summarize kinematic data into “normal” and “abnormal” driving. Each second of driving is described by one point in three-dimensional principal component (PC) space; an ellipse containing the bulk of baseline points is considered “normal” driving. Driving segments with out-of-ellipse points have a higher probability of being an SCE. Matched case-control analysis indicates that the specific individual and the traffic flow made approximately equal contributions to predicting out-of-ellipse driving.

    Structural Topic Modeling (STM) was used to analyze complex categorical data obtained from annotated videos. The STM method finds “words” representing categorical data variables that occur together in many events and describes these associations as “topics.” STM then associates topics with either baselines or SCEs. The STM produced 10 topics: 3 associated with SCEs, 5 associated with baselines, and 2 that were neutral. Distraction occurs in both baselines and SCEs.

    Both approaches identify the role of individual drivers in producing situations where SCEs might arise. A countermeasure could use the PC calculation to indicate impending issues or specific drivers who may have higher crash risk, but not to employ significant interventions, such as automatically braking a vehicle with out-of-ellipse driving patterns. STM results suggest that communication to drivers or placing compliant vehicles in the traffic stream would be effective. Finally, driver distraction in work zones should be discouraged.
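    The report summarizes each second of kinematic data as a point in three-dimensional PC space and treats points outside an ellipse fitted to baseline driving as abnormal. The sketch below illustrates that idea with synthetic features and a chi-square (Mahalanobis) ellipsoid; the SHRP2 variables and the study's exact ellipse definition are not reproduced.

```python
# Hedged sketch of the PCA-plus-ellipsoid idea: fit PCA on baseline driving,
# project new seconds of driving into 3D PC space, and flag those whose
# Mahalanobis distance exceeds a chi-square quantile as "out of ellipse".
# Features, thresholds, and data are synthetic and illustrative only.
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Synthetic per-second features: e.g. speed, accelerations, yaw rate, etc.
baseline = rng.normal(0, 1, size=(5000, 6))
event = rng.normal(0, 1, size=(300, 6))
event[:10] += 4.0                         # a few strongly atypical seconds

pca = PCA(n_components=3).fit(baseline)
Z_base = pca.transform(baseline)
Z_event = pca.transform(event)

# Ellipsoid from the baseline distribution in PC space (95% coverage).
cov_inv = np.linalg.inv(np.cov(Z_base, rowvar=False))
center = Z_base.mean(axis=0)
diff = Z_event - center
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis
out_of_ellipse = d2 > chi2.ppf(0.95, df=3)
print("Seconds flagged as abnormal:", int(out_of_ellipse.sum()))
```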
