
    Data Analytics for Decision Making at Academic Departments

    In the era of big data, academic institutions are embracing data, and each academic department has access to a wealth of it: enrollment data, retention data, student outcomes, faculty productivity, student success rates, and resource allocation. As a large four-year public institution, ours serves a diverse student body in which more than 60% of students are considered economically disadvantaged. In our department (comprising 1,900 students and 120 faculty), we are currently using data-driven decision-making to gain deeper insights into the needs of students, faculty, and staff. This well-planned and carefully implemented data-driven strategy has translated those insights into student success, as measured by retention and enrollment. A data-driven culture has also benefited us by creating an unbiased environment (between faculty and students, administration and faculty, and faculty and faculty) in which collaboration and communication have become easier. The main objective of this paper is to present our three data-analytic tools (predictive, descriptive, and prescriptive) and how they have improved student outcomes, enabled interventions for at-risk students, informed cost-cutting strategies in the department, projected actual outcomes, and, finally, helped determine the effectiveness of our data-driven decisions. For example, our predictive tool helps identify potentially low-performing students at the course level and assigns them to mentoring and tutoring resources. Our prescriptive tool suggests cost-cutting strategies and helps improve retention at the department level. Our descriptive tool supports data-driven, unbiased communication among staff, faculty, and students at the college level.
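
    The abstract does not disclose the predictive tool's underlying model, so the following is a hypothetical illustration only: a minimal sketch of a course-level early-warning flag, assuming a logistic-regression classifier and invented features (attendance rate, homework average, midterm score) on synthetic data.

        # Hypothetical sketch only: the paper does not disclose its model or
        # features. A logistic-regression early-warning flag on invented,
        # synthetic per-student features.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        # Synthetic feature matrix: attendance rate, homework average,
        # midterm score, each normalized to [0, 1].
        X = rng.uniform(0, 1, size=(500, 3))
        # Assumed label: 1 = finished the course with a low grade.
        y = (0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]
             + rng.normal(0, 0.1, 500) < 0.4).astype(int)

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        model = LogisticRegression().fit(X_train, y_train)

        # Students whose predicted risk exceeds a chosen threshold would be
        # referred to mentoring and tutoring resources.
        risk = model.predict_proba(X_test)[:, 1]
        flagged = np.where(risk > 0.6)[0]
        print(f"{len(flagged)} of {len(X_test)} students flagged for intervention")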

    Performance modeling of CMOS inverters using support vector machines (SVM) and adaptive sampling

    Integrated circuit designs are verified with circuit simulators before being fabricated in real silicon. For a circuit simulation tool to accurately predict the performance of a CMOS design, it must build models that predict the transistors' electrical characteristics. Circuit simulation tools have access to massive amounts of data that are not only dynamic but generated at high speed in real time, making simulation speed a bottleneck in integrated circuit design. Using all of the available data is prohibitive due to memory and time constraints. Accurate and fast sampling has been shown to make processing of large datasets tractable without examining all of the data. However, it is difficult to know in advance what sample size to choose in order to guarantee good performance. Thus, determining the smallest dataset size that yields a model as accurate as one built from the entire available dataset remains an important research question. This paper focuses on adaptively determining how many instances to present to the simulation tool for building accurate models. We combine Support Vector Machines (SVMs) with the Chernoff inequality to derive an efficient adaptive sampling technique for scaling down the data. We then show empirically that the adaptive approach is faster and produces accurate models for circuit simulators compared with other techniques such as progressive sampling and artificial neural networks.
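
    To make the approach concrete, here is a minimal sketch of the adaptive-sampling loop under stated assumptions: a geometric (doubling) schedule, scikit-learn's SVR as the SVM regressor, a Chernoff/Hoeffding-style bound sizing the validation set, and synthetic stand-in data in place of real simulator output. The paper's exact schedule, kernel settings, and device data are not reproduced here.

        # Sketch of adaptive sampling for a regression surrogate, in the
        # spirit of the paper's SVM-plus-Chernoff approach. Schedule details
        # and data are assumptions for illustration.
        import numpy as np
        from sklearn.svm import SVR
        from sklearn.metrics import r2_score

        rng = np.random.default_rng(1)

        # Synthetic stand-in for simulator data: a smooth response over two
        # normalized transistor parameters, plus measurement noise.
        X = rng.uniform(0, 1, size=(20000, 2))
        y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, size=20000)
        X_train, y_train = X[:18000], y[:18000]
        X_val, y_val = X[18000:], y[18000:]

        eps, delta = 0.05, 0.05
        # Chernoff/Hoeffding-style bound: validation points needed to estimate
        # a [0, 1]-bounded score within eps at confidence 1 - delta (R^2 can
        # dip below 0, so this is illustrative).
        n_val = int(np.ceil(np.log(2 / delta) / (2 * eps ** 2)))

        n, prev = 125, -np.inf
        while n <= len(X_train):
            model = SVR(C=10.0).fit(X_train[:n], y_train[:n])
            score = r2_score(y_val[:n_val], model.predict(X_val[:n_val]))
            if score - prev < eps:    # learning curve has flattened: stop
                break
            prev, n = score, n * 2    # geometric sampling schedule
        print(f"stopped at {n} training instances, validation R^2 = {score:.3f}")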

    Intelligent Sampling for Big Data using Bootstrap Sampling and Chebyshev inequality

    The amount of data being generated and stored is growing exponentially, owing in part to continuing advances in computer technology. These data present tremendous opportunities for data mining, a burgeoning field of computer science that focuses on developing methods that extract knowledge from data. In many real-world problems, data mining algorithms have access to massive amounts of data, and mining all of the available data is prohibitive due to computational (time and memory) constraints. Much current research is concerned with scaling up data mining algorithms, i.e., improving existing algorithms so they can handle larger datasets. An alternative approach is to scale down the data. Thus, determining the smallest training-set size that obtains the same accuracy as the entire available dataset remains an important research question. Our research focuses on selecting how many instances (a sample) to present to the data mining algorithm. The goals of this paper are to study and characterize the properties of learning curves, to integrate them with the Chebyshev bound to derive an efficient general-purpose adaptive sampling schedule, and to empirically validate our algorithm for scaling down the data.
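
    A minimal sketch of the idea, under stated assumptions: bootstrap replicates estimate the variance of the accuracy at the current sample size, the Chebyshev inequality turns that variance into a stability test, and a doubling schedule walks up the learning curve. The dataset, classifier, and thresholds below are invented for illustration; the paper's actual schedule may differ.

        # Sketch of bootstrap-plus-Chebyshev adaptive sampling on a generic
        # classifier; data, model, and thresholds are assumptions.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(2)
        X, y = make_classification(n_samples=20000, n_features=10, random_state=2)
        X_val, y_val = X[-4000:], y[-4000:]
        X_pool, y_pool = X[:-4000], y[:-4000]

        eps, delta, B = 0.03, 0.1, 30   # tolerance, risk, bootstrap replicates
        n, prev_acc = 250, -np.inf
        while n <= len(X_pool):
            # Bootstrap: train B models on resamples of the current sample
            # and record their validation accuracies.
            accs = []
            for _ in range(B):
                idx = rng.integers(0, n, size=n)   # sample with replacement
                clf = DecisionTreeClassifier(random_state=0)
                accs.append(clf.fit(X_pool[idx], y_pool[idx]).score(X_val, y_val))
            mean_acc, var_acc = np.mean(accs), np.var(accs)
            # Chebyshev: P(|acc - mean| >= eps) <= var / eps^2.
            stable = var_acc / eps ** 2 <= delta
            if stable and mean_acc - prev_acc < eps:  # curve flat, estimate stable
                break
            prev_acc, n = mean_acc, n * 2             # geometric schedule
        print(f"stopped at n={n}, estimated accuracy {mean_acc:.3f}")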

    How Songbirds Learn to Sing Provides Suggestions for Designing Team Projects for Computing Courses

    Understanding how our brain works and how we learn is perhaps one of the greatest challenges facing twenty-first-century computer science. Songbirds are good candidates for unraveling some of this mystery. Over the last decade, a large body of research has sought to better understand how songbirds learn complex songs. The canary (Serinus canaria) and the zebra finch (Taeniopygia guttata) have been widely used bird models for studying these brain-behavior relationships. Like songbirds, we humans are vocal and social learners. In such learners, the development of communication is initially steered by social interactions with adult tutors. In songbirds, song development is further shaped through interactions with peers and by attending to the consequences of others' interactions. In this paper, we review three key areas of a bird's brain that perform three specific roles (actor, experimenter, and critic). Similarly, three roles (coder, designer, and tester) are played in software firms when developing products. We can bring the same roles into the computer science classroom by designing a term project in which students play these three different roles. We demonstrate our methodology by showing how it works in a senior-level computer science course. We then discuss and qualitatively show the benefits of such a role-based project design.

    Data Mining using Ensemble Classifiers for Improved Prediction of Student Academic Performance

    In the last decade, data mining (DM) has been applied in the field of education, giving rise to an emerging interdisciplinary research field known as Educational Data Mining (EDM). One goal of EDM is to better understand how to predict student academic performance given personal, socio-economic, psychological, and other environmental attributes. Another goal is to identify factors and rules that influence educational outcomes. In this paper, we use multiple classifiers (the J48 decision tree, Naïve Bayes, and Random Forest) to improve the quality of student data by eliminating noisy instances, thereby improving predictive accuracy. We also identify association rules that influence student outcomes using a combination of rule-based techniques (Apriori, Filtered Associator, and Tertius). We empirically compare our technique with single-model techniques and show that ensemble models not only give better predictive accuracy on student performance, but also provide better rules for understanding the factors that influence student outcomes.
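
    A minimal sketch of ensemble-based noise filtering in this spirit, using scikit-learn stand-ins (DecisionTreeClassifier for J48/C4.5, GaussianNB, RandomForestClassifier), a majority-vote rule, and synthetic data; the paper's exact filtering criterion and its association-rule stage are not reproduced here.

        # Sketch: majority-vote elimination of likely-noisy instances.
        # Models are scikit-learn stand-ins for J48, Naive Bayes, and
        # Random Forest; the voting rule is an assumption.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_predict
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        # flip_y injects label noise into the synthetic student-like data.
        X, y = make_classification(n_samples=2000, n_features=12, flip_y=0.1,
                                   random_state=3)

        # Each classifier predicts every instance via cross-validation, so no
        # instance is judged by a model that saw its own label.
        models = [DecisionTreeClassifier(random_state=0),
                  GaussianNB(),
                  RandomForestClassifier(random_state=0)]
        votes = np.stack([cross_val_predict(m, X, y, cv=5) == y for m in models])

        # Majority-vote filter: drop instances most classifiers get wrong.
        keep = votes.sum(axis=0) >= 2
        X_clean, y_clean = X[keep], y[keep]
        print(f"removed {len(y) - keep.sum()} of {len(y)} instances as noisy")

    A final model trained on the filtered (X_clean, y_clean) would then be compared against one trained on the raw data, which is the kind of single-model baseline the paper evaluates against.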