Exploiting Cognitive Structure for Adaptive Learning
Adaptive learning, also known as adaptive teaching, relies on learning path
recommendation, which sequentially recommends personalized learning items
(e.g., lectures, exercises) to satisfy the unique needs of each learner.
Although it is well known that modeling the cognitive structure including
knowledge level of learners and knowledge structure (e.g., the prerequisite
relations) of learning items is important for learning path recommendation,
existing methods for adaptive learning often separately focus on either
knowledge levels of learners or knowledge structure of learning items. To fully
exploit the multifaceted cognitive structure for learning path recommendation,
we propose a Cognitive Structure Enhanced framework for Adaptive Learning,
named CSEAL. By viewing path recommendation as a Markov Decision Process and
applying an actor-critic algorithm, CSEAL can sequentially identify the right
learning items to different learners. Specifically, we first utilize a
recurrent neural network to trace the evolving knowledge levels of learners at
each learning step. Then, we design a navigation algorithm on the knowledge
structure to ensure the logicality of learning paths, which reduces the search
space in the decision process. Finally, the actor-critic algorithm is used to
determine what to learn next and whose parameters are dynamically updated along
the learning path. Extensive experiments on real-world data demonstrate the
effectiveness and robustness of CSEAL. Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19).
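The navigation-then-decide loop described above can be sketched in miniature. Everything here is a hypothetical toy: the prerequisite graph, the item names, and the greedy choice standing in for the actor network.

```python
# Toy sketch of CSEAL's navigation step: items have prerequisite
# relations; navigation masks out items whose prerequisites the learner
# has not yet mastered, shrinking the action space before the policy
# picks what to learn next. Graph and item names are hypothetical.
PREREQS = {"algebra": [], "calculus": ["algebra"], "odes": ["calculus"]}

def admissible_items(mastered):
    """Navigation on the knowledge structure: only unmastered items
    whose prerequisites are all mastered are admissible actions."""
    return [item for item, pre in PREREQS.items()
            if item not in mastered and all(p in mastered for p in pre)]

def recommend_path(start_mastered=()):
    """Walk the prerequisite graph until every item is learned."""
    mastered, path = set(start_mastered), []
    while True:
        candidates = admissible_items(mastered)
        if not candidates:
            break
        item = candidates[0]  # a trained actor would score candidates here
        path.append(item)
        mastered.add(item)
    return path
```

In CSEAL the first admissible candidate would instead be chosen by the actor network, with the critic's feedback updating the policy parameters along the learning path.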
Implementation of a Realistic Artificial Data Generator for Crash Data Generation
In this paper, a framework is outlined to generate realistic artificial data (RAD) as a tool for comparing different models developed for safety analysis. The primary focus of transportation safety analysis is on identifying and quantifying the influence of factors contributing to traffic crash occurrence and its consequences. The current framework of comparing model structures using only observed data has limitations. With observed data, it is not possible to know how well the models mimic the true relationship between the dependent and independent variables. Further, real datasets do not allow researchers to evaluate model performance at different levels of dataset complexity. RAD offers an innovative framework to address these limitations. Hence, we propose a RAD generation framework embedded with heterogeneous causal structures that generates crash data by considering crash occurrence as a trip-level event impacted by trip-level factors, demographics, and roadway and vehicle attributes. Within our RAD generator we employ three specific modules: (a) disaggregate trip information generation, (b) crash data generation, and (c) crash data aggregation. For disaggregate trip information generation, we employ a daily activity-travel realization for an urban region generated from an established activity-based model for the Chicago region. We use this data of more than 2 million daily trips to generate a subset of trips with crash data. For trips with crashes, we generate the crash location, crash type, driver/vehicle characteristics, and crash severity. The daily RAD generation process is repeated to generate crash records at yearly or multi-year resolution. The crash databases generated can be employed to compare frequency models, severity models, crash type models, and various other dimensions by facility type, possibly establishing a universal benchmarking system for alternative model frameworks in the safety literature.
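A toy version of the three-module pipeline, (a) trip generation, (b) crash generation, (c) aggregation, might look as follows. The factor names, probabilities, and coefficients are purely illustrative stand-ins for the embedded causal structures, not the paper's specification.

```python
import random

# Illustrative-only sketch of the three RAD modules:
# (a) disaggregate trip generation, (b) crash assignment under a known
# (embedded) causal structure, (c) aggregation to summary counts.
def generate_trips(n, seed=0):
    rng = random.Random(seed)
    return [{"night": rng.random() < 0.3, "length_km": rng.uniform(1, 40)}
            for _ in range(n)]

def assign_crashes(trips, seed=1):
    rng = random.Random(seed)
    crashes = []
    for t in trips:
        # Known causal structure: longer and night-time trips are riskier.
        # Coefficients are hypothetical.
        p = 0.001 * t["length_km"] * (2.0 if t["night"] else 1.0)
        if rng.random() < p:
            crashes.append({**t, "severity": rng.choice(["PDO", "injury"])})
    return crashes

def aggregate(crashes):
    counts = {}
    for c in crashes:
        counts[c["severity"]] = counts.get(c["severity"], 0) + 1
    return counts
```

Because the causal structure is known by construction, a frequency or severity model fit to the aggregated output can be scored against the true generating relationship, which is exactly what observed data cannot offer.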
HeATed Alert Triage (HeAT): Transferrable Learning to Extract Multistage Attack Campaigns
With growing sophistication and volume of cyber attacks combined with complex
network structures, it is becoming extremely difficult for security analysts to
corroborate evidence to identify multistage campaigns on their network. This
work develops HeAT (Heated Alert Triage): given a critical indicator of
compromise (IoC), e.g., a severe IDS alert, HeAT produces a HeATed Attack
Campaign (HAC) depicting the multistage activities that led up to the critical
event. We define the concept of "Alert Episode Heat" to represent an analyst's
opinion of how much an event contributes to the attack campaign of the critical
IoC given their knowledge of the network and security expertise. Leveraging a
network-agnostic feature set, HeAT learns the essence of an analyst's assessment
of "HeAT" for a small set of IoCs, and applies the learned model to extract
insightful attack campaigns for IoCs not seen before, even across networks by
transferring what has been learned. We demonstrate the capabilities of HeAT
with data collected in the Collegiate Penetration Testing Competition (CPTC) and
through collaboration with a real-world SOC. We developed HeAT-Gain metrics to
demonstrate how analysts may assess and benefit from the extracted attack
campaigns in comparison to common practices where IP addresses are used to corroborate evidence. Our results demonstrate the practical uses of HeAT: it finds campaigns that span diverse attack stages, removes a significant volume of irrelevant alerts, and remains coherent with the analyst's original assessments.
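The "Alert Episode Heat" idea can be illustrated with a deliberately simplified scorer. The features (attack-stage proximity and time distance to the critical IoC), the weights, and the threshold below are all hypothetical; HeAT instead learns its scoring from a small set of analyst-labeled IoCs over a network-agnostic feature set.

```python
# Hypothetical heat scorer: rate each alert's contribution to the
# campaign of a critical IoC, then keep episodes hot enough to matter.
STAGE_ORDER = ["recon", "exploit", "escalate", "exfiltrate"]

def heat(alert, critical, w_stage=0.6, w_time=0.4, horizon=3600.0):
    """Weighted blend of stage proximity and recency before the IoC."""
    stage_gap = abs(STAGE_ORDER.index(alert["stage"])
                    - STAGE_ORDER.index(critical["stage"]))
    stage_score = 1.0 - stage_gap / (len(STAGE_ORDER) - 1)
    dt = critical["time"] - alert["time"]
    # Alerts after the critical event, or far before it, score zero on time.
    time_score = max(0.0, 1.0 - dt / horizon) if dt >= 0 else 0.0
    return w_stage * stage_score + w_time * time_score

def extract_campaign(alerts, critical, threshold=0.5):
    """Return the HeATed Attack Campaign: hot alerts in time order."""
    hot = [a for a in alerts if heat(a, critical) >= threshold]
    return sorted(hot, key=lambda a: a["time"])
```

Because the features reference only stage and timing, never concrete addresses, a scorer of this shape transfers across networks in the way the abstract describes.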
Generating Preview Tables for Entity Graphs
Users are tapping into massive, heterogeneous entity graphs for many
applications. It is challenging to select entity graphs for a particular need,
given abundant datasets from many sources and the oftentimes scarce information
for them. We propose methods to produce preview tables for compact presentation
of important entity types and relationships in entity graphs. The preview
tables assist users in attaining a quick and rough preview of the data. They
can be shown in a limited display space for a user to browse and explore,
before she decides to spend time and resources to fetch and investigate the
complete dataset. We formulate several optimization problems that look for
previews with the highest scores according to intuitive goodness measures,
under various constraints on preview size and distance between preview tables.
The optimization problem under distance constraint is NP-hard. We design a
dynamic-programming algorithm and an Apriori-style algorithm for finding
optimal previews. Results from experiments, comparison with related work and
user studies demonstrated the scoring measures' accuracy and the discovery
algorithms' efficiency. Comment: This is the camera-ready version of a SIGMOD16 paper. There might be tiny differences in layout, spacing, and line breaking compared with the version in the SIGMOD16 proceedings, since we must submit TeX files and use arXiv to compile the file.
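To make the optimization concrete, here is a brute-force baseline for picking the highest-scoring preview of k entity types under a size constraint. The per-type scores are hypothetical stand-ins for the paper's goodness measures, and the distance constraint between preview tables is omitted; the paper's dynamic-programming and Apriori-style algorithms are designed to avoid exactly this exhaustive search.

```python
from itertools import combinations

# Brute-force preview selection: try every size-k subset of entity
# types and keep the one with the highest total score. Scores are
# illustrative, not the paper's goodness measures.
def best_preview(scores, k):
    best, best_score = None, float("-inf")
    for combo in combinations(sorted(scores), k):
        s = sum(scores[t] for t in combo)
        if s > best_score:
            best, best_score = combo, s
    return set(best), best_score
```

Enumeration costs C(n, k) evaluations, which is why smarter search matters once the NP-hard distance constraint is added on top.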
DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
Personalized predictive medicine necessitates the modeling of patient illness
and care processes, which inherently have long-term temporal dependencies.
Healthcare observations, recorded in electronic medical records, are episodic
and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural
network that reads medical records, stores previous illness history, infers
current illness states and predicts future medical outcomes. At the data level,
DeepCare represents care episodes as vectors in space and models patient health
state trajectories through explicit memory of historical records. Built on Long
Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle
irregularly timed events by moderating the forgetting and consolidation of memory
cells. DeepCare also incorporates medical interventions that change the course
of illness and shape future medical risk. Moving up to the health state level,
historical and present health states are then aggregated through multiscale
temporal pooling, before passing through a neural network that estimates future
outcomes. We demonstrate the efficacy of DeepCare for disease progression
modeling, intervention recommendation, and future risk prediction. On two
important cohorts with heavy social and economic burden -- diabetes and mental
health -- the results show improved modeling and risk prediction accuracy. Comment: Accepted at JBI under the new name "Predicting healthcare trajectories from medical records: A deep learning approach".
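The core idea of moderating memory by elapsed time can be sketched without the full LSTM machinery. The exponential decay and the 90-day half-life below are illustrative assumptions, not DeepCare's learned gating.

```python
# Minimal sketch, in the spirit of DeepCare's time parameterizations:
# each care episode updates a memory cell whose previous content is
# decayed according to the irregular gap since the last record.
def update_memory(memory, episode_value, gap_days, half_life=90.0):
    """Decay old memory by the elapsed gap, then add the new episode."""
    decay = 0.5 ** (gap_days / half_life)   # longer gaps forget more
    return decay * memory + episode_value

def trace(episodes):
    """episodes: list of (value, days_since_previous_record) pairs."""
    m = 0.0
    for value, gap in episodes:
        m = update_memory(m, value, gap)
    return m
```

In the actual model this moderation happens inside the LSTM's forget gate, so the amount of decay is learned per memory cell rather than fixed by a global half-life.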
Addendum to Informatics for Health 2017: Advancing both science and practice
This article presents the presentation and poster abstracts that were mistakenly omitted from the original publication.
Lessons Learned from the ECML/PKDD Discovery Challenge on the Atherosclerosis Risk Factors Data
It has become a good habit to organize a data mining cup, a competition, or a challenge at machine learning and data mining conferences. The main idea of the Discovery Challenge, organized at the European Conferences on Principles and Practice of Knowledge Discovery in Databases since 1999, was to encourage a collaborative research effort rather than a competition between data miners. Different data sets have been used for the Discovery Challenge workshops over the seven years. This paper summarizes the experience we gained when organizing and evaluating the Discovery Challenge on the atherosclerosis risk factor data.
Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning
In the field of quantitative trading, it is common practice to transform raw
historical stock data into indicative signals for the market trend. Such
signals are called alpha factors. Alphas in formula forms are more
interpretable and thus favored by practitioners concerned with risk. In
practice, a set of formulaic alphas is often used together for better modeling
precision, so we need to find synergistic formulaic alpha sets that work well
together. However, most traditional alpha generators mine alphas one by one
separately, overlooking the fact that the alphas would be combined later. In
this paper, we propose a new alpha-mining framework that prioritizes mining a
synergistic set of alphas, i.e., it directly uses the performance of the
downstream combination model to optimize the alpha generator. Our framework
also leverages the strong exploratory capabilities of reinforcement
learning~(RL) to better explore the vast search space of formulaic alphas. The
contribution to the combination model's performance is used as the return in
the RL process, driving the alpha generator to find better
alphas that improve upon the current set. Experimental evaluations on
real-world stock market data demonstrate both the effectiveness and the
efficiency of our framework for stock trend forecasting. The investment
simulation results show that our framework is able to achieve higher returns
compared to previous approaches. Comment: Accepted by KDD '23, ADS track.
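For readers unfamiliar with formulaic alphas, a minimal hand-written example (not one mined by the framework) is a momentum signal whose quality is scored by its correlation with realized values, the kind of downstream performance signal the RL generator is rewarded with. The window length and formula are illustrative.

```python
# A hand-written formulaic alpha and a simple quality score for it.
def momentum_alpha(prices, window=3):
    """alpha_t = close_t / close_{t-window} - 1, aligned to time t."""
    return [prices[i] / prices[i - window] - 1.0
            for i in range(window, len(prices))]

def pearson(xs, ys):
    """Pearson correlation, a stand-in for an information-coefficient
    style measure of how well a signal tracks returns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)
```

The framework's departure from mining such alphas one by one is that the reward comes from the performance of the whole combined set, so a new alpha is valued for what it adds on top of the alphas already found.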