Structure Selection from Streaming Relational Data

Author: Mihalkova Lilyana
Moustafa Walaa Eldin
Publication date: 01/01/2011

Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, human designers capitalized on their background knowledge by following a trial-and-error trajectory, where relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a lightweight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster.

arXiv.org e-Print Archive

CiteSeerX

A New Evolutionary Algorithm For Mining Noisy, Epistatic, Geospatial Survey Data Associated With Chagas Disease

Author: Hanley John P.
Publication venue: UVM ScholarWorks
Publication date: 01/01/2017

The scientific community is just beginning to understand some of the profound effects that feature interactions and heterogeneity have on natural systems. Despite the belief that these nonlinear and heterogeneous interactions exist across numerous real-world systems (e.g., from the development of personalized drug therapies to market predictions of consumer behaviors), the tools for analysis have not kept pace. This research was motivated by the desire to mine data from large socioeconomic surveys aimed at identifying the drivers of household infestation by a Triatomine insect that transmits the life-threatening Chagas disease. To decrease the risk of transmission, our colleagues at the laboratory of applied entomology and parasitology have implemented mitigation strategies (known as Ecohealth interventions); however, limited resources necessitate the search for better risk models. Mining these complex Chagas survey data for potential predictive features is challenging due to imbalanced class outcomes, missing data, heterogeneity, and the non-independence of some features. We develop an evolutionary algorithm (EA) to identify feature interactions in Big Datasets with desired categorical outcomes (e.g., disease or infestation). The method is non-parametric and uses the hypergeometric PMF as a fitness function to tackle challenges associated with using p-values in Big Data (e.g., p-values decrease inversely with the size of the dataset). To demonstrate the EA's effectiveness, we first test the algorithm on three benchmark datasets. These include two classic Boolean classifier problems, (1) the majority-on problem and (2) the multiplexer problem, as well as (3) a simulated single nucleotide polymorphism (SNP) disease dataset. Next, we apply the EA to real-world Chagas disease survey data and successfully archive numerous high-order feature interactions associated with infestation that would not have been discovered using traditional statistics.
These feature interactions are also explored using network analysis. The spatial autocorrelation of the genetic data (SNPs of Triatoma dimidiata) was captured using geostatistics. Specifically, a modified semivariogram analysis was performed to characterize the SNP data and help elucidate the movement of the vector within two villages. For both villages, the SNP information showed strong spatial autocorrelation, albeit with different geostatistical characteristics (sills, ranges, and nuggets). These metrics were leveraged to create risk maps that suggest the more forested village had a sylvatic source of infestation, while the other village had a domestic/peridomestic source. This initial exploration into using Big Data to analyze disease risk shows that novel and modified existing statistical tools can improve the assessment of risk on a fine scale.
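
As a rough illustration of the hypergeometric PMF mentioned in the abstract above, the sketch below scores a candidate feature combination by how probable its observed number of positive outcomes would be under random sampling. The abstract does not say how the EA applies the PMF, so the scoring scheme, the function names (`hypergeom_pmf`, `rule_fitness`), and the toy survey records are all assumptions made for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k positives in n draws without replacement
    from a population of N items that contains K positives."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def rule_fitness(records, rule):
    """Hypothetical fitness: PMF of the number of positive outcomes among
    the records matching `rule` (a dict of feature -> required value).
    Under this assumed scheme, a smaller value means the match is less
    likely to be a chance result."""
    N = len(records)
    K = sum(1 for _, outcome in records if outcome)
    matched = [outcome for feats, outcome in records
               if all(feats.get(f) == v for f, v in rule.items())]
    n = len(matched)
    k = sum(matched)
    return hypergeom_pmf(k, N, K, n)

# Toy data (invented): (household features, infested?)
records = [
    ({"dirt_floor": True,  "dogs": True},  True),
    ({"dirt_floor": True,  "dogs": True},  True),
    ({"dirt_floor": True,  "dogs": True},  True),
    ({"dirt_floor": False, "dogs": True},  False),
    ({"dirt_floor": True,  "dogs": False}, False),
    ({"dirt_floor": False, "dogs": False}, False),
]

# All 3 matching households are infested, out of 3 infested among 6 total:
# C(3,3) * C(3,0) / C(6,3) = 1/20
print(rule_fitness(records, {"dirt_floor": True, "dogs": True}))
```

In this toy case neither feature alone separates the infested households perfectly, while the two-feature combination does, which is the kind of interaction the abstract describes searching for.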

ScholarWorks @ UVM

Assessing unidimensionality: A comparison of Rasch Modeling, Parallel Analysis, and TETRAD

Author: DiGangi Samuel
Jannasch-Pennell Angel
Osborn-Popp Sharon
Yu Chong Ho
Publication venue: ScholarWorks@UMass Amherst
Publication date: 23/11/2019

The evaluation of assessment dimensionality is a necessary stage in the gathering of evidence to support the validity of interpretations based on a total score, particularly when assessment development and analysis are conducted within an item response theory (IRT) framework. In this study, we employ polytomous item responses to compare two methods that have received increased attention in recent years (Rasch modeling and Parallel Analysis) with a method for evaluating assessment structure that is less well known in the educational measurement community (TETRAD). The three methods were all found to be reasonably effective. Parallel Analysis successfully identified the correct number of factors, and while the Rasch approach did not show the item misfit that would indicate deviation from clear unidimensionality, the pattern of residuals did seem to indicate the presence of correlated, yet distinct, factors. TETRAD successfully confirmed one dimension in the single-construct data set and was able to confirm two dimensions in the combined data set, yet excluded one item from each cluster for no obvious reason. The outcomes of all three approaches substantiate the conviction that the assessment of dimensionality requires a good deal of judgment.

ScholarWorks@UMass Amherst

EDM 2011: 4th international conference on educational data mining : Eindhoven, July 6-8, 2011 : proceedings

Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2011

Pure OAI Repository

Identifying Causal Structures from Cyberstalking: Behaviors Severity and Association

Author: Ismaili Florije
Luma-Osmani Shkurte
Pathak Pankaj
Zenuni Xhemal
Publication venue: Croatian Communications and Information Society
Publication date: 01/01/2022

This paper presents an etiological study of cyberstalking, meaning the use of various technologies, and the internet in general, to harass or stalk someone. The novelty of the paper is its multivariate empirical approach to cyberstalking victimization, which has received less attention from the research community. There is also a lack of such studies from the causal perspective. This is because most studies prioritize identifying a single cause, whereas the data examination used for mining causal relationships in this paper is novel and has great potential to detect combined or multiple causal factors. The paper focuses on how variables such as age, gender, and whether the participant has ever harassed someone relate to being a victim of cyberstalking. The research aims to find the causes of cyberstalking among high school teenagers. Furthermore, an exploratory data analysis has been performed. Weak and moderate correlations between the factors in the dataset are highlighted. The odds ratios among the variables have been calculated, which imply that girls are twice as likely as boys to be cyberstalked. Similarly, concerning outcomes related to cyberstalking frequency and recidivism are noted.
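
For readers unfamiliar with the odds ratio mentioned in the abstract above, the following minimal sketch computes it from a 2x2 table together with a Wald 95% confidence interval. The counts are invented for illustration and are not taken from the paper; the function name `odds_ratio_with_ci` is likewise hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with outcome      b: group 1 without outcome
    c: group 2 with outcome      d: group 2 without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(ratio) - z * se)
    upper = exp(log(ratio) + z * se)
    return ratio, lower, upper

# Invented counts: 40 of 200 girls and 20 of 180 boys report cyberstalking
ratio, lower, upper = odds_ratio_with_ci(a=40, b=160, c=20, d=160)
print(f"OR = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio compares odds rather than probabilities, so an odds ratio of 2 matches "twice as likely" only approximately, and the approximation is closest when the outcome is rare in both groups.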

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Risk Management Based on Expert Rules and Data Mining: A Case Study in Insurance

Author: Daniels Hennie
Dissel Han van
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2002

Correctness, transparency and effectiveness are the principal attributes of knowledge derived from databases using data mining. Current data mining research focuses on improving the efficiency of algorithms for knowledge discovery. However, improving the algorithms is often not sufficient. The limitations of data mining can only be overcome by integrating the knowledge of experts in the field, encoded in some accessible way, with knowledge derived from patterns in the databases. In this paper we discuss an approach for combining expert knowledge and knowledge derived from transactional databases. The proposed approach is applicable to a wide variety of risk management problems. We illustrate the approach with a case study on fraud detection in an insurance company. The case clearly shows that the combination of expert knowledge with monotonic neural networks leads to significant performance improvements.

AIS Electronic Library (AISeL)

Knowledge-based Biomedical Data Science 2019

Author: Callahan Tiffany J.
Hunter Lawrence E.
Pielke-Lombardo Harrison
Tripodi Ignacio J.
Publication date: 08/10/2019

Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; supplemental material 43 pages with 3 tables.

arXiv.org e-Print Archive