30 research outputs found
TopicViz: Semantic Navigation of Document Collections
When people explore and manage information, they think in terms of topics and
themes. However, the software that supports information exploration sees text
at only the surface level. In this paper we show how topic modeling -- a
technique for identifying latent themes across large collections of documents
-- can support semantic exploration. We present TopicViz, an interactive
environment for information exploration. TopicViz combines traditional search
and citation-graph functionality with a range of novel interactive
visualizations, centered around a force-directed layout that links documents to
the latent themes discovered by the topic model. We describe several use
scenarios in which TopicViz supports rapid sensemaking on large document
collections.
Turning the Tide: Curbing Deceptive Yelp Behaviors
The popularity and influence of reviews make sites like Yelp ideal targets for malicious behaviors. We present Marco, a novel system that exploits the unique combination of social, spatial and temporal signals gleaned from Yelp to detect venues whose ratings are impacted by fraudulent reviews. Marco increases the cost and complexity of attacks by imposing a tradeoff on fraudsters between their ability to impact venue ratings and their ability to remain undetected. We contribute a new dataset to the community, which consists of both ground truth and gold standard data. We show that Marco significantly outperforms state-of-the-art approaches, achieving 94% accuracy in classifying reviews as fraudulent or genuine, and 95.8% accuracy in classifying venues as deceptive or legitimate. Marco successfully flagged 244 deceptive venues from our large dataset of 7,435 venues, 270,121 reviews and 195,417 users. Among the San Francisco car repair and moving companies that we analyzed, almost 10% exhibit fraudulent behaviors.
Practical Attacks Against Graph-based Clustering
Graph modeling allows numerous security problems to be tackled in a general
way; however, little work has been done to understand their ability to
withstand adversarial attacks. We design and evaluate two novel graph attacks
against a state-of-the-art network-level, graph-based detection system. Our
work highlights areas in adversarial machine learning that have not yet been
addressed, specifically: graph-based clustering techniques, and a global
feature space where realistic attackers without perfect knowledge must be
accounted for (by the defenders) in order to be practical. Even though less
informed attackers can evade graph clustering with low cost, we show that some
practical defenses are possible.
Comment: ACM CCS 201
VisIRR: Interactive Visual Information Retrieval and Recommendation for Large-scale Document Data
Research areas: Machine learning, Data mining, Information visualization, Visual analytics, Text visualization.
We present a visual analytics system called VisIRR, which is an interactive visual information retrieval and recommendation system for document discovery. VisIRR effectively combines both paradigms: passive pull through query processes for retrieval, and active push that recommends items of potential interest based on user preferences. Equipped with efficient dynamic query interfaces for a large corpus of document data, VisIRR visualizes the retrieved documents in a scatter plot together with their overall topic clusters. At the same time, based on interactive personalized preference feedback on documents, VisIRR provides recommended documents, reaching out to the entire corpus beyond the retrieved sets. Such recommended documents are represented in the same scatter space as the retrieved documents so that users can perform integrated analyses of both retrieved and recommended documents seamlessly. We describe the state-of-the-art computational methods that make these integrated and informative representations, as well as real-time interaction, possible. We illustrate the way the system works by using detailed usage scenarios. In addition, we present a preliminary user study that evaluates the effectiveness of the system.
Towards Secure and Interpretable AI...
Presented on August 29, 2019 at 11:30 a.m.-1:00 p.m. in the Technology Square Research Building (TSRB), 1st Floor Auditorium, Georgia Institute of Technology.
Polo Chau is an Associate Professor of Computing at Georgia Tech. He co-directs Georgia Tech's MS Analytics program. His research group bridges machine learning and visualization to synthesize scalable interactive tools for making sense of massive datasets, interpreting complex AI models, and solving real world problems in cybersecurity, human-centered AI, graph visualization and mining, and social good. His Ph.D. in Machine Learning from Carnegie Mellon University won CMU's Computer Science Dissertation Award, Honorable Mention. He received awards and grants from NSF, NIH, NASA, DARPA, Intel (Intel Outstanding Researcher), Symantec, Google, Nvidia, IBM, Yahoo, Amazon, Microsoft, eBay, LexisNexis; Raytheon Faculty Fellowship; Edenfield Faculty Fellowship; Outstanding Junior Faculty Award; The Lester Endowment Award; Symantec fellowship (twice); Best student papers at SDM'14 and KDD'16 (runner-up); Best demo at SIGMOD'17 (runner-up); Chinese CHI'18 Best paper. His research led to open-sourced or deployed technologies by Intel (for ISTC-ARSA: ShapeShifter, SHIELD, ADAGIO, MLsploit), Google, Facebook, Symantec (Polonium, AESOP protect 120M people from malware), and Atlanta Fire Rescue Department. His security and fraud detection research made headlines.
Runtime: 59:01 minutes
We have witnessed tremendous growth in Artificial Intelligence (AI) and machine learning (ML) recently. However, research shows that AI and ML models are often vulnerable to adversarial attacks, and their predictions can be difficult to understand, evaluate and ultimately act upon. Discovering real-world vulnerabilities of deep neural networks and countermeasures to mitigate such threats has become essential to successful deployment of AI in security settings.
We present our joint works with Intel which include the first targeted physical adversarial attack (ShapeShifter) that fools state-of-the-art object detectors; a fast defense (SHIELD) that removes digital adversarial noise by stochastic data compression; and interactive systems (ADAGIO and MLsploit) that further democratize the study of adversarial machine learning and facilitate real-time experimentation for deep learning practitioners. Finally, we also present how scalable interactive visualization can be used to amplify people’s ability to understand and interact with large-scale data and complex models. We sample from projects where interactive visualization has provided key leaps of insight, from increased model interpretability (Gamut with Microsoft Research), to model explorability with models trained on millions of instances (ActiVis deployed with Facebook), increased usability for non-experts about state-of-the-art AI (GAN Lab open-sourced with Google Brain; went viral!), and our latest work Summit, an interactive system that scalably summarizes and visualizes what features a deep learning model has learned and how those features interact to make predictions. We conclude by highlighting the next visual analytics research frontiers in AI.
Visual Data Analytics: A Short Tutorial
Presented on August 8, 2019 at 11:00 a.m. in the ISyE Main Building, Room 228 as part of the Foundation of Data Science (FDS) Summer School 2019.
Duen Horng (Polo) Chau is an Associate Professor of Computing at Georgia Tech. He co-directs Tech's Analytics MS program. His research bridges data mining and human-computer interaction (HCI) to synthesize scalable interactive tools for making sense of massive datasets and solving real world problems.
Runtime: 63:14 minutes
Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC
Research areas: Graph mining algorithms
Large graphs with billions of nodes and edges are increasingly common, calling for new kinds of scalable computation frameworks. Although popular, distributed approaches can be expensive to build, or require many resources to manage or tune. State-of-the-art approaches such as GraphChi and TurboGraph have recently demonstrated that a single machine can efficiently perform advanced computation on billion-node graphs. Although fast, they both use sophisticated data structures, memory management, and optimization techniques. We propose a minimalist approach that forgoes such complexities by leveraging the memory mapping capability found on operating systems. Our experiments on large datasets, such as a 1.5 billion edge Twitter graph, show that our streamlined approach is up to 26 times faster than GraphChi, and comparable to TurboGraph. We contribute our crucial insight that by leveraging memory mapping, a fundamental operating system capability, we can outperform the latest graph computation techniques.
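As a rough illustration of the memory-mapping idea described in this abstract, the sketch below scans a binary edge list through a memory map and counts out-degrees, leaving paging to the operating system. It is not the authors' system: the on-disk format (two little-endian 32-bit integers per edge) and the helper names `write_edges` and `out_degrees` are assumptions made for this example only.

```python
import mmap
import os
import struct
import tempfile
from collections import Counter

# Assumed (hypothetical) edge format: source and target as two uint32 values.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    """Write (source, target) pairs to a flat binary file."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def out_degrees(path):
    """Count out-degrees by reading edges directly from a memory map.

    The file is never loaded explicitly; the operating system pages
    data in as the scan touches it.
    """
    degrees = Counter()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), EDGE.size):
                src, _dst = EDGE.unpack_from(mm, offset)
                degrees[src] += 1
    return degrees


# Small usage example on a four-edge toy graph.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.bin")
    write_edges(path, [(0, 1), (0, 2), (1, 2), (2, 0)])
    degrees = out_degrees(path)

print(dict(degrees))  # {0: 2, 1: 1, 2: 1}
```

The same scan works unchanged on a file larger than RAM, which is the property the abstract relies on; a real system would add iteration over vertex state, which this sketch omits.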
Accelerating the Big Data Challenge With Machine Learning on Emerging Architectures
Presented on September 6, 2016 from 1:00 p.m.-2:30 p.m. at the Klaus Advanced Computing Building, Room 1116W, Georgia Institute of Technology.
South Big Data Innovation Hub; Applications of Analytics and Machine Learning in Energy Industry-Academia Workshop
Energy and Data Science Academia Talks - Big Data Processing and Visualization
Duen Horng (“Polo”) Chau helps make interactions with computers easier and more secure. He is an assistant professor at the School of Computational Science & Engineering in the College of Computing at the Georgia Institute of Technology, and an associate director of the master’s in analytics degree program.
Judy Qiu is an Assistant Professor in the School of Informatics and Computing at Indiana University. Her research interests are in data-intensive computing at the intersection of Cloud and multicore technologies, with an emphasis on life science applications using MapReduce and traditional parallel and distributed computing approaches. Dr. Qiu leads the SALSA project in the Pervasive Technology Institute at Indiana University. Data-intensive science, Cloud computing and multicore computing are converging and will revolutionize the next generation of computing in architectural design and programming challenges. They enable the pipeline: data becomes information becomes knowledge becomes wisdom.
Runtime: 23:40 minutes (Chau)
Runtime: 23:41 minutes (Qiu)