30 research outputs found

TopicViz: Semantic Navigation of Document Collections

Author: Chau Duen Horng "Polo"
Eisenstein Jacob
Kittur Aniket
Xing Eric P.
Publication date: 03/11/2011

When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text only at the surface level. In this paper we show how topic modeling, a technique for identifying latent themes across large collections of documents, can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered around a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections.

arXiv.org e-Print Archive

CiteSeerX

Turning the Tide: Curbing Deceptive Yelp Behaviors

Author: Bogdan Carbunar
Duen Horng (Polo) Chau
George Burri
Jaime Ballesteros
Mahmudur Rahman
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014

The popularity and influence of reviews make sites like Yelp ideal targets for malicious behaviors. We present Marco, a novel system that exploits the unique combination of social, spatial and temporal signals gleaned from Yelp to detect venues whose ratings are impacted by fraudulent reviews. Marco increases the cost and complexity of attacks by imposing a tradeoff on fraudsters between their ability to impact venue ratings and their ability to remain undetected. We contribute a new dataset to the community, which consists of both ground truth and gold standard data. We show that Marco significantly outperforms state-of-the-art approaches, achieving 94% accuracy in classifying reviews as fraudulent or genuine and 95.8% accuracy in classifying venues as deceptive or legitimate. Marco successfully flagged 244 deceptive venues from our large dataset with 7,435 venues, 270,121 reviews and 195,417 users. Among the San Francisco car repair and moving companies that we analyzed, almost 10% exhibit fraudulent behaviors.

CiteSeerX

Crossref

Practical Attacks Against Graph-based Clustering

Author: Bayer Ulrich
Benczúr Miklós
Carlini Nicholas
Chen Yizheng
Hao Shuang
Invernizzi Luca
Li Zhou
Nadji Yacin
Nelms Terry
Papernot Nicolas
Perdisci Roberto
Polo Chau Duen Horng
Rahbarinia Babak
Rndic Nedim
Sivakorn Suphannee
Smutz Charles
Sun Jimeng
Wagner David
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/08/2017

Graph modeling allows numerous security problems to be tackled in a general way; however, little work has been done to understand the ability of graph-based methods to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically graph-based clustering techniques and a global feature space in which defenders must account for realistic attackers without perfect knowledge in order to be practical. Even though less informed attackers can evade graph clustering at low cost, we show that some practical defenses are possible.
Comment: ACM CCS 201

arXiv.org e-Print Archive

Crossref

VisIRR: Interactive Visual Information Retrieval and Recommendation for Large-scale Document Data

Author: Chau Duen Horng (Polo)
Choo Jaegul
Clarkson Edward
Gray Alexander
Inouye David
Kannan Ramakrishnan
Lee Changhyun
Lee Hanseung
Li Fuxin
Liu Zhicheng
Mehta Nishant
Ouyang Hua
Park Haesun
Som Subhojit
Stasko John
Stolper Charles D.
Publication venue: Georgia Institute of Technology
Publication date: 01/01/2013

Research areas: Machine learning, Data mining, Information visualization, Visual analytics, Text visualization.
We present a visual analytics system called VisIRR, an interactive visual information retrieval and recommendation system for document discovery. VisIRR combines two paradigms: passive pull, in which query processes retrieve documents, and active push, in which the system recommends items of potential interest based on user preferences. Equipped with efficient dynamic query interfaces for a large corpus of document data, VisIRR visualizes the retrieved documents in scatter plot form together with their overall topic clusters. At the same time, based on interactive personalized preference feedback on documents, VisIRR recommends documents from the entire corpus, beyond the retrieved sets. These recommended documents are shown in the same scatter space as the retrieved documents, so that users can seamlessly analyze both retrieved and recommended documents together. We describe the state-of-the-art computational methods that make these integrated and informative representations, as well as real-time interaction, possible. We illustrate how the system works with detailed usage scenarios. In addition, we present a preliminary user study that evaluates the effectiveness of the system.

Scholarly Materials And Research @ Georgia Tech

Towards Secure and Interpretable AI...

Author: Chau Duen Horng (Polo)
Publication venue: Georgia Institute of Technology
Publication date: 29/08/2019

Presented on August 29, 2019 at 11:30 a.m.-1:00 p.m. in the Technology Square Research Building (TSRB), 1st Floor Auditorium, Georgia Institute of Technology.
Polo Chau is an Associate Professor of Computing at Georgia Tech. He co-directs Georgia Tech's MS Analytics program. His research group bridges machine learning and visualization to synthesize scalable interactive tools for making sense of massive datasets, interpreting complex AI models, and solving real world problems in cybersecurity, human-centered AI, graph visualization and mining, and social good. His Ph.D. in Machine Learning from Carnegie Mellon University won CMU's Computer Science Dissertation Award, Honorable Mention. He received awards and grants from NSF, NIH, NASA, DARPA, Intel (Intel Outstanding Researcher), Symantec, Google, Nvidia, IBM, Yahoo, Amazon, Microsoft, eBay, LexisNexis; Raytheon Faculty Fellowship; Edenfield Faculty Fellowship; Outstanding Junior Faculty Award; The Lester Endowment Award; Symantec fellowship (twice); Best student papers at SDM'14 and KDD'16 (runner-up); Best demo at SIGMOD'17 (runner-up); Chinese CHI'18 Best paper. His research led to open-sourced or deployed technologies by Intel (for ISTC-ARSA: ShapeShifter, SHIELD, ADAGIO, MLsploit), Google, Facebook, Symantec (Polonium, AESOP protect 120M people from malware), and Atlanta Fire Rescue Department. His security and fraud detection research made headlines.
Runtime: 59:01 minutes
We have witnessed tremendous growth in Artificial Intelligence (AI) and machine learning (ML) recently. However, research shows that AI and ML models are often vulnerable to adversarial attacks, and their predictions can be difficult to understand, evaluate and ultimately act upon. Discovering real-world vulnerabilities of deep neural networks and countermeasures to mitigate such threats has become essential to successful deployment of AI in security settings.
We present our joint works with Intel which include the first targeted physical adversarial attack (ShapeShifter) that fools state-of-the-art object detectors; a fast defense (SHIELD) that removes digital adversarial noise by stochastic data compression; and interactive systems (ADAGIO and MLsploit) that further democratize the study of adversarial machine learning and facilitate real-time experimentation for deep learning practitioners. Finally, we also present how scalable interactive visualization can be used to amplify people’s ability to understand and interact with large-scale data and complex models. We sample from projects where interactive visualization has provided key leaps of insight, from increased model interpretability (Gamut with Microsoft Research), to model explorability with models trained on millions of instances (ActiVis deployed with Facebook), increased usability for non-experts about state-of-the-art AI (GAN Lab open-sourced with Google Brain; went viral!), and our latest work Summit, an interactive system that scalably summarizes and visualizes what features a deep learning model has learned and how those features interact to make predictions. We conclude by highlighting the next visual analytics research frontiers in AI.

Scholarly Materials And Research @ Georgia Tech

Visual Data Analytics: A Short Tutorial

Author: Chau Duen Horng (Polo)
Publication date: 08/08/2019

Presented on August 8, 2019 at 11:00 a.m. in the ISyE Main Building, Room 228 as part of the Foundation of Data Science (FDS) Summer School 2019.
Duen Horng (Polo) Chau is an Associate Professor of Computing at Georgia Tech. He co-directs Tech's Analytics MS program. His research bridges data mining and human-computer interaction (HCI) to synthesize scalable interactive tools for making sense of massive datasets and solving real world problems.
Runtime: 63:14 minutes

Scholarly Materials And Research @ Georgia Tech

Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC

Author: Chau Duen Horng (Polo)
Lin Zhiyuan
Publication venue: Georgia Institute of Technology
Publication date: 01/08/2013

Research areas: Graph mining algorithms
Large graphs with billions of nodes and edges are increasingly common, calling for new kinds of scalable computation frameworks. Although popular, distributed approaches can be expensive to build, or require many resources to manage or tune. State-of-the-art approaches such as GraphChi and TurboGraph have recently demonstrated that a single machine can efficiently perform advanced computation on billion-node graphs. Although fast, they both use sophisticated data structures, memory management, and optimization techniques. We propose a minimalist approach that forgoes such complexities by leveraging the memory mapping capability found on operating systems. Our experiments on large datasets, such as a 1.5 billion edge Twitter graph, show that our streamlined approach runs up to 26 times faster than GraphChi and is comparable to TurboGraph. We contribute our crucial insight that by leveraging memory mapping, a fundamental operating system capability, we can outperform the latest graph computation techniques.
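As a rough illustration of the memory-mapping idea in this abstract (not the authors' implementation), the sketch below maps a binary edge file into memory and scans it to count out-degrees, leaving paging to the operating system. The record layout (two little-endian 32-bit integers per edge) and the helper names `write_edges` and `out_degrees` are assumptions made for this example only.

```python
import mmap
import struct

# Assumed on-disk layout (hypothetical, for illustration only):
# each edge is a (source, target) pair of little-endian uint32 values.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    """Write edges to `path` as fixed-width binary records."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def out_degrees(path, num_nodes):
    """Count out-degrees by scanning a memory-mapped edge file.

    The file is never read into a Python buffer explicitly; the OS pages
    in the parts of the mapping that the scan touches.
    Note: mapping an empty file raises ValueError, so `path` must be non-empty.
    """
    degrees = [0] * num_nodes
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), EDGE.size):
                src, _dst = EDGE.unpack_from(mm, offset)
                degrees[src] += 1
    return degrees
```

Calling `write_edges("edges.bin", [(0, 1), (0, 2), (1, 2)])` followed by `out_degrees("edges.bin", 3)` would scan three 8-byte records. A pure-Python loop like this shows only the access pattern; it says nothing about the performance figures reported in the abstract.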

Scholarly Materials And Research @ Georgia Tech

Accelerating the Big Data Challenge With Machine Learning on Emerging Architectures

Author: Chau Duen Horng (Polo)
Qiu Judy
Publication venue: Georgia Institute of Technology
Publication date: 06/09/2016

Presented on September 6, 2016 from 1:00 p.m.-2:30 p.m. at the Klaus Advanced Computing Building, Room 1116W, Georgia Institute of Technology.
South Big Data Innovation Hub; Applications of Analytics and Machine Learning in Energy Industry-Academia Workshop
Energy and Data Science Academia Talks - Big Data Processing and Visualization
Duen Horng (“Polo”) Chau helps make interactions with computers easier and more secure. He is an assistant professor at the School of Computational Science & Engineering in the College of Computing at the Georgia Institute of Technology, and an associate director of the master’s in analytics degree program.
Judy Qiu is an Assistant Professor in the School of Informatics and Computing at Indiana University. Her research interests are in data-intensive computing at the intersection of Cloud and multicore technologies, with an emphasis on life science applications using MapReduce and traditional parallel and distributed computing approaches. Dr. Qiu leads the SALSA project in the Pervasive Technology Institute at Indiana University. Data-intensive science, Cloud computing and Multicore computing are converging and will revolutionize the next generation of computing in architectural design and programming challenges. They enable the pipeline: data becomes information becomes knowledge becomes wisdom.
Runtime: 23:40 minutes (Chau)
Runtime: 23:41 minutes (Qiu)

Scholarly Materials And Research @ Georgia Tech