5,197 research outputs found

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Author: Chang Kai-Wei
Ordonez Vicente
Wang Tianlu
Yatskar Mark
Zhao Jieyu
Publication date: 01/01/2017

Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively. Comment: 11 pages, published in EMNLP 201

arXiv.org e-Print Archive

Crossref

Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

Author: A Halavais
A Ishii
A Spoerri
Attila Szolnoki
B Suh
C Castillo
CA Hidalgo
G Eysenbach
HS Moat
J Bollen
J Ginsberg
J Ratkiewicz
J Török
János Kertész
Márton Mestyán
R Kimmons
R Sharda
RK Pan
S Saavedra
S Sinha
S Sreenivasan
T Brody
T Holloway
T Preis
T Yasseri
Taha Yasseri
X Shuai
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real-time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's corresponding entry in Wikipedia, the well-known online encyclopedia. Comment: 13 pages, Including Supporting Information, 7 Figures, Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Aaltodoc Publication Archive

Oxford University Research Archive

FigShare

Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes

Author: Fugen Christian
Khadkevich Maksim
Kumar Anurag
Publication date: 07/09/2018

In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data. We first describe a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data. Our model trains efficiently on audio recordings of variable length; hence, it is well suited for transfer learning. We then propose methods to learn representations using this model which can be effectively used for solving the target task. We study both transductive and inductive transfer learning tasks, showing the effectiveness of our methods for both domain and task adaptation. We show that the representations learned using the proposed CNN model generalize well enough to reach human-level accuracy on the ESC-50 sound events dataset and set state-of-the-art results on this dataset. We further use them for the acoustic scene classification task and again show that our proposed approaches are well suited to it. We also show that our methods help capture semantic meanings and relations. Moreover, in this process we set state-of-the-art results on the Audioset dataset, relying on the balanced training set. Comment: ICASSP 201

arXiv.org e-Print Archive

Crossref

Nanoinformatics 2010 Program

Author: Baker Nathan A
Chaka Anne
Cohen Yoram
Colvin Vicki
Fritts Martin
Geraci Charles L.
Hoover Mark D
Ku Sharon
Kulinowski Kristen M
Lippell Phil
Luo James
McLennan Michael
Morse Jeffrey
Ostraat Michele L
Rajan Krishna
Reznik-Zellen Rebecca
Schad Peter
Tuominen Mark T.
Publication date: 01/11/2010

InterNano Nanomanufacturing Repository


Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: NL

Publication venue: Tilburg University
Publication date: 01/10/2007


Open Research Online (The Open University)

Visualizing Research Digital Libraries with Open Standards

Author: Ginsburg Mark
Publication venue: AIS Electronic Library (AISeL)
Publication date: 25/02/2004

Large-scale research Digital Libraries (DLs) contain a large array of potentially useful metadata. Yet, many popular DLs do not provide a convenient way to navigate the metadata or to visualize classification schema in the user session. For example, in the broad world of Management Information Systems (MIS) research, a high-level overview of MIS topics and their inter-relationships would be useful to navigate an MIS DL before zooming in on a specific article. To address this obstacle, this paper describes a prototype, the Technical Report Visualizer System (TRV), which uses a wide variety of open standards to show DL classification metadata in the navigation interface. The system captures MIS article metadata from the Open Archives Initiative (OAI) compliant arXiv e-Print archive at Cornell University. The OAI Protocol for Metadata Harvesting (OAI-PMH) is used to collect the topic metadata: the articles' Association for Computing Machinery's (ACM) Computing Classification System codes. We display the topic metadata in a Java hyperbolic tree and make use of XML conceptual product and implementation product standards and specifications, such as the Dublin Core and BiblioML bibliographic metadata sets, XML Topic Maps, Xalan and Xerces, to link user navigation activity to the abstracts and full text contents of the articles. We discuss the flexibility and convenience of XML standards and link this effort to related digital library visualization approaches.

AIS Electronic Library (AISeL)

Syntactic and Semantic Analysis and Visualization of Unstructured English Texts

Author: Karmakar Saurav
Publication venue: ScholarWorks @ Georgia State University
Publication date: 14/12/2011

People have complex thoughts, and they often express them in complex sentences using natural languages. This complexity may facilitate efficient communication among an audience that shares the same knowledge base. On the other hand, for a different or new audience such compositions become cumbersome to understand and analyze. Analysis of such compositions using syntactic or semantic measures is a challenging job and defines the base step for natural language processing. In this dissertation I explore and propose a number of new techniques to analyze and visualize the syntactic and semantic patterns of unstructured English texts. The syntactic analysis is done through a proposed visualization technique which categorizes and compares different English compositions based on their reading complexity metrics. For the semantic analysis I use Latent Semantic Analysis (LSA) to analyze the hidden patterns in complex compositions. I have used this technique to analyze comments from a social visualization web site to detect the irrelevant ones (e.g., spam). The patterns of collaboration are also studied through statistical analysis. Word sense disambiguation is used to figure out the correct sense of a word in a sentence or composition. Applying a textual similarity measure, based on different word similarity measures and word sense disambiguation, to collaborative text snippets from a social collaborative environment reveals a direction for untangling the complex hidden patterns of collaboration.

ScholarWorks @ Georgia State University

A Visual Interactive Analytic Tool for Filtering and Summarizing Large Health Data Sets Coded with Hierarchical Terminologies (VIADS).

Author: Abukamail Nasseef
Brooks Matthew
Buskirk Jacob
Cimino James J.
De Lacalle Sonsoles
Emerson Matthew
Jing Xia
Liu Chang
Masters David
Patel Vimla L.
Shubrook Jay H.
Zhou Yuchun
Publication venue: Touro Scholar
Publication date: 14/02/2019

BACKGROUND: Vast volumes of data, coded through hierarchical terminologies (e.g., International Classification of Diseases, Tenth Revision-Clinical Modification [ICD10-CM], Medical Subject Headings [MeSH]), are generated routinely in electronic health record systems and medical literature databases. Although graphic representations can help to augment human understanding of such data sets, a graph with hundreds or thousands of nodes challenges human comprehension. To improve comprehension, new tools are needed to extract overviews of such data sets. We aim to develop a visual interactive analytic tool for filtering and summarizing large health data sets coded with hierarchical terminologies (VIADS) as an online, publicly accessible tool. The ultimate goals are to filter and summarize health data sets, extract insights, and compare and highlight the differences between various health data sets by using VIADS. The results generated from VIADS can be used as data-driven evidence to help clinicians, clinical researchers, and health care administrators make more informed clinical, research, and administrative decisions. We used the following tools and development environments to develop VIADS: Django, Python, JavaScript, Vis.js, Graph.js, JQuery, Plotly, Chart.js, Unittest, R, and MySQL.
RESULTS: VIADS was developed successfully and the beta version is publicly accessible. In this paper, we introduce the architecture design, development, and functionalities of VIADS. VIADS includes six modules: user account management module, data sets validation module, data analytic module, data visualization module, terminology module, and dashboard. Currently, VIADS supports health data sets coded by ICD-9, ICD-10, and MeSH. We also present the visualization improvement provided by VIADS in regard to interactive features (e.g., zoom in and out, customization of graph layout, expanded information of nodes, 3D plots) and efficient screen space usage.
CONCLUSIONS: VIADS meets the design objectives and can be used to filter, summarize, compare, highlight and visualize large health data sets that are coded by hierarchical terminologies, such as ICD-9, ICD-10 and MeSH. Our further usability and utility studies will provide more details about how end users are using VIADS to facilitate their clinical, research or health administrative decision making.

The Touro College and University System