Topic model visualization with IPython
The paper introduces an approach to topic model visualization that offers a wide choice of visualization methods, a user-friendly model representation, and simple integration into applications. Existing approaches to topic model visualization have been analyzed, and a system has been developed that allows choosing the data source for topic models, changing modeling parameters, and visualizing the result of topic modeling with IPython. An example of topic model visualization has been built using the SCTM-en corpus of original news text
Prepare for Citizen Science Challenges at CERN
Abstract:
To inspire more people to contribute to science, and to educate the public about science, two Citizen Science "challenges" were prepared during summer 2013: the CERN Summer Webfest 2013 and the Virtual LHC Challenge. The first part of this report summarizes how to organize a Webfest at CERN and the outcome of the CERN Summer Webfest 2013. The second part gives an introduction to the current state of the Virtual LHC Challenge: a development of the LHC@Home Test4Theory project planned to attract many unskilled volunteers. This work was supported by a grant from the EU Citizen Cyberlab project, with assistance from the Citizen Cyberscience Centre (CCC)
Teaching Data Science
We describe an introductory data science course, entitled Introduction to
Data Science, offered at the University of Illinois at Urbana-Champaign. The
course introduced general programming concepts by using the Python programming
language with an emphasis on data preparation, processing, and presentation.
The course had no prerequisites, and students were not expected to have any
programming experience. This introductory course was designed to cover a wide
range of topics, from the nature of data, to storage, to visualization, to
probability and statistical analysis, to cloud and high performance computing,
without becoming overly focused on any one subject. We conclude this article
with a discussion of lessons learned and our plans to develop new data science
courses. Comment: 10 pages, 4 figures, International Conference on Computational
Science (ICCS 2016)
Hardware-accelerated interactive data visualization for neuroscience in Python.
Large datasets are becoming more and more common in science, particularly in neuroscience where experimental techniques are rapidly evolving. Obtaining interpretable results from raw data can sometimes be done automatically; however, there are numerous situations where there is a need, at all processing stages, to visualize the data in an interactive way. This enables the scientist to gain intuition, discover unexpected patterns, and find guidance about subsequent analysis steps. Existing visualization tools mostly focus on static publication-quality figures and do not support interactive visualization of large datasets. While working on Python software for visualization of neurophysiological data, we developed techniques to leverage the computational power of modern graphics cards for high-performance interactive data visualization. We were able to achieve very high performance despite the interpreted and dynamic nature of Python, by using state-of-the-art, fast libraries such as NumPy, PyOpenGL, and PyTables. We present applications of these methods to visualization of neurophysiological data. We believe our tools will be useful in a broad range of domains, in neuroscience and beyond, where there is an increasing need for scalable and fast interactive visualization
Coding oriented learning in economics, business and finance
As the relationship between students (and teachers) and information technology evolves, new tools are required to improve learning (and teaching) in the social sciences. Economics, business and finance are mainly based on data, and dealing with data requires specific skills and techniques, such as computer programming, to get the full potential of most quantitative models. In this paper, we propose a coding oriented learning method based on Python Notebooks which is specifically designed for students of degrees in economics, business and finance. We follow a learning-by-doing strategy that encourages students to implement economic models as a suitable way to improve the understanding of fundamental concepts. As an illustrative example, we also describe a case study in which Python Notebooks are the key tool to teach cash management in a Master in Business Administration program. Since students of today are the decision-makers of tomorrow, a further advantage of the use of a programming language as a teaching tool is the possibility to connect theory to practice by enabling students to implement their own decision support tools.
Salas-Molina, F.; Pla-Santamaria, D. (2018). Coding oriented learning in economics, business and finance. Modelling in Science Education and Learning. 11(1):55-64. doi:10.4995/msel.2018.9152
API design for machine learning software: experiences from the scikit-learn project
Scikit-learn is an increasingly popular machine learning library. Written
in Python, it is designed to be simple and efficient, accessible to
non-experts, and reusable in various contexts. In this paper, we present and
discuss our design choices for the application programming interface (API) of
the project. In particular, we describe the simple and elegant interface shared
by all learning and processing units in the library and then discuss its
advantages in terms of composition and reusability. The paper also comments on
implementation details specific to the Python ecosystem and analyzes obstacles
faced by users and developers of the library
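The idea of one interface shared by all learning and processing units, and the composition it enables, can be illustrated with a minimal pure-Python sketch. The classes below (`MeanCenterer`, `NearestMeanClassifier`, `SimplePipeline`) are hypothetical stand-ins written for this illustration and are not scikit-learn code; only the method names `fit`, `transform`, and `predict` follow the library's well-known convention.

```python
class MeanCenterer:
    """Hypothetical transformer: subtracts the per-feature training mean."""

    def fit(self, X, y=None):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class NearestMeanClassifier:
    """Hypothetical predictor: assigns the class with the closest centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


class SimplePipeline:
    """Hypothetical composite: chains transformers, then a final predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Every step but the last is fitted and used to transform the data.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
y_train = ["low", "low", "high", "high"]
model = SimplePipeline([MeanCenterer(), NearestMeanClassifier()]).fit(X_train, y_train)
predictions = model.predict([[1.0, 0.5], [9.0, 10.5]])
```

Because the composite exposes the same `fit` and `predict` methods as its parts, it can be used wherever a single predictor is expected, which is the kind of reusability the abstract refers to.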
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)
This report records and discusses the Third Workshop on Sustainable Software
for Science: Practice and Experiences (WSSSPE3). The report includes a
description of the keynote presentation of the workshop, which served as an
overview of sustainable scientific software. It also summarizes a set of
lightning talks in which speakers highlighted to-the-point lessons and
challenges pertaining to sustaining scientific software. The final and main
contribution of the report is a summary of the discussions, future steps, and
future organization for a set of self-organized working groups on topics
including developing pathways to funding scientific software; constructing
useful common metrics for crediting software stakeholders; identifying
principles for sustainable software engineering design; reaching out to
research software organizations around the world; and building communities for
software sustainability. For each group, we include a point of contact and a
landing page that can be used by those who want to join that group's future
activities. The main challenge left by the workshop is to see if the groups
will execute these activities that they have scheduled, and how the WSSSPE
community can encourage this to happen
The Connectome Viewer Toolkit: An Open Source Framework to Manage, Analyze, and Visualize Connectomes
Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit – a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology
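For readers unfamiliar with the Hamming distance mentioned above, it is the number of positions at which two equal-length sequences differ. The sketch below is a minimal illustration, not the primer's own code; the function name `hamming_distance` and the choice to compare case-insensitively are assumptions made here, and the graphical user interface described in the abstract is not reproduced.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance requires sequences of equal length")
    # Assumption: compare case-insensitively so that "acgt" equals "ACGT".
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


# The two sequences differ at the third and sixth positions.
distance = hamming_distance("GATTACA", "GACTATA")
```

Raising an error on unequal lengths is a deliberate choice: the Hamming distance is undefined in that case, and silently truncating to the shorter sequence would hide input mistakes.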