Python versus R Language

From Open Risk Manual

Python versus R Language: A side by side comparison

Motivation

A large component of risk management relies on data processing and quantitative tools. In turn, such information processing pipelines and numerical algorithms must be implemented in computer systems. Computing systems come in an extraordinarily large variety, but in recent years open source software has seen increasing adoption across diverse applications (machine learning, data science, artificial intelligence). In particular, cloud computing environments are primarily based on open source projects at the systems level. This facilitates (but does not require) the use of open source computational tools such as Python or R.

Objective

The Python versus R Language article is a side-by-side comparison of a wide range of aspects of the Python and R language ecosystems.

The comparison aims to:

  • Cover most common use cases that are relevant for the implementation of quantitative risk models (please provide feedback for additions)
  • Be useful for people who are at least somewhat familiar with programming (and optionally with one or both of the two languages)
  • Be fact-oriented (please provide feedback if you spot errors)


The comparison does not aim to:

  • Be a detailed or comprehensive catalog of libraries (which number in the thousands)
  • Cover use cases that are far removed from quantitative risk models
  • Be exhaustive (e.g. identify all the possible computer systems on which one can run Python or R, or all the possible ways one can perform linear regression)

Disclaimers

The comparison attempted here is not entirely like-for-like. Strictly speaking, R is not a general-purpose programming language but a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.

Despite this caveat, a comparison is justified because in a very large domain of applications and use cases the two frameworks can be used interchangeably (or nearly so).

The comparison does not attempt to assess which language is "better", as this is a meaningless question. The proper way to use it is to start from your objectives, knowledge level and use case; combining the relevant data points should then provide sufficient information to decide which is the best fit.

Nor is the comparison between Python and R meant to suggest that the optimal choice of tool is always between these two. It is entirely possible (and not unusual) that for a particular use case the optimal tool is based on another language.

Structure

The comparison data are provided in tabular format in several distinct tables. Each table documents a relevant language attribute, information for both languages and (where applicable) commentary.

History and Community

The objective of this section is to provide an overall comparison of the history of the two ecosystems, towards answering the question: who is really behind python and R?

Aspect | Python | R | Comment
First Release | 1991 | 1995 | Both ecosystems have a long development history, and both have received a lot of attention in recent years
Initial Authors | Guido van Rossum | Ross Ihaka and Robert Gentleman |
Current Stable Version | 3.7 | 3.5 |
Current Governance | Python Software Foundation (non-profit) | R Foundation (non-profit) |
Open Source License | PSF License | GNU General Public License |
Size of Core Contributors | TBD | TBD | Core team sizes are not public
Developer Communities | PyLadies | R-Ladies |
Size of Developer Communities | Third most popular in number of repositories and number of contributors | Not in top 10 (of developers) | NB: R programmers might not necessarily self-identify as developers (but as data scientists, statisticians etc.)
Important Organizations | NumFOCUS | Bioconductor | A large number of both commercial and non-profit organizations support both ecosystems explicitly and implicitly. This is a partial selection with a focus on the applications relevant for risk management
Important Conferences | PyCon | useR! |
Important Journals | | The R Journal |
Online Forums and Blogs | Too many to list | Too many to list | Both ecosystems have very large numbers of blogs, forums etc. (of widely varying quality)

Devices and Operating Systems

This section aims to answer the question: Where (as in what kind of device and operating system) can I use Python or R? It is not a guide to installing Python or R on your system!

Aspect | Python | R | Comment
Linux Desktop | Comes pre-installed | apt-get install r-base | Python is generally pre-installed as it is used by the Linux system itself
MacOS | Version 2.7 is pre-installed | MacOS installer |
Raspbian | Pre-installed | apt-get install r-base | Linux is the operating system of choice for IoT devices, which means a basic Python installation is generally available
Windows | Windows installer | Windows installer |
Android / iOS | Via python-for-android | No | Neither Python nor R is readily available on mobile devices
Cloud Servers | As per Linux Desktop | As per Linux Desktop | Cloud servers typically run the Linux operating system and generally have Python installations available
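Since the pre-installed interpreter differs between systems (for example the legacy 2.7 version on MacOS), it can help to check from within Python which interpreter and operating system a script actually runs on. The following is a minimal sketch using only the standard library `platform` and `sys` modules; the function name `interpreter_info` is illustrative, and it avoids type annotations so that it also parses under Python 2.

```python
import platform
import sys


def interpreter_info():
    """Report which Python interpreter and operating system this code runs on."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython"
        "version": platform.python_version(),                # e.g. "3.11.4"
        "major": sys.version_info[0],
        "os": platform.system(),                             # e.g. "Linux", "Darwin", "Windows"
    }


info = interpreter_info()
print(info)
if info["major"] < 3:
    # e.g. a legacy system Python; a separate Python 3 installation is advisable
    print("Python 2 detected: consider installing Python 3 separately.")
```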

Package Management

This section aims to answer the question: How can I extend the functionality of Python or R with existing libraries? The ease of installing packages is an important factor in the popularity of both, in marked contrast to languages such as C++.

Aspect | Python | R | Comment
Discovery of Packages | Online search required | Built-in access to CRAN | Most mature Python packages are released on PyPI
Online Repositories | PyPI, Linux distributions, GitHub | CRAN, Bioconductor |
Installation | Command-line tools (pip, conda, easy_install, setup.py, apt) | Built-in install.packages | Python installation methods are quite varied (and have evolved over time) and can be either system-wide (e.g. a Linux distro package) or user-specific
Dependency Management | virtualenv | packrat |
Loading Packages | import statement | library() function |
Documentation of Packages | Online or local (e.g. Read the Docs) | Built-in |
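Because Python packages are installed outside the interpreter (via pip, conda etc.), scripts sometimes need to check whether an optional package is present before importing it. A minimal sketch, assuming only the standard library `importlib` module; the helper names `is_available` and `load_optional` are hypothetical, and the approach is loosely analogous to R's `requireNamespace`.

```python
import importlib
import importlib.util


def is_available(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


def load_optional(package_name):
    """Import a top-level package if it is installed, otherwise return None."""
    if is_available(package_name):
        return importlib.import_module(package_name)
    return None


# The plain import statement remains the usual way to load a package
import json  # standard library, always available

pandas = load_optional("pandas")  # None if pandas has not been installed (e.g. via pip or conda)
print("pandas installed:", pandas is not None)
```

Note that `find_spec` is used here only for top-level names; for a dotted name whose parent package is missing it raises `ModuleNotFoundError` rather than returning `None`.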

Language Characteristics

This section aims to answer the question: What does code in Python or R look like from a programming perspective? Many standard features of programming languages are available in both and are therefore not listed.

Aspect | Python | R | Comment
Compiled / Interpreted | Interpreted | Interpreted | Code can be executed interactively
Main Implementation Language | C (CPython) | C and Fortran | The language in which the reference interpreter for Python or R scripts is written
Alternative Implementations | Jython (Java), RustPython etc. | pqR, Renjin, FastR etc. | Many alternative implementations of the underlying interpreter exist for both languages
Statically Typed | No | No | Both are dynamically typed, in contrast with languages such as C++ or Java
Native Data Types | Numbers, Strings, Lists, Tuples, Dictionaries | Numeric, Integer, Character, Logical, Vector, List (and the pairlist) |
Object Oriented | Yes | Yes | R has several object-oriented systems with different designs and functionality, denoted S3, S4, R5 (Reference Classes) and R6
Code Structure | Indentation-based | Free-form (braces delimit blocks) |
Standard Libraries | Extensive | Built-in functions | Python has an extensive standard library as it covers a larger domain
Building Extensions | Via bindings to other languages | Via bindings to other languages | See the High Performance Computing section for specifics
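To make the typing and data type rows concrete, the sketch below shows Python's native data types and what dynamic typing means in practice: a name can be rebound to a value of a different type at runtime, yet Python does not silently convert between unrelated types. The variable names and values are purely illustrative.

```python
# Native data types: numbers, strings, lists, tuples and dictionaries
rating = 3                                 # int
pd_estimate = 0.015                        # float
name = "Obligor A"                         # str
exposures = [100.0, 250.0, 75.5]           # list (mutable)
grade_bounds = (0.0, 0.01, 0.05)           # tuple (immutable)
portfolio = {"Obligor A": 100.0, "Obligor B": 250.0}  # dict

# Dynamic typing: a name can be rebound to a value of another type at runtime
rating = "BBB"
print(type(rating).__name__)  # prints "str"

# ...but there is no implicit conversion between unrelated types
coerced = None
try:
    coerced = "1" + 1
except TypeError as err:
    print("TypeError:", err)  # the addition is rejected, so coerced stays None

exposures.append(10.0)        # lists can be modified in place
try:
    grade_bounds[0] = 0.001   # tuples cannot
except TypeError as err:
    print("TypeError:", err)
```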

Development Environment

This section aims to answer the question: How can I develop and test code and applications written in Python or R?

Aspect | Python | R | Comment
Free / Open Source IDEs | Spyder, NetBeans, PyCharm Community, Eclipse, Visual Studio Code | RStudio | Many other IDEs and advanced editors are suitable for Python because they support many programming languages via plugins
Commercial IDEs | PyCharm Professional, Komodo | RStudio (commercial editions) |
Notebook Environment | Jupyter | R Markdown |
Debugger | pdb | Various built-in functions (browser, traceback, debug) |
Testing | tox, pytest, unittest | RUnit, testthat |
Writing Documentation | Sphinx | roxygen2 |
Python/R Cross-Development and Integration | r4intellij, rpy2 | reticulate |
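As an illustration of the testing row, here is a minimal sketch using the standard library `unittest` module. The function under test is a hypothetical helper computing expected loss as the product of probability of default, loss given default and exposure at default; it is not taken from any specific library.

```python
import unittest


def expected_loss(prob_default, lgd, ead):
    """Hypothetical helper: expected loss = PD * LGD * EAD."""
    if not (0.0 <= prob_default <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("prob_default and lgd must lie in [0, 1]")
    return prob_default * lgd * ead


class ExpectedLossTest(unittest.TestCase):
    def test_product(self):
        # 0.02 * 0.45 * 1000 = 9.0 (compared with a floating point tolerance)
        self.assertAlmostEqual(expected_loss(0.02, 0.45, 1000.0), 9.0)

    def test_rejects_invalid_probability(self):
        with self.assertRaises(ValueError):
            expected_loss(1.5, 0.45, 1000.0)
```

Saved as, say, `test_expected_loss.py`, the tests can be run with `python -m unittest test_expected_loss` or with `pytest`, which also collects `unittest.TestCase` classes.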

Files, Databases and Data Manipulation

This section aims to answer the following questions: What direct connectors to disk files and databases are available for Python and R respectively? Once I have connected to a data source, how can I store and do preliminary work with the imported data?

Aspect | Python | R | Comment
General Data Wrangling | pandas | data.table, dplyr, tidyr, stringr | The data frame has been a core concept of R, and pandas has emulated it in the Python universe
Local File Loading | Built-in, pandas | Built-in |
CSV Loading | pandas, built-in csv module | Built-in (read.csv), data.table |
XLS Loading | xlrd, openpyxl | XLConnect, xlsx |
Relational Database Connectors | MySQLdb, psycopg2, sqlite3 | RODBCext, RMySQL, RPostgreSQL, RSQLite |
Graph Database Connectors | neo4j, pyarango | neo4R |
Object Relational Mapping | SQLAlchemy, Django ORM | |
Advanced Datetimes | dateutil | lubridate | These provide extensions to built-in functionality
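A minimal sketch of loading CSV data and doing preliminary work with it using only the standard library `csv` and `sqlite3` modules. The CSV content and the `loans` table are hypothetical; in practice the data would be read from a file, and pandas offers a more convenient route for larger datasets.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would come from open("loans.csv", newline="")
raw_csv = """loan_id,sector,exposure
1,retail,100.0
2,corporate,250.0
3,retail,50.0
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))  # each row is a dict of strings

# Load the rows into an in-memory SQLite database via the built-in sqlite3 module
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id INTEGER, sector TEXT, exposure REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (:loan_id, :sector, :exposure)",
    rows,  # values arrive as strings; SQLite column affinity converts them to INTEGER / REAL
)

# Aggregate exposure per sector with plain SQL
totals = dict(
    conn.execute("SELECT sector, SUM(exposure) FROM loans GROUP BY sector ORDER BY sector")
)
print(totals)  # {'corporate': 250.0, 'retail': 150.0}
conn.close()
```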

General Purpose Mathematical Libraries

This section aims to answer the question: What basic building blocks are available for undertaking quantitative work in Python and R respectively?

Aspect | Python | R | Comment
General Purpose Vectors and N-dimensional Arrays (as storage) | numpy | Built-in array | The R system comes with many basic functionalities available built-in
Linear Algebra (matrix operations) | numpy.linalg | Matrix, RcppArmadillo, RcppEigen | For specialized operations (large / sparse matrices) see the High Performance Computing section
Mathematical (Special) Functions such as Gamma, Beta, Bessel | scipy | Built-in functions | The R system comes with many basic functionalities available built-in
Random Number Generation | Built-in, numpy.random | Built-in functions | This is about generic random numbers; more specialized applications are mentioned below
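A short sketch combining the linear algebra and random number generation rows, assuming NumPy (version 1.17 or later, for the `default_rng` Generator API) is installed. It solves a small linear system and uses a Cholesky factor to turn independent standard normal draws into correlated ones; the matrices and seed are illustrative.

```python
import numpy as np

# Solve the linear system A x = b with numpy.linalg (illustrative 2x2 example)
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)  # exact solution is x = [0.1, 0.6]

# Cholesky factor of a correlation matrix: corr == L @ L.T
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
L = np.linalg.cholesky(corr)  # lower-triangular

# Seeded generator for reproducible random numbers
rng = np.random.default_rng(seed=42)
z = rng.standard_normal(size=(10_000, 2))  # independent standard normals, one draw per row
correlated = z @ L.T                        # each row is L @ z_i, so the covariance is corr

print(x)
print(np.corrcoef(correlated, rowvar=False)[0, 1])  # sample correlation, close to 0.5
```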

Statistics Libraries

This section aims to answer the question: What libraries are available for undertaking standard statistical studies in Python or R? There is a huge number of packages / modules with significant duplication / overlap, especially for the R system, hence only the major / indicative ones are considered.

Aspect | Python | R | Comment
Basic Statistical Analysis (descriptive statistics, moments) | scipy.stats, statsmodels | car, caret |
ANOVA | scipy.stats, statsmodels | car, caret |
Regression Analysis | scikit-learn, statsmodels | glmnet |
Survival Analysis | lifelines | survival, OIsurv |
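For a feel of what basic descriptive statistics and regression involve before reaching for scipy.stats or statsmodels, here is a minimal sketch using only the standard library `statistics` module. The `ols_fit` helper is hypothetical and implements the closed-form least squares fit of a simple linear regression; dedicated libraries add standard errors, diagnostics and multiple regressors.

```python
import statistics


def ols_fit(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative sample data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("mean:", statistics.mean(y))
print("sample std dev:", statistics.stdev(y))
print("intercept, slope:", ols_fit(x, y))
```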

Econometrics Libraries

This section aims to answer the question: What libraries are available for undertaking econometric (timeseries) studies in Python or R?

Aspect | Python | R | Comment
Basic Econometric Analysis (stationarity, trends, seasonality) | statsmodels.tsa | Built-in ts |
ARMA Processes | statsmodels.tsa | forecast (auto.arima) |
Vector Auto Regressions (VAR) | statsmodels.tsa | vars |
Heteroskedastic (GARCH) Processes | statsmodels, arch | timeseries, zoo, vars |
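The simplest ARMA process is the AR(1) model x_t = phi * x_{t-1} + eps_t. As a rough illustration of what the libraries above automate, the sketch below simulates such a series with the standard library `random` module and recovers phi by least squares regression of x_t on x_{t-1}. The function names, parameter values and seed are illustrative; statsmodels.tsa provides full ARMA estimation with standard errors and model diagnostics.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t with Gaussian noise eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def estimate_ar1(x):
    """Least squares estimate of phi from regressing x_t on x_{t-1} (no intercept)."""
    lagged, current = x[:-1], x[1:]
    return sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)


series = simulate_ar1(phi=0.6, n=5000, seed=42)
phi_hat = estimate_ar1(series)
print(round(phi_hat, 3))  # should be close to the true value 0.6
```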

Machine Learning Libraries

This section aims to answer the question: What libraries are available for machine learning projects in Python or R? The term machine learning is not very specific, so we use this category to group various advanced or specialized libraries (of use in quantitative risk management). NB: Machine learning algorithms are typically compute-intensive and are thus implemented in systems languages, with bindings and APIs provided to the Python or R environments.

Aspect | Python | R | Comment
Network Analysis | networkx | igraph, sna |
Random Forests | scikit-learn | randomForest |
Boosting | scikit-learn | XGBoost |
Probabilistic Graphical Models | pgmpy | bnlearn, gRain |
Neural Networks | TensorFlow, PyTorch | h2o, MXNet |

Visualization

This section aims to answer the question: What functionality is available to produce data driven visualization in Python or R?

Aspect | Python | R | Comment
Low-level API | matplotlib | |
Plotting Packages | seaborn, plotly, bokeh | ggplot2 |
Declarative Visualization | Altair | |
XKCD-style Plots | Available! | Available! |

Web, Desktop and Mobile Deployment

This section aims to answer the question: What tools does each language ecosystem provide for the deployment of applications, whether via the web, desktop or mobile apps?

Aspect | Python | R | Comment
Native Web Servers | Tornado | OpenCPU |
Classic Web Frameworks | Flask, Pyramid, Django | R Shiny | Python web frameworks are typically deployed behind a production web server (Apache, Nginx etc.)
Web Formats | xml (built-in), json (built-in) | XML, jsonlite |
Web Sockets | websockets | |
Client Side (Browser) | Brython, RustPython | |
Mobile Apps | Kivy | |
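The web formats row can be illustrated with the standard library alone. The sketch below serializes a hypothetical risk report payload to JSON with the `json` module and to XML with `xml.etree.ElementTree`, then parses both back; the field names and values are illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical risk report payload
report = {"portfolio": "P1", "var_99": 1250000.0, "positions": 42}

# JSON round trip with the built-in json module
payload = json.dumps(report)
roundtrip = json.loads(payload)
print(roundtrip == report)  # True

# The same data as XML with the built-in xml.etree.ElementTree module
root = ET.Element("report", attrib={"portfolio": report["portfolio"]})
ET.SubElement(root, "var_99").text = str(report["var_99"])
ET.SubElement(root, "positions").text = str(report["positions"])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Parse the XML back and read values (element text is always a string)
parsed = ET.fromstring(xml_text)
print(float(parsed.find("var_99").text))
```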

High Performance Computing

For our purposes, high performance computing (HPC) is any use case that requires more than a single CPU and its own memory. This section aims to answer the question: What are my options if I have performance bottlenecks in terms of CPU, memory or disk?

Aspect | Python | R | Comment
Bindings to C/C++ | Cython, pybind11 | Rcpp | Both languages are slow compared to lower-level, compiled languages. A common approach to making full use of the available CPU is to extend the language via bindings to a faster language
Bindings to Other Languages (Java, Rust) | py4j, PyO3 | renjin |
Multithreading | threading | foreach |
Multi-core | multiprocessing | parallel |
Spark Interface | PySpark | SparkR, sparklyr |
GPU Computing | PyCUDA | R GPU | GPU support is also built into some packages (e.g. PyTorch, TensorFlow)
Distributed Data | dask | multidplyr |
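A minimal sketch of splitting a Monte Carlo loss simulation into independently seeded chunks and running them through the standard library `concurrent.futures` executor interface. The simulation function and its default parameters are hypothetical. Threads are used here to keep the sketch portable; for CPU-bound pure-Python work CPython's global interpreter lock (GIL) serializes threads, so in practice one would swap in `ProcessPoolExecutor`, which offers the same `map` interface but requires the `if __name__ == "__main__":` guard on platforms that spawn worker processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_losses(seed, n_scenarios=10_000, prob_default=0.02, lgd=0.45, ead=1_000.0):
    """Hypothetical chunk of a Monte Carlo run: total loss over independent default scenarios."""
    rng = random.Random(seed)  # one independent generator per chunk
    return sum(lgd * ead for _ in range(n_scenarios) if rng.random() < prob_default)


seeds = range(8)  # one seed per chunk keeps results reproducible regardless of scheduling

with ThreadPoolExecutor(max_workers=4) as pool:
    chunk_losses = list(pool.map(simulate_losses, seeds))  # map preserves input order

serial_losses = [simulate_losses(s) for s in seeds]
average_loss = sum(chunk_losses) / (len(chunk_losses) * 10_000)

print(chunk_losses == serial_losses)  # True: same seeds give the same chunk results
print(average_loss)  # approximately prob_default * lgd * ead = 9.0
```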

Contributors to this article

» Wiki admin