Overview of the Julia-Python-R Universe

From Open Risk Manual

A side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R (the three languages that inspired the name of the Jupyter project).

The sections of the comparison are grouped as follows:
  • General: History and Community; Devices and Operating Systems; Package Management; Package Documentation; Language Characteristics; Using R, Python and Julia together
  • Development: Development Environment; Files, Databases and Data Manipulation; Web, Desktop and Mobile Deployment; Semantic Web / Semantic Data; High Performance Computing
  • Algorithms & Datascience: General Purpose Mathematical Libraries; Core Statistics Libraries; Econometrics / Timeseries Libraries; Machine Learning Libraries; GeoSpatial Libraries; Visualization

Motivation

A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years, open source software targeting Data Science has found increasing adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems. The comparison of the three ecosystems aims:

  • To be useful for people who are somewhat familiar with programming and want to explore the options and use the most appropriate tool
  • To promote interoperability, cross-validation and overall best practices
  • To be as factual as possible, without drifting into judgements or opinions
  • To cover use cases relevant for the implementation of quantitative risk models


The comparison does not aim:

  • To be a detailed or comprehensive catalog of all available libraries (which number in the many thousands!)
  • To cover use cases far removed from quantitative risk models
  • To be totally exhaustive (e.g. to identify all the possible computer systems one can run a Python interpreter on, or to count all the possible ways one can perform linear regression in R)

Disclaimers

The comparison does not in any way provide an assessment of which system is "better". The proper way to use the comparison is to start from one's objectives, knowledge level and use case.

The comparison attempted here is not entirely appropriate, as the three systems have quite different origins and architectural design choices. For example, strictly speaking R is not a general-purpose programming language: R is a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Despite this caveat, a comparison is justified because in a very large domain of applications and use cases the three frameworks can be used interchangeably (or nearly so).

Structure

The comparison data are provided in tabular format in several distinct tables. Each table documents a relevant language or ecosystem subdomain. The number and focus areas of the tables are somewhat arbitrary and may expand in the future. The order runs roughly from more generic aspects towards more specialized / advanced areas, concluding with interoperability.

Each table entry (row) highlights key functionality within the subdomain. The language columns point to information or packages and (where applicable) there is commentary. Reference links are included when useful.

At the bottom of some tables there is a row labelled Package Review. This row collects links to the CRAN Task Views, which aim to summarize the large number of R packages available for particular data science tasks. There are also links to a mirror effort to create Python Task Views (this content is still work in progress; contributors are welcome, see below).

Getting Involved

You can provide simple and anonymous feedback on the wiki version of the overview using the feedback button at the bottom of the page. Alternatively, you can become an Open Risk Manual author and actively edit the page. If you are more comfortable using GitHub / Markdown, there is a mirror page available here. Please note that the tables are in HTML format, as they are generated automatically.

People interested in developing the Python Task Views can do so via the GitHub repository.

History and Community

The objective of this section is to provide an overall comparison of the history of the three ecosystems, towards answering the question: who is really behind Python, R and Julia?

Aspect Python R Julia Comment
First Release 1991 1995 2012 (development started in 2009) Both the Python and R ecosystems have a long history of development, and both have received a lot of attention in recent years as open source data science became more widespread. Julia is considerably more recent
Initial Authors Guido van Rossum Ross Ihaka and Robert Gentleman Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman
Current Stable Version 3.7 3.5 1.2 Check here for Python, Check here for R, Check here for Julia
Current Governance Python Software Foundation (Non Profit) R Foundation (Non Profit) Julia Governance Overview
Open Source License PSF License GNU General Public License MIT License
Size of Core Contributors 2-90 depending on definition 20 The size of the Python core team is difficult to establish (e.g. full-time / part-time, activity level) and there is no single authoritative source. The same holds for Julia
Size of Broader Developer Communities Third most popular in number of repositories and number of contributors Not in Top 10 of community size Not in Top 10 of community size Note: R programmers might not necessarily self-identify as developers (but as data scientists, statisticians etc.)
Developer Associations UK Python Association, pyLadies R-Ladies Formally organized associations promoting Python, R or Julia
Important Non-Profit Sponsors NumFOCUS Bioconductor NumFOCUS A number of non-profit organizations support these open source ecosystems explicitly or implicitly
Important Corporate Sponsors Diverse Diverse Julia Computing, Inc. Commercial sponsors may be supporting these ecosystems explicitly or implicitly
Important Conferences PyCon, EuroPython useR! JuliaCon
Important Journals The R Journal Journal of Open Source Software, Papers with Code covering all three systems
IRC Channels #python #julia
Reddit Python subreddit, 428k members R Stats subreddit, 30k members Julia subreddit, 8k members Data Science subreddit (discussing Python, R and Julia topics)
Online Forums and Blogs Too many Too many The Python and R ecosystems have extensive numbers of blogs, forums etc. (of varying quality)

Devices and Operating Systems

This section aims to answer the question: where (as in what kind of device and operating system) can I use Python, R or Julia? NB: This is not a guide to installing Python, R or Julia on your system, just an overview of what is available where.

Aspect Python R Julia Comment
Linux Desktop Comes pre-installed apt-get install r-base apt-get install julia / Linux installer file Python is generally pre-installed as it is used by the Linux system itself. Different distributions may include different (potentially very old) versions of the three languages.
Windows Windows installer Windows installer Windows installer All three languages are available for both Windows 7 and Windows 10 and 32 bit / 64 bit.
MacOS Python 2.7 is pre-installed MacOS installer MacOS installer file
Raspbian Pre-installed apt-get install r-base apt-get install julia Linux is the operating system of choice for IoT devices, which means a basic Python installation is generally available
Android Via python-for-android No No Python, R and Julia are not readily integrated on mobile devices (see also the Deployment section). Check Termux for an alternative option
iOS No No No
Cloud Servers As per Linux Desktop above As per Linux Desktop above As per Linux Desktop above Cloud servers typically run the Linux operating system and have Python installations available

Package Management

This section aims to answer the question: how can I extend the functionality of Python, R or Julia with existing libraries? The ease of finding and installing packages is a very important factor in the popularity of all three, in marked contrast to languages such as C++.

Aspect Python R Julia Comment
Discovery of Packages Online Search, Built-in PyCharm access to PyPI RStudio Built-in access to CRAN Julia Docs, Julia Observer Python packages are released on PyPI, R packages are released on CRAN
Number of Packages (Oct 2019) 199,816 15,102 ~2,496 Check here for the latest count: Python, R, Julia
Online Repositories PyPI, via Linux distributions CRAN GitHub, GitLab, Bitbucket etc. are used for releasing open source Python, R and Julia packages online, for coordinating development and for other community support
Package Installation Done at OS level (PyPI, setup, conda, pip, easy_install, apt) Built-in install.packages Built-in Pkg package manager Python installation methods are quite varied (and have evolved over time) and can be either system wide (e.g. a linux distro package) or user specific
Dependency Management pip, virtualenv packrat Federated package management virtualenv enables the use of isolated Python installations and package collections within the same system. Julia uses project environments
Loading Packages import statement library() function import / using statements
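On the Python side, the isolation that virtualenv provides is also available through the standard-library venv module. The sketch below is a minimal illustration rather than a recommended workflow: it creates an isolated environment in a temporary directory and returns the path of its interpreter. The helper name `create_isolated_env` is hypothetical, and the bin/ versus Scripts/ layout is the standard venv convention on POSIX and Windows respectively.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment at env_dir and return its Python interpreter path."""
    # with_pip=False keeps creation fast and offline; pass True to bootstrap pip
    # so that packages can then be installed into the isolated environment.
    venv.create(env_dir, clear=True, with_pip=with_pip)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe_name = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_python = create_isolated_env(Path(tmp) / "risk-env")
        print("isolated interpreter:", env_python)
        print("base interpreter:    ", sys.executable)
```

Packages installed with the environment's own pip (after creating it with with_pip=True) stay inside the environment directory, which is the same idea as a Julia project environment or a project-private R library managed by packrat.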

Package Documentation

This section aims to answer the question: how can I document a Python, R or Julia module? The ease and quality of documentation is an important factor in the adoption and efficient use of a language, as it both helps beginners learn new functionality and helps experienced users produce better quality work.

Aspect Python R Julia Comment
Source level documentation Built-in docstrings Docstrings docstrings
Formats Markdown, reStructuredText Markdown, LaTeX Markdown R packages on CRAN include Reference Manuals (PDF, typically generated via LaTeX)
Documentation Generator Sphinx roxygen2 Documenter.jl
Online documentation Read the Docs CRAN, bookdown Julia Docs
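As an illustration of source-level documentation in Python, the sketch below puts usage examples inside a docstring; the standard-library doctest module can execute them, so the documentation doubles as a test. The function `expected_loss` and its PD × LGD × EAD formula are chosen only as a familiar risk example, not taken from any particular package.

```python
import doctest


def expected_loss(pd: float, lgd: float, ead: float) -> float:
    """Return the expected loss of a single exposure.

    Computed as probability of default (pd) times loss given default (lgd)
    times exposure at default (ead).

    >>> expected_loss(0.25, 0.5, 1_000_000)
    125000.0
    >>> expected_loss(0.0, 0.5, 1_000_000)
    0.0
    """
    if not (0.0 <= pd <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("pd and lgd must lie in [0, 1]")
    return pd * lgd * ead


if __name__ == "__main__":
    # Executes every >>> example found in this module's docstrings.
    print(doctest.testmod())
```

Sphinx (with its autodoc extension) can render the same docstrings into HTML documentation, so one source serves both purposes.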

Language Characteristics

This section aims to answer the question: what does code in Python, R or Julia look like from a programming perspective? Many standard aspects of programming languages are available in all three systems and so are not included.

Aspect Python R Julia Comment
Compiled / Interpreted Interpreted Interpreted Just-in-time (JIT) compiled Julia code can nevertheless be executed interactively
Main Implementation Language C (CPython) C and Fortran Julia This is the language in which the Python or R interpreter is implemented. Julia is largely written in Julia itself (its runtime is written in C/C++ and it compiles via LLVM)
Other Implementation Languages Java (Jython), RustPython etc. pqR, Renjin, FastR etc. Many alternative implementations of the underlying interpreter exist for both Python and R. A new approach available for Python and Julia is to compile to WebAssembly for native execution in the browser: Python/Pyodide, Julia/Charlotte
Type System Dynamic (Duck) Typing Dynamic Dynamic (Duck) Typing All three systems have essentially dynamic type systems (in contrast with languages such as C++, Java or Rust)
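Primitive Data Types Numbers (Integers, Float), Strings, Boolean Numeric, Int, Character, Logical (and the pairlist) Numbers, Char, Bool Double precision floating point is standard in all systems; higher-precision floating point requires additional functionality (e.g. Python's decimal module; Julia ships BigFloat in Base). Python integers have arbitrary precision and Julia has a native 128-bit integer type.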
Native Data Structures List, Tuple, Dict List, Vector, Data Frame, Factor Tuple, Dict, Set, Array, Vector, Matrix and more
Object Oriented Yes Yes Selective R has several object-oriented systems with different designs and functionality, denoted S3, S4, R5 (Reference Classes) and R6 (a package). Julia implements selected OO aspects via struct composite types combined with multiple dispatch
Code Structure Based on indentation Free-form Free-form
Standard Libraries Extensive Built-in functions Base Python has an extensive standard library, as it covers a larger computer science domain. In contrast, R and Julia include a more extensive set of data science oriented features by default
Building Packages / Extensions Modules, Via bindings to C/C++ Creating R packages Julia Packages See below under HPC for more specific options
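The Python entries of the Primitive Data Types row can be checked directly. This minimal sketch (standard library only) confirms that floats are IEEE 754 doubles, that built-in integers hold the largest signed 128-bit value exactly, and that higher-precision decimal arithmetic comes from the decimal module; the names `MAX_INT128` and `third` are illustrative.

```python
import sys
from decimal import Decimal, localcontext

# Python floats are IEEE 754 double precision (53-bit significand).
FLOAT_MANTISSA_BITS = sys.float_info.mant_dig

# The classic double-precision rounding artefact.
sum_is_exact = (0.1 + 0.2 == 0.3)  # False

# Built-in ints are arbitrary precision: the largest signed 128-bit integer
# (the maximum of Julia's Int128 type) is represented exactly, no library needed.
MAX_INT128 = 2**127 - 1


def third(digits: int) -> Decimal:
    """1/3 to the requested number of significant digits via the decimal module."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(1) / Decimal(3)


if __name__ == "__main__":
    print(FLOAT_MANTISSA_BITS, sum_is_exact, MAX_INT128)
    print(third(50))
```

Using localcontext() rather than changing the global decimal context keeps the higher precision confined to the one calculation that needs it.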

Development Environment

This section aims to answer the question: How can I develop and test code / applications written in Python, R or Julia?

Aspect Python R Julia Comment
Open Source IDEs Spyder, NetBeans, Eclipse, Visual Studio Code RStudio, RTVS Juno There are many other IDEs or advanced editors (Vim, Emacs etc.) that support programming languages via plugins. The degree of support varies (from syntax highlighting to supporting complete workflows within the IDE/editor)
Commercial IDEs with Community Version PyCharm Community / Pro, Komodo RStudio IntelliJ + Julia Plugin Here we list IDEs offering a free edition alongside a commercial version
Notebooks / Literate Programming Jupyter, Pweave Jupyter, R Markdown, Sweave, knitr Jupyter, Weave.jl, Literate.jl The name Jupyter is derived from Julia, Python and R!
Debugger pdb various built-in functions (browser, traceback, debug) Debugger.jl
Testing tox, pytest, unittest RUnit, testthat, assertthat Test (standard library) R testthat is for typical unit tests; R assertthat is for declaring the pre- and post-conditions that code should satisfy
Package Reviews Reproducibility Task Views Reproducible Research Jupyter is available for all three systems
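A minimal sketch of a Python unit test using the standard-library unittest module; pytest also collects unittest.TestCase classes from files named test_*.py. The helper `portfolio_return` is hypothetical and exists only to give the tests something to check.

```python
import math
import unittest


def portfolio_return(weights, asset_returns):
    """Weighted portfolio return (hypothetical helper); weights must sum to one."""
    if len(weights) != len(asset_returns):
        raise ValueError("weights and asset_returns must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * r for w, r in zip(weights, asset_returns))


class PortfolioReturnTest(unittest.TestCase):
    def test_weighted_sum(self):
        self.assertAlmostEqual(portfolio_return([0.5, 0.5], [0.02, 0.04]), 0.03)

    def test_rejects_weights_not_summing_to_one(self):
        with self.assertRaises(ValueError):
            portfolio_return([0.7, 0.7], [0.01, 0.02])

    def test_rejects_length_mismatch(self):
        with self.assertRaises(ValueError):
            portfolio_return([1.0], [0.01, 0.02])


if __name__ == "__main__":
    # Runs the tests without the sys.exit() that unittest.main() performs.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PortfolioReturnTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

The same split as in R's testthat / assertthat applies here: the ValueError checks inside the function play the role of preconditions, while the TestCase methods are the unit tests.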

Files, Databases and Data Manipulation

This section aims to answer the following questions: What direct connectors to files stored on disk or data stored in databases are available for Python, R and Julia? Further, once we have connected to a data source, how can we fetch, store in memory and do preliminary work with the imported data?

Aspect Python R Julia Comment
Loading Local Files Built-in, pandas Built-in Built-in General file input from local directories is built into all systems
CSV Loading Pandas Built-in (read.csv), data.table, readr CSV.jl
XLS/ODF Loading xlrd, openpyxl XLConnect, xlsx OdsIO.jl
Hierarchical Data Formats (HDF) h5py, pandas.read_hdf rhdf5 HDF5.jl
URL Requests requests, PycURL data.table, RCurl HTTP.jl The Julia package is still new and not tested in production systems
Relational Database Connectors MySQLdb, psycopg2, sqlite3 RODBC / RODBCext, RMySQL, RPostgreSQL, RSQLite MySQL.jl, PostgreSQL.jl, SQLite.jl
Graph Databases Connectors neo4j, pyArango neo4R Neo4j.jl
Object Relational Mapping SQLAlchemy, Django ORM
General Data Wrangling pandas Built-in, data.table, dplyr, tidyr, stringr (the latter three part of the tidyverse) DataFrames.jl The concept of a data frame has been a core aspect of R; pandas emulates it in Python, as does DataFrames.jl in Julia
Missing Data Pandas functionality, sklearn.impute Amelia and many others Impute.jl
Advanced datetime handling dateutil lubridate These packages provide datetime specific extensions to built-in functionality
Package Reviews Databases Task Views Databases, Missing Data
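The table lists pandas for CSV loading; to keep the example free of third-party dependencies, this sketch instead uses the standard-library csv and sqlite3 modules to load a small CSV into an in-memory SQLite database and aggregate it, treating empty cells as missing (NULL). The column names and sample rows are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would be open("loans.csv", newline="").
CSV_TEXT = """loan_id,sector,exposure
1,energy,250000
2,retail,
3,energy,100000
4,retail,50000
"""


def load_into_sqlite(csv_file) -> sqlite3.Connection:
    """Load CSV rows into an in-memory SQLite table, mapping empty cells to NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans (loan_id INTEGER, sector TEXT, exposure REAL)")
    rows = (
        (int(r["loan_id"]), r["sector"], float(r["exposure"]) if r["exposure"] else None)
        for r in csv.DictReader(csv_file)
    )
    conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def exposure_by_sector(conn: sqlite3.Connection):
    # SUM ignores NULLs; COUNT(*) - COUNT(exposure) counts the missing values.
    return conn.execute(
        "SELECT sector, SUM(exposure), COUNT(*) - COUNT(exposure) AS n_missing "
        "FROM loans GROUP BY sector ORDER BY sector"
    ).fetchall()


if __name__ == "__main__":
    connection = load_into_sqlite(io.StringIO(CSV_TEXT))
    for row in exposure_by_sector(connection):
        print(row)
```

Swapping sqlite3.connect for a psycopg2 or MySQLdb connection would keep the same DB-API pattern, although those drivers use %s rather than ? as the parameter placeholder.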

General Purpose Mathematical Libraries

This section aims to answer the question: What building blocks are available for undertaking basic quantitative (numerical) work in Python, R and Julia respectively? NB: The division of what is core mathematics and what is a specialized domain is a bit arbitrary.

Aspect Python R Julia Comment
General Purpose vectors and n-dimensional arrays (as storage) numpy Built-in array The R system comes with many basic array functionalities available built-in
Numerical Linear Algebra (matrix operations) numpy.linalg Matrix, RcppArmadillo, RcppEigen Built-in support (LinearAlgebra standard library), StaticArrays, BandedMatrices, IterativeSolvers For specialized operations (e.g. large or sparse matrices) see the HPC section below. eigenpy and pybind11 provide alternative means of using C++ numerical linear algebra in Python
Mathematical (Special) Functions such as Gamma, Beta, Bessel scipy.special Built-in functions SpecialFunctions.jl The R system comes with many basic functions available built-in
Random Number Generation Built-in (random), numpy.random Built-in functions Built-in (Random standard library) This entry is about generic random numbers. More specialized applications are mentioned below
Mathematical Optimisation JuMP
Symbolic Algebra sympy Symata
Curve Fitting scipy.optimize, numpy.polyfit Built-in ApproxFun
Package Reviews Mathematics Task Views Numerical Mathematics, Optimization
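A short sketch of the Python standard library's own building blocks for two rows of this table: the Gamma function (and a Beta function derived from log-gamma), and reproducible random numbers from a dedicated seeded generator. scipy.special and numpy.random offer vectorized and far more extensive versions; the helpers `beta` and `draw_uniforms` are illustrative only.

```python
import math
import random


def beta(a: float, b: float) -> float:
    """Beta function B(a, b) = Γ(a)Γ(b) / Γ(a + b), computed via log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def draw_uniforms(seed: int, n: int) -> list:
    """n uniform draws on [0, 1) from a dedicated generator; same seed, same draws."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    print(math.gamma(5))  # Γ(5) = 4! = 24
    print(beta(2, 3))     # approximately 1/12
    print(draw_uniforms(42, 3) == draw_uniforms(42, 3))  # reproducible draws
```

Working in log space (lgamma) avoids the overflow that Γ(a) would hit for large arguments, which is also how library implementations typically evaluate such ratios.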

Core Statistics Libraries

This section aims to answer the question: What libraries are available for undertaking standard statistical studies in Python, R or Julia? There is a large number of packages / modules with significant duplication / overlap, especially for the R system, hence only the major / indicative ones are considered.

Aspect Python R Julia Comment
Exploratory Data Analysis (descriptive statistics, moments, etc) pandas.DataFrame.describe, pandas-profiling, scipy.stats, statsmodels Base R (stats), car, caret, dplyr describe(DataFrame) EDA is quite broad and loosely defined. Here we take a fairly narrow view that remains as much as possible non-parametric and model-agnostic

Correlation pandas.DataFrame.corr, numpy.corrcoef Built-in (cor) Statistics standard library (cor)
ANOVA scipy.stats, statsmodels Built-in (aov, anova), car, caret ANOVA.jl
Linear Regression Analysis scikit-learn, statsmodels Built-in (lm) Regression.jl
Generalized Linear Regression scikit-learn, statsmodels Built-in (glm), glmnet Regression.jl This category includes logistic regression (which is available in many R packages), multinomial regression etc.
Survival Analysis lifelines survival Survival.jl
Gaussian Processes GPy GauPro, GPfit, kergp, mlegp GaussianProcesses.jl
Package Reviews Statistics Task Views, Regression Methods Task Views Probability Distributions, Multivariate Statistics, Extreme Value Analysis, Robust Statistical Methods, Survival Analysis
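For orientation, the quantities behind the first rows of this table can be written out with the standard-library statistics module. This is a minimal sketch, not a substitute for statsmodels or R's lm: `describe` mimics a narrow pandas-style summary, while `pearson_r` and `ols_fit` implement the textbook formulas for the correlation coefficient and simple least-squares regression. (Python 3.10 and later also provide statistics.correlation and statistics.linear_regression.)

```python
import statistics


def describe(xs):
    """A narrow, model-agnostic summary in the spirit of pandas' describe()."""
    return {
        "count": len(xs),
        "mean": statistics.mean(xs),
        "stdev": statistics.stdev(xs),  # sample standard deviation (n - 1 denominator)
        "min": min(xs),
        "median": statistics.median(xs),
        "max": max(xs),
    }


def _centered_cross_sum(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    return _centered_cross_sum(xs, ys) / (
        _centered_cross_sum(xs, xs) * _centered_cross_sum(ys, ys)
    ) ** 0.5


def ols_fit(xs, ys):
    """Intercept a and slope b of y = a + b*x by ordinary least squares."""
    b = _centered_cross_sum(xs, ys) / _centered_cross_sum(xs, xs)
    a = statistics.mean(ys) - b * statistics.mean(xs)
    return a, b


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [3, 5, 7, 9, 11]
    print(describe(y))
    print(pearson_r(x, y), ols_fit(x, y))
```

Running the same small data set through pandas/statsmodels, R and Julia is a cheap cross-validation check of the kind the Motivation section recommends.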

Econometrics / Timeseries Libraries

This section aims to answer the question: What libraries are available for undertaking econometric / timeseries studies in Python, R or Julia?

Aspect Python R Julia Comment
Basic Econometric Analysis (stationarity, trends, seasonality) statsmodels.tsa Built-in (ts) TimeSeries.jl, Econometrics.jl
ARMA Processes / Univariate Models statsmodels.tsa, pmdarima forecast (auto.arima), tseries ARCHModels.jl
Heteroskedastic (GARCH) processes statsmodels, arch tseries, zoo, vars ARCHModels.jl
Vector Auto Regressions (VAR) statsmodels.tsa mts, vars VectorAutoregressions.jl (WIP)
General Timeseries pyflux, prophet prophet (R API) TimeSeries.jl
Frequency Domain Analysis numpy.fft Built-in (spectrum)
Package Reviews Econometrics Task Views Econometrics, Time Series Analysis
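A minimal sketch of the simplest univariate model, a stationary AR(1) process x_t = φ x_{t-1} + e_t with |φ| < 1: simulate a path with Gaussian noise, then recover φ by least squares of x_t on x_{t-1} without an intercept. The packages in the table (statsmodels.tsa, forecast, ARCHModels.jl) handle general ARMA orders, intercepts and proper inference; the function names here are illustrative.

```python
import random


def simulate_ar1(phi: float, n: int, sigma: float = 1.0, seed: int = 0) -> list:
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2), starting at x_0 = 0."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def estimate_ar1(path: list) -> float:
    """Least-squares estimate of phi from regressing x_t on x_{t-1} (no intercept)."""
    numerator = sum(curr * prev for curr, prev in zip(path[1:], path[:-1]))
    denominator = sum(prev * prev for prev in path[:-1])
    return numerator / denominator


if __name__ == "__main__":
    series = simulate_ar1(phi=0.7, n=5000, seed=42)
    print(round(estimate_ar1(series), 3))  # should be close to 0.7
```

With 5,000 observations the standard error of the estimate is roughly 0.01, so the printed value should sit within a few hundredths of the true φ.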

Machine Learning Libraries

This section aims to answer the question: what libraries are available for machine learning projects in Python, R or Julia? The term machine learning is not very specific, so we use this category to group various advanced / specialized libraries that are relevant for data science (but not e.g. computer vision and other specialized ML applications). NB: Machine learning algorithms are typically compute intensive and are thus usually implemented in systems languages, with bindings and APIs provided for the Python or R environments.

Aspect Python R Julia Comment
Network Analysis networkx igraph, sna LightGraphs.jl
Cluster Analysis (Unsupervised Learning) scikit-learn cluster Clustering.jl K-means and other clustering algorithms
Random Forests scikit-learn randomForest, ranger DecisionTree.jl
Gradient Boosting scikit-learn XGBoost Interface XGBoost.jl Interface
Probabilistic Graphical Models pgmpy bnlearn, gRain PGM.jl
Neural Networks tensorflow, pytorch, keras, Interface to MXNet Interface to h2o, Interface to MXNet, Interface to keras Flux, MLJ, Knet RStudio offers an interface to TensorFlow
Package Review Machine Learning Task Views Bayesian Inference, Cluster Analysis & Finite Mixture Models, Machine Learning, Graphical Models
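To make the Cluster Analysis row concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm), alternating an assignment step and an update step until the centroids stop moving. It assumes points are equal-length numeric tuples and, for determinism, uses the first k points as initial centroids; scikit-learn's KMeans defaults to the smarter k-means++ initialisation.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    print(kmeans(pts, 2))
```

Because the result depends on the starting centroids, production implementations rerun the algorithm from several initialisations and keep the solution with the lowest within-cluster sum of squares.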

GeoSpatial Libraries

This section aims to answer the question: what libraries are available for working with GIS / geospatial data in Python, R or Julia? The geospatial package space is particularly fragmented; the selection focuses on some key anchor concepts.

Aspect Python R Julia Comment
Geo Data Structures GeoPandas.GeoSeries, GeoPandas.GeoDataFrame raster, sp, sf, stars
GDAL gdal rgdal GDAL.jl
GeoJSON geojson geojson, rgdal GeoJSON
PostGIS geojson rpostgis GeoJSON
Geo Mapping CartoPy, Descartes gmt GMT
OpenStreetMap openstreetmap OpenStreetMap OpenStreetMap.jl
Spatial Statistics pysal gstat, geoR, geoRglm R has a large number of specialized spatial statistics packages (see Task Views)
Spatial Econometrics pysal.spreg
Package Review Geospatial Task Views Spatial Data, Handling and Analyzing Spatio-Temporal Data
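GeoJSON is plain JSON with a fixed structure, so its shape can be sketched with the standard-library json module alone. The helpers below build Point features and a FeatureCollection and compute a bounding box; note that RFC 7946 orders coordinates as [longitude, latitude]. The helper names are illustrative, and the city coordinates are approximate values used only as sample data.

```python
import json


def point_feature(lon: float, lat: float, **properties) -> dict:
    """A GeoJSON Point Feature; RFC 7946 orders coordinates as [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features) -> dict:
    return {"type": "FeatureCollection", "features": list(features)}


def bounding_box(collection: dict) -> list:
    """[min_lon, min_lat, max_lon, max_lat] over the Point features of a collection."""
    coords = [f["geometry"]["coordinates"] for f in collection["features"]]
    lons, lats = zip(*coords)
    return [min(lons), min(lats), max(lons), max(lats)]


if __name__ == "__main__":
    # Approximate city-centre coordinates, for illustration only.
    fc = feature_collection([
        point_feature(4.3517, 50.8503, name="Brussels"),
        point_feature(2.3522, 48.8566, name="Paris"),
    ])
    text = json.dumps(fc)  # what a geojson / GeoJSON.jl / rgdal reader would consume
    print(bounding_box(json.loads(text)))
```

Getting the longitude/latitude order wrong is a common source of points landing in the wrong hemisphere, which is why the order is documented in the constructor rather than left implicit.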

Visualization

This section aims to answer the question: What functionality is available to produce data driven visualization in Python, R or Julia?

Aspect Python R Julia Comment
Low level APIs matplotlib grid, gridExtra Plots.jl
Graph packages seaborn, plotly, bokeh ggplot2 Gadfly.jl
Declarative Visualizations Altair Vega.jl
XKCD style plots :-) Available! Available!
Package Review Visualization Task Views Graphic Displays & Visualization
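Declarative libraries such as Altair describe a chart as data, emitting a Vega-Lite JSON specification that a renderer then draws. A minimal hand-written sketch of such a bar-chart specification follows; the field names and axis titles are hypothetical, and the v5 schema URL assumes a current Vega-Lite version.

```python
import json


def bar_chart_spec(categories, values, x_title="Sector", y_title="Exposure") -> dict:
    """A Vega-Lite bar-chart specification: the chart is described, not drawn."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": [{"category": c, "value": v} for c, v in zip(categories, values)]},
        "mark": "bar",
        "encoding": {
            "x": {"field": "category", "type": "nominal", "title": x_title},
            "y": {"field": "value", "type": "quantitative", "title": y_title},
        },
    }


if __name__ == "__main__":
    spec = bar_chart_spec(["energy", "retail"], [350000, 50000])
    print(json.dumps(spec, indent=2))
```

Because the specification is plain JSON, the same chart definition can be produced from Python, R or Julia and rendered identically by any Vega-Lite renderer.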

Web, Desktop and Mobile Deployment

This section aims to answer the question: what tools does each language ecosystem provide for the deployment of data-driven applications, whether via the web, desktop or mobile apps?

Aspect Python R Julia Comment
Native Webservers Tornado, Gunicorn, CherryPy, Twisted OpenCPU, plumber HTTP.jl As a general remark, these native servers are typically not exposed directly in production but are fronted by e.g. Apache httpd or nginx servers
Classic Web Frameworks Flask, Pyramid, Django R Shiny, rApache Genie.jl Web frameworks typically used behind a production web server (Apache, Nginx etc.)
Web Formats xml, json (built-in) XML, rjson, jsonlite JSON.jl
Web Sockets websockets WebSockets.jl A WebSocket connection allows full-duplex communication between a client and a server, so that either side can push data to the other through an established connection
Client Side (Browser) Brython, RustPython, Pyodide
Mobile Apps Kivy, BeeWare Both Kivy and BeeWare allow cross-platform app development.
Package Review Web Task Views Model Deployment, Web Technologies
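A minimal sketch of the native webserver idea using only the standard library's http.server: a handler that answers GET requests with JSON, started on a free local port, queried once and shut down. The handler and helper names are hypothetical. The Python documentation itself advises against using http.server in production, which matches the table's remark that such servers sit behind Apache httpd or nginx.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class JSONHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def query_once(path: str = "/health") -> dict:
    """Serve on a free local port, issue one GET request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), JSONHandler)  # port 0: let the OS choose
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}{path}"
        # An opener without proxy handlers, so the local request never goes via a proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return json.load(response)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(query_once())
```

Frameworks such as Flask or R's plumber add routing, request parsing and error handling on top of this same request/response cycle.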

Semantic Web / Semantic Data

This section aims to answer the question: What tools and libraries are available for working with semantic data (RDF, OWL, JSON-LD etc) and other relevant domain specific metadata schemas?

Aspect Python R Julia Comment
RDF Format rdflib rrdf
JSON-LD Format rdflib.jsonld JSON-LD is an alternative web-friendly serialization format for RDF
OWL Ontologies ontospy, owlready2
Querying RDF (SPARQL) rdflib Rredland
Serving RDF (SPARQL) rdflib
SDMX Format pandasdmx rsdmx SDMX (Statistical Data and Metadata eXchange) is a standard format for exchanging statistical data and metadata
Package Review Semantic Data Task View
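rdflib handles RDF parsing and serialization in several formats. Purely to show what RDF data looks like on the wire, this sketch writes triples in the line-based N-Triples format using only the standard library. It covers IRIs and plain string literals (with the escaping N-Triples requires) but not blank nodes, language tags or datatypes; the IRI class and function names are hypothetical, and the example.org IRIs are placeholders.

```python
from typing import Iterable, NamedTuple, Tuple, Union


class IRI(NamedTuple):
    """An IRI node; plain Python strings are treated as literals."""
    value: str


Term = Union[IRI, str]


def serialize_term(term: Term) -> str:
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape the backslash first, then quotes and line breaks, as N-Triples requires.
    escaped = (
        term.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[Term, Term, Term]]) -> str:
    """Serialize (subject, predicate, object) triples, one 's p o .' line each."""
    return "".join(
        f"{serialize_term(s)} {serialize_term(p)} {serialize_term(o)} .\n"
        for s, p, o in triples
    )


RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
EX = "http://example.org/"  # placeholder namespace

if __name__ == "__main__":
    print(to_ntriples([
        (IRI(EX + "portfolio/1"), RDFS_LABEL, 'Retail "SME" book'),
        (IRI(EX + "portfolio/1"), IRI(EX + "hasExposure"), IRI(EX + "loan/42")),
    ]), end="")
```

Because every line is a complete triple, N-Triples files can be concatenated, split and streamed with ordinary text tools, which makes the format a convenient interchange point between the Python and R RDF libraries.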

High Performance Computing

For our purposes, high performance computing (HPC) is any use case that requires more than a single CPU (and its own RAM or disk). This section aims to answer the question: what are my options if I have performance bottlenecks in terms of CPU, memory or disk? It therefore covers topics such as concurrency and GPU computing. NB: Julia aims to address performance issues through compilation and other design choices.

Aspect Python R Julia Comment
Bindings to C/C++ Cython, pybind11 Rcpp Built-in (ccall), Cxx.jl Pure Python and R code is slow compared to lower-level / compiled languages. A common approach to making full use of the available CPU is to extend the language via bindings to a faster language. Bindings can also be useful for re-using existing libraries
Bindings to Java py4j rJava, renjin JavaCall.jl
Bindings to other performing languages (Rust etc) pyO3
Coroutines Built-in (async/await, since Python 3.5) Built-in (Tasks/Channels)
Multi-threading Built-in (threading) foreach Built-in (Base.Threads) (Experimental)
Multi-core multiprocessing doParallel, future Built-in (Distributed)
Spark interface PySpark SparkR, sparklyr Spark.jl
GPU Computing PyCUDA gpuR CUDAnative.jl GPU interfaces are also offered via some ML packages (e.g. pytorch, tensorflow, MXNet.jl)
Distributed Data dask multidplyr JuliaDB.jl
Package Review HPC Task Views High-Performance and Parallel Computing
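The multi-core row can be illustrated with the standard-library concurrent.futures module, which wraps multiprocessing behind a small executor API. This sketch runs independently seeded Monte Carlo chunks in parallel and averages them; the exponential loss distribution and the function names are assumptions made for illustration.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate_chunk(seed: int, n_draws: int) -> float:
    """Mean of n_draws Exp(1) 'losses' from an independently seeded generator."""
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0) for _ in range(n_draws)) / n_draws


def parallel_mean(n_chunks: int, n_draws: int, executor_cls=ProcessPoolExecutor) -> float:
    """Run equally sized chunks concurrently and average their means."""
    with executor_cls() as pool:
        # One seed per chunk keeps the result reproducible and the streams distinct.
        chunk_means = list(pool.map(simulate_chunk, range(n_chunks), [n_draws] * n_chunks))
    return sum(chunk_means) / n_chunks


if __name__ == "__main__":
    # The __main__ guard is required where worker processes are spawned
    # rather than forked (e.g. Windows, macOS).
    print(parallel_mean(8, 50_000))                                   # processes
    print(parallel_mean(8, 50_000, executor_cls=ThreadPoolExecutor))  # threads
```

Both lines print the same value, since seeding rather than scheduling determines each chunk and pool.map returns results in input order. Processes sidestep the interpreter lock for CPU-bound work at the cost of pickling arguments and results; threads are cheaper to start but mainly help I/O-bound tasks.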

Using R, Python and Julia together

This section aims to answer the question: how can I use R from Python, Python from Julia, Julia from R and vice versa :-)? The first rows of this table use a From/To format (from X, call Y) for native integration between the three systems, where "native" means that the integration is done using language bindings within the respective interpreters / REPLs (not explicitly via the operating system or a server API).

Aspect Call Python Call R Call Julia Comment
From Python rpy2 pyjulia
From R PythonInR, rPython XRJulia
From Julia PyCall.jl RCall.jl
Python/R Cross-Development and Integration r4intellij, rpy2 reticulate
Via Server APIs Rserve
Via OS / Shell Scripts Built-in (subprocess) Built-in (system2) Built-in (Base.run)
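The OS / shell route needs nothing beyond each interpreter's command-line "evaluate an expression" flag (python -c, Rscript -e, julia -e). A minimal sketch using Python's subprocess module follows; it assumes Rscript and julia are on the PATH when those languages are requested, and the helper name `run_snippet` is hypothetical.

```python
import shutil
import subprocess
import sys

# Each interpreter's "evaluate this code string" command-line form.
EVAL_COMMANDS = {
    "python": [sys.executable, "-c"],
    "r": ["Rscript", "-e"],
    "julia": ["julia", "-e"],
}


def run_snippet(language: str, code: str, timeout: float = 60.0) -> str:
    """Run a code string in another interpreter and return its trimmed stdout."""
    cmd = EVAL_COMMANDS[language] + [code]
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} was not found on the PATH")
    completed = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print(run_snippet("python", "print(6 * 7)"))
    for lang, code in [("r", "cat(mean(c(1, 2, 3)))"), ("julia", "println(sum([1, 2, 3]))")]:
        try:
            print(lang, run_snippet(lang, code))
        except FileNotFoundError as err:
            print(lang, "skipped:", err)
```

Everything crosses the process boundary as text, so this route suits small results or file-based hand-offs (e.g. writing a CSV for the other language to read); for passing data structures directly, the native bridges in the first rows (rpy2, PyCall.jl, reticulate etc.) are the better fit.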

Contributors to this article

» Wiki admin