Overview of the Julia-Python-R Universe
A side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python, R, sometimes abbreviated as Jupyter.
If you want to contribute anonymously to the review, simply click on the feedback form. If you are already on github, you can raise an issue there in a dedicated repo.
Categories and Segmentation
General | Development & Frameworks | Algorithms & Data Science |
---|---|---|
History and Community | Development Environment | General Purpose Mathematical Libraries |
Devices and Operating Systems | Files, Databases and Data Manipulation | Core Statistics Libraries |
Package Management | Web, Desktop and Mobile Deployment | Econometrics / Timeseries Libraries |
Package Documentation | Semantic Web / Semantic Data | Machine Learning Libraries |
Language Characteristics | High Performance Computing | GeoSpatial Libraries |
Bindings to Other Languages | Using R, Python and Julia together | Visualization |
Workflow Management | Data Quality and Data Validation | |
Privacy-Preserving Computation | Stochastic Processes |
NB: Links point preferentially to official project pages and, if those are missing, to code repositories. Further discussion is at the bottom of the page.
History and Community
The objective of this section is to provide an overall comparison of the history of the three data science ecosystems. We are tracking people, organizations, communities, projects etc. with the aim of answering a single question: who keeps Python, R or Julia alive?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
First Release | 1991 | 1995 | 2012 | Both the Python and R ecosystems have a long history of development and both have received a lot of attention in the last few years as open source data science became more widespread. Julia is more recent: development began in 2009 and the first public release followed in 2012 |
Initial Authors | Guido van Rossum | Ross Ihaka and Robert Gentleman | Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman | |
Current Stable Version | 3.11.2 | 4.2.3 | 1.8.5 | Check here for Python, Check here for R, Check here for Julia |
Current Governance | Python Software Foundation (Non Profit) | R Foundation (Non Profit) | Julia Governance Overview | |
Open Source License | PSF License | GNU General Public License | MIT License | |
Size of Core Contributors | 2-90 depending on definition | 20 | 36 | The size of the Python core team is difficult to establish (e.g. full-time / part-time, activity level) and there is no single authoritative source. The same holds for Julia (active / dormant) |
Size of Broader Developer Communities | Second most popular in number of github repositories and number of contributors | Not in the Top 10 of programming languages in terms of community size | Not in the Top 10 of programming languages in terms of community size | Note: R programmers might not necessarily self-identify as developers (but as data scientists, statisticians etc.) |
Developer Associations | UK Python Association, pyLadies | R-Ladies | | Formally organized associations promoting Python, R or Julia |
Important Non-Profit Sponsors | Numfocus | Bioconductor | Numfocus | A number of non-profit organizations support these open source ecosystems explicitly or implicitly |
Important Corporate Sponsors | Diverse | Diverse | Julia Computing, Inc. | Commercial sponsors may be supporting these ecosystems explicitly or implicitly |
Important Conferences | pycon, europython, DjangoCon, EuroSciPy | useR!, DSC | Juliacon | |
Important Journals | | The R Journal | | Journal of Open Source Software, Papers with Code covering all three systems |
IRC Channels | #python | | #julia | Note: Freenode to Matrix Migration |
Reddit | Python subreddit, 1.1m members | R Stats subreddit, 71.2k members | Julia subreddit, 22.4k members | Data Science subreddit discussing Python, R and Julia topics |
Online Forums and Blogs | Too many | Too many | Growing | The Python and R ecosystems have an extensive number of blogs, forums etc. (of varying quality) |
Podcasts | Real Python, Talk Python |
Devices and Operating Systems
This section aims to answer the question: Where (as in what kind of device and operating system) can I use Python, R or Julia? NB: This is not a manual on how to install Python, R or Julia on your system. It is just an overview of which ecosystem is available on which platforms.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Linux Desktop | Comes pre-installed | apt-get install r-base | apt-get install julia / Linux installer file | Python is generally pre-installed as it is used by the Linux system itself. Different distributions may include different (potentially very old) versions of the three languages. |
Windows | Windows installer | Windows installer | Windows installer | All three languages are available for both Windows 7 and Windows 10 and 32 bit / 64 bit. |
MacOS | 2.7 version is pre-installed | MacOS installer | MacOS installer file | |
Raspbian | Pre-installed | apt-get install r-base | apt-get install julia | Linux is the operating system of choice for IoT devices, which means a basic Python installation is generally available |
Android | Via python-for-android | No | No | Python, R or Julia are not readily integrated on mobile devices (see also Deployment entry). Check Termux for an alternative option |
iOS | No | No | No | |
Cloud Servers | As per Linux Desktop above | As per Linux Desktop above | As per Linux Desktop above | Cloud servers typically run the Linux operating system and thus have Python installations available |
Package Management
This section aims to answer the question: How can I extend the Python, R or Julia functionality with existing libraries? The ease of finding and installing packages is a very important factor in the popularity of these ecosystems, in marked contrast with language ecosystems such as C++ that only recently started developing public repositories.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Discovery of Packages | Online Search, Built-in PyCharm access to PyPI | RStudio built-in access to CRAN | Julia Docs, Julia Observer | Python packages are released on PyPI, R packages are released on CRAN |
Number of Packages (Jun 2020) | 443,373 | 19,354 | 9,191 | Check here for the latest count: Python, R, Julia. Comparing package counts across these different universes comes with many caveats: the conventions about what is a complete "package", quality controls etc. are not harmonized. |
Online Repositories | PyPI, via linux distributions | CRAN | juliapackages, juliahub | github, gitlab, bitbucket etc. are also used for releasing open source Python, R and Julia packages online, for coordinating development and for other community support |
Package Installation | Done at OS level (PyPI, setup, conda, pip, easy_install, apt) | Built-in install.packages | Built-in Pkg package manager | Python installation methods are quite varied (and have evolved over time) and can be either system wide (e.g. a linux distro package) or user specific |
Dependency Management | pip, virtualenv, poetry | packrat | Federated package management | virtualenv enables using isolated Python distributions and package collections within the same system. Julia uses project environments |
Loading Packages | import statement | library statement | import / using statements |
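
As an illustration of the Python column of the table above, the following minimal sketch checks whether a distribution is installed and loads a module by name at runtime. It uses only the standard library; the helper names `installed_version` and `load_module` and the placeholder package name are hypothetical and chosen for this example only.

```python
import importlib
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def load_module(module_name):
    """Import a module by name at runtime (same effect as an import statement)."""
    return importlib.import_module(module_name)


# Load a standard library module dynamically and use it.
json_module = load_module("json")
print(json_module.dumps({"loaded": True}))

# A made-up name that should not exist in any environment.
print(installed_version("surely-not-installed-package-xyz"))
```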
Package Documentation
This section aims to answer the question: How can I document a Python, R or Julia module? The ease and quality of documentation is an important factor in the adoption and efficient use of a language, as it both helps beginners learn new functionality and helps experienced users ensure better quality work.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Source level documentation | Built-in docstrings | Docstrings | docstrings | |
Formats | markdown, reStructuredText | markdown, LaTeX | Markdown | R packages in CRAN include Reference Manuals (PDF, typically from LaTeX) |
Documentation Generator | sphinx | roxygen2 | Documenter | |
Online documentation | readthedocs | CRAN, bookdown | Julia Docs |
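
To make the "Built-in docstrings" entry concrete, here is a small sketch of a Python docstring that doubles as a test via the standard `doctest` module. The function `relative_change` is a hypothetical example and not part of any package listed above.

```python
import doctest


def relative_change(old, new):
    """Return the relative change from ``old`` to ``new``.

    >>> relative_change(100.0, 110.0)
    0.1
    >>> relative_change(50.0, 25.0)
    -0.5
    """
    return (new - old) / old


# The docstring is available at runtime; help() and documentation
# generators read the same attribute.
print(relative_change.__doc__.splitlines()[0])

# Parse the examples embedded in the docstring and run them as tests.
parser = doctest.DocTestParser()
doc_test = parser.get_doctest(
    relative_change.__doc__,
    {"relative_change": relative_change},
    "relative_change",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(doc_test)
print(results)
```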
Language Characteristics
This section aims to answer the question: What does code in Python, R or Julia look like from a programming perspective? Many standard aspects of programming languages are available in all three systems so are not included.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Compiled / Interpreted | Interpreted | Interpreted | Compiled Just-in-time (JIT) | Julia code can be executed interactively |
Main Implementation Language | C (CPython) | C and Fortran | Julia | This is the language used for the interpretation of a Python or R script. Julia is written in Julia |
Other Implementation Languages | Java (Jython), RustPython etc. | pqR, Renjin, FastR etc. | | Many alternative implementations of the underlying interpreter exist for both Python and R. A new approach available for Python and Julia is to compile to WebAssembly for native execution in the browser: Python/Pyodide, Julia/Charlotte |
Type System | Dynamic (Duck) Typing | Dynamic | Dynamic (Duck) Typing | All three systems have essentially dynamic type systems (in contrast with languages such as C++, Java or Rust) |
Primitive Data Types | Numbers (Integers, Float), Strings, Boolean | Numeric, Int, Character, Logical (and the pairlist) | Numbers, Char, Bool | Double precision is standard in all systems. Higher precision is only via libraries. Julia has a native 128 bit integer type. |
Native Data Structures | List, Tuple, Dict | List, Vector, Data Frame, Factor | Tuple, Dict, Set, Array, Vector, Matrix and more | |
Object Oriented | Yes | Yes | Selective | R has a variety of object-oriented implementations with different designs and functionality, denoted S3, S4, R5 and R6. Julia implements select OO aspects via the struct composite type |
Code Structure | Based on Indentation | Free Style | Free Style | |
Standard Libraries | Extensive | Built-in Functions | Base | Python has an extensive standard library as it covers a larger CS domain. In contrast, R and Julia include a more extensive set of data science oriented features by default |
Building Packages / Extensions | Modules, Via bindings to C/C++ | Creating R packages | Julia Packages | See below under HPC for more specific options |
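
The "Dynamic (Duck) Typing" entry can be illustrated with a short Python sketch: a function works on any object that supports the operations it uses, and a name can be rebound to values of different types. The function `total_length` is a hypothetical example.

```python
def total_length(items):
    """Sum len() over any iterable of objects that support len()."""
    return sum(len(item) for item in items)


# Strings, lists, tuples and dicts all support len(), so they can be mixed.
mixed = ["abc", [1, 2], (3, 4, 5), {"k": 1}]
print(total_length(mixed))

# Names are not bound to a type: the same name can hold an int, then a str.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__
print(first_type, second_type)

# Python integers are arbitrary precision, so this does not overflow.
big = 2 ** 100
print(big.bit_length())
```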
Development Environment
This section aims to answer the question: How can I develop and test code / applications written in Python, R or Julia?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Open Source IDEs | spyder, netbeans, eclipse, visual studio code | RStudio, RTVS | Juno | There are many other IDEs or advanced editors (Vim, Emacs etc.) that support programming languages via plugins. The degree of support varies (from syntax highlighting to supporting complete workflows within the IDE/editor) |
Commercial IDEs with Community Version | pycharm community / pro, komodo | RStudio | Intellij + Julia Plugin | Here we list closed source IDEs with free or commercial versions |
Notebooks / Literate Programming | Jupyter, pweave | Jupyter, R Markdown, Sweave, knitr | Jupyter, Weave.jl, Literate.jl, Pluto.jl | The name Jupyter is a reference to Julia, Python and R |
Debugger | pdb | various built-in functions (browser, traceback, debug) | Debugger.jl | |
Testing | tox, pytest, unittest | RUnit, testthat, assertthat | Test (standard library) | R testthat is for typical unit tests; R assertthat is for declaring the pre and post conditions that code should satisfy |
Package Reviews | Reproducibility Task Views | Reproducible Research | Jupyter is available for all three systems |
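
For the Testing row, a minimal sketch using Python's built-in `unittest` is shown below. The function under test, `clamp`, is a hypothetical example; pytest or tox would typically discover and run tests from files rather than build the suite by hand as done here.

```python
import unittest


def clamp(value, lower, upper):
    """Restrict value to the closed interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(lower, min(value, upper))


class ClampTests(unittest.TestCase):
    def test_value_inside_interval_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_outside_interval_is_clipped(self):
        self.assertEqual(clamp(-1, 0, 10), 0)
        self.assertEqual(clamp(11, 0, 10), 10)

    def test_invalid_bounds_raise(self):
        with self.assertRaises(ValueError):
            clamp(1, 10, 0)


# Build and run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```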
Files, Databases and Data Manipulation
This section aims to answer the following questions: What direct connectors to files stored on disk or data stored in databases are available for Python, R and Julia? Further, once we have connected to a data source, how can we fetch, store in memory and do preliminary work with the imported data?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Loading Local Files | Built-in, Pandas | Built-in | Built-in | General file input from local directories is built-in in all systems |
CSV Loading | Pandas | Built-in (read.csv), data.table, readr | CSV.jl | |
XLS/ODF Loading | xlrd, openpyxl | XLConnect, xlsx | OdsIO.jl | |
Hierarchical Data Formats (HDF) | h5py, pandas.read_hdf | rhdf5 | HDF5.jl | |
URL Requests | requests, PycURL | data.table, RCurl | HTTP.jl | The Julia package is still new and not tested in production systems |
Relational Database Connectors | MySQLdb, psycopg2, sqlite3 | RODBC / RODBCExt, RMySQL, RPostgreSQL, RSQLite | MySQL.jl, PostgreSQL.jl, SQLite.jl | |
Graph Databases Connectors | neo4j, pyArango | neo4R | Neo4j.jl | |
Object Relational Mapping | SQLAlchemy, Django ORM | |||
General Data Wrangling | pandas | Built-in (data.frame), data.table, (dplyr, tidyr, stringr, part of the tidyverse) | DataFrames.jl | The data frame has long been a core concept of R; pandas emulates it in Python and DataFrames.jl in Julia |
Advanced datetime handling | dateutil | lubridate | | These packages provide datetime specific extensions to built-in functionality |
Unit Conversion | pint, quantities | | Unitful | Converting quantities between various systems of unit measurement |
Package Reviews | Databases Task Views | Databases, Missing Data |
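
The fetch-then-query pattern described in this section can be sketched with Python's standard library alone (`csv` and `sqlite3`). The inline CSV text, the table name `scores` and the helper names are assumptions made for this example; in practice pandas would usually be used for the loading step.

```python
import csv
import io
import sqlite3

# Inline sample data standing in for a CSV file on disk.
CSV_TEXT = "name,score\nalice,90\nbob,75\ncarol,82\n"


def load_rows(csv_text):
    """Parse CSV text into a list of (name, score) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"], int(row["score"])) for row in reader]


def mean_score(rows):
    """Store rows in an in-memory SQLite table and return the average score."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
        connection.executemany("INSERT INTO scores VALUES (?, ?)", rows)
        (average,) = connection.execute("SELECT AVG(score) FROM scores").fetchone()
        return average
    finally:
        connection.close()


rows = load_rows(CSV_TEXT)
print(rows)
print(mean_score(rows))
```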
Data Quality and Data Validation
This section aims to answer the following questions: What tools are available for assessing, reporting and improving data quality?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Missing Data | Pandas functionality, sklearn.impute | Amelia and many others | Impute.jl |
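
As a rough illustration of what the missing data tools above automate, here is a hand-rolled mean imputation in plain Python. It is only a sketch of the simplest strategy; the function name `impute_mean` is hypothetical, and libraries such as pandas or sklearn.impute offer more robust options.

```python
import math
from statistics import fmean


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = fmean(observed)
    return [fill if is_missing(v) else v for v in values]


raw = [1.0, None, 3.0, float("nan")]
print(impute_mean(raw))
```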
Workflow Management
This section aims to answer the question: What tools are available to help manage data science workflows in Python, R and Julia respectively?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
ETL | Bonobo, petl, pygrametl | |||
Programmatic Workflow Management | Airflow, Luigi | | DrWatson.jl | |
General Purpose Mathematical Libraries
This section aims to answer the question: What building blocks are available for undertaking basic quantitative (numerical) work in Python, R and Julia respectively? NB: The division of what is core mathematics and what is a specialized domain is a bit arbitrary.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
General Purpose vectors and n-dimensional arrays (as storage) | numpy | Built-in array | | The R system comes with many basic array functionalities available built-in |
Numerical Linear Algebra (matrix operations) | numpy.linalg | Matrix, RcppArmadillo, RcppEigen | Built-in support (LinearAlgebra.Basic), StaticArrays, BandedMatrices, IterativeSolvers | For specialized operations (large / sparse matrices) see below in HPC. eigenpy and pybind11 provide alternative means to use C++ numerical linear algebra in Python |
Mathematical (Special) Functions such as Gamma, Beta, Bessel or probability distribution functions | scipy | Built-in functions | SpecialFunctions.jl, Distributions.jl | The R system comes with many basic functionalities available built-in |
Random Number Generation | Built-in, numpy.random | Built-in functions | Built-in (Random.Random) | This entry is about generic random numbers. More specialized applications mentioned below |
Mathematical Optimisation | | | JuMP | |
Symbolic Algebra | sympy | | Symata | |
Curve Fitting | scipy.optimize, numpy.polyfit | Built-in | ApproxFun | |
Package Reviews | Mathematics Task Views | Numerical Mathematics, Optimization |
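
The special function and random number rows can be illustrated with Python's built-in `math` and `random` modules. This sketch deliberately avoids numpy and scipy so that it runs on a bare installation; the helper `beta_function` is a hypothetical name, built on the identity B(a, b) = Γ(a)Γ(b) / Γ(a + b).

```python
import math
import random


def beta_function(a, b):
    """Beta function via the Gamma function: B(a, b) = G(a) G(b) / G(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


# Gamma(5) equals 4! and B(2, 3) equals 1! * 2! / 4! = 1/12.
print(math.gamma(5))
print(beta_function(2, 3))

# Seeding a dedicated generator makes a random stream reproducible.
first_stream = random.Random(42)
second_stream = random.Random(42)
draws_a = [first_stream.random() for _ in range(3)]
draws_b = [second_stream.random() for _ in range(3)]
print(draws_a == draws_b)
```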
Core Statistics Libraries
This section aims to answer the question: What libraries are available for undertaking standard statistical studies in Python, R or Julia? There is a large number of packages / modules with significant duplication / overlap, especially for the R system, hence only the major / indicative ones are considered.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Exploratory Data Analysis (descriptive statistics, moments, etc) | pandas.describe, pandas profiling, scipy.stats, statsmodels | Base R (stats), car, caret, dplyr | describe(DataFrame) | EDA is quite broad and loosely defined. Here we take a fairly narrow view that remains as much as possible non-parametric and model-agnostic |
Correlation | pandas.corr, numpy.corrcoef | Built-in (cor) | Built-in (cor) | |
ANOVA | scipy.stats, statsmodels | Built-in (aov, anova), car, caret | ANOVA.jl | |
Linear Regression Analysis | scikit-learn, statsmodels | Built-in | Regression.jl | |
Generalized Linear Regression | scikit-learn, statsmodels | Built-in (glm), glmnet | Regression.jl | This category includes logistic regression (which is available in many R packages), multinomial regression etc. |
Package Reviews | Statistics Task Views, Regression Methods Task Views | Probability Distributions, Multivariate Statistics, Extreme Value Analysis, Robust Statistical Methods |
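
To show what the Correlation and Linear Regression rows compute, the following is a hand-written sketch of the Pearson correlation and a one-variable ordinary least squares fit in plain Python. It is not how the listed libraries are implemented; the function names are hypothetical and the data are made up.

```python
import math
from statistics import fmean


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def ols_fit(x, y):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    mean_x, mean_y = fmean(x), fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance_x
    return slope, mean_y - slope * mean_x


# Exactly linear sample data: y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * value + 1.0 for value in xs]
print(pearson_correlation(xs, ys))
print(ols_fit(xs, ys))
```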
Stochastic Processes
This section aims to answer the question: What libraries are available for estimating and/or simulating stochastic processes in Python, R or Julia?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Survival Analysis | lifelines | survival | Survival.jl | |
Gaussian Processes | GPy | GauPro, GPfit, kergp, mlegp | GaussianProcesses.jl | |
Poisson Processes | tick, py-hawkes | poisson, NHPoisson, hawkes, emhawkes | ||
Package Reviews | Survival Analysis |
Econometrics / Timeseries Libraries
This section aims to answer the question: What libraries are available for undertaking econometric / time series studies in Python, R or Julia?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Basic Econometric Analysis (stationarity, trends, seasonality) | statsmodels.tsa | Built-in (ts) | TimeSeries.jl, Econometrics.jl | |
ARMA Processes / Univariate Models | statsmodels.tsa, pmdarima | auto, forecast, tseries | ARCHModels.jl | |
Heteroskedastic (GARCH) processes | statsmodels, arch | tseries, zoo, vars | ARCHModels.jl | |
Vector Auto Regressions (VAR) | statsmodels.tsa | mts, vars | VectorAutoregressions.jl (WIP) | |
General Timeseries | pyflux, prophet | prophet (R API) | TimeSeries.jl | |
Frequency Domain Analysis | numpy.fft | Built-in (spectrum) | ||
Package Reviews | Econometrics Task Views | Econometrics, Time Series Analysis |
Machine Learning Libraries
This section aims to answer the question: What libraries are available for machine learning projects in Python, R or Julia? The term machine learning is not very specific, so we use this category to group various advanced / specialized libraries that are relevant for data science (but not e.g. computer vision and other specialized ML applications). NB: Machine learning algorithms are typically compute intensive and are thus implemented in system languages, with bindings and APIs provided to Python or R environments.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Network Analysis | networkx | igraph, sna | LightGraphs.jl | |
Cluster Analysis (Unsupervised Learning) | scikit-learn | cluster | Clustering.jl | K-means and other clustering algorithms |
Random Forests | scikit-learn | randomForest, ranger | DecisionTree.jl | |
Gradient Boosting | scikit-learn | XGBoost Interface | XGBoost.jl Interface | |
Probabilistic Graphical Models | pgmpy | bnlearn, gRain | PGM.jl | |
Neural Networks | tensorflow, pytorch, keras, Interface to MXNet | Interface to h2o, Interface to MXNet, Interface to keras | Flux, MLJ, Knet | RStudio offers an interface to tensorflow |
Package Review | Machine Learning Task Views | Bayesian Inference, Cluster Analysis & Finite Mixture Models, Machine Learning, Graphical Models |
GeoSpatial Libraries
This section aims to answer the question: What libraries are available for working with GIS / geospatial data in Python, R or Julia? The geospatial package space is particularly fragmented, so the selection focuses on some key anchor concepts.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Geo Data Structures | GeoPandas.GeoSeries, GeoPandas.GeoDataFrame | raster, sp, sf, stars | ||
GDAL | gdal | rgdal, rgeos | GDAL.jl | |
GeoJSON | geojson | geojson, geojsonR | GeoJSON | |
PostGIS | geojson | rpostgis | GeoJSON | |
GeoMapping | Cartopy, Descartes | gmt | GMT | |
OpenStreetMap | openstreetmap | OpenStreetMap | OpenStreetMap.jl | |
Spatial Statistics | pysal | gstat, geoR, geoRglm | | R has a large number of specialized spatial statistics packages (see Task Views) |
Spatial Econometrics | pysal.spreg | |||
Package Review | Geospatial Task Views | Spatial Data, Handling and Analyzing Spatio-Temporal Data |
Visualization
This section aims to answer the question: What functionality is available to produce data driven visualization in Python, R or Julia?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Low level API's | matplotlib | grid, gridExtra | Plots.jl, Makie.jl | |
Graph packages | seaborn, plotly, bokeh | ggplot2 | Gadfly.jl, AlgebraOfGraphics.jl | |
Declarative Visualizations | Altair | | Vega.jl | |
XKCD style plots :-) | Available! | Available! | ||
Package Review | Visualization Task Views | Graphic Displays & Visualization |
Web, Desktop and Mobile Deployment
This section aims to answer the question: What tools does each language ecosystem provide for the deployment of data based applications, whether via the web, desktop or mobile apps?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Native Webservers | Tornado, Gunicorn, CherryPy, Twisted | OpenCPU, plumber | HTTP.jl | As a general remark these native servers are not exposed directly in production but are fronted by e.g. apache httpd and nginx servers |
Classic Web Frameworks | Flask, Pyramid, Django | R Shiny, rApache | Genie.jl | Web frameworks typically used behind a production web server (Apache, Nginx etc.) |
Web Formats | xml, json (built-in) | XML, rjson, jsonlite | JSON.jl | |
Web Sockets | websockets | websocket | WebSockets.jl | WebSocket connection allows full-duplex communication between a client and server so that either side can push data to the other through an established connection |
Client Side (Browser) | Brython, RustPython, Pyodide | |||
Mobile Apps | Kivy, BeeWare | | | Both Kivy and BeeWare allow cross-platform app development. |
WebAssembly | | | julia-wasm | Python and R source code is generally not compiled to wasm directly; projects such as Pyodide compile the interpreter instead |
Package Review | Web Task Views | Model Deployment, Web Technologies |
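
The Web Formats row lists `json` as built into Python. A minimal round trip, of the kind used when a model result is sent over a web API, might look like the sketch below; the record contents are made up for illustration.

```python
import json

# A made-up result record, as might be returned by a model-serving endpoint.
record = {"model": "linear", "coefficients": [0.5, 1.25], "converged": True}

# Serialize to a JSON string (sorted keys give a stable, diff-friendly output).
payload = json.dumps(record, sort_keys=True)
print(payload)

# Deserialize on the receiving side and recover the original structure.
restored = json.loads(payload)
print(restored == record)
```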
Privacy-Preserving Computation
This section aims to answer the question: What tools and libraries are available for implementing diverse Privacy-Preserving Computation schemes?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Secure Deep Learning | PySyft, Tensorflow Federated, FATE, PaddleFL | OnDevAI | ||
Privacy Preserving Statistics / Federated Analysis | RAPPOR | RAPPOR, dataSHIELD | ||
Package Review |
Semantic Web / Semantic Data
This section aims to answer the question: What tools and libraries are available for working with semantic data (RDF, OWL, JSON-LD etc) and other relevant domain specific metadata schemas?
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
RDF Format | rdflib | rrdf | ||
JSON-LD Format | rdflib.jsonld | | | JSON-LD is an alternative web-friendly serialization format for RDF |
OWL Ontologies | ontospy, owlready2 | |||
Querying RDF (SPARQL) | rdflib | Rredland | ||
Serving RDF (SPARQL) | rdflib | |||
SDMX Format | pandasdmx | rsdmx | | SDMX is the statistical data and metadata exchange format |
Package Review | Semantic Data Task View |
Bindings to Other Languages
Bindings to other languages serve use cases that require a multi-lingual approach, e.g. to tap into another ecosystem's libraries. This section aims to answer the question: What are my options if I want to invoke C++, Java etc. from Python, R or Julia? NB: This table covers bindings to other languages; for alternative implementations (e.g. re-implementing Python within a JVM) see Language Characteristics.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Bindings to C/C++ | Cython, pybind11 | Rcpp | Cxx.jl | Native Python and R are slow compared to lower level / compiled languages. A common approach to make full use of the available CPU is to extend the language via bindings to a faster language. Bindings are also useful for re-using existing libraries |
Bindings to Java | py4j, JPype, jpy, Javabridge, pyjnius | renjin | JavaCall.jl | |
Bindings to Rust | pyO3 | |||
Bindings to Lua | lunatic-python, lupa | |||
Package Review |
High Performance Computing
For our purposes high performance computing (HPC) is any use case that requires more than a single CPU (and its own RAM or disk). This section aims to answer the question: What are my options if I have performance bottlenecks in terms of CPU, memory or disk? It therefore covers topics such as concurrency and GPU computing. NB: Julia aims to address performance issues through compilation and other design choices.
Aspect | Python | R | Julia | Comment |
---|---|---|---|---|
Coroutines | Built-in (async/await, since Python 3.5) | | Built-in (Tasks/Channels) | |
Multi-threading | Built-in (threading) | foreach | Built-in (Base.Threads) (Experimental) | |
Multi-core | multiprocessing | doParallel, future | Built-in (Distributed) | |
Spark interface | pySpark | SparkR, sparklyr | Spark.jl | |
GPU Computing | pyCUDA | gpuR | CUDAnative.jl | GPU interfaces are offered also via some ML packages (e.g pytorch, tensorflow, MXnet.jl) |
Parallel Computing (Clusters, MPI etc) | | | ClusterManagers.jl | |
Distributed Data | dask | multidplyr | JuliaDB.jl | |
Package Review | HPC Task Views | High-Performance and Parallel Computing |
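
The Coroutines and Multi-threading rows for Python can be sketched with the standard `asyncio` and `concurrent.futures` modules. The workload (squaring numbers after a short simulated wait) and the function names are hypothetical. Note that, because of the global interpreter lock in CPython, threads mainly help with I/O-bound rather than CPU-bound work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def delayed_square(number, delay=0.01):
    """Coroutine that simulates waiting on I/O before returning a result."""
    await asyncio.sleep(delay)
    return number * number


async def gather_squares(numbers):
    """Run all coroutines concurrently and collect results in input order."""
    return await asyncio.gather(*(delayed_square(n) for n in numbers))


def threaded_squares(numbers, workers=4):
    """Map a function over the inputs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: n * n, numbers))


coroutine_result = asyncio.run(gather_squares([1, 2, 3]))
print(coroutine_result)
print(threaded_squares(range(5)))
```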
Using R, Python and Julia together
This section aims to answer the question: How can I use R from Python, Python from Julia, Julia from R and vice versa? :-) The first rows of this table have the From/To format (From X Call Y) for native integration between the three systems, where "Native" means that the integration is done using language bindings within the respective interpreters / REPL (not explicitly using the operating system or a server API).
Aspect | Call Python | Call R | Call Julia | Comment |
---|---|---|---|---|
From Python | | rpy2, RSPython | pyjulia | |
From R | PythonInR, rPython, RSPython | | XRJulia | |
From Julia | PyCall.jl, PythonCall.jl | RCall.jl | ||
Python/R Cross-Development and Integration | r4intellij, rpy2 | reticulate | ||
Via Server APIs | | Rserve | | |
Via OS / Shell Scripts | Built-in (subprocess) | Built-in (system2) | Built-in (Base.run) |
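
The "Via OS / Shell Scripts" route can be sketched from the Python side with the built-in `subprocess` module. To keep the example runnable without R or Julia installed, the child process here is another Python interpreter; swapping the argument list for something like `["Rscript", "-e", "cat(6 * 7)"]` or `["julia", "-e", "print(6 * 7)"]` is an assumption that those executables are on the PATH. The helper name `run_command` is hypothetical.

```python
import subprocess
import sys


def run_command(arguments):
    """Run an external program and return its standard output as text."""
    completed = subprocess.run(
        arguments,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


# Stand-in for calling Rscript or julia: launch a second Python interpreter.
output = run_command([sys.executable, "-c", "print(6 * 7)"])
print(output)
```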
Motivation
A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years open source software targeting Data Science finds increased adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side by side comparison of a wide range of aspects of Python, Julia and R language ecosystems.
The comparison of the three ecosystems aims:
- To be useful for people that are somewhat familiar with programming and want to inspect options and use the most appropriate tool
- To promote interoperability, cross-validation and overall best-practices
- To be factual as much as possible without drifting to judgement / opinions
- To cover use cases relevant for the implementation of data science and (in particular) quantitative risk models
The comparison does not aim:
- To be a detailed / comprehensive catalog of all available libraries (which number in the many thousands!)
- To cover use cases very removed from quantitative risk models
- To be totally exhaustive (e.g. to identify all the possible computer systems one can run a Python interpreter on, or count all the possible ways one can perform linear regression in R)
Disclaimers
The comparison does not provide an assessment of which system is "better". The proper way to use the comparison is to start with one's objectives, knowledge level and use case, and to identify how those might be served by the respective pillars.
NB: The comparison attempted here is not *entirely* appropriate as the three systems have different origins and architectural design choices. Strictly speaking R is not a general programming language. R is a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Yet, despite the disclaimer, a comparison is justified because in a large domain of applications and use cases the three frameworks can be used interchangeably (or nearly so)
Structure
The comparison data are provided in tabular format in several distinct tables. Each table documents a relevant language or ecosystem subdomain. The number and focus areas of the different tables are somewhat arbitrary and may expand in the future. The order is roughly from more generic aspects towards more specialized / advanced areas, concluding with interoperability.
Each table entry (row) highlights key functionality within the subdomain. The language columns point to information or packages and (where applicable) there is commentary. Reference links are included when useful.
At the bottom of some tables there is a row labeled Package Review. This row has a collection of links to the CRAN Task Views that aim to summarize the large number of R packages available for some data science tasks. There are also links to a mirror effort to create Python Task Views (this content is still WIP - contributors welcome, see below).
Getting Involved
You can provide simple and anonymous feedback on the wiki version of the overview using the feedback button at the bottom of the page. If you are comfortable using github / markdown you can raise an [issue here](https://github.com/open-risk/Overview-of-the-Julia-Python-R-Universe).
People interested in developing the Python Task Views can do so via the github repo.