Fast prototyping (Numpy!)
Popular:
Well-known
Several great libraries
Share ideas between developers / scientists
Popularity counts
Readability counts
Expressivity counts
Anyway, one needs a good and well-known scripting language, so yes!
(even considering Julia)
Designed for fast prototyping & for "gluing" code together
Generalist + easy to learn ⇒ huge and diverse community 👨🏿🎓🕵🏼 👩🏼🎓 👩🏽🏫👨🏽💻👩🏾🔬 🎅🏼 🌎 🌍 🌏
Expressivity and readability
Not oriented towards high performance
(fast and easy dev, easy debug, correctness)
Highly dynamic 🐒 + introspection (inspect.stack())
Automatic memory management 💾
All objects encapsulated 🥥 (PyObject, C struct)
Objects accessible through "references" ➡️
Usually interpreted
Interpreted (nearly) instruction per instruction, (nearly) no code optimization
The numerical stack (Numpy, Scipy, Scikits, ...) based on the CPython C API (CPython implementation details)!
Optimized implementation with tracing Just-In-Time compilation
The CPython C API is an issue! PyPy can't accelerate Numpy code!
For microcontrollers
mylist = [1, 3, 5]
list: an array of references to PyObjects
arr = 2 * np.arange(10)
print(arr[2])
4
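The difference between the two layouts can be seen from Python itself; a small sketch (sizes assume 64-bit CPython):

```python
import sys
import numpy as np

mylist = [1, 3, 5]        # list: each element is a reference to a boxed PyObject
arr = np.array(mylist)    # ndarray: one contiguous buffer of raw machine integers

print(sys.getsizeof(mylist[0]))   # a whole PyObject per element (28 bytes on 64-bit CPython)
print(arr.itemsize)               # bytes per element in the raw buffer
print(arr.flags["C_CONTIGUOUS"])  # True: the data sits contiguously in memory
```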
Pure Python terrible 🐢 (except with PyPy)...
from math import sqrt
my_const = 10.
result = [elem * sqrt(my_const * 2 * elem**2) for elem in range(1000)]
but even this is not very efficient (temporary objects)...
import numpy as np
a = np.arange(1000)
result = a * np.sqrt(my_const * 2 * a**2)
Even slightly worse with PyPy 🙁
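One way to limit those temporary arrays is to reuse a single buffer with in-place ufuncs and the `out=` parameter; a minimal sketch of the same computation:

```python
import numpy as np

my_const = 10.
a = np.arange(1000.)

# naive version: each intermediate operation allocates a fresh temporary array
result = a * np.sqrt(my_const * 2 * a**2)

# fewer temporaries: one scratch buffer, updated in place
tmp = a**2
tmp *= my_const * 2
np.sqrt(tmp, out=tmp)
tmp *= a

assert np.allclose(result, tmp)
```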
cProfile (pstats, SnakeViz), line-profiler, perf, perf_events
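A quick profiling session with the standard library's cProfile/pstats might look like this (the kernel function is just an illustrative example):

```python
import cProfile
import io
import pstats
from math import sqrt

def kernel():
    # toy workload standing in for a real numerical kernel
    return [elem * sqrt(20. * elem**2) for elem in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
kernel()
profiler.disable()

# print the 5 most expensive entries, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```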
"Premature optimization is the root of all evil" (Donald Knuth)
80 / 20 rule, efficiency important for expensive things and NOT for small things
For example, using Numpy arrays instead of Python lists...
unittest, pytest
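With pytest, tests are plain functions with bare asserts, collected from files and run with `pytest` (the file and function names here are hypothetical):

```python
import numpy as np

def kernel(a):
    # for a >= 0, a * sqrt(20 * a**2) == a**2 * sqrt(20)
    return a * np.sqrt(20. * a**2)

def test_kernel():
    a = np.arange(10.)
    assert np.allclose(kernel(a), a**2 * np.sqrt(20.))
```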
pipelining, hyper-threading, vectorization, advanced instructions (SIMD), ...
important to get data aligned in memory (arrays)
What does CPython do? (compile to "bytecode" with nearly no optimization; see the dis module)
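The dis module makes this concrete: it shows the stack-based bytecode CPython produces, with essentially no optimization applied. A small sketch:

```python
import dis

def add(a, b):
    return a + b

# disassemble to see the (nearly unoptimized) stack-machine bytecode
dis.dis(add)

# the instruction names can also be inspected programmatically
opnames = [instr.opname for instr in dis.Bytecode(add)]
print(opnames)
```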
Just-in-time
Has to be fast (warm up), can be hardware specific
Ahead-of-time
Can be slow, hardware specific or more general to distribute binaries
Compilers are usually good for optimizations! Better than most humans...
From one language to another language (for example Python to C++)
⚠️ in Python, one interpreter per process (~) and the Global Interpreter Lock (GIL)...
In a Python program, different threads can run at the same time (and take advantage of multicore)
But... the Python interpreter runs the Python bytecode sequentially!
Terrible 🐌 for CPU-bound tasks if the Python interpreter is used a lot!
No problem for IO-bound tasks!
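A sketch of why IO-bound workloads are fine: the GIL is released during blocking calls (here `time.sleep` stands in for a blocking read), so the waits overlap across threads:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    time.sleep(0.1)  # stands in for a blocking IO call; releases the GIL
    return i

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - t0

# the four 0.1 s waits overlap: total time is ~0.1 s, not 0.4 s
print(results, f"{elapsed:.2f} s")
```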
Many tools to interact with static languages:
ctypes, cffi, cython, cppyy, pybind11, f2py, pyo3, ...
Glue together pieces of native code (C, Fortran, C++, Rust, ...) with a nice syntax
⇒ Numpy, Scipy, ...
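As a taste of the simplest of these tools, ctypes can call a C library directly, with no compilation step; a sketch calling `cos` from the system's libm (Unix is assumed, where the math library is named "m"):

```python
import ctypes
from ctypes.util import find_library

# load the C math library and declare the C signature of cos
libm = ctypes.CDLL(find_library("m"))
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # calls native code: 1.0
```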
Remarks:
Numpy: great syntax for expressing algorithms, (nearly) as much information as in Fortran
Performance of a @ b (Numpy) versus a * b (Julia)?
Same! The same library is called! (often OpenBlas or MKL)
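This can be checked from Python: `a @ b` dispatches to the BLAS gemm routine NumPy was built against, and `np.show_config()` reveals which library that is. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((100, 100))
b = rng.random((100, 100))

c = a @ b  # matrix product, dispatched to the underlying BLAS (gemm)
assert np.allclose(c, np.dot(a, b))

np.show_config()  # shows which BLAS/LAPACK NumPy is linked against
```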
Don't use too often the Python interpreter (and small Python objects) for computationally demanding tasks.
Pure Python
→ Numpy
→ Numpy without too many loops (vectorized)
→ C extensions
But ⚠️ ⚠️ ⚠️ writing a C extension by hand is not a good idea! ⚠️ ⚠️ ⚠️
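The first two steps of the progression above, on a toy kernel: keep the interpreter out of the hot loop by replacing a Python-level loop with one vectorized NumPy call.

```python
import numpy as np

def pure_python(n):
    # pure Python: the interpreter executes ~n iterations of bytecode
    return [i * 2.0 for i in range(n)]

def vectorized(n):
    # vectorized NumPy: one call, the loop runs in compiled C
    return 2.0 * np.arange(n)

assert np.allclose(pure_python(1000), vectorized(1000))
```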
Language: superset of Python
A great mix of Python / C / CPython C API!
Very powerful, but a tool for experts!
Easy to study where the interpreter is used (cython --annotate).
Very mature
Now able to use Pythran internally...
My experience: large Cython extensions difficult to maintain
from numba import jit

@jit
def myfunc(x):
    return x**2
"nopython" mode (fast and no GIL) 🙂
Also a "python" mode 🙂
GPU and Cupy 😀
Methods (of classes) 🙂
def mydecorator(func):
    # do something with the function
    print(func)
    # return a(nother) function
    return func

@mydecorator
def myfunc(x):
    return x**2
<function myfunc at 0x7fc5bd76f378>
This mysterious syntax with @ is just syntactic sugar for:
def myfunc(x):
    return x**2

myfunc = mydecorator(myfunc)
<function myfunc at 0x7fc5bd76f598>
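A slightly more realistic decorator, in the style accelerators like Numba rely on: it returns a new wrapper function instead of the original one (the timing example here is purely illustrative; functools.wraps preserves the wrapped function's name and docstring):

```python
import functools
import time

def timed(func):
    @functools.wraps(func)  # keep the name/docstring of the wrapped function
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter() - t0:.2e} s")
        return result
    return wrapper

@timed
def myfunc(x):
    return x**2

print(myfunc(4))  # prints a timing line, then 16
```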
Sometimes not as efficient as it could be 🙁
(usually slower than Pythran / Julia / C++)
Transpiles Python to efficient C++
Good to optimize high-level NumPy code 😎
Extensions never use the Python interpreter (pure C++ ⇒ no GIL) 🙂
Can produce C++ that can be used without Python
Usually very efficient (sometimes faster than Julia)
High and low level optimizations
(Python optimizations and C++ compilation)
SIMD 🤩 (with xsimd)
Understand OpenMP instructions 🤗 !
Can use and make PyCapsules (functions operating in the native world) 🙂
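Since Pythran has no JIT, the types are declared in a special comment and the module is compiled ahead of time; the kernel itself stays plain Python/NumPy and runs unchanged in CPython. A minimal sketch (the export signature and suggested file name are illustrative):

```python
# pythran export wsum(float64[], float64[])
import numpy as np

def wsum(x, w):
    # plain Python/NumPy: runs as-is in CPython, or compiled
    # to a C++ extension with e.g. `pythran mykernel.py`
    return float((x * w).sum())

print(wsum(np.ones(5), np.arange(5.)))
```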
# range analysis
print_optimized("""
def f(x):
    y = 1 if x else 2
    return y == 3
""")
def f(x): return 0
# inlining
print_optimized("""
def foo(a):
    return a + 1
def bar(b, c):
    return foo(b), foo(2 * c)
""")
def foo(a):
    return a + 1
def bar(b, c):
    return ((b + 1), ((2 * c) + 1))
# unroll loops
print_optimized("""
def foo():
    ret = 0
    for i in range(1, 3):
        for j in range(1, 4):
            ret += i * j
    return ret
""")
def foo():
    ret = 0
    ret += 1
    ret += 2
    ret += 3
    ret += 2
    ret += 4
    ret += 6
    return ret
# constant propagation
print_optimized("""
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)
def bar():
    return [fib(i) for i in [1, 2, 8, 20]]
""")
import functools as __pythran_import_functools
def fib(n):
    return n if (n < 2) else (fib((n - 1)) + fib((n - 2)))
def bar():
    return [1, 1, 21, 6765]
def bar_lambda0(i):
    return fib(i)
# advanced transformations
print_optimized("""
import numpy as np
def wsum(v, w, x, y, z):
    return sum(np.array([v, w, x, y, z]) * (.1, .2, .3, .2, .1))
""")
import numpy as __pythran_import_numpy
def wsum(v, w, x, y, z):
    return __builtin__.sum(
        ((v * 0.1), (w * 0.2), (x * 0.3), (y * 0.2), (z * 0.1))
    )
Only "nopython" mode
limited to a subset of Python
limited to a few extension packages (Numpy + bits of Scipy)
pythranized functions can't call Python functions
No JIT: need types (written manually in comments)
Lengthy ⌛️ and memory intensive compilations
Debugging 🐜 Pythran requires C++ skills!
No GPU (maybe with OpenMP 4?)
Some compilers are unable to compile Pythran's C++11 👎
Performance issues, especially for crunching numbers 🔢
⇒ need to accelerate the "numerical kernels"
Many good accelerators and compilers for Python-Numpy code
⇒ We shouldn't have to write specialized code for one accelerator!
Other languages don't replace Python for sciences
Modern C++ is great and very complementary 💑 with Python
Julia is interesting but not the heaven on earth
Keep your Python-Numpy code clean and "natural" 🧘
Clean type annotations (🐍 3)
Easily mix Python code and compiled functions
JIT based on AOT compilers
Methods (of classes) and blocks of code
JIT (@jit)
AOT compilation for functions and methods (@boost)
Blocks of code (with if ts.is_transpiled:)
Parallelism with a class (adapted from Olivier Borderies)
omp/tsp.py (OpenMP) and tsp_concurrent.py (concurrent - threads)
Also compatible with MPI!
Works also well in simple scripts and IPython / Jupyter.
# abstract syntax tree
import ast
tree = ast.parse("great_tool = 'Beniget'")
assign = tree.body[0]
print(f"{assign.value.value} is a {assign.targets[0].id}")
Beniget is a great_tool
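Tools like Beniget (and Transonic on top of it) build their analyses from exactly these AST nodes; walking the tree to collect names and attribute accesses gives a flavor of it:

```python
import ast

code = "result = 2 * np.arange(10)"
tree = ast.parse(code)

# collect plain names and attribute accesses, the raw material
# from which def-use chains are built
names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
attrs = {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}
print(names, attrs)
```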
Write the (Pythran) files when needed
Compile the (Pythran) files when needed
Use the fast solutions when available
PyPy (Python abstraction for free) + Numpy accelerators used through Transonic
Modern C++ for more fundamental tools (with multi-language API)