Some Scikit-Learn estimators can run jobs on multiple CPUs in parallel thanks to joblib and the n_jobs parameter, or via OpenMP.
Some internal NumPy-based methods can also be parallelized if NumPy is installed with a multithreaded linear algebra library such as MKL, OpenBLAS, or BLIS.
When the underlying code uses joblib, the number of workers (threads or processes) running in parallel is controlled via `n_jobs`.
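As a minimal sketch of what this worker pool does, here is joblib used directly (with a toy function standing in for an estimator's work):

```python
from joblib import Parallel, delayed

def square(x):
    # Each call may run in a separate worker, depending on the backend.
    return x * x

# n_jobs=2 asks joblib for two parallel workers.
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))
print(results)  # [0, 1, 4, 9, 16]
```

Setting n_jobs=-1 instead would use as many workers as there are CPUs.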
Joblib supports both multiprocessing and multithreading; which one is used depends on the chosen backend.
Scikit-Learn usually relies on loky (joblib's default backend), which performs multiprocessing. When the data is larger than 1 MB, joblib creates a memory map that is shared by all processes.
In some cases, Scikit-Learn will tell joblib that multithreading is preferable:
from joblib import parallel_backend

with parallel_backend('threading', n_jobs=2):
    print('done')

done
OpenMP parallelizes code written in Cython or C. It relies exclusively on multithreading and, by default, tries to use as many threads as there are processors.
You can control the thread count via an environment variable:

$ OMP_NUM_THREADS=4 python my_script.py
NumPy and SciPy rely on multithreaded linear algebra libraries such as MKL, OpenBLAS, or BLIS. The number of threads used by these libraries can be set via the MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, or BLIS_NUM_THREADS environment variables.
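These variables must be set before the library is first imported, so they can also be set from Python at the top of a script. A minimal sketch (the thread count of 4 is an arbitrary choice for illustration):

```python
import os

# These must be set before NumPy (and the BLAS it links to) is imported.
os.environ["MKL_NUM_THREADS"] = "4"
os.environ["OPENBLAS_NUM_THREADS"] = "4"
os.environ["BLIS_NUM_THREADS"] = "4"

import numpy as np

a = np.random.rand(100, 100)
b = a @ a  # this matrix multiply uses at most 4 BLAS threads
```

Once the library has been imported, changing these variables has no effect.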
The NumPy and SciPy packages distributed on PyPI and conda-forge are linked to OpenBLAS. Conda packages on Anaconda's "defaults" channel are linked by default to MKL.
Consider an 8-CPU machine running GridSearchCV (parallelized with joblib) with n_jobs=8, where each trained estimator is a HistGradientBoostingClassifier (parallelized with OpenMP). Each instance of the classifier will spawn 8 threads (one per CPU). That's 8 × 8 = 64 threads, causing oversubscription and excessive scheduling overhead.
Starting with joblib >= 0.14, the loky backend limits the number of threads its child processes can use, which avoids this oversubscription.
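Joblib also lets you set this per-worker limit explicitly via the inner_max_num_threads argument of parallel_backend. A sketch (the worker function here is a hypothetical stand-in for a real estimator's fit; joblib communicates the limit to loky workers through environment variables such as OMP_NUM_THREADS):

```python
import os
from joblib import Parallel, delayed, parallel_backend

def inspect_worker(_):
    # Report the thread limit joblib exported to this worker process.
    return os.environ.get("OMP_NUM_THREADS")

# Cap each of the 2 worker processes at 2 OpenMP/BLAS threads.
with parallel_backend("loky", n_jobs=2, inner_max_num_threads=2):
    limits = Parallel()(delayed(inspect_worker)(i) for i in range(4))

print(limits)  # each entry should report the per-worker limit of '2'
```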
Scikit-Learn also exposes configuration options:
- assume_finite: skip-validation flag (for faster computations)
- working_memory: size of temporary arrays
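These options can be changed at runtime with sklearn.set_config(). A minimal sketch (the values 512 and the restore step are illustrative choices):

```python
import sklearn

# Skip finiteness validation and shrink temporary working arrays to 512 MB.
sklearn.set_config(assume_finite=True, working_memory=512)
config = sklearn.get_config()
print(config["assume_finite"], config["working_memory"])  # True 512

# Restore the defaults.
sklearn.set_config(assume_finite=False, working_memory=1024)
```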
Their defaults can also be controlled through environment variables (set before importing sklearn):
- SKLEARN_SITE_JOBLIB - if nonzero, sklearn uses the site joblib instead of its vendored version
- SKLEARN_ASSUME_FINITE - default for assume_finite
- SKLEARN_WORKING_MEMORY - default for working_memory
- SKLEARN_SEED - sets the global random generator seed
- SKLEARN_SKIP_NETWORK_TESTS - if nonzero, skips tests that require network access
import sklearn
print(sklearn.get_config())
{'assume_finite': False, 'working_memory': 1024, 'print_changed_only': True, 'display': 'text'}