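R's BLAS module handles matrix computation and related operations, but the BLAS and LAPACK modules bundled with R are relatively slow. Replacing these two modules can speed up such computations substantially.
RRO (Revolution R Open) is a third-party R distribution from Revolution that uses the Intel MKL library for BLAS and LAPACK, which greatly improves R's matrix performance. Because Intel MKL is a paid component that we cannot obtain directly, we can instead use the RevoMath add-on package provided with RRO to give the official R build Intel MKL.
The replacement procedure for Windows and Linux is described below: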
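1. Windows
The simplest approach is to install RRO and RevoMath and use them directly. If you would rather keep using official R, install RRO and RevoMath anyway, then copy libiomp5md.dll, Rblas.dll, and Rlapack.dll from RRO's $R_HOME$\bin\x64 into the same directory of the official R installation, overwriting the existing files. (If you may want to switch back later, back up the original Rblas.dll and Rlapack.dll first.) To be able to adjust how many CPU units are used for computation, also copy the RevoUtilsMath directory from RRO's library directory into the library directory of official R.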
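2. Linux
I use openSUSE 13.2 64-bit as the example here. $R_HOME$ is /usr/lib64/R. Download the MKL file (RevoMath 3.2.4) listed under the openSUSE entry on the RRO website, extract it, and copy its contents into the lib folder under $R_HOME$. Then run the following in that folder: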
sudo mv Rblas.so Rblas.so.keep
sudo mv Rlapack.so Rlapack.so.keep
sudo ln -s libmkl_rt.so Rblas.so
sudo ln -s libmkl_rt.so Rlapack.so
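The commands above assume that libmkl_rt.so has already been copied in and that the .keep names are still free. A slightly more defensive sketch of the same steps follows. The function name swap_blas and its checks are my own illustration, not part of RevoMath, and the lib directory is passed as an argument so the function can be tried on a scratch directory first.

```shell
# Hypothetical helper: replace R's bundled BLAS/LAPACK with symlinks to MKL.
# Usage: swap_blas /usr/lib64/R/lib
# Run it from a root shell; sudo cannot call a shell function directly.
swap_blas() {
    libdir="$1"
    # Refuse to continue if the MKL runtime has not been copied in yet.
    if [ ! -e "$libdir/libmkl_rt.so" ]; then
        echo "libmkl_rt.so not found in $libdir" >&2
        return 1
    fi
    for name in Rblas Rlapack; do
        target="$libdir/$name.so"
        backup="$libdir/$name.so.keep"
        # Back up the original only once, so a rerun never overwrites the backup.
        if [ -e "$target" ] && [ ! -L "$target" ] && [ ! -e "$backup" ]; then
            mv "$target" "$backup" || return 1
        fi
        # -f replaces whatever is at the target; the relative link name
        # matches the original commands.
        ln -sf libmkl_rt.so "$target" || return 1
    done
}
```

Because the backup is only written when no .keep file exists, running the function a second time leaves the saved originals untouched.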
Also edit the file $R_HOME$/etc/Rprofile.site (create it if it does not exist) and add the following two lines:
Sys.setenv("MKL_INTERFACE_LAYER"="GNU,LP64")
Sys.setenv("MKL_THREADING_LAYER"="GNU")
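If you prefer to script this edit, the sketch below appends each of the two lines only when it is not already present, so it is safe to run more than once. The function name add_mkl_env is hypothetical, and the sketch assumes an existing Rprofile.site ends with a newline.

```shell
# Hypothetical helper: add the two MKL settings to Rprofile.site exactly once.
# Usage: add_mkl_env /usr/lib64/R/etc/Rprofile.site   (from a root shell)
add_mkl_env() {
    profile="$1"
    # Create the file if it does not exist yet.
    [ -e "$profile" ] || : > "$profile" || return 1
    for line in \
        'Sys.setenv("MKL_INTERFACE_LAYER"="GNU,LP64")' \
        'Sys.setenv("MKL_THREADING_LAYER"="GNU")'
    do
        # -F: fixed string, -x: match the whole line, -q: quiet.
        # Append the line only if it is missing.
        grep -Fxq -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile" || return 1
    done
}
```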
You can now run a test; matrix computations should be dramatically faster.
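Before timing anything, it can help to confirm that the two library names really point at MKL. The sketch below only inspects the symlinks on disk; it does not prove that R loads MKL at run time. The function name check_blas_links is my own.

```shell
# Hypothetical check: report whether Rblas.so and Rlapack.so link to MKL.
# Usage: check_blas_links /usr/lib64/R/lib
check_blas_links() {
    libdir="$1"
    status=0
    for name in Rblas Rlapack; do
        link="$libdir/$name.so"
        if [ -L "$link" ] && [ "$(readlink "$link")" = "libmkl_rt.so" ]; then
            echo "$name.so -> libmkl_rt.so"
        else
            echo "$name.so is not linked to libmkl_rt.so" >&2
            status=1
        fi
    done
    # Non-zero exit status if either link is missing or points elsewhere.
    return "$status"
}
```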
Using R Benchmark 2.5 from the R Benchmarks collection on ATT, the results on the system after switching to Intel MKL are as follows:
R Benchmark 2.5
===============
Number of times each test is run__________________________: 3
I. Matrix calculation
Creation, transp., deformation of a 2500x2500 matrix (sec): 0.996666666666594
2400x2400 normal distributed random matrix ^1000____ (sec): 0.363333333332776
Sorting of 7,000,000 random values__________________ (sec): 0.64333333333343
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.486666666665769
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.310000000000097
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.484534047305902
II. Matrix functions
FFT over 2,400,000 random values____________________ (sec): 0.39666666666623
Eigenvalues of a 640x640 random matrix______________ (sec): 0.336666666666739
Determinant of a 2500x2500 random matrix____________ (sec): 0.266666666666424
Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.25
Inverse of a 1600x1600 random matrix________________ (sec): 0.263333333333018
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.287006387703729
III. Programmation
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.603333333333164
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.283333333332242
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.48000000000017
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.376666666667006
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.319999999999709
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.386767050168976
Total time for all 15 tests_________________________ (sec): 6.37666666666337
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.377475216986724
--- End of test ---
The results before the replacement were:
R Benchmark 2.5
===============
Number of times each test is run__________________________: 3
I. Matrix calculation
Creation, transp., deformation of a 2500x2500 matrix (sec): 0.979999999999998
2400x2400 normal distributed random matrix ^1000____ (sec): 0.376666666666666
Sorting of 7,000,000 random values__________________ (sec): 0.643333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 10.71
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 5.05
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.47113039784771
II. Matrix functions
FFT over 2,400,000 random values____________________ (sec): 0.396666666666671
Eigenvalues of a 640x640 random matrix______________ (sec): 0.650000000000001
Determinant of a 2500x2500 random matrix____________ (sec): 2.52
Cholesky decomposition of a 3000x3000 matrix________ (sec): 4.02000000000001
Inverse of a 1600x1600 random matrix________________ (sec): 2.63
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.62713360762634
III. Programmation
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.636666666666665
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.269999999999991
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.473333333333329
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.383333333333345
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.349999999999994
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.398967243235916
Total time for all 15 tests_________________________ (sec): 30.09
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.984775438409437
--- End of test ---
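The improvement is clear: total time for all 15 tests dropped from 30.09 s to about 6.38 s, and the 2800x2800 cross-product went from 10.71 s to about 0.49 s. The gains are concentrated in the linear algebra tests; the Programmation tests, which depend little on BLAS or LAPACK, are essentially unchanged. In addition, in my experiments Intel MKL was slightly faster than OpenBLAS.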
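To put numbers on the comparison, the speedup factors can be computed directly from the two listings. This is a small sketch using awk; the function name speedup is my own, and the figures are copied from the benchmark output above.

```shell
# speedup BEFORE AFTER: print BEFORE/AFTER rounded to two decimals.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# Times in seconds, taken from the "before" and "after" listings.
echo "Total time:    $(speedup 30.09 6.37666666666337)x"
echo "Cross-product: $(speedup 10.71 0.486666666665769)x"
echo "Cholesky:      $(speedup 4.02 0.25)x"
```

This works out to roughly 4.7x for the total run, about 22x for the cross-product, and about 16x for the Cholesky decomposition.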