Some examples for multithreading
Necessity of multithreading
In 2007, BGMN was used for randomly stacked minerals for the first time
[3]. From the beginning, these calculations
were very time consuming. At the same time, single-core CPUs reached their
asymptotic limit and multicore processors came into common use. So the only
way out was speeding up these calculations with a multithreaded BGMN.
As of March 2010, all of the core programs BGMN, GEOMET, VERZERR and MAKEGEQ
are parallelized, but with different impact. The central switch for all these
programs is
NTHREADS=...
It should be set to the number of CPU cores in your PC, or fewer. You may put
that line into the task-describing *.sav file or, for convenience, into the
*.cfg file residing in the BGMNwin installation directory:
- bgmn.cfg for the BGMN program
- geomet.cfg for the GEOMET program
- makegeq.cfg for the MAKEGEQ program
- verzerr.cfg for the VERZERR program
Those files are accessible via the Configuration menu in BGMNwin.
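As a concrete illustration, on a quad-core PC the relevant *.cfg file (or the task-describing *.sav file) would contain the line (the value 4 is only an example; use your own core count):

```
NTHREADS=4
```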
Features of a multithreaded BGMN
During common use on multicore PCs, a significant speed-up of BGMN was found.
Another feature soon became clear: the number of refinement steps fluctuates
between repeated BGMN runs on the same problem. In the startup step, only the
least significant digits of the refined parameters differ.
But subsequently, these differences become more and more significant, digit by
digit up to the leading digit. This reverses during convergence, and the final
results are mostly identical. Because of that, comparing the speed-up from
total run time becomes impossible. For that reason, the result file was
enriched by the total number of refinement steps, and we compare the computing
time per step.
A common measure in parallel computing is parallel efficiency. It
simply means the "goodness of parallelization". If an application
using N cores/threads demands exactly the Nth part of the computing
time compared to the same application running on only 1 core/thread,
its parallel efficiency is 100%. If N threads demand more time,
the parallel efficiency shrinks in inverse proportion.
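This definition amounts to efficiency = T1 / (N * TN). A short Python sketch; the helper function is only an illustration of the formula, not part of BGMN, and the sample timings are the per-step times from the first example below:

```python
def parallel_efficiency(t1, tn, n):
    """Parallel efficiency: ideal time t1/n divided by the measured time tn."""
    return t1 / (n * tn)

# Times per refinement step from example 1 on the dual-XEON PC:
t1 = 5.8
for n, tn in [(2, 3.4), (4, 2.7), (8, 1.6)]:
    print(f"{n} threads: {parallel_efficiency(t1, tn, n):.0%}")
```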
PCs used for the examples:
- PC 1: DELL Precision T5400, 4GB (8·512MB) RAM, two quad-core XEON CPUs
(model 5440, 2.83GHz, 6MB L2 cache), Windows XP Professional
- PC 2: AMD Phenom 9750 (2.4GHz), 4GB (4·1GB) RAM, Fedora Linux 9
1. Example: Reynolds Cup 2004 [1], sample 1
Artificial mixture, representing a "mudstone" composition:
- quartz
- potassium feldspar (monoclinic)
- albite
- calcite
- pyrite
- dolomite
- kaolinite
- illite 1Mt
- illite/smectite mixed layer mineral, R3 ordered, approx. 86% illite layers
- montmorillonite
- chlorite (Mg-rich)
Speed-up results:
PC                       |      1 (DELL T5400)       |  2 (Phenom 9750)
number of threads        |   1  |   2  |   4  |   8  |   1  |   2  |   4
time per refinement step | 5.8s | 3.4s | 2.7s | 1.6s | 6.2s | 3.1s | 1.9s
parallel efficiency      | 100% |  85% |  54% |  45% | 100% |  99% |  85%
2. Example: Reynolds Cup 2008 [2], sample 1
- quartz
- opal-CT
- sanidine
- albite
- goethite
- gibbsite
- hornblende
- magnetite
- volcanic glass (obsidian)
- halloysite (with kaolinite)
- montmorillonite, Ca-form
- montmorillonite, Na-form
plus 10% ZnO as internal standard.
For opal-CT and halloysite, a recursive structure model [3]
was used.
Speed-up results:
PC                       |      1 (DELL T5400)       |  2 (Phenom 9750)
number of threads        |   1  |   2  |   4  |   8  |   1  |   2  |   4
time per refinement step |  50s |  35s |  17s |   9s |  46s |  23s |  12s
parallel efficiency      | 100% |  70% |  71% |  66% | 100% |  97% |  94%
Features of a multithreaded GEOMET
Due to its nature of counting independent random events, parallelizing the
GEOMET algorithm was simple and of high impact. The parallel
efficiency is close to 100%. The computation time of GEOMET has never
been a great challenge, so I decided to put the effect of parallelization into
precision instead of speed-up. On multicore machines, GEOMET inserts
additional steps between the angular positions set in the task-describing
*.sav file. Example: Having a *.sav file containing
NTHREADS=4
zweiTheta[1]=8
zweiTheta[2]=12
GEOMET will do raytracing at 8, 9, 10, 11 and 12 degrees.
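The rule behind these extra positions can be sketched in Python. Note the subdivision scheme below (each interval between consecutive zweiTheta values split into NTHREADS equal parts) is inferred from the example above; it is an assumption about the behaviour, not taken from the GEOMET source:

```python
def geomet_positions(two_theta, nthreads):
    # Assumed rule: split each interval between consecutive angular
    # positions from the *.sav file into `nthreads` equal subintervals.
    points = []
    for lo, hi in zip(two_theta, two_theta[1:]):
        step = (hi - lo) / nthreads
        points.extend(lo + i * step for i in range(nthreads))
    points.append(float(two_theta[-1]))
    return points

print(geomet_positions([8, 12], 4))  # [8.0, 9.0, 10.0, 11.0, 12.0]
```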
Features of a multithreaded MAKEGEQ
Parallelization of MAKEGEQ was of medium impact. I decided
to put the effect of parallelization into precision, too; thus the stepwidth
will be reduced. Example:
NTHREADS=4
pi=2*acos(0)
WSTEP=3*sin(pi*zweiTheta/180)
Hence MAKEGEQ will use an effective stepsize of
0.75*sin(pi*zweiTheta/180),
with the exception of capillary geometry: in that case, the computation
time of MAKEGEQ is much enlarged by a doubly nested integral
calculation, and parallelization is used for speed-up.
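The stepwidth reduction can be sketched in Python; the rule used here (effective stepwidth = WSTEP / NTHREADS) is inferred from the 3 → 0.75 example above and is an assumption about MAKEGEQ's internal behaviour:

```python
import math

def effective_wstep(two_theta_deg, nthreads):
    # WSTEP formula from the example above; dividing by nthreads is
    # the assumed stepwidth reduction on a multicore machine.
    wstep = 3 * math.sin(math.pi * two_theta_deg / 180)
    return wstep / nthreads

# At zweiTheta = 30 degrees with NTHREADS=4: 3*sin(30°)/4 = 0.375
print(round(effective_wstep(30, 4), 6))  # 0.375
```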
Features of a multithreaded VERZERR
Parallelization of VERZERR is of medium impact. There is no
way to enhance precision by parallelization, so I put the impact
of parallelization into speed-up. Prior to parallelization, there were
cases of notably long VERZERR run times. There has been some
additional progress in the VERZERR algorithms, and as a result
the parallelized VERZERR runs in only a few minutes.
References:
[1] R. Kleeberg,
Results of the second Reynolds Cup
contest in quantitative mineral analysis.
Commission on Powder Diffraction, International Union of Crystallography,
CPD Newslett. 30 (2004) 22–24.
[2] General information:
www.clays.org/reynoldscup.html
[3] K. Ufer, R. Kleeberg, J. Bergmann,
H. Curtius, R. Dohrmann,
Refining real structure parameters of disordered layer structures within the
Rietveld method,
Proceedings of the 5th Size-Strain conference "Diffraction Analysis of the
Microstructure of Materials" (SS-V),
Z. Kristallogr. Suppl. 27 (2008) 151–158.