
Some examples for multithreading

Necessity of multithreading

In 2007, BGMN was used for random stacked minerals for the first time [3]. From the beginning, these calculations were very time consuming. At the same time, single-core CPU performance had reached its limit and multicore processors came into common use. So the only way out was speeding up these calculations with a multithreaded BGMN.

As of March 2010, all of the core programs BGMN, GEOMET, VERZERR and MAKEGEQ are parallelized, although with different impact. The central switch for all these programs is

NTHREADS=...
It should be set to the number of CPU cores in your PC, or less. You may put that line into the task describing *.sav file or, for convenience, into the *.cfg file residing in the BGMNwin installation directory; those files are accessible via the Configuration menu in BGMNwin.
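As a minimal sketch, the beginning of a task describing *.sav file for a quad-core PC might look as follows; the file names mydevice.geq and myphase.str are placeholders, only the NTHREADS line matters here:

NTHREADS=4
VERZERR=mydevice.geq
STRUC[1]=myphase.str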

Features of a multithreaded BGMN

During common use on multicore PCs, a significant speed-up of BGMN was found. Another feature soon became clear: the number of refinement steps fluctuates between repeated BGMN runs on the same problem. In the first steps, only the least significant digits of the refined parameters differ. In the following steps, these differences become more and more significant, digit by digit up to the leading digit. This reverses during convergence, and the final results are mostly identical. This makes it impossible to compare the speed-up via total run time. For that reason, the result file was extended by the total number of refinement steps, and we compare the computing time per step.

A common measure in parallel computing is parallel efficiency. It simply expresses the "goodness" of the parallelization. If an application using N cores/threads needs exactly the Nth part of the computing time compared to the same application running on only 1 core/thread, its parallel efficiency is 100%. If the N-thread run needs more time than that, the parallel efficiency shrinks in inverse proportion.
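Written as a formula, with T(N) denoting the computing time per refinement step using N threads:

parallel efficiency = T(1) / (N * T(N))

For instance, PC 1 running the first example below needs 5.8 s per step with 1 thread and 3.4 s with 2 threads: 5.8 / (2 * 3.4) ≈ 85%.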

PCs used for the examples:

  1. DELL Precision T5400, 4 GB (8·512 MB) RAM, two quad-core XEON CPUs (model 5440, 2.83 GHz, 6 MB L2 cache), Windows XP Professional
  2. AMD Phenom 9750 (quad-core, 2.4 GHz), 4 GB (4·1 GB) RAM, Fedora Linux 9

1. Example: Reynolds Cup 2004 [1], sample 1

An artificial mixture representing a "mudstone" composition. Speed-up results:
PC                        | 1                      | 2
number of threads         | 1     2     4     8    | 1     2     4
time per refinement step  | 5.8s  3.4s  2.7s  1.6s | 6.2s  3.1s  1.9s
parallel efficiency       | 100%  85%   54%   45%  | 100%  99%   85%

2. Example: Reynolds Cup 2008 [2], sample 1

Sample 1 plus 10% ZnO as internal standard.
For opal-CT and halloysite, a recursive structure model [3] was used. Speed-up results:
PC                        | 1                      | 2
number of threads         | 1     2     4     8    | 1     2     4
time per refinement step  | 50s   35s   17s   9s   | 46s   23s   12s
parallel efficiency       | 100%  70%   71%   66%  | 100%  97%   94%

Features of a multithreaded GEOMET

Due to its nature of counting independent random events, parallelizing the GEOMET algorithm was simple and of high impact. The parallel efficiency is close to 100%. The computation time of GEOMET has never been a great challenge, so I decided to use the effect of parallelization for precision instead of speed-up. On multicore machines, GEOMET inserts additional steps between the angular positions set in the task describing *.sav file. Example: having a *.sav file containing
NTHREADS=4
zweiTheta[1]=8
zweiTheta[2]=12
GEOMET will do raytracing at 8, 9, 10, 11 and 12 degrees.
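This suggests that each interval between consecutive angular positions is subdivided into NTHREADS equal sub-steps; that rule is deduced from the example, not stated explicitly. Under that assumption, the same angular range with

NTHREADS=2
zweiTheta[1]=8
zweiTheta[2]=12

would presumably be raytraced at 8, 10 and 12 degrees.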

Features of a multithreaded MAKEGEQ

Parallelization of MAKEGEQ was of medium impact. I decided to use the effect of parallelization for precision here too; thus the stepwidth is reduced. Example:
NTHREADS=4
pi=2*acos(0)
WSTEP=3*sin(pi*zweiTheta/180)
Hence MAKEGEQ will use an effective stepsize of
0.75*sin(pi*zweiTheta/180)
The exception is capillary geometry: in that case, the computation time of MAKEGEQ is much enlarged by a doubly nested integral calculation, and parallelization is used for speed-up instead.
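Generalizing from this example (an inference from the numbers above, not an explicitly documented rule), the effective stepsize of a non-capillary MAKEGEQ run appears to be

WSTEP/NTHREADS

which for NTHREADS=4 turns the factor 3 in the WSTEP formula above into the effective 0.75.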

Features of a multithreaded VERZERR

Parallelization of VERZERR is of medium impact. There is no way to enhance precision by parallelization, so I used the effect of parallelization for speed-up. Prior to parallelization, there were cases of notably long VERZERR run times. Together with some additional progress in the VERZERR algorithms, the parallelized VERZERR now runs in only a few minutes.

References:

[1] R. Kleeberg,
Results of the second Reynolds Cup contest in quantitative mineral analysis.
Commission on Powder Diffraction, International Union of Crystallography,
CPD Newslett. 30 (2004) 22–24.

[2] General information: www.clays.org/reynoldscup.html

[3] K. Ufer, R. Kleeberg, J. Bergmann, H. Curtius, R. Dohrmann,
Refining real structure parameters of disordered layer structures within the Rietveld method,
Proceedings of the 5th Size-Strain conference "Diffraction Analysis of the Microstructure of Materials" (SS-V),
Z. Kristallogr. Suppl. 27 (2008) 151–158.