Multithreaded applications in C++

As part of my diploma work, I had to find out how to develop multithreaded applications. The obvious option is to build on Win32 threads (for Windows only, of course). The API is free, but not cross-platform. There are also several cross-platform libraries that let you develop multithreaded applications. The most widespread ones are Intel TBB, OpenMP, and Boost.Thread.

OpenMP

Website - http://openmp.org/wp/. Lessons - https://computing.llnl.gov/tutorials/openMP/.

The advantage of OpenMP is that you don't have to change your code significantly to use it. All you need to do is add some compiler directives.

Parallel loops allow you to split loop iterations over multiple threads. With two threads, for example, the first thread could perform the first half of the iterations and the second thread the second half.

#pragma omp parallel for
for (int i=0; i<N; i++) {...}

Sections allow you to statically partition the work over multiple threads. This is useful when there is obvious work that can be performed in parallel. However, it’s not a very flexible approach.

#pragma omp parallel sections
{
  #pragma omp section
  {...}
  #pragma omp section
  {...}
}

Tasks are the more flexible approach – these are created dynamically and their execution is performed asynchronously, either by the thread that created them, or by another thread.

#pragma omp task
{...}

OpenMP has several things going for it.

  • Directive based, which means that the compiler does the work of creating and synchronising the threads.
  • Incremental parallelism, meaning that you can focus on just the region of code that you need to parallelise.
  • One source base for serial and parallel code. The OpenMP directives are only recognised by the compiler when a compiler flag is set, so you can use the same source base to generate serial and parallel code. If the parallel code generates a wrong answer, you can build a serial version from the same code and use it to verify the computation. This lets you isolate parallelisation errors from errors in the algorithm.

(Source: Stack Overflow)

The disadvantage of OpenMP is that it's not supported by all compilers. For example, you can't use OpenMP in MS Visual Studio 2008 and 2010 Express Edition (http://msdn.microsoft.com/en-us/library/tt15eb9t.aspx).

Intel TBB and Boost.Thread

Intel TBB (http://threadingbuildingblocks.org/) and Boost.Thread (http://www.boost.org/) are ordinary C++ libraries rather than compiler extensions. They provide classes for working with threads.

The important difference between these two libraries is the license they are distributed under. Boost is free and may be used in commercial proprietary projects, while the free version of Intel TBB was, at the time of writing, distributed under the GPL (for open source projects).

You can find more discussions and usage examples for Boost.Thread than for Intel TBB.

A good discussion of OpenMP and Boost.Thread can be found at http://www.linux.org.ru/forum/development/4252077 (in Russian). Here is a translation:

The program has a big outer loop (about 10000 iterations) which contains several smaller loops with math calculations (about 2500 iterations each). I ran into thread synchronization problems when trying to spread the main loop across several threads, because its iterations are tightly linked. So I decided to split the smaller loops across several threads instead. At this point I faced a question: should I use boost::thread or OpenMP?

I've found a lot of information about OpenMP, from which I concluded that OpenMP spends a lot of time forking and joining threads. boost::thread, people say, doesn't have this disadvantage. More than that, it's said that boost::thread outperforms OpenMP on small loops. Of course I decided to find out for myself…

I wrote 2 programs with equal number of iterations:

OpenMP:
_______________________________________

#include <iostream>
#include <omp.h>
#include <time.h>
#define Npar 4     // number of threads
#define Nmas 1000  // array size
int main(int argc, char* argv[])
{
    float* a = new float[Nmas];
    float* b = new float[Nmas];
    float* c = new float[Nmas];
    // Enter and leave a parallel region 100000 times.
    for (int k = 0; k < 100000; k++)
    {
#pragma omp parallel shared(a, b, c) num_threads(Npar)
        {
            // Each thread handles indices myid, myid + Npar, myid + 2*Npar, ...
            int myid = omp_get_thread_num();
            for (int i = myid; i < Nmas; i += Npar)
            {
                a[i] = (float) i;
                b[i] = a[i] * a[i];
                c[i] = 0.3 * a[i] - b[i] / a[i];  // for i == 0 this is 0/0, i.e. NaN
            }
        }
    }
    delete[] a;
    delete[] b;
    delete[] c;
    std::cout << clock() << std::endl;  // processor time in clock ticks
    return 0;
}

_________________________________________

boost::thread:
_________________________________________

#include <boost/thread/thread.hpp>
#include <iostream>
#include <time.h>
#define Nmas   1000  // array size
#define Npar   4     // number of threads
using namespace std;

// Work done by one thread: indices ind, ind + Npar, ind + 2*Npar, ...
void proga(int ind, float* a, float* b, float* c)
{
    for (int i = ind; i < Nmas; i += Npar)
    {
        a[i] = (float) i;
        b[i] = a[i] * a[i];
        c[i] = 0.3 * a[i] - b[i] / a[i];  // for i == 0 this is 0/0, i.e. NaN
    }
}

int main(int argc, char* argv[])
{
    float* a = new float[Nmas];
    float* b = new float[Nmas];
    float* c = new float[Nmas];
    // Create and join four threads 100000 times.
    for (int j = 0; j < 100000; j++)
    {
        boost::thread my_thread1(&proga, 0, a, b, c);
        boost::thread my_thread2(&proga, 1, a, b, c);
        boost::thread my_thread3(&proga, 2, a, b, c);
        boost::thread my_thread4(&proga, 3, a, b, c);
        my_thread1.join();
        my_thread2.join();
        my_thread3.join();
        my_thread4.join();
    }
    delete[] a;
    delete[] b;
    delete[] c;
    std::cout << clock() << endl;  // processor time in clock ticks
    return 0;
}

________________________________________

Compiler: g++-4.3, OS: Kubuntu 9.04,
Processor: CPU Intel Core 2 Quad Q8300 2.5GHz.

I compiled and ran both programs to find out which one is faster. The results are below (I ran each variant 10 times; the averages are given).

Output of the clock() function:
boost 8700000 (8.7s)
openMP 8800000 (8.8s)

Output of the time command:
boost
real 0m8.588s
user 0m4.616s
sys 0m4.176s
__________________
openMP
real 0m7.438s
user 0m6.828s
sys 0m3.132s
__________________
