The ROme OpTimistic Simulator  2.0.0
A General-Purpose Multithreaded Parallel/Distributed Simulation Platform
mpi.c File Reference

MPI Support Module. More...

#include <stdbool.h>
#include <communication/mpi.h>
#include <communication/wnd.h>
#include <communication/gvt.h>
#include <communication/communication.h>
#include <queues/queues.h>
#include <core/core.h>
#include <arch/atomic.h>
#include <statistics/statistics.h>

Macros

#define MPI_TYPE_STAT_LEN   (sizeof(struct stat_t)/sizeof(double))
 The number of double members in struct stat_t, used as the element count of the statistics custom MPI Datatype. It assumes that stat_t contains only double floating point members.
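 As an illustration, a minimal sketch (the stat_t members shown here are hypothetical; the real definition lives in statistics/statistics.h):

 struct stat_t {
     double tot_events;       /* hypothetical member */
     double committed_events; /* hypothetical member */
     double rollbacks;        /* hypothetical member */
 };

 /* With the layout above, MPI_TYPE_STAT_LEN evaluates to 3: the number
  * of double members, used as the element count of the custom Datatype. */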
 

Functions

bool pending_msgs (int tag)
 Check if there are pending messages.

bool is_request_completed (MPI_Request *req)
 Check if an MPI request has been completed.

void send_remote_msg (msg_t *msg)
 Send a message to a remote LP.

void receive_remote_msgs (void)
 Receive remote messages.

bool all_kernels_terminated (void)
 Check if all kernels have reached the termination condition.

void collect_termination (void)
 Check if other kernels have reached the termination condition.

void broadcast_termination (void)
 Notify all the kernels about local termination.

static void reduce_stat_vector (struct stat_t *in, struct stat_t *inout, int *len, MPI_Datatype *dptr)
 Reduce operation for statistics.

static void stats_reduction_init (void)
 Initialize MPI Datatype and Operation for statistics reduction.

void mpi_reduce_statistics (struct stat_t *global, struct stat_t *local)
 Invoke statistics reduction.

void dist_termination_init (void)
 Set up the distributed termination subsystem.

void dist_termination_finalize (void)
 Cleanup routine of the distributed termination subsystem.

void syncronize_all (void)
 Synchronize all the kernels.

void mpi_init (int *argc, char ***argv)
 Initialize MPI subsystem.

void inter_kernel_comm_init (void)
 Initialize inter-kernel communication.

void inter_kernel_comm_finalize (void)
 Finalize inter-kernel communication.

void mpi_finalize (void)
 Finalize MPI.
 

Variables

bool mpi_support_multithread
 Flag telling whether the MPI runtime supports multithreading.
 
spinlock_t mpi_lock
 
static spinlock_t msgs_lock
 A guard to ensure isolation in the message receiving routine.
 
static unsigned int terminated = 0
 
static MPI_Request * termination_reqs
 MPI Requests to handle termination detection collection asynchronously.
 
static spinlock_t msgs_fini
 A guard to ensure isolation in collect_termination()
 
static MPI_Op reduce_stats_op
 MPI Operation to reduce statistics.
 
static MPI_Datatype stats_mpi_t
 MPI Datatype to describe the content of a struct stat_t.
 
static MPI_Comm msg_comm
 MPI Communicator for event/control messages.
 

Detailed Description

MPI Support Module.

This module implements all basic MPI facilities to let the distributed execution of a simulation model take place consistently.

Several facilities are thread-safe, others are not. Check carefully which of these can be used by worker threads without coordination when relying on this module.

This file is part of ROOT-Sim (ROme OpTimistic Simulator).

ROOT-Sim is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; only version 3 of the License applies.

ROOT-Sim is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with ROOT-Sim; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Author
Tommaso Tocci

Definition in file mpi.c.

Function Documentation

bool all_kernels_terminated ( void  )

Check if all kernels have reached the termination condition.

This function checks whether all threads have been informed that the simulation should be halted, and have taken the proper actions to terminate. Once this condition is confirmed, the process can safely exit.

Warning
This function can be called only after a call to broadcast_termination()
Returns
true if all the kernels have reached the termination condition

Definition at line 270 of file mpi.c.
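
A plausible sketch of this check, assuming n_ker is the global count of simulation kernel instances (a name assumed here, not confirmed by this page):

void all_kernels_terminated_sketch(void);

bool all_kernels_terminated(void)
{
    /* terminated is incremented by collect_termination() */
    return terminated == n_ker;
}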

void broadcast_termination ( void  )

Notify all the kernels about local termination.

This function is used to inform all other simulation kernel instances that this kernel is ready to terminate the simulation.

Warning
This function is not thread-safe and should be used by only one thread at a time
Note
This function can be used concurrently with other MPI functions (its lack of thread safety concerns only concurrent calls to itself)
This function can be called multiple times, but the actual broadcast operation will be executed only on the first call.

Definition at line 327 of file mpi.c.
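
A hedged sketch of the notification loop, where n_ker, kid (this kernel's rank) and the MSG_FINI tag are assumed names, not confirmed by this page:

void broadcast_termination(void)
{
    static bool notified = false;
    unsigned int i;

    if (notified)
        return; /* the actual broadcast runs only on the first call */
    notified = true;

    lock_mpi();
    for (i = 0; i < n_ker; i++) {
        if (i == kid)
            continue; /* do not notify ourselves */
        MPI_Isend(NULL, 0, MPI_INT, (int)i, MSG_FINI, MPI_COMM_WORLD,
                  &termination_reqs[i]);
    }
    unlock_mpi();
}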

void collect_termination ( void  )

Check if other kernels have reached the termination condition.

This function accumulates termination acknowledgements from remote kernels and updates the terminated counter.

Note
This function can be called at any point of the simulation, but it will be effective only after broadcast_termination() has been called locally.
This function is thread-safe

Definition at line 289 of file mpi.c.
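
A sketch of the collection loop, guarded by the msgs_fini spinlock documented below; MSG_FINI and spin_trylock() are assumed names:

void collect_termination(void)
{
    int res;

    if (!spin_trylock(&msgs_fini))
        return; /* another thread is already collecting */

    while (pending_msgs(MSG_FINI)) {
        lock_mpi();
        MPI_Recv(&res, 1, MPI_INT, MPI_ANY_SOURCE, MSG_FINI,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        unlock_mpi();
        terminated++; /* one more kernel has acknowledged termination */
    }
    spin_unlock(&msgs_fini);
}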

void dist_termination_finalize ( void  )

Cleanup routine of the distributed termination subsystem.

Once this function returns, it is safe to terminate the simulation.

Definition at line 471 of file mpi.c.
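
A minimal sketch, assuming n_ker is the number of kernels: waiting on every posted termination request guarantees that no MPI operation is still pending.

void dist_termination_finalize(void)
{
    MPI_Waitall(n_ker, termination_reqs, MPI_STATUSES_IGNORE);
}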

void dist_termination_init ( void  )

Set up the distributed termination subsystem.

To correctly terminate a distributed simulation, some care must be taken. In particular:

  • we must be sure that no deadlock is generated, e.g. because some simulation kernel is still waiting for some synchronization action by other kernels;
  • we must be sure that no MPI operation is in place or still pending when MPI_Finalize() is called.

To this end, a specific distributed termination protocol is put in place, which requires some data structures to be available.

This function initializes the subsystem and the data structures which ensure a clean shutdown of distributed simulations.

Definition at line 452 of file mpi.c.
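
A sketch of the setup; rsalloc() (an allocator wrapper), spinlock_init() and n_ker are assumed names, not confirmed by this page:

void dist_termination_init(void)
{
    unsigned int i;

    /* one request slot per kernel, so the termination broadcast
     * can be tracked asynchronously */
    termination_reqs = rsalloc(n_ker * sizeof(MPI_Request));
    for (i = 0; i < n_ker; i++)
        termination_reqs[i] = MPI_REQUEST_NULL;

    spinlock_init(&msgs_fini);
}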

void inter_kernel_comm_finalize ( void  )

Finalize inter-kernel communication.

This function shuts down the subsystems associated with inter-kernel communication.

Definition at line 562 of file mpi.c.

void inter_kernel_comm_init ( void  )

Initialize inter-kernel communication.

This function initializes inter-kernel communication by initializing all the other communication subsystems.

Definition at line 545 of file mpi.c.

bool is_request_completed ( MPI_Request *  req)

Check if an MPI request has been completed.

This function checks whether the operation associated with the specified MPI Request has been completed or not.

Note
This function is thread-safe.
Parameters
req: A pointer to the MPI_Request to check for completion
Returns
true if the operation associated with req is complete, false otherwise.

Definition at line 144 of file mpi.c.
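
A minimal sketch of this check via MPI_Test, which reports completion without blocking:

bool is_request_completed(MPI_Request *req)
{
    int flag = 0;

    MPI_Test(req, &flag, MPI_STATUS_IGNORE);
    return (bool)flag;
}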

void mpi_finalize ( void  )

Finalize MPI.

This function shuts down the MPI subsystem.

Note
Only the master thread on each simulation kernel is expected to call this function

Definition at line 578 of file mpi.c.
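
A plausible sketch, assuming a master_thread() predicate (name assumed) that identifies the master thread of this kernel instance:

void mpi_finalize(void)
{
    if (master_thread())
        MPI_Finalize(); /* tear down MPI once per kernel instance */
}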

void mpi_init ( int *  argc,
char ***  argv 
)

Initialize MPI subsystem.

This is mainly a wrapper around MPI_Init, with some boilerplate code to initialize data structures.

Most notably, here we determine whether the MPI library in use offers suitable multithreading support, and we set up the MPI Communicator which will be used later on to exchange model-specific messages.

Definition at line 514 of file mpi.c.
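
A hedged sketch of the wrapper; n_ker, kid and spinlock_init() are assumed names:

void mpi_init(int *argc, char ***argv)
{
    int provided;

    MPI_Init_thread(argc, argv, MPI_THREAD_MULTIPLE, &provided);
    mpi_support_multithread = (provided >= MPI_THREAD_MULTIPLE);
    if (!mpi_support_multithread)
        spinlock_init(&mpi_lock); /* serialize MPI calls manually */

    MPI_Comm_size(MPI_COMM_WORLD, (int *)&n_ker);
    MPI_Comm_rank(MPI_COMM_WORLD, (int *)&kid);

    /* dedicated communicator for event/control messages */
    MPI_Comm_dup(MPI_COMM_WORLD, &msg_comm);
}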

void mpi_reduce_statistics ( struct stat_t *  global,
struct stat_t *  local 
)

Invoke statistics reduction.

This function is a simple wrapper of an MPI_Reduce operation, which uses the custom reduce operation implemented in reduce_stat_vector() to gather reduced statistics in the master kernel (rank 0).

Parameters
global: A pointer to a struct stat_t where the reduced statistics will be stored. The reduction only takes place at rank 0, so other simulation kernel instances will never find meaningful information in this structure.
local: A pointer to a local struct stat_t which is used as the source of information for the distributed reduction operation.

Definition at line 428 of file mpi.c.
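
Since this is described as a simple wrapper, the body plausibly reduces to a single call (a sketch, not the verbatim implementation):

void mpi_reduce_statistics(struct stat_t *global, struct stat_t *local)
{
    lock_mpi();
    MPI_Reduce(local, global, 1, stats_mpi_t, reduce_stats_op,
               0, MPI_COMM_WORLD); /* root: the master kernel, rank 0 */
    unlock_mpi();
}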

bool pending_msgs ( int  tag)

Check if there are pending messages.

This function tells whether there is a pending message in the underlying MPI library coming from any remote simulation kernel instance. Passing a tag different from MPI_ANY_TAG restricts the check to messages carrying that specific tag.

Messages are only probed on the MPI_COMM_WORLD communicator, so this function is only useful in startup/shutdown operations (indeed, it is used to initiate the GVT reduction and to conclude the distributed simulation shutdown).

Note
This function is thread-safe.
Parameters
tag: The tag of the messages to check for availability.
Returns
true if a pending message tagged with tag is found, false otherwise.

Definition at line 122 of file mpi.c.
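
A minimal sketch of the probe; lock_mpi()/unlock_mpi() are the guards mentioned in the mpi_lock documentation below:

bool pending_msgs(int tag)
{
    int flag = 0;

    lock_mpi();
    MPI_Iprobe(MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
    unlock_mpi();
    return (bool)flag;
}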

void receive_remote_msgs ( void  )

Receive remote messages.

This function extracts from MPI the events destined to locally-hosted LPs. Only messages directed to LPs can be extracted here, because the probing is done on the msg_comm communicator.

A message which is extracted here is placed (out of order) in the bottom half of the destination LP, for later insertion (in order) in the input queue.

This function tries to extract as many messages as possible from the underlying MPI library. In particular, once called, it returns only after no more messages destined to this simulation kernel instance can be found in the MPI library.

Currently, this function is called once per main loop iteration. Calling it more often might significantly imbalance the workload across worker threads.

Note
This function is thread-safe.

Definition at line 208 of file mpi.c.
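
A hedged sketch of the drain loop; spin_trylock(), get_msg_buffer() and insert_bottom_half() are hypothetical names standing in for the actual allocator and bottom-half helpers:

void receive_remote_msgs(void)
{
    MPI_Status status;
    msg_t *msg;
    int found, size;

    if (!spin_trylock(&msgs_lock))
        return; /* another thread is already draining MPI */

    while (true) {
        lock_mpi();
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, msg_comm, &found, &status);
        unlock_mpi();
        if (!found)
            break;

        /* the tag identifies the destination LP's GID, so the buffer
         * can be taken from that LP's allocator before extraction */
        MPI_Get_count(&status, MPI_BYTE, &size);
        msg = get_msg_buffer(status.MPI_TAG, size);

        lock_mpi();
        MPI_Recv(msg, size, MPI_BYTE, status.MPI_SOURCE, status.MPI_TAG,
                 msg_comm, MPI_STATUS_IGNORE);
        unlock_mpi();

        insert_bottom_half(msg); /* out-of-order placement, in-order later */
    }
    spin_unlock(&msgs_lock);
}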

static void reduce_stat_vector ( struct stat_t *  in,
struct stat_t *  inout,
int *  len,
MPI_Datatype *  dptr 
)
static

Reduce operation for statistics.

This function implements a custom MPI Operation used to globally reduce the local statistics upon simulation shutdown. This function is bound to reduce_stats_op in stats_reduction_init().

Definition at line 348 of file mpi.c.
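
A sketch assuming a plain member-wise sum over the double fields (the actual reduction may treat some members differently, e.g. taking maxima):

static void reduce_stat_vector(struct stat_t *in, struct stat_t *inout,
                               int *len, MPI_Datatype *dptr)
{
    int i, j;
    double *src, *dst;

    (void)dptr; /* the datatype is fixed: MPI_TYPE_STAT_LEN doubles */

    for (i = 0; i < *len; i++) {
        src = (double *)&in[i];
        dst = (double *)&inout[i];
        for (j = 0; j < (int)MPI_TYPE_STAT_LEN; j++)
            dst[j] += src[j];
    }
}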

void send_remote_msg ( msg_t *  msg)

Send a message to a remote LP.

This function takes charge of an event to be delivered to a remote LP. The send operation is non-blocking: to this end, the message is registered in the outgoing queue of the destination kernel, so that MPI can keep track of the send operation.

Also, the message being sent is registered with the sender thread, to keep track of the white/red message information which is necessary to correctly reduce the GVT value.

Note
This function is thread-safe.
Parameters
msg: A pointer to the msg_t holding the message to be sent remotely

Definition at line 169 of file mpi.c.
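
A hedged sketch; out_queue_register(), gid_to_rank() and register_outgoing_msg(), as well as the msg->size and msg->receiver fields, are assumed names used for illustration only:

void send_remote_msg(msg_t *msg)
{
    /* store the MPI_Request in the destination kernel's outgoing
     * queue, so MPI can keep track of the send operation */
    MPI_Request *req = out_queue_register(msg);

    /* white/red bookkeeping needed for the GVT reduction */
    register_outgoing_msg(msg);

    lock_mpi();
    MPI_Isend(msg, msg->size, MPI_BYTE, gid_to_rank(msg->receiver),
              msg->receiver, msg_comm, req); /* tag = destination LP GID */
    unlock_mpi();
}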

static void stats_reduction_init ( void  )
static

Initialize MPI Datatype and Operation for statistics reduction.

To reduce statistics, we rely on a custom MPI Operation. This operation requires a pre-built MPI Datatype to properly handle the structures which we use to represent the local information.

This function is called when initializing inter-kernel communication, and its purpose is exactly that of setting up a custom MPI datatype in stats_mpi_t.

Additionally, this function defines the custom operation implemented in reduce_stat_vector() which is bound to the MPI Operation reduce_stats_op.

Definition at line 381 of file mpi.c.
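
A sketch under the MPI_TYPE_STAT_LEN assumption (stat_t as a contiguous block of doubles); the real code may build the datatype member by member:

static void stats_reduction_init(void)
{
    MPI_Type_contiguous(MPI_TYPE_STAT_LEN, MPI_DOUBLE, &stats_mpi_t);
    MPI_Type_commit(&stats_mpi_t);

    /* bind the custom reduction; 1 marks the operation as commutative */
    MPI_Op_create((MPI_User_function *)reduce_stat_vector, 1,
                  &reduce_stats_op);
}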

void syncronize_all ( void  )

Synchronize all the kernels.

This function can be used as a synchronization barrier between all the threads of all the kernels.

The function will return only after all the threads on all the kernels have already entered this function.

We create a new communicator here, to be sure that we synchronize exactly in this function and not somewhere else.

Warning
This function is extremely resource-intensive, wastes a lot of CPU cycles, and drops performance significantly. Avoid it as much as possible!

Definition at line 492 of file mpi.c.
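
A plausible sketch; master_thread(), thread_barrier() and all_thread_barrier are assumed names for the intra-kernel barrier machinery:

void syncronize_all(void)
{
    if (master_thread()) {
        MPI_Comm comm;

        /* a fresh communicator guarantees this barrier cannot be
         * matched by a barrier issued anywhere else */
        MPI_Comm_dup(MPI_COMM_WORLD, &comm);
        MPI_Barrier(comm);
        MPI_Comm_free(&comm);
    }
    thread_barrier(&all_thread_barrier);
}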

Variable Documentation

spinlock_t mpi_lock

This global lock is used by the lock/unlock_mpi macros to control access to the MPI interface. If proper MPI threading support is available from the runtime, it is not used.

Definition at line 55 of file mpi.c.

MPI_Comm msg_comm
static

MPI Communicator for event/control messages.

To enable zero-copy message passing, we must know which LP is the destination of an event before extracting that event from the MPI layer. This is necessary to determine from which slab/buddy the memory to keep the event must be taken. Yet, this is impossible to do directly, because the MPI layer does not expose this information.

The trick to work around this limitation is to create a separate MPI Communicator which is used only to exchange events across LPs (control messages also fall into this category). Since events are the only things extracted from this communicator, we can match against both MPI_ANY_SOURCE (to receive events from any simulation kernel instance) and MPI_ANY_TAG (to match independently of the tag).

We therefore use the tag to identify the GID of the LP.

We can retrieve information about the message sender and the size of the message to be extracted by inspecting the MPI_Status variable after an MPI_Iprobe completes.

Definition at line 100 of file mpi.c.
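
A sketch of the probe trick described above (receive_remote_msgs() builds on it); only the MPI calls themselves are confirmed by this page:

MPI_Status status;
int found, size;

MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, msg_comm, &found, &status);
if (found) {
    MPI_Get_count(&status, MPI_BYTE, &size); /* incoming payload size */
    int dest_lp_gid = status.MPI_TAG;        /* the tag carries the GID */
    int sender = status.MPI_SOURCE;          /* remote kernel instance */
    /* ...allocate from dest_lp_gid's slab/buddy, then MPI_Recv... */
}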

unsigned int terminated = 0
static

This counter tells how many simulation kernel instances have already reached the termination condition. This is updated via collect_termination().

Definition at line 64 of file mpi.c.