Parallelization of Medical Imaging on Clusters
In collaboration with Lucid Concepts AG, Technopark Zurich (http://www.lucid.ch), a Swiss software startup in medical imaging, we developed a distributed parallelization framework in .NET for executing workflows of image processing steps on MS HPC clusters.
Project duration: 2013. Partly funded by CTI/KTI.
Lucid Concepts develops medical imaging software for the analysis of high-resolution computed tomography (CT) images. The objective of this applied research project was to accelerate this image processing as much as possible on a high-performance computing (HPC) cluster. In this setting, image processing is described as a workflow: a directed acyclic graph of processing algorithms in which only some steps depend on one another. A particular challenge is coping with the large image data (typically a few GB per workflow).
Lucid medical imaging software frontend (copyright and courtesy of Lucid Concepts AG, Zurich).
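To illustrate how a workflow with partial dependencies exposes parallelism, the following minimal Python sketch groups the steps of a directed acyclic graph into "waves" whose members do not depend on each other. It is not the project's .NET implementation; the step names and the `parallel_waves` helper are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step maps to the steps it depends on.
WORKFLOW = {
    "load": [],
    "denoise": ["load"],
    "segment_bone": ["denoise"],
    "segment_tissue": ["denoise"],
    "measure": ["segment_bone", "segment_tissue"],
}


def parallel_waves(workflow):
    """Group steps into waves; steps within one wave are mutually independent."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose inputs are complete
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their successors unlock
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(parallel_waves(WORKFLOW)):
        print(index, wave)
```

In this example the two segmentation steps land in the same wave, so they could run on different cores or compute nodes at the same time.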
A distributed .NET image processing architecture has been designed and implemented on the basis of the existing Lucid system; it executes the workflows on an MS HPC cluster in the backend. The architecture consists of two main components: (1) A distribution controller dispatches workflow jobs to the cluster and transfers the necessary data to and from it. (2) A runner component on the cluster acts as a thin wrapper that runs specific processing tasks on the compute nodes. The system thereby exploits the independence between workflow processing steps to run them in parallel on different cores and compute nodes of the cluster. Various optimizations reduce the traffic between client and cluster as well as between compute nodes: dispatching only compute-intensive tasks to the cluster, combining consecutive processing steps to save inter-node data exchange, compressing data for transfers, and permitting configurable multi-core usage per processing step.
Architecture of the distributed image processing system.
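One of the optimizations described above, combining consecutive processing steps, can be sketched as follows. This is an assumption about how such a combination might work, not the project's actual algorithm: a step is merged with its successor when the successor is its only consumer and has no other inputs, so that the intermediate image never has to leave the compute node. The function name `fuse_linear_chains` and the example step names are hypothetical.

```python
def fuse_linear_chains(deps):
    """Merge strictly linear step sequences into groups.

    deps maps each step to the list of steps it depends on.
    Returns a list of groups; each group is a list of steps in execution order.
    """
    # Invert the dependency map to find the consumers of each step.
    succs = {step: [] for step in deps}
    for step, preds in deps.items():
        for pred in preds:
            succs[pred].append(step)

    def is_chain_link(a, b):
        # b is the only consumer of a, and a is the only input of b.
        return succs[a] == [b] and list(deps[b]) == [a]

    groups = []
    for step, preds in deps.items():
        # Skip steps that are absorbed into the group of their predecessor.
        if len(preds) == 1 and is_chain_link(preds[0], step):
            continue
        group = [step]
        # Extend the group while the last step has exactly one chained successor.
        while len(succs[group[-1]]) == 1 and is_chain_link(group[-1], succs[group[-1]][0]):
            group.append(succs[group[-1]][0])
        groups.append(group)
    return groups


# Hypothetical workflow: step -> steps it depends on.
EXAMPLE = {
    "load": [],
    "resample": ["load"],
    "denoise": ["resample"],
    "segment_bone": ["denoise"],
    "segment_tissue": ["denoise"],
    "measure": ["segment_bone", "segment_tissue"],
}

if __name__ == "__main__":
    for group in fuse_linear_chains(EXAMPLE):
        print(" -> ".join(group))
```

Each resulting group could then be dispatched as a single task, while the branching steps remain separate so that they can still run in parallel.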
Experimental evaluations showed that the system achieves the maximum theoretical speedup, which is inherently constrained by the time needed to transmit the compressed input and output data between client and server. The approach is therefore particularly suited to compute-intensive image processing algorithms, where the parallelization gain clearly outweighs the network transmission costs.
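The effect of the transmission time on the achievable speedup can be made concrete with a deliberately simplified model. This is an illustrative assumption, not the evaluation method used in the project: local processing takes `t_compute` seconds, the cluster divides that work ideally among `workers`, and every cluster run additionally pays a fixed `t_transfer` for moving the compressed data. All numbers in the example are hypothetical.

```python
def speedup(t_compute, t_transfer, workers):
    """Speedup of a cluster run over local processing in the simplified model."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    cluster_time = t_transfer + t_compute / workers  # transfer is not parallelized
    return t_compute / cluster_time


def speedup_limit(t_compute, t_transfer):
    """Upper bound on the speedup as the number of workers grows without limit."""
    if t_transfer <= 0:
        raise ValueError("t_transfer must be positive for a finite bound")
    return t_compute / t_transfer


if __name__ == "__main__":
    # Hypothetical values: 600 s of computation, 30 s of data transfer.
    for workers in (1, 4, 16, 64):
        print(workers, round(speedup(600.0, 30.0, workers), 2))
    print("limit", speedup_limit(600.0, 30.0))
```

Under this model the speedup can never exceed `t_compute / t_transfer`, which is why algorithms with a high ratio of computation to data volume benefit most.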
For questions and feedback, please do not hesitate to contact us.