This is probably the most frequently asked question.
On an STM32F103 running at 72 MHz, version 0.12 takes 1.78 seconds for gpg --clearsign. This is total real time, including communication and computation on the host PC.
It will be improved a bit in the next version.
See the PolarSSL fix for ARM for details of the improvement.
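To reproduce a figure like the one above on your own setup, a minimal sketch along these lines may help. It measures wall-clock time for a command from the host side, so it includes host computation and communication with the token, as the number quoted here does. The helper name time_command and the test message are made up for illustration. It assumes gpg is on your PATH and that the default signing key lives on the token. Any interactive PIN prompt would count toward the measured time.

```python
import subprocess
import sys
import time


def time_command(argv, stdin_data=b""):
    """Run argv, feed it stdin_data, and return (elapsed_seconds, returncode).

    Hypothetical helper for illustration; elapsed time is wall-clock
    ("real") time as seen from the host.
    """
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        input=stdin_data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return elapsed, proc.returncode


if __name__ == "__main__":
    # Assumption: gpg is installed and the signing key is on the token.
    elapsed, rc = time_command(["gpg", "--clearsign"], b"test message\n")
    print(f"gpg --clearsign: {elapsed:.2f} s (exit status {rc})")
    sys.exit(rc)
```

Running it a few times and comparing the results gives a rough idea of the variation between runs.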