Developers rarely concern themselves with the efficiency of their program logic. That is not a hard-and-fast rule, but it is how most amateur developers write code. Yet performance is becoming more critical by the day, and commercial applications vie for every performance gain they can get.
Performance can suffer for a motley of reasons, but application logic is what we are going to stress here. You may write code that interacts badly with your hardware's cache system, or you may schedule your threads inappropriately. There may be many such problems that you never become aware of unless someone, or some tool, points them out.
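To make the cache point concrete, here is a minimal sketch in C. It is not taken from any real application, and the function names are made up for illustration. Both functions add up the same row-major grid of integers, but one walks it row by row and the other column by column. On a grid much larger than the cache, the column walk usually touches a different cache line on almost every access, so it tends to be noticeably slower even though the logic looks equivalent.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: the grid is stored row-major in one flat array,
 * so element (r, c) lives at grid[r * cols + c]. */

/* Walks memory in storage order: consecutive accesses are adjacent,
 * so each cache line that is fetched gets fully used. */
long sum_row_major(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r * cols + c];
    return total;
}

/* Same result, but consecutive accesses are `cols` elements apart,
 * so a large grid tends to miss the cache on most accesses. */
long sum_column_major(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += grid[r * cols + c];
    return total;
}
```

Nothing in the source code flags the second version as a problem, which is exactly why a profiler's view is useful.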
HP Caliper is one of my favorite tools for finding many of the causes of application slowdown. It is a tool for Intel Itanium based systems and runs on HP-UX and Linux.
Talk about its features and I may run out of space. It can produce basic profiles such as a sampled call graph, a flat function profile, and a CPU events profile. Besides those, it can provide a call-stack profile (critical for I/O-bound applications) and a data cache profile (to help you re-lay out your data structures).
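As an example of the kind of re-layout a data cache profile might prompt, here is a small sketch in C. The structs and field names are hypothetical, and this is just one common technique (splitting hot and cold fields), not something Caliper does for you. In the mixed layout, every balance is followed by 120 bytes of rarely read notes, so a loop over balances drags the notes through the cache too. In the split layout, the balances sit in their own dense array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one hot field next to a bulky cold field. */
struct account_mixed {
    long balance;      /* hot: read on every pass */
    char notes[120];   /* cold: rarely read */
};

/* The cold part, kept in a separate array after the split. */
struct account_cold {
    char notes[120];
};

/* Split layout: hot balances are packed together, cold data lives apart. */
struct accounts_split {
    long *balances;
    struct account_cold *cold;
    size_t count;
};

/* Each iteration strides over a whole record to reach the next balance. */
long total_mixed(const struct account_mixed *accounts, size_t count)
{
    assert(accounts != NULL);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += accounts[i].balance;
    return total;
}

/* Each iteration reads the next element of a dense array of longs. */
long total_split(const struct accounts_split *accounts)
{
    assert(accounts != NULL && accounts->balances != NULL);
    long total = 0;
    for (size_t i = 0; i < accounts->count; i++)
        total += accounts->balances[i];
    return total;
}
```

Whether a split like this pays off depends on the real access pattern, which is the kind of question the profile is there to answer.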
The best part is that it does not need a recompile of the application or any library. Just give it a binary or attach it to a running process. Run it, and there you are with a third party's insight into your logic. It comes with both a command-line interface and a GUI.
The only downside is that it works on Itanium binaries only, so other users will have to wait until it becomes available for them too.