Notorious Devil

Friday, November 6, 2009

Bouquet of questions

o What is a cache line?

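- The smallest unit of data transferred between the cache and main memory.
- The finest level of granularity at which the cache stores and tracks data; 64 bytes is a common size on current processors.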
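A cache line can be pictured as a fixed-size, aligned block of the address space. The sketch below computes which line an address falls in, assuming a 64-byte line (CACHE_LINE_SIZE and both helper names are hypothetical; the real size varies by CPU). Two addresses in the same line reach the cache with a single transfer.

```c
#include <stdint.h>

/* Assumed line size for this sketch; 64 bytes is common, but the real
   value depends on the CPU and should be queried on the target machine. */
#define CACHE_LINE_SIZE 64u

/* Index of the cache line that contains address p. */
uintptr_t cache_line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE_SIZE;
}

/* Nonzero if both addresses fall in the same line, i.e. one memory
   transfer brings both of them into the cache. */
int same_cache_line(const void *p, const void *q)
{
    return cache_line_of(p) == cache_line_of(q);
}
```

With a 64-byte-aligned buffer, bytes 0 to 63 share one line and byte 64 starts the next one.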

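o How can I get information about the IPC facilities used by processes?

- Use the ipcs command ($ ipcs). It lists the System V IPC objects currently allocated: message queues, shared memory segments, and semaphore sets (ipcs -q, ipcs -m, and ipcs -s show each kind separately).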

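o What are the values of this enum's elements?

enum e_tag { a, b, c, d = 20, e, f, g = 20, h } var;

- a=0, b=1, c=2, d=20, e=21, f=22, g=20, h=21 (an enumerator without an initializer is one more than the previous one, and duplicate values are allowed)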
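The answer can be checked by the compiler itself. This sketch repeats the declaration and adds C11 _Static_assert checks, so the build fails if any of the listed values is wrong.

```c
/* The first enumerator defaults to 0; each enumerator without an
   initializer is the previous value plus 1. */
enum e_tag { a, b, c, d = 20, e, f, g = 20, h } var;

/* Compile-time checks (C11): compilation stops if any value differs. */
_Static_assert(a == 0 && b == 1 && c == 2, "implicit values start at 0");
_Static_assert(d == 20 && e == 21 && f == 22, "counting resumes after d = 20");
_Static_assert(g == 20 && h == 21, "duplicate values are legal");
```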

o Why use volatile?

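- To stop the compiler from optimizing accesses to variables whose values can change outside the program's own control flow. Two typical cases:
+ Shared memory, updated by another process
+ Values updated implicitly by hardware (e.g. memory-mapped device registers)

volatile tells the compiler that every read and write of the variable must actually be performed as a LOAD/STORE instruction.
e.g., if a variable X is read from a variable Y in shared memory while another process updates Y, the compiler has no idea about these external updates. It may keep Y in a register or remove LOAD/STORE instructions it considers redundant during compilation. Note that volatile only prevents these optimizations; it does not make accesses atomic or ordered, so it is not a substitute for locks or atomics between threads.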
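The hardware case can be sketched as a polling loop. STATUS_READY and wait_ready are hypothetical names; on a real device the register address would come from the hardware documentation or a driver mapping. Because the parameter points to volatile data, the compiler must reload *reg on every iteration instead of hoisting a single load out of the loop.

```c
#include <stdint.h>

/* Hypothetical "device ready" bit in a status register. */
#define STATUS_READY 0x1u

/* Spin until hardware sets the READY bit, then return the full status.
   volatile forces a fresh load of *reg on each pass of the loop. */
uint32_t wait_ready(volatile const uint32_t *reg)
{
    uint32_t v;
    while (!((v = *reg) & STATUS_READY))
        ;
    return v;
}
```

For testing off-device, an ordinary uint32_t can stand in for the register, since a plain pointer converts implicitly to a pointer to volatile.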

Thursday, October 22, 2009

Digging Intel Itanium: RSE, Register Stack Engine

Itanium processors heralded a new era of processing. Though not very successful commercially, the Itanium Processor Family (IPF) has contributed many wonderful techniques to computer science. I'll discuss the one I like the most.

It is called the RSE (Register Stack Engine), a technique that avoids main memory during function calls and does the work in processor registers instead. On a conventional architecture, when a function is called, the caller typically pushes the arguments onto the stack in main memory, and the return address is saved there as well. Main memory access is slow compared to processor speed. Itanium has 128 general-purpose registers, of which 96 (r32 to r127) are stacked registers managed by the RSE. These 96 registers take care of the function call mechanism, appearing to the application as a register stack frame for each call. This avoids memory access until all 96 registers are occupied; at that point the RSE spills the oldest frames to a backing store in memory and fills them back in as the calls return. Interestingly, the processor itself runs this show, and it is transparent to the application.
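To make the spill/fill idea concrete, here is a toy model of the bookkeeping only, not the real hardware algorithm (the real RSE, for instance, can spill and fill eagerly in the background). The struct and function names are hypothetical. Calls allocate frames out of 96 stacked registers; when they overflow, the oldest registers go to the backing store, and a return fills back whatever part of the caller's frame was spilled.

```c
#define STACKED_REGS 96 /* r32..r127: the stacked part of Itanium's 128 GRs */
#define MAX_DEPTH 1024  /* arbitrary call-depth limit for this toy model */

struct rse {
    int frames[MAX_DEPTH]; /* size of each active frame, innermost last */
    int depth;             /* number of active frames */
    int total;             /* sum of all active frame sizes */
    int spilled;           /* registers of the oldest frames held in memory */
    long spills, fills;    /* cumulative memory traffic, counted in registers */
};

void rse_init(struct rse *r)
{
    r->depth = r->total = r->spilled = 0;
    r->spills = r->fills = 0;
}

/* Function call: allocate a frame of n registers. If the on-chip
   registers overflow, spill the oldest ones to the backing store. */
int rse_call(struct rse *r, int n)
{
    if (n < 0 || n > STACKED_REGS || r->depth == MAX_DEPTH)
        return -1;
    r->frames[r->depth++] = n;
    r->total += n;
    int excess = r->total - r->spilled - STACKED_REGS;
    if (excess > 0) {
        r->spilled += excess;
        r->spills += excess;
    }
    return 0;
}

/* Function return: drop the current frame. The caller's frame must be
   fully on-chip again, so fill back any part of it that was spilled. */
int rse_return(struct rse *r)
{
    if (r->depth == 0)
        return -1;
    r->total -= r->frames[--r->depth];
    int caller = r->depth > 0 ? r->frames[r->depth - 1] : 0;
    int missing = r->spilled - (r->total - caller);
    if (missing > 0) {
        r->spilled -= missing;
        r->fills += missing;
    }
    return 0;
}
```

Three nested calls of 40 registers each need 120 registers, so 24 are spilled on the third call; no memory traffic happens at all while the call chain stays within 96.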

More here: http://software.intel.com/en-us/articles/itaniumr-processor-family-performance-advantages-register-stack-architecture

Tuesday, October 13, 2009

Fast hard drives: How?

A simple hard drive today can do things that sound like outlandish technology. Just try doing some file I/O in your application from many threads.
Say you have 4 threads, A, B, C, and D, and their I/O requests arrive in that order: A, then B, and so on. If you check when each thread's I/O returns, the ordering might surprise you: thread D may finish before A. How?

Disks have a technology called Native Command Queuing (NCQ). The drive accepts several requests at once and processes them according to a single, simple rule:
-> Serve the one that can be done fastest.
This depends on the current position of the disk head: the request that can be served with the least head movement (and rotational delay) is served first.
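The "nearest request first" rule can be sketched as a greedy shortest-seek-time-first ordering. This is an illustration of the idea, not drive firmware: it models head position as a track number only and ignores rotational position, and sstf_order and MAX_REQ are hypothetical names.

```c
#include <stdlib.h>

#define MAX_REQ 32 /* arbitrary queue limit for this sketch */

/* tracks[i] is the track of pending request i. order[] receives request
   indices in service order: each step picks the unserved request closest
   to the current head position (ties go to the earlier request).
   Returns 0, or -1 if n is out of range. */
int sstf_order(int head, const int *tracks, int n, int *order)
{
    int done[MAX_REQ] = {0};
    if (n < 0 || n > MAX_REQ)
        return -1;
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            if (best < 0 || abs(tracks[i] - head) < abs(tracks[best] - head))
                best = i;
        }
        done[best] = 1;
        order[k] = best;
        head = tracks[best]; /* the head is now over the served track */
    }
    return 0;
}
```

With the head at track 50 and requests A, B, C, D on tracks 10, 60, 45, 90, the service order is C, B, D, A: the first request to arrive is served last, just like the thread D before thread A surprise above.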

Sunday, October 4, 2009

A few 'whys' answered

There are certain 'whys' that we may have missed, so I am trying to attend to them, one by one.

o Why do we need a hash table?
Of course, for better, faster search. It can even get us an element in O(1) time on average.

But the real reason we need hashing is data that can't be meaningfully ordered. Sorted arrays and binary search trees give fast O(log n) search only when elements can be put in order. Images are an example: how would you order a set of images and then search for one?
Hashing comes to help here. A hash function generates an (ideally) unique hash key for each piece of such data, and we save these keys in a hash table. Searching for an image now means searching for its key in the hash table; the full image data only has to be compared against the few entries that share that key. More about hash functions later.
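A minimal sketch of the idea, assuming raw byte buffers (such as image files loaded into memory) as keys. It swaps in the well-known 32-bit FNV-1a hash as the hash function and uses separate chaining; the table size and all type and function names are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64 /* arbitrary table size for this sketch */

/* 32-bit FNV-1a hash over an arbitrary byte buffer. */
uint32_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* One entry in a chained bucket; the caller owns the storage. */
struct entry {
    const void *data;
    size_t len;
    struct entry *next;
};

struct table {
    struct entry *bucket[NBUCKETS];
};

/* Hash the data to pick a bucket and push the entry onto its chain. */
void table_insert(struct table *t, struct entry *e)
{
    size_t b = fnv1a(e->data, e->len) % NBUCKETS;
    e->next = t->bucket[b];
    t->bucket[b] = e;
}

/* Hash the query, then compare full contents only within that bucket. */
const struct entry *table_find(const struct table *t, const void *data, size_t len)
{
    size_t b = fnv1a(data, len) % NBUCKETS;
    for (const struct entry *e = t->bucket[b]; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->data, data, len) == 0)
            return e;
    return NULL;
}
```

No ordering of the data is ever needed: equality is the only comparison, and it is made against a handful of candidates rather than the whole collection.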

To be continued with more whys.

Tuesday, September 22, 2009

HP Caliper: A Profiler

Developers rarely concern themselves with the efficiency of their program logic. It's not a hard-and-fast rule, but that's how many amateur developers write code. Yet performance is becoming more critical day by day, and commercial applications compete for every performance gain they can get.

Performance can suffer for a motley of reasons, but application logic is what we are going to stress here. You may write code that works badly with your hardware's cache system, or schedule your threads inappropriately. There may be many such causes that nobody becomes aware of until someone finds them.
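Here is a classic example of code that works badly with the cache. Both functions below (hypothetical names) compute the same sum over a matrix stored row by row in a flat array, but the column-order walk jumps cols * sizeof(int) bytes on every access, so for large matrices it typically touches a new cache line almost every time and runs noticeably slower. This is exactly the kind of problem a profiler's data cache profile helps reveal.

```c
#include <stddef.h>

/* Row-order walk: consecutive accesses touch adjacent addresses, so every
   fetched cache line is fully used before moving on. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long s = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += m[i * cols + j];
    return s;
}

/* Column-order walk over the same row-major data: each access strides
   cols elements ahead, usually landing in a different cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long s = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += m[i * cols + j];
    return s;
}
```

The results are identical; only the memory access pattern, and therefore the cache behaviour, differs.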

HP Caliper is one of my favorite tools for finding the causes of application slowdowns. It is a tool for Intel Itanium-based systems and runs on HP-UX and Linux.
I could run out of space just listing its features. It can produce basic profiles such as a sampled call graph, a flat function profile, and a CPU event profile. Besides these, it can provide a call-stack profile (critical for I/O-bound applications) and a data cache profile (to help you re-lay out your data structures).

The best part is that it does not need the application or any library to be recompiled. Just give it a binary or attach it to a running process, run it, and there you are with third-party insight into your logic. It comes with both a command-line interface and a GUI.

The only downside is that it works only with Itanium binaries, so users on other platforms will have to wait until it becomes available for them too.

Happy profiling!!