Tuesday, September 30, 2008

What an idea!!!

Now that's called an idea. Call them crazy, but hey this guy wants a shot at it. And who knows, he might succeed.

Monday, September 29, 2008


It's not working. After working fine, giving right results, even allowing itself to be benchmarked, and being reported as correct, it has kicked up a huge fit. There seems to be a memory bug somewhere. Chasing it led me into the C++ STL and glibc. That bug hunt story is as surprising as it is rewarding (from a learning pov, and not from a productivity one). Then I came across valgrind.

Thank god it exists. It's literally manna from heaven.

There seems to be a bug in the lookup table for the map between position and bit offset. I have tried removing the alignment requirements, trying both the memset and the manual set versions, but it stubbornly says that there is a small error somewhere here. All errors from my shared binary point to its involvement.

Need to sleep on it somewhat. Doesn't appear to be a big bug at this time, but will be hard to find. Hopefully it's the only one.

Friday, September 26, 2008


It's working. That's a huge huge satisfaction. Considering that it was only yesterday that I was down in the dumps, the recovery seems miraculous some 20 hours later.

However, as always, the devil lies in the details. A blind run on production-sized samples led to a 3x slowdown.

Yup. That's correct. All this effort. All this pain. All the sweating, thinking, toiling, fretting, praying for a 3x slowdown. I didn't sign up for this.

On deeper inspection, the searches are at least 2x faster. PRNG is 5x faster. Then what the hell is up with it?

Turns out, the overhead of the Python/C++ transition is too much. Not much if you are going to amortize it over large runs. But if the underlying code is not going to do much per call, you are dead. So, some automation is in order. Further, I need to add prefetching hints to it. I think at least some latency of memory access can be hidden safely behind the vector operations. And I just realized after running it, there are fundamental limits to the speedup that may be obtained with my cache aware optimizations.
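To make the prefetching idea concrete, here is a minimal sketch, not the actual code. It assumes a gather-style loop over a lookup table; the function name `sumWithPrefetch`, the `distance` parameter and the data layout are all made up for illustration. `__builtin_prefetch` is a GCC/Clang builtin and is only a hint, so the result is the same with or without it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gather-and-sum: visits table[] in the order given by indices[].
// While element i is being processed, the element needed `distance`
// iterations later is requested from memory.
inline std::uint64_t sumWithPrefetch(const std::vector<std::uint32_t>& table,
                                     const std::vector<std::size_t>& indices,
                                     std::size_t distance = 8) {
    std::uint64_t total = 0;
    const std::size_t n = indices.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            // Hint only: 0 = read access, 3 = keep in all cache levels.
            __builtin_prefetch(&table[indices[i + distance]], 0, 3);
        }
        total += table[indices[i]];
    }
    return total;
}
```

The right prefetch distance depends on how much work each iteration does, so it has to be found by measurement.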

I am gonna go out on a limb by saying this. I think we are approaching the fundamental limits of this method. But then, I have said before that I am out of optimizations for this. And of course, as always,


In short, the real bottleneck is between the keyboard and the chair, not between the motherboard and the cooler.

Then why say so? It's my gut feeling. Would love to be proven wrong. The more the margin, the better.

Wednesday, September 24, 2008

When is enough, enough?

Swig is working. My new cache aware data structure is working. Segmentation faults have been removed. It seems faster too. Good news, you may ask?

Unfortunately no. When my code was working, I created a new branch with git and left the old one in master. Something seems to have gone wrong meanwhile. Today I just checked out the master branch and it was the same as my new branch.

This is bad.

Very bad.

Anyway, what I was trying to do was add vectorization. Not working. No idea why. The portions I actually vectorized are reporting correct results. Naturally, other parts also got touched as I was trying to convert my data to 16 byte aligned AoS form. Now, I have no idea why it's not working. Meanwhile I wanted to go back to the older, working, scalar version.
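One thing worth ruling out in a conversion like this is the alignment itself. A minimal sketch, assuming a record of four 32-bit fields (the `Record` type and the `isAligned16` helper are hypothetical, not the real data structure). `alignas` is C++11; with the GCC of 2008 the equivalent spelling is `__attribute__((aligned(16)))`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record: four 32-bit fields packed into one 16-byte element,
// so an array of Record is an AoS layout with one SSE register per element.
struct alignas(16) Record {
    std::int32_t field[4];
};

static_assert(sizeof(Record) == 16, "one record per SSE register");
static_assert(alignof(Record) == 16, "records start on 16-byte boundaries");

// True if p is safe to use with aligned SSE loads such as _mm_load_si128.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Checking `isAligned16` on every pointer handed to an aligned load is cheap in a debug build and catches buffers that came from a plain `malloc` or `new`.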

And now, it's gone.

God knows how much I struggled to get vector multiply working using only SSE2 intrinsics. There is a direct instruction for it in Penryn class CPUs. It turns out that SSE3 wasn't so useful after all, since it mainly adds floating point intrinsics.
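For reference, this is the commonly used way to do it, sketched under the assumption that "vector multiply" means a 32-bit integer multiply per lane keeping the low 32 bits of each product. The names `mul32_sse2` and `mul4` are mine. SSE2 only has `_mm_mul_epu32`, which multiplies lanes 0 and 2, so the odd lanes need a second multiply and a shuffle. The direct instruction on Penryn is SSE4.1's `pmulld` (`_mm_mullo_epi32`).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Multiply four 32-bit lanes, keeping the low 32 bits of each product.
inline __m128i mul32_sse2(__m128i a, __m128i b) {
    // _mm_mul_epu32 multiplies lanes 0 and 2 only, giving two 64-bit products.
    __m128i even = _mm_mul_epu32(a, b);
    // Shift lanes 1 and 3 down into positions 0 and 2, then multiply those.
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
    // Gather the low 32 bits of each product into the bottom two lanes...
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    // ...and interleave them back into lane order 0, 1, 2, 3.
    return _mm_unpacklo_epi32(even, odd);
}

// Convenience wrapper on plain arrays (uses unaligned loads and stores).
inline void mul4(const std::uint32_t* a, const std::uint32_t* b,
                 std::uint32_t* out) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), mul32_sse2(va, vb));
}
```

Since only the low 32 bits are kept, the same routine works for signed and unsigned lanes.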


Can I have some divine intervention please?

Wednesday, September 17, 2008

Segmentation Fault

I have some good news and some bad news. Good news is that SWIG driven C++/Python combination is working fine.

The bad news is that this attempt of mine also happens to be the time I have jumped headlong into using pointers. I have used them before, but only in small amounts, where the code was well understood and in working state and even then, they were only introduced as part of optimizations. So now all their nastiness is being exposed to me. I am getting segmentation faults at seemingly random places. An example.

I am using a file and when I am done using it, I set it to NULL. Further, I was closing the file in the destructor as an added precaution as well. So it led to an attempt to

fclose(filePtr);//filePtr is NULL when this is called

This was causing a segmentation fault. I had no idea that you can't close a NULL file pointer. Now I have dropped this call altogether. I am still getting segmentation faults in seemingly random places. I can't use gdb to debug it either. (I don't know how). Segmentation faults are supposed to be the easiest ones to find. You just run them in a debugger and it points to the offending location for you. But, no such luck for me. So bottom line, code on.
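For the record, a minimal sketch of a guard that would have avoided this particular crash, assuming a small wrapper class around the file (the `DataFile` name and its methods are hypothetical, not the original code). Passing NULL to `fclose` is undefined behaviour, so the pointer is checked before closing and reset afterwards. That way the explicit close and the destructor can both run safely.

```cpp
#include <cstdio>

// Hypothetical wrapper: owns a FILE* and closes it at most once.
class DataFile {
public:
    DataFile(const char* path, const char* mode)
        : filePtr(std::fopen(path, mode)) {}  // stays NULL if the open fails

    ~DataFile() { close(); }  // safe even if close() was already called

    // Copying would lead to two owners closing the same FILE*.
    DataFile(const DataFile&) = delete;
    DataFile& operator=(const DataFile&) = delete;

    bool isOpen() const { return filePtr != nullptr; }

    void close() {
        if (filePtr != nullptr) {  // never hand NULL to fclose
            std::fclose(filePtr);
            filePtr = nullptr;     // makes a second close() a no-op
        }
    }

private:
    std::FILE* filePtr;
};
```

Dropping the `fclose` call altogether avoids the crash too, but it leaks the file handle whenever the explicit close is skipped.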

Saturday, September 13, 2008


Exams finished yesterday. I have some good news and some disappointing news. The good news is that I got swig to almost work. Just two small routines left to write and then I should be on my way to C/C++ and Python nirvana. Big deal, you may very well ask.

It's a very big deal for me because I have an established track record of getting stuck in so called one-time-tasks. You just do them once in your life, sort of foundation stone laying ceremony for something. They bite me especially. Rest of the folks would just do it and forget all about it. Not me though. Anyway the good news is that right now the stuff seems to be working fine.

The disappointing news is that the 4870x2 is not supported by the AMD SDK. WHY??? Of all stupidities, why this one? It's the fastest card in your line up for Christ's sake. Poor guy.

Checked out v1.2 of AMD's Stream SDK. It was horrible. Again. Brook+ now supports ints, but their FP matrix multiply routine still uses floats as counters, while their int based matrix multiply routine uses ints as counters. Why haven't they made even this simple a change? Not much improvement in the docs either. New features such as compute shaders and local data share are exposed only in CAL but not in Brook+. It seems that AMD is focussing their efforts on building CAL as a reliable foundation for future releases (aka OpenCL and DX11 compute shaders) and has totally ditched adding new features in Brook+.

Saturday, September 6, 2008

India's Breakout Moment

Ladies and gentlemen, India's breakout moment has arrived. It's fitting that it arrives less than a month after the Beijing Olympics. It's a very happy feeling that we are sitting at the world's highest tables today. In 1998, we were abused, screamed at, called names, sought to be punished (with economic sanctions). In 2008, we are a fully paid up lifetime member of the very same club. It is fitting that the country that built a technology jail for us has torn it down after 34 years.

In this hypocritical world, the only language that is understood is the language of power. As has been said very eloquently, samrath ko nahin dos gusain. In English, the powerful are not capable of doing any wrong. In 1998, we were seen as a danger to world peace and stability. In 2008, nobody dared cross our path as we rewrote a 40 year old treaty signed by ~190 countries on our terms. The icing on the cake is that we are still out of that treaty! What's the difference between then and now? India's march ahead is seen as inevitable. And when a 1.2 billion heavy elephant puts on weight and starts building up momentum, people think twice before getting in the way.

As for the non-proliferation pricks/ayatollahs/hypocrites, NPT has been blown apart from inside, not outside. The biggest dangers to it lie inside. China, North Korea, Iran have systematically shredded it.

After following insane policies for 40 odd years, we are now well on our way to achieving our rightful place in the world. Sure, we have almost infinite capacity to screw it all up, but still, this gives me renewed hope that in my lifetime at least, we will be able to stand up to anybody else in the world with pride.

I hope this turns out to be our inflection point. Watch out world, India is about to gatecrash your party (on the time scale of a couple of decades, that is). In its own style.

Wednesday, September 3, 2008


Exams. They are here. Again. Gotta study. I wish there were fewer courses and more time to pursue your own interests here in IIT. But anyway, I really need to focus on my studies now. I can't keep cribbing and neglect my studies meanwhile. Though I haven't really thrown myself into studies headlong yet. :)