Saturday, August 30, 2008

Wish 3

It's official. AMD's new SDK for GPUs is coming in two weeks. I just hope they do a better job this time. I downloaded the docs for their current release and went through them just to get a feel for what their hardware/software platform is like.

It was horrible.

Docs were alpha grade. They actually have a full-blown version of their SDK meant for you to code in assembly. That's not a typo. Gawwd......, assembly, in 2008. I have had my share of writing assembly for a lifetime. We had a course in microprocessors where we had to write in assembly, hand-assemble it and then punch it in as hex. I hated it and I am done doing it. I think it is going to be a while before I even contemplate writing assembly, even for the innermost loops.

Brook+ is a disaster. OK, maybe not a disaster, but I still don't feel that it is the right way to go about it. Its foundations were laid in 2004, when men were men and wrote DirectX/OpenGL shaders to multiply matrices. It was meant to allow folks to write portable shaders without asking them to learn the graphics API first. Brook+ looks like that, and acts like that too. Even today, Brook+ compiles to C++ before it finally compiles to machine code. I don't think it is the right tool in 2008.

My hunch is that nVidia beat them to the punch with CUDA and they were forced to respond. In a hurry they dusted off whatever they could find and pushed it out after doing some renovation. It doesn't support integers or bitwise ops, so programmers are forced to use floats as counters. What does that tell you about the maturity of their tool chain? However, their announcement of IL (aka PTX for AMD) indicates that they now have a solid base to build on. I must admit I really like the architecture of AMD's GPUs over nVidia's, and I hope this poor soul is able to achieve his dreams. Not to mention that one can get 2.4 Tflops per card from AMD ;)

Bottom line: a few things are needed before it can be considered a serious competitor to CUDA.

1) Better docs. Absolutely the first thing they need to do. More detailed docs, explaining the hardware naturally, with lots of in-doc code samples. Having small bits of code explain the stuff to you right next to the theory really helps.

2) A real C compiler. No Brook-style fluff. No assembly in 2008. Expose the hardware better in the docs so that we know what kind of choices we are making in our code. What lies on chip, and what is off chip? What is cached and what is not?

3) More stable drivers. It was said that the drivers will not support 8 GPUs even if you could pack them into one PC by using 4 X2 cards. Why? This level of support would not cost them much. The FASTRA guys earned an enormous amount of PR goodwill for nVidia. This kind of good news really gets attention from those who are serious about writing high-speed stuff for your platform. AMD stands to gain a lot of (much needed) dev attention if it can demo a 9.6T system (4 cards at 2.4T each) and go one up on CUDA, which has been getting a lot of developer attention. [FASTRA only does 4T at the max :( ]

4) And yes, let users decide whether they want a particular GPU to be used for graphics or not. Sometimes integrated graphics are enough, as in this case. The consumer bought it, so he should have the right to decide whether he wants to contribute to global warming by playing games, running folding@home, or running his own compute stuff on it.

Wednesday, August 20, 2008

Some Good News

Just installed CPU-Z. The results are very good news for what I am trying to do. My CPU supports SSE3 and has a cache line size of 64 bytes to boot! Though I am disappointed that /proc/cpuinfo didn't show me that. Maybe I need to check a few things here and there to be sure of what's up with /proc/cpuinfo. I just wish I had more time to do it. I am really itching to have a go at it. The C++ skullduggery has been done. Now I just need to run swig (and pray to God it works out). If I can implement all the ideas I have in mind, this would be really something for me to be proud of.

Greatly looking forward to it.
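A possible explanation for the /proc/cpuinfo puzzle, offered as a guess about this setup rather than something verified on this machine: Linux lists SSE3 under the flag name pni, not sse3, and on x86 it reports the line size in the "clflush size" and "cache_alignment" fields. A minimal sketch of a checker follows. The function name parse_cpuinfo and the SAMPLE text are made up for illustration.

```python
import os


def parse_cpuinfo(text):
    """Return (has_sse3, cache_line_bytes) for the first CPU listed in text."""
    has_sse3 = False
    line_size = None
    for line in text.strip().splitlines():
        if not line.strip():
            break  # a blank line ends the first processor's block
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "flags":
            # Linux reports SSE3 as "pni" (Prescott New Instructions).
            has_sse3 = "pni" in value.split()
        elif key in ("clflush size", "cache_alignment") and line_size is None:
            line_size = int(value)
    return has_sse3, line_size


# Hypothetical excerpt in the x86 /proc/cpuinfo layout, for illustration only.
SAMPLE = """\
processor : 0
model name : hypothetical CPU
flags : fpu sse sse2 pni
clflush size : 64
cache_alignment : 64
"""

if __name__ == "__main__":
    print(parse_cpuinfo(SAMPLE))
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print(parse_cpuinfo(f.read()))
```

If the flags line has pni in it, the kernel agrees with CPU-Z about SSE3. If neither size field is present, the second value comes back as None.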

Friday, August 15, 2008

Happy Birthday

Happy birthday, India

Wishing you many happy returns of the same

Wednesday, August 13, 2008

A more efficient data structure

I need to implement a faster version of BFS. I am looking for a new representation for the graph. I am going to assume that the cache line size is 64 bytes on my Turion 64 X2 processor. The new representation should lead to better locality of reference and much tighter packing than before. And if my assumption about the cache line size turns out right, it's going to be a big plus.

I have a new idea in mind, but I need to figure out the nuts and bolts of it. Hopefully it should be done soon.
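The post does not say what the new representation is, so the following is not it. It is only a sketch of one common cache-friendly layout, compressed sparse row (CSR), where every vertex's neighbours sit next to each other in one flat array. It is written in Python for brevity; a real hot loop would live in C++. The names build_csr and bfs are made up for illustration, and the 64-byte line size is the assumption from above.

```python
from array import array
from collections import deque

CACHE_LINE_BYTES = 64  # assumption from the post, not measured here


def build_csr(num_vertices, edges):
    """Pack an undirected edge list into two flat arrays (offsets, neighbors)."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # offsets[v]..offsets[v + 1] is the slice of neighbors owned by vertex v.
    offsets = array("i", [0] * (num_vertices + 1))
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = array("i", [0] * offsets[num_vertices])
    cursor = list(offsets[:num_vertices])  # next free slot for each vertex
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def bfs(offsets, neighbors, source):
    """Return hop distances from source; -1 marks unreachable vertices."""
    dist = [-1] * (len(offsets) - 1)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        # One contiguous scan per vertex instead of chasing pointers.
        for i in range(offsets[u], offsets[u + 1]):
            v = neighbors[i]
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


if __name__ == "__main__":
    offsets, neighbors = build_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3)])
    print(bfs(offsets, neighbors, 0))
    # Vertex ids that fit in one cache line under the 64-byte assumption.
    print(CACHE_LINE_BYTES // neighbors.itemsize)
```

With the usual 4-byte integers, 16 neighbour ids share one 64-byte line, so scanning a vertex's adjacency touches far fewer lines than a linked list of separately allocated nodes would.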

Monday, August 11, 2008



Keep the party going guys, Best of Luck