It's not working. After working fine, giving the right results, even allowing itself to be benchmarked and reported as correct, it has thrown a huge fit. There seems to be a memory bug somewhere. Chasing it led me into the C++ STL and glibc. That bug-hunt story is as surprising as it is rewarding (from a learning point of view, not a productivity one). Then I came across Valgrind.
Thank god it exists. It's literally manna from heaven.
There seems to be a bug in the lookup table that maps a position to its bit offset. I have tried removing the alignment requirements and tried both the memset and the manual-set versions, but Valgrind stubbornly says that there is a small error somewhere here. All the errors from my shared binary point to the table's involvement.
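For reference, here is a minimal sketch of what a position-to-bit-offset table with both initialisation styles can look like. It is not the real code: the sizes and every name (`kPositions`, `kBitsPerPos`, `OffsetTable`, `build_table_manual`, `build_table_memset`, `bit_offset`) are made up for illustration, and it assumes each position occupies a fixed number of bits. The one detail worth checking in any memset version is that its third argument counts bytes, not elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t kPositions = 64;    // number of positions in the table
constexpr std::uint32_t kBitsPerPos = 3;  // bits occupied by each position

using OffsetTable = std::array<std::uint32_t, kPositions>;

// Manual-set version: every entry is written explicitly in the loop.
OffsetTable build_table_manual() {
    OffsetTable table;
    for (std::size_t pos = 0; pos < kPositions; ++pos) {
        table[pos] = static_cast<std::uint32_t>(pos) * kBitsPerPos;
    }
    return table;
}

// memset version: zero the whole table first, then fill it.
// memset takes a size in BYTES, so it must be element count * element size.
// Passing kPositions alone would clear only a quarter of the table here.
OffsetTable build_table_memset() {
    OffsetTable table;
    std::memset(table.data(), 0, table.size() * sizeof(table[0]));
    for (std::size_t pos = 0; pos < kPositions; ++pos) {
        table[pos] = static_cast<std::uint32_t>(pos) * kBitsPerPos;
    }
    return table;
}

// Lookup with a debug-build bounds check, so an out-of-range position
// fails loudly instead of silently reading past the end of the table.
std::uint32_t bit_offset(const OffsetTable& table, std::size_t pos) {
    assert(pos < table.size());
    return table[pos];
}
```

If the two builders ever produce different tables, or the assertion in `bit_offset` fires, the problem is in the table itself rather than in whatever consumes it.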
I need to sleep on it. It doesn't appear to be a big bug at this point, but it will be hard to find. Hopefully it's the only one.