std::bad_alloc

July 7th, 2007

Exceptions are one of the most flamed topics in C++. But before I talk about the operator new I want to say a few things in general about them. There are many ways to use exceptions, different approaches. Although, it costs more than error checking in sake of speed, it is pretty convenient and usable. Well, some people use it to replace these return code tests. Others use it in rare cases where something less expected happens. I even saw some justified reasons to why not use it. For instance, you can’t see if a function throws an exception from merely looking at its interface. Every library has its own type of exception base class, CAtlException, std::exception, etc… And you should not let anything slip your catch clause. So sometimes it might be pretty annoying to catch them all, and then you have to multiple your inside-catch-clause-code for handling each different exception. And if you have C++, you won’t dare using goto, even not for a cleanup…Of course, there are good sides also and that’s why most people use them after all.

If we take a look at the operator new, that’s the C++ raw memory allocator, we will see that it might throw an std::bad_alloc in case of failure, which is 100% legitimate, because most of the times you just expect that it will return your a pointer to a valid contigious block of memory. But now, assuming I use my own internal exceptions type for my application; Why do I need everytime to catch this exception and rethrow the converted exception? This is waste of time both in development and in run time. And there are some approaches who believe you don’t have to catch it at all, because if it is thrown, you are dead anyways. Now tell me seriously, if you write a real software for a company, you are not going to catch it? now gimme a break! So catching it you must, and therefore I don’t think every usage of the new operator should be wrapped by its own try and catch if you wrap it anyways with catching your own application’s exception. So eventually, after lots of discussions with my colleagues, we decided it suits us mostly to overload the global operator new function with our own custom function with a single change, that it will throw our exception rather than std::bad_alloc in case of failure. Overloading that operator, requires you to overload the array interface also. Don’t forget, when overloading new you will want to overload delete… :)

Basic Ops #1 – Multiplication

June 30th, 2007

A few years a go a friend challenged me to write the basic arithmetic operations, add/sub, div/mul. But the catch was to write them without using that same instruction for doing the calculation and to make it as a long multiplication rather than log(A*B). I think the Multiply implementation is the easiest one amongst these operations, so I decided to start with it:

unsigned int mul(unsigned int a, unsigned int b)
{
  unsigned int sum = 0, d = 0;
  while (a) {
     if (a & 1) sum += (b << d);
     a >>= 1;
     d++;
  }
  return sum;
}

The idea of the algorithm, which I wrote from scratch, was to treat the input A as a mask. And anytime the next bit is set in the mask, I accumulate the shifted B. So what you get actually is long multiplication, which will work for negative numbers as well as positive ones, since the nature of 2’s complement numbers.

You should take a paper and a pencil and start playing with the algorithm by applying it on small numbers and learn how it works… :) Though this one is easy, really.

Multi-byte NOPs

June 26th, 2007

Horay, finally official multi-byte NOPs are supported by Intel 80×86 processors. So OK, not all processors support it yet, and you have to check the result of a cpuid instruction to know whether you can use it or not, but it’s a good start nevertheless. It is unnecessary to mention that this kind of NOP is a real NOP, and yet I just did. :)

So why do we care you ask?

Well, probably you certainly don’t. But some code generator tools will use it in the future, or hand written code in assembly. It will make the code more understandable and it will fit all sizes you need from 2 bytes up to 9 bytes. However, to be honest, I can’t really grasp the reason why they actually support it. I mean, one could write X single byte NOPs. But then I thought, the processor will spend time for decoding the instructions in the pipeline until it will realize it’s a NOP. So I guess the multi-byte NOP will be faster than a varying number of NOPs in a row. I tried to think what’s the benefit of such thing. Most of the NOPs or pseudo NOPs are used to align code to 8 or 16 bytes boundaries. This is probably for caching reasons and faster memory read operations. But if the NOPs are used for alignment usually they will follow a branch and (sometimes) even conditional branch, so they might not get to be executed most of the times. So why the effort for making such a new instruction? I can’t come up with a good answer. Got any idea? Ah, and if you were to align your code and it won’t run then why not dumping zeroes as a padding? And it’s not like 80×86 architecture has delayed branch… So they had a good reasoning for coming up with this instruction. The only real answer I came up with is for time critical code where you have to measure your code in MS…

Maybe you already saw some pseudo NOPs in the past. These are real instructions that don’t affect registers or flags of the current execution context. You can come up with many variations of such NOPs using the LEA instruction:

LEA REG, [REG+0]

And you can come up with less popular NOPs, MOV REG, REG…

The advantage of the LEA instruction is that it accepts a complex addressing mode operand so you can make the whole instruction bigger easily by using different sizes of the immediate ‘0’. If you don’t understand how the ModR/M is formatted, you will have to assemble the code and disassemble it until it fits the size you require. Not mentioning that you can’t control 100% of the code generated by the assembler that you would prefer to use the DB thingy directly to emit the pseudo NOP. The multi-byte NOP accepts the same source operand as LEA accepts, therefore its size varies.

Well, you might find yourself find of ADD EAX, 0 as a NOP. Probably because in math it doesn’t mean anything special. But while executed it affects the flags. Though, you might say it’s not a big deal and that depends where you dump the pseudo NOPs in your code. However, it seems that even a popular assembler had some bad issues with generating pseudo NOPs. And believe me, when you trust a tool and it misbehaves, you might get crazy until you realize who’s to blame!

Now I have to make diStorm support it ;)

NOP NOP NOP!!!1

FAT Bastard

June 25th, 2007

Hey, sorry for being so quiet for the last week… been busy.

 Anyways, this is an interesting thing that happened to me a couple of weeks a go. My father bought some IPod nano thingy, I wouldn’t call it fake, but it has its advantages, radio, video and it even lets you delete files directly. Don’t get me wrong, I myself have the IPod nano which is really cool and prolly enough for most of my needs. But it is pretty expensive for a 2gb disk… oh well. So I was uploading songs to my father’s player. I was kinda lazy and decided to dump all songs in the root directory. After a while I got some message while trying to copy more songs, something like “Cannot create a file or directory”. It really annoyed me, I thought it might be something with the unicode file names, so I even tried to rename them to English alphabetic, just in case, but for no avail. At this moment, I wanted to give up. I was lacking any idea of what the problem might be. I almost thought the player is broken in some way and should be replaced…who knows.

The day after I sat with my friends from work, telling them this issue, no one had any idea what it might be. (And they are people like me who should have the faintest idea.. but nothing). So I told them that the only crazy idea I had in mind is that I can’t copy the songs because the player is formatted with FAT16. Which if you know or not, has a static size (of sectors) for the root directory. It means that if you allocate all these sectors, the root directory is full and you can’t continue creating a directory or copying a new file onto it. I even wrote a sample that shows how to parses FAT12 (FAT16 is easier to parse, because the size of a chain pointer is a whole 2 bytes, not 1.5…but both are piece of cakes compared to todays FS’s). Anyways, I got back home, formatted the player’s disk, for some reason it got crazy and did directory loops (recursively linking to the same directory) ??? I didn’t know what was going on. And to be honest, really didn’t care. Later on, I decided that now that I think I know what was the problem, I must create directories. Now, in FAT the sub-directories are not limited in any way (as opposed to the root directory). Creating directories and uploading songs, finally, I could easily upload songs and use the whole disk.

I bet that even the support team of the company who invented this player has no idea about this limitation. Funny enough I later noticed that is also supports FAT32 (which doesn’t suffer from the root directory limitation), but it seemed like the player got problems browsing the directories… aargh.

 I think this is a really cool example that shows that your low-level knowledge might help you for real high-level stuff… Knowledge is power. :)

 P.S: When saying “disk” as the player’s storage, I meant a flash storage…but it doesn’t really have anything with my post.

Here is more info about FAT16 vs FAT32.

HICON from Memory – Fixed

June 15th, 2007

It gets worse from where I left it. My solution was good for retrieving the one and only icon image in the .ico file. It fulfilled my needs, so I didn’t have to check it further. But Jaine left a comment where she says my code doesn’t work for her icons. So I created a dummy project and tried my code with an icon file which contains several images and tried to get one of them, and to my surprise it didn’t work. The reason I decided to write about this issue in the first place, was because I thought I had a good solution that might interest other people. Specifically the implementation that I am dependent only on standard API’s, nothing undocumented, etc…

First thing I should do is explain why my code doesn’t work for several images in the same file. So, if you took a look at the article I directed you in the other post, “Icons in Win32”, you’ll notice, and I said that earlier too, that there are two structures of the images of the icon: in-memory and in-file. My assumption was that they are of the same size, but I didn’t notice that the in-memory structure is two bytes lesser than the in-file structure. To be accurate, the in-file structure named ICONDIRECTORY, according to that article, is 0x10 bytes in size. And the in-memory structure named GRPICONDIRENTRY, is 0xe bytes in size. The different is in the last field, which I said was the same in both structures, which is the Id and Offset. The Id is a word sized and the Offset is dword sized (so you can point to the whole file, > 64kb). So what happened was that LookupIconIdFromDirectoryEx scanned the bits we fed it with the sizeof(GRPICONDIRENTRY) and the file we read is using the ICONDIRECTORY structure which is longer. Thus, every loop (if you have more than one image in the .ico file), the offset to read the next structure of the image was -2 behind… and things messed up, of course. Note that icons in resources are stored as in-memory, recall you pass ID using MAKEINTRESOURCE to LoadImage…

I really hoped to find a solution which uses the standard API, but without success. Maybe GDI+ will do the job, but I didn’t try and I don’t wanna use GDI+ in my case…but you should give it a shot, with IStream and other not-pretty stuff. :) Anyways, I couldn’t avoid a hacky solution, this way or another, it seems we will have to rely on that article and hope MS won’t change their icon format (LOL, never know). So this is my really hacky solution:

void FixIconToResource(PBYTE pbuf)
{
unsigned short i, count = *(unsigned short*)&pbuf[4];

for (i = 1; i < count; i++) {

// 6 to skip GRPICONDIRENTRY header. 0x10=in-file size, 0xe=in-mem size.

memcpy(pbuf + 6 + (i * 0xe), pbuf + 6 + (i * 0x10), 0xe);
}
}

It has to be called with the buffer of the icon and before it is passed to LookupIconIdFromDirectoryEx. It assumes the icon is valid and that I can access the structures inside it. What this code actually does is repositioning the ICONDIRECTORY structures to be in their appropriate place as GRPICONDIRENTRY (just copying them 2 bytes backward, overriding the 2 bytes of the offset). This solution seems to work well, but it has two drawbacks, the easy-to-notice one is that I alter the buffer in-place. But I guess that’s not a biggy. The other problem is that while copying the structure, I actually convert it, thus chopping the last 2 bytes of the Offset (of the image inside the file…) and making it a word sized value. So it means that if you have icon files that are bigger than 64kb, this solution won’t work…

The funny thing (or really sad) is the reason my code still worked for the first icon without this fix. The thing is that the Id  (word) and Offset(dword) start on the same offset in both structures. So if you have a dword and you read only a word from it, you will get the low-word. Luckily, we’re on x86 architecture and the little-endian behavior made it working for me, otherwise I would probably end up with Offset of zero…

I thought it’s important to, at least, say a few words about the ‘best’ solution. That will be like what LoadImage actually does deep inside its code… constructing a fake GRPICONDIR (that’s like the icon resource inside a PE) and filling in all fields from the in-file structures to the GRPICONDIRENTRY. But this is not enough, because we still want to support any file-size. So you will pass in the index of the image inside the icon group. Then when you are done converting all images, you will call LookupIconIdFromDirectoryEx which will return the correct Id (which is actually the index to the icon) and what you are left to do is seeking to the in-file structures as an array with that index, and get the offset to the image, passing that to the CreateIcon…

Thanks Jaine, without you it wouldn’t have happened… :P

Oh Boy, There’s a Bug

June 12th, 2007

Can you spot it?

void p(const CString & str)
{
char* endPtr = NULL; 
unsigned long x = strtoul(str.Mid(1, 2), 10, &endPtr);
if (*endPtr != ‘\0’) {
// invaild input…
}

}

 Using ATL’s CString, you can run this code and nothing will happen – the bug won’t be ‘triggered’. And more than that, everything will work as expected and correctly. But as most developers find bugs, running the software and fixing the crashes, this one didn’t come up to the surface… so you have a bug and you don’t know about it, how not fun. If you are really into C++ business it might be easy for you to spot this bug. But I’m not sure if it’s C++ to blame here, guess it’s not. If you take a look at the Mid method its declaration is as follows:

CStringT Mid(int iFirst, int nCount) const;

 This means it returns a object with the relevant slice of the original string. The destructor of this new object is immediately called, because its scope is a ‘parameter’. So now the endPtr points to freed memory. Yey. And then you access endPtr as you should…even if the memory is freed it is still in memory (because CString dynamically allocates space for its buffers), only AppVerifier forced it to be removed, thus generating an access-violation and then it surfaced up and we fixed it. But, honestly, it was still tricky to spot this one.

 So now that you know the bug and why it happened, I can only say that you should beware of methods/functions that return objects and has a very short scope, if at all. Not mentioning the copy constructor issues… Of course, merely the scope isn’t to be blamed, it’s the mixture of short scope and a stale pointer (and dynamically allocated memory),…To be honest, if the memory was local to the stack there was no way to detect this bug, unless the compiler would use the same stack space for different variables and they were written to between the call to strtoul and the access to the pointer. And even that isn’t reliable, but you could notice some weird mismatches between the input and that input validation…

 Any suggestions for tracking such bugs?

Plugins’ Headache

June 10th, 2007

As I was writing a plugin, or to be more accurate, an extension module for Python in C/Win32; I had mysterious crashings here and there. Using a JIT I saw that the crash is in malloc/free so I looked at the stack and saw where’s the problem at (function scope). So far so good, piece of cake. Isolating the problematic code, yet I still didn’t fully understand why it happens, I ran it on a new clean project in Debug mode and got the cool message that there might be a heap corruption. Not surprising, of course. Now, the thing that made me really angry was that my extension used to work with an older version of MSVS. So thinking of this fact, I immediately took a look at python25.dll and saw that it uses MSVCRT version 7. Also, looking at my recently compiled extension dll, I noticed that my CRT is version 8. Knowing all this: heap does problems and different CRTs, I understood that my code uses one heap and Python’s uses another heap. Which is a big no-no, of course, mixing between different heaps…ouch.

Well, it was time to find a solution to a simple problem – I have to use the same heap. But how? Well, to make a short story shorter, I took a look at the Python’s header files and noticed they have two malloc’s. One was a macro which substitutes with malloc(size), which means use the current compiler’s CRT (that what I had used actually). And the other malloc was their own implementation of memory-allocation (above their own CRT’s malloc). Reading the comments surrounding that function declaration it says something like “platform independent allocations”. So voila, I knew that all I had to do is using these memory functions. At the moment I saw these functions, I blessed Python’s developers for thinking, whether directly or not, about this problem :)

The moral story is that if you integrate your code with another application in the code level make sure you talk the same protocol. It might be a memory heap, in my case, and it might be different implementation to FILE*… Just remember to supply the necessary functions for different compilers. Or if you’re on the SDK users side, use the SDK wisely, before you start hacking your way to a working solution.

Creating HICON Directly from Memory

June 8th, 2007

[updated] 

 How many times you had to do something trivial and you knew there ought to be a decent solution out there?

That’s another common problem that I smashed my head against the wall and came up with a surprising simple solution. For example you might archive icons in a single file, or you download them from the Internet to a buffer in memory. And then you want to display them. That sounds an easy task ah? But most people will just write the memory block back to a temporary file and only then will use LoadImage to load the icon from the disk. That’s lame and hits performance big time. To spare you with the research I had done to solve it, here’s the code:

HICON createIcon(PBYTE iconData, int iconSize)     

{     

   HICON hIcon = NULL;     

   // Ahhh, this is the magic API.     

   int offset = LookupIconIdFromDirectoryEx(iconData, TRUE, iconSize, iconSize, LR_DEFAULTCOLOR);     

   if (offset != 0) {     

      hIcon = CreateIconFromResourceEx(iconData + offset, 0, TRUE, 0x30000, iconSize, iconSize, LR_DEFAULTCOLOR);     

   }     

   return hIcon;     

}

You should read the documenation about LookupIconIdFromDirectoryEx, it says the call should be wrapped with SEH…for malformed files. Probably you ask yourself why I treat the ID as an offset. But that’s because if you look at the structures of ICON in-file and in-memory you’ll notice that’s the same field. So it fits for our purpose here and gives us the offset to the single icon’s data rather than the whole directory as it is saved in file or resource data.

Of course, you have to call DestroyIcon when you’re done messing with the icon…

 For more information about icons format and Win32 implementation and relevant API,  check this out. It really helped me in understanding how it all works. Funny that it even hints about that LookupIconIdFromDirectoryEx. ;) Too late.

Focusing on the Focus

June 7th, 2007

Sometimes you want to create a pop up window that won’t steal the focus. It seems that even a simple ShowWindow(SW_SHOW); instructs the windows manager to set the focus and activate that window. But what if you want a tooltip ? After all, even a tooltip is a window. So no need to invent the wheel again, you can just use SW_SHOWNA which means – hey, show my window but don’t activate it.

If the window is already activated, thus having the focus, specifying SHOWNA won’t help it, because it’s already got the focus. So you are doomed, and have to wait until some other window takes the focus or when your window is destroyed the window manager will automatically giev the focus to the former window. But that won’t be good enough for a tooltip-like window. So you have to be careful with it. Even if you want to reposition the window using SetWindowPos you will have to set the SWP_NOACTIVATE flag, so it won’t get the focus. That’s true also in a case where the window is hidden, which sounds a bit stupid to me (why activate a hidden window?). But who said it’s perfect? and not to mention the dialog manager…pffft.

After seeing myself that many people have this problem, I decided it’s worth a post. :)

Pop Count

June 5th, 2007

Population Count is counting the number of bits set to one in a word (or any sized integer). I found pop counting as one of the most fascinating small algorithms in the bit twiddling category. There are so many uses for pop counting. Though, rarely, I find myself use them, to make stuff compact, like bit-arrays, special bit-flags which represent collections, etc… There are also popular algorithms which use pop counting in error-correcting codes and sound/image processing.

Here’s my favorite implementation:

int popcount(unsigned int x)
{
int c = 0;
for (; x > 0; x &= x -1) c++;
return c;
}

The whole concept is behind the mere formula: x &= x -1. But I will leave it for you to figure it out.

Speaking of pop count, Imri and I even used this same implementation above in the NULL Terminated Strip. I even hinted for this algorithm. And here’s a real 32 bits bugfree implementation (that strip has the bugs on purpose):

XCHG EBX, EAX
XOR EAX, EAX
OR EBX, EBX
JZ END
A:
LEA ECX, [EBX-1]
INC EAX
AND EBX, ECX
JNZ A
END:

Of course, most implementation are branch-free code, for the sake of speed. However – Lately, I added the SSE4 support to diStorm, and among the new instructions I found out there is a single instruction for it all: popcnt. :)