Python: Converting Signedness

November 30th, 2007

Many times I encounter Python modules that were written in C/C++, and when you use them you run into a really annoying problem: the integers such a module returns are all unsigned. That's a real problem, because Python longs are (potentially) infinite AFAIK (they are digit strings), unlike C/C++ integers, which are usually limited to 32 or 64 bits.

If you pass back -1 to Python and format it as an unsigned long, you will get 2**32-1. Now suppose you really want to treat the number inside your Python code as a signed integer, you will have to convert 2**32-1 to -1 somehow. Most solutions I saw to this problem were to use struct in the following manner:

import struct
struct.unpack("<l", struct.pack("<L", 2**32-1))[0]

Packs the unsigned long value 0xffffffff and unpacks it back as a (signed) long, which gives -1 (using little endian here, but the byte order doesn't really matter as long as both format strings agree).

You might want to use an unsigned long long (that's a 64-bit integer) in order to convert a bigger number. So you can do one of two things: convert your 32-bit integer to a 64-bit integer by sign extending it (and that's another whole challenge) and stuff it into the 64-bit unpack/pack pair, or test the size of the integer (by how many bits it takes) and call the correct unpack/pack pair.
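
By the way, here is roughly what that approach looks like once you add the size test; a minimal sketch of the idea only (the helper name is made up, and it assumes a non-negative input):

import struct

def signed_via_struct(n):
    # Pick the smallest fixed width that still holds the value and
    # reinterpret the unsigned bytes as a signed integer of the same size.
    # "<L"/"<l" are 32 bits, "<Q"/"<q" are 64 bits.
    if n < 2**32:
        return struct.unpack("<l", struct.pack("<L", n))[0]
    return struct.unpack("<q", struct.pack("<Q", n & (2**64 - 1)))[0]

Calling it with 2**32-1 or with 2**64-1 gives -1 in both cases, but anything wider than 64 bits gets silently truncated by the mask, which is exactly the problem described next.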

It was then that I realized why I shouldn't use this ugly trick: it simply doesn't support Python's longs. As I said earlier, they are infinite, and with this trick you are capped at 64 bits. So I thought of a better way, one that doesn't use any stdlib module at all, leaving it pure Python code…

The way we know to negate an integer is by NOTing all its bits and incrementing the result by 1, right? (2's complement) Well, true, and that should work:

(Say we work with 8 bits integers)

0xfb = -5

>>> ~0xfb
-252
>>> ~0xfb+1
-251
>>> 256-251
5
>>> 0-5
-5

The way it works is -(256 + (~x+1)), where x is a byte. For every integer we need to scan for its most significant bit… We can do it the other way around with the following formula: x - NEXT_MSB(x), where NEXT_MSB(x) is the power of two just above x's most significant bit:

>>> 0xfb - 0x100
-5

This way it's like we flipped the sign bit of the integer and fixed the number as well. Both formulas work for integers of any size. But the key here is to find the MSB. I preferred to stick to the latter formula rather than the former, since it seems shorter. But it doesn't really matter, both work.

So now we have to code it, and as you should know me already, as a one-liner device! The first challenge is to find the MSB; something like this should suffice in C(!):

for (int i = 7; i >= 0; i--)
  if (x & (1 << i)) break;

This will work for a byte-sized integer, and note that we must scan from the end towards the start, otherwise we won't find the MSB but the LSB. The problem in Python is that we don't know the size of the longs we mess with, and we need to come up with a trick to find the MSB.

The lame trick I used for converting a number into a decimal ASCII string will be used here too, and it goes like this:

for i in xrange(2**32):
    if n / (1 << i) == 0:
        break

We try to divide the input number by 1, 2, 4, 8, 16, 32, … and when the result is 0, we know that we have run out of bits. I said it's lame because we use division, which is slow. If you have any other idea, please write to me.
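
For what it's worth, the same scan can also be done with shifts instead of divisions; this is just a sketch of that variation, nothing more:

def count_bits(n):
    # Shift the number right until nothing is left; the number of shifts
    # is how many bits n occupies, i.e. the position just above its MSB.
    bits = 0
    while n:
        n >>= 1
        bits += 1
    return bits

For 0xfb it returns 8, and 2**8 is exactly the value we want to subtract.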

Another drawback is the limit of the numbers we scan: we are limited to numbers of up to 2**32 bits. That is huge enough and I guess you will never reach it, or I will be dead first, prolly :o. Using Erez's trick (see here), we can make it a bit more elegant and stop as soon as the MSB is found.

I am not sure whether you noticed, but supplying a negative number as input isn't a smart move; we will have to check for it specifically. Eventually this is the code I came up with:

(Note that the input "bit-stream" can be any size)

def signed(n):
    return n if n < 0 else n - [i for i in (2**j if n/(2**(j-1)) else iter(()).next() for j in xrange(2**31-1))][-1]

>>> signed(0xb)
-5
>>> signed(0xfb)
-5
>>> signed(0xffffb)
-5
>>> signed(0xffffffffffffffffffffffffb)
-5L 
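
And just to make the one-liner digestible, here it is expanded into a plain loop; the same idea exactly, assuming a non-negative input like in the examples above:

def signed_readable(n):
    if n < 0:
        return n
    # Find the power of two just above the highest set bit (the same
    # division scan as before) and subtract it, so that bit acts as the sign bit.
    next_power = 1
    while n / next_power:
        next_power *= 2
    return n - next_power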

GZ-LN

Delegators #3 – The ATL Way

November 24th, 2007

The Active Template Library (or ATL) is very useful. I think that if you code in C++ under Windows it's even a must. It will save you a great many hours of work. Although I have personally found a few bugs in this library, the code is very tight and does the job well. In this post I'm going to focus on the CWindow class, although there are many other classes which do the delegations seamlessly for the user, such as CAxDialogImpl, CAxWindow, etc. CWindow is the main one, so we will examine it.

I said in an earlier post that ATL uses thunks to call the instance's window procedure. A thunk is a mechanism to convert a function invocation between callee and caller. Look it up on Wiki for more info… To be honest, I was somewhat surprised to see that the mighty ATL uses Assembly to implement the thunks. As I suggested a few ways myself to avoid Assembly, I don't see a really good reason to use it here. You could say that the ways I suggested are less safe, but if a window chooses to be malicious it can screw everything up anyway, so I don't think that's why they used Assembly. Another reason I can think of is that their way, they don't have to look up the instance's 'this' pointer, they just have it, whereas you would have to call GetProp or GetWindowLong. But come on… so if you have any idea, let me know. I seriously have no problem with Assembly, but since most people thought that delegations must be implemented in Assembly, I showed you that's not true. The reason it really surprised me is that Assembly code is not portable among processors, as you know, and ATL is a very popular and widely used library. So if you take a look at ATL's thunk code, you will see that they support x86 (obviously), AMD64, MIPS, ARM and more. And I ask, why the heck, when you can avoid it all? Again, for speed? Not sure it's really worth it. The ATL guys know what they do; I doubt they didn't know they could have done it without Assembly.

Anyhow, let's get dirty. It's all about their _stdcallthunk struct in the file atlstdthunk.h. The struct has a few members, whose layout in memory will be the same as it is in the file; that's the key here. There is an Init function which initializes the members. These members are the byte code of the thunk itself, and that's why their layout in memory is important: they get run later by the processor. The Init function gets the 'this' pointer and the window procedure pointer, and then it initializes the members to form the following code:

mov dword ptr [esp+4], this
jmp WndProc

Note that 'this' and 'WndProc' are member values whose values are determined at construction time. They must be known in advance, prior to the creation of the thunk. Seeing [esp+4], we know they override the first argument of the instance's window procedure, which is hWnd. They could have pushed another argument for the 'this' pointer, but why should they do that if they can recover the hWnd from the 'this' pointer anyway…? And save a stack access? :)

Since the jmp instruction is relative in its behaviour, that is, the offset to the target address is not absolute but rather relative to the address of the jmp instruction itself, upon initialization the offset is calculated as well, like this:

DWORD((INT_PTR)proc - ((INT_PTR)this+sizeof(_stdcallthunk)));

Note that the 'this' here is the address of the thunk itself in memory (already allocated).
Now that we know how the thunk really looks and what it does, let's see how it all gets connected, from the global window procedure to the instance's one. This is some pseudo code I came up with (it does not really reflect the ATL code, I only want to give you the idea of it):

CWindow::CreateWindow:

WndProcThunk = AllocThunk((DWORD)this.WndProc, (DWORD)this);
m_hWnd = CreateWindow (…);
SetWindowLong(m_hWnd, GWL_WNDPROC, WndProcThunk);

This is not all yet; in reality it's a bit more complicated in the way they bind the HWND with its thunk… Now when the window is sent a message, the stack will contain the HWND, MESSAGE, WPARAM and LPARAM arguments for the original window procedure, but the thunk will change the HWND to THIS and immediately transfer control to the global window procedure, and this time it has the context of the instance!

LRESULT CWindow::WindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
 CWindowImplBaseT< TBase, TWinTraits >* pThis = (CWindowImplBaseT< TBase, TWinTraits >*)hWnd;
 LRESULT lRes = 0;
 pThis->ProcessWindowMessage(pThis->m_hWnd, uMsg, wParam, lParam, lRes, 0);
 return lRes;
}

And there we go. Notice the cast from hWnd to the ‘this’ pointer and the call with the context of the instance, at last, mission accomplished ;)

More information can be found here.

 A new problem now arises, I will mention it in the next post in this chain, but in a couple of words, here’s a hint: NX bit.

Delegators #2 – C++ & Win32

November 17th, 2007

Last post I was talking about using SetProp/GetProp to link between an HWND and an instance of a class that encapsulates the window widget. There's another way to do it.

The first time you are taught how to create a window in Win32, they tell you that you have to fill in all the fields in the WNDCLASS structure, but they always keep saying to ignore some of the advanced fields. One of the fields inside that structure is cbWndExtra; this one tells the windows manager how many extra bytes to allocate after its own original structure. So suppose all we need is to save the 'this' pointer: that means we need 4 bytes, in a 32-bit environment of course. So we can set cbWndExtra to 4, and then we have a DWORD we can access as much as we like with the SetWindowLong/GetWindowLong API.

Something like this would suffice:

WNDCLASS wc;
wc.fields = …
wc.lpszClassName = "myclass";
wc.cbWndExtra = 4;
RegisterClass(&wc);
HWND hWnd = CreateWindow("myclass", …);
SetWindowLong(hWnd, 0, (LONG)this);
ShowWindow(…);

..

And later on in the window-procedure, you can do this:

LRESULT WINAPI MyWindow::WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
{
 MyWindow* pThis = (MyWindow*)GetWindowLong(hWnd, 0);
 if (pThis != 0) return pThis->WndProc(message, wparam, lparam);
 return DefWindowProc(hWnd, message, wparam, lparam);
}

And there you go. There are a few problems with this technique, though. You can't handle the WM_CREATE message inside the instance's WndProc; that's because the global WndProc gets called with this message before CreateWindow returns, and you haven't had the chance to set the 'this' pointer yet…

The second problem is that this technique is only good for windows you create yourself, because you control the WNDCLASS… You can still use the system's pre-created window classes by changing their fields, but that change applies to your application only; I think it creates a copy of the pre-created classes just for your process.

Anyhow, to solve the first problem, I saw implementations that create the instance of the object when they receive the WM_CREATE inside the global WndProc. This is really a matter of design, but this way you can call an initializer of the instance once WM_CREATE is received. I can't come up with another way at the moment if the instance is already created. Maybe just call the initializer on the first message the window gets? Well, it's a bit hacky.

Delegators #1 – C++ & Win32

November 14th, 2007

From the technical point of view a Delegator is a call forwarder, as simple as that. But from the design aspect it is a technique where an object outwardly expresses certain behaviour but in reality delegates responsibility for implementing that behaviour… And thanks to Wiki for sparing me the ugly explanation. :)

The reason I wanted to implement delegators as seamlessly as possible in C++ was that I got to a situation where I wanted to write a pure OOP wrapper for window objects (CreateWindow). Now let me get into the juicy details before you raise your eyebrows. And even before that: yes, something like ATL but only for windows. So say you have a class that is responsible for creating a window and controlling it by handling its messages and stuff. Sounds legitimate, right? It would look something like:

class Window {
public:
Window(const cstring& name) { m_hWnd = CreateWindow(…); }
void Show () { ShowWindow(m_hWnd, SW_SHOW); }
 …
 ..
private:
 HWND m_hWnd;
};

And you get the notion. It looks OK and it is right, but let's go further and hit the problem. When we create a window we need to pass the name of the window class, which is some structure that contains more general info about the class itself, like the background color, mouse cursor and other stuff, but the most important member is the wndproc pointer. The wndproc pointer is a pointer to a callback function that handles the messages of the windows that belong to this window class. I assume you know how the Win32 windowing system works. Now, can you spot the problem already?

Well, since it asks for a pointer to a function, and we want to have an instance of our Window object per window, there is no way to bind between the two. (OK, I lie, but continue reading please.) In our case we would like to have the method of our message handler called, and not a global function. If you're not sure why, think of it this way: each window has private members in its instance that tell special things about that window. So we've got to find a way to link our instance to our "window-procedure" method.

This is a start:

class Window {
private:
LRESULT WINAPI WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
{
 switch (message)
 {
  …
  ..
 }
 return DefWindowProc(hWnd, message, wparam, lparam);
}
public:
static LRESULT WINAPI WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
{
 // Delegate to the instance’s window procedure.
 return g_window.WndProc(hWnd, message, wparam, lparam);
}
};

So now the window-class structure will point to the static function WndProc, which when called will delegate its parameters to the internal (private) WndProc that can access the members. It's almost a good solution, but now we are allowed to have only one window, held by a global instance. The good thing is that a static public function can call a private method of the same class, so we can hide the core of it. The bad thing is we still expose an internal function we didn't want to expose in the first place.

The problem is now to find a way to link between a real window object and our instance. The ID of a window is its HWND (Handle to Window). So we could hold a list with all the HWNDs we created, look up the HWND before delegation and then make the right call to the correct instance. That is too much hassle for nothing. There ought to be a way to store some private data on the window object itself, right? At least I would suspect so. Eventually, after reading some MSDN and searching the net, I found a savior function called SetProp (and GetProp, of course). Examining their prototypes:
BOOL SetProp(HWND hWnd, LPCTSTR lpString, HANDLE hData);
HANDLE GetProp(HWND hWnd, LPCTSTR lpString);
We actually get a kind of dictionary: we give a string and store a pointer (to anything we're up to). Afterwards, we can retrieve that pointer by using GetProp. Let's work it out again:

ctor:
m_hWnd = CreateWindow(…);
SetProp(m_hWnd, "THIS_POINTER", (HANDLE)this); // Ah ha!

What we did was to link the HWND with the this pointer. The window procedure will look like this now:

static LRESULT WINAPI WndProc(HWND hWnd, UINT message, WPARAM wparam, LPARAM lparam)
{
Window* pThis = (Window*)GetProp(hWnd, "THIS_POINTER"); // Some magic?
 if (pThis != NULL) return pThis->WndProc(…); // Forward the call to the correct instance.
 …
 …
 return DefWindowProc(…);
}

Well, as for the code here, I don't handle errors, and I use C casts, so sue me :). I merely want to show you the mechanism. And if you really want to get dirty, you will have to RemoveProp when you get WM_NCDESTROY, etc…

After I got this code really working, I was still wondering how ATL does this binding. So I took a look at their code… It seems to have a global linked list with all the instances of the windows that ATL created. Then, when the global window procedure gets called, it looks the window up in that list. In reality it is much more complex than my explanation, since they need to synchronize accesses among threads, make sure the window belongs to the same thread, etc… All that only for the first call of the window procedure. Then it sets the REAL 'window-procedure' method of the instance itself, and there it uses Assembly, muwahaha. That will be covered next time.

BTW – SetWindowLong cannot work here, since all you can do is change the predefined window fields. Although maybe there is some undocumented field you can play with to store anything you like. Never know? :)

About DIV, IDIV and Overflows

November 8th, 2007

The IDIV instruction is a divide operation. It is less popular than its counterpart DIV. The difference between the two is that IDIV is for signed numbers whereas DIV is for unsigned numbers. I guess the "I" in IDIV means Integer, thus implying a signed integer. Sometimes I still wonder why they didn't name it SDIV, which is more readable and self-explanatory. But the name is not the real issue here. However, I would like to say that there is a difference between signed and unsigned division; otherwise they wouldn't have been two different instructions in the first place, right? :) What a smart ass… The reason it is necessary to have them both is that signed division behaves differently from unsigned division. Look at a finite string of bits (i.e. an unsigned char) which has a value of -2 and try to unsigned-divide it by -1: the result will be 0, since as unsigned numbers they are 0xfe and 0xff, and naively asking how many times 0xff is contained inside 0xfe results in 0. Now that's a shame, because we would like to treat the division as signed. For that, the algorithm is a bit more complex. I am really not a math guy, so I don't want to get into the dirty details of how signed division works; I will leave that algorithm for the BasicOps column of posts… Anyway, I can just say that if you have an unsigned division you can use it to do a signed division of the same operand size.

Some processors only have a signed division instruction. So for doing an unsigned division, one might convert the operands to the next bigger size and then do the signed division. That way the high half of each operand is zero, which makes the signed division work as expected.

With x86, luckily, we don't have to do any nifty tricks; we have them both, DIV and IDIV, ready for our use. Unlike multiplication, where an overflow only sets the CF and OF flags, an overflow in division raises a divide-error exception. Whether we like it or not, this is the situation. Therefore it's necessary to convert the numbers before doing the operation: sign extension or zero extension (depending on the signedness of the operands), and only then do the division.

What I really wanted to talk about is the way the overflow is detected by the processor. I am interested in that behavior since I am writing a simple x86 simulator as part of the diStorm3 project. So truly, my code is the "processor", or should I say the virtual machine… Anyhow, the Intel documentation for the IDIV instruction shows some pseudo algorithm:

temp  = AX / src; // Signed division by 8bit
if (temp > 0x7F) or (temp < 0x80)
// If a positive result is greater than 7FH or a negative result is less than 80H
then #DE; // Divide error

src is a register/immediate or a memory indirection, which yields an 8-bit value; that value is sign extended to 16 bits, and AX is then sign-divided by it. So far so good, nothing special.

Then comes a stupid-looking if statement, which at first glance says that if temp is above 0x7F or below 0x80, then bam, raise the exception. So you ask yourself what these special values have to do with overflowing.

Reading the next comment makes things clearer: since for an 8-bit divisor the division is done on 16 bits and the result is stored inside 8 bits as a signed value, the result can vary from -128 to 127. Thus, if the result is positive and the value is above 127, there is an overflow, because the value would then be treated as a negative number, which is a no-no. The same goes for negative results: if the result is negative and the value is below -128, there is an overflow, since that negative number cannot be represented in 8 bits as a signed number.

It is vital to understand that overflow means that a resulting value cannot be stored inside its destination because it’s too low or too big to be represented in that container. Don’t confuse it with carry [flag].

So how do we know if the result is positive or negative? If we look at temp as a byte, we can't really know. But that's why we have temp as 16 bits. The extra half of temp (the high byte) is really the hint for the sign of the whole value. If the high byte is 0xff, we know the result is negative; otherwise the result is positive. Well, I'm not being 100% accurate here, but let's keep things simple for the sake of the conversation. Anyway, it is enough to examine the most significant bit of temp to know its sign. So let's take a look at the if statement again now that we have more knowledge about the case.

if temp[15] == 0 and temp > 127 : raise overflow

Suddenly it makes sense, huh? Because we assure the number is positive (it doesn't have the sign bit set) and the result is still higher than 127, and thus cannot be represented as a signed value in an 8-bit container.

Now, let’s examine its counterpart guard for negative numbers:

if temp[15] == 1 and temp < 128: raise overflow

OK, I tried to fool you here. We have a problem. Remember that temp is 16 bits long? It means that if, for example, the result of temp after the division is -1 (0xffff), our condition is still true and will raise an overflow exception, although the result is really valid (0xff represents -1 in 8 bits as well). The origin of the problem is the signed comparison. By now, you should have understood that the first if statement, for a positive number, uses an unsigned comparison as well, although temp is a signed value.

We are left with one option, since we are forced to use unsigned comparisons (my virtual processor supports only unsigned comparisons): we have to convert the signed value -128 into a 16-bit unsigned value, which is 0xff80. As easy as that, just sign extend it…

So taking that value and putting it in its place we get the following if statement:

if temp[15] == 1 and temp < 0xff80: raise exception

We know by now that temp is compared to as an unsigned number. Therefore, if the result was a negative number (so 0x8000 or above) and yet it was below 0xff80, then we cannot represent that value in an 8-bit signed container, and we have to raise the divide-error exception.

Eventually we want to merge both if statements into one. Noticing that the two overflow ranges are contiguous (0x80 through 0x7fff for positive results, 0x8000 through 0xff7f for negative ones), some basic boolean algebra leaves us with:

if temp > 0x7f and temp < 0xff80: raise exception
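
To tie it together, here is a small Python sketch of that check, in the spirit of the pseudo code above; the function and its names are made up, and it only mirrors this discussion, not diStorm3's actual code:

def idiv8(ax, src):
    # ax is the raw 16-bit dividend, src is the raw 8-bit divisor.
    if src & 0xff == 0:
        raise ZeroDivisionError("#DE")
    # Reinterpret the raw register values as signed numbers.
    sax = ax - 0x10000 if ax & 0x8000 else ax
    ssrc = (src & 0xff) - 0x100 if src & 0x80 else src & 0xff
    quot = int(float(sax) / ssrc)       # IDIV truncates towards zero
    temp = quot & 0xffff                # back to a raw 16-bit value
    if temp > 0x7f and temp < 0xff80:   # the merged check from above
        raise ArithmeticError("#DE")    # divide error, quotient doesn't fit in AL
    return quot & 0xff, (sax - quot * ssrc) & 0xff   # AL = quotient, AH = remainder

Feeding it AX = 0x8000 and src = 0xff (that is, -32768 divided by -1) raises the divide error as expected, while AX = 0xffff and src = 1 happily returns AL = 0xff, which is the valid -1 case discussed above.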

C++ Singletons

October 29th, 2007

One of the first rules you learn when programming is not to use global variables. Sometimes that's possible and sometimes it's not. I believe that every rule has an exception, this one included. The thing is that if you code in C++, you say to yourself: let's encapsulate it all in a class. How nice, huh? Thing is, from the beginning you knew that those variables should be global to something. Everything is in relation. Certainly not all functions will use the globals. The real example I encountered was using the AllocConsole API to create a console screen for my process. The system allows every process to have at most one console, so obviously wrapping it in a singleton class is a good move. That class will also contain some variables which hold the state of the console. Let's spare the gross details for now…

So having the variables inside the class as a namespace led me to touch the variables as:

void MainClass() 
{ 
 Console::m_staticVariable = ... 
}

Then I found myself accessing that variable from MainClass, and I understood that I should move some code from MainClass to the Console class. Hey, you can say that I designed the whole thing wrong from the first moment. But believe me, sometimes you just have that piece of code which might lie in both classes, and eventually you have to decide where it fits better. Even after some coding, when I was satisfied with my code, all the methods were static as well, since otherwise I couldn't access the so-called globals. Although this time the globals are in a namespace, and that's a start. Noticing that all of Console's methods were static, I got disgusted at my own code. Seriously, that happens to me too ;) As long as you don't commit the code, you are allowed to go all the way until you are satisfied and the code is to your liking. At least that is what I think.

So… singleton was my answer, of course.

The “standard” basic pattern for a singleton would be to make the constructor private and have a getInstance method which will return the one and only instance of the singleton class in the system. It looks something similar to:

class MyClass {
private:
MyClass() { … }

public:
static MyClass& getInstance()
{
 static MyClass instance; // This is all the trick.
 return instance;
}

};

Technically speaking, the way the static is implemented in Assembly is just setting a boolean to true when you create the object the first time, and even registering its corresponding destructor with the _atexit CRT function. I found it interesting.

Here it is in a nutshell:

static MyClass& getInstance() 
{ 
 static bool isInitialized = false; 
 if (!isInitialized) { 
  isInitialized = true; 
  g_MyClassInstance::MyClass(); 
  _atexit(g_MyClassInstance::~MyClass); 
 } 
 return g_MyClassInstance; 
}

Static variables are not magic; you have to check whether they are already initialized or not. The generated code is a no-brainer, and uses a boolean for that check as well.

So off I went, changing a few bits of my code to become a singleton. While testing my code again, I got a crash when the program finished. Thinking quickly: there's something wrong with the "destroy" code, either a destructor or something similar doesn't function well. Well, looking at my MainClass dtor I noticed that I have to control the death of the Console class. Note that a singleton only gets initialized the first time its getInstance method is called. So you know when the singleton instance is constructed, but you don't know when it will be destructed. Doh.

Well, let’s go to business, implementing a dynamic singleton. To be honest, I have no idea how that’s officially called, I’m pretty sure someone already named it with something. Using the dynamic singleton I control both construction and destruction timings.

The first getInstance implementation that comes to mind is something like this:

MyClass& MyClass::getInstance() 
{ 
 static MyClass* instancePtr = NULL; 
 if (instancePtr == NULL) { 
  instancePtr = new MyClass(); 
  // error code for handling bad_alloc... 
 } 
 return *instancePtr; 
}

Now you need a destroy method:

void MyClass::Destroy() 
{ 
 ASSERT(instancePtr != NULL); // No no. 
 delete instancePtr; 
 instancePtr = NULL; 
}

Aha! Now notice how Destroy accesses instancePtr, which was defined inside getInstance. Therefore, you have to move the pointer to be a static class member, how rude. But no biggy. Of course, you must not call Destroy from the destructor, to avoid recursion and bad stuff happening, in short. It is vital to remember to call Destroy, otherwise you are in big trouble. On the other hand, having C++ on our side, we can extend the code so it uses a sort of smart pointer that will know when to destroy the instance automatically. A static std::auto_ptr<MyClass> m_instancePtr will certainly do the job.

Now you ask yourself, why the heck should I use the smart pointer mechanism if I want to control the construction and destruction of the singleton. Well that’s a good question, you have to consider multi-threading.

Some problems arise with singletons and multi-threaded applications. Even though my specific application was MT, I didn't have to worry about the construction of the singleton instance because it was surely done before CreateThread was called. When you cannot be sure about that, you will have to protect the static instance on your own. But you cannot wrap the static declaration itself with code that acquires any sort of lock. Maybe with a nested anonymous scope? I'm not really sure, and it's pretty ugly. So that's why you need the smart pointer mechanism, which will destroy the class of its own accord (assuming you don't really care when), while you are the one who fully controls the construction time…

Benny and the jets say hi.

Challenge: One-Liner For Converting a Decimal

October 25th, 2007

Or: a one-liner device to convert a decimal number to a string, using any base up to 10. If you want to use bases above 10, you will have to construct a table somehow that goes from '0' to '9' and then continues from 'a' up to the required base (or you can use a static table). So suppose we are dealing with a base <= 10, we only need to convert each digit to ASCII, so it's pretty simple.

If you didn't figure it out until now (and how could you?), I'm talking about Python here. There is this int() function (actually it's a class type, or to be more accurate its constructor), which converts any string to a decimal number. Say, int('101', 2) will result in 5. But the opposite operation is nowhere to be seen.

The straightforward way is easy:

l = []
while (n > 0):
    l.append(n % BASE)
    n /= BASE
"".join(map(str, l[::-1]))

Though it's an ugly way, just to show the principle. We can do it with recursion, and then we don't need to reverse the result, as a side effect of the recursion.

When I decided to write the conversion function just for the fun of it, I wanted it not to use recursion… because with recursion it's really easy. :) So why make our life simple when we do things for learning and sport? Besides, for some people recursion is less intuitive, although we might argue about it.

So here’s my first version:

"".join([str((n/i)%b) for i in [b**j for j in xrange(31, -1, -1)]]).lstrip('0')

At the beginning I used chr((n/i)%b + 0x30), because I'm used to dealing with char arrays and thinking in old-school C code. So Kasperle came up with the str thingy, which is much better for code readability.

Anyway, I really got pissed with the idea that I have to strip all the leading zeros, otherwise for n=5 (in base 2) I get an output of '00000000000000000000000000000101', which is quite cumbersome.

One drawback is the size of the integer we want to convert. As you probably guessed, this code supports 32-bit numbers; it could support any number in a jiffy… but then you would probably have to strip even more zeros most of the time. ;( Enough fooling around.

What I’m really trying to achieve is to use the code to convert any sized number, without the need of any constant magic value in my one-liner.

So trying to come up with the accurate number of digits to convert in the first place is really the bugging trick. What we really need is something like math.log. Using the log we can know the number of digits at once. But then we need to import math. Do we count 'import's when we say one-liner or not? Well, I will take it as a no. Making my life harder, then: no math.

"".join([str((n/i)%b) for i in [b**j for j in xrange(int(math.log(n, b)), -1, -1)]])

I could have used the input number itself for the xrange, but then it won't return '0' for an input of zero. And even so, it's kinda cheating and lame.

Technically, the solution is to generate a list like [1, 10, 100, 1000, …]. The condition to stop is when n/entry == 0. The problem with making this list is how to generate it on the fly :) or rather, how to stop generating it.

Well, AFAIK in Python it's not possible. So I'm trying to simulate log. Imri just suggested using a ratio for a log approximation, which would be base dependent. But I didn't like that idea – magic numbers, remember? And maybe even losing precision.

By now, Kasperle, who was the recursion guy, lost his patience with my stupid challenge. Imri is trying to calculate crazy numbers for log approximations, which I stopped following long ago. :)

FYI: Kasperle’s code, which is pretty cool, goes like this:

foo = lambda n,base: (n or “”) and (str(foo( n / base, base)) + str( n % base))

Notice the way the recursion stops… However, in one-liner code, I prefer assigning the result to a value, rather than assigning the lambda and calling it. But it's also possible to do, for instance: x = (lambda y: y+1)(0). If you ask me, though, I don't really like this notation.

Then Imri suggested another idea using sqrt, but I objected since we need math. The truth is that you can do x**0.5 in Python. But eventually his solution wasn’t good enough.

ARRRG, As for now I am giving up :(. If you have another idea, let me know.

Lambdas Forever

October 20th, 2007

Ahhh Python, what a splendid scripting language. One of its most likeable features is anonymous functions, aka lambdas. A lambda is actually a way to write/implement a simple one-liner function in place. The official docs say:

“Lambda forms (lambda expressions) have the same syntactic position as expressions. They are a shorthand to create anonymous functions; the expression lambda arguments: expression yields a function object.”

Instead of implementing the damned comparison function for sorting (you probably all know what I'm talking about):
def ComparisonProc(x, y):
    return y - x
lst.sort(ComparisonProc)

We can simply do:
lst.sort(lambda x, y: y - x) # Descending
and voila.

This is a very simple example. Lambdas are really handy when you want to build one-liner devices: some of them you manage to stuff into one line, and some you just can't. Either way, without lambda it wouldn't have been possible in the first place.

There are many samples on the Internet. I came up with something hopefully even useful. Let's say you want to print all .JPG files on your C:\, including subdirectories. So we have to scan the hard disk for all files, then filter those with a .JPG extension and afterwards print the result. :) Yes, this is all possible in a one-liner; let's see the original code first:

import os

for root, dirs, files in os.walk('c:\\'):
    for i in files:
        if i[-4:] == ".jpg":
            print i

The one-liner version:

print filter(lambda name: name[-4:] == ".jpg", reduce(lambda x, y: x + y, [i[2] for i in os.walk('c:\\')]))

See? Easy :)

Actually now, I have to explain a few more things.
1) We are only interested in the Files list from os.walk, therefore we take the third entry in the result, that’s i[2].

2) The i[2] itself is a list, and we cannot filter a list of lists of file names, therefore we have to flatten the lists into a single list containing the file names. This is where the reduce comes in: it returns the accumulated result of all the lambdas, each time calling the lambda with the accumulated x and the next item, y. Thus, adding the lists extends the resulting list and flattens them (see the tiny example right after this list)…

3) Now that we have a single list with all the file names in the system, we need to filter out the files which are not .JPG. So yet again we use a lambda that checks the last 4 characters in the file name and decides whether it is a .JPG; all other files will be removed from the resulting list.

4) Lastly, print the result. Actually you can use pretty print (module pprint) to print it prettier :)
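
To make step 2 a bit more concrete, here is the flattening step on its own, with a made-up list of lists:

>>> walk_results = [['a.jpg', 'b.txt'], ['c.jpg'], []]
>>> reduce(lambda x, y: x + y, walk_results)
['a.jpg', 'b.txt', 'c.jpg']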

 Yes, Python’s crazy!

So what's annoying about lambdas? They are slow relative to list comprehensions (which we used to get all the lists of file names above). But again, if we are using scripting, are we after speed? I am not sure. Another irritating thing about lambdas is that you cannot assign inside the expression, but then you have reduce… :)

The worst thing about lambdas is when you use global variables, and let me explain. Since the body of a lambda is evaluated at call time (I hope I am right here), if you access variables defined outside of the lambda, they are looked up again every time the lambda itself runs. Now suppose you wanted the lambda to capture a specific value at the moment you created it; by the time you actually call the lambda, that value has already changed and your result is screwed.

Enough words, let’s see the problem with some code:

>>> x, y = [lambda z: 5+z+i  for i in xrange(2)]
>>> x(0)
6
>>> y(0)
6

Oh uh, we’re in trouble!

Do you notice we get the same result for both functions? This is incorrect because they are not supposed to return the same value. Note this:

>>> i
1

So now when both lambdas, x and y, are evaluated they use i as 1. Sucks huh? 

>>> i = 3
>>> x(0)
8
>>> y(0)
8
“Use Nested Lambdas, Luke”

>>> x, y = [(lambda v: lambda z: 5 + v)(i) for i in xrange(2)]
>>> x(0)
5
>>> y(0)
6
>>>

The outer lambda gets evaluated immediately (it is called right away with i), and thus bakes the current value of i into the closure rather than a reference to i. The next time the inner lambda is evaluated, it uses that captured value of i and not whatever i happens to be at call time.

This surely will help someone out there :) And then they say Lambdas are going to be deprecated in Python 3000…

[Updated]

Thanks to Kasperle, here’s another solution to the nested lambdas:

x, y = [lambda z, dummy = i: 5 + dummy for i in xrange(2)]

The drawback is that x and y now accept a second parameter, which you potentially let the caller override… Or, if we're talking about Python 2.5, you can use functools.partial.

Basic Ops #3 – Averaging Without Overflow

October 15th, 2007

A few months ago I read this post here. Now I'm not talking about bugs in general, I want to focus on that specific bug: summing two integers without causing an overflow. It might sound like an easy thing to do, but another look reveals some implementation details that you should be careful about. As I see it, there are two problems. On some processors an addition that overflows might generate an interrupt. Although I haven't seen it on the mainstream processors, it is still a challenge, because every time you add two numbers you should make sure you won't trigger that overflow int. Or handle that interrupt, and what can you really do about it? Not much, I guess… The other serious problem is that you lose the precision of your result, or you simply get a wrong value.

I decided to give an example by examining the average operation: adding two numbers and dividing the result by two, of course. The problem arises when the addition overflows. That same post shows some ways to overcome the overflow. Another way that seems correct at first is a/2 + b/2, but messing with integers we know that we lose precision twice (once per division) for odd numbers, although we did overcome the potential overflow of the addition. Getting back to the (a+b)/2 equation, summing the numbers first and doing a single division yields only one precision loss for odd numbers, and the result is considered correct. That post's writer suggests the correction int mid = (low + high) >>> 1; and I still don't directly see why it's better; notice the ugly unsigned shift operator. I guess the standard says to promote to the next bigger variable size: looking at unsigned char x = 200, y = 100, avg = (x + y) / 2; this one will be calculated correctly, since the addition is done at integer size and only then is the result written back as a byte. But I'm not sure whether two integers are promoted to 'long' automatically. Let's say, for the sake of the conversation, that you have to calculate the average of the largest-sized registers of the processor (and those smart asses who think of an add/adc combination, leave me alone now :) ); you can't get away without an overflow. Which leads to the next equation: a + (b - a)/2. This time we assume that b - a results in a positive number, otherwise you have to switch it to a - b, which means that you will have to check (/cmp) the numbers first. And checking is out of the question. It's all pure math, there must be some way even in the computers' practical world. And I forgot to mention that the latter equation doesn't lose any precision (it gives the same result as (a+b)/2). Finally I get to this equation: ((a & b) << 1) + (a ^ b). And moreover, finally I understand it! hehe :)

If you take a look at the addition algorithm I posted here, you will find that a & b calculates the first carry mask, which is shifted left by one so it's ready to be added again to the half-added value, you got it right, a ^ b. Although in my algo I didn't use the + operator for adding the two numbers, because that would be stupid and non-challenging; the + here is used for propagating the carry to the higher bits at once, without needing a loop. Why do I mention this equation then? Because if we take this equation and shift it to the right by one this time, we get (a & b) + ((a ^ b) >> 1). Which adds the two numbers and divides them by two. And voila, you get the average of two integer numbers without a chance of overflowing. Now this is cool! The other advantages are that the result is correct (the same as (a+b)/2, with no extra precision loss) and you don't have to do any prerequisite checks.
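
Here is a tiny Python sketch of the trick; the 32-bit width and the helper name are just assumptions for the demonstration:

MASK32 = 0xffffffff

def avg32(a, b):
    # a & b holds the bits both numbers share (the carries, already "halved"
    # since we don't shift them left), while a ^ b is the carry-less half-add,
    # halved by the shift right. Their sum equals (a + b) / 2 rounded down and
    # never exceeds 32 bits, so nothing can overflow on the way.
    a &= MASK32
    b &= MASK32
    return (a & b) + ((a ^ b) >> 1)

Feeding it 0xfffffffe and 0xfffffffc, whose sum would overflow a 32-bit register, gives back 0xfffffffd, which is the correct average.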

BTW – To be honest I have no idea how the steps from a + b to ((a & b) << 1) + (a ^ b) are achieved, I will try to find it out sometime. But after implementing the Addition myself, I surely understand this equation and I hope you too now.

X86 Assemblyyy

October 8th, 2007

Complex instructions are really useful, especially if you try to optimize the size of your code. Of course, modern processors nowadays are becoming more and more RISC'ish. But as for x86, its backward compatibility makes those instructions stay there (forever?), ready for you to use. The funny thing is that in modern x86 processors the simpler instructions are probably faster, so compilers don't generate code with the CISC instructions. Thing is, when you size-optimize your code, or write a shell code, you don't care much about speed at all. So why not take advantage of those instructions?

The most popular x86 CISC instruction is LOOP. It's a simple one as well: it decrements the general purpose register CX (/ECX/RCX) by one and jumps to some address if it's not zero. So you have something like 3 sub-instructions, or call them micro-ops, in one: a decrement, a test for zero and a branch.

So speaking of LOOP, there are also LOOPZ and LOOPNZ. Those instructions, in addition to branching when rCX is not zero, will only branch if the Zero flag is set or clear, respectively. Which means that you "earn" another condition test for free. For instance, if you wanted to run some test on each entry in an array and then continue to the next entry only if the previous test was successful and there are still cells to scan, those instructions might be helpful.

I have never seen anyone use those instructions, not even in code crunching. I think it's because most people just don't read the specs, and even if they do, they don't know how to use those instructions. Not that they are hard to use, but maybe a bit confusing or just not popular.

I found a somewhat useless combination of the repeat prefix with the LODS instruction. A REP LODSB means: read into AL the byte at address DS:rSI and advance rSI (by examining DF…). So you end up with code that loads into AL the last byte of the buffer that rSI was pointing to (depending, of course, on the initial value of rCX). I think that on the 8086 this repeat and lods combination was prohibited, so while I was working on diStorm I made it so that if a LODS instruction is prefixed with a REP, that REP prefix is ignored. Then I got an angry email saying that today this is not the case and the combo is supported… I even checked the current specs and it seems that guy was right. So honestly, I'm not sure it's useful for anything, but it's cool to note.

Another instruction I wanted to talk about is SCAS. I guess you know this instruction in the strlen implementation as follows:

; assumes EDI points to the string
sub ecx, ecx    ; ecx = 0
sub al, al      ; al = 0, the byte we scan for
not ecx         ; ecx = 0xffffffff, the maximum count
cld             ; scan forward
repne scasb     ; advance EDI until the zero byte is hit or ecx runs out
not ecx         ; ecx = number of bytes scanned, including the terminator
dec ecx         ; drop the terminator, ecx = the string length

Now, I'm not sure whether this is the fastest way to implement strlen; some compilers use this implementation and others have a find-a-zero-byte-inside-a-dword trick. Though maybe I should talk about those tricks in another post someday…

Anyway, back to SCASB. Now that we saw how strlen is implemented, we know that with the REPNE prefix, which means continue as long as rCX is not zero and as long as the Zero flag is clear, we test for two conditions in one instruction. In the code above the REPNE prefix tests ZF, but the truth is that the SCAS instruction updates all the other flags as well. So think of the SCAS instruction as a compare instruction between the accumulator register (AL/AX/EAX/RAX) and the source memory… For example you can do SCAS and then JS (jump on sign)…

There are many other forsaken instructions that are not fully used, so next time you fire up your assembler, take a look at the specs again; maybe you will find something better. Well, if you have more ideas of this kind, you are welcome to leave a comment.