Weird Stuff

January 28th, 2008

As I am still working hard on TinyPE NG, I have it down to 220 bytes at the moment. I am not frustrated yet, and I think I will be able to shave off a few more bytes. Since my last post, I have been talking with Peter Ferrie on the Code Crunchers mailing list, which you're invited to sign up for right here. Peter suggested that instead of using WinExec to run the downloaded file, which was an .exe back then, I should download a .dll file and LoadLibrary it. The thing was, I didn't use LoadLibrary; avoiding it was one of the tricks in the new version. Eventually, I removed lots of code (18 bytes so far!) and managed to download the .dll and load it using export forwarding, but this time on the downloaded file! That even spared the ExitProcess trick (one byte…) that I came up with together with Matthew Murphey in the last challenge. I don't need ExitProcess anymore, since the dll is now loaded into the same process, and an ExitProcess inside the dll itself will do the job… My only problem was that my server didn't let me download any file with an extension of '.dll'. I freaked out and didn't understand why the damned thing wouldn't let me download it. I tried removing the access list in .htaccess and playing with it, but nothing helped, and I almost gave up on the whole idea. Then, at the last moment, it occurred to me that since my server is Linux based (so why does it care about dll files in the first place?) I could just call the file ".DLL", notice the capital letters. The loader doesn't care about letter case, so everything went OK from there…

To a different matter now: a friend (who contributed to diStorm in the past) keeps using it heavily and found something interesting. He was trying to exchange two registers, eax and r8d (xchg eax, r8d). That would be encoded with a REX prefix (specifically 0x41) followed by 0x90. The thing was that no matter what you do (that is, prefixing 0x90 with any byte), it won't change its behavior. It's as if 0x90 is really hardwired to do nothing (no-operation). Ah, sorry — 0x90 is xchg eax, eax, which is used to denote a NOP instruction, for those who were following me. So imagine you want to exchange two registers and the assembler generates 41 90 — nothing happens when you run it. Quite absurd. So it has to be changed into the longer 87 /r form of the exchange instruction… The cool thing about this whole story is that diStorm showed the output correctly: DB 0x41; NOP. Now to be honest, I never gave it a thought when I ported diStorm to support the 64 bit instructions. But it so happens that in 64 bit mode 0x90 is really defined as NOP rather than xchg eax, eax, so the prefix is useless and thus dropped… Anyway, a nice finding, Stefan!

Oh yeah, well, I wasn't telling the whole truth: there is one prefix that does affect the NOP instruction, 0xf3. Together with 0x90, it becomes the PAUSE instruction…

TinyPE NG is Out

January 17th, 2008

Here you go guys:
http://ragestorm.net/tiny/tinypeng.exe

Source will be released within a couple of weeks.
Have fun :)
Meanwhile I will be in Turkey for the weekend to relax and leave the bits behind.

Kix$

TinyPE NG

January 15th, 2008

I rewrote TinyPE and just got to 240 bytes!!!!!1!!11!!11!!1! * 9**9**9

The challenge: download a file from the net (specifically from my own site) and run it, while the strings in the .exe must be encrypted in some way. You can find more information by googling the TinyPE Challenge.

Holy shit, I’m kinda excited myself, it was my goal and I have just reached it after a few days of hard work. Now I gotta shave one more byte and then I’m done :)

I will release the source once I am finished. I am doing new stuff, and again no tools can read my .exe file, not even my own diSlib64… bummer.

Stay Tuned :) blee

Anti Debugging

January 14th, 2008

I found this nice page about anti-debugging tricks. It covers so many of them, and if you know the techniques it's really fun to read through them quickly one by one. You can take a look yourself here: Windows Anti-Debug Reference. One of the tricks really caught my attention; it was something like this:

push ss
pop ss
pushf

What really happens is that when you write to SS, the processor has a protection mechanism so that you can safely update rSP immediately afterwards. It could lead to catastrophic results if an interrupt occurred precisely after SS was updated but rSP wasn't yet; therefore the processor blocks all interrupts until the end of the next instruction, whatever it is. However, it blocks interrupts for one instruction only, no matter what — it won't work if you pop ss and then do it again… The upshot is that if you are under a debugger or a tracer, the above code will push onto the stack the real flags of the processor's current execution context.

Thus doing this:
pop eax
and eax, 0x100
jnz under_debugging

Anding the flags we just popped with 0x100 examines the trap flag (bit 8). If you simply pushf and then pop eax while being traced, the trap flag will appear clear, as if you're not being debugged — a potential lie, since the debugger engine recognizes the pushf instruction and fixes the pushed value. With the ss trick, though, even the trap flag handling is pended or stalled till after the next instruction, so the debugger engine never gets to recognize the pushf and fix it. How lovely.

I really agree with some other posts I saw claiming that an anti-debugging trick is just like a zero-day: if you're the first to use it, you win and can use it well, until it gets known and taken care of. Although, to be honest, a zero-day is way cooler and a whole different story, but oh well… Besides, anti-debugging can't really harm, it just wastes some of the reverser's time.

Since I wrote diStorm and read the Intel and AMD specs of most instructions upside down, I knew about the special "mov ss" behavior immediately; even the docs state it. But it never occurred to me to use it as a trick. Anyway, another way to do the same is:

mov eax, ss
mov ss, eax
pushf

A weird issue was that the mov ss, eax must really be written as mov ss, ax, although all disassemblers will show both forms as mov ss, ax (as if it were 16 bits). In truth you will need a db 0x66 to make this mov work… You can also do lots of fooling around with this instruction, like mov ss, ax; jmp $-2; if you single step that without seeing the next instruction, you might go crazy before you realize what's going on. :)

I even went further and tried to use a privileged instruction like CLI after the write to SS, in the hope that the processor is executing in some special mode and there might be a weird bug. And guess what? It didn't work and an exception was raised, of course — otherwise I probably wouldn't have written about it here :). It seems the processor's logic simply has a kind of internal flag to pend interrupts till the end of the next instruction, and that's all. To find bugs you need to be creative… it never hurts to try, even if it sounds stupid. Maybe with another privileged instruction in different rings and modes (pmode/realmode/etc.) it could lead to something weird, but I doubt it, and I'm too lazy to check it myself. But imagine you could run a privileged instruction from ring3… now stop.

SOAP Is So Tricksy

January 11th, 2008

I started to code in .Net at work a while ago — yes, yes, have a laugh at me, I still kick your ass in Assembly ;) Anyway, we use SOAP heavily, and it is wrapped really well in C#: you invoke the remote methods seamlessly, as if they were a local interface/methods. We wanted to see the performance of our server, so I wrote some simple stress tests, in C# of course. I configured my IIS to work with my SOAP libraries and started running the tests. Then, after a long long while, an exception was thrown from the server notifying me that the session was null. Needless to say, our service needs authentication and stores some info about it (it requires a login before usage). I wasn't sure why the session was dropped, because all the other connections kept working and invoking methods on the server. Something was fishy. I googled for 'dropped sessions' and all the stuff I could find about .Net, IIS and sessions. The tips were things like checking the Event Viewer for logs that the server is recycling, etc. Looking at the Event Viewer of the server, I didn't find anything special, nor any indication that something had happened to the server. And no, the server wasn't recycling, because otherwise all the other connections would have been dropped at the same time as the one I saw; for the same reason I eliminated options like the files in the virtual directory of the services being changed or scanned… (some weird feature of the ASP.Net Web Service, go figure). Eventually I sniffed my own network with WireShark, and since I was stress-testing the server I gathered loads of packets in a jiffy, realizing it was not the way to go. However, I managed to catch the exception while sniffing the connections (restarting the sniffing every few seconds… :) ), and analyzing the data showed that the client (that is, my test app) used, out of the blue, some session id that I hadn't seen earlier in the same log I captured (mind you, I didn't have the full log). Spooky.
I did the test again and got the same results. Then I thought, why not change the session timeout of my webservice in IIS to 1 minute? You guessed it right: the problem occurred more frequently, so now I was sure the problem was in my code and not some misconfiguration of the server, or the client using random session ids for some reason… You never know with bugs, especially when you stress-test. hihi

The next thing I did, since I didn't want to sniff the whole session of the connection(s), was to add some code that logged the time of the last method invocation, so that when the null-session exception was thrown I could see how many seconds had elapsed since that last invocation. Immediately it showed that some invocation happened after more than 1 minute, meaning the session at the server had already expired, yet the client would still use the same session id it got at the beginning. Every connection was made from a different thread, and each thread could talk to a few webservices at the same time. When the exception was thrown I knew which webservice/method raised it, and to that method I added the last-time thingy.

In the end, I had some 'if random(100) < 1' that resulted in true only once in a while. The bug didn't surface from the beginning because every time the remote method was invoked, the session timeout was reset; but on some rare occasions no invocation occurred for more than the default session timeout (20 mins), hence the exception and the dropped connection.

The solution was simple, though we had two options. One was to add cookies on the client side so we don't need a session at the server: even if the session expires, the server will still recognize our cookie and serve us well. The other, which was simpler for now, was to call the login method again to re-initialize the authentication, which re-creates the session on the server side; everything works better now.

I had this stupid bug because SOAP is wrapped so nicely in .Net that you don't 'feel' it, nor even remember that you are talking to a server — I really mean it, guys. So why in the world should I call login again after I did it once in the initialization?! Now I have a reason, apparently :)
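The second fix boils down to a retry-on-expiry wrapper around the proxy. A minimal sketch in Python (hypothetical names — `SessionExpired`, `login` and `call` stand in for whatever your generated SOAP proxy actually exposes):

```python
class SessionExpired(Exception):
    """Stand-in for the server's 'session is null' fault."""

def invoke_with_relogin(login, call, method, *args):
    """Invoke a remote method; if the server-side session expired,
    log in again (re-creating the session) and retry the call once."""
    try:
        return call(method, *args)
    except SessionExpired:
        login()                      # re-initialize the authentication
        return call(method, *args)   # retry the original invocation
```

This is exactly the behavior described above: the session only dies when no invocation resets its timeout, so a single re-login and retry is enough.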

SQL: Passing Array

January 4th, 2008

Many times you find yourself with this damned task: passing an array to your SQL code (whether you supply the code or it's a stored procedure doesn't really matter). You need to pass an array of elements, usually Ids, but it's absolutely up to you. In my app I had to pass my stored procedure an array of Ids, each one a binary(20). Since I use MSSQL 2005, I could use XML to accomplish this task. So my stored procedure receives a parameter of XML type:

CREATE PROCEDURE usp_MyProc
   @ids XML
AS
BEGIN
    SELECT f1, f2 FROM @ids.nodes('/id') T(c) JOIN MyTable ON ID = T.c.value('.', 'MyIdType');
END

Invoking it will be:

DECLARE @x AS XML;
SET @x = '<id>aa</id><id>bb</id>';
EXEC usp_MyProc @x

Now the Ids in @x must be encoded in Base64, because MyIdType is Binary(20). So everything is cool now, except for one thing: performance sucks big time. I was wondering what the difference is between an attributed value and the node text itself:
<id value="id goes here"/>
( SELECT f1, f2 FROM @ids.nodes('/id') T(c) JOIN MyTable ON ID = T.c.value('@value', 'MyIdType'); )
and
<id>id goes here</id>

Apparently, when only parsing the XML input, both run at almost the same speed, though the attributed way is a bit faster. But when you take the result (the converted binary id) and insert it into a temp table, such as DECLARE @tmp AS TABLE (ID MyIdType), the attributed XML seems to perform 2X faster than the node format. Interesting, eh?
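For the record, the client side builds the @ids parameter by Base64-encoding each binary(20) id into an <id> node. A sketch in Python (the two fake 20-byte ids are made up for illustration):

```python
import base64

# two fake binary(20) ids, just for illustration
ids = [bytes([0x01]) * 20, bytes([0x02]) * 20]

# node format: one <id>base64</id> per id, concatenated into the @ids XML
xml = ''.join('<id>%s</id>' % base64.b64encode(i).decode('ascii')
              for i in ids)
```

The attributed variant would emit `<id value="..."/>` instead, with the same Base64 payload.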

Even so, the performance was still bad, and I had to do something better. I decided to get rid of the XML and Base64 stuff; they take too much time for SQL to evaluate. My first attempt was to convert an input of VARCHAR directly to BINARY, but without success — the conversion isn't done the way I wanted: I got my binary stream as 0x4142 when I supplied an input of "AB", instead of the right value of 0xab… Then I looked into the SUBSTRING builtin function and found out (thanks to a friend) that it can operate on VARBINARY types as well! Imagine that. This meant that from now on I could supply the ids array as one long binary input, as a VARBINARY…

The code is now:

DECLARE @P AS INT;
SET @P = 1;
DECLARE @L AS INT;
SET @L = DATALENGTH(@ids) + 1;
WHILE @P < @L
BEGIN
  SELECT f1, f2 FROM MyTable WHERE ID = SUBSTRING(@ids, @P, 20);
  SET @P = @P + 20;
END

Voila. This turned out to be twice as fast as the attributed XML: we saved the string parsing and the Base64 decoding, and we even eliminated the JOIN. True, we now parse the binary stream ourselves, and it bloats the code with a few more lines, but it's still simple code and the performance is much better.

In the beginning I didn't realize why the above code was faster than a single select query with XML, but then I understood that SQL scans the tables using indexing, of course. So even though it parses the XML each time for the next node, the select query in reality happens more than once (as many times as there are id nodes to read), so breaking the single select and putting it inside the while statement didn't degrade performance — it only got boosted by the binary parsing. Though 'binary parsing' is not really the right phrase, because we only "slice" (as in Python :) ) the binary input stream into smaller chunks, which involves no parsing at all. The only disadvantage I found to this way of passing arrays is that I cannot pass records/structures as with XML… that would require more work and ugly parsing for integers, etc.
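The client side of this scheme is trivial: concatenate the fixed-size ids into one blob, and the server's WHILE/SUBSTRING loop walks it in 20-byte steps. The same slicing, expressed in Python (my own sketch):

```python
ID_SIZE = 20  # each id is a binary(20)

def pack_ids(ids):
    """Concatenate fixed-size binary ids into one varbinary-style blob."""
    assert all(len(i) == ID_SIZE for i in ids)
    return b''.join(ids)

def unpack_ids(blob):
    """What the SQL WHILE/SUBSTRING loop does: slice 20-byte chunks."""
    return [blob[p:p + ID_SIZE] for p in range(0, len(blob), ID_SIZE)]
```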

Let Me Import

December 19th, 2007

The problem is that you cannot import modules from different (relative) paths in Python. Sometimes it's really a must — like when you have a generic CPU module and a subdirectory called 'x86' containing a module that wants to derive from that CPU class and create an 'x86CPU'.

So I tried to mess with __import__, but without any luck. Then I said I just want to solve this problem, I don't care how far (how ugly) I go. I started with execfile, which AFAIK isn't exactly like import, but it's good enough for me. execfile("../cpu.py") didn't work, unfortunately. I then realized that cpu.py imports another file from its own directory, and since execfile doesn't do any magic with the directories, that import simply failed. Bummer. The next thing I did was to add the path of the CPU module to sys.path and retry my attempt — with success. Although it works, I don't like the directory separator, which isn't cross-platform. And yes, I said it's a temporary solution, but I'm still sensitive to it. Usually temporary hacks that work tend to stay, as much as I dislike that; this is how things usually work, no?

So that one was solved quite easily, but another problem arose. In the original code (when all the files used to be in the same directory), when import worked, I always imported the module by name rather than the lot of it ('from cpu import *' — a bad habit). So all my code accesses the CPU module with a 'CPU.' prefix, which is good: you know the source of everything you touch. The new problem is that since I moved to execfile, this prefix is broken now and I must get rid of it.

I thought about changing the globals dictionary to the name of the module I want to execfile() and then switching it back. But it got too nasty, and I'm not sure whether it would work anyway. My first attempt might be OK for some of you; alas, it's still not good enough for me.

And I think my design with the files is OK; after all, it makes sense. And yes, I know I (maybe?) should have put the files in the site-packages directory, and then the import would work well. However, I want the files to be in a specific directory, out of Python's reach (in a way).

Oh by the way, the code of my attempt is:

import os, sys
sys.path.append(os.getcwd()[:os.getcwd().rfind(os.sep)])
execfile(".." + os.sep + "cpu.py")

Ugly something, heh?

Ok I was just going to publish this post but I had to try another thing which really solved the issue. I managed to use __import__ well.

So now it looks like:

import os, sys
sys.path.append(os.getcwd()[:os.getcwd().rfind(os.sep)])
cpu = __import__("cpu")

This time it solves the second problem as well — the namespace (of the module) issue. And yet we still need the extra path appended. Can we get rid of it?
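For what it's worth, modern Python (3.x) can load a module straight from a file path without touching sys.path at all, via importlib — a sketch of the same idea, not something that existed back when this was written:

```python
import importlib.util

def import_from_path(name, path):
    """Load a module from an explicit file path, bypassing sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# cpu = import_from_path('cpu', '../cpu.py')
```

Note that if cpu.py itself imports siblings from its own directory, you would still need that directory on sys.path, just like with execfile.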

A C Tidbit

December 12th, 2007

Yeah, well, the title is a bit weird… I've been reading the C standard for fun (get a girl?) and I was interested in two things, actually. First of all, while I was reading it, a friend asked me whether the local (auto) variables in the scope of a function are laid out in memory in any special order. The short answer is no, and I won't extend on this one much; nothing is mentioned about the "stack" or how such variables are supposed to lie in memory. Now this is a bit confusing, because in structures (and unions) what you define is how they lie in memory. That actually sounds reasonable, because if you need to read something from hardware, you really care about the order of the members of the structure… And indeed, while reversing, I have seen lots of variable recycling: the compiler uses a variable (which is on the stack), and after it goes out of scope it reuses the same memory slot for another variable in the same function. Or maybe the coder sucked so much that he used the same variable a few times… :P which I doubt, since for once the dword was used as a byte. So the only thing you know about the stack layout of a function is what that specific function's compiled code tells you, for that specific version (an assembly listing is good too).

The other thing I was curious about is the pointer arithmetic that is mixed with integer expressions:

char *p = 0;
p + 5

We all know that it will evaluate to 5. But what about:

long *p = 0;
p + 5 ?

This one will evaluate to 20, that is 5 * sizeof(long)…

Actually the compiler translates the array subscript operator [] such that p[i] = *(p + i). Nothing special here either, unless you haven't messed with C for a long while… Now it becomes really powerful when you cast from one pointer type to another and use the indirection operator. So, for example, say you want to scan for a dword in a memory block:

long *p = <some input>;
for (int i = 0; i < 1000; i++)
   if (p[i] == value) found…

But in this case you might miss that dword, since we don't scan at byte alignment. So we have two options: either changing the type of p to be a char*, or making a cast… Either way we need to fix the scan limit, of course.

So now it becomes:

if (*(dword*)((char*)p + i) == value) found…

This way p is scanned in byte units, because i is effectively multiplied by sizeof(char). We take the pointer as a char*, which advances by 1 every iteration, cast it to a dword pointer and then dereference it… I don't think you can avoid a cast (or an implicit conversion) to get this task done.
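The same byte-granularity scan can be mimicked in Python with struct.unpack_from, which reads a little-endian dword at any byte offset (the buffer below is my own toy example):

```python
import struct

def scan_dword(buf, value):
    """Return every byte offset where the little-endian dword occurs."""
    return [i for i in range(len(buf) - 3)
            if struct.unpack_from('<I', buf, i)[0] == value]

# the dword 0x78563412 starts at offset 1 -- a dword-aligned scan misses it
buf = b'\x00\x12\x34\x56\x78\x00'
```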

Now the following might be obvious to most of you, but I doubt it, since I fell into this trap as well:

long a[4] = {0};
printf("%d", &a[1] - &a[0]);

What will this code print when executed? I thought 4, as the size of an element of 'a' is 4 (from the 'long' type), but to my surprise '1' was printed. Pointer subtraction works in element units — as if the subscripts themselves were the operands — and the result is a signed integer. Thus 1 - 0.

The bottom line is that p[i] is *(type*)((char*)p + i*sizeof(p[0])), and p[i] - p[j] is ((char*)&p[i] - (char*)&p[j]) / sizeof(p[0]). Well, the latter is less useful, but if you thought the printf above would print the size of the element, you were wrong :)
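Both units can be demonstrated from Python with ctypes (just an illustration; c_long is 4 or 8 bytes depending on the platform):

```python
import ctypes

a = (ctypes.c_long * 4)()               # long a[4] = {0};
elem = ctypes.sizeof(ctypes.c_long)     # sizeof(a[0])

addr0 = ctypes.addressof(a)             # &a[0] as a raw address
addr1 = addr0 + elem                    # &a[1] as a raw address

byte_diff = addr1 - addr0               # raw byte distance: sizeof(long)
elem_diff = byte_diff // elem           # what C's &a[1] - &a[0] yields: 1
```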

Never Trust Your Input – A WordPress Case

December 7th, 2007

Although this post isn't about security, the title still holds true for everything you do with input: whatever you do, never trust it. :)

While I was manually editing the last post, about SQL, I encountered a bug in WordPress. It's probably not a big deal, because it's not a security bug, but it's really annoying when the browser freezes while you're trying to edit your post. So I switched to FireFox, and the same bug happened again. In the beginning I thought something was wrong in the internals of the browser — but both browsers having the same internal bug? That doesn't make much sense. Looking at the Task Manager I saw that both IE and FF were chewing up memory, with the CPU at 50% (this is because I have two cores; otherwise it would be 100% on a single processor…). So immediately I understood that an infinite loop was running — hence the 50% CPU usage — and that it was allocating memory in some way (by strings, as it turned out). The next thing to do was to isolate the WordPress code somehow; I wasn't even sure what caused this bug to surface in the first place. All I knew was that it had something to do with my post, so I needed to take a look at my post's raw text, since it contained some HTML tags. Eventually I saw that I had an unbalanced PRE tag — unbalanced meaning I had an opening tag but didn't close it. So I left only a "<pre>" in my post and saw that the browsers froze. Now what? Digging into the WordPress code, I understood there is a special class for a textarea input, and realized I had to search for a "pre" string in the JS files (which originate from .php files). Eventually, after a few trials and errors of uncommenting code, I found this chunk:

var startPos = -1;
while ((startPos = content.indexOf('<pre', startPos+1)) != -1) {
    var endPos = content.indexOf('</pre>', startPos+1);
    var innerPos = content.indexOf('>', startPos+1);
    var chunkBefore = content.substring(0, innerPos);
    var chunkAfter = content.substring(endPos);
    var innards = content.substring(innerPos, endPos);

    innards = innards.replace(/\n/g, '<br />');
    content = chunkBefore + innards + chunkAfter;
}

This is ripped from the tiny_mce_gzip.php file. The 'content' variable holds the text of my post, which is "<pre>". Now notice this line in the block:

var endPos = content.indexOf('</pre>', startPos+1);

This renders endPos as -1, since there is no closing tag. Moreover, there is no check of the return value; the programmer assumed it would always find a match. :(
Now let's analyze the block to see why it becomes an infinite loop that chews up memory:

endPos = -1, innerPos = 4, chunkBefore = "<pre", chunkAfter = "<pre>", innards = "<pre";

Notice that 'chunkAfter' contains the whole content, since substring treats a negative index as 0 and returns the input untouched ('innards' ends up as "<pre", because substring swaps its arguments when the start is greater than the end). We can ignore the replace, which doesn't really affect the loop in any way here. And behold, 'content' is reassigned to hold the whole new string, which now looks like: "<pre<pre<pre>".

And the loop condition finds yet another '<pre', endPos gets -1 again, and the string keeps growing on every iteration… and bam, your browser is frozen.

The fix is pretty straightforward: check the return value of indexOf before using it. It's a shame they have a stupid bug like this. What I had to do on my end was to use phpMyAdmin to edit the SQL tables and change the post so it wouldn't lock my browser yet again. Although I could have fixed the bug on the server, for some reason I didn't…
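Translated into Python for clarity, the loop plus the single missing guard would look like this (my sketch, not WordPress's actual patch):

```python
def pre_newlines_to_br(content):
    """TinyMCE's <pre> handling, with a check for an unbalanced opening tag."""
    start = -1
    while (start := content.find('<pre', start + 1)) != -1:
        end = content.find('</pre>', start + 1)
        if end == -1:
            break                   # no closing tag: bail out instead of looping
        inner = content.find('>', start + 1)
        innards = content[inner:end].replace('\n', '<br />')
        content = content[:inner] + innards + content[end:]
    return content
```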

Anyway I sent an email to WordPress and hopefully they will fix it immediately, though it’s nothing urgent.

SQL: Setting NOCOUNT Programmatically

December 7th, 2007

As I mess with SQL at work, I learnt about a feature called NOCOUNT. By default the server returns the number of affected rows after each query a client runs, which most of the time wastes bandwidth on unused/unwanted data; setting NOCOUNT tells it not to. I was interested in setting this option server-wide. Some people will say it's a bad move, because if you later have more DBs on the server which need the counts, bla bla… but I know what I want and that's what I will do.

The problem was how to set it programmatically. I searched the net for a bit but didn't come up with anything useful, so I decided to write the snippet on my own. It's a bit tricky, but once you see it, it looks OK.

USE master;
DECLARE @CurrentValue AS INT;
CREATE TABLE #tmp (v VARCHAR(50), w INT, x INT, y INT, z INT);
INSERT INTO #tmp
      EXEC sp_configure 'User Options';
SELECT @CurrentValue = y FROM #tmp;
DROP TABLE #tmp;
SET @CurrentValue = @CurrentValue | 512; -- 512=NOCOUNT
EXEC sp_configure 'User Options', @CurrentValue;
RECONFIGURE WITH OVERRIDE;

The trick was how to get the results of EXEC sp_configure so you can read them yourself; the solution was to create a temporary table and instruct SQL to insert the results of sp_configure into it. Later you can simply read from this table and do whatever you're up to. This way we preserve the original value and, in addition, set the NOCOUNT flag (512, as a bit field…).