
EUSecWest 2008

The third annual EUSecWest conference will be held on May 21/22 at the Sound club in Leicester Square in central London, U.K.

May 16, 2008

Script runtimes are vulnerable just like everything else

Google's App Engine is vulnerable to attack, according to security researcher Justin Ferguson. For those wondering, App Engine is a platform for web-based application development, and the problem isn't limited to Google: it potentially affects all interpreted code. Developers who believe scripting languages make them safe from the memory corruption problems that plague C and C++ programs should think twice about that assumption. Justin has been taking a long, hard look at scripting languages, and what he's uncovered may surprise you: the interpreter itself is a big, fat target.

Sean Comeau: For years we've been told that if we write our applications in scripting languages such as Perl and Python that we are safe from buffer overflows. How much truth is there to this belief?

Justin Ferguson: Well, it's true and it's not. A buffer overflow is not quite as likely as if you had written the application in C yourself (depending, of course, on your personal skill set), because for one to occur you're really dependent on the authors of the interpreter or virtual machine. That said, the interpreters et al. are just C programs themselves, and often big and complex ones, so they have plenty of bugs lurking around. I've found one every time I've sat down and looked at Python and Ruby. Perl is a bit different, but I don't think that's so much a code-quality issue as that grokking the Perl source code requires a bit of patience: everything is a macro to a function pointer to a macro to a function pointer, et cetera.

Where things become tricky is that I not only have to find a buffer overflow or similar in the interpreter, but I have to find one reachable via your scripted application. Times are changing, though: software as a service is rolling into new domains. For instance, look at Google's App Engine; they just hand you a restricted Python interpreter.

That's actually pretty dangerous, as it allows a potential attacker a lot more control. It means I just need a bug in the interpreter; I can craft the code around it and get the interpreter exactly into the state necessary. Even better, I can directly call APIs that will leak information. For instance, if it's a 64-bit Linux box with ASLR, I can tell, because I can call functions from Python like sys._getframe() that will leak a heap address. Google is trying hard to restrict the interpreter, but at the present time their implementation is half-baked, and their security team is going to lose sleep because of it.
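To illustrate the kind of leak described above, here is a minimal Python sketch, assuming CPython, where the built-in id() returns an object's address in the process heap:

```python
import sys

# In CPython, id() is the object's heap address, so pure-Python
# introspection can reveal address-space layout even inside an
# otherwise "restricted" interpreter.
frame = sys._getframe()          # the current stack frame, a heap object
frame_addr = id(frame)           # heap address of the frame object
type_addr = id(type(frame))      # address of the frame's type object

print(hex(frame_addr))
print(hex(type_addr))
```

Under ASLR, any such value narrows an attacker's search for usable addresses, which is why restricting an interpreter without restricting its introspection APIs is half a defense.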

The basis of my point is that interpreters are somewhat of a forgotten attack surface, and they are viable targets, but it's highly contextual: you may audit a web app all the way down through the interpreter and find nothing. The point is that we should be auditing this way, and not just looking for SQL injection and XSS bugs. In short, there are plenty of bugs in the interpreters, so in a lot of ways the conjecture that you won't have a buffer overflow in your code if you use a scripting language is pretty absurd.

Sean Comeau: You're saying Google has exposed a huge attack surface by executing un-vetted Python scripts on its infrastructure. Is this really any different from the threat faced by traditional shared web hosting providers who offer CGI, Rails, Zope, Catalyst, etc?

Justin Ferguson: Indeed, that is what I am saying, and no, it's really not different from the shared hosting providers, other than that those are typically for-pay, so you need to put down a credit card number or similar, or they're a lot more restrictive.

Sean Comeau: Do defenses designed for unmanaged C/C++ code such as ASLR, more restrictive permissions on process memory (NX bit), stack protection, heap cookies, etc also make the VM/scripting vulnerabilities you have found more difficult to exploit? Why or why not?

Justin Ferguson: Well, I think ASLR is something that we will find is highly effective and will withstand the test of time. Simply put, it really doesn't matter what I do; in 90% of instances I still need some idea of the address space. For instance, depending on the language I may be able to return into a function by name, but I still need the address of the function that does the resolution. Where ASLR really shines in the scripting languages is in handling dynamic typing. In Python, for instance, this is handled by a pointer in the object's structure, which is compared against the address of that type's type object. For instance, when checking a PyCodeObject (the object that represents the bytecode), the comparison is done via the following macro:

#define PyCode_Check(op) ((op)->ob_type == &PyCode_Type)

So in order for me to go down a code path that requires me to survive this check, I need to know the address of PyCode_Type. I have a couple of methods for getting around this, but it's still very contextual to the exploit and to exactly what environment you're working with. So ASLR is still highly effective. The upside, though, is that no one reports read access violation bugs, so you still find them a lot in code bases: not because no one else found them, but because no one has reported them. This, combined with some features of the languages, makes information leaks a bit easier, but it's still contextual.
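As a sketch of how such an address can leak through the interpreter's own introspection: in CPython, id() applied to a code object's type yields the runtime address of the very type object the macro above compares against. This is a hypothetical illustration, assuming a stock CPython build:

```python
# Compile a trivial script to obtain a code object, then take the
# address of its type. In CPython, id(type(code_obj)) is the runtime
# address of PyCode_Type, the pointer PyCode_Check compares against.
code_obj = compile("pass", "<leak>", "exec")
pycode_type_addr = id(type(code_obj))
print(hex(pycode_type_addr))
```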

Stack and heap cookies are designed to protect either C/C++'s metadata or the processor's, not higher layers of abstraction. That said, most of your overflows occur in the heap, and every time you corrupt heap metadata you run the risk of triggering some check or some fault. When developing a proof of concept for one of the bugs, I ran across an instance where I needed to pass a value of -24 to the allocator; however, when it went to NULL-terminate the string, the index of -24 corresponded exactly to the memory block's size member, which kept killing me when I hit a realloc() a few dozen lines down. This isn't really specific to the various protections; you could have them all turned off and still run into this problem. It's just the nature of the beast.

Having proper memory permissions (i.e., NX) doesn't really prevent exploitation of the interpreter at all; it just often makes life a lot more difficult. You have to realize that *a lot* is accomplished via function pointers, so in addition to losing opcodes because you couldn't get, or have not yet gotten, the address of an object's type, you lose some opcodes that use function pointers from your object as well.

So to summarize: ASLR is still very effective (and sadly one of the less deployed protections); the various cookies don't present much of an issue, but it's contextual; and NX is pretty moot, since bytecode is read and interpreted, not executed, so once you have control of that stream, the game is pretty much over. That said, I wouldn't run around and turn these things off; they're still effective and useful, just not as much in this context.

Sean Comeau: Do you think memory management bugs in scripting runtimes are as common as they are in other applications written in unmanaged code?

Justin Ferguson: That's a somewhat hard question to answer; it really depends on what applications we're talking about. Some applications are just bug-ridden, others were written well, and others have been around long enough to have most of their bugs killed. I'd say that, in general, they're on par with other programs of the same complexity I've looked at.

Sean Comeau: Seeing Perl coredump isn't shocking to me. It happens, especially with modules containing XS calls to native libraries. Have you found exploitable bugs in Perl itself or its core modules? What about third-party modules?

Justin Ferguson: Perl really is an oddity. I have a friend who still maintains that Perl wasn't written, that it was found on a crashed UFO. If you ever look at the source code some, you'll understand why. That said, I put on rubber gloves and goggles every time I dig into that code, and I focused mostly on third-party modules. I'm not entirely sure how much of the Perl material I will get to in the presentation; I really tried to fit too much into the time slot, and as a result I think I only talk about bugs in the ImageMagick interface. Bugs in ImageMagick don't surprise anyone. I think in general what we're going to find is that third-party modules are surprisingly buggy; the core is more stable, but the bugs it has are harder to fix than in some of the other languages.

Sean Comeau: What about Python? Are the problems you found in it different from those in Perl or is it the exact same type of problem?

Justin Ferguson: Python is one of the languages I was referring to when I said Perl's bugs are harder to fix. Python has a significant number of bugs, but generally they're pretty easy to spot and fix. They make a lot of use of signed integers; even their reference counters are signed. This causes a lot of problems for them, and I think nearly every bug I've found thus far in Python ties back to integer-related problems. As for where these bugs are, they're all over. The first time I looked at Python, I started in the most obvious place: the modules that ship with it. In doing so I found a bunch of bugs in one of their image-parsing modules, but as it turned out those had already been reported. Then I stumbled across some problems in their zlib code, caused by a lack of input validation in an allocation API that used signed integers without checking for negative values and added the size of an object to the value. I then took that function, looked around the source code, and found other bugs in their SSL functions, core API, et cetera. It was everywhere, but in one fell swoop all of those bugs were patched simply by checking that the value was positive.

Sean Comeau: Is there any way for Perl/Python/Ruby developers to code defensively to protect themselves from these bugs or is the only solution finding and fixing all the interpreter bugs?

Justin Ferguson: Well, I think the biggest thing that needs to happen is that developers are going to have to start coding a little more like traditional C programmers. This means that although the language may support a 2 GB string, it's probably not in their best interest to allow strings that long, as it only takes a multiplication by two for that length to roll over on 32-bit platforms; if you think of ASCII-to-UTF-16 conversions and the like, you'll find that's not incredibly uncommon. That's probably not the most realistic example, at least not until we routinely send gigabytes of data over the wire. Another example is integers. Most of these languages don't have the concept of integer overflow in the source language, so no one checks integer values unless they mess with the logic of the program. Internally, however, it's still just a C program, so large or negative values can have an impact there. The point being, there needs to be sanity checking of user-supplied data even if it doesn't directly affect the logic of the application you're writing; you need to be thinking about how the interpreter might handle what you're doing.
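A minimal Python sketch of the arithmetic behind that advice, plus the kind of defensive length check he recommends (the 1 MB cap and the handle_input function are arbitrary illustrations, not from any real codebase):

```python
# On a 32-bit platform, doubling a 2 GB length for a UTF-16 expansion
# wraps a 32-bit size back to zero: the classic integer roll-over.
length = 0x80000000                     # a 2 GB string length
wrapped = (length * 2) & 0xFFFFFFFF     # model 32-bit unsigned arithmetic
assert wrapped == 0

MAX_INPUT = 1 << 20  # arbitrary 1 MB cap, chosen for illustration

def handle_input(data: bytes) -> str:
    # Defensive check: bound user-supplied lengths before the
    # interpreter's C internals ever multiply or sum them.
    if len(data) > MAX_INPUT:
        raise ValueError("input too large")
    return data.decode("ascii", errors="replace")
```

The cap is doing C-style thinking in a scripting language: the script's logic never needed the limit, but the interpreter underneath it might.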

Sean Comeau: Can you talk about some of the methods you're using to uncover problems in Perl/Python?

Justin Ferguson: This is a question that always boggles me; I get asked it quite a bit in the course of doing my job, typically by clients who want to know what tools I use. Generally speaking, I don't make much use of tools beyond my eyes and a text editor. What I've been doing is reading the source, finding a pattern that causes a mistake, and then seeing where else I can apply it. For instance, one of the bugs the Python developers patched was in a function called PyString_FromStringAndSize(char *, int), which basically serves as a realloc() for string objects. The second argument it took was signed, and internally no sanity checking was done before it was summed with the size of a structure, so you could pass in a negative value less than or equal to the size of that structure and get it to misallocate memory. So when I stumbled across this API call, I took it and went through the source to see where I could apply it. As it turned out, it affected a lot of things: modules, core objects and datatypes, et cetera. There are more examples, but not everything is patched, so I don't really want to talk about it yet. That's the general idea: dig through the source until you find where they messed up, and then expand on that concept.
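As a rough model of that bug class, here is a hypothetical Python sketch (the real code is C inside the CPython allocator; the 24-byte HEADER is an illustrative stand-in for the string object's header size, and both function names are invented for the example):

```python
HEADER = 24  # stand-in for the C string object's header size

def buggy_alloc_size(n: int) -> int:
    # Vulnerable pattern: a signed, caller-supplied size is summed
    # with the header with no negative check. n == -24 yields a
    # zero-byte allocation the caller believes holds a string.
    return HEADER + n

def fixed_alloc_size(n: int) -> int:
    # The fix that closed the whole bug class at once: reject negatives.
    if n < 0:
        raise ValueError("negative size")
    return HEADER + n

print(buggy_alloc_size(-24))   # 0
```

The "find the pattern, then grep for it" method he describes amounts to locating every caller of the buggy function and checking whether any lets an attacker reach it with a negative size.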

Sean Comeau: Have you looked at other runtimes/languages such as the .NET CLR, Java, or any others?

Justin Ferguson: Not yet, but that's exactly where I'm ultimately heading with this. A few years back I worked as a reverse engineer, and I noticed that more and more of what I was reversing was written in things like Delphi, VB, .NET, Java, et cetera. I saw the writing on the wall, so to speak. Whether I like it or not, managed languages and other higher layers of abstraction are here to stay, and their use is only going to increase. I don't think C/C++ programs are going anywhere, but I think we'll see a gradual decline as time progresses, probably to well under the 50% mark. So I realized this and started to wonder: what is the future of insecurity? I don't only do this for a hobby, I do it for a living, so I really can't afford to wake up one day and find out that the entire world changed around me and now I'm out of work because I failed to adapt. That's what this really is. The writing says 'evolve or die', and I don't know if this is it or not, but it's my first stab at it.

Back to the specific question, however: I opted to work with the interpreters first just because they're a bit more open, and I had no real experience doing this sort of thing, so I wasn't entirely sure what I would find and didn't want to get lost in the details of reversing some Microsoft binary. Once I feel I have an adequate grasp of what works and what doesn't, I'm coming for both of you, Sun and Microsoft.

Sean Comeau: Thanks Justin!

Justin will reveal more details at EUSecWest in London, May 21/22, 2008. Register now.