The 3DS Cryptosystem

April 6, 2016, 5:12 pm

≪ Previous: Cosmo3DS: The CFW nobody wanted

I think one of the biggest challenge for system engineers is designing security. Recently, at the 32c3 conference, plutoo, derrek, and smea presented a series of hacks that completely defeated the security of the 3DS. As a result, people have implemented boot-time unsigned code execution (called “arm9loaderhax” in the 3DS community; other communities might relate this with “untethered jailbreak” or “bootloader unlock”). What I want to do today is not to reveal anything novel, but look at the security of the 3DS as a whole and see what went wrong. In this deep dive, I will hypothesize the design decisions that led to the cryptosystem found on the 3DS. Then I will present the flaws that led to “arm9loaderhax.” Finally, I will summarize the findings and provide a few tips to fellow engineers in hopes that these kinds of mistakes will not be made again. (Extra details are provided in parenthesis, they are for people with deeper knowledge of the 3DS and are not required to understand the rest of the article.)

Preliminary

Let’s begin with the first question that ought to be asked when designing a secure system: what is the threat model? I believe that for Nintendo, it boils down to two things

System integrity: all code running on the system should be checked and signed by Nintendo. This ensures that users are safe from malicious code and that Nintendo gets licensing fees from games and applications.
Content protection (DRM): code and resources should not be extractable by the user. This protects license holders (creating trust in the system) and ensures intellectual property cannot be stolen by competitors.

Note that the threat model takes account of factors that directly impact business. I’m sure there’s also other points considered (prevent cheating, protect user privacy, etc), but once you know what the most important assets are, you have a better understanding of how to protect them. These take priority. Of the two points listed, which is the most important? You might argue DRM because that’s where most of the money lies (and I think Nintendo did too). However, we will soon see that without system integrity, you cannot even toy with the idea of DRM.

System Integrity

In order to ensure all code running on a system is authorized, there needs to be a trust hierarchy. An example of this is: userland code always trusts kernel code, but kernel code has to cryptographically verify user code before running it. Now who verifies kernel code? The kernel loader. And so on… This process ensures that every component of the system has been audited by some other component.

SecureBoot On the 3DS (as with most systems), the boot ROM is the root of the trust hierarchy. The code there cannot be changed, and contains root cryptographic certificates (public keys).

There are two processors on the 3DS. The ARM9 “security” processor facilitates crypto (access to key generator, AES/RSA engines, etc), file system access to the NAND, and other low-level stuff. The ARM9 processor talks to the ARM11 processor through Process9 (user mode on ARM9) which ideally does security checks and then talks to Kernel9. (However, if you watched the 32c3 talk linked to above, this isn’t necessarily the case as Kernel9 has a system call that is literally “run whatever code you pass in.” Therefore, the diagram on the left is actually wrong as Process9 is not separated from Kernel9 at all.) We will look at the ARM9 Loader later.

The ARM11 “application” processor is the CPU that does everything you see. I go into more details about how modules are loaded in my last article, but the gist of it is that there are kernel calls that only certain system module (running in user mode) can access. Applications must communicate with these modules to get system resources. The ARM11 kernel sometimes has to communicate with Process9 (on ARM9) to get access to more sensitive system resources.

So aside from a couple of concerns, the design of the secure boot chain is mostly solid. (Of course implementation is another story; there has been many flaws in the system software and almost no exploit mitigation.) One good design choice here is that Kernel11 has low exposure. Because the system modules does most of the work, the kernel does not have to expose all syscalls to every application. For example, a game does not have access to the syscalls to map executable memory. So, in order for an exploit found in a game to run arbitrary code, it would have to first compromise the “ro” module (which has the right syscalls). This also means that code running in supervisor mode can be more limited and smaller in size, which means that bugs would be easier to spot. Again, this is all good in theory; but in practice, some very stupid implementation flaws (allowing the GPU to write to executable memory for example) makes this all moot–a story I don’t have time to get into. Another good design choice is that complexity only grows as you move down the trust hierarchy. When there’s more code, there’s more bugs, so keeping the bulk of the complexity in less trusted code is good for security. However, this is only a rule of thumb and should not serve as a type of defense!

Content Protection

Let’s assume for now that we have system integrity (we don’t). How do we implement DRM? The truth is that DRM is impossible in theory. However, as engineers, we do not always have to follow theory. The secret of DRM is that, unlike other cryptosystems, you are not designing it to be secure forever. (Note for the pedantic: I know that no cryptosystem currently known would last forever, but if you can point out this fact, you also know what I mean.) Specifically, if your DRM can last 100 years, most people (on the engineering side) would be very happy. In fact, if you can provably do that, you would “solve” the problem of DRM. Most DRM schemes are designed with decades in mind (something that you might not admit to business people). That means we can commit some security faux pas that the textbooks would forbid. For example, security by obscurity is a tool here. If it takes the hacker 5 years to figure out your scheme, then by all means do it, because you just bought another 5 years. (But be warned that if you think it takes 5 years to crack the scheme, it likely will take 5 months.)

Original 3DS Implementation

Pre70Encryption First, let’s go over the crypto primitives we have on the 3DS. There is an on-chip hardware AES engine with 64 keyslots. When a key is written into a key-slot, it stays there until it’s either cleared or rewritten with another key. You can write a “normal key” into the key-slot and the AES operation works as usual (with the normal key being the key). The more interesting case is if you instead write two keys–KeyX and KeyY–into the slot. In that case, an on-chip key generator derives the “normal key.” However, the normal key is never revealed outside the AES engine! That means, even if we extract KeyX and KeyY (by dumping the code that sets them), we cannot find the normal key (… in theory, I’ll come back to this point later). There is also a hardware RSA engine that has its own key-slots and operates similarly (except there is no key generator).

Games are stored in a container format named NCCH (unnecessary details: game carts are in NCCH with an additional block level encryption on the cart. eShop games are in NCCH with an additional layer of encryption by a “title key” that is decrypted by the “common key” (key-slot 0x3D, KeyX set by boot ROM, KeyY set by Kernel9). However, these details are unnecessary because it does not change anything in the trust.) It is decrypted with a key generated from a KeyX set by the boot ROM and a KeyY set by Process9. Process9 is found in NATIVE_FIRM which is stored encrypted on the NAND with a console-unique IV. The key to decrypt NATIVE_FIRM is set by the boot ROM. The console-unique IV is derived from a (maybe not so) unique card ID found on the eMMC (NAND). This is to prevent a downgrade attack on a new console using the firmware extracted from an older, vulnerable, console.

7.x Implementation

Speaking of vulnerabilities–in 2013, the first 3DS hack came out (a simple buffer overflow in Process9). This set Nintendo into something of a panic. Hackers broke the trust hierarchy at the Kernel9 level, which means that they control everything underneath that. Even though the NCCH KeyX (0x2C) was in the boot ROM and therefore still safe, hackers can use the key generator as a black box to decrypt whatever games they like. In order to protect future games from being decrypted in this way, they came up with a rather ingenious plan

70Encryption

In the boot ROM, the RSA engine (slot 0x0) is initialized with a key so it can be used later in the boot process. In Kernel9, slot 0x0 is used with an RSA operation and the key-slot is overwritten with another key. Just like the AES engine, once a key-slot is overwritten, it is impossible to extract the previous key. Note that it is unlikely Nintendo purposely erased this key-slot for security measures. The real reason is likely that there’s only 4 RSA key-slots (versus 64 AES key-slots), so they needed to re-use slot 0x0 for other RSA operations. The ingenious method Nintendo devised was to use some unrelated data outputted from the RSA engine at slot 0x0 to derive the 7.x NCCH KeyY. Because all versions of the firmware (including the vulnerable ones) have wiped this slot by the time the exploit could be triggered and since the boot ROM is still secured, even if you run the algorithm to derive the 7.x NCCH KeyY, it won’t work since RSA slot 0x0 has already been wiped. (In the diagram, the old NCCH key is used in conjunction. This is for compatibility–if you put a 7.x game on an older console, the game icon and banner can still show up, but you must update the system before the game can run).

Ultimately, to get these keys you need to get code execution to work before Kernel9 is first initialized (and wipes the key-slot). arm9loaderhax (described later) found in 2015 does this, but from what I understand, a undisclosed method was used to retrieve these keys originally (in 2014). However, since a 9.2 exploit was found in early 2015 (I wrote a series on it back then), the black-box method to decrypt games worked again.

New 3DS Implementation

Near the end of 2014, Nintendo released the New 3DS–the first major hardware revision. This was also their chance to try to salvage what’s left of their cryptosystem. Sorry mobile viewers, the diagram only gets more complicated…

First, I want to introduce an element I’ve left off before because it did not affect the security then. The OTP (one time programmable) section is unique data found physically in the CPU (okay pedants, the “SoC”). Its primary use is to store a ECDSA private key for identifying the console to Nintendo’s servers (for eShop and services). It exists on the old 3DS as well but on the New 3DS, it has been integrated into the chain of trust.

The New 3DS also comes with a new key-store found in a sector of the eMMC (NAND). The data in the key-store is the same on each console, but the key-store itself is AES encrypted with a SHA256 hash of the OTP section as the key. That means that the encrypted key-store is console unique! The OTP region is disabled early in the boot process: Kernel9 on the old 3DS (but not really, more on this later…), and the new ARM9 Loader on the New 3DS. That means, if implemented correctly, it should not be possible to decrypt the key-store after boot–even if you exploit the system at a later point. (Key word: correctly. Even though the keys were wiped from the AES engine, they forgot to wipe it from the SHA engine used in deriving the keys!)

Using the new NAND key-store, the New 3DS derives keys for NCCH decryption of New 3DS-only titles. Remember, the old 3DS does not have this NAND key-store, so it would not be able to derive the new keys. It is also unlikely that there is any hope in the form of another lucky slot that’s cleared early in the boot process. That means, black-box decryption of any old 3DS supported games will always be possible. However, maybe they can still protect New 3DS only games…

The other addition is the ARM9 Loader. Previously, once NATIVE_FIRM is decrypted, the code for ARM9 and ARM11 runs on their respective processors. Nintendo now added an additional layer of encryption for the Kernel9/Process9 code and they added ARM9 Loader to derive the new keys and decrypt the new layer. I think the idea here is this: there are potentially 32 keys we can use in the NAND key-store. If someone hacks the latest ARM9 firmware and obtain the secrets there, we can always release an update that encrypts it with a new key. Since the ARM9 Loader is added to the hierarchy of trust, as long as A9L is safe (and it is a much smaller code-base than Kernel9/Process9), even if they hack Kernel9, they cannot find the new keys released in an update and must hack the firmware again to get the new secrets. This was perhaps a solution to the panic of 7.x when it was hard to re-secure secrets after the system was hacked once.

We first saw this new system put to test with firmware 9.6. Before then, Nintendo made the mistake of forgetting to clear a key-slot initialized by the OTP hash. Although the OTP hash was still safe, it was possible to derive many other New 3DS keys from access to that key-slot. This includes the KeyX/KeyY for the new encryption layer on the ARM9 code. But no fear! 9.6 encrypted the ARM9 code derived from a new key in the key-store and for almost a year people outside of a few inner circles cannot decrypt anything for the New 3DS.

Things Fall Apart

What I like about the arm9loaderhax (here-forth A9LH) is that it’s only possible because of all the added complexity in the cryptosystem. The original system I showed would not have been vulnerable. However, A9LH is inevitable because Nintendo must respond to each attack. It is just unfortunate that each hole they patch only reveals more leaks. I’m not going to go into details of how A9LH works, delebile did a wonderful writeup on it. Instead, I’ll try to answer why it works.

To do that, let’s work backwards. The finale is that because the decryption of the ARM9 code is unauthenticated, if you corrupt the key-store, a junk key will be derived and the encrypted ARM9 code will decrypt into junk data. Then the console jumps into junk data which, if interpreted as code, can branch into a controlled region of memory with our payload. Thanks to ARM’s RISC-y instruction set, a random branch instruction is easy to find.

But if we were to have performed this attack on firmware 9.5, it would have failed. In 9.5, the first 16 bytes of the decrypted key-store is used to derive key-slot 0x15. That key is first used to decrypt a control block at offset 0x40 of the encrypted ARM9 code. If the control block is all zeros, then slot 0x15 is used again to decrypt the actual binary at offset 0x800. Now on 9.6, the first 16 bytes of the decrypted key-store is still used to derive key-slot 0x15 and the next 16 bytes are used to derive key-slot 0x16. 0x15 is still used to check the control block but 0x16 is used to decrypt the binary! They forgot to check the validity of the second 16 bytes of the key-store! They either forgot to change to slot 0x16 for checking the control block, or for technical reasons, they had to use slot 0x15 there and just neglected to use a test vector with slot 0x16. Either way, the only reason this is possible at all is because Nintendo had to revoke 9.5 and use a new key.

But we would still be stuck if we didn’t have access to the decrypted keystore. We had to modify the second 16 bytes but keep the first 16 bytes intact. There were two things people exploited here to get the KeyX/KeyY for decrypting the key-store. First, as I touched upon already, is that Nintendo forgot to clear the SHA256 registers which was used to derive the keys. However, to exploit this, for various reasons beyond the scope of this article, you needed to perform a hardware attack. The other method was more insidious. Before firmware 3.0, Nintendo forgot to lock the OTP after boot! That means if you flash any firmware < 3.0 on your 3DS (which involved some trickery for the New 3DS since it was not designed to run < 3.0), it would boot normally (as the code, of course, is signed). Then you can use any of the exploits found since then, take over the system, and dump the OTP. This means the New 3DS key-store can be decrypted and all new keys (even ones not currently used) can be derived. It’s not surprising that such a hole was overlooked because back then (three years ago), Nintendo did not expect the OTP to be used in the chain of trust. The irony is that the feature designed to bring more security was the one that completely broke it.

Key Generator

Since A9L is broken, any 3DS past, present, or future can be hacked on boot. That’s the result of A9LH. (Although for the future, a new exploit is needed to trigger a downgrade to obtain the OTP. ~~However, I also think something that modifies the eMMC CID via hardware might also be possible.~~ EDIT: It is not possible. NAND encryption uses a unique console key along with an unique IV derived from the CID.) So system integrity is gone. What about content protection? It’s hanging by its last thread. We can always use the 3DS as a blackbox for decrypting content. However, the goal for attackers is to get the “normal” keys and be able to decrypt content offline. The key generator is the defense for this. Remember that security by obscurity can only buy you so much time? It took about four years since the original release of the 3DS for hackers to break it.

If you haven’t watched the 32c3 presentation linked at the top of this post, I highly recommend you do so, as I’m not going to give the full details here. The gist of plutoo and yellows8’s ingenious crack was that they discovered through cryptoanalysis the algorithm of the key-generator was just some XORs and rotates. They did this because there were a couple of normal keys that were “leaked.” A couple of keys that we only know KeyX and KeyY for are found as normal keys on the WiiU (needed for communication with 3DS). Another key was accidentally included as a normal key in one firmware release, and then changed to KeyX KeyY in the next release!

That means that if we get KeyX and KeyY, we now have the normal keys. Unfortunately, there are still a bunch of shiny keys hidden in the boot ROM (which is also disabled like the OTP after use; no mistakes found yet).

Postmortem

So what went wrong here? I want to summarize by listing some of the big mistakes Nintendo made that hopefully won’t be made by anyone again

Focusing on content protection instead of system integrity: you cannot have one without the other. Always implement exploit mitigation when you can!
Not having a contingency plan for when the system is hacked: no matter how secure you think your system is, you need to have a plan for when it’s broken. That way you don’t end up scrambling around and introducing more bugs.
Too much complexity: having lots of blocks that say “AES” and “RSA” in your plan might impress the boss, but it just adds to the attack surface. Always go with the simplest plan that secures against your threat model.
Do not change the trust hierarchy after production! Everything is built on that hierarchy. Adding/removing from it will break assumptions that you might not even be aware of.

Sources

Some of the information here is from my own reverse engineering, but the bulk of it is from information found on 3dsbrew.org. Please let me know if there’s any mistakes or anything that doesn’t make sense.

↧

CGEN for IDA Pro

December 29, 2015, 12:00 am

≫ Next: 3DS Code Injection through “Loader”

≪ Previous: The 3DS Cryptosystem

It all started when I wanted to analyze some MeP code. Usually, I do all my disassembly in IDA Pro, but this is one of the few processors that isn’t supported by IDA. Luckily, there is objdump for this obscure architecture. After fumbling around for a bit, I was convinced that porting the disassembler to IDA would be a much better use of my time than manually drawing arrows and annotating the objdump output.

Process

Turns out, there isn’t too many resources online on writing IDA processor modules. The readme of the SDK was minimal (it tells you to read the sample code and the header files) and refers to two documents: an online guide that is now gone and The IDA Pro Book by Chris Eagle. Opening the book to the chapter on writing processor modules, you’re greeted with a dire warning to turn back (along with a note about the lack of documentation) as many have tried and failed.

One of the reasons why writing a processor module is so challenging is that the processor_t struct contains 56 fields that must be initialized, and 26 of those fields are function pointers, while 1 of the fields is a pointer to an array of one or more struct pointers that each point to a different type of struct (asm_t) that contains 59 fields requiring initialization. Easy enough, right?
Chris Eagle, The IDA Book 2nd Edition

Well, I’m not easily discouraged, so I read on and familiarized myself with the process of creating a processor module. I’m not going to describe the process here in detail because Chris does a great job in the book, but I’ll give a brief outline

IDA Processor Module

There are four components of a processor module. The “analyzer” parses the raw bits of the machine code and generates the information about an instruction. The “emulator” uses that information to help IDA with further analysis. For example, if an instruction references data, your module can tell IDA to look for data at that address. If the instruction performs a function call, your module can tell IDA to create a function there. Contrary to its name, it does not actually “emulate” the instruction set. The “outputter” does just that: given the data generated from the analyzer, it prints out the disassembly to the user. Finally, there’s the architectural information, which is not a component mentioned elsewhere but I consider it one. This is not code, but static structures that tell IDA information such as the names of the registers, the mnemonic of the instructions, alignments and so on.

CGEN

The binutils (objdump) for MeP are machine generated by CGEN. CGEN attempts to abstract away the task of writing CPU tools (assemblers, disassemblers, simulators, etc) into writing CPU definitions. The definitions describe the CPU (including the hardware elements, the instruction sets, operands, etc) with the Scheme language. CGEN takes the definition and outputs C/C++ code for all the needed CPU tools. Originally I wanted to avoid CGEN and just wrap the (generated) binutils code to an IDA module (à la Hexagon). In theory, your module does not have to follow the convention laid out above. You can make the analyzer record the raw bits, the emulator do nothing, and the outputter use binutils to generate a complete line and print it out. However, in doing so, you essentially lose most of the power of IDA (finding xrefs, stack variables, etc). It would also be a shame to not use all the information given to us by the CGEN CPU definitions. These definitions (in theory) are strong enough to generate RTL code to implement the processor, so we would like to give as much of this information to IDA as possible.

CGEN Generators

The generators themselves (CGEN docs refer to them as “applications”) are also written in Scheme, in the Guile dialect. I have never written a line of functional code before, so it took me a day to understand the relatively small codebase. CGEN has its own object system that they call COS. Everything defined in the CPU descriptions becomes objects, and each generator gives these objects methods to print themselves out. For example: the simulator might give the operand object a “generate code to get value” method. Then a call to generate the semantics of an instruction into C code would use these objects’ methods. Like a true software engineer, I cut out functions from the generators for the simulator, disassembler, and architecture description and stitched them together with my own code to generate various components of an IDA module.

The analyzer used, as its base, the simulator instruction decode generator. I had to modify CGEN to record the order of operands as specified by the instruction syntax (the only modification to CGEN itself, everything else are additions). Then, I overwrote the simulator’s method definitions for extracting the operands from the instruction with code to populate the “cmd” structure in IDA (which requires the operands to be ordered).

The emulator used the simulator model generator as its base and was the most difficult to write (in terms of code complexity). The one major issue here is that while the generated simulator expects code to run in order and have state information stored, the IDA emulator does not store state information and IDA does not guarantee that your emulator will be called in the order that the instructions appear. That means we cannot make any assumptions about the state and our emulator can only make decisions based on the instruction alone. Because we only care about finding data and code references with the emulator, we can make the following simplifications:

Any conditionals will have the condition stripped out and all paths will be taken
Using any values from registers will stop the emulator and return immediately
Setting any values to registers will evaluate the value but discard the result

The first point allows for xrefs to be found regardless of the condition. A conditional branch, for example, will allow for a code xref to be generated. The second point is there because we do not know the state, so any dependencies on a register value that’s not already stripped out will make finding a xref difficult. In theory, we can still find offset xrefs this way, but we would have to know that only additions/subtractions are used and only a single register is used and that adds a lot of complexity. The third point allows memory reads to be captured. With those simplifications in place, we know that any memory reads/writes and any PC reads/writes that are found without knowing the state can be turned into xrefs.

The outputter used the syntax parser (binutils’ opcode builder) as its base. It reads the instruction definition in order to output the right orderings of parenthesis, commas, and so on. I just replaced the generate print methods of the hardware objects to generate calls to IDA output functions.

Results

MeP executable loaded and recognized by IDA Pro. All the blue is a result of the auto-analysis.

At the most basic level, the generated modules outputs what you would expect from objdump. The analyzer will find the right type for the operands (if possible). The emulator tries to find all constant addresses and adds xrefs to them (code and data). The outputter will print all instructions correctly, and the operands with the right type/size/name if needed.

The main thing it doesn’t do right now is keeping track of the stack pointer. It also does not identify if branches are jumps or calls (requiring CF_CALL flag). It does not identify if an instruction does not continue flow (requiring the CF_STOP flag) (it’s actually trivial to add this, but harder to add it without introducing emulator code to other generators. Since it’s easy to identify the instructions by hand, I decided to leave it out).

Usage

Once you generate the IDA module components, you still need to manually write the processor_t structure, the notify() function (optional), and implement and special print functions (as defined by the CPU definition). Then you can copy the CGEN headers from binutils and compile it with the IDA SDK. Take a look at the MeP module as an example. You can reuse most of the non-generated code as-is (just change some strings and constants). If you run into any issues, feel free to contact me. I haven’t tested this on anything other than MeP because of laziness but I hope the code works more generally.

Downloads

The CGEN code is here and the Toshiba MeP module is here. The MeP module has basic stack tracking and call recognition added manually. When I have the time, I’ll port the rest of the CGEN supported modules that IDA does not support over.

↧

3DS Code Injection through “Loader”

March 28, 2016, 12:00 am

≫ Next: Cosmo3DS: The CFW nobody wanted

≪ Previous: CGEN for IDA Pro

I’ve seen many CFWs (custom firmware; actually they’re just modified firmwares) for the 3DS but there seems to be a lack of organization and design in most of them. I believe that without a proper framework for patching the system, writing mods for the 3DS is extremely difficult and usually requires an in depth knowledge of the system even to make simple modifications. So here I present a plan that I hope developers will pick up on and contribute to.

3DS System Firmware: An Introduction

The 3DS has a microkernel architecture. This means most functionalities are implemented as modules (running as separate processes) and IPC is used by modules to talk to one another. The kernel has limited functionality (basic memory management, threading and synchronization, and access to IO memory) and these basic functions are exposed to modules (where the main work is done). All modules run in userland, but the security is not equal. There are three categories of modules: system, applets, and applications. (This is not an official categorization, as different modules have permission settings based on service-call access, access to other modules, memory space, and so on. However, you can basically group modules by these security properties into these three groups.) System modules do the backend work: installing content, TLS stack, communication with Nintendo’s servers, etc. Applets are system software the user “sees”: home menu, notifications, Nintendo ID login, keyboard, etc. Applications are the rest: settings app, camera app, games, etc.

The reason why it’s hard to write patches or mods for the Nintendo 3DS is because of this modularization of the system. When we take control of the kernel, patching system features is not as simple as finding the right address in memory and overwriting it. Because everything is in different processes (and each have their own virtual memory space), you either have to recognize the kernel layout and parse kernel objects and map the right memory addresses to patch (NTR does this) or you have to stupidly scan all of physical memory (slow and high risk of false positives) for the right thing to patch. There’s additional complications if you’re using a hack that reboots the firmware or requires patching a process that’s currently not running: you have to know when the process launches and modify the code when it does. It is possible to insert hooks when you first gain control in order to run code when processes are launched. However, most of the CFWs out there does something insanely non-practical: it busy-loops in a separate thread and scans all of physical memory for the right thing to patch… ad infinitum.

So we want to do the following

Patch code for any module (system, applet, or application) we want easily
Ideally we can insert the hook early in the boot process so we can patch most modules BEFORE they start (an example would be to change the region of the system: once the CFG module is loaded, every other module gets the region information from CFG; so we can either patch CFG or patch every other module that uses it)
Everything should be fast and use minimal resources (linear scan of physical memory is NOT okay)
Ideally the code should not be overly dependent on offsets–you should not have to compile the code separately for each firmware version

3ds-boot Luckily there is a place in the boot-chain we can patch that does just that. The “loader” module is in charge of parsing the CXI executable format and loading the code into memory. That means that if we change “loader” to patch the code after it is read from the CXI file (and decompressed) but before it signals to the kernel to run (schedule) it, we can patch most modules in the system. The kernel bootstraps the five main modules: fs (file IO), loader (CXI loader), pm (process manager), sm (IPC), and pxi (talks to security processor) and then pm uses loader to start the rest of the modules. Other modules can start processes using pm, which in turn uses loader.

So loader is a good place to start if we want flexibility in patching the system and it fits the bill for what we require in loading patches, but there are some complications. First, the loader module does not have permission to access the SD card (where we want to use to store patches). Second, and more importantly, the code to read and parse patches from the SD card and run them when a process is being loaded seems like a good amount of code. If we wish to insert it into the loader module, we might have to move stuff around–a big pain with object code. So we need a better plan than just hijacking “loader.”

[2016/01/19 04:15:01] shinyquag: You could probably code your own “loader”, I’d wager it’s probably doable

Shinyquag would have won the wager because, luckily, loader is also the smallest module on the system. It has a couple of commands and no complicated logic. It’s small enough (20KB) to completely reverse in a day. Once we rewrite the module, adding functionality is a trivial matter. In fact, since Nintendo uses a lot of boilerplate code so our loader implementation would actually be even more lean. Finally, since this module has not changed much since 2.x, we can likely use it on future firmwares without much changes. However, just because it is doable, does not mean it will be easy…

Debugging without a Debugger

First I reversed the module and rewrote it in C. Then it got difficult. Since this was the first time anyone tried to build a sysmodule with the homebrew development environment, I had to make changes to many parts of the toolchain. Then there’s the problem of testing: we’re replacing a key component of the system very early in the boot process. There’s no iterative development since we cannot isolate the module without ending up rewriting the entire operating system. So, I just wrote the entire module in one sitting and hoped for the best. Unfortunately the best is a black screen and we don’t have access to anything like gdb or even “printf”. The only output we have is that the console boots or it does not boot. After re-reading my code over and over again and fixing bugs each time until I can’t spot anything else obvious, I’ve discovered, thanks to #3dsdev, that prayer was not the only means of debugging this.

Enter XDS, the 3DS emulator we all forgot about. ichfly managed to get it to boot the 3DS home menu a while ago, but development has since ceased (likely because of other more promising emulators in the works). There’s one thing XDS does that the other 3DS emulators does not do is that XDS fully emulates the 5 sysmodules bootstrapped by the kernel (the other guys use HLE). This is perfect because we only need to run “loader” in the boot process and let it interact with the other bootstrapped modules until enough modules are loaded by “loader” that we can consider it working. After porting XDS to OSX, I was able to make use of the generous debug logs (comparing the IPC requests/responses from my loader with stock) to find the final bugs in my loader and get it to boot on the 3DS.

Our loader can now correctly duplicate the functionality of the original but ultimately, we wish to add more features. The most important one is SD card reading. The way that access permissions work is that each module specifies in its CXI header what filesystem devices it has access to. However, in the case of bootstrapped modules, the access fields are ignored. So in order to get “loader” the right permissions to read the SD card, it is not enough to modify the CXI headers. We need to talk with the “fs” module and request the permissions directly (this is what “pm” does with other modules after loader puts the code into memory). This involves reversing enough of the “fs” module to figure out how the permissions are stored and then making the right requests to “fs” to give “loader” those permissions. Since XDS does not emulate the SD card, I had to find a different avenue to test my code. What I decided to do was modify the URL to the Themes Store in the Home Menu to point to my server + the return value of the file IO calls. Then I would load up the Themes Store to find the return value and look through my reversed object code to see what caused the error. This worked well because I did not want to add more code (e.g: framebuffer or networking) just for debugging. (Who debugs the debugger?) With SD card access, future debugging is much easier because we can just write logs to the SD card.

Loader Based CFWs

I believe that using a custom “loader” will make it much easier to write mods for the 3DS. We could see hacks such as a “Homebrew” button in the Home Menu or custom keyboards or custom themes outside of what Nintendo officially supports. We might also see hacks for games similar to HANS but without requiring access to a dump of the game. I hope 3DS developers will pick up on this and make cool mods and hacks for the system.

The code can be found here. Developers should modify patcher.c and replace it with their own implementation.

↧

Cosmo3DS: The CFW nobody wanted

March 28, 2016, 12:00 am

≫ Next: The 3DS Cryptosystem

≪ Previous: 3DS Code Injection through “Loader”

In the last article, I talked about my plan for creating 3DS mods. Now, I will put that plan to the test with a CFW (modified firmware) that nobody wants except me.

The idea for this CFW is that I want

Keep my 3DS on the hackable 9.2 firmware but still use the latest system software (emuNAND)
Play games region free right from the home menu
Change the system region without possibly bricking the device
Use the eShop with region changed systems

and only those things. Specifically, I don’t want to patch signatures that allow for installing pirated content (personal choice, I don’t care what you do). I don’t want threads running in the background to support fancy features such as replacing version strings or taking screenshots or in-game menus. So I created my own CFW in two components: a stripped down version of ReiNAND where everything except emuNAND is removed and an implementation of my custom loader which does all the patches. No threads. No glut.

I understand that I’m the only one who needs such a CFW and most people can make do with one of the ones out there. This is mostly written just for my own use and for me to test out my idea of patching the system. However, I’m happy that my N3DS is now a cosmopolitan.

If you want to use it, you need the CFW as well as the custom loader. Then, the important part is injecting the custom loader into the right FIRM file. (If you search for “reinand 3.1 firmware.bin pastebin” you should be able to find it)

↧

The 3DS Cryptosystem

April 6, 2016, 12:00 am

≫ Next: HENkaku: Vita homebrew for everyone

≪ Previous: Cosmo3DS: The CFW nobody wanted

I think one of the biggest challenge for system engineers is designing security. Recently, at the 32c3 conference, plutoo, derrek, and smea presented a series of hacks that completely defeated the security of the 3DS. As a result, people have implemented boot-time unsigned code execution (called “arm9loaderhax” in the 3DS community; other communities might relate this with ”untethered jailbreak” or “bootloader unlock”). What I want to do today is not to reveal anything novel, but look at the security of the 3DS as a whole and see what went wrong. In this deep dive, I will hypothesize the design decisions that led to the cryptosystem found on the 3DS. Then I will present the flaws that led to “arm9loaderhax.” Finally, I will summarize the findings and provide a few tips to fellow engineers in hopes that these kinds of mistakes will not be made again. (Extra details are provided in parenthesis, they are for people with deeper knowledge of the 3DS and are not required to understand the rest of the article.)

Preliminary

Let’s begin with the first question that ought to be asked when designing a secure system: what is the threat model? I believe that for Nintendo, it boils down to two things

System integrity: all code running on the system should be checked and signed by Nintendo. This ensures that users are safe from malicious code and that Nintendo gets licensing fees from games and applications.
Content protection (DRM): code and resources should not be extractable by the user. This protects license holders (creating trust in the system) and ensures intellectual property cannot be stolen by competitors.

System Integrity

SecureBoot On the 3DS (as with most systems), the boot ROM is the root of the trust hierarchy. The code there cannot be changed, and contains root cryptographic certificates (public keys).

Content Protection

Original 3DS Implementation

Pre70Encryption

First, let’s go over the crypto primitives we have on the 3DS. There is an on-chip hardware AES engine with 64 keyslots. When a key is written into a key-slot, it stays there until it’s either cleared or rewritten with another key. You can write a “normal key” into the key-slot and the AES operation works as usual (with the normal key being the key). The more interesting case is if you instead write two keys–KeyX and KeyY–into the slot. In that case, an on-chip key generator derives the “normal key.” However, the normal key is never revealed outside the AES engine! That means, even if we extract KeyX and KeyY (by dumping the code that sets them), we cannot find the normal key (… in theory, I’ll come back to this point later). There is also a hardware RSA engine that has its own key-slots and operates similarly (except there is no key generator).

7.x Implementation

70Encryption

New 3DS Implementation

Things Fall Apart

Key Generator

Postmortem

So what went wrong here? I want to summarize by listing some of the big mistakes Nintendo made that hopefully won’t be made by anyone again

Focusing on content protection instead of system integrity: you cannot have one without the other. Always implement exploit mitigation when you can!
Not having a contingency plan for when the system is hacked: no matter how secure you think your system is, you need to have a plan for when it’s broken. That way you don’t end up scrambling around and introducing more bugs.
Too much complexity: having lots of blocks that say “AES” and “RSA” in your plan might impress the boss, but it just adds to the attack surface. Always go with the simplest plan that secures against your threat model.
Do not change the trust hierarchy after production! Everything is built on that hierarchy. Adding/removing from it will break assumptions that you might not even be aware of.

Sources

↧

HENkaku: Vita homebrew for everyone

July 28, 2016, 12:00 am

≫ Next: HENkaku KOTH Challenge

≪ Previous: The 3DS Cryptosystem

Photo credits to Davee For the last couple of months, molecule (composed of I along with Davee, Proxima, and xyz) have been working hard to bring you an easy-to-use homebrew solution. The result is HENkaku (変革), the first HEN for the Vita. Since the release of Rejuvenate a year ago, developers have created tons of wonderful emulators, games, and apps for the Vita. Unfortunately, Rejuvenate is hard to set up, has many annoying limitations, and supports only an older firmware version. As a result, we recommended Rejuvenate only to developers who wish for an unofficial way to write apps for the Vita. When I first announced Rejuvenate and the call for an open toolchain, I emphasized that the SDK must be binary compatible with the Vita’s native loader. I published the specifications document and some gracious developers took up the task and wrote vita-toolchain. At the time, there were some pushback on why I was adamant on binary compatibility when the loader was also written by us. Well, the reason was this: developers (mostly) do not have to make any changes to their code. If your homebrew ran on Rejuvenate, it will run with HENkaku with minimal work. We ask developers to build their code with the latest toolchain now for HENkaku compatibility.

What is HENkaku?

HENkaku simply lets you install homebrew as bubbles in LiveArea. It is a native hack that disables the filesystem sandbox. It installs molecularShell, a fork of VitaShell that lets you access the memory card over FTP and install homebrew packages (which we create as VPK files). With vita-toolchain, developers have access to the same system features licensed developers have access to as well as undocumented features that licensed developers cannot use (including overclocking the processors).

What is it NOT?

It does not let you install or run Vita “backups”, warez, or any pirated content. It does not disable any DRM features. It does not let you decrypt encrypted games. Here’s my stance on this: I do not care one way or the other about piracy. I do not judge people who do pirate. I will not act as the police for pirates. However, I will personally not write any tools that aid in piracy. It is my choice just as it is the pirate’s choice to steal content.

FAQ

HENkaku will only work on 3.60 so we recommend that you update to it. We know that updating to 3.60 breaks many current Vita tweaks and hacks so here’s a short guide on what HENkaku replaces.

Should I update if I already have Rejuvenate?

Rejuvenate was a limited hack mainly designed for developers who wish to dip their toes in the water. Rejuvenate will not be supported anymore. HENkaku is superior in every way.

Should I update if I am using VHBL or PSP homebrew bubbles?

Yes. Since HENkaku gives homebrew full filesystem access, it is possible for a developer to create a “bubble creator” Vita homebrew that generates signed PSP homebrew bubbles. As a bonus, you don’t have to purchase any games to do this! We (molecule) will not provide support for PSP/PS1 related stuff though, so all this depends on someone else picking up the baton. If you are highly dependent on PSP homebrew, we suggest that you wait and not update past 3.60.

Should I update if I am using eCFW/ARK/TN-V/TN-X?

We do not support running PSP ISOs/backups. Your best option would be to wait and see what other developers do.

Should I update if I am using FailMail tricks to modify the system (whitelist, themes, etc)?

Yes. Vita homebrew will have full filesystem access so you can do app.db modifications as well as change whitelist files and so on. It is even possible for homebrew developers to write apps that do system mods for you so you don’t have to mess with sqlite at all.

Release

HENkaku will be released publicly on 07/29/2016 9:00AM UTC at https://henkaku.xyz/. It will only support firmware 3.60, so feel free to update to it now in preparation. Do not update past 3.60, the current firmware at the time of this writing.

↧

HENkaku KOTH Challenge

August 5, 2016, 12:00 am

≫ Next: Welcome Jekyll!

≪ Previous: HENkaku: Vita homebrew for everyone

We released HENkaku a week ago and were blown away by the reception. There has been over 25k unique installs and every day new homebrew are being announced. This is all thanks to those who contributed to the SDK project back when Rejuvenate was announced. Without a working toolchain for developers and a couple of working homebrews at the time of HENkaku’s launch, I doubt the reception would have been as popular.

Since the release, there have been a couple of questions we’ve been getting over and over again: When will this work on older firmware versions? How does HENkaku work? Where is the source code? I am going to address these questions in a bit. First, I want to thank Sony. It is common for hackers to laugh and poke fun at companies on the receiving end of hacks. But I think that’s unfair–security issues are a learning experience for all sides and we should all be thankful for it. For myself, I started my work on the Vita since its North America release in 2012. Although Davee beat me in hacking the PSP compatibility mode and getting ROP on WebKit, I was the first to run native code and dump the memory through PSM. Since then, Davee, Proxima, I, and later xyz (collectively “molecule”) have been working on the Vita on and off through the years. It is a tremendous learning experience both working with these smart individuals and getting my hands dirty with real world hacks. I think I owe a large portion of what I know about security due to my work on the Vita. It has, hands down, the most well designed security infrastructure of any consumer electronics device. In 2012, the iPhone, Android, and 3DS were no match. Even today, I think the Vita rivals the security of devices in the market.

There’s no single reason that led me to this conclusion, but there’s a couple of factors. First, the Vita has really good security-in-depth: multiple layers of abstraction, exploit mitigation, proper input sanitation, etc. Second, the software and firmware are mostly proprietary. Now that’s interesting because usually this is usually a point against security: trust the audited code of the open source community because rolling your own features will expose you to more bugs. However, in their case, this worked in Sony’s favor. They managed to not make any major security mistakes (I hypothesize that they hired an external security firm to audit their code) and this made it harder for us to put a foot in the door (because we have nothing to go on). While known WebKit exploits provide a common way into a new device, the Vita is unlike the PS4 where we can exploit known FreeBSD9 bugs on an older firmware to get higher privileges. The calculated risk they took in using proprietary code paid off since nobody has been able to decrypt their firmware files yet–and until someone does, it is unlikely that anyone would write any advanced exploit code. However, the risk is that if their code is indeed buggy, then once the floodgates open (someone finds a single exploit), there is no closing it (all the bugs will be found). Finally, the Vita is not exposed to hardware attacks simply because it would be too expensive to perform. Unlike the 3DS, the Vita’s RAM is on the same chip as the CPU so we cannot dump the contents through external hardware without access to a sophisticated lab and experienced technicians. That means as long as someone doesn’t dump the memory, because of the exploit mitigation features, it would be extremely difficult to find vulnerabilities and exploit them. However, it is very difficult to dump the memory because we do not have the funds to do it with hardware and must resort to exploiting the software. But then we have a chicken-and-egg problem.

All this is to say that we of team molecule wish to share our learning experience with the rest of you. We feel that the Vita has been neglected by hackers because of it’s unpopularity. However, they are missing out on a great challenge. The barrier of entry has been lowered since you can buy a PS TV for less than $50 USD. Don’t take my word for it, take a stab at it yourself and see if the device is really secure or if I’m just too inexperienced.

KOTH Challenge

CTF challenges are common in the hacking community. The goal is to hack a system in a controlled environment to get a “flag” and is a fun and educational experience. I highly recommend it to anyone interested in security. We are hosting a variation of this challenge. The first king-of-the-hill challenge will take place on Vita Island.

The idea is as follows: we (molecule) are currently the kings of the hill. You (challenger) can claim the throne by reversing our hack (HENkaku) and explaining it. Once we have been knocked off, we will post all our source code, build scripts, and a special bonus… We won’t say what it is yet, but it can be claimed by anyone who beats the challenge (not just the first) and is only valuable to people who have an interest in the Vita and Vita hacking. Since all the “prizes” are available to everyone and not just the first, we strongly encourage collaboration.

To make the challenge as interesting as possible, we used minimal obfuscation in our code. The goal isn’t to see who can write the best deobfuscation tool but to invite all the skilled security researchers of the world to look at what we believe is one of the most secure device on the market today. Therefore most of the difficulties in the challenge will be posed by the system and not us.

Releases

The source for HENkaku will be released in parts. Today, we released the files for offline hosting. This allows the challengers to start in reversing our code and also allows for anyone to mirror HENkaku. It also allows those with slow or intermittent internet access to use HENkaku.

Next, when someone completely reverses the second stage ROP and explains properly how it works, we will release the source code up to that point as it might aid in the next part. I don’t think it would take more than a couple of weeks for someone to get to this point. Some questions to be thinking about are: how do we manage to run unsigned code? do we get kernel access? if so, how? if not, what other ways are there?

Finally, when someone figures out the entire HENkaku installation process, we will release all our source and tools. I hope this would be done in no longer than a couple of months (if interest takes off) however it may take a year (if there is minimal interest). I’m not going to hold the HENkaku sources for hostage, so if there is no interest for a long time, I’ll reevaluate the options.

Until then, molecule will be taking a break from hacking for an indeterminate amount of time. We will still maintain HENkaku and post fixes from time to time. However, we will not be actively working so we won’t be able to port HENkaku to lower firmware versions. For me this is because the amount of free time I have is slowly diminishing and I have other things to do. I hope I have inspired others to take up on hacking the Vita so molecule won’t be the only people to hack it. My hope is that in a year, HENkaku would no longer be needed and molecule can quietly retire.

↧

Welcome Jekyll!

August 21, 2016, 12:00 am

≫ Next: HENkaku Update

≪ Previous: HENkaku KOTH Challenge

Welcome to the new yifan.lu! I just completed the biggest upgrade of this blog since its inception. What? You don’t notice any changes? That’s good. Although the changes are drastic (moving from WordPress in dynamic PHP to Jekyll in static HTML), the goal was to make the changes as transparent as possible. Please let me know if you notice anything broken so I can fix it.

The move wasn’t easy. I had eight years of posts (although, to be fair, the majority of it is worthless) and lots of customizations to WordPress that I had to migrate. I started with exitwp, which converts WordPress exports to Jekyll in the Markdown format. I had to modify it in order to migrate my custom post type (projects) to a Jekyll collection.

Next, I took minimal-mistakes as a template and converted my WordPress theme to Jekyll. Finally, I set up Staticman to support commenting without requiring a third party service and modified exitwp once more to convert the comments.

In the end, I wanted to switch to Jekyll for three main reasons

Markdown is sexy. Plus, built in syntax highlighting and the ability to track the history with git. Free hosting with Github Pages means I no longer have to deal with my free hosting provider which required ads on pages (no more ads now!) and always had random downtime. I tried paid hosting, but Wordpress’s RPC seemed to eat up my resources and I didn’t have the time to figure out how to stop it.
No PHP! No downtime from “hackers.” No need to update WordPress all the time. No need to have to set up a complex system of caches just to have reasonable speed.
HTTPS support. The blog always supported optional SSL from CloudFlare, but lots of links had HTTP hard coded in. There was no easy way to go in and change all the URLs all over the place. Now, if you access yifan.lu from HTTPS (which I recommend), you will not be prompted about insecure resources or stumble a link that downgrades your connection. Because I believe in choice, HTTPS is not enforced–you still have the option to go visit this blog through HTTP.

For archival purposes, here is my modded exitwp.py with some pretty bad hacks to convert my custom post type to a collection and comments to Staticman:

↧

HENkaku Update

August 27, 2016, 12:00 am

≫ Next: Yes, it’s a kernel exploit!

≪ Previous: Welcome Jekyll!

Version Screen It’s been almost a month since the release of HENkaku. We now have over 100,000 unique installs! (That number excludes re-installs required after rebooting.) To celebrate, we are pushing the third major update and it includes features that many users have been asking for. For the impatient, you can get it right now by rebooting your Vita and installing HENkaku from https://henkaku.xyz/.

Release 2

First, let’s recap the features we added since the initial release.

Dynarec support: Developers can generate ARM code and execute it directly. This aids in JIT engines for emulators.
Offline installer: HENkaku can now run without a network connection thanks to work by xyz. He also made a nice writeup that you should check out if you’re interested in the technical details.
VitaShell 0.7: When we originally released HENkaku, we forked VitaShell to molecularShell because we didn’t want to spend too much time writing our own file manager. Thanks to The_FloW, our changes have been merged to the official VitaShell codebase and we no longer need molecularShell. This release had added many new features and bug fixes to the shell.

Release 3

Now, the fun stuff. Today, we are pushing the next major update to HENkaku. The following features will be available the next time you run the online HENkaku installer. Self-hosters should get the changes from Github.

PSN spoofing: You can access PSN without updating to 3.61! Please continue reading for some important notes.
Safe homebrew support: Developers can optionally mark their homebrews as “safe” and it will not gain restricted API access. We highly recommend developers who are not using such features to update their packages as safe.
VitaShell 0.8: Read the release notes from The_FloW for the list of changes to VitaShell.
Version string: A callback to the PSP days where every hack would change the system version string. We do that too now (see the screenshot) so we can provide better support to our users.
Update blocking: In HENkaku mode, firmware updates using the official servers are blocked. That way you won’t accidentally install 3.61 and it won’t download in the background regardless of your settings.

Again, you will see these updates immediately the next time you install HENkaku if you use the online installer. If you use the offline installer, you need to update the payload. To do this, you need to temporarily enable network functionality on your Vita and open the offline installer bubble (NOT the Mail application). Install offline HENkaku again and re-disable network functions. The next time you run offline HENkaku with the Mail application, you should see the new payload. Because of how the offline installer works, this will not update VitaShell. You must run the online installer again to get the latest VitaShell. Optionally, you can download the VitaShell 0.8 VPK and install it.

PSN Spoofing

You can access PSN after enabling HENkaku on 3.60 but please heed this warning. Using hacks on your PlayStation console is (and always has been) against the PSN terms of service and is a ban-able offense. We have had hacks of various forms on the Vita for years and nobody has ever been banned and hopefully this will stay true in the future. However, because HENkaku has opened up the console more than any previous hacks, we might be at a point when Sony decides to enforce the PSN ToS and start banning people. That is why my personal recommendation is that you do not use PSN on your HENkaku enabled console even though we give you the option to (at your own risk). If you are paranoid, you may want to use only the offline installer so your Vita does not communicate with Sony’s servers. Or you may want to format your console in order for the console to not be associated with your main PSN account. Again, there has not been any confirmed bans nor have I heard of an incoming ban-wave, but my gut feeling is that you should be prepared.

The PSN spoofing is only temporary! The next time Sony releases an update, I predict that spoofing will become a lot more difficult to do. So make sure you download the games you want and update your PS+ licenses while you still can.

Safe Homebrew

HENkaku gives developers access to public APIs (the same APIs licensed developers use to make games), private APIs (hidden APIs that may be exposed to licensed developers in the future), and restricted APIs (APIs used internally by the operating system and is not meant for external developers to use). We have seen many cool homebrews that make use of restricted APIs. For example, RegistryEditor by some1 allows you to access hidden system settings not exposed in the Settings application. There are also experimental homebrew that allow you to modify system files (at your own risk) in order to change layouts and to find exploits. Unfortunately, it also allows for malicious developers to write homebrew that wipes your memory card or (although we have not seen such an application yet) even brick your console. We have always warned the community to be vigilant, but from a design perspective, it does not make sense to give every homebrew full access.

Therefore we added the option for developers to specify their homebrew as “safe” and not get access to restricted APIs and not disable the filesystem sandbox. All you have to do is download the latest toolchain and change the call to vita-make-fself in your Makefile to vita-make-fself -s. Safe homebrews can still access all public APIs and private APIs (so you still have dynarec, changing clock speed, etc) as well as specific directories on the memory card, but there is no access to restricted APIs (registry, system partitions, etc).

Most homebrews would already be considered “safe” (you would know if you used a restricted API). However, the big catch is that ux0: (memory card) access is now restricted to ux0:data (for arbitrary data), app0: (a mount of your application directory at ux0:app/TITLEID), and savedata0: (a mount of your application save). There is no direct access to ux0:app/TITLEID since safe homebrews are sandboxed. If you wish to use and store custom data on the memory card, please use ux0:data as it can be accessed by all applications and is not deleted when your bubble is deleted (useful for emulators).

So what about unsafe homebrew? HENkaku still supports running them, but VitaShell will now throw a nice and scary warning message whenever the user attempts to install an unsafe homebrew. The hope is that if someone decides to package up a bricking malware as a “game”, the user can be alerted because games wouldn’t need extended permissions. However, in order for this warning system to work, developers of safe homebrew must update their current packages to be safe. We do not want to numb users to the warning as all “legacy” applications are currently considered “unsafe.”

To recap, if you do nothing, your .vpk is by default considered to be unsafe and can still have access to restricted APIs and all filesystems. If you do not wish to have the unsafe message pop up every time a user installs your vpk, then you should download the latest toolchain and change the call to vita-make-fself in your Makefile to vita-make-fself -s. All current homebrew are still supported and still work and there are no changes to the behavior of anything already installed.

On Piracy

Now for the elephant in the room. For those who aren’t familiar, I recommend reading my reply on how I approach piracy. The short of it is that, as I’ve stated countless times, I do not care if you pirate games or not. I personally will not write piracy-enabling or piracy-aiding tools, but if you do it, then that’s your business and not mine. We did not add DRM/anti-piracy code nor did we add anti-DRM/piracy code. The whole point of HENkaku is owning your own device. Sony does not get to tell you what you can or cannot do with the device you bought. Same with molecule. That is what I believe. I’m writing this because I have been receiving a lot of harassment lately for things I have not said and for ideals I do not have. Please do not waste both of our time trying to convince me that piracy is/is not bad.

On KOTH Challenge

We have seen many great progress on the KOTH challenge to reverse HENkaku. The first stage has been reversed, and as promised, xyz did an amazing writeup that filled in the rest of the details. We have seen participants piecing together stage two and I think we can expect some of them to talk publicly about it soon. Once that happens, more information will be revealed by us. I am happy to hear that the participants are really enjoying the challenge and am even more delighted to hear that non-participants are really not enjoying the challenge 😉.

↧

Yes, it’s a kernel exploit!

August 28, 2016, 12:00 am

≫ Next: Please don’t let Trump win

≪ Previous: HENkaku Update

When HENkaku came out exactly a month ago from today, we posed a challenge to the scene to reverse our hack. The reason for this decision rather than to just post our writeups immediately and take all the limelight is because we believe that the Vita is a device that is so unique in its security features that we won’t be doing it proper justice by just revealing the flaws. We want people to know about how good the security is rather than just point out the mistakes made. In doing so, we hoped that hackers new and old will take the challenge and have fun with it. Today, one such challenger by the name of st4rk completed the second third of the challenge. He has written a detailed post on how he reversed the payload and I recommend you read it right now.

My Comments

Stage 1 of HENkaku was a previously patched but undocumented WebKit exploit. Many people including st4rk and H figured out most of the details within days. xyz of molecule then posted a complete writeup and focus shifted to stage 2 and all was quiet for weeks. Now that st4rk has published his writeup, I want to add some comments from our side.

The best way to understand what each part does is debugging and know well about the Vita’s security measures.

This was a smart and unique way of approaching the problem. Instead of starting from the bottom up (look at the dumps and try to figure out what each set of bytes means) he looked from the top down. Sometimes such change in perspective really help in clearing a path to the solution. By asking questions like “how did they get past the KASLR?” or “what did they do to put code into the kernel?” st4rk was able to rebuild the exploit piece by piece. This is what “reverse engineering” truly means.

The stage2 is a huge rop-chain and to solve this problem I written a python script using capstone to help me to deal with it, you can find it here

The rop-chain is actually generated from roptool by Davee of molecule. It’s an amazing piece of work that lets you turn Turing-complete code into ROP chains. It’s no surprise that decomposing the chain would require an automated tool.

The first time that I read it, it didn’t make any sense, first because we don’t have a “molecule0” device on PS Vita and second that I didn’t know anything about the SceIoDevCtl. I read the vitasdk and psp2sdk to give me a good base and decided to write a ROP code for my 1.50 Vita and test the Syscalls.

Smart thinking in using a low firmware version Vita. There is no kernel ASLR and stack canaries before firmware 1.80. That’s why I recommend it for hackers and aspiring hackers.

This is our first kernel exploit, it’s used to defeat Kernel ASLR and to write our Kernel ROP chain.

Yup. The first kernel exploit we use is an information leak in sceIoDevCtl. The function copies the 0x400 bytes from the kernel stack into the user output buffer without checking the size field. That means if we call some random function that leaves pointers in the stack (in this case, a call to sceIoOpen), the next call to sceIoDevCtl does not clear it and copies it back to user. This is enough to defeat kernel ASLR.

This vulnerability was found back in late 2014 by our very own Davee. We finally made use of it years later.

The Kernel exploit is in the the module that handle the SceNet functions (it’s the SceNetPs).

Ah, the exploit that made all of this possible. A use-after-free in the socket handling function triggered by a race condition. st4rk managed to get as much information as he could without seeing the code, but the actual exploit is a complex and truly marvelous piece of work that this margin is too narrow to contain. This vulnerability was found and exploited by xyz earlier this year and is what sparked us to create HENkaku. He will be posting a detailed explanation of this exploit later this week.

Stage 3

Now things get truly interesting. Stage 3 is the final part of the exploit and is, what I believe, the hardest part to reverse. Stage 3 is a kernel ROP chain that executes the code to make all the HENkaku patches. Typically, to reverse a ROP chain, you would dump the code memory and reconstruct the gadgets in order to analyze the chain. Indeed, this is what st4rk did for our userland ROP code. However, we did not release any exploit that leaks arbitrary kernel memory. The sceIoDevCtl vulnerability can only leak kernel stack memory–which in this case is just the ROP chain that we inject. So how would you crack this code? You can either

Find a Vita vulnerability that lets you dump kernel memory
Find a novel way of cracking ROP chains blind

In either cases, everybody wins. If you find another kernel exploit, it would be groundwork for the next Vita hack. Since our sceIoDevCtl is patched now, we have no way of defeating kernel ASLR on newer firmwares–which is a prerequisite for any hack. If you manage to crack the ROP chain blind, well, for one you are definitely smarter than me. Of all the members of molecule, I am the only one who does not think the task is impossible. We honestly cannot think of a way of cracking the ROP chain blind. Davee claims it is impossible and xyz thinks we should provide more help. However, I think it is arrogant to assume that nobody can do it just because we can’t do it. The king-of-the-hill challenge really is about finding people better than ourselves to both collaborate with and to continue the work.

Back when I was working on reversing Gateway, I saw some of the most ingenuous hackers coming up with novel ways of reversing Gateway’s (in comparison: simple) ROP chain. I learned a lot from these people and I am hoping that there are more of them out there to impress me. That is why I pose this impossible challenge.

What’s Next

Today, as promised, we are releasing the full source of stage 1 and 2 written in roptool. This will allow you to make easy changes to the exploit code as well as test changes to the binary payload for your reversing endeavor. I can’t wait to see what you guys come up with next!

↧

Please don’t let Trump win

September 12, 2016, 12:00 am

≫ Next: Array Shuffling with Additive Generators

≪ Previous: Yes, it’s a kernel exploit!

To the Trump voter,

I can’t express how much I don’t want to be here right now making a post on American politics. A strong belief I live my life by is to respect other’s views on things and not try to change them as long as it does not harm me. That last point is why I am posting this. If Trump is elected in November, it will hurt me directly along with millions of other Americans. I am not using “hurt” in a hyperbolic way–I’m not talking about taxes, or feelings, or opinions. I am saying that by placing a xenophobe into the highest elected office in the most powerful nation, Trump voters will be responsible for making me, an immigrant, scared of my future. But more than me, I feel scared for my Muslim friends, my Black friends, and my Hispanic friends. If you think I am being dramatic, just listen to the man yourself. Listen to what he has to say about people like us–people he think are outsiders. If you do not agree with his foreign or domestic policies but are voting because of other reasons, that is your choice. But do not do it with a clean conscious. A lot of innocent people will suffer when this man becomes president. Again, I do not mean this as a hyperbole. When Bush was president, the other side complained about the war and how many people suffered. When Obama was president, the other side complained about the economy and how many people suffered. All this partisanship makes us numb to emotional appeals. However, as bad as the Democrats think of Bush or the Republicans think of Obama, neither of these men have made it a campaign promise to directly disenfranchise a good portion of the population. No, we have wasted our words complaining about the Bogeyman we wanted to create and now that the real Bogeyman has come, we are speechless.

Trump is a bully. I know because I have been bullied so often in my life. The bullies never gets to me though. What always gets to me–what always tears me apart inside–are the bully’s followers. It’s not the mean things they say, it’s the laughter that echoes around you. Much of my life is dedicated to reading, video games, and computers. I am proud of that choice to this day. The bully is into flashy objects, pretty girls, and his own superiority. We come from different background and he loath people who are different. I love science, math, the bizarre, and the unknown. The bully is proud of his ignorance. He makes it a point that he hates science. That math is useless. That normal is good and you’re not normal. You’re an outsider and we don’t like you. I aspire to learn and create. The bully aspires to take and destroy. To this day, I wonder why so many people follow the bully. Why so many good and decent people believe what he peddles. Why it is “cool” to be anti-intellectual. Why he always gets what he wants. If you are a follower of my blog, regardless of race, gender, or ethnicity, I think you understand these feelings of frustration at someone who you know is evil but everyone else loves them. Someone who opposes everything you believe in and the world rewards him for it. Please do not let the bully win again.

But what about Clinton? Isn’t she evil too? Aren’t we having to choose between “giant douche” and “turd sandwich”? Let me first say that the only insult you can throw at me worse than calling me a Republican is calling me a Democrat. I completely agree that the American political system is a mess and the differences between the two parties is an illusion. I believe that campaigns are just the puppet-show designed to make the “Fourth Estate” money and the wizard behind the curtain is the American Corporation. That is my political view and is the reason why I usually completely ignore politics. However, it is again a mistake to take this election to be just like any other game of charade. Instead of “giant douche” and “turd sandwich” we have “turd sandwich” and “the possibility of thermo-nuclear war.” As bad as it is to eat a turd sandwich, I would rather do that then die in a megaton of radiation because a guy insulted the president on Twitter. We are taught by the media that false equivalence is the same as neutrality. It’s not. The criticisms against Trump is magnitudes worse than the criticisms against Clinton. However, we are led to believe that both candidates are “controversial” and therefore have a hard decision to make. Let’s actually consider all the main criticisms of Clinton in their worse incarnation and assume these criticisms are completely and absolutely true, then we have someone who:

Messed up the security at Benghazi resulting in the loss of several American lives and then lied to cover it up.
Took money from big bankers and corporations and the money influenced key policy decisions.
Storied confidential emails on a non-secure personal server resulting in national secrets being leaked. Then lied to cover up the mistakes.
Always lies and says exactly what people want to hear.

Okay, that’s pretty bad. But now let’s consider just four of over a dozen main criticisms of Trump

Has zero experience or knowledge of international policies.
Has zero experience of domestic policies and is proud of it. Does not understand how the economy works, making basic mistakes like assuming the country can be run like a company. Also cannot run a company successfully either.
Openly incited a foreign power to hack the US and influence the election. Is a pathological liar since he would lie about small, insignificant things like being his own agent.
Assuming he tells the truth, he
- Criticized a judge for his ability to rule fairly based on his race
- Made fun of a handicapped reporter
- Disrespects and criticizes war heros, war veterans, and their families
- Says he could get away with murder
- etc…

That is just horrendous. So let’s take a score. On one hand, Clinton is accused of screwing up foreign policy. However, it cannot be argued that she has years of experience. If I tell you that you need surgery and can choose between a surgeon who’s graduated from medical school and operated for years but messed up once. Or have the surgery performed a businessman who has never done an operation before but claims that he’s applied band-aids to himself before so he basically has the experience. Who would you choose?

Now let’s say all the critics are right and Clinton makes decisions that benefit the rich more than the working class. At least she knows what she is doing even if you believe it is screwing us over. At least she knows basic things like “I shouldn’t destroy the economy because the rich will suffer as well as the poor.” If, on the other hand, we say “fuck the rich, let’s put a shit throwing monkey in charge of the economy,” then maybe, just maybe the poor will get a better deal. Or, more likely, the economy will be covered in shit. And I’m completely glossing over the fact that Trump doesn’t even know the Constitution well enough to differentiate between articles and amendments. The constitution that he would have to be executing.

What about Clinton’s emails? What’s “bad” about the whole email scandal is that it accuses Clinton of being either irresponsible or malicious. And some argue that the way it’s handled shows that the system is tipped in favor of insiders. Okay, valid point. But on the other hand would you say Trump is responsible and virtuous? That it is responsible to incite a foreign nation to hack our election? That the system is tipped against him, a rich real estate mogul whose parents are also rich real estate moguls? I have minimum respect for a man who has lost more money than I will ever make. It’s not just the pot calling the kettle black. It’s as if the kettle has one dark spot and the pot is covered in soot.

Finally we have the accusation that Clinton is a liar. Regardless of the validity of that claim, I would rather be deceived by a smooth talking con-artist than let an ugly monster do exactly what he says. Okay, you might argue, Trump doesn’t mean exactly what he says. The media spins his words to antagonize him. First of all, Trump does not have a sophisticated repertoire of complex double-meanings and tongue-in-cheek humor. He’s not fucking Nietzsche, who requires hours of close reading to parse a single sentence. Even if he is joking about matters like asking for the assassination his political opponent or about the menstrual cycles of a reporter, that sort of brash, fratty, humor should not represent the nation. Some people criticize Obama for being a “comedian in chief” and that Obama misspeaks and sometimes forgets to mention “God” in a speech and they take issue with how he presents our nation to the world. Well I, for one, would rather have a “comedian in chief” than a “clown in chief.” How can we be taken seriously at a nuclear talk when our leader makes a racist joke? How can we participate in trade deals when our leader would not know what a word means on the treaty? It is one thing for an “outsider” to take office. It is a completely different thing to pick someone off the streets and ask them to run the wealthiest country in the world. Being a “politician” is a dirty word these days, but to be a politician is to have skills of persuasion and diplomacy to convince other parties to act in America’s best interest. In meetings with other world leaders, if you make a comment without thinking or you say something out of anger, it may irreparably damage international relationships. Relationships that, like it or not, is the foundation of our economy as well as our security. Even if you were truly misunderstood, you do not get to hold a press conference clarifying what you meant and all is forgiven. You do not get to cry on Twitter about how the whole world is out to get you. There are certain procedures and protocols that require experience to master, and not following them might damage our reputation to the world–as it already has with this election cycle. It is the opposite of making America great. I would rather elect a skilled liar who can bluff America’s best interest in the international stage than someone who “tells it like it is” and throws a tantrum if the other party fails to be convinced.

If Trump is unelectable and Clinton is the “lesser of two evils,” why not vote for a third party? I have not researched enough into Jill Stein or Gary Johnson to form a proper opinion but I know that they are more likely to take votes away from Clinton than from Trump. In that respect, to stop Trump, it would be more advantageous to vote for Clinton than a third party. Famous computer scientist Scott Aaronson made a proposal for vote swapping third party votes in non-swing states for Clinton votes in swing states. That would help the third party get federal funds while still stopping a maniac from winning office. Lastly, if you decide to protest by not voting, then if Trump wins know that you will be part of the good [wo]men who do nothing.

This last part is for those of you who want to vote Trump just to mix things up. Maybe you believe Trump isn’t the best candidate but fuck it, you’re tired of the system or you’re tired of life and maybe if the world burns, there’ll at least be fireworks. Maybe you think America deserves to reap what it sows? As Stephen King puts it: “Conservatives who for 8 years sowed the dragon’s teeth of partisan politics are horrified to discover they have grown an actual dragon.” I understand this sentiment, I used to be a Nihilist too. If you truly believe this, why aren’t you out in the world setting everything on fire? Because as much as you believe in the Chaotic, part of you is anchored in the Lawful. And while the game may seem dull or even painful now, it is rather immature to flip the board and force everyone to start over. Especially if in doing so, you have to look in your friend’s eyes and tell her “your life is going to be more miserable because I want to watch the world burn.”

All this is to say that you should vote for Clinton. Aaronson claimed that he

unhesitatingly endorses Hillary Clinton for president—and indeed, would continue to endorse Hillary if her next policy position was “eliminate all quantum computing research, except for that aiming to prove NP⊆BQP using D-Wave machines.”

So in a similar fashion, I endorse Hillary Clinton for president. I personally don’t believe she is a liar (more than the healthy dose of lying we expect from politicians) or that she is incompetent or that she is in the pocket if big donors. However, even if she was, I still endorse her because fuck Donald Trump. Fuck him and all that he stands for. I will vote for Hilary Clinton even if her next policy was to “ban all console hacking except to provide support to backup loaders” and claim “the Vita has a 2GHz CPU.”

I hope you hated reading this as much as I hated writing this.

Yifan Lu

P.S: This is the first and only comment I will make on this subject. I will not reply to comments/criticisms here or any other public space except to correct factual mistakes. I will not argue with you about big stupid things like American politics. I will only argue about small stupid things like video game hacking.

↧

Array Shuffling with Additive Generators

October 2, 2016, 12:00 am

≫ Next: HENkaku KOTH Solved

≪ Previous: Please don’t let Trump win

I was working on unit tests for a project and I wanted a fast and easy way to create random permutations of a range of numbers. That reminded me of some things I’ve learned in elementary number theory that I thought I might share with you. There is nothing new or non-trivial in this post, but I am always excited about sharing a concrete application for abstract mathematics.

Let’s start with the code first.

/**
 * @brief      Creates a random permutation of integers 0..limit-2
 *
 *             `limit` MUST BE PRIME! `ordering` is an array of size limit-1.
 *
 * @param[in]  limit     The limit (MUST BE PRIME). Technically another 
 *                       constraint is limit > 0 but 0 is not prime ;)
 * @param[out] ordering  An array of permutated indexes uniformly distributed
 */staticinlinevoidpermute_index(intlimit,intordering[limit-1]){ordering[0]=rand()%(limit-1));for(inti=1;i<limit-1;i++){ordering[i]=(ordering[i-1]+ordering[0]+1)%limit;}}

This function picks, uniformly at random, an array that is a permutation of . Additionally, it has the following nice properties:

The algorithm is optimal in both time and space complexity. Any algorithm that permutes elements must write out all the element. That means we take time and scratch space. We can also modify this into a streaming algorithm. For example, if we are using Python, we can use yield to permute an extremely large data set without having to store “seen” elements.
The marginal distribution is uniform (assuming rand() is uniform, which isn’t technically true). This will be proven in the end. An important note: the permutation distribution is not uniform. This will not generate all possible permutations.
The code is small and simple enough to copy-paste. You can easily modify it to permute any array of items (not just sequential numbers).

The runtime proof is trivial and will be omitted (it’s a single for loop). The rest of this post will be a proof of correctness and assumes no previous knowledge of group theory.

First, some preliminaries. A group is simply a set of elements and an operation that works on those elements that satisfies some basic properties. For example, classic addition over real numbers would be considered a group. That is because when you add any two real numbers, you get another real number (this is called the closure property). Multiplication over real numbers is also a group. Addition over only integers is also a group. There are three other properties that are necessary to make an operation and a set a group. They are associativity (ex: ), existence of an identity (ex: makes an identity), and invertibility (ex: and are inverses in the additive group of integers because where is the identity). As another example, let’s look at the group of real numbers under multiplication. We have closure because any two numbers will multiply to another number. Associativity of multiplication is something you learned in grade school. The identity of real number under multiplication is because . Finally, for any real number, the inverse is also a real number. For example , which is the identity. A classic non-group is the integers over multiplication because most integers do not have an integral inverse.

All of this may seem basic and you might be asking “what is the point?” Well, the idea is that we intuitively “know” how addition and multiplication works. We “know” how to add numbers and we “know” that the number we add up to is still a number. However, by formalizing our intuitions into something more solid, we are able to build up ideas that may not be as obvious. There are many neat and bizarre groups that mathematicians study, but this post will focus on the next most basic group: modular arithmetic. If you are a programmer, chances are that you have worked with modular arithmetic. You might have written code like int i = (x + y) % 5; // choose one of 5 slots. That is modular arithmetic. In math, we typically do not use the percent symbol but instead write to denote addition modulo 5. The easy way to think about modular arithmetic is in terms of remainders. because and is 1 with a remainder of 2.

Here’s the key observation: for any integer, , we can create a group for the modular arithmetic over $q$ (you can easily confirm the four properties to yourself). In fact this group is called a cyclic group because you only need one element (in this case, the number ) to get every other element by repeatedly applying the group operation (addition). As an example, consider the group . To get every element, we have ,, and so on. We say that generates.

This applies to any integer, but now let’s just consider prime numbers, . From above, we know that is cyclic. That means it can be generated by . However, it can be generated by as well. Here’s an example for : , , , , . But, it’s not just . and also generate ! You should try it out yourself. This leads us to our first theorem.

Theorem 1 For any prime number, , any integer where will generate .

Before proving that, we will prove the following lemma that will be helpful in the theorem.

Lemma 1 For any prime number, and any integer , there exists a where and can be produced only by adding to itself.

Proof. We showed above that this works so we will focus on the case of . Take and add it times until the first time we go over . That means and . Since is prime, it must be that also and therefore . Let . (I’m being sloppy with the math here by assuming as the shorthand for adding to itself for times). This should be the first time we see . Why is this? Well, let’s look at the division/remainder definition of modular arithmetic again. We see that is . Since the smallest number of times we add before reaching a number larger than , we know that (otherwise which contradicts our construction of ). So we have

This is just a linear equation with known variables , , and . Which means there exists at most one solution for and since is prime.

What’s the upshot? We now have two unique elements! Only more to go. We can build the elements inductively.

Lemma 2 Given unique elements with $n < p$, we can find another unique element using only the group operation on elements in .

Proof. Note that Lemma 1 is the base case with . Now we prove the inductive case. Consider some fixed . If, by contradiction, for all we have then it must be the case that

which comes from summing together all the relations. However, this means

contradicting our assumption that is prime. So there must be some .

Theorem 1 directly follows from this. An exercise to the reader is to show how this reduces to the algorithm presented at the beginning. What’s left is to show that by picking the generator uniformly, we can get a random permutation of the elements.

Theorem 2 Let be the elements for group . If we draw uniformly and let , then for some random

Proof. Since generates , returns a unique value for each unique .

I hope that you caught a glimpse of the wonderful world of number theory and how it might help you with coding. As an exercise to the reader, extend the algorithm and proof to work with any number (not just prime). You might even be able to apply the famous Chinese Remainder Theorem! Don’t feel shy to point out the inevitable mistakes in this post. I am also curious if anyone has a simpler (and still elementary) proof. As always, you can put MathJax in comments with something like $$a^2+b^2=c^2$$.

↧

HENkaku KOTH Solved

October 20, 2016, 12:00 am

≫ Next: taiHEN: CFW Framework for PS Vita

≪ Previous: Array Shuffling with Additive Generators

When HENkaku was first released, we posed to the community the KOTH challenge to get more hackers interested in the Vita. This week, two individuals have separately completed the challenge and are the new kings of Vita hacking! Mike H. and st4rk both proved that they have the final encryption key, showing that they solved the kernel ROP chain. I highly recommend reading their respective posts as they give some great insight into how hacking works. I also know of a third group who might have also completed the challenge but wishes to keep quiet for now. Congratuations to them too!

The Prize

All participants have been given the prize for solving the challenge and in a short time, everyone will get a peek too. Molecule has gotten quite lazy since the release of HENkaku and since we underestimated the amount of time it would take for the challenge to be completed, we are only midway through polishing up the source code for release. The participants and I have agreed to not release anything until the end of the month. As a bonus for waiting, the source will not be for HENkaku as you know it today–it will be for the major update we have been working on. Stay tuned for more details! In the meantime, it would be fun to see if anyone can run their own kernel payload with all the information out today–it should be possible!

HENkaku Kernel ROP

The rest of this post is dedicated to my own explanation in creating the ROP chain for the challenge. I believe it is the most complex ROP chain ever written (although I haven’t seen too many ROP chains that does work beyond copying code and running it). Enjoy!

Introduction

First we’ll define a security model for our system. We assume that code in kernel memory is “secure” and our main asset (what we are trying to protect) is kernel code. It is important to note that we are NOT trying to protect the kernel exploit. In fact, we assume the kernel vulnerability, userland exploit, and all userland ROP code is fully understood by the adversary. This is because, even with obfuscation, it is only a matter of time before one can figure out the vulnerability. However, we observe that knowing the vulnerability is useless without a method of exploiting it. Since our vulnerability allows us to control code flow but does not defeat data execution protection, it is useless to an adversary who do not possess kernel code.

This also means that Sony is not an adversary that our model defends against. Since Sony has all the code and likely debug units, it would not be feasible to write a ROP chain that can be obfuscated against Sony. The key idea is this: we only need to protect against reading out of kernel code. That means without either a clever way of figuring out our ROP chain or a separate kernel read exploit, the adversary cannot make use of our vulnerability.

Security Model

We will actually consider two security models: the weak model assumes that the adversary does not know any of the gadgets used in our ROP chain and the strong model assumes that the adversary knows all of the gadgets used in our ROP chain (and nothing else, or they can trivially just write their own ROP chain). The strong model is the more interesting case. We are trying to secure kernel code from prying eyes but our ROP chain actually leaks a lot of information about kernel code. We know, for example, a lower bound on the size of the code. We know there are some regions that are Thumb code (LSB of the addresses). From the (in-)frequency and distribution of gadgets, we can guess what are function calls and what are helper gadgets. If we identify gadgets that are function calls, we can also guess at the number of arguments they take and any constants that are passed as arguments. The list goes on and on. So our strong security assumption takes the worst case: the adversary knows exactly what each of the gadgets does.

Weak Assumption

To protect against the weak security assumption, we did two things. First we obfuscated any useful constants. A keen observer may see 256 and guess that AES-256 encryption is used. Or, if they are knowledgeable at Vita development, they may see 0x1020D006 and wonder if it is a memory type passed to sceKernelAllocMemBlock. That’s why we hid most constants inside gadgets. The 256, for example, becomes

  .word BASE+0x000232eb @ movs r0, #8 @ bx lr
  ...
  .word BASE+0x0001b571 @ lsls r2, r0, #5 @ bx lr

and the “size” parameter for the memory allocation becomes

  .word BASE+0x00001e43 @ and r2, r2, #0xf0000 ...

where r2 was used for something else earlier in execution. It was harder to craft 0x1020D006 so we had to settle with

  .word BASE+0x00000031 @ pop {r0, pc}
  .word      0x08106803 @ r0 = 0x8106803
  .word BASE+0x0001eff1 @ lsls r0, r0, #1 ...

because at the end of the day, this will not protect against smarter adversaries and is only meant to slow down analysis. There are some cases where we get obfuscation for free just because of the trickiness of writing ROP chains:

  .word BASE+0x0001f2b1 @ r5 = eor sb, r0, #0x40 ...

This was the only way to move a value from R0 to SB (which we want for storage because it is callee saved). We store the counter for the decrypt loop in SB so our constant for the payload size (used in a compare) is XORed with 0x40.

  .word (ENC_PAYLOAD_SIZE ^ 0x40) @ r4 = (payload size) ^ 0x40
  .word BASE+0x00022a49 @ subs r0, r0, r4 @ pop {r4, pc}
  .word      0xDEADBEEF @ r4 = dummy
  .word BASE+0x00003d73 @ ite ne @ movne r0, r3 @ moveq r0, #0 @ bx lr

The second thing we did was to use the dummy data for obfuscation. At times we find the need for gadgets such as

  .word BASE+0x00000ce3 @ pop {r4, r5, r6, r7, pc}
  .word      0xDEADBEEF @ r4 = dummy
  .word      0xDEADBEEF @ r5 = dummy
  .word      0xDEADBEEF @ r6 = dummy
  .word BASE+0x0000587f @ r7 = movs r2, r0 @ pop {r4, pc}

in order to set register R7. In fact most gadgets ends up popping data into registers we don’t care about. This is one source of difficulty in writing ROP chains: we need gadgets that don’t mangle registers we DO care about. About half the data in our ROP chain is junk and we can take advantage of that. If we write the address of gadgets into these junk fields, then the adversary must differentiate between gadgets and junk. We make this especially hard by training a Markov chain to generate junk data. This means the distribution of gadgets is roughly the same before and after obfuscation and since we consider bigrams, the probability that one gadget is used after another is about the same for junk fields. We do this because the adversary may use statistical heuristics to deobfuscate the ROP chain.

In the end, our obfuscation can be defeated by a brute force attack to find the junk data. You can take one field at a time and try to change it to a random value and see if the chain still executes successfully (this may be repeated for more confidence). Since there are only about 200 WORDs of data, this should be feasible (although a bit painful).

Strong Assumption

The strong security assumption poses a much more difficult problem. We need to satisfy both the following

The ROP chain must be useful and therefore must eventually execute ARM code.
It should be non-malleable so our chain cannot be taken apart and pieced together by the adversary to break our security model. This is especially difficult because by construction, ROP is malleable. Our goal is to only use a subset of gadgets that don’t have universal fit. We will call a gadget non-degenerate if its usefulness is dependent on its placement in the chain.

Here is an outline of what the payload has to do in order to be useful: first allocate a block of kernel RW memory. Then we have to get the address to that block (the Vita always requires the user to do this manually). Next, we have to set up the AES engine. Then we need to decrypt our payload using the AES engine in blocks. Finally, we have to remap the memory as RX and jump to it.

We have to protect against the obvious attacks such as removing the decryption step or changing the encryption key. There are also less obvious attacks such as changing the block for the base address or to be remapped. In the next few sections, we will describe each step of the payload and the tricks used in detail.

Design

The main design decision is to not use any LDR/STR gadgets (other than with SP as the source). This is because of the spirit of goal #2, we do not want to make it easy to reuse the gadgets. LDR/STR is likely to be degenerate. If the adversary is able to get an arbitrary read from kernel memory, it is game over. This filters out a large chunk of potential gadgets we can use. We also do not want to limit the size of the second stage payload (therefore introducing a third stage) because each loader adds to the attack surface. This means we need to have a loop in ROP. This is not easy because we need to conditionally manipulate the stack pointer and also keep variables in harder to use registers such as R8 or LR. From experience, the higher the register, the rarer the gadget (except for R12). You will find that most gadgets operate with R0-R3 and R12 because those are used as scratch registers and for parameter passing in the ABI. R4-R6 are also more common because register allocation in GCC starts at lower registers. This is a double edged sword: if we use higher registers for saving loop variables, then we are less likely to limit the gadgets we can use inside the loop. However, it also means that sometimes we have to get creative to move data around these registers (for example, abusing a EOR instruction by calling it twice on the SB register).

Allocate Memory

First, we need to call sceKernelAllocMemBlockForKernel(name = "Magic", type = 0x1020D006, size = 0xA0000, opt = NULL). This is fairly straightforward. The name is arbitrary so we just chose some string in memory. The type has to be 0x1020D006 (DRAM, cachable, kernel RW, user NA) and we described the obfuscation trick above. The size just has to be large enough to hold our payload and that particular value is due to the other obfuscation trick described above.

Get Base Address

sceKernelGetMemBlockBaseForDriver(id, base) places the base address in *base. The id is the return value from above. The easy way of doing this is to set base to a temporary buffer and then LDR it later. However, this would expose a LDR gadget. Instead we use

  .word BASE+0x00019713 @ add r3, sp, #0x28 ...
  .word BASE+0x00001e1d @ mov r0, r3 ...
  .word BASE+0x0001efe1 @ movs r1, r0 ...

to put the return value right into the stack. Then immediately after the function call we can retrieve it

  .word BASE+0x00001f17 @ sceKernelGetMemBlockBaseForDriver(r0 = id, r1 = base) ...
  ...
  .word BASE+0x00000031 @ pop {r0, pc}
  .word      0xDEADBEEF @ r0 = base address (written to from above)

Finally, we save the base address to R7 and note that this prevents us from using any gadgets that touches R7 in the future (if we really have to though, we can always move the data around but that would take work).

Initialize AES Engine

To setup the AES engine, we need to call aes_init(ctx = buf, blksize = 128, keysize = 256, key = secret_buf). The trick to obfuscate the constant 256 was described above. We will highlight a couple of other tricks. First, we need to make sure the ctx buffer (which contains the expanded key) is not revealed to the user. This is pretty simple: we just place the buffer into the memory block we just allocated. That memory block is not accessible in user mode and our security assumption is that kernel memory is protected. We also store the ctx buffer into R6 for future use. This prevents putting the pointer to sensitive information into the user-modifiable (from the exploit) stack. However, this also makes things harder as from this point onwards, we can no longer use gadgets that corrupt R6. Finally, we use the following non-degenerate gadget to set up the key pointer

  .word BASE+0x0001fdc5 @ mov r3, lr ...

The return pointer was set from a previous gadget

  .word BASE+0x000050e9 @ mov r0, r7 @ blx r3

and this intricately links the previous section for getting the memory base to setting the key here. Also note that since the key is in kernel code, as long as the kernel code is safe, our payload code will also be protected in our security model. In practice though, since we are using AES-ECB, we are vulnerable to replay attacks but more on that later…

Decrypt Loop

The loop was tricky to implement. We have a counter in SB that is incremented in each iteration. To keep things simple we also increment it in the first iteration which puts our payload at offset 0x10. Thankfully this still works as 0x00000000 is a NOP in 32-bit ARM so we can slide into our payload. The loop logic involves conditionally subtracting from the stack pointer. To do this, we first save the “right” stack pointer into R4

  .word BASE+0x0001d9eb @ add r2, sp, #0xbc ..,
.Lsp_offset_start:
  ...
  .word BASE+0x00000853 @ pop {r0, r1, pc}
  .word      0xDEADBEEF @ r0 = dummy
  .word (0xbc - (.Lloop_end - .Lsp_offset_start)) @ r1 = 0xbc-sizeof(loop)
  .word BASE+0x000000ab @ subs r2, r2, r1 ...
  ...
  .word BASE+0x0002328b @ movs r1, r2 ...
  ...
  .word BASE+0x000000d1 @ movs r4, r1 ...

This puts R4 at “.Lloop_end” which is exactly the value of SP if the loop condition is not met.

  .word BASE+0x0001bf1f @ movs r2, r4 ...
  ...
  .word (-(.Lloop_end-decrypt_loop_start)) @ r3 = offset to start of loop
  .word BASE+0x0000039b @ pop {r4, pc}
  .word (ENC_PAYLOAD_SIZE ^ 0x40) @ r4 = (payload size) ^ 0x40
  .word BASE+0x00022a49 @ subs r0, r0, r4 ...
  ...
  .word BASE+0x00003d73 @ ite ne @ movne r0, r3 @ moveq r0, #0 ...
  ...
  @ add either 0 or offset to loop start to r2 (sp at loop end)
  .word BASE+0x000021fd @ add r0, r2 ...
  ...
  .word BASE+0x00000ae1 @ movs r1, r0 ...
  ...
  .word BASE+0x0002a117 @ pop {r2, r5, pc}
  .word BASE+0x00000347 @ r2 = pop {pc}
  .word BASE+0x0001f2b1 @ r5 = ...
  .word BASE+0x00000067 @ mov sp, r1 @ blx r2
.Lloop_end:

If the condition is met, then SP is set back to the start of the loop. Cool, right!

Decrypt Payload

The actual decryption code was perhaps the hardest to write. In theory, it is simple enough: aes_decrypt(ctx = r6, src = user_buf+counter, r2 = r7+counter). However, the difficulty comes from the fact that we cannot use R6, R7, SB and therefore we must let aes_decrypt save the callee saved arguments into the stack. However, the PUSH instruction will corrupt our ROP chain. The first attempt was to just leave a chunk of space in the chain (using ADD SP) before calling aes_decrypt. That doesn’t work however, as LR is pushed to where the gadget to call aes_decrypt used to be (breaking the next iteration). Since the gadget to call aes_decrypt will be corrupted no matter what, the only way around it is to rewrite that gadget into the chain in each iteration.

We did not have any usable gadgets of the form STR SP, [Rs, #-IMM] so the store must happen before the call. The only gadgets of the form STR SP, [Rs, #IMM] had IMM at most 0x1C so the write gadget must be very close to the aes_decrypt call. However, this brings us back to the original problem that aes_decrypt corrupts the 0x18 bytes of the stack above it. To make matters even worse, we also need to set R1 and R2 before making the function call (the arguments). It is impossible to find a gadget that both writes to the stack and also sets R1 and R2 to the right value, so we must setup R1 and R2 beforehand. This means our gadget restoring gadget must also not touch R1 and R2. After hours of searching, the perfect gadget was found

  str r5, [sp, #0xc]
  ldr r5, [sp, #0x38]
  str r5, [sp, #0x10]
  blx r4
  add sp, #0x1c
  pop {r4, r5, pc}

This gadget is highly non-degenerate. It is almost tailor made for our specific purpose. To prevent confusion (there will be confusion), we will now refer to this as the “magical gadget.” Here’s how we used it:

  .word BASE+0x00001411 @ pop {r4, r5, pc}
  .word BASE+0x00000347 @ r4 = pop {pc}
  .word BASE+0x000209d7 @ r5 = str r5, [sp, #0x10] @ blx r4 @ add sp, #0x1c @ pop {r4, r5, pc}
  .word BASE+0x000209d3 @ str r5, [sp, #0xc] @ ldr r5, [sp, #0x38] @ str r5, [sp, #0x10] @ blx r4
  .word BASE+0x00001411 @ pop {r4, r5, pc}
  .word BASE+0x00000347 @ r4 = pop {pc}
  .word BASE+0x0001baf5 @ r5 = 0xD8678061_aes_decrypt

  @ BEGIN region overwritten by decrypt
  .word      0xDEADBEEF @ becomes str r5, [sp, #0x10] @ blx r4 @ add sp, #0x1c @ pop {r4, r5, pc}
  @ lr = add sp, #0x1c @ pop {r4, r5, pc}
  .word      0xDEADBEEF @ becomes add sp, #0xc @ pop {pc}
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ becomes 0xD8678061_aes_decrypt(r0 = ctx, r1 = src, r2 = dst) @ bx lr
  @ END region overwritten by decrypt

  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ dummy
  .word BASE+0x0000652b @ loaded by above: add sp, #0xc @ pop {pc}
  .word      0xDEADBEEF @ dummy
  .word      0xDEADBEEF @ r4 = dummy
  .word      0xDEADBEEF @ r5 = dummy

The short of it is that we first write two helper gadgets to the region destroyed by aes_decrypt. Those two gadgets will restore the gadget for making the aes_decrypt call and then actually call it.

Lets step through this line by line. First we setup R4 and R5. R5 is the first restoring gadget (the magical gadget) and SP+0x38 (below the volatile region) is the second restoring gadget. We then invoke the magical gadget for the first time and write the two restoring gadgets. It then jumps to R4 which we have defined as a simple no-op (pop the next gadget). The next gadget sets up R4 and R5 again for the next phase. The first restoring gadget (the magical gadget) is called to write R5 (now the aes_decrypt gadget) to the right location. Then the next one skips the dummy data and executes aes_decrypt.

aes_decrypt saves the LR value and returns to it at the end. Where is LR? Inside the magical gadget of course (thanks to the BLX R4). That means we run

  add sp, #0x1c
  pop {r4, r5, pc}

which hands control back to the ROP chain. Note that we used this one gadget three different ways here! Crazy!

Remapping Executable

The final step is to remap the memory region as executable. It is very straightforward. We first use sceKernelFindMemBlockByAddrForDriver(base = r7, 0) to retrieve the block id. Then we call remap_memory(blkid = r0, type = 0x1020D005). Note the type is now kernel RX user NA. For our final trick, we obfuscate the type parameter by making it appear the same as the first type

  .word      0x08106803 @ r1 = 0x8106803
  .word BASE+0x000233d3 @ lsls r2, r1, #1 ...
  ...
  .word BASE+0x00000433 @ subs r1, r2, #1 ...

Then we jump into the executable

  .word BASE+0x00011c5f @ blx r7

and we’re done! Note that we do not perform any authentication on the binary payload. This design decision was made for two reasons: 1) if we introduce more gadgets, we increase the amount of data leaked and 2) the authentication may be vulnerable to an oracle attack since it will likely be very simple. This means, however that we are vulnerable to replay attacks (moving and changing the encrypted blocks around) and we allow the adversary to jump into random code. We believe that either attacks will be very difficult to exploit. Since the block size is large and the payload is small, the attacker does not have many blocks to work with. Jumping into random code will, with high probability, just trigger undefined instruction exception before doing anything useful.

Breaking the Chain

We will now give one possible solution for solving the challenge without a memory leak (but with lots of luck and intuition). First break the junk data obfuscation with the method described in that section. Working backwards, we wish to redirect the aes_decrypt gadget to “decrypt” kernel memory with a known key into the stack buffer (that we can leak with the sceIoDevctl exploit). To do that, we have to find the aes_decrypt gadget and then figure out the arguments. We know the source argument so we need to find the destination argument (which must be close-by in the chain). There are a lot of different ways of going about this. For example: attempt to modify gadgets addresses with +/-1 or +/-4 in hopes of hitting the counter increment gadget. Unfortunately this will not work here because the +0x10 is done directly by a gadget without any arguments. Eventually we might attempt a timing attack using another processor reading kernel stack while the ROP chain runs. We will then find that at some point, the base address of the decrypt buffer will be placed in the stack. We can now race to change the pointer to point into the stack instead and then have the payload decrypted to kernel stack. However, timing is critical because the remap gadget will fail and the kernel will panic when attempting to execute the code (this may be mitigated by removing the BLX R7 gadget and replacing it with multiple copies of the chain). We even discover the aes context buffer is now in kernel stack and find the scheduled keys. If we knew that R3 to aes_init is a pointer to the key, we could also try to replace the key pointer to kernel stack and change the key to a known one. Then, we can replace the source pointer to be in SceSysmem and “decrypt” the data with a known key into kernel stack. Then, we can “encrypt” that data to get the crown jewels at last.

↧

taiHEN: CFW Framework for PS Vita

November 1, 2016, 12:00 am

≫ Next: Designing taiHEN: A CFW Framework

≪ Previous: HENkaku KOTH Solved

Ever since I first bought the Vita, I have dreamed of running a custom firmware on it. I don’t mean just getting kernel code running. I want an infrastructure for adding hooks and patches to the system. I want a system for patching that was properly designed (or actually has a design), clean, efficient, and easy to use. That way, firmware patches aren’t a list of hard coded offset and patches. I’ve seen hacks that busy loops the entire RAM looking for a version string pattern so it can replace it with a custom text. I’ve seen hacks that redirect the “open” syscall so every file open path is string compared with a list of files to redirect. The examples go on and on. Needless to say, good software design is not a strong point for console hacking. For HENkaku, we did not commit any major software development sins, but the code was not perfect. It had hard coded offsets everywhere, abuse of C types, and lots of one-off solutions to problems but it got the job done. Part of the reason we didn’t want to release the source right away was that we didn’t want people to build on that messy code-base (the other reason was the KOTH challenge). I remember the dark days of 3DS hacking where every homebrew that needed kernel access would just bundle in the exploit code. This is why I decided to create taiHEN.

taiHEN

taiHEN is a framework for writing application and system level patches. Simply put, it lets you run game and kernel plugins anywhere. taiHEN is not a new exploit. The HENkaku update (which we lovingly call taiHENkaku) uses the same chain of exploits (and therefore still requires firmware 3.60) but the actual firmware patches have been moved to the taiHEN system. taiHEN is designed to be firmware and exploit agnostic–that means it should run on any firmware if you bring your own exploit. Right now the only exploit is HENkaku and it requires WebKit to work. However, if someone finds a boot exploit or an exploit for 3.61/3.63, all they have to do is load taihen.skprx and (ideally) every plugin should just work. This also means that when someone ports over the HENkaku exploit to lower system versions, they do not have to re-build every patch from scratch.

In addition to adding hooks to the kernel, taiHEN also allows hooking system applications and games. Add elements to LiveArea? Enable more options in Settings? Cheats in games? The possibilities are endless. More information is at the official site: tai.henkaku.xyz and from Davee’s blog.

taiHENkaku

As promised, this is the big HENkaku update. In addition to the major plumbing overhaul, we added some new features to HENkaku too:

Loading compressed FSELFs are supported now
VitaShell is updated to 1.42 with a brand new HENkaku configuration menu that allows user configuration of PSN version spoofing. (Note at the time of writing, VitaShell has not been updated yet. I will push an update as soon as it is out.)
Unsafe homebrew is disabled by default This change means that some of your homebrew will not launch immediately. Before you panic, go into molecularShell, press Start, enter the HENkaku configuration menu and choose to enable unsafe homebrew. You also need to do this to use system and kernel plugins. More information on this change can be found here. (Note, this feature is disabled in the beta currently because the VitaShell configuration options is not out yet. It will be enabled as soon as that’s done.)

Because this is a major update with a significant increase in complexity, we are releasing it as an open beta. The changes mostly benefit developers wanting to write plugins with taiHEN so that is the target audience. To install it, reboot your Vita and visit http://beta.henkaku.xyz/. You can always go back to the last stable release from the regular site https://henkaku.xyz/.

Note on the beta: It is currently in an unstable state. Some features such as PSN spoofing do not currently work. I hope to resolve the issues in the upcoming days. Meanwhile, I hope that developers can start writing plugins immediately while I iron out the issues. Again, the beta is only recommended for developers making plugins and is of no benefit currently for regular users.

Plugin SDK

Davee did a wonderful job implementing SDK support for user and kernel plugins. The changes are not in the mainline yet, so please help us test it. You need the new toolchain updates to build taiHEN and your own plugins.

Development Wiki

This brings me to the last point. For the kernel, there needs to be a lot of reverse engineering to figure out all the functionalities exported by the kernel. We at molecule have done a lot of work in the past few years but we have not even covered 10% of what the kernel exports. This was the prize given to those who completed the KOTH challenge and now it is released for the public. It contains just about everything that molecule has discovered and reversed about the Vita since 2012 and includes a lot of low level information about the system. It is a good place to start for anyone who wishes to get into Vita hacking: wiki.henkaku.xyz.

What’s next?

To summarize, today we are releasing four things

taiHEN, a CFW framework for the Vita enabling kernel and user plugins for everyone
taiHENkaku, the latest update to HENkaku that uses taiHEN
Plugin supporting SDK, for creating kernel and user plugins
Vita Development Wiki, the largest resource for Vita hacking and revere engineering

All this is due to the gracious work done by my friends in molecule: Davee, Proxima, and xyz. I am extremely lucky to have worked with such talented individuals and they have my sincere thanks. All our releases have been made with a level of polish and professionalism unparalleled by anyone else in the console hacking scene because of them. This also marks the end of our active development on the Vita. We’ll release bug fixes from time to time and we’ll continue to look into hacking the F00D processor (lv0), but we will not have the time to create user facing content anymore. I want to thank the community for the encouragement and support and I want to thank Sony for building the Vita and making it secure. Finally, I want to thank everyone who participated in the KOTH challenge and proved to me that there is indeed still interest in hacking the Vita. I know that we leave the scene in good hands!

↧

Designing taiHEN: A CFW Framework

November 17, 2016, 12:00 am

≫ Next: State of the Vita 2016

≪ Previous: taiHEN: CFW Framework for PS Vita

I take software design very seriously. I believe that the architecture side of software is a far more difficult problem than the implementation side. As I’ve touch upon in my last post, console hackers are usually very bad at writing good code. The code that runs with hacks are usually ill performing and unstable leading to diminished battery life and worse performance. In creating taiHEN, I wanted to do most of the hard work in writing custom firmwares: patching code, loading plugins, managing multiple hooks from different sources so hackers can focus on reverse engineering and adding functionality.

A nice companion piece to this would be my previous article on designing a CFW for the 3DS. However, even my 3DS CFW was lacking as it required hard coding offsets to patches (for each firmware version). A workaround is to do pattern matching, which is what many 3DS CFW do, but that is only a half-measure. A key observation: most desired patches will have a user-observable effect. A second key observation: most observable effects span multiple modules. What does this mean? On the Vita, the kernel is modular and applications are also modules. Every functionality is contained in its own module, with its own set of permissions, and its own interface. For example, SceKernelThreadmgr handles threading related stuff and SceIoFilemgr handles file IO. If we treat each module as a black box and only worry about the interface between modules, we can still do some powerful modifications. For example, if we wish to change the data from a file accessed from a game, we only need to hook the interface between the game and SceIoFilemgr and write logic to redirect the file if some conditions match (typically a string compare with the path). This way, we do not have to dig inside the game and find the specific offset of the function that accesses the file of interest. This is also a good approach because the kernel has built in support for finding exported and imported functions (it is used by SceModulemgr to do dynamic linking). Additionally, since modules share the same identifiers for exports/imports across firmware versions (for compatibility), this removes the need for the hacker to have to try to manually find offsets or do messy pattern matching. In practice, the majority of firmware patches can be done this way. To modify functionality affecting just that one module, hook an import function. To modify functionality affecting all other modules, hook an export function. Hooking the interfaces between modules is much cleaner than injecting code into the modules themselves.

This was the main goal in taiHEN: give developers a way to easily add hooks to module interfaces. But that is not enough for a good custom firmware framework. I also wanted to satisfy the following goals

The framework should be simple. Nobody likes reading documentation and figuring out which init functions to call and what flags to pass in. The API should be simple to read and intuit.
It should be thread-safe. This is a requirement for any kernel code running on a modern system.
It should be robust. I do not want to introduce new bugs into the kernel.
It should be fast. Again, a basic requirement for kernel code. Specifically, inserting hooks should be fast and more importantly, executing the hooks should not require a ton of overhead.

To explain how these goals are met, I will dive into the low-level details of certain design decisions and my justification for doing it that way.

Data Structure

The most important aspect of the design is the underlying data structure. In order to choose the right data structure for the job at hand, you have to consider what operations you wish to optimize for. In this case

Given a process id, check if we have patches for the process.
Given a process id and an address (and size), query the structure to see if a patch already exists.
Add and remove patches
For function hooks, store the original function so it can be called

Hashmap layout To start out, we use a standard hashmap to store tai_proc_t structures. This maps process ids to tai_proc_t, so we can quickly get information for a given process id. Since process ids are efficiently random, they serve as a good hash function as well.

Now it gets tricky. We support two kinds of patches: injection and hooks. An injection is simple–it’s a direct write to the target memory. A typical use case might be to overwrite a string in read-only memory. An injection also claims exclusive access: another attempt at injecting the same address will fail. A hook, on the other hand, is designed to redirect a function. For any given imported/exported function, a hook will redirect control flow to the patch function. This is done thanks to libsubstitute. There might be the case where multiple plugins wish to hook a single function (for example, two different plugins wish to redirect two different files). That’s where the hook chain comes in. The patched function can invoke TAI_CONTINUE to call the next patch function in the chain (the last one in the chain would be the original function). This allows the patch function to manipulate the inputs before continuing the chain and manipulate the output after continuing the chain.

To support this “hook chain” idea, the easiest implementation would be to generate a “dispatcher” function. We can keep track of every hook in the chain, and then the dispatcher calls them all in order. Removing a hook from the chain would be simple enough, just remove the reference from the dispatcher. This would introduce a lot of overhead though. If there is just one function in the chain, we would have to still generate and run the dispatcher. Here is a better solution: when the first hook is inserted, we use libsubstitute to redirect the function call directly to that patch function. We then store a pointer to the original function along with some other information in tai_hook_t in the process’s address space. TAI_CONTINUE will use that pointer to find the next patch function to call–no dispatcher needed. tai_hook_t is therefore a node in a linked list. When we add another hook to the chain, we simply allocate another tai_hook_t and add it to the linked list. The only catch is that if we remove the head hook, we must re-patch the original function with libsubstitute to jump to the new head hook.

So the layout is as follows: Each tai_proc_t points to a sorted linked list of tai_patch_t. Each tai_patch_t is either an injection or a linked list of hooks. Because the patches are sorted by their address, it makes insertion and deletion simple.

Here’s a visual representation of what the data structure looks like.

Data structure layout

The last thing to note is the slab allocator. There are two times where we need to allocate memory accessible directly from the process’s address space. First, libsubstitute needs to save the first couple of bytes of the function it is hooking. This is so you can call back into the original function from your patch. Second, as mentioned above, we need the actual hook data to be in user address space in order for one hook to find the next one in the chain without potentially having to call down into the kernel to find it. The easy way of doing this is to allocate a new page for each request, but that wastes a lot of memory because we need only about 20 bytes for both the hook data and the saved original function. A slab allocator is perfect for this situation because each request is small and about the same size. This makes allocations both fast and have little overhead. The allocator I chose to use was this one because it was simple and has no external dependencies.

Designing for Testability

The hard part of any project–especially writing kernel code in a system with no debugging facilities–is testing it. My only “debugging” tool is printf so a project that is multithreaded, operates with complex custom data structures, and interacts intimately with the kernel makes testing a daunting task. One of my goals is for taiHEN to be robust. That is why from the start, I designed it to be built from blocks that are self contained.

Building blocks

In the bottom layer, we have tai_proc_map which is a hash map with some special add/query functions (as described above). Because this data structure doesn’t depend on any Vita-specific functionality, I was able to write a unit test on my Intel x86 machine. There, I spawned hundreds of threads each performing random operations with the hash map. Once that worked without and crashes or leaks, it gave me enough confidence to use it as a building block. Since the slab allocator and libsubstitute are both external projects with their own unit tests, I can just rely on those. In the next level, I have tests for the NID resolver and the hook chain system (described above). Each gets their own test. Once those were passing with enough seeds and iterations, I hooked everything together into the APIs exposed to the developer. The final tests I wrote only interfaces with the public APIs (in multiple threads) to ensure that the overall system was working.

This is not the perfect testbench but it is the bare minimum that I believe all complex software projects should do. With more time, I would have written more unit tests, directed tests, and randomized tests. My minimum testbench though saved me from a lot of headaches. When something breaks, I didn’t have to wonder what component has the bug or if multiple components were all breaking at once. I can quickly root cause the crash and fix it. I think, for developers, the most frustrating situation is not being able to proceed: not knowing how to debug is more despairing than having to fix a complicated bug.

Conclusion

You can learn more about taiHEN at the dedicated site and you can read about the APIs here. I hope that people will build some really neat stuff with this framework. I also hope that people will build upon it as well. I know that the PS4’s kernel is structured very similarly and it just may be possible to port taiHEN there. If anyone would like to pick it up, check out the GitHub page.

↧

State of the Vita 2016

December 31, 2016, 12:00 am

≫ Next: psvimgtools: Decrypt Vita Backups

≪ Previous: Designing taiHEN: A CFW Framework

Although it hasn’t been a good year for all of us, 2016 was a great year for the Vita. In August, molecule released the first user-friendly Vita hack which builds on four years of research and a year of building a SDK platform from scratch. Since then, we saw dozens of homebrews, new hackers showing up in the scene, and the creation of a community that I am proud to be a part of. In November, I released taiHEN, a CFW framework that makes it easy to extend the system and to port future hacks. As such, it was a busy year for molecule. We are a team of five individuals and we served as pen testers, exploit writers, web developers, UI designers, web masters, IT, moderators, PR, recruiters, software architects, firmware developers, support, and lawyers for the Vita hacking community. These are roles we took out of necessity because Vita hacking is such a niche interest. However, these are not roles we can hold forever. Back in November, I said that I (and I am assuming the rest of molecule but I do not speak for them) would retire from the scene after taiHENkaku was stable enough and that time has finally come. Aside from a parting gift from Davee that should be released in a couple of days we will be retiring from all non-research tasks. Since we entered the scene with no drama, no bullshit, and no corruption, we will leave in the same manner. Firstly, all our work are either already open sourced or are in the process of being tidied up and released. Second, we have extensively documented all our findings on the Vita with the exception of our TrustZone (lv1) hacks which we left out at the request of other hackers who wish to try the challenge without aid. Lastly, we revamped the process for setting up development and making homebrew is easier than ever. Fixing the toolchain required a lot of boring and tedious work and I want to thank everyone who helped with the process. I am proud that our toolchain is the only unofficial toolchain that was designed rather than hacked together.

The community

We leave the rest to you, the hacking community. We hope HENkaku will be ported to other firmwares. We hope that taiHEN will be used to make spectacular extensions. We hope that someone will make a debugger for the SDK. We hope someone will find a way to dump the latest firmware and enable PSN spoofing. We opened a new forums for Vita developers and hackers alike to share ideas and creations. I know realistically, because of how small the user base is, we will not have the level of activity that exists on the 3DS or iOS jailbreaking community. But nevertheless, I am thankful for everybody who has participated in Vita hacking.

What is left

There are four distinct security levels on the Vita. Userland, kernel (lv2), TrustZone (lv1), and F00D (lv0). We have hacked the first three levels, but owning F00D is particularly challenging. It uses a proprietary instruction set and an architecture that is severely underdocumented. It has minimal attack surface, and we can’t see any code that runs on it because of multiple levels of encryption. Even if we get a crash through fuzzing, it is unclear how we can exploit any vulnerability. Hardware attacks are not useful here because we don’t have any control of the code running on it (typically hardware attacks involve escalating the privilege of running code). Attacking F00D will be my only focus in Vita hacking at this point and I welcome anyone who wants to help me in this journey.

Hardware mods

If you are a skilled hardware hacker and can wire together an external eMMC flasher for the Vita or PS TV, please contact me. I am willing to pay for such a device and it would help speed up fuzzing efforts. The pinouts can be found here. The problem is there there are no test points for the eMMC (unlike most other devices) so the only way to get access is by cutting the trace or by soldering to the tiny (~0.5mm) noise reducing resistors next to the CPU. I believe that replacing these resistors with solder bridges would be safe. Then external wires can be soldered onto the bridge and connected to some port that we can drill into the case. However, the scale of all this is beyond my skills and equipment. If you know anyone who can help with this, please forward this request to them. It would be an immense service to molecule and the Vita hacking scene.

Final Words

In this day and age when hacking has been politicized, fetishized, and commoditized, we should remember where hacking came from. Hacking is about freedom of knowledge not an ego contest about who knows what. Hacking is about control over the devices we own by us not control by other hackers. Hacking is about fun and exploration and challenges not about showing off and making profits. As our skills becomes ever more relevant for the connected world and generates power and revenue for many organizations, it is easy to forget that. But luckily for us, we are Vita hackers. Nobody has ever profited off the Vita.

↧

psvimgtools: Decrypt Vita Backups

February 19, 2017, 12:00 am

≫ Next: Modem Cloning for Fun (but NOT for profit!)

≪ Previous: State of the Vita 2016

The Vita’s Content Manager allows you to backup and restore games, saves, and system settings. These backups are encrypted (but not signed!) using a key derived in the F00D processor. While researching into F00D, xyz and Proxima stumbled upon a neat trick (proposed originally by plutoo) that lets you obtain this secret key and that has inspired me to write a set of tools to manipulate CMA backups. The upshot is that with these tools, you can modify backups for any Vita system including 3.63 and likely all future firmware. This does not mean you can run homebrew, but does enable certain tricks like disabling the PSTV whitelist or swapping X/O buttons.

Backup Keys

Because my friends who discovered this are pretty busy with other stuff at the time, I will attempt to document their findings here. The backup encryption process is documented in detail on the wiki, but the short version is that your AID (unique to a PSN account) is used to generate a key seed. This key seed is used by the F00D processor (the security coprocessor) to generate a AES256 key, which is passed directly to the hardware crypto device. The ARM (application) processor can access this crypto hardware but cannot read any keys out of it. This means that ARM can use the hardware as a black-box to encrypt backups without knowing the key. Of course you can try to brute force the key since you know both the plaintext and ciphertext thanks to the HENkaku kernel hack, but that would take time, which is physically impossible. However, since we can hack any Vita on 3.60, it is possible to use the Vita itself as a black box for extracting and modifying backups for other devices on unhackable firmwares, but since the process requires access to a hacked Vita, it is not very useful.

One Weird Trick

But not all hope is lost! As I’ve said, the crypto hardware can be accessed by the ARM processor as well as the F00D processor. For certain other non-critical tasks, the ARM processor sets the key directly for the crypto hardware, so we know how the keys are set. There are a few dozen key slots that both processors can write to. The catch is that once the key is written, it cannot be read back.

Let’s dive deeper into how keys are passed to the crypto hardware. Note that an AES256 key is 256-bits or 32 bytes wide. Since an ARMv7 processor can only write 4 bytes at a time (okay it can do 8 bytes and also the bus width is usually optimized to be the size of a cache line, but for simplicity, we assume it can only write 4 bytes), a 32 byte key is sent with 8 write requests of 4 bytes. Now, the correct way for a crypto device to handle this is to provide a signaling mechanism to the host so it can indicate when a key slot write is about to occur. Then the host sends all parts of the key. Finally, the host indicates that the key transfer is complete and the crypto device locks the key in place and wipes it when another key transfer is requested for that slot. And for completeness, there should be measures in place to only allow one device to do a key transfer at a time in order to prevent races.

The incorrect way to do this is to naively allow anyone to set any part of the key at any time. Why? Because if we can set part of an unknown key to a known value, we can reduce the time to brute force the complete key dramatically. Let’s say we have an unknown 256-bit key that is 22 22 22 22 44 44 44 44 66 66 66 66 88 88 88 88 AA AA AA AA CC CC CC CC EE EE EE EE 11 11 11 11. Now say we can zero out the first 28 bytes of this key so the crypto engine uses 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 11 11 11 where we still don’t know the last 4 bytes.

But now, we pass in a chosen plaintext to the crypto device to do an AES256 operation and we get back the ciphertext. We can then brute force every possible key with the first 28 bytes to be zero. That’s keys, which takes about a minute to compute with a single modern Intel core. We now know the last four bytes of the key and can repeat this procedure for the second to last four bytes and so on. This reduces the search time to , which is not only possible but practical as well. Running this brute force optimized on a four core Intel CPU with hardware AES instructions takes about 300 seconds to find the full 256-bit key. In fact, xyz pointed out that you can even precompute all possible “mostly-zero” keys and the storage would “only” be half a TB.

As you might have guessed, the Vita does it the incorrect way, so anyone can retrieve their backup keys.

psvimg-keyfind

I wrote a tool to do this brute force for you. It is not hyper-optimized but is portable and can find any key on a modern computer in about ten minutes. I have provided a Vita homebrew that generates the chosen ciphertexts on any HENkaku enabled Vita. These “partials”, as I call it, can be passed to psvimg-keyfind to retrieve a backup key for any PSN AID. The AID is not console unique but is tied to your PSN account. This is the hex sequence you see in your CMA backup path. The idea is that if you have a non-hackable Vita, you can easily send your AID to a friend (or stranger) who can generate the partials for you. You can then use psvimg-keyfind to find your backup key and use it to modify settings on your non-hackable Vita. Huge thanks to Proxima for the reference implementation that this is based off of.

UPDATE: You no longer need to use this tool. This site will take care of everything if you pass in your AID.

Hacking Backups

What I did is completely reverse how CMA generates and parses the backup format. I have documented extensively how these formats work. I also wrote tools to dump and repack CMA backups and all this works with backups generated from the latest firmware.

Hacking backups isn’t as fun as having a hacked system. So, don’t update from 3.60 if you have it! You cannot run unsigned code with this, so you are only limited to tricks that can be done on the registry, app.db, and other places. This includes:

Enabling almost any games to run on the PSTV
Swap X/O buttons for out-of-region consoles
Run PSP homebrew with custom bubbles
and maybe more as people make new discoveries

My hope is that other people will take my tools as building blocks for a user-friendly way of enabling some of the tricks above as currently the processes are pretty involved. This also increases the attack surface for people looking to find Vita exploits as parsing of files that users normally aren’t allowed to modify are common weak points.

Additionally, because of how Sony implemented CMA backups and that the key-erase procedure is a hardware vulnerability, this is pretty much impossible to patch in future firmware updates. Unless Sony decides to break all compatibility with backups generated on all firmware up until the current firmware. And that would mean that any backup people made up until this theoretical update comes out would be unusable. Sony is known for pulling stunts like removing Linux from PS3, but I think this is beyond even what they would do.

Release

I’ve built versions of this tool for Windows x64, Linux x64, and OSX here. Please read the usage notes.

↧

Modem Cloning for Fun (but NOT for profit!)

April 2, 2017, 12:00 am

≫ Next: psvsd: Custom Vita microSD card adapter

≪ Previous: psvimgtools: Decrypt Vita Backups

Recently, I stumbled upon an old cable modem sitting next to the dumpster. An neighbor just moved out and they threw away boxes of old junk. I was excited because the modem is much better than the one I currently use and has fancy features like built in 5GHz WiFi and DOCSIS 3.0 support. When I called my Internet service provider to activate it though, they told me that the modem was tied to another account likely because the neighbors did not deactivate the device before throwing it away. The technician doesn’t have access to their account so I would have to either wait for it to be inactive or somehow find them and somehow convince them to help me set up the modem they threw away.

But hackers always find a third option. I thought I could just reprogram the MAC address and activate it without issue. Modems/routers are infamously easy to hack because they always have outdated software and unprotected hardware. Almost every reverse engineering blog has a post on hacking some router at some point and every hardware hacking “training camp” works on a NETGEAR or Linksys unit. So this post will be my rite of passage into writing a “real” hardware hacking blog.

BPI+

Getting access to a shell was laughably easy so I won’t even go into details. In short, I Googled the FCC ID found on the sticker and found the full schematics for the board along with part numbers of all the chips (such information is required in the FCC approval process but most companies request that it be kept confidential). Through the schematics, I found the UART console, which was nicely exposed through some unfilled port. In fact, I did too much work here because after opening the device up, I found the word “CONSOLE” printed on the solder mask right next to those ports. After soldering some headers to it, I was able to connect it to my Raspberry Pi and enter the root shell without needing any password. The whole process took about an hour–the most time being trying to physically open the plastic shell because (and this may be surprising) hackers are not the epitome of physical strength.

Once I got a shell, I dumped the flash memory and I grepped for the MAC address printed on the label (trying hex, ASCII, and different separators). I found a file in a partition labeled NVRAM containing the MAC address. The file does not appear to have any checksums, so I just replaced it with a new MAC, rebooted and… nothing. The modem refused to establish a connection. That’s when the real work started…

The first clue was looking around in the NVRAM partition and finding a set of certificates signed for the modem’s MAC address. Googling “DOCSIS certificate” led me down the rabbit hole of modem cloning, service stealing, bandwidth unlocking, and so on. I learned about how not too long ago, people would modify their modem configuration files in order to unlock higher speeds than what they paid for (if anything at all). As ISPs clamped down and secured their infrastructure, the hackers moved on to “cloning” modems by finding the MAC address of an existing subscriber and reprogramming their modem to use the same MAC address in order to steal service. As a result of all this, the DOCSIS 1.1 specification established a PKI system of validation for MAC addresses.

First, I generated a set of self-signed certificates for my new MAC address. Surprisingly, I was able to provision the modem and my ISP accepted the certificate and gave me an IP address. Unfortunately, I was not able to access the Internet and even using my old router’s MAC address did not work. My guess is that self-signed certificated are used by engineers to test the network and therefore do not allow access to the Internet. It likely also has to do with protections against “simple” cloning. Now my plan is to get a new set of certificates from an unactivated device. I went on eBay and bought a broken SurfBoard SBG6580. The reason for this model is purely because it was the cheapest one I could find. Since it was broken, it is more likely that it’s deactivated.

Dumping SBG6580

Wires hooked up

Unfortunately, the FCC does not have the schematics for this device public but a quick inspection showed that the chip labeled Spansion FL128SAIF00 is a 16MiB SPI based flash memory with the datasheet being easily available online. Being a TSOP chip, it is easy enough to solder wires to and luckily I remembered NORway from back when I downgraded my PS3 and that it has SPI dumping support. I connected the Teensy2++ and patched in support for detecting this chip.

binwalk was able to find some embedded certificates

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
67322         0x106FA         Certificate in DER format (x509 v3), header length: 4, sequence length: 803
68131         0x10A23         Certificate in DER format (x509 v3), header length: 4, sequence length: 1024
70249         0x11269         Certificate in DER format (x509 v3), header length: 4, sequence length: 808
71063         0x11597         Certificate in DER format (x509 v3), header length: 4, sequence length: 988
83445         0x145F5         Certificate in DER format (x509 v3), header length: 4, sequence length: 866
84317         0x1495D         Certificate in DER format (x509 v3), header length: 4, sequence length: 983
85306         0x14D3A         Certificate in DER format (x509 v3), header length: 4, sequence length: 864
100090        0x186FA         Certificate in DER format (x509 v3), header length: 4, sequence length: 803
100899        0x18A23         Certificate in DER format (x509 v3), header length: 4, sequence length: 1024
103017        0x19269         Certificate in DER format (x509 v3), header length: 4, sequence length: 808
103831        0x19597         Certificate in DER format (x509 v3), header length: 4, sequence length: 988
116213        0x1C5F5         Certificate in DER format (x509 v3), header length: 4, sequence length: 866
117085        0x1C95D         Certificate in DER format (x509 v3), header length: 4, sequence length: 983
118074        0x1CD3A         Certificate in DER format (x509 v3), header length: 4, sequence length: 864
131164        0x2005C         LZMA compressed data, properties: 0x5D, dictionary size: 16777216 bytes, uncompressed size: 2898643604054482944 bytes
8388700       0x80005C        LZMA compressed data, properties: 0x5D, dictionary size: 16777216 bytes, uncompressed size: 2898643604054482944 bytes

This includes DOCSIS BPI+ certificates for both US and European regions as well as code signing certificates and root certificates. But unfortunately, no private keys. From experience, it seems likely that the private keys would be stored close to the public keys, so I looked in the hex dump for possible candidates. There were blobs of random looking data in between some of the certificates. It also appears that before each certificate is a two-byte length of the DER file. So I was able to parse the NV storage to dump the certificates, some plaintext setting and device information, as well as 0x2A0 sized blobs of data I previously saw. This data can’t be the private exponent of the RSA key because it is too large. It also does not appear to contain any structure, so it can’t have any CRT component of a key. My hypothesis was that it’s an encrypted PKCS#8 RSA private key in DER format. The evidence was that the file size was aligned to an encryption block, that my other modem used PKCS#8 in DER, and that PKCS#8 DER of an RSA1024 key is about 0x279 bytes, which is suspiciously close to 0x2A0 (for comparison, PEM encoded keys are at least 0x394 bytes and PKCS#1 in DER is almost 1KB because of the extra factors).

Reversing the Firmware

Key derivation With that in mind, there is no way around having to reverse the firmware. The last two entries in binwalk showed two large compressed chunks, which is a good start. I found another hacker has dealt with this kind of compression before and the trick was that the header was non-standard (it lacked a valid uncompressed size field). Googling the CPU gave this wiki which asserts that the architecture is big endian MIPS. There were many references to 0x8000.... in the firmware and nothing to 0x7fff...., so I assumed the load address was 0x80000000. Of course, the load address was incorrect but rather than spending time reversing the bootloader, I instead assumed that the load address was page-aligned (because what sane programmer who isn’t thinking about security wouldn’t) and found a random pointer from the code into the large section of strings and incremented the pointer by 0x1000 until I found a string that started at that address. The load address was 0x80004000.

Thankfully there are enough debug strings to narrow down the search for the decryption routine in the 16MiB firmware. By looking for terms like “decrypt” and “bpi” and “private”, I was able to find a function that prints out ******** Private Key Source is ENCRYPTED. (%d BytesUsed)\n as well as @@@@@@@@@ des3ABC_CBC_decrypt() failed @@@@@@@. Seems pretty promising.

From the debug printf, it’s obvious that some blob of data is passed to a function called des3ABC_CBC_decrypt. I assume this means 3DES EDE with a 3-key config. The input key is 21 bytes which is non-standard. Turns out there’s a simple key derivation process (yay security by obscurity) that involves shuffling the key bytes, and subtracting the index from each byte. Then the 21 byte key, which is 8 groups of 7 bits is transformed into the standard representation of 8 groups of 8 bits where each group has a parity check. I’ve included the reversed code below.

With the correct key, I was able to decrypt the 0x2A0 blob which turned out to be (as I suspected) a DER encoded PKCS#8 RSA private key along with a SHA1 hash to authenticate the encryption.

At last

It was a fun journey, but out of caution, I will not be actually using this modem. Cloning MACs is too much intertwined with stealing internet service and although it is not something I ever intend to do, I do not want there to be any confusions between me, the government, and Big ISP. As a result, it was just a fun exercise. As a word of advice to the reader, many people have been arrested for hacking modems. This site does not condone or promote any illegal activities and this post is presented only for education purposes and is more about reversing hardware then it is about bypassing restrictions.

Other Notes

Here’s some “fun facts” I’ve gathered on my journey.

Some modems have backdoors for your ISP (usually technician and customer service agents) to log in to. The modem I looked at had a SSH server that is not visible on the LAN (your devices <-> your modem) or WAN (your modem <-> Internet) but is visible to the CMTS (your modem <-> your ISP). This is enforced by iptables. They also have a separate username/password to the router gateway page with a weak password that you cannot change. This login works from LAN and WAN as well if you enable remote management.
EAE (early authentication and encryption) is a feature in DOCSIS 3.0 that allows an encrypted connection to be established early on. My ISP (one of the top ISPs in America) has this disabled even for DOCSIS 3.0 routers that support it. The CM config file contains information on your service plan, your upstream/downstream limits, your bandwidth usage, and more. This is sent unencrypted along with the DHCP request to establish an IP address.
Because of the above, it might be possible to perform a MITM attack on neighbors through DHCP.
DOCSIS 3.0 provides the ability to use AES-128 per-session traffic encryption but my ISP (again one of the top ISPs in America so I doubt other ISPs differ in this) chooses instead to use DES (not 3DES by the way) with 56-bit keys (since they still support DOCSIS 2.0). Note that an attack was presented in 2015 by using rainbow tables to make cracking DOCSIS traffic trivial. Reading the service agreement with my ISP, it seems that they concede to this and declared that there is no expectation of privacy with their service. I guess a legal fix is much easier than a technical one.

↧

psvsd: Custom Vita microSD card adapter

August 22, 2017, 12:00 am

≫ Next: Foobar, Blossoms, and Isomorphism

≪ Previous: Modem Cloning for Fun (but NOT for profit!)

One thing I love about Vita hacking is the depth of it. After investing so much time reverse engineering the software and hardware, you think you would run out of things to hack. Each loose end leads to another month long project. This all started in the development of HENkaku Ensō. We wanted an easy way to print debug statements early in boot. UART was a good candidate because the device initialization is very simple and the protocol is standard. The Vita SoC (likely called Kermit internally as we’ll see later on) has seven UART ports. However, it is unlikely they are all hooked up on a retail console. After digging through the kernel code, I found that bbmc.skprx, the 3G modem driver contain references to UART. After a trusty FCC search, it turns out that the Vita’s 3G modem uses a mini-PCIe connector but with a custom pin layout and a custom form factor. The datasheet gives some useful description for each pin, and UART_KERMIT seemed like the most likely candidate (there’s also UART_SYSCON which is connected to the SCEI chip on the bottom of the board, which serves as a system controller and a UART_EXT which is not hooked up on the Vita side). So finding a debug output port was a success, but with the datasheet in front of me, the USB port caught my attention. Wouldn’t it be neat to put in a custom USB device?

On USB in the Vita

A quick aside on the various USB ports found in the different models of the Vita.

The top port on OLED models (commonly referred to as the “mystery port” and incorrectly referred to as a “hidden video out port”) is a USB host. It is unknown if the port is enabled by default or how to enable it.
The bottom port on OLED models (sometimes called the “multiconnector”) supports UDC (USB client) but can also enable USB host support. It is unknown how this switch is controlled, but I’m guessing the syscon is involved and it’s likely USB OTG.
The microUSB port on LCD models have the ID pin connected which implies support for USB OTG or something like it. However, it is unknown how to activate this feature.
There is a USB type A port on PS TV.
There is also a USB to Ethernet chip in the PS TV for the Ethernet port that is connected to Kermit via USB.
The audio codec chip is connected to Kermit via USB for all models.
and of course, the 3G modem on OLED models is connected by USB. On Wifi only models, VDD to the unfilled mini-PCIe pad is missing a bridge. The USB D+/D- signals are also missing a ferrite bead under the adjacent shield. It is unknown if bridging these three locations will enable the USB port on wifi models or if extra work is needed.

Designing a microSD adapter

In order to become more familiar with hardware design as well as understand how USB works on the Vita, I thought it would be fun to create a custom Vita USB device that fits on the modem port. The main reason I chose this port aside from the other USB ports is that it is the easiest to build. It is just a matter of designing and fabricating a PCB, which is simple to do. In comparison, connecting to any of the other USB ports would require creating custom adapters, molding plastic, and dealing with mechanical issues. Creating an adapter for the external ports is also not exactly a usable solution as the Vita is supposed to be portable, and having to dangle a USB port is not something most people are willing to do. In addition, my custom Vita modem card can expose the UART port to work as a console output device (which started this whole project). For this first project, I wanted to build a microSD adapter. Vita memory cards are notoriously expensive, with 32GB cards retailing for $79.99 USD. In comparison, a microSD card with similar performance and capacity goes for $12 USD. Therefore, it would be immensely useful to use microSD cards as a USB storage replacement for the proprietary Vita memory cards.

Choosing parts

SD to USB ICs are pretty cheap and common–you find them in any USB SD adapter. A quick research shows that most cheap adapters use an Alcor or Genesys chip. There is also the MAX14500 series chip from Maxim that is no longer in production and the Microchip USB2244 chip. The documentation for the cheap Asia manufactured chips were lacking so I went with the USB2244 even though it is more expensive (I don’t plan to mass produce it anyways). Microchip provides good documentation in comparison, complete with layout guidelines and a reference design. Unfortunately, I can’t find an Eagle library for the USB2244 so I had to design it myself (using Sparkfun’s tutorial).

Next, I needed an Eagle part for the Vita modem form factor. Luckily, I found a good part for mini-PCIe and was able to modify it to the custom size that Vita uses thanks to the drawing in the datasheet.

Vita 3G modem drawing

Schematic

Next is connecting the parts together. Having no experience whatsoever, I turned again to Sparkfun’s tutorials. Copying the reference design, I came up with a board with the microSD adapter and pin headers for the UART.

Schematics

Layout

I learned board layout again from Sparkfun making sure to follow the design guidelines from Microchip. I also cheated by looking at the layout for the reference board and ensuring that relative distance between objects match from my design. The main challenge is in routing because of the constrained size, but through some creativity, I managed to hook everything up.

Layout front Layout back

Manufacturing

Next step is to produce some prototypes. Thankfully this is extremely easy in this day and age. Pcbshopper allows you to choose your design requirements and it will search across many PCB manufacturers for the best price. The price (plus shipping) is similar across many Chinese manufacturers–about $15 for 10 boards with standard options. The catch is slow lead time and even slower shipping. Throughout the project, I’ve tried EasyEda, SeeedStudio, DirtyPCBs, and PCBway. Below is a mini-review of my experiences with each fab.

I used DirtyPCBs for the breakout adapters. The shipping time is the fastest per dollar (using the cheapest shipping rate, I got the package in two and a half weeks). The board quality was good but a couple of the adapters had the PCIe connector cut improperly and therefore won’t fit the Vita without some sanding. There was no problem with the wiring or drills even though I used the smallest allowed sizes.

I purchased the first three prototypes from SeeedStudios because their website was the easiest to use and the cleanest of everyone on PCBshopper. The cheapest shipping was slow (took almost a month to arrive) and more than half the adapters I received had the PCIe connectors not cut properly. I found no electrical problems.

EasyEDA had the best quality of all the fabs I’ve used. All the cuts were good and the drill holes were very precise and exactly centered. They do not offer cheap shipping and build time was a couple days longer than their estimate of 2-4 days. I also ordered a stencil from them and that came out great as well.

PCBway would be my recommended fab. Although the quality was not as excellent as EasyEDA, it was still better than the other fabs (no issues with the connector). They also do not offer cheap shipping but their build time is a couple day faster than EasyEDA. More importantly, PCBway offers a competitive rate (5x cheaper than SeeedStudios) for PCB assembly and eventually became the fab that produced the final production run for this project.

Prototyping

What’s the most cost effective way to debug the design? Considering how cheap it is to build these boards, it is no surprise that the best way to debug is to build another board. I created a second mini-PCIe based design–this time with a mini-PCIe socket on the card to act as a breakout board. Because the design for the breakout board is simple, the only requirement to verify the board is to do connectivity test on each pin after it arrives. Then I can probe the pads on the breakout port to debug the signals on the main design.

Breakout 1 Breakout 2

Using the breakout board, I can inspect the signals from the 3G modem in anticipation of some sort of custom handshake protocol. Fortunately, there wasn’t such a sequence and the USB port works as-is. When the first boards came back (a month of waiting), I was able to test it by connecting the USB pads on the breakout board to a USB cable and connect the psvsd card to the computer.

Breakout 2

Immediately, I found some errors and fixed it in the design. Having a test plan ready by the time the boards arrived really sped up the process.

Funding

The nice thing about software hacking as a hobby is that it costs nothing but time. But for this hardware hack, I have spent a little over $100 on this project in parts, supplies, and boards. That’s less than buying two video games, so I have no qualms about the cost, but considering the interest the community showed, I think it would be more than fair to spread the cost across everyone who is interested. My idea is this: I will make a limited production of 100 boards (no more because I will be shipping the packages myself and it’s fairly laborious). These boards will be sold at cost and an extra $1 will be added to cover my expenses. I have heard many horror stories of crowd funding gone wrong, so I took many steps to ensure that this will be a success.

First, I made a spreadsheet covering all the costs: supplies, boards, shipping materials, platform fees, etc. Then I added a $100 buffer for any extraneous expenses (another prototype run, for example). Next, I made sure to be very clear upfront about what contributors are paying for: the supplies for me to develop this project. Because undoubtedly, manufacturing 100 boards at such a low cost will not have a perfect yield, I know a small number of these boards will have defects. I don’t have the time or money to deal with customer service for these issues, so part of the low price of the boards is that each contributor takes some amount of risk that their board is defective. Finally, I set a fixed goal so I do not receive the money until after 60 days. I am spending my own money in the meantime. My hope is that after 60 days, I’ll either complete the project and use the unlocked money to reimburse myself and fund the limited production. Or, I’ll run into some major unresolvable issue, in which I will refund everyone and just lose the ~$200 I spent so far. However, after a month of steady progress I felt confident enough to take 400 more orders for a total of 500. Then after getting lots of good samples from the fab I also felt it was fine to test and ensure that every adapter works before shipping it out.

The feedback was tremendous, and the funding goal was met in a day after it was posted. That gives me enough confidence and motivation to continue the project and ensure it is a success.

Software

Fortunately, the driver is pretty easy to create. The Vita already has drivers for USB storage (it’s used on PS TV safe mode for reinstalling firmware), but is normally disabled. A simple patch running on HENkaku Ensō enables it at boot and using The_FloW’s patches for mounting USB storage as a memory card, it all pretty much just works.

Testing

Next is an important part that I feel many ambitious project leaders skip–which is testing. I want some real-world usage data and more importantly, I want to know what the battery impact of my design is. This was the first hurdle I ran into. Initial results showed that the battery life lasted an hour less with psvsd installed when idle. Worse, the battery was consumed even when powered off (not lasting overnight). This is unacceptable for daily use. I took one of my breakout boards and re-purposed it to act as a current measurement harness by cutting the trace to the power input and attaching each end to an ammeter.

Power Measurement

Then, I was able to measure the exact current consumption during various usage cases (read, write, idle, etc). Below is a video of some of these tests.

After testing the power usage of a couple of different USB devices and asking around on hardware forums, I found out the problem was two-fold. First, when the Vita is powered off, it does not power off the USB voltage line, but it does pull both USB data lines low. Unfortunately, this leaves the USB device in “reset” mode instead of “low power suspend” mode. Likely this wasn’t an issue for the 3G modem because it was a custom design meant only to pair with the Vita and has a separate power management IC that is smarter than just looking at the USB data lines. The second problem is that the USB2244 is a power hog of a chip. It draws an average and minimum of 100mA when not in “low power suspend” (which the Vita does not support) even if there is no activity on the SD card.

As a result, I had no choice but to go for an “cheap Asia manufactured chips” even though there was less documentation and support. Luckily I found some datasheets and reference schematics for GL823 online and was able to buy a couple of them to play with. I discovered that cheaper doesn’t always means lesser quality. Not only did the GL823 consume less power (only 30mA average and 1.5mA in “reset” mode) but it also outperformed the USB2244 in read and write speeds as well! Even better, the GL823 does not require an external crystal so I can remove some of the area footprint as well. I really should have chosen this chip to start with.

I also purchased a dedicated USB power tester at this point so I was able to get quick data measurements.

Power Measurement 2

Power Measurement 3

Because the extra hardware must be powered somehow, some dip in battery life is expected, but in the final design this dip is not noticeable at all.

What’s next?

In the end, I made five prototypes and a breakout adapter. Here’s a family photo along with the final product to the right.

Family

Family 2

Thanks to everyone who contributed to this project! There are more detailed posts (with more pictures) for each step in the process on the Indiegogo page for those who are interested. You can find the design on psvsd.henkaku.xyz. Since the design is open source and free for commercial use, I think someone will manufacture, sell, and support it. Here’s another free idea: buy a large number of < 3.60 firmware 3G motherboards (they are around $15-25 a piece on Aliexpress) and screws (M1.6x4mm flat-head no countersink) and bundle them together with a psvsd adapter and a microSD card to form a Vita hacking starter kit.

I don’t plan to mass produce this myself but I do have at most 50 extra units due to canceled orders and extra parts. As a result, I’ve decided to auction them off to those who most want one up until September 2017. You can find more information about that here.

↧

Foobar, Blossoms, and Isomorphism

September 13, 2017, 12:00 am

≫ Next: HENkaku Ensō bootloader hack for Vita

≪ Previous: psvsd: Custom Vita microSD card adapter

A friend recently invited me to participate in Foobar, Google’s recruiting tool that lets you solve interesting (and sometimes not-so-interesting) programming problems. This particular problem, titled “Distract the Guards” was very fun to solve but I found no good write-ups about it online! Solutions exist but it is rather hard to understand how the author came upon the solution. I thought I might take a shot and go into detail into how I approached it–as well as give proofs of correctness as needed.

Disclaimer: If you are participating in Foobar (hello googler) or have aspirations to do so in the future, please stop here in the spirit of the challenge. It’s well known that Google has a finite pool of problems so you will miss out if you just read the solution.

To begin, here is the problem statement:

Distract the Guards
===================

The time for the mass escape has come, and you need to distract the guards so
that the bunny prisoners can make it out! Unfortunately for you, they're
watching the bunnies closely. Fortunately, this means they haven't realized yet
that the space station is about to explode due to the destruction of the
LAMBCHOP doomsday device. Also fortunately, all that time you spent working as
first a minion and then a henchman means that you know the guards are fond of
bananas. And gambling. And thumb wrestling.

The guards, being bored, readily accept your suggestion to play the Banana
Games.

You will set up simultaneous thumb wrestling matches. In each match, two guards
will pair off to thumb wrestle. The guard with fewer bananas will bet all their
bananas, and the other guard will match the bet. The winner will receive all of
the bet bananas. You don't pair off guards with the same number of bananas (you
will see why, shortly). You know enough guard psychology to know that the one
who has more bananas always gets over-confident and loses. Once a match begins,
the pair of guards will continue to thumb wrestle and exchange bananas, until
both of them have the same number of bananas. Once that happens, both of them
will lose interest and go back to guarding the prisoners, and you don't want
THAT to happen!

For example, if the two guards that were paired started with 3 and 5 bananas,
after the first round of thumb wrestling they will have 6 and 2 (the one with 3
bananas wins and gets 3 bananas from the loser). After the second round, they
will have 4 and 4 (the one with 6 bananas loses 2 bananas). At that point they
stop and get back to guarding.

How is all this useful to distract the guards? Notice that if the guards had
started with 1 and 4 bananas, then they keep thumb wrestling! 1, 4 -> 2, 3 -> 4,
1 -> 3, 2 -> 1, 4 and so on.

Now your plan is clear. You must pair up the guards in such a way that the
maximum number of guards go into an infinite thumb wrestling loop!

Write a function answer(banana_list) which, given a list of positive integers
depicting the amount of bananas the each guard starts with, returns the fewest
possible number of guards that will be left to watch the prisoners. Element i of
the list will be the number of bananas that guard i (counting from 0) starts
with.

The number of guards will be at least 1 and not more than 100, and the number of
bananas each guard starts with will be a positive integer no more than
1073741823 (i.e. 2^30 -1). Some of them stockpile a LOT of bananas.

Languages
=========

To provide a Python solution, edit solution.py
To provide a Java solution, edit solution.java

Test cases
==========

Inputs:
    (int list) banana_list = [1, 1]
Output:
    (int) 2

Inputs:
    (int list) banana_list = [1, 7, 3, 21, 13, 19]
Output:
    (int) 0

Now I love a good story and I love a challenging problem but the two fit together like chocolate and eggplant parmesan but I digress. If you parse through the bananas and thumb wrestling, it is easy to see that this is a combinatorics problem. The first thing to do is to break the large problem into some smaller ones that can be pieced together. Here we see that a key piece is figuring out, for any two given guards, if they will go into an infinite loop or not. Once we figure that out, the second part is to find which guards can be paired into infinite loops such that a maximum number of guards end up in infinite loops. Let’s solve the second part first.

Maximum Matching

Assume we have a predicate for two guards, each with bananas and bananas that returns true if the pair will loop. Can we then pair up all the guards optimally so we have the most number of infinite loops? Note once we have this, the answer will be simple: just return the total number of guards minus the number of guards that are paired up into infinite loops.

What if we just brute-force and try to find every possible pairing? We take one guard and try to pair her with another guard and if they don’t loop, we try pairing her with a different guard. This will find us a solution but how do we find a maximum one where the most number of guards are paired off? Well, we can then try to find every possible set of pairings. How long will that take? Let’s say there are guards. Then it will take time to find one set of pairings. To find every possible pairing, notice that once we pair off two guards, those two guards cannot be used to pair with anyone else. So for every pairing in every solution set of pairings, we can remove that particular pair, reassign the remaining pairings, and be left with another potential solution. This means the whole process could take time to process! Clearly infeasible.

At this point we should take a step back and approach this another way. Instead of trying to find an algorithm to solve this specific problem, we should try to cast it into an existing problem. To do so, we need to find a structure that can hold the problem together. The word “graph” should be screaming at you right now and indeed this looks perfect for a graph: we have a set of guards (nodes) where any two guards are related (edge) by . Let’s draw out a graph for the second test case.

Graph

Here we labeled each node (guard) by the number of bananas they start with. We draw an edge between two guards if is true between them. What does it mean to have a set of pairings? If the pairings are a set of edges, that means each node can have at most one edge in the pairing. Here is an example of a set of pairings.

Graph 2

Notice that the guard with 13 bananas and the guard with 19 bananas are not paired with anyone. We cannot select an edge for either of them because doing so means that one of the already colored nodes will have two edges in the solution set, which is not allowed. However, we can find a better set of pairings.

Graph 3

Now every guard is paired up and therefore we know the fewest number of guards that won’t infinite loop is zero. This is a simple example where we can find the solution visually but what if there are 100 guards? What if the solution is greater than zero? How will we know when we reached the minimum and there is no better set of pairings? Most importantly, is it even possible to solve this problem in sub-exponential time (otherwise our solution will be infeasible and we get the dreaded execution time out error)? Turns out these exact questions have been asked by computer scientists for many decades. It is a problem in graph theory called perfect matching, which can be reduced to a closely related problem of maximum matching. Formally, a maximum matching can be defined thus: given a graph , find a largest set where for each , there is at most one such that . Note I say “a largest set” because there can be multiple sets of equal cardinality that is maximum.

In the 1960s, Jack Edmonds lit the algorithms world on fire by finding a polynomial time (specifically ) algorithm to solve perfect matching for any graph. His “blossom algorithm” as it came to be called is not a simple one and I won’t attempt to explain it here. If you want to know more about how it works, it’s presented at an undergraduate level by Professor Roughgarden in these notes. The upshot is that we can apply this algorithm directly to our graph to get the maximum matching. A quick Google search for a Python implementation turns up this page.

Now all that’s left is to define .

Loop Detection

Our intuitive approach will be dead simple: let’s just simulate the game until either it ends or we detect a loop. How will we detect a loop? We could keep a list of “seen counts of bananas” and after each round we check to see if the current counts has previously been seen. If so, we know we are in a loop because the same sequence of banana counts will proceed. Otherwise, at some point we will see both players end up with the same number of bananas. How well does this perform? If is the total number of bananas “in play” (the sum of the two players’ banana at the start), then we see that the most number of turns would be turns because after turns, you would have to either see both players have the same count or see every single count of bananas and therefore must repeat one such count. But could be as large as so this will not do. It’s sub-linear or bust!

We wish to find a formula (predicate) for predicting the outcome of the game without playing it. To start, let’s just write down a couple of examples and try to find patterns. Below, each line is a round of the banana thumb wrestling game where and are the number of bananas currently in each player’s possession. I’ll list a couple of games below, both with and without loops.

(3,5)
(6,2)
(4,4)

(5,7)
(10,2)
(8,4)
(4,8)
(8,4)
...

(1,4)
(2,3)
(4,1)
(3,2)
(1,4)
...

(3,13)
(6,10)
(12,4)
(8,8)

You can smell the hint of a pattern although it may not be obvious yet. Let’s try to suss out the scent. We know there is some periodic structure (groups, you say?) but how do we go from one line to the next without following the complex rule? Is there an easier way to generate this sequence? Well if at first you don’t succeed, try and change domains. Notice a key fact: the sum of the bananas in each round is always the same. This may be obvious considering no bananas are created or destroyed in each round–let’s call it the Law of Conservation of Bananas. With that in mind, let’s work in where . Note that when working with numbers modulo , negative numbers are the same as .

(3,-3) % 8
(6,-6) % 8
(4,4)  % 8

(5,-5) % 12
(-2,2) % 12
(-4,4) % 12
(4,-4) % 12
(-4,4) % 12
...

(1,-1) % 5
(2,-2) % 5
(4,-4) % 5
(-2,2) % 5
(1,-1) % 5
...

(3,-3) % 16
(6,-6) % 16
(-4,4) % 16
(8,8)  % 16

Do you see it? We notice two facts. First, by how we defined , we have . This is a given. The more important fact is that we can see that each round is exactly two times the previous round . This seems like an important fact but it doesn’t appear to give us an answer immediately. We also made a lot of assumptions that seems to be unstable and although we might have found a pattern–it might also be a red herring. I am a strong proponent of what I call the 3-examples rule which is: if something works for three random examples you make up, it probably works for all integers. QED. However, until the mathematics community accepts my rule as law, we unfortunately must do things the old fashioned way.

My first tool of choice, as always, is group theory because it’s easy but sounds hard so that maximizes the show-off factor. Let’s formalize this game into a group whose elements can be generated by the group operator. We will see later that the advantage of this is that we can dangle from the shoulder of giants and not have to prove anything major. Lets define group on with elements and the operator which we will now construct.

In constructing we note that the only elements we care about how the operator works for is (from here on, we drop the when obvious for brevity). We want to say something like “if we apply to for times, then we get element which is the result of playing rounds of the banana thumb wrestling game”. We do not care (for now) what does to other element pairs. Let’s formalize this

Note we start with the definition of the game and apply the Law of Conservation of Bananas to remove the dependency. Then we apply the modulus and simplify to get our final form of . Note this shouldn’t be surprising given our initial intuition. Now comes the point again where we want to turn our problem into something more familiar. It’s easy to see that is a valid group but we want to cast the subgroup generated by to be isomorphic to something well known (like the additive group ). Why? So we can do cool complex stuff like multiplication and division without worrying about all the pesky details like “is this a ring?”. With that in mind, lets complete the definition of to be . You are now convinced that this general definition is consistent with what we have for our special case above.

So it turns out our is just , big deal right? Well turns out this is exactly the additive group , but that is not too important. What’s important is that our subgroup is isomorphic to the additive group . I won’t give the proof here because there is little substance but notice that since by construction. That means we drop the dependency. Again, this should all feel redundant because we got to this point from our intuition which gave strong indication that this is correct.

Going back to the game, we see that at round , we can find by taking . Now we can define the predicate

If this was math class then we would be done. But as programmers, we don’t care about the existence of solutions–we want the damn solution! So how do we get something closed form? With a little manipulation the above turns to

Where is just with all factors of 2 removed (for example if then ). (If you have not seen before, it means “does not divide”). Immediately follows is an algorithm for :

defwillLoop(x,y):n=x+yn_tilde=nwhilen_tilde%2==0:n_tilde=n_tilde/2return(x%n_tilde)!=0

It is easy to see that the only work in this algorithm is dividing and that happens at most times so this runs in time. In fact, for all intents and purposes, this is really time since the while-loop is just trimming out the leading 0 bits of the binary representation of . However, don’t say that to a computer scientist unless you want to be hit on the head with a word-ram-model (pretty heavy).

Appendix A

Here is the full solution without the Python implementation of Edmonds’ blossom algorithm

defwillLoop(x,y):n=x+yn_tilde=nwhilen_tilde%2==0:n_tilde=n_tilde/2return(x%n_tilde)!=0defbananaGraph(banana_list):G={i:[]foriinrange(len(banana_list))}fori,ainenumerate(banana_list):forj,binenumerate(banana_list):ifi!=jandwillLoop(a,b):G[i].append(j)returnGdefanswer(banana_list):G=bananaGraph(banana_list)matches=matching(G)returnlen(banana_list)-len(matches)printanswer([1,1])printanswer([1,7,3,21,13,19])printanswer([1])printanswer([1,7,1,1])

↧