iHack 2020: Monster Inc The Middle (RDP Network Forensics) Writeup

Jun 22, 2020

For this year’s iHack CTF, I designed a track with 4 flags in it:

Yesterday, I successfully got into the Monster Inc. network. I performed a Monster In The Middle (MITM) attack against Mike Wasowski. He connected to his machine remotely using RDP and I was able to capture two sessions using an open source tool. However, before I could do anything more, the blue team found my foothold in the network and kicked me out. I really want to find juicy stuff about Monster Inc. and I need help! I attached the packet capture files and the secrets needed to decrypt the traffic.
Link to download the challenge: https://drive.google.com/file/d/1ZrYnGKeReED4xVIdojLBer45YzgfwUsh/view?usp=sharing
Note: if you use the intended way to solve this track, you might want to use the various-fixes branch.

Preparing to solve

If you have not used Wireshark a lot, you might want to look at the “introduction” section of my writeup from last year, where I give some tips to use Wireshark effectively to analyze pcaps. You should at the very least add the destination port as a column.

The archive contains three files:

ssl.log contains 4 lines that look like this:

CLIENT_RANDOM 8a3d4bac7c0777144ae9011783d5259ce7cb784bd346ccefb06afee5954f65b7 bb7c21c675d3a0f7b5b34e48867fe4a15b50456a7c0ef09e6900077f19905cb3f8c25fb267f9ef868f21c8c72ba0a904

These are TLS master secrets that can be used by Wireshark to decrypt a TLS connection.
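To make the line format concrete, here is a minimal Python sketch that splits the sample line above into its three fields. It assumes the usual key log layout (a label, the 32-byte client random in hex, then the 48-byte master secret in hex); the `KeyLogEntry` and `parse_keylog_line` names are made up for this illustration and are not part of Wireshark or PyRDP.

```python
from typing import NamedTuple


class KeyLogEntry(NamedTuple):
    label: str
    client_random: bytes
    master_secret: bytes


def parse_keylog_line(line: str) -> KeyLogEntry:
    """Split one 'LABEL <client_random_hex> <secret_hex>' line into its parts."""
    label, client_random_hex, secret_hex = line.split()
    return KeyLogEntry(
        label,
        bytes.fromhex(client_random_hex),
        bytes.fromhex(secret_hex),
    )


# The sample line shown above, copied from ssl.log.
line = (
    "CLIENT_RANDOM "
    "8a3d4bac7c0777144ae9011783d5259ce7cb784bd346ccefb06afee5954f65b7 "
    "bb7c21c675d3a0f7b5b34e48867fe4a15b50456a7c0ef09e6900077f19905cb3"
    "f8c25fb267f9ef868f21c8c72ba0a904"
)

entry = parse_keylog_line(line)
print(entry.label, len(entry.client_random), len(entry.master_secret))
```

Wireshark matches the client random against the one in each TLS ClientHello to know which connection a secret belongs to.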

In Wireshark, go to Edit->Preferences->Protocols->TLS

Then, in (Pre)-Master-Secret log filename, click “Browse” and choose ssl.log. Wireshark will then find the associated TLS connection and decrypt it.

When looking at the decrypted traffic, it might be confusing at first. Let’s clear things up a bit. Looking at the “destination port” column, we see traffic on both port 3389 (the RDP port) and port 3390. There are two simultaneous connections. To see which one a packet belongs to, right click on any packet->Colorize Conversation->TCP->Choose a color (I chose orange).

Much better (note that I ignored the first connection because it was short and contained little information). Looking at the “orange” connection, we only see TLS 1.2 traffic, whereas the other connection shows dissected RDP traffic. To fix this, right click on an orange packet->Decode As and enter this information:

Note that TPKT is used because it is the first protocol used by RDP during a connection to separate different messages.
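For the curious, the TPKT framing itself is tiny: a 4-byte header holding a version (3), a reserved byte, and a big-endian length that covers the header and the payload. The sketch below splits a byte stream into TPKT messages under that assumption. The function names and the sample bytes are invented for illustration, and it ignores RDP fast-path PDUs, which do not use TPKT.

```python
import struct
from typing import List, Tuple


def parse_tpkt_header(data: bytes) -> Tuple[int, int]:
    """Return (version, total_length) from a 4-byte TPKT header."""
    version, _reserved, length = struct.unpack(">BBH", data[:4])
    return version, length


def split_tpkt_messages(stream: bytes) -> List[bytes]:
    """Cut a decrypted byte stream into individual TPKT messages."""
    messages = []
    offset = 0
    while offset + 4 <= len(stream):
        version, length = parse_tpkt_header(stream[offset:offset + 4])
        if version != 3 or length < 4:
            raise ValueError(f"not a TPKT header at offset {offset}")
        messages.append(stream[offset:offset + length])
        offset += length
    return messages


# Made-up stream: two TPKT messages with 3-byte and 1-byte payloads.
sample = bytes.fromhex("03000007aabbcc" "0300000501")
messages = split_tpkt_messages(sample)
print([len(m) for m in messages])
```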

We now have decrypted traffic and are ready to go!

Flag 1 (Monitor Size)

Part 1: What are the dimensions (in mm) of Mike Wasowski’s main monitor? Use the second packet capture file to solve this challenge.
Flag format: HF-{monitorWidth_monitorHeight}

RDP has a pretty complex connection handshake, and it can be hard to understand, so I’ll share a method to get to the flag without having to understand too much about it.

By Googling “rdp spec”, the first result is the MSDN link to MS-RDPBCGR, which stands for “Remote Desktop Protocol: Basic Connectivity and Graphics Remoting”. This contains most of the handshake information. Then, by Googling “MS-RDPBCGR monitor size”, the first two links seem interesting.

The first contains information about the placement and resolution of monitors in the RDP connection. The second one contains additional information about the monitors. More specifically, it contains a monitorAttributesArray structure:

This structure contains exactly what we’re looking for:

The question now is where can we find this structure in the pcap? One method to find out is by looking at the hierarchy of structures in the MSDN navigation:

We can see that the monitorAttributesArray is in the ClientMonitorExtendedData (we knew that), which is in the Client MCS Connect Initial. Now, there are cleaner ways to do this, but at this point I feel we have enough information to look into the decrypted traffic and see what we have. If we look at the first RDP packet in the trace by using the rdp filter, we see the clientData packet. Looking at the numerous protocol layers, we see Multipoint-communication-service (MCS), which confirms that it is the connect-initial packet:

Bingo! Then, simply expand the rdp dissector and we see clientMonitorExData:

The dissector sadly stops at monitorCount and does not parse the monitorAttributesArray structure. However, it is pretty easy to parse manually. If we look back at this structure, it says that the first 4 bytes are the physical width (in mm) and the 4 bytes after are the physical height (in mm).

Simply take the bytes 56 02 00 00 and 50 01 00 00, interpret each of them as a little-endian-encoded integer, and we have our dimensions! I used CyberChef to do that.
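If you would rather stay in a terminal, the same conversion is a one-liner with Python's struct module. This is just a sketch of the arithmetic, not the method used during the CTF: `<II` means two unsigned 32-bit little-endian integers.

```python
import struct

# The 8 bytes that follow monitorCount in the packet: width, then height.
raw = bytes.fromhex("56020000" "50010000")

# 0x00000256 = 598 and 0x00000150 = 336 once the byte order is reversed.
width_mm, height_mm = struct.unpack("<II", raw)
print(f"HF-{{{width_mm}_{height_mm}}}")
```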

Flag is HF-{598_336}.

Flag 2 (Corporate Secret)

What is the big corporate secret???
Note: do not use the --secrets option, it’s broken.

For this one, what we wanted to do was replay the sessions to see what happened. The description mentions that this was captured using an open source tool. There are not many RDP Man In The Middle (MITM) tools. Googling “RDP MITM” or something like that should get you straight to PyRDP’s code.

Disclaimer: I’m an author and contributor to PyRDP.

PyRDP is a tool to perform MITM between a client and a server. It has a lot of interesting features, but the one we’re mostly interested in can be found by reading the README:

pyrdp-convert is a helper script that performs several useful conversions. […]
The following conversions are supported:
- Network Capture (PCAP) to PyRDP replay file
- Network Capture to MP4 video file
- Replay file to MP4 video file

The documentation specifies the required steps to convert a pcap to a replay file.

We first need to extract the decrypted PDUs to another PCAP. In Wireshark, File->Export PDUs to file->Choose “OSI Layer 7”, then save as exported(1|2).pcap. Do this for both pcaps.

Then, convert the new pcap to a .pyrdp file:

python pyrdp-convert.py --src 192.168.110.1 exported2.pcap

Open the file of the second pcap using pyrdp-player.py, and you see the flag!

Flag is HF-{4aa33f61d71adfda}

Flag 3 (Todo list)

Mike Wasowski is a busy monster

The third flag could be found in the first replay. We see Mike Wasowski opening todo.txt, which contains three elements: two that were not relevant to the challenge, and a third named “try ecoji”. After writing “done” to the first two elements, our favorite green cyclops copies something on his host machine. PyRDP then shows this data on the player:

Googling “ecoji” should get you to its GitHub repo. It’s a program that encodes bytes to emojis (like base64 but good looking and way less efficient). Its usage is pretty straightforward:

Flag is HF-{wowEmojisAreSoCoolBase64BTFO}

Flag 4 (Motivational Song)

What is the name of Mike Wasowski’s motivational song?
Flag format: HF-{songNameCamelCase}
Example: If the song was “never gonna give you up”, the flag would be
HF-{neverGonnaGiveYouUp}

When looking at the first replay, we see Mike Wasowski playing motivation_song.wav.

RDP has many “extensions” to the aforementioned MS-RDPBCGR. These extensions describe how different RDP features such as clipboard/drive redirection, RDP over UDP, GFX, etc. work and should be implemented. Audio redirection works the same way and is described in [MS-RDPEA]: Remote Desktop Protocol: Audio Output Virtual Channel Extension.

Note: The RDP protocol provides more than one transport and more than one audio encoding to deliver audio between server and client. For simplicity, these alternatives will be ignored in this writeup.

As described in the “relationship to other protocols” section,

The […] Extension is embedded in a static virtual channel transport, as specified in [MS-RDPBCGR] section 1.3.3

Virtual channels are a way for RDP clients and servers to split different types of communications (clipboard, audio, etc) to allow for a modular implementation.

We learned that we need to find the virtual channel that sends audio. We could try to read a lot of documentation and understand how to find the right channel the “clean” way, but it is a CTF and time matters, so let’s go with the CTF way to solve this.

Since sending audio in real time requires small buffers, surely we can see a spike in traffic on a specific virtual channel while the audio is playing! Fortunately for us, the RDP/MCS dissector in Wireshark correctly dissects the channelId!

Right click on channelId->Apply as Column. Now for every packet, we see which channelId it belongs to!

In the replay, Mike Wasowski starts playing audio about 60 seconds in. If we go to about 60 seconds in the pcap (by looking at the first column), one virtual channel seems to send a lot of packets:

1005, bingo! We can now filter by right clicking 1005->Apply as Filter->Selected and we will only have audio data (if it is not already done you should also filter by either the source or destination address like this ip.addr == 192.168.110.133)!
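The “look for the spike” trick can also be scripted. Here is a rough sketch that counts packets per channel inside a time window. The `busiest_channels` helper is hypothetical and the (timestamp, channelId) pairs are made up for the example, not taken from the challenge capture; in practice you would export those two columns from Wireshark or TShark first.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def busiest_channels(
    packets: Iterable[Tuple[float, int]], start: float, end: float
) -> List[Tuple[int, int]]:
    """Count packets per channel ID between start and end (in seconds)."""
    counts = Counter(
        channel_id for timestamp, channel_id in packets if start <= timestamp <= end
    )
    # Most active channel first.
    return counts.most_common()


# Made-up (timestamp_seconds, channel_id) pairs, for illustration only.
packets = [
    (58.9, 1003),
    (60.1, 1005),
    (60.2, 1005),
    (60.3, 1005),
    (60.4, 1004),
    (61.0, 1005),
    (75.0, 1003),
]

ranking = busiest_channels(packets, start=55.0, end=65.0)
print(ranking[0])
```

The channel at the top of the ranking for the window around the moment the audio starts is the best candidate for the audio channel.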

Now, we need to know what kind of audio was sent to the client. In MS-RDPEA, there is an “initialization” section which contains what we’re looking for!

It says:

The Client Audio Formats and Version PDU is a PDU that is used to send version information, capabilities, and a list of supported audio formats from the client to the server.<5> After the server sends its version and a list of supported audio formats to the client, the client sends back a Client Audio Formats and Version PDU to the server containing its version and a list of formats that both the client and server support.

This is the first packet sent by both the client and the server, so it should be pretty easy to find:

Pretty easy to find. Now, we need to parse the data. The PDU page describes the PDU structure:

What we want is the sndFormats structure, so we can safely ignore the first 24 bytes. The AUDIO_FORMAT structure is described here.

To parse the structure, I wrote a small Python script and copied the PDU info from Wireshark:

And we see only one output, meaning the client only supports one audio encoding (which will simplify our task):

To understand wFormatTag == 1, we need to go to RFC 2361, linked in the structure description. The format 0x01 is WAVE_FORMAT_PCM. PCM (Pulse-Code Modulation) is the simplest format to store and read audio, where each sound sample is appended one after the other. This is also known as “raw” audio.
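For reference, the parsing step described above can be sketched like this. It is not the original script from the CTF, only a minimal reconstruction under two assumptions: the format list starts 24 bytes into the PDU (as stated earlier), and each AUDIO_FORMAT entry is an 18-byte little-endian fixed part followed by cbSize bytes of extra data. The sample PDU is invented (zeroed header, one 44.1 kHz stereo 16-bit PCM entry) and does not come from the capture.

```python
import struct
from typing import Dict, List

HEADER_SIZE = 24  # bytes before sndFormats, as stated in the text
# wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec,
# nBlockAlign, wBitsPerSample, cbSize (18 bytes, little-endian)
AUDIO_FORMAT_FIXED = struct.Struct("<HHIIHHH")
FIELD_NAMES = (
    "wFormatTag",
    "nChannels",
    "nSamplesPerSec",
    "nAvgBytesPerSec",
    "nBlockAlign",
    "wBitsPerSample",
    "cbSize",
)


def parse_audio_formats(pdu: bytes, offset: int = HEADER_SIZE) -> List[Dict[str, int]]:
    """Walk the AUDIO_FORMAT entries that follow the PDU header."""
    formats = []
    while offset + AUDIO_FORMAT_FIXED.size <= len(pdu):
        values = AUDIO_FORMAT_FIXED.unpack_from(pdu, offset)
        entry = dict(zip(FIELD_NAMES, values))
        formats.append(entry)
        # Skip the fixed part plus any format-specific extra data.
        offset += AUDIO_FORMAT_FIXED.size + entry["cbSize"]
    return formats


# Invented PDU: 24 zero bytes of header, then one PCM entry (not real data).
sample_pdu = bytes(HEADER_SIZE) + AUDIO_FORMAT_FIXED.pack(1, 2, 44100, 176400, 4, 16, 0)

formats = parse_audio_formats(sample_pdu)
for fmt in formats:
    print(fmt)
```

To use it on the real PDU, paste the hex stream copied from Wireshark into `bytes.fromhex(...)` in place of `sample_pdu`.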

To read the audio, we need to extract its data from Wireshark and read it into Audacity, which can read raw audio easily.

To export packet data from pcaps, I like to use TShark, the CLI for Wireshark. It usually comes with Wireshark. This command exports the audio data as a hexstream (one per line):

tshark -r exported1.pcap -Y "ip.src == 192.168.110.133 && t124.channelId == 1005" -T fields -e "rdp.virtualChannelData" > audio.hex

We then need to transform the hexstream to actual bytes. I used CyberChef for that, but a lot of CLI tools can do that just fine as well!
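As one example of doing it without CyberChef, here is a small Python sketch that turns the hexstream into a binary file. It assumes one hex string per line and, to be safe, also strips the ":" or "," separators that some TShark versions and multi-value fields can produce. The `audio.raw` output name and both function names are my own choices for the illustration.

```python
from typing import Iterable


def hex_lines_to_bytes(lines: Iterable[str]) -> bytes:
    """Concatenate hex-encoded lines into one bytes object, skipping blanks."""
    chunks = []
    for line in lines:
        cleaned = line.strip().replace(":", "").replace(",", "")
        if cleaned:
            chunks.append(bytes.fromhex(cleaned))
    return b"".join(chunks)


def convert_file(hex_path: str = "audio.hex", raw_path: str = "audio.raw") -> None:
    """Read the TShark output and write the decoded bytes next to it."""
    with open(hex_path) as src, open(raw_path, "wb") as dst:
        dst.write(hex_lines_to_bytes(src))


# Tiny made-up input to show the behaviour on blank lines and separators.
demo = hex_lines_to_bytes(["0a0b\n", "\n", "ff:00\n"])
print(demo.hex())
```

Calling `convert_file()` in the directory that holds audio.hex writes the decoded bytes to audio.raw, which can then be imported as raw data.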