Don't Sanitize Inputs; Encode Outputs

November 29, 2019

There is plenty of advice on the Internet about sanitizing inputs.

Example

Let’s take a simple example: you built a website that allows users to leave comments. The comments are stored in a MySQL database like so:

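<?php
// $conn is assumed to be an open mysqli connection
$comment = $_POST["comment"];

// Vulnerable: the comment is concatenated directly into the SQL text
$sql = "INSERT INTO Comments (comment) VALUES ('$comment')";
$conn->query($sql);
?>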

The problem with this code is that users can inject code into the SQL statement.

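You fix this by switching to a prepared statement, which sends the values separately from the query text:

<?php
$comment = $_POST["comment"];

// Placeholders keep user input out of the SQL text
$sql = "INSERT INTO comments (name, body) VALUES (?,?)";
$stmt = $conn->prepare($sql);
// "ss": two string parameters, one per placeholder
$stmt->bind_param("ss", $_POST['name'], $comment);
$stmt->execute();
?>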

Each comment is then displayed on the page:

<p><?php echo $comment["body"]; ?></p>

Some mischievous users leave comments like <script>alert()</script>, which causes anyone who visits the page to see a dialog box. You decide to fix this by removing angle brackets from comments.

<?php
$comment = $_POST["comment"];

// remove any brackets
$comment = str_replace(array( '<', '>' ), '', $comment);

$sql = "INSERT INTO comments (name, body) VALUES (?,?)";
$stmt = $conn->prepare($sql);
// "ss": two string parameters, one per placeholder
$stmt->bind_param("ss", $_POST['name'], $comment);
$stmt->execute();
?>

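Perfect! Problem solved, right?

A few months later, you decide to switch your frontend to Angular. Whoops, the fix is no longer effective: a template expression such as {{ 1 + 1 }} contains no angle brackets to strip, yet a client-side template engine may evaluate it as code if it ends up inside a template.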
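To make the failure concrete, here is a minimal sketch of what the bracket-stripping filter does to two different payloads. The function name `strip_brackets` is hypothetical and simply wraps the `str_replace` call from the example; the template expression is an illustrative stand-in for anything a client-side template engine might evaluate.

```php
<?php
// Hypothetical helper wrapping the bracket-stripping "sanitizer" from the example.
function strip_brackets(string $comment): string
{
    return str_replace(array('<', '>'), '', $comment);
}

// The script tag no longer parses as HTML once its brackets are gone...
echo strip_brackets('<script>alert()</script>'), "\n"; // scriptalert()/script

// ...but a template expression contains no brackets, so it passes through untouched.
echo strip_brackets('{{ 1 + 1 }}'), "\n"; // {{ 1 + 1 }}
```

The filter only knows about the one interpreter it was written for (the HTML parser). Every new place the data ends up would need its own addition to the filter.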

Conclusion

To effectively prevent code injection by sanitizing inputs, the engineer must account for all possible ways data could be interpreted as code downstream. This is effectively proving a negative.

Sanitizing inputs is ineffective at stopping code injection because it requires upfront accounting for all ways data could be executed as code.

A more effective strategy is to validate input (make sure data is in the expected format) and encode output (treating input as data, not as code).
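As a rough illustration of that strategy, the sketch below validates a comment on the way in and encodes it for HTML on the way out. The helper names `is_valid_comment` and `encode_for_html`, and the 1000-byte limit, are assumptions made for this example rather than anything prescribed above; `htmlspecialchars` is PHP's built-in HTML encoder.

```php
<?php
// Hypothetical validation rule: a comment must be between 1 and 1000 bytes.
// Validation checks the format of the data; it does not try to strip "dangerous" characters.
function is_valid_comment(string $comment): bool
{
    $length = strlen($comment);
    return $length >= 1 && $length <= 1000;
}

// Encode for an HTML text context at the moment of output.
function encode_for_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<script>alert()</script>';

if (is_valid_comment($comment)) {
    // The comment would be stored unmodified (via a prepared statement)
    // and encoded only when it is rendered.
    echo '<p>' . encode_for_html($comment) . '</p>', "\n";
}
// prints: <p>&lt;script&gt;alert()&lt;/script&gt;</p>
```

The stored data stays exactly as the user typed it, and each output context applies its own encoder. HTML is only one such context: data embedded in JavaScript or in a URL would need a different encoder, such as `json_encode` or `rawurlencode`.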