If cybersecurity has taught us anything, it’s that there will always be flaws, hacks, and vulnerabilities in systems. Cybersecurity, as a career field, is constantly evolving; thus, with the emergence of new technologies and hacking techniques, one must continuously update their knowledge.
In the previous article on secure coding, we discussed input validation. In this article, we aim to explore output encoding as another way to enhance the security of your code. But what’s the difference between these two methods?
Input validation is the process of checking whether the data provided by users or other sources matches the format, type, and range we expect. Output encoding, on the other hand, is the process of transforming the data being sent to the browser or another destination into a safe, consistent format. In short, input validation says, “I only accept data that is clearly data,” while output encoding says, “I only transmit data and code that can be distinguished from each other.” Both methods aim to prevent malicious data from compromising the functionality, security, or usability of your application by ensuring that harmful data doesn’t make its way into your system’s structure.
The primary concern here is the ability to distinguish between “code” and “data.” Developers typically do this through a “code contract,” which specifies which parts of what is transmitted are code and which parts are plain data. There are various ways to achieve this, and the right one depends on the libraries, programming languages, and platforms you’re using, much as the way you communicate depends on the colleagues you are talking to.
A simple example is sending an email using the Simple Mail Transfer Protocol (SMTP). Assuming you are the sender, after connecting and preparing to send the email, you issue the commands “MAIL FROM:[email protected]” and “RCPT TO:[email protected]”, in that order, to specify who the sender and recipient are, followed by the “DATA” command. After the “DATA” command, the message content is sent line by line. (For simplicity, we’ll ignore the email headers, although they are part of that content.)
But where does the data end? The end of the data is indicated by a line containing just a single period. Now, imagine an attacker sends an email with a period on a line of its own, followed by malicious SMTP commands. The mail server would treat the period as the end of the message and then try to execute whatever follows as commands. Or perhaps someone innocently types a lone period, and the rest of their message is cut off and interpreted as commands.
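To make the problem concrete, here is a minimal sketch of how a naive receiver might split the lines that follow DATA. The function name splitAtTerminator and the sample lines are made up for illustration; a real mail server is far more involved.

```php
<?php
// Hypothetical helper: returns the message lines and whatever follows the
// first line that consists of a single period (the end-of-data marker).
function splitAtTerminator(array $lines): array
{
    $end = array_search('.', $lines, true);
    if ($end === false) {
        return [$lines, []]; // no terminator seen yet
    }
    return [array_slice($lines, 0, $end), array_slice($lines, $end + 1)];
}

// Made-up message: the second line is a lone period typed into the text.
$sent = ['Hello,', '.', 'MAIL FROM:<attacker@example.com>', 'More text'];
[$message, $leftover] = splitAtTerminator($sent);

print_r($message);  // only "Hello," is kept as message content
print_r($leftover); // the remaining lines would be read as SMTP commands
```

Everything after the lone period falls outside the message, which is exactly the confusion between data and code described above.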
The solution that SMTP designers came up with is to prepend an additional period when sending any line that begins with a period. Thus, a single period becomes two, two periods become three, and so on. The receiving server removes that extra period again, so a line that arrives as a lone period can only be the genuine end-of-data marker, and everything else is known to be data rather than a signal to start reading commands. This method, known as dot-stuffing, is the output encoding technique the SMTP protocol applies to every line of the message content sent after the “DATA” command.
This is perhaps the simplest example of output encoding.
The following is an example of dot-stuffing in PHP:
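$message = "This is a message.\n.It starts with a dot.";
// Double the leading period of every line that starts with one,
// including the first line of the message.
$dot_stuffed_message = preg_replace('/^\./m', '..', $message);
echo $dot_stuffed_message;
Output:
This is a message.
..It starts with a dot.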
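The receiving side has to reverse the transformation. The sketch below pairs the stuffing step with its inverse and checks that a message survives the round trip. The function names dotStuff and dotUnstuff are illustrative, not part of any library, and the sketch assumes the terminating line has already been detected and removed before unstuffing.

```php
<?php
// Sender side: double the leading period of every line that starts with one.
function dotStuff(string $body): string
{
    return preg_replace('/^\./m', '..', $body);
}

// Receiver side: drop one leading period from every line that starts with one.
function dotUnstuff(string $body): string
{
    return preg_replace('/^\./m', '', $body);
}

// Made-up message containing a lone period and a line with two periods.
$original = "Line one\r\n.\r\n..two dots\r\nlast";
$stuffed  = dotStuff($original);

echo $stuffed, "\r\n";
var_dump(dotUnstuff($stuffed) === $original); // bool(true)
```

After stuffing, no line of the message consists of a single period, so the terminator can no longer be forged from inside the message.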
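Another example of output encoding is calling HtmlEncode (or the equivalent function in your framework) on data that you know shouldn’t be interpreted as HTML. If the data contains no characters that are special in HTML, the function returns it unchanged. Otherwise, it replaces those characters with HTML entities, so the browser displays them as text instead of parsing them as markup. Calling this simple function wherever untrusted data is written into a page helps prevent XSS (Cross-Site Scripting) and HTML injection attacks.
The following is an example using htmlspecialchars, PHP’s counterpart to HtmlEncode: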
$string = "This is some text that needs to be encoded & < >.";
// Replace HTML special characters with their entities.
$encoded_string = htmlspecialchars($string);
echo $encoded_string;
Output:
This is some text that needs to be encoded &amp; &lt; &gt;.
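The same call neutralizes an actual script injection attempt. The following is a small sketch with a made-up comment value; passing the flags and charset explicitly is a choice made here so the behavior does not depend on the PHP version's defaults.

```php
<?php
// Untrusted value, e.g. a comment submitted by a user (made-up sample data).
$comment = '<script>alert(1)</script> said "hi"';

// Encode at the point of output, naming the flags and charset explicitly.
$safe = htmlspecialchars($comment, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo '<p>' . $safe . '</p>';
// Prints: <p>&lt;script&gt;alert(1)&lt;/script&gt; said &quot;hi&quot;</p>
```

The browser shows the comment as literal text, and the script never runs.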
Managing the flow of data in a system requires both input validation and output encoding. Input validation is cheap, easy, and understandable by everyone. It also allows you to quickly eliminate bad data before the server wastes resources on it. However, this solution doesn’t cover all potential risks and shouldn’t be used as the ultimate defense. Therefore, despite being more complex and harder to implement, output encoding must also be employed. This ensures that data that may look like code after input validation doesn’t get transferred as code to the next layer.
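As a rough illustration of the two layers working together, the sketch below validates a numeric field on the way in and encodes a free-text field on the way out. The field names, limits, and sample values are assumptions chosen for the example.

```php
<?php
// Hypothetical request data; in a real application this would come from $_POST.
$input = ['age' => '34', 'nickname' => 'Tom & <Jerry>'];

// Input validation: reject anything that is not an integer between 0 and 150.
$age = filter_var($input['age'], FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 0, 'max_range' => 150],
]);
if ($age === false) {
    exit('Invalid age');
}

// The nickname passes a length check yet still contains HTML special characters.
$nickname = $input['nickname'];
if ($nickname === '' || strlen($nickname) > 40) {
    exit('Invalid nickname');
}

// Output encoding: applied when the value is written into HTML.
$line = htmlspecialchars($nickname, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
    . ' (' . $age . ')';
echo $line; // Tom &amp; &lt;Jerry&gt; (34)
```

The nickname is legitimate data that validation has no reason to reject, which is why the encoding step at output is still needed.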
In short, some of the techniques mentioned by OWASP include: