Protecting against Cross Site Scripting
One of the most important defenses against cross site scripting is encoding output. The .NET Framework provides built-in routines for this: the HtmlEncode and HtmlAttributeEncode methods of the HttpUtility class. They should be applied to all untrusted output (i.e., anything that comes from user input or is read back from the database).
Using these methods is simple: when you send output to the browser, wrap the value in one of these method calls. Here are a few examples:
- lblText.Text = HttpUtility.HtmlEncode(Request.Form["fName"]);
- StringBuilder sb = new StringBuilder();
- sb.Append("<img src=\"someimage.jpg\" alt=\"");
- sb.Append(HttpUtility.HtmlAttributeEncode(myVariable));
- sb.Append("\" />");
- Response.Write(sb.ToString());
In the first line, the HtmlEncode method is used to encode a value from the posted form before it goes to the client. It is important to note that some server controls automatically encode their output. For example, a TextBox encodes its output by default. A Label control, however, does not.
The rest of the code snippet shows encoding an HTML attribute value. Developers often overlook this step because they do not realize that an attribute can be an attack vector. Malicious data can close the attribute value early and then add its own attributes, event handlers, or tags. I have an example of this on my post about IE8 XSS Protection.
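As a small illustration of the attribute case, here is a minimal sketch built on the same img tag as the snippet above. The BuildImgTag helper and the sample payload are hypothetical names made up for this example, and the code assumes a project that references System.Web so that HttpUtility is available.

```csharp
using System;
using System.Web;

public static class AttributeEncodingDemo
{
    // Hypothetical helper: builds the <img> tag from the earlier snippet,
    // with or without encoding the alt value, so the two results can be compared.
    public static string BuildImgTag(string altText, bool encode)
    {
        string value = encode
            ? HttpUtility.HtmlAttributeEncode(altText) // double quotes become &quot;
            : altText;                                 // written as-is (unsafe)

        return "<img src=\"someimage.jpg\" alt=\"" + value + "\" />";
    }
}
```

Calling BuildImgTag("x\" onmouseover=\"alert(1)", false) produces a tag in which the payload has closed the alt value and added its own onmouseover handler. With encode set to true, the double quotes in the payload are written as &quot;, so the whole string stays inside the alt attribute.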
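One might wonder why there are two separate methods instead of just one. The reason is that each context requires a different set of characters to be encoded. HtmlAttributeEncode targets only the characters that can break out of a double-quoted attribute value (the double quote, the ampersand, and the less-than sign), which makes it the lighter of the two. HtmlEncode covers the broader set needed for element content, including the greater-than sign. How the single quote is handled has varied between framework versions, so attribute values should always be wrapped in double quotes.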
Microsoft has created the Anti-Cross Site Scripting Library (currently Version 3.1), which provides these same methods. The key difference between the library and the .NET built-in routines is how they decide what gets encoded. The .NET Framework uses a technique called black-listing: it keeps an internal list of the characters considered dangerous and encodes only those. The problem with this technique is that the list can grow very large, and new characters may later be recognized as needing to be encoded. Microsoft's Anti-XSS Library takes the opposite approach, called white-listing: it keeps an internal list of the characters known to be valid and encodes everything else. The advantage of this approach is that the list is smaller and should rarely need updating, because the set of valid characters is unlikely to change.
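To show the white-listing idea in code, here is a minimal sketch. It is not the Anti-XSS Library's implementation: the WhitelistEncoderSketch class and its very small safe list are assumptions chosen for illustration, and a real encoder would allow a much larger set of characters.

```csharp
using System;
using System.Text;

public static class WhitelistEncoderSketch
{
    // Assumed safe list, for illustration only: ASCII letters, digits,
    // the space, and a few punctuation marks.
    private static bool IsSafe(char c)
    {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == ' ' || c == '.' || c == ',' || c == '-' || c == '_';
    }

    // Copies safe characters through unchanged and writes everything else
    // as a decimal numeric character reference such as &#60;.
    public static string Encode(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (IsSafe(c))
            {
                sb.Append(c);
                continue;
            }

            int codePoint = c;
            // Combine a valid surrogate pair into a single code point.
            if (char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, input[i + 1]);
                i++;
            }

            sb.Append("&#").Append(codePoint).Append(';');
        }
        return sb.ToString();
    }
}
```

With this sketch, WhitelistEncoderSketch.Encode("<b>Hi</b>") returns "&#60;b&#62;Hi&#60;&#47;b&#62;". Nothing in the code names the angle brackets or the slash as dangerous. They are encoded simply because they are not on the safe list, which is why a newly discovered problem character needs no update.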
You can download Microsoft’s Anti-Cross Site scripting Library from http://www.microsoft.com/downloads/details.aspx?FamilyId=051ee83c-5ccf-48ed-8463-02f56a6bfc09&displaylang=en.
Both of these approaches aim for the same goal: to encode output and protect against cross site scripting. It is important that developers start using one of them to help protect their applications from malicious activity. Output encoding may not completely stop cross site scripting, but it adds another layer of defense.