JavaScript in an HREF or SRC Attribute
The anchor (<a>) HTML tag is commonly used to provide a clickable link for a user to navigate to another page. Did you know it is also possible to set the HREF attribute to execute JavaScript. A common technique is to use the onclick event of the anchor tab to execute a JavaScript method when the user clicks the link. However, to stop the browser from actually redirecting the HREF can be set to javascript:void(0);. This cancels the HREF functionality and allows the JavaScript from the onclick to execute as expected.
In the above example, notice that the HREF is set with a value starting with “javascript:”. This identifier tells the browser to execute the code following that prefix. For those that are security savvy, you might be thinking about cross-site scripting when you hear about executing JavaScript within the browser. For those of you that are new to security, cross-site scripting refers to the ability for an attacker to execute unintended JavaScript in the context of your application (https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)).
I want to walk through a simple scenario of where this could be abused. In this scenario, the application will attempt to track the page the user came from to set up where the Cancel button will redirect to. Imagine you have a list page that allows you to view details of a specific item. When you click the item it takes you to that item page and passes a BackUrl in the query string. So the link may look like:
https://jardinesoftware.com/item.php?backUrl=/items.php
On the page, there is a hyperlink created that sets the HREF to the backUrl property, like below:
<a href=”<?php echo $_GET[“backUrl”];?>”>Back</a>
When the page executes as expected you should get an output like this:
<a href=”/items.php”>Back</a>
There is a big problem though. The application is not performing any type of output encoding to protect against cross-site scripting. If we instead pass in backUrl=”%20onclick=”alert(10); we will get the following output:
<a href=”” onclick=”alert(10);“>Back</a>
In the instance above, we have successfully inserted the onclick event by breaking out of the HREF attribute. The bold section identifies the malicious string we added. When this link is clicked it will prompt an alert box with the number 10.
To remedy this, we could (or typically) use output encoding to block the escape from the HREF attribute. For example, if we can escape the double quotes (” -> " then we cannot get out of the HREF attribute. We can do this (in PHP as an example) using htmlentities() like this:
<a href=”<?php echo htmlentities($_GET[“backUrl”],ENT_QUOTES);?>”>Back</a>
When the value is rendered the quotes will be escapes like the following:
<a href=”" onclick=&"alert(10);“>Back</a>
Notice in this example, the HREF actually has the entire input (in bold), rather than an onclick event actually being added. When the user clicks the link it will try to go to https://www.developsec.com/” onclick=”alert(10); rather than execute the JavaScript.
But Wait… JavaScript
It looks like we have solved the XSS problem, but there is a piece still missing. Remember at the beginning of the post how we mentioned the HREF supports the javascript: prefix? That will allow us to bypass the current encodings we have performed. This is because with using the javascript: prefix, we are not trying to break out of the HREF attribute. We don’t need to break out of the double quotes to create another attribute. This time we will set backUrl=javascript:alert(11); and we can see how it looks in the response:
<a href=”javascript:alert(11);“>Back</a>
When the user clicks on the link, the alert will trigger and display on the page. We have successfully bypassed the XSS protection initially put in place.
Mitigating the Issue
There are a few steps we can take to mitigate this issue. Each has its pros and many can be used in conjunction with each other. Pick the options that work best for your environment.
- URL Encoding – Since the HREF is meant to be a URL, you could perform URL encoding. URL encoding will render the javascript benign in the above instances because the colon (:) will get encoded. You should be using URL encoding for URLs anyway, right?
- Implement Content Security Policy (CSP) – CSP can help limit the ability for inline scripts to be executed. In this case, it is an inline script so something as simple as ‘Content-Security-Policy:default-src ‘self’ could be sufficient. Of course, implementing CSP requires research and great care to get it right for your application.
- Validate the URL – It is a good idea to validate that the URL used is well formed and pointing to a relative path. If the system is unable to parse the URL then it should not be used and a default back URL can be substituted.
- URL White Listing – Creating a white list of valid URLs for the back link can be effective at limiting what input is used by the end user. This can cut down on the values that are actually returned blocking any malicious scripts.
- Remove javascript: – This really isn’t recommended as different encodings can make it difficult to effectively remove the string. The other techniques listed above are much more effective.
The above list is not exhaustive, but does give an idea of ways to help reduce the risk of JavaScript within the HREF attribute of a hyper link.
Iframe SRC
It is important to note that this situation also applies to the IFRAME SRC attribute. it is possible to set the SRC of an IFRAME using the javascript: notation. In doing so, the javascript executes when the page is loaded.
Wrap Up
When developing applications, make sure you take this use case into consideration if you are taking URLs from user supplied input and setting that in an anchor tag or IFrame SRC.
If you are responsible for testing applications, take note when you identify URLs in the parameters. Investigate where that data is used. If you see it is used in an anchor tag, look to see if it is possible to insert JavaScript in this manner.
For those performing static analysis or code review, look for areas where the HREF or SRC attributes are set with untrusted data and make sure proper encoding has been applied. This is less of a concern if the base path of the URL has been hard-coded and the untrusted input only makes up parameters of the URL. These should still be properly encoded.