Discover a few points to keep in mind during and after the creation of your web site, such as the form referrer check, URL canonicalization, and so forth.
So, you've got your web site ready: The layout is jazzed up (round corners, pastel reflections, obnoxiously large text and the obligatory "Beta" in the logo), all the functionality is in place (configurable in 88 different ways, several different APIs for others to use), and a telephone hotline to your local Nescafé plant for an uninterrupted supply of power.
Before you give it that big push, though, it's always good to review your code one more time. And by review here, I really don't mean test it. Testing exists in all sorts of guises and forms; it's even one of the voices of your conscience. What I'm referring to are the little, often-overlooked quirks and salient points that tend to get skipped during web site building for the sake of expediency. No doubt, your web site will still work without them, but it won't be the best it can be.
In this article, I will go over a few of what I consider the most important sanity checks for your web site. It doesn't have to be an ASP.NET web site, but some of the code in here will assume that you have a working knowledge of ASP.NET. The topics I will cover are:
- Lower-case URLs and canonicalization
- The Request Referrer Check
- AJAX and SEO
- Inline ASP.NET code
Lower-Case URLs and Canonicalization
Although Google may be the best thing since sliced bread and the Ctrl+Shift+B key combo for your Visual Studio IDE, it unfortunately doesn't treat URLs the way your web server does. To Google's crawler, /Products.aspx and /products.aspx are two different addresses, even though IIS happily serves the same page for both. It's a strange thing if you think about it.
If you're told to go to the Microsoft Headquarters in Redmond and upon arrival you see, in big bold letters, "MICROSOFT HEADQUARTERS," would you think that you've arrived at the wrong place? (Hint: no.) A crawler would. Strictly speaking, the path part of a URL is case-sensitive, so Google indexes each casing as a separate page, while the Windows file system behind IIS ignores case and returns the same content for every variant. The result is duplicate content, with your page's ranking split across several URLs. And, because Google is the world's largest search engine, you must work around this mismatch by settling on one form of each URL and permanently redirecting every other form to it. This is known as canonicalization.
There are two ways to canonicalize your URLs:
- Have it lead a pious life and make it perform several miracles before it dies and then lodge a petition with the Holy See of the Vatican. This option is ruled out because it's called canonization, not canonicalization.
- Use ASP.NET.
It's easier to do this in ASP.NET than by the first option, or even than in other languages such as PHP, because the Global.asax file lets you hook into the request processing pipeline. Perform a 301 redirect in its Application_BeginRequest event handler.
Here is an example:
Dim currentURL As String = Request.RawUrl.ToLowerInvariant()
If currentURL <> Request.RawUrl Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader("Location", currentURL)
    Response.End()
End If
You're simply comparing the requested URL with its lower-case counterpart. If they match, processing continues; otherwise, you return a 301 with a Location header, telling the crawler that the lower-case version of the URL is the one you prefer, and end the response. Remember, don't use Response.Redirect here, because it returns a 302 (temporary) redirect to the crawler, which isn't the same as a 301.
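One thing to watch with this approach: Request.RawUrl includes the query string, so lower-casing it also lower-cases parameter values such as a search term or a case-sensitive token. A minimal sketch of a helper that lower-cases only the path might look like this; the module and the LowerCasePath function are hypothetical names, not part of ASP.NET.

```vbnet
Imports System

Module UrlCanonicalizer

    ' Hypothetical helper: returns rawUrl with only its path portion
    ' lower-cased; the query string (everything from "?") is left as-is.
    Function LowerCasePath(ByVal rawUrl As String) As String
        Dim queryStart As Integer = rawUrl.IndexOf("?"c)
        If queryStart < 0 Then
            Return rawUrl.ToLowerInvariant()
        End If
        Return rawUrl.Substring(0, queryStart).ToLowerInvariant() & _
            rawUrl.Substring(queryStart)
    End Function

    Sub Main()
        ' Only the path changes; the search term keeps its casing.
        Console.WriteLine(LowerCasePath("/Products/List.aspx?q=WidgetPro"))
        ' prints /products/list.aspx?q=WidgetPro
    End Sub

End Module
```

In Application_BeginRequest you would then issue the 301 only when LowerCasePath(Request.RawUrl) differs from Request.RawUrl.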
In addition, www.yourdomain.com, yourdomain.com, and www.yourdomain.com/default.aspx usually serve exactly the same page, but several search engines treat them as three different URLs. In other words, they can't tell that these are the same page.
Again, in global.asax's Application_BeginRequest event, you can perform a check and force your visitors to use one form of the address, not several.
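' Also in Application_BeginRequest. Non-.com hosts (such as
' localhost) are skipped so that development still works.
Dim currentHost As String = Request.Url.Host
If currentHost.EndsWith(".com") Then
    If Not currentHost.StartsWith("www.") Then
        Response.Status = "301 Moved Permanently"
        Response.AddHeader("Location", Request.Url.Scheme & "://www." & _
            currentHost & Request.RawUrl)
        Response.End()
    End If
End If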
This particular code tells the crawler, "I want you to use this new URL from now on. So, don't use yourdomain.com; use www.yourdomain.com instead." And it should.
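The same handler can also fold the third variant, www.yourdomain.com/default.aspx, into the canonical form. Here is a sketch of a pure function that takes a System.Uri and returns the URL to redirect to, or Nothing when the request is already canonical. The function name, and the choice of a www. host with a bare trailing slash as the canonical form, are assumptions for illustration; it also ignores non-default ports.

```vbnet
Imports System

Module HostCanonicalizer

    ' Hypothetical helper: returns the canonical URL for a request,
    ' or Nothing if the request already uses the canonical form.
    ' Assumed canonical form: "www." host, no trailing "default.aspx".
    Function GetCanonicalUrl(ByVal requested As Uri) As String
        Dim host As String = requested.Host
        Dim path As String = requested.AbsolutePath
        Dim changed As Boolean = False

        ' Only touch .com hosts so that localhost keeps working.
        If host.EndsWith(".com") AndAlso Not host.StartsWith("www.") Then
            host = "www." & host
            changed = True
        End If

        ' Strip a trailing default.aspx, keeping the folder's slash.
        If path.EndsWith("/default.aspx", StringComparison.OrdinalIgnoreCase) Then
            path = path.Substring(0, path.Length - "default.aspx".Length)
            changed = True
        End If

        If Not changed Then Return Nothing
        Return requested.Scheme & "://" & host & path & requested.Query
    End Function

    Sub Main()
        Console.WriteLine(GetCanonicalUrl(New Uri("http://yourdomain.com/default.aspx")))
        ' prints http://www.yourdomain.com/
    End Sub

End Module
```

In Global.asax you would pass Request.Url and, when the result isn't Nothing, send it as the Location of a 301, just like the snippet above.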
The Request Referrer Check
It's quite likely that your web site will have a page in which members can send messages or emails to each other, or even to non-members (send this page to a friend, email to a friend, and so forth). And so, you'll obviously have a form that users can fill out, that they then submit to another page which performs all the checks and then sends the email. Or, you might have a guestbook, again with a form and again with a page that performs the processing before writing to the database.
Pages such as these are favorite targets of spambots, advertisers, and other entities that don't have anything better to do. If submissions to your email form can be automated, spammers can use it to send hundreds of emails per minute, which can get your domain blacklisted, and that isn't very good for business. You want to make sure that the people using your form are using it legitimately.
The simplest check for this is a Request.UrlReferrer check. At the beginning of your code, just before you start performing your validation, do this:
If (Not Request.UrlReferrer Is Nothing) AndAlso _
        Request.UrlReferrer.Host = Request.Url.Host Then
    'Rest of your code.
End If
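Because the page that handles submissions isn't necessarily the page that contained the form, you want to make sure that the request originated from the same domain as the processing page. Comparing UrlReferrer with Url is one way to do this. The Nothing check matters because browsers don't always send a Referer header, and without it the comparison would throw a NullReferenceException. Keep in mind, too, that the Referer header is supplied by the client, so a determined spambot can fake it; treat this as a first line of defense rather than a guarantee. You could of course also use cross-page postbacks in multiple-form, single-processor scenarios, but this is an ASP.NET 2.0 (and onwards) feature, whereas the check above also works in earlier versions and can easily be adapted for other languages.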
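Wrapped in a small function, the check is easier to reuse across several processing pages and to test outside a web request. In this sketch, IsSameHostReferrer is a hypothetical name; it takes the two Uri values you would pass in from Request.UrlReferrer and Request.Url, and it treats a missing referrer as a failed check.

```vbnet
Imports System

Module ReferrerCheck

    ' Hypothetical helper: True only when the form was posted from a page
    ' on the same host as the processing page. A missing Referer header
    ' (referrer Is Nothing) counts as a failure.
    Function IsSameHostReferrer(ByVal referrer As Uri, ByVal current As Uri) As Boolean
        If referrer Is Nothing Then Return False
        Return String.Equals(referrer.Host, current.Host, _
            StringComparison.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Dim processor As New Uri("http://www.yourdomain.com/SendMail.aspx")
        Console.WriteLine(IsSameHostReferrer( _
            New Uri("http://www.yourdomain.com/Contact.aspx"), processor)) ' True
        Console.WriteLine(IsSameHostReferrer( _
            New Uri("http://spam.example/bot.html"), processor)) ' False
        Console.WriteLine(IsSameHostReferrer(Nothing, processor)) ' False
    End Sub

End Module
```

In the processing page you would call IsSameHostReferrer(Request.UrlReferrer, Request.Url) and stop before sending any mail when it returns False.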
AJAX and SEO
I now come back to search engines. Search engine optimization is extremely important in today's World Wide Web. There's a lot of competition and you want your page to appear high up in those page rankings so that users can find you.
At the same time, you have all sorts of new trends and buzzwords being thrown at you, which you somehow believe your web site needs in order to be successful. AJAX, REST, 2.0, blogosphere, podcasts, widgets, web parts, goats, and tag clouds... it can be overwhelming, and the compulsion to implement all of it can be quite strong, as though these were secret recipes for success. But they're not. As those who've learned the hard way will point out to you, always use the right tool for the job. So, although a podcast-based browsing format may work for web site A, it may not work for web site B, which prefers AJAX + Flash, and certainly not for yours, which may not need any of those things.
Of these trends, AJAX is proving to be the most popular. It has two unfortunate side effects, though:
- Security holes
- Search engine unfriendliness
The problem with AJAX is that it is often not properly implemented: everyone wants to use it, so they rush in and rely on copy-and-paste examples that they then expand upon without really understanding the implications. And, because it tends to pull bits of business logic into the presentation layer, where anyone can read and call it, it opens up several holes in your web site's security model for others to exploit. I won't get into details because that's not what this article is about, but I will talk about the problem of SEO and AJAX.
When you use AJAX, you're making a call from the browser to a 'source' (a web service or page) to retrieve some information, which you then write into the page with script. This may look nifty and slick, but search engines don't really care, nor can they understand what that AJAXy link is supposed to do. In other words, a search engine generally cannot see anything that's loaded into a page via AJAX. If you use this technique and that information is the heart and soul of your web site, you've just shot yourself in the foot with a cannon. In fact, you could even refer to this as URL cannonization. (Ha, ha! Get it?)
Take the example of a shopping cart or news web site: what you want a crawler to do is arrive at your front page and then follow a hyperlink to your next page of products or news articles. It may look like a trade-off. AJAX offers the speed of dynamically loading articles or products and lets your users stay on a single page, but it breaks the web model: users cannot bookmark a particular view and send it to their friends, and search engines see a gaping void where the content is supposed to be. Workarounds for this often involve re-implementing features that an ordinary page provides for free. One such workaround is to provide different sets of URLs to search engines and users. One set is plain hyperlinks and nothing fancy; you serve it to search engines, which you identify by looking at the request headers. The other set is the fancier version for users only. If you go this route, make sure both versions carry the same content; showing crawlers something different from what users see is known as cloaking, and search engines penalize it.
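The header check this workaround relies on can be sketched as a simple User-Agent test. The crawler tokens below (Googlebot, bingbot, msnbot, Slurp) are an illustrative assumption, not an exhaustive or authoritative list, and IsKnownCrawler is a hypothetical name. Because any client can send any User-Agent it likes, a check like this should only decide which links to render, never who is allowed in.

```vbnet
Imports System

Module CrawlerDetection

    ' Illustrative tokens only; real crawler lists are longer and change.
    Private ReadOnly CrawlerTokens() As String = _
        {"Googlebot", "bingbot", "msnbot", "Slurp"}

    ' Hypothetical helper: True when the User-Agent header contains one of
    ' the tokens above, in which case the page would render plain
    ' hyperlinks instead of AJAX-driven ones.
    Function IsKnownCrawler(ByVal userAgent As String) As Boolean
        If String.IsNullOrEmpty(userAgent) Then Return False
        For Each token As String In CrawlerTokens
            If userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Return True
            End If
        Next
        Return False
    End Function

    Sub Main()
        Console.WriteLine(IsKnownCrawler( _
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")) ' True
        Console.WriteLine(IsKnownCrawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")) ' False
    End Sub

End Module
```

In a page you would pass Request.UserAgent and pick the set of links to render accordingly.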
If, however, your AJAX features work on a section of the site that is meant for users only (think of any section of a web site that requires authentication... you wouldn't want a search engine there), AJAX may actually be an advantage, because search engine visibility is no longer a concern once users have been authorized to be there.
Before attempting to do anything with AJAX, always ask yourself "Do I really need this tool for this task? Is the advantage significant enough to justify the overhead it adds?" And, just like the protagonist in any Disney movie, you will know what to do.
Inline ASP.NET Code
If you've got an ASP.NET web site, chances are you've used some of its great controls. And, if you've used some of its great controls, chances are that you've used the GridView somewhere. Or a FormView or a Repeater. And, if you've used some of these controls, chances are you've used inline code, and it would look something like one of these statements:
<%# Eval("Column1") %>
<%# DataBinder.Eval(Container.DataItem, "Price") %>
<%# Container.DataItem("EmployeeName") %>
In fact, you'll see this in tutorials and code examples all over the web. That's unfortunate. It's a quick and dirty way of doing things that keeps the explanation short, but brevity is not a good enough reason. Let me go over the problems with this.
Using inline code is often the result of laziness, because it's faster to do things this way as part of a "quick and dirty hack" mentality. However, anything that can be done in inline code can be done in the codebehind. For example, the Eval and Container.DataItem examples above can be moved into the RowDataBound event of the GridView or the ItemDataBound event of the Repeater (or the equivalent event of any other data-bound control). Doing this not only makes things cleaner, but it actually gives you a better understanding of how ASP.NET works.
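As a sketch of what that move looks like, here is a Repeater's ItemDataBound handler that replaces an inline <%# Eval("Price") %>. All names are hypothetical: a Products.aspx page with a Repeater whose ID is ProductRepeater, an <asp:Literal ID="PriceLiteral" runat="server" /> in its ItemTemplate, and a Product class as the data source. It compiles only inside a Web Forms project where that markup exists.

```vbnet
Imports System
Imports System.Web.UI
Imports System.Web.UI.WebControls

' Hypothetical data class the Repeater is bound to (e.g. a List(Of Product)).
Public Class Product
    Public Property Name As String
    Public Property Price As Decimal
End Class

' Codebehind for a hypothetical Products.aspx whose markup declares
' <asp:Repeater ID="ProductRepeater" runat="server"> with an
' <asp:Literal ID="PriceLiteral" runat="server" /> in its ItemTemplate.
Partial Class Products
    Inherits Page

    ' Runs once per Repeater item and replaces the inline Eval expression.
    Protected Sub ProductRepeater_ItemDataBound(ByVal sender As Object, _
            ByVal e As RepeaterItemEventArgs) Handles ProductRepeater.ItemDataBound
        ' Header, footer, and separator items carry no data.
        If e.Item.ItemType = ListItemType.Item OrElse _
                e.Item.ItemType = ListItemType.AlternatingItem Then
            ' Typed access: a misspelled property name fails the build.
            Dim item As Product = CType(e.Item.DataItem, Product)
            Dim priceLiteral As Literal = _
                CType(e.Item.FindControl("PriceLiteral"), Literal)
            priceLiteral.Text = item.Price.ToString("C")
        End If
    End Sub

End Class
```

FindControl still looks the Literal up by its ID string, but the data access itself is now checked by the compiler, and all the binding logic lives in one place you can step through in the debugger.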
The codebehind files for your web site's pages are compiled into a DLL; you know all about this, and you know the compiler checks every name in them. Inline data-binding expressions get no such check. They are compiled only when ASP.NET first processes the page, and calls such as Eval and DataBinder.Eval look up the column or property by its string name at run time, using Reflection. A typo in "Column1" therefore builds without complaint and fails only when a visitor requests the page: a runtime error you'd never have picked up at design time. A single error as a result of such carelessness can spoil the impression your web site makes on first-time visitors.
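To see why such a typo slips through to run time, here is a rough imitation of a name-based lookup using System.Reflection directly. It is a simplified illustration, not DataBinder's actual implementation; the Product class and the GetByName helper are hypothetical.

```vbnet
Imports System
Imports System.Reflection

Public Class Product
    Public Property Price As Decimal
End Class

Module ReflectionLookup

    ' Simplified stand-in for a name-based lookup such as Eval("Price"):
    ' the property is found by its string name only when the code runs.
    Function GetByName(ByVal item As Object, ByVal propertyName As String) As Object
        Dim prop As PropertyInfo = item.GetType().GetProperty(propertyName)
        If prop Is Nothing Then
            Throw New ArgumentException("No property named " & propertyName)
        End If
        Return prop.GetValue(item, Nothing)
    End Function

    Sub Main()
        Dim p As New Product With {.Price = 9.99D}
        Console.WriteLine(GetByName(p, "Price"))   ' found at run time
        Try
            GetByName(p, "Prise")                  ' the typo compiles fine...
        Catch ex As ArgumentException
            Console.WriteLine(ex.Message)          ' ...and only fails here
        End Try
        ' Writing p.Prise instead would not compile at all.
    End Sub

End Module
```

The compiler cannot help with a string, which is exactly the trade-off between an inline Eval and typed code in the codebehind.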
Bad coding practice
The reason a codebehind file exists in ASP.NET is to let you separate your logic, be it UI logic, business logic, or data logic, from the page itself. This is one of the fundamental principles behind the advantages of ASP.NET. It follows that if you mix bits of inline code into your page design, you are creating a maintenance nightmare for yourself or for other developers at some point in the future, however innocuous it may seem while coding. Keeping code out of the markup saves someone else (or yourself), two years from now, from having to look at a non-working page's codebehind and wonder why that variable is assigned a value but never actually used... oh wait, it's in the design!
ASP.NET 2.0 may have a better inline coding model, but the point about bad coding practice alone is enough to warrant not using it. You will find many blogs, posts, and articles saying that it's all right to use inline code... sometimes. This adds to the confusion, because allowing for a small exception is just as bad as approving a bad practice. The rule of thumb to keep in mind, then, is "If you find yourself using inline code, you're doing something wrong." It's always better to nip problems in the bud than to face them years later, like ghosts of coding past.
This article was originally published on Friday Feb 1st 2008
I've covered a set of mostly unrelated topics in this article, but their purpose is singular: to help you ensure that your web site creation and maintenance go smoothly and efficiently. It's not a primer; it's meant to complement your existing knowledge of web development and should serve as a reminder of the little things we all often miss in the excitement of developing web sites.
Keeping a sanity checklist beside me while I code may be the result of a bad memory, but it does help to keep me inline (sorry). It also reminds me not to get carried away. I hope this checklist works the same way for you and that you'll overlook some of the puns I've had to use in this article. If you have any comments, you can use the form at the bottom of this page to postback to me (sorry again!)