Wednesday, October 21, 2009

Quick Lesson from DevDays

Ok, so I went to Stack Overflow DevDays in San Francisco this last Monday, and it was a lot of fun. I went with a friend from work, and we both walked away happy, stickers in hand.

I took some notes, and I would like to share what I thought of all the speakers and their presentations, but we'll save that for another day... if I actually end up writing it (post a comment if you are interested, and hopefully that will push me to write it up).

What I want to talk about is a small bit I learned from one of the presenters. If you don't know me, I'm a big Linux fan, I heart Google, I have an Android G1, I love open source and Ruby and Rails, and Microsoft is practically the devil. Given that, you might find it interesting that the thing that struck me most... enough to actually write about it... came from Microsoft. That's right, it came from the near-devil. Well, it came from Scott Hanselman, who was a pretty good speaker.

Scott talked about Microsoft .NET MVC. It seems like a relatively cool product, if you can bear to be in the Microsoft world, and if you can bear to deal with such a clunky language as C#. Ok, I gotta give props that C# is a decent language... for a statically typed language... but the platform restrictions (excluding Mono), and it not being nearly as awesome as Ruby, will keep me from ever dealing with it again (I worked in C# professionally for a few years). Yes, .NET provides access to a fleet of languages, but C# is still their flagship. Anyway, from the discussion, MVC seems like it does a decent job at mimicking Rails, though it still didn't seem nearly as elegant.

That was a long lead-up for a simple point I extracted from the Big M... <%: value %>. That's it. I have wished for a long time that scriptlets sanitized HTML output by default, and I'm pretty sure Jeff Atwood had a post on it at one point. That tag, if I understood Scott right, differs from <%= value %> in that it will sanitize the output automatically.

On the train ride back to the south bay, I took it upon myself to implement this auto sanitization in Rails for my current side project. Initially I thought I could do something like:

def :(arg)
  h arg
end

But alas, that doesn't work. Ruby won't let you define ":" as a method... it throws a syntax error... drat!

So, I figured, maybe I could monkeypatch the ERB handling so that <%: value %> works as I want, so I looked around the Rails source for how ERB compiles the views. It didn't take long to find erb.rb, which deals with the compilation. I inspected it for a bit, and toyed with some ways to deal with the problem, and ultimately came up with:

class ERB::Compiler::Scanner
  alias_method :initialize_without_sanitize, :initialize

  # Rewrite <%: expr %> into <%= h(expr) %> before ERB scans the template.
  def initialize_with_sanitize(src, trim_mode, percent)
    initialize_without_sanitize src, trim_mode, percent
    @src.gsub!(/<%:(.*?)%>/, "<%= h(\\1) %>")
  end

  alias_method :initialize, :initialize_with_sanitize
end

If the above code is cryptic for you, it basically overrides the initialize method (which is the constructor method for all you non-Rubyists) and uses a regex to replace all of the new sanitizing scriptlets with what would actually work to sanitize it. So, <%: value %> gets transformed to <%= h( value ) %>. I can't vouch for the performance (because I haven't done any performance testing), but it works.
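
If you want to see the substitution on its own, here is a minimal sketch that runs outside Rails. The helper names rewrite_sanitizing_tags and render_sanitized are made up for illustration, and h is a stand-in built on ERB::Util.html_escape rather than the Rails helper itself, so treat it as an approximation of what the monkeypatch does, not the patch.

```ruby
require "erb"

# Stand-in for the Rails h helper (an assumption for this sketch).
def h(text)
  ERB::Util.html_escape(text)
end

# Hypothetical helper: applies the same regex as the monkeypatch,
# turning <%: expr %> into <%= h(expr) %>.
def rewrite_sanitizing_tags(template)
  template.gsub(/<%:(.*?)%>/, '<%= h(\1) %>')
end

# Hypothetical helper: rewrites the template, then renders it with
# the local variable `value` visible to the template.
def render_sanitized(template, value)
  ERB.new(rewrite_sanitizing_tags(template)).result(binding)
end

template = "<p><%: value %></p>"
puts rewrite_sanitizing_tags(template)
puts render_sanitized(template, "<script>alert(1)</script>")
```

The first line printed is the rewritten template, and the second is the rendered output with the angle brackets escaped.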

Put the above snippet wherever you put your monkeypatches that you want run once. I have mine in a monkeypatches.rb file set to load after Rails has initialized (so it runs just once, at startup time).

Go forth in sanitized goodness.


Jeff said...

ASP.NET MVC is a step in the right direction - it's just too bad that the time between MS releases is still measured in years. I think Rails will continue to be a better framework for web development for some time to come.

Just fyi, Rails 3.0 will be sanitizing all output by default, and there's a plugin you can use now in 2.3 projects:

Mike Stone said...

Thanks for the comment Jeff! I have serious doubts that MS will ever be able to catch up to Rails, but it's still nice to see good ideas from them.

I'm glad to hear 3.0 sanitizes by default, and that looks like a pretty nifty plugin! At first glance, I see a distinction that the plugin doesn't introduce a new scriptlet tag, so strings must be marked as safe or the "raw" method must be called to output raw strings.

I kind of like having a separate scriptlet tag that does the sanitizing rather than just flipping the cases when you must invoke a method (or otherwise do something explicit). Of course, having 2 separate scriptlet tags probably makes it easier to forget to sanitize something that needs to be.

I can't wait to see what other cool stuff is in Rails 3.0! Maybe I will have to take a breather sometime soon and go check it out...

Mark Wilden said...

This is a cool hack. But it doesn't "auto-sanitize". It just gives you some syntactic sugar so you can say %: foo % instead of %h foo %. You (and everyone else on the team) still have to remember to use it. And in the Rails world, %h is more readily understandable than %:. Still, interesting post.

Mike Stone said...

Thanks for the comment Mark! I think you meant <%=h foo %>, as <%h foo %> will do nothing, since h is just a method, and it doesn't concat the sanitized result, it merely returns it.
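
A quick way to check that difference outside Rails is sketched below. The render helper is hypothetical, and h is again a stand-in built on ERB::Util.html_escape, so this only approximates what a Rails view would do.

```ruby
require "erb"

# Stand-in for the Rails h helper (an assumption for this sketch).
def h(text)
  ERB::Util.html_escape(text)
end

# Hypothetical helper: renders a template with `foo` as a local variable.
def render(template, foo)
  ERB.new(template).result(binding)
end

# <% ... %> runs the code and discards the return value.
silent = render("[<% h foo %>]", "<b>")

# <%= ... %> appends the return value to the output.
printed = render("[<%= h foo %>]", "<b>")

puts silent
puts printed
```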

The point is to provide an alternate scriptlet tag to <%= foo %> which allows you to sanitize without having to invoke a method (namely, h). It's minor, I know, but it irks me when I need to put in parens because I need to sanitize and call another method within a scriptlet. For example: <%= h foo(bar, baz) %>, when it could be <%: foo bar, baz %>.

More importantly, if you get in the habit of always using <%: foo %>, you get the benefit of reducing the risk of creating an XSS exploit (because it is automatically invoking h for you). Granted, as you pointed out, you still have to remember to use it, and it's non-standard, but I still like having a tag that defaults to sanitizing. It's a bit moot considering Jeff's comment, but I still like it, and it was fun to figure out :-)

Maurício Linhares said...

Well, maybe it's just me, but I think that sanitization isn't something that belongs in your view layer. If you get data from your user and store it in your database, you should either sanitize it in your controllers or in your models, not in the view.

The first reason is that it's a waste of processing power. Instead of just performing the sanitization once, when the data is submitted, you do it every time it's shown to the user.

And the other reason is that it's tiresome and error-prone, and sometimes you simply don't think to add the "h" call in there.

So, with this problem in mind I wrote a plugin that gracefully sanitizes user input before storing it into your database so you don't have to worry about cleaning up the data in your view.

Mike Stone said...

Thanks for the response Maurício! I was beginning to respond, when I realized it would be so big as to warrant a full-blown blog post (and I felt the discussion interesting enough to warrant the escalation).