Hey, what smells like blue?

Friday, February 12, 2010

Authorization And You

I have been noticing a dangerous anti-pattern recently. It comes in the form of not enough authorization checking. When you have a function in your website that should only be available for particular users (administrators, moderators, whatever), you might be tempted to implement the code to restrict access once, in the view. Consider the following view to delete from a list of users (I snuck in the %: scriptlet tag which I've discussed before):


<ul>
  <% @users.each do |user| %>
    <li>
      <%: user.name %>
      <% if @me.can_delete_users? %>
        <a href="/users/delete?id=<%: user.id %>">delete</a>
      <% end %>
    </li>
  <% end %>
</ul>

This list might be accessible to everyone, but the deletion should be allowed only to privileged users. This could be a good argument to put all administrative features like this in their own actions/controllers, but there's really no need. Just be sure to check authorization again on the controller side:


class UsersController < ApplicationController
  before_filter :construct_me # Set up @me from session

  def delete
    raise "Nice try!" unless @me.can_delete_users?
    User.delete params["id"].to_i
    redirect_to "/users/list"
  end

  def list
    @users = User.find :all
  end
end

Did you see the extra check that the user is allowed to delete users from within the delete action? That is extra critical. It's easy to consider relying on security through obscurity. Who will try that URL to delete the user if they have no idea what the URL is and what parameters are needed? If the URL were less obvious, you may be even more strongly considering just hiding the URL. While the thought to reduce code clutter and duplicated work may be tempting, it leaves a gaping hole to anyone malicious who decides to start playing around with URLs, searching for administrative functions they shouldn't have access to. You may even get disgruntled former administrators/moderators/whatever who knew the URLs to decide to have some nasty fun with those functions they've been since blocked from.

Security is a hard thing to get right, but there's no excuse to succumb to laziness and knowingly leave dangerous holes in your system. The more powerful the function, the more on edge you need to be for possible ways to break your security.

Note that my snippets include an XSRF hole, but that's a problem for another post.

Friday, January 15, 2010

Design for Extension

I'm in the beginning of writing a simple library, primarily written in Java, with the intention of open sourcing it when it is in a reasonably functional state. As I began work on this project, I decided to try out a new way of designing my classes.

Ruby is my favorite programming language, and one of my favorite aspects of the language is that anything goes. Any object is extensible, even its private methods. Not even the built in classes are safe from extension. This is what's called giving you enough rope to hang yourself. Sure, you could hang yourself with this much power... but you could also use that rope to do some pretty cool things.

Nothing irks me more than non-extensible code. Many times I have run across Java code in the wild that either made a method final that I felt the need to override, or didn't expose a critical bit of API that prevented me from extending it in a key way I needed. It is fine if it is code I can change... I can just make something more open and extend as needed. The problem lies in third party libraries that I don't want to maintain private patches for. This, combined with my love of Ruby's openness, has led me to the goal of making all my Java classes as extensible as possible. The final keyword will only be used in dire circumstances (aka likely never), or to fake a closure in an anonymous class. The private keyword will be reserved only for fields and methods of no consequence.

Before you prepare to write a nasty comment about how ignorant and stupid this design will be, let me jump the gun and tell you why I think it is a bad idea. Either the API becomes locked into potentially poor choices, or the API changes underneath the feet of users of my code. The former is a serious problem if I want to have the freedom to evolve my code, so I opt for the latter. I see exposing a bunch of protected methods as a way to let the users of my code extend my classes in ways I didn't anticipate, yet also imply that it is fair game to change as I see fit. My goal would be to keep public methods as stable as possible, but even those I would be willing to shift around in the name of a better design (be it for code clarity, performance, or whatever).

Granted, this exposes implementation details that aren't really necessary for the client code. However, if someone abuses the details enough to cause themselves pain... I feel the pain is necessary to teach them a lesson. After all, we grow by making mistakes.

The benefit is code that can be essentially patched without needing to modify the source directly, and the ability to use the code in whatever way the client author can imagine. I'm always in favor of more freedom. If it were really that bad of an idea, I don't think Ruby (and other dynamic languages) would be as successful and popular as they are.

As I begin to try out this design mentality, I have run across some important discoveries immediately. It is very important to not depend on fields, because if you do, you limit the ways your code can be extended. Consider:


class Person {
    private String firstName;
    private String lastName;

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    @Override
    public String toString() {
        return firstName + " " + lastName;
    }
}

The above code is fairly extensible, but what if I wanted to change how getFirstName() works? Maybe I want to prepend a prefix, for example. Regardless, I can override getFirstName() and getLastName(), but toString() will not follow suit to the newly defined behavior in the subclass. Instead, I should have written Person as follows:


class Person {
    private String firstName;
    private String lastName;

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    @Override
    public String toString() {
        return getFirstName() + " " + getLastName();
    }
}

I could have also resolved this by making the fields protected, or exposing setters. Exposing fields beyond private still doesn't typically sit well with me in Java land (perhaps that's foolishly old school, but I'm stuck in some of my ways). Exposing setters seems to just expose complexity that isn't needed. I may not want my Person to have changeable names. Besides, the ability to redefine a getter by appending, prepending, replacing, or mangling the result is quite different from needing to set the value ahead of time.

The other key aspect I have discovered is to be very cautious when designing your constructors. Constructors in Java pose one of the biggest ways to accidentally lock your code into specific details. For example:


public class Person {
    public Person(String username) {
        // Connect to a database and load the person
        ...
    }
}

Do you see the problem? Yeah... you have made it so that you cannot construct a Person without connecting to a database and loading a person. Even if you subclass this, you are still stuck with that behavior no matter what. This has a couple possible solutions. You could extract the database connection details like so:


public class Person {
    public Person(String username) {
        loadFromDatabase(username);
    }

    protected void loadFromDatabase(String username) {
        // Connect to a database and load the person
        ...
    }
}

This will allow a subclass to override loadFromDatabase(String) so that it does nothing. This feels a bit hacky to me, but it works perfectly. The other way around this is to expose an alternative constructor (such as one without parameters). This other constructor could have an empty body, and you could even make it protected if you only want to expose it to give more power to subclasses.

So, I will see how this design works for me, but I already like it. Also, don't take my meaning to say that all methods will be exposed as protected or public. For example, a method that has a very well defined behavior might grow rather large and require some breaking apart. There is no need to make all the pieces protected if they really only make sense as a means of implementing the bigger method. There might be no serious harm in exposing them, but I don't believe in absolute rules either.

There is no One True Way to design. The flaws in this plan might prove more serious than I anticipate, but it doesn't negate that there is some value in allowing free extension. Design is also a bit of a personal adventure, so if this doesn't float your boat, look for alternatives that give you the Happy Feeling.

Any thoughts? Am I destined to fail? Is this something you've tried and enjoy?

Wednesday, January 13, 2010

Static State

Beware of static state. Need I say more?

Ok, I guess I shall, regardless of if I need to. If your automated tests all run from the same context (in my case, a JVM instance), static state will bite you... eventually. You may think you are dealing with the state just fine, but it will hurt you eventually.

I have seen many problems crop up with static state so far. Cached data from the database causes data from one test to bleed into another test. Between tests, you will likely have different data. If the data is not based on a fixed set, it is inevitable that some of the data will be slightly tweaked between tests. If that data is cached, the later test will end up with a cached copy of the earlier test's data. This may not crop up right away, and can happen at any time with cached data. A new test that runs before some old tests could cause the old tests to fail. If the order of tests is not predictable in your automated build, you could get a new ordering that sets up a perfect storm to cause some invalid cached data to stick around and haunt another test.

Similarly, you might have some mock versions of static data that makes testing easier. If you don't clean up the mock data, another test may be expecting the default behavior, and again you are failing because of an indirect cause.

Ultimately, static data can be useful and convenient at times, just be aware that your test environment gets weakened with every static cache or storage you use. You can't always simply try to clean up the cached data when you use it. Consider a class, A, that caches some form of data. Your current test depends on class B, which luckily doesn't use the cached data in A, but it does use class C. You don't deal with cleaning up A. Then, down the road, someone changes C to update or view the static state in A. All of a sudden the tests for B are dependent on static state you had no idea about. Or you are modifying that data unknowingly, and affecting all tests that run after yours that depend on the state of A being in a default state.

I've found the most effective approach is a setup or tear down method in your tests that explicitly clears all known cache. This will ensure no test can hide data that leaks into another test, causing an unexpected (and difficult to track) test failure.

Just be on the lookout for the telltale sign of static state causing failures. If a test class fails on the build machine (or in a group), but it doesn't when run individually... you might be seeing static state showing its ugly head.

Monday, January 11, 2010

Meaningful Comments

Comments are an interesting beast. You can comment your code too much, which will clutter the code so much as to decrease readability. You can comment your code too little such that it is difficult to understand the more complicated areas. You can even not comment at all, and hope the code speaks for itself.

Probably due mostly to laziness, I don't comment my code heavily. When I do, I like my comments to have meaning. I try to write JavaDocs or RDocs for every method, even if it is just a simple comment describing the purpose of the function, with expected behavior. More often than not, I inspect unknown code by looking at the actual implementation. However, sometimes I prefer to jump to the online documentation, so I try to do my part in making sure it is fleshed out.

The problem I see with JavaDocs and RDocs is that it takes diligence to keep them up to date. Sometimes you can unintentionally change the behavior of some method by twiddling with a private or protected method, causing an undocumented change that breaks the usefulness of your JavaDocs and RDocs. Even so, there is benefit in being able to quickly see what a method does, and use that as your base assumptions. If tests prove otherwise, the code can hopefully speak for itself and resolve your conflicts.

The trickiest comments, though, are inline comments. When it comes time to document the details of your code, I've found it very difficult to do a good job of thoroughly commenting it. The problem lies in the fact that the comments are useless until it is time to revisit old code. Once you have lost the context of why you wrote some code, it is very difficult to get it back. We have all run into this problem... you find a bug in some month old code, and start to retrace what the code is doing. You are browsing around, and you see some oddities, but can't for the life of you figure out why the code was written in this way. If you wrote it, you might remember you made some critical decision based on some immediate goal, but the goal and decision are now completely lost.

Therein lies the problem. Code needs to be documented to show why you wrote it that way, not what that code does. This comment is useless:


// Add 1 to the count of items.
itemCount++;

If you describe why you did something, you might give your future self a break when it comes time to track down an error, so try a comment more like:


// The item count is off by 1 because the first item is
// implicitly dealt with already.
itemCount++;

Now, this still leaves the problem open of when to write those comments. The best advice I can give is to look for those moments that you had to think a while to figure out what to do, or the code just looks too complicated. The best solution is so obvious it documents itself, but that can sometimes be rather difficult to achieve.

What are your guidelines of when to document code? Do you still run into those "what was I thinking" moments?

Friday, January 8, 2010

Peer Demo

A lot of software practices make sense when described. Source control, automated tests, automated build server, etc. Even the most obvious practices become crystal clear the moment they have a real impact on your development. That automated test that caught the bug as soon as you created it. The automated build that notified you of an old test that is now failing because of a new bug. A massive refactoring that turns out to be fruitless, easily reverted with some easy source control commands. An old diff helping you regain context of why you (or someone else) made a particular change.

Recently I had the revelation with simple peer review. Not even peer review of code, but the simple practice of demoing your finished code to a peer. Before going in to this recent demo, I fully understood the benefits of such practice, but it transformed a neat idea to a must do.

Now, like most developers, I am very obsessive about my code. I do everything I can to make sure it is polished and perfect when it is time to deploy. I try to write as many reasonable unit tests as I can. I constantly exercise my code, both with test driven development and with actually looking at the visible output to make sure it is doing what I expect. With all my obsessing over my code, I tend to finish with a fairly confident feeling that it is ready for production. Don't misunderstand my meaning to think that I feel my code is perfect, or even necessarily good, and most definitely not bug free, but I take pride in my craft and try to call it done when I really think it is done.

There is a big problem though... like every other developer, I am human. I don't anticipate a certain kind of error condition. I forget to check a particular path in the code. I over complicate the solution and miss a way to make the UI a lot simpler and easier to use. Ultimately, I am left with a couple bugs that I just couldn't quite run across.

For one, several, or all of the above reasons, I end up with a demo that makes blatantly clear a few key mistakes that I made. This happened with my most recent demo. Together, the reviewer and I filed quite a few bugs in our issue tracking software that I needed to resolve. Not all were directly caused by me and my code... some were issues that dated back quite a while... but quite a few of them were most definitely because I hadn't exercised the code quite as thoroughly as I had thought.

The key thing I take away from this reminds me of Jeff Atwood's post on not going dark. I'm not afraid to share my code as I develop it. In fact, I enjoy showing off a small snippet or clever bug fix I just wrote that I feel does something neat. I similarly like to see when other developers finish something they feel proud of. However, I still end up writing my feature in relative darkness. The demo is like shining a flashlight in the corners that I didn't get to. By having another person walk through your code, or even just the simple act of showing that it works, you will often end up seeing a different perspective that you never thought of while you were working. Sometimes you run across code that had been manually tested, but changed since the tests and is now broken. Whatever the case, you will inevitably run across something that could be improved upon that you didn't quite catch.

It doesn't even have to be a bug, necessarily. For example, in the demo, we decided that the action we were exposing to the user could benefit from being run multiple times, sometimes. So, we made an easy to access link that allows you to retry the action when you want to, which saved the user from the need to search a table for the same row to run the action again. This was a missing feature in the original spec, not exactly a bug. Nevertheless, the demo made this change an obvious need. Perhaps it is the halfway point between just deploying new code, and dogfooding your code. In the absence of being able to dogfood your code, a simple demo of all new features will expose a good chunk of what you would have been able to find through dogfooding.

So before you deploy that new feature... grab a friend and just demo it to them. Even if you don't find a single issue, at least you will be more confident that your code is ready.