Hey, what smells like blue?: 2009

Wednesday, December 30, 2009

Blame Yourself First

Why is it that the stupidest, most obvious bugs are the ones you end up spending half an hour on? Did I say half an hour? Make that several hours... maybe even a day or more, sometimes. Consider the following JavaScript code (written with jQuery):


$(function() {
  $(".clickme").click(function() {
    var link = $(this);
    link.css("color", "gray");

    finish = function() {
      link.css("color", "blue");
    };

    $.ajax({
      url: "/my/remote/url",
      success: function(data) {
        // ... process the result ...
        finish();
      },
      error: function(data) {
        // ... process the error ...
        finish();
      }
    });

    return false;
  });
});

I wrote code very similar to this recently, albeit a bit more complicated, but with the same bug. Can you spot the bug? What if I renamed the finish function variable to iAmABonehead? Well, if that doesn't make it obvious, you might want to brush up on JavaScript, if you plan on using it much. Instead of creating a local reference to a closure named finish, I was creating a global reference to a closure named finish that gets overwritten every time the click handler fires. Thus, a quick couple clicks in a row, and you will end up with some permanently gray links.

Here's the funny part, though. I did what every good software engineer does... I saw the behavior in my browser of choice (Chrome at the moment), and thought... gee, that's an odd bug in jQuery with Chrome. My code surely doesn't cause that problem. How can I figure out a way to work around this odd bug?

Thankfully I caught myself, and gave myself a good reprimand:

Self! Don't be such an idiot! You are surely the cause of this bug... drill down a centimeter and you will find it!

So I took this quite excellent advice and decided to give the same bug a shot in Firefox. My reason being that surely our version of jQuery is thoroughly tested with Firefox... if the bug comes up again, it's either jQuery, or me. Except that it probably isn't jQuery.

A quick test indicated that ~~jQuery~~ I was the likely cause of the problem. A couple minutes of a closer inspection of the code and I found the accidental global and fixed my bug.

I guess this is the "Science" aspect of Computer Science. When dealing with bugs, you need to treat it like a (hopefully) repeatable experiment. Form a hypothesis on why you are seeing your experimental results, then conduct further tests and experiments to drill down until you have found and squelched the bug.

For your first hypothesis, don't think about which of your frameworks is likely causing the bug. Eliminate them as the possible issue. The bug is 99.99942% likely to be in your code. When I come across a bug in fresh code, it's weird how the frameworks and platforms I am using are the first my mind blames. It takes all those times being wrong with such accusations before I finally started to stop myself and reconsider which part of my code might exhibit the behavior.

The sooner you blame your own code, the sooner you will find the bug. It helps to take a breather and come back with fresh eyes, too. Usually the bug isn't just in your code, it's something so blatantly obvious that your eyes skip over it and can't see the semicolons for the braces.

Wednesday, December 23, 2009

Better Estimation

Your time estimations suck. I don't care how accurate you are sure they are, they suck. The reason they suck is not because you are a bad developer, or you don't have experience estimating... they suck because you don't know the whole story yet.

It is impossible to predict everything that will come up during the course of your development iterations, and many things will come up that you had no way of predicting. Even if you break down your problem to the tiniest, bite sized pieces, you will have emergencies come up. Your colleague's dog will get sick causing him or her to be out half a day, making it impossible for you to complete the planned integration between your 2 services. A user won't be able to input their Woozit into your Whatzit page because of some overly zealous input validation you shipped last month, and it will take a day to track it down and fix it. You will meticulously estimate each minute detail of how to build the new Bazzle Integration Service, and you will get it all perfectly done, but forget to leave some extra time to do some dry runs at the end, only to realize their system doesn't quite match up with yours the way you expected, and 2 more days will be lost patching up the differences.

I guarantee you, no matter what you do, your estimates cannot be perfect. So stop thinking they can be, and figure out a way to adjust for the unadjustable.

I can't tell you how much I would love to see Joel's Evidence Based Scheduling implemented at my workplace. However, it's always something to tackle next iteration. Beyond the lack of commitment from the whole team, it seems like a lot of work to track the time you spend on everything, even though the results, I am sure, are spectacular.

If you can't commit to such effort, I suggest a more low-tech technique. For every work item that you are a little unsure of, add in a task to figure things out. Break your work items down to the most meaningless, mindless 20 minute tasks you can, and set them as half hour tasks... hour tasks... or even 2 hour tasks. You will have to test that 20 minute's worth of coding, after all. Understanding exactly what you are building will help prevent missing key aspects that might have been forgotten. More things will come up as you are in the details of your implementation, but I find it helps to try to think of as many contingencies as possible.

When you are done with your estimation, however you do it, double the final amount you have. This gives you significant buffer to help you reach your milestone, and it will help address a lot of the unknowns that will crop up between now and the milestone. Then, adjust this 2x multiplier each iteration based on how long it took the last iteration. If your initial estimates for the last iteration was 1 month of work, and you doubled it to 2 months, but you finished in 1.5 months, then multiply the next amount by 1.6, to be safe. Keep track of your multiplier at every milestone, and over time you should be able to develop a multiplier that works for your team to hit almost every milestone.

This is a much cruder form of Evidence Based Scheduling, but consequently, much easier to employ. It may not result in perfect results, but if you are missing every deadline, it should help better than continuing down the same path over and over and over, like a broken record.

Monday, December 21, 2009

Protect Your Environment, Part 2

Last time I talked about separating your environment, and automating it such that almost nothing needs to be customized outside your source control. Today, I want to expand on that thought a bit and specifically talk about your build environment. If you aren't using a build environment like Cruise Control, you really should be. By automating your build, you can quickly detect when you've changed code that causes a regression test to fail, or if you've otherwise unintentionally broken things.

I'm not really trying to espouse the benefits of an automated build system, though, so I'm going to assume you have one set up. The next step is to protect your build environment.

With the build environment, it's especially important to keep everything in source control, and contained within the root directory. If you can't have 2 parallel checkouts of your source simultaneously running your build without failing in some way, you are doing it wrong.

This means a few things. Primarily, do not depend on external files or libraries. If your code looks for a file in /usr/share/myconfig.properties, then move that file to within your root checkout and change your code to point to that properties file with a relative path instead. Commit every jar or other runtime library into source control, and point your build to load the files locally.

What this gains you is flexibility to have parallel development. You can have a branch that represents what is currently deployed, and a branch that represents your current development. You can update that external jar you depend on in your current development branch and it won't affect your currently deployed branch. This means you can have 1 build server constantly running your tests on both branches, and neither will need to worry that they share the same file system.

If you have any kind of branching going on, you run a much higher risk in hitting build issues if you depend on something outside what is committed to source control. If you have a test that needs to look for a file in a special location on your file system, there will be a problem the moment that file diverges between your branches.

If you must depend on some custom environment variables, prepare those environment variables within the build. Just like diverging files, it's entirely possible a separate branch will need to tweak those variables. Once you need to, your other branches will be hosed.

So, once again, once you've prepared your environment to be bulletproof, protect it vigorously. Don't let your build depend on anything outside what is committed to your source control repository, and your automated build server will thank you.

Friday, December 18, 2009

Protect Your Environment

Recently, I have been thinking of the environment a lot. Not global warming. Not the O-Zone disappearing. Not the ocean. Not the humpback whale or the spotted owl. I'm talking about your build environment. Your development environment. And, of course, your production environment.

These environments need protection just as much as our physical environment around us. Well, I suppose our physical environment should matter more, but as someone who lives on the computer, my computer environments matter a whole heck of a lot to me.

I'm here to tell you to set up an environment that automates just about everything you do, store it in your source control system, and then protect it vigorously. The moment someone strays and tries to let strands of wild growth choke your environment with custom tweaks, special path settings, and other such nonsense, trim those vines and get back to a clean garden. It's really not that hard.

I can't vouch for any fancy tools that manage this stuff automatically, because thus far I've stuck with living within my basic build environment. I've heard of tools that can manage your environment for you, but sometimes simple will work just as well. At my day job, we use Ant to run our builds, and little by little I've been molding our kudzu of a system into a manageable environment that requires little more than a subversion client, and a few additional packages. Ant can manage the classpath, so there's really no need to keep throwing jars into your shell's CLASSPATH variable.

My most recent addition was a simple custom Ant task to mimic Rail's environment setup. I created an environments directory and stored all the custom files that differ between our production environment, development environment, and any environment in between. It will hold custom properties files that are loaded at runtime, and Tomcat's context.xml so we don't have a local diff pointing to our development database. Within the environments directory is a directory for each environment, and the ant task will figure out which environment you are using, then copy over files from the appropriate environments directory to the correct location. This is much like Rail's environment.rb, with the different scripts in the environments directory.

A colleague recently set us up with Rails migrations, and I can't express how direly we needed this. It is important to version control all your database changes, but if you don't have a very orderly process to migrate between one release to the next, you are in for a world of hurt. Not knowing what has been run on your live database means you might miss something that could have drastic consequences with the code you are releasing along with it. Forget an important index and your queries will grind to a halt. Forget a key table, and your code could fail in an unexpected manner that might trigger other unintended failures, create corrupted data, or lose unrecoverable data.

By keeping your environment consistent, you can be more confident that deploying your new code will work exactly as you tested. By automating your environment, removing all of those special files you need to move to the right location or custom environment variables you need to tweak, you make it a lot easier to work within your team. You also can then bring in new colleagues and have a new development environment up and running for them faster.

The moment you tweak some kind of configuration that everyone in your team should do, automate it and put it in source control. If your tweak should only be in development or test, set up a simple environment switch in your build system, and grab the right file automatically. The production version and the development version should both be checked in.

So there... protect your environment. Who wants to be working in a jungle, anyways?

Wednesday, December 16, 2009

A Simple Ruby Pattern

By being a multi-paradigm language, Ruby provides numerous possible styles to write your programs with. I am a big fan of metaprogramming in Ruby, and one style strikes my fancy especially. By defining methods in your Class instances, your subclasses start to act like a domain specific language. You end up setting up a structural framework for how your subclasses behave, then declare the behaviors specific to each specific subclass.

To illustrate this, consider a simple calculator application. You could tackle such an application any number of ways, but we can come up with an interesting solution using class methods.

For starters, we need our base CalculatorBase class, which will provide the plumbing for dealing with input, handling if the input is numeric versus operations, and overall flow. I am not interested in these details for this post, so I will leave them as a homework problem for you to play with. Don't worry, it's fun!

First, your base calculator needs to expose a way to define operations:

class CalculatorBase
  def self.operation(op, arity, &block)
    define_method(op.to_sym) { |*args|
      unless args.size == arity
        raise "Wrong number of arguments"
      end
      block.call *args
    }
  end
end

From this, you can declaratively define what operations your calculator may support:

class Calculator < CalculatorBase
  operation(:+, 2) { |x, y| x + y }
  operation(:-, 2) { |x, y| x - y }
  operation(:/, 2) { |x, y| x / y }
  operation(:*, 2) { |x, y| x * y }
  operation(:sin, 1) { |x| Math.sin x }
  operation(:cos, 1) { |x| Math.cos x }
  operation(:tan, 1) { |x| Math.tan x }
end

Now, you can invoke operations with ease:

calc = Calculator.new
calc.+ 2, 3
calc.sin 3.14159

However, I feel we can do better than this. There is a lot of repetition going on with the arity. To make our calculator implementation more domain specific, let's allow binary and unary operators to be defined easier:

class CalculatorBase
  def self.operation(op, arity, &block)
    define_method(op.to_sym) { |*args|
      unless args.size == arity
        raise "Wrong number of arguments"
      end
      block.call *args
    }
  end

  def self.binary(op, &block)
    operation op, 2, &block
  end

  def self.unary(op, &block)
    operation op, 1, &block
  end
end

With these new methods, we can make our operation declarations a bit easier:

class Calculator < CalculatorBase
  binary(:+) { |x, y| x + y }
  binary(:-) { |x, y| x - y }
  binary(:/) { |x, y| x / y }
  binary(:*) { |x, y| x * y }
  unary(:sin) { |x| Math.sin x }
  unary(:cos) { |x| Math.cos x }
  unary(:tan) { |x| Math.tan x }
end

This isn't a particularly difficult approach in Ruby, but I really like the results whenever I am able to employ it. It can remove a lot of repetitive method declaration, and make your main class very clearly declare the behaviors it employs.

Of course, with great power comes great responsibility. It is easy to obfuscate the behavior you are creating with this pattern. Used appropriately, you can design an API and then speak the language of your domain in a much easier to comprehend fashion.

Sunday, December 13, 2009

Common Java Implementations

Java makes a lot of tasks more difficult than they should be. For example, checking if 2 objects differ can be cumbersome when you take into consideration that values could be null. Consider the following example implementing equals for a Name object (yes, some of that duplication could be simplified, but ignore that for now):

public class Name {
    private String first;
    private String middle;
    private String last;

    ...

    @Override
    public boolean equals(Object other) {
        if (other == null || !(other instanceof Name)) {
            return false;
        }

        Name otherName = (Name) other;

        if ((first == null) != (otherName.first == null) {
            return false;
        }

        if ((middle == null) != (otherName.middle == null) {
            return false;
        }

        if ((last == null) != (otherName.last == null) {
            return false;
        }

        return first.equals(otherName.first) &&
               middle.equals(otherName.middle) &&
               last.equals(otherName.last);
    }
}

Now consider the task of reading all the lines of a file:

File file = ...;
BufferedReader br = new BufferedReader(new FileReader(file));
String line = br.readLine();
List<String> lines = new ArrayList<String>();

while (line != null) {
    lines.add(line);
    line = br.readLine();
}

Maybe you need to build a string that is a colon delimited list of the items in a list:

if (list.isEmpty()) {
    return "";
}

String result = "";
result += list.get(0);

for (int i = 1; i < list.size(); i++) {
    result += ":";
    result += list.get(i);
}

return result;

Well, whenever you start cracking your knuckles, preparing to dig in and write a bit of code that you feel should be easier, stop. Take a deep breath. Consider who might have solved this problem first. It has likely been done before, so you can bank on another's implementation, with the benefit of it being thoroughly tested and likely highly performant.

In particular, the Apache Commons projects has a suite of useful tools to bank on. The equals example can be simplified to the following (thanks to the lang commons project):

@Override
public boolean equals(Object other) {
    if (other == null || !(other instanceof Name)) {
        return false;
    }

    Name otherName = (Name) other;
    return new EqualsBuilder()
            .append(first, otherName.first)
            .append(middle, otherName.middle)
            .append(last, otherName.last)
     .isEquals();
}

While the reading of lines can be simplified to (thanks to the IO commons project):

File file = ...;
List<String> lines = (List<String>) FileUtils.readLines(file);

Besides the Apache Commons projects, you could try out the Google Collection Library to solve the colon delimited list:

return Joiner.on(":").join(list);

You may also want to browse Google's Guava Libraries project.

Whenever these libraries can be applied, you can be sure to cut the amount of code you are writing to a fraction of what it would have been, and likely a lot easier to understand and maintain.

Thursday, December 10, 2009

Don't make V1 TOO Crappy

Avery is a company that provides some... paper needs. I don't know all of what they provide, but I know they provide labels. I know this because my fiancée and I were printing out labels for our upcoming wedding. They provide an online tool that could be... very good. Instead, it is... so so.

They provide web tools to construct a PDF which can be printed on to your labels. This is an awesome idea because it is a very portable way to print to their paper products, and could provide highly customized behavior geared towards the type of paper you are printing to. Labels, in our case, could have some really cool software designed specifically around producing the exact labels you want on the specific Avery label product you purchased. If Avery wanted to be a leader, they could even support their leading competitor's products, so when you think of a good experience for your printing needs, you think Avery. This means you might be more likely go for the Avery products the next time you wanted to buy some labels, if they can leave that good of an impression on you. Nevermind the ad revenue they could rake in on top of that.

There is no reason this can't work on any browser with any operating system, right? Well, that is their first failure. The tool (which seemed to be primarily built with flash) wouldn't work on Chrome or Firefox in Ubuntu. I see no compelling reason why it had to be built with Flash, except that it probably was easier to develop their customized UI. Well, I don't believe Flash really gains you much there, but I can understand the thinking that it does. Regardless, some of the controls just failed to respond in my browsers. Namely, a checkbox to toggle the first row in a mail merge as header names vs a separate data row wouldn't respond. Also, the button to add a new span of text on the label didn't respond. Ultimately, I was forced to use IE within my Windows VM, as disgusting as that was. Firefox might have worked on Windows (and maybe even Chrome), but after 2 failed browser attempts, I wanted to go to the crappiest browser that too many unskilled developers still seem to target exclusively.

Ok, force me to use a shitty browser to get my work done... that's a huge mark against you, but if the rest of the user experience is just mind blowingly awesome, I can forgive you. But seriously... I have to have a blown mind by the time I'm done for me to forgive you. It's just too easy these days to write cross browser HTML and JavaScript to excuse the lack of portability. This is not an application that screams the need for Flash, so I think it's an indication of a lack of talent and creativity that they resorted to Flash.

Moving on, I ran across another bug that showed the lack of polish they put into this product. They have a nice mail merge feature that allowed me to upload an Excel spreadsheet and use the rows as addresses. This is a must have for printing labels, but also one of those features that makes you happy to be using a computer instead of doing this stuff before our digital age, like attempting homebrew calligraphy with a ballpoint pen. So, they had a nice feature that I already mentioned. You could mark the first row as the row specifying your field names, effectively starting your data on the second row. Neat! It even worked! Except... for our recent batch of labels, when selecting fields to place in the label, only the first field was showing, and truncated as "Mr." That's odd. We tried dropping the header row, and the problem still showed up. I guess I should add that the first column value of the first row was something like "Mr. & Mrs. John Doe." It dawned on me that maybe these mindless developers were so bad that a simple & character could screw up the mail merge. I moved the row down and put in a more innocuous row at the top. Sure enough, it worked smoothly! So, let me get this straight... a mail merge that could have completely arbitrary data in it breaks down because of a simple & character???!?!? I shudder at the thought of what security holes might be present in this application.

Ok, I have saved by far the most egregious issue I have with their application for last. Their login process. Now, I can't for the life of me understand why they need any of my information to let me generate a PDF. However, they had multiple required fields, including my name and email. Why do you, Avery, need this information to construct a PDF to help me utilize your real product? It baffles me. Well, I gave the information, and then had to go back and start again due to some of the issues above. I figured this process had implicitly registered me, since the form had looked like a registration form. When I went back and attempted to log in, I discovered that this was not the case at all. That irked me, so I decided to register. And, FYI, the registration form looked exactly like the form to just use their web tools directly... just with an added password field. I registered and then clicked the link to get back to printing labels, and guess what? I got the form again to get my name and email to start the label making process again! Are you KIDDING ME?!?!?!?!!!!!

In what could have been an amazing experience, Avery thoroughly dashed my hopes that more companies are starting to understand that cross platform web tools are the future. Instead of being a leader in the specialized paper printing business, they have made me shudder at the incompetent developers that they likely hired to do this job. There were nuggets of a really cool product, so there must be some smart people there... but they are likely drowning from a few really bad apples.

I think this kind of disproves Jeff Atwood's recent post that you should release a crappy version 1 and iterate. Or at least, it reinforces the thought that he included that you shouldn't release total crap for version 1. I guess Avery could redeem themselves in my view if they are iterating and actually listening to good feedback. If I come back in a couple months, and all of these issues are somehow tackled... I could forgive these transgressions. I seriously doubt it will be the case though.

I think Avery is stepping in the right direction with their online tools. They have a lot of potential there. Unfortunately, their execution makes me embarrassed as a fellow web developer.

Friday, December 4, 2009

A Few Worthy Bytes

Bandwidth is a lot cheaper now than it was 10 years ago. Dialup is a dying breed. So please, please, please don't craft your HTML to save every last byte. It is a lot more worthwhile to make your code readable and immediately intuitive than to prevent an extra 10 bytes from going to the user. If you find that you are causing huge downloads of extra HTML, and it is becoming an issue based on real statistics... then start looking into addressing the issue.

For example, consider the following snippet:

<a href="/path/elsewhere"><%
  if @condition
    %>True text<%
  else
    %>False text<%
  end
%></a>

It may not look that bad alone, but when you try to save every byte, the result months from now will be a garbled mess that is difficult to comprehend. Instead, ignore those extra bytes and produce the following:

<a href="/path/elsewhere">
  <% if @condition %>
    True text
  <% else %>
    False text
  <% end %>
</a>

If outputting all that extra useless whitespace makes you feel too icky, Rails has an option for you. If you close your scriptlet tag with a dash, as "-%>", Rails will strip some of the whitespace.

<a href="/path/elsewhere">
  <% if @condition -%>
    True text
  <% else -%>
    False text
  <% end -%>
</a>

Just remember that someone has to maintain the code, and making code more difficult to comprehend will end up costing you more than the bandwidth you saved.

Wednesday, December 2, 2009

Free and Loving It

I have a soft spot for Open Source Software. When I compare two software products, I will almost always lean towards the Open Source alternative, if it can accomplish my task with some amount of pleasure.

Just tonight, I was working with Open Office with my fiancée. We ran into several kinks along the way, ultimately costing us the night. We couldn't finish our task, though I mostly blame my procrastination.

In particular, we were trying to print out some labels, given a spreadsheet of addresses. I had no problem finding howtos and working through the problem. However, we ran into small problems here and there. For example, I couldn't create a database out of a spreadsheet right away because Ubuntu didn't package the Database portion of Open Office along with the rest. I then innocuously named our database SomeName_12.1.2009. When trying to print, this ended with an error that it couldn't connect to the database. Renaming the database to SomeName_12_1_2009 solved the problem. Something we could figure out easily enough, but not something I would expect an average user to think of (nor do I think they should have to think of it). She became frustrated, rightfully so, and ultimately exclaimed that she's going back to Windows (for her Office needs).

At this point, it dawned on me that I don't mind giving Open Source the benefit of the doubt. The hacker in me enjoys figuring out how to work with the software (as long as what I want to do is possible, and reasonably easy). The hacker in me also appreciates that the project was built by people that are likely much more passionate about writing software, and much more passionate about getting something useful to people (instead of making a bunch of money). This is not to say there aren't people in the proprietary world that love software and want to deliver awesome stuff to people, but I just can't identify with that crowd of enthusiastic developers as much.

This is also why I foam at the mouth when I read comments like Jeff Atwood's:

as predicted, Google's "let's copy how Microsoft does phones, but open source!" model is a fail: http://is.gd/589U8

I've read the article he links to, and I consider it complete bullshit. I have a G1, and I love it. I have played with the Droid, and I drool over it. I know several people that have one (or some other Android powered phone), none are unhappy about the pick. Jason Calacanis of This Week in Startups (among many other things) has commented on his show that he loves his new Droid. Browsing some of the comments on that negative post, I see several that point to rogue processes as the likely culprit of the device slowdown discussed in the article. I've found this to be true of my G1. Sometimes I will discover my battery drained much faster with little reason during the day, or it will become extremely sluggish. Both cases I've had more than enough reason to believe it was a rogue process from something I had installed. With greater power comes greater responsibility.

In the Linux side of things, I have become far too attached to Emacs and the powerful command line based applications that I would never willingly go back to a Microsoft prompt. The Free world gets me, and I get them. I tend to believe in live and let live sort of philosophies, which has no room for restrictive licenses and the likes of DRM. Software patents scare me because I want to be able to develop anything I want, without having to worry that someone else may have already thought of the idea and patented it. I also don't care if someone takes my ideas and tries to make them better. I may be a bit envious, but I firmly believe the meat of a product is in the execution, not the imagination. Ideas are a dime a dozen, but passion for your users and the desire to develop something of quality and value is truly rare.

I don't understand the Microsoft world, and I don't want to. Which world do you identify with, and why?

Monday, November 30, 2009

Goodbye Mercury News

Newspapers can't go away quick enough. I blogged about the newspapers recently, but I was speaking merely as an indifferent party at the time. I don't read the paper, and so I don't really care if they survive or perish. Now I have a strong opinion. I want them to go away, and I want them to go away soon. Well, this isn't entirely fair, because I'm basing this strong hatred on the actions of a single particular paper... the San Jose Mercury News. Presumably not all papers act in such sleazy, annoying manners, but any that do... I hope they disappear tomorrow. Scratch that, I hope any business at all that acts this way perishes tomorrow.

Ok, enough teasing. Rewind a few months. My doorbell rang, and I went and looked out the peephole. There was a strange kid... and I foolishly opened the door. It was a kid going door to door selling subscriptions to the San Jose Mercury News in an effort to get help going to college. I had actually helped a kid doing the same thing a year or two before, and I didn't mind helping this kid too. The last time I had given cash, but this time I was out. Then, I made my second mistake... I paid with a check. The kid needed my phone number, supposedly so the Mercury News could verify I was indeed helping him out. I reluctantly gave him my number... my third mistake. Somehow, I knew I was making a mistake, and immediately wished I had just closed the door on him. I had half a mind to call him back and write a new check out to him, and let him cash it and keep it, or buy a paper for his school or something. I indicated I didn't actually want the paper, and that he could give it away... I think he ended up giving it to a neighbor nextdoor that wasn't home.

Fast forward back to present time. The whole thing ended up being a scam. I'm sure the kid got some help from the Mercury News towards college, but at the cost of the Mercury News getting my phone number. I got a call shortly after the subscription ended with a request to resubscribe. Damn are they aggressive when calling! It took a few minutes for me to dash her confidence and finally end the call. I thought that was that, but have since received at least two more calls, one coming just this last Saturday. A holiday weekend no less! I would think they would stop calling if I emphatically tell them I am not interested, and even explicitly say I have never read the newspaper, and never intend to. I guess I'm just a number in a big list of possible sources of revenue now. Next time I guess I just need to tell them to make sure I'm off their list. And the next time some poor kid is going door to door selling San Jose Mercury News? Sorry! I will send 'em packing. I don't mind helping a kid go to college, but not if it is just to get my number on a list to cold call every other month.

I think I walked away with another lesson on how to treat your customers. If you might be bugging them, stop and reconsider what you are doing. Your sources of revenue should be from people that love what you are doing for them, not people so annoyed by you that they buy your product just to shut you up. It's definitely a no no to set up a faux charity just to turn around and annoy the contributers.

Tuesday, November 24, 2009

Closures for Java 7: DOA

To start off today's (probably) brief post, I want to quote Stephen Colebourne's blog:

JDK 7 closures will not have control-invocation statements as a goal, nor will it have non-local returns. He also indicated that access to non-final variables was unlikely. Beyond this, there wasn't much detail on semantics, nor do I believe that there has been much consideration of semantics yet.

Now, I can't vouch for his facts, but it seems accurate, so I am going forward with the assumption that I'm getting it from the horse's mouth.

That said, I want to state what closures in Java 7 will be if they are implemented as stated above. Can you guess? Yeah... sugar syntax for anonymous classes.

As a Java developer, I don't want to complain. Anonymous classes are a major pain in the ass. Any time I attempt some functional programming with them, I always look back and think... that would be so much more elegant with a foreach loop, or some such thing. Anonymous classes are a lot of syntax for very little meat. Getting rid of the painful parts of that syntax will definitely be a good thing.

As a Ruby developer, these so-called "closures" are a laughing stock. Can anyone really claim these are actually closures? Let's just stop kidding ourselves and call them elegant anonymous classes.

Without control-invocation statements or non-local returns, you can't turn a foreach into a method call with a closure. Without access to non-final variables, you have to either move the variable into the class, wrap it in an object, or do the horrible 1 element array trick. You know... construct a 1 length final array which you can then both get and set the element from within the anonymous inner class... it blows.

If my vote matters, it is for waiting till Java 8 and doing closures right, or giving us a little meat for Java 7 closures. Bare minimum, non-final variables have to be accessible.

Sunday, November 22, 2009

Smart XML Processing with Regexes

Recently, Jeff Atwood wrote about parsing HTML with regular expressions. I want to speak about it briefly, because I came across this issue last week. I gathered from his post that the lesson is to consider your options with an open mind, and only block a possible solution if you really understand the alternatives. Use facts and knowledge to choose your implementation details, not superstition and theoretical best practices. Best practices usually are created for a reason, but that's not to say there's never a reason to turn your head on them.

This post hit home with me because I had an XML file to parse that was over a gigabyte. From this XML file, I needed a very small handful of the data, and it was very regular XML. XML parsing is a solved problem, but most XML libraries I've used would easily choke on such a file.

Instead of even considering attempting to process this data with a normal XML processor, I wrote a simple Ruby script to extract the information. It looped over each line, looking for key parts of the data with lines like:

if line["<expectedTag>"]
  # deal with this tag
end

Then, I processed the key tags and data I was looking for with regular expressions, such as:

data = line[/<expectedTag>(.+)</expectedTag>/, 1]

The above was done within the if blocks. The key point being regexes would have been too slow alone, so I used the simple indexer method to quickly determine if the line contained something that mattered to me. Then I used the regex to pull the data that I actually wanted.

Can you write XML to break my processing? Of course! The question is... does it matter? And that answer was no. I only need to process this data once, maybe another time sometime in the distant future, but the XML is so regular that I know it will work for all the data. On top of this, if I missed some data, it wouldn't matter in the slightest for my purposes. So, in short, proper XML processing would have severely slowed me down (ignoring all lines that don't contain a keyword is much faster), and it would have produced no real benefit.

I ended up processing all the data in little over a minute or two, and I considered it a huge success. Over a gigabyte of XML to process seemed a rather daunting task initially!

Wednesday, November 18, 2009

Serve Your Users

I'm a bit upset. Some friends and I were planning a trip to San Francisco soon, and a few of them have booked a night at the Sheraton Fisherman's Wharf (don't worry, I will tie this in to software in a bit, trust me). I needed to book a night for me and my fiancée, so I brought up their website. Uh oh! The hotel was booked solid that night. This was bad... what if we have a hard time finding a place? This wasn't what made me mad though... well, besides at myself for not booking earlier.

What if the website wasn't accurate? I dialed up the hotel, just to be sure. It went something like the following (though it's coming from memory, so expect a bit of embellishment):

Me: Hi! Do you have a room available for the night of X?

Them: I'm sorry, I don't see anything available. Is the night flexible?

Me: Well, my friends already booked the night with you, soooo...

Them: I can check the Starwood Hotels, Le Meridien. It is about a mile away. Shall I check availability for you?

Me: Uuuuh, well, my friends are already staying at your hotel. Is there anything nearby that might have a room?

Them: ... It's only a mile away. Shall I look that up for you?

Me: Sure.

... She proceeds to book a night at Le Meridien, informing me of an offer comparable to what my friends had, though I made sure I had a refundable option so I could think it over ...

Ok, so this may seem like pleasant help from the reservations department at the Sheraton, but it's not quite why I'm angry. You see, after I hung up, I first checked how far on the map the 2 hotels were. It ended up being 1.4 miles... not exactly easily walkable for a night on the town. This wasn't why I was steaming though.

I then did a quick search of the nearby hotels to the Sheraton. I zoomed in on Google Maps, and all the hotels nearby were listed right on the map. The Hyatt, a block away. Holiday Inn, a block away. Best Western, across the street. Radisson, across the street. This was when my anger bubbled up. I called up the Best Western and found out that not only was a room available, but I could get the same price my friends got (and which was offered me at Le Meridien). I quickly cancelled the night at Le Meridien, quite thankful I didn't rush into the no refund deal I was initially offered.

Let's be clear, I fully understand where the Sheraton employee was coming from. They may get some kind of commission for redirecting my business to their sister hotel. They want to ensure they are getting my money. What irks me, though, is that I made it clear I preferred to be near my friends, yet she proceeded to push an option to me when she very likely knew full well there were alternatives that would have suited me better. It may be that any of the hotels I saw would do the exact same thing in a heartbeat, but I feel it is a grave mistake.

First, the Sheraton had a great opportunity to turn me into a fan. Had they pointed me to one of the numerous walking distance competitors, I would have remembered that fondly, and told everyone about my experience. Not many companies clearly have your best interests at heart. Instead, I remember it angrily... and tell everyone about my experience.

This is how I feel this story relates to software... well, more about business, but same thing if you are a software company. The way you need to treat your customers is as if your goal is to see their goal achieved in the way that best makes them happy. If that means pointing them to a competitor who would solve their problem better... then happily point them to your competitor's open arms. Don't treat your customers (or potential customers) as if their money is the only thing you care about, like the Sheraton did in this case. Your users will find out you weren't being completely honest, and they will hate you for it. They will speak out and write on some puny but public blog and tell everyone about the experience. Ultimately, your users will find their way to the option that is aligned with solving their problem, not extracting their money.

Wednesday, November 11, 2009

Cautious Development

One of the most appealing features of Test Driven Development for me is that it helps you write code that actually works when you are done. If you are testing at each step, you end up with code that works for all those features that you explicitly tested. This is not to say you will end up with bug free code, of course. Nobody but a pointy haired boss would expect that.

All too often, I see code that is supposedly done, but a cursory run through some simple examples shows earth shattering bugs. Bugs where the basic features being implemented don't even work. What possesses a developer to think something is done when it hasn't been played with for a while, showing some level of completion? I guess we all fall into the trap of simplistic changes that can't possibly cause a problem, only to have some bug crop up specifically because we aren't looking. Sometimes some manual (or even automated) tests would be so tedious to set up that it just doesn't seem worth the effort. But I've seen cases where it's clear the code was not very carefully crafted in any regard, with no reason that it couldn't have been done better.

The most absurd example came from someone I helped interview quite a while ago. I distinctly remember going in and running the beginnings of the code with him. I put in some input and saw some output... all was good. When I returned a while later, the IDE still showed the exact session we had run as much as an hour earlier. Not only did the code not even compile, it had no hope of working even if we could trick the compiler into giving us a green light. Is it a rare quality among developers to actually play with the code as you go along?

I'm not saying you need to exercise the code in a particular manner, just that you exercise it in any manner. Run the code at every step, playing with the new features you are working on, and sanity testing some older features. Write it test first and watch new tests succeed as older tests continue to succeed. Hell, even take a waterfall approach with a big bang batch of code, then furiously cycle through short runs to find and fix a bug. I don't care, as long as when you say you are done, it isn't trivial to find a bug in the code.

Unsurprisingly, I'm also a fan of writing your code in the most cautious manner possible. Some developers seem to like to go in to old code and just run wild in it, tearing things out and replacing them left and right with no regard or respect for something that might be doing the job well, or at least well enough. Sometimes a piece of code can prove to be so obscure and hard to maintain (or even understand) that it makes little sense but to throw it out and start from scratch. That should be more rare than common, though.

When it comes to refactoring, I like to go in and be very careful that the new code preserves equivalent functionality, especially if the code isn't covered by a good suite of tests. Don't go in and replace a whole class with a new class that does the same thing in a way you like better. Instead, move code around and massage it into the shape you need, slowly but surely preparing it for the new feature you need to add. Continually ask yourself if the shape of the code does exactly what was being done before (unless of course you discover a bug). Pretend that if you introduce a new bug as a result of your changes, the entire company and all the users will come and yell at you and deride you for making such a mistake. Worry for every millisecond that you might be making a breaking change.

Care for your code. Envision the code is your lover. Would you do a single thing that might hurt your lover's feelings? Would you want her to stub her toe because you moved the table to the wrong location? Then don't make sweeping changes without being extremely cautious, because you will only guarantee to stub your code's toes. If you treat your code right, she will only get more beautiful, while learning new and exotic tricks.

Monday, November 9, 2009

View Sanitizing and Micro-Optimizing

Maurício Linhares posted an intriguing response to my recent post about auto-sanitizing Rails views. I was just gearing up to respond via a comment when I realized I could probably turn it into a full post, so here goes my response!

So, first of all, let me applaud Maurício for actually writing some code and sharing it, rather than keeping the discussion academic and merely flinging arguments around the interwebs. I can't say I always do it, but I have the most fun writing a blog post where I show some code to achieve a solution to a problem I am having. He even went the extra mile to create a plugin for his idea.

That said, I must respectfully disagree with his approach. The gist of how he tackles the problem is to sanitize data as it comes in rather than when you are displaying it. He even argues in his blog post that it is "more adherent to the MVC":

Now, as the data is cleanly stored in your database, you don’t have to waste CPU cycles cleaning up data in your view layer (and you can even say that you’re more adherent to the MVC, as cleaning up user input was never one of it’s jobs).

He makes a convincing argument that the view layer should not sanitize input. My big problem with this is that you have actually taken very specific view details and moved it into the controller and model layers, contrary to what he is claiming. Namely, you have introduced into the controller/model layers the idea that your data is going to be displayed via HTML. However, what if you want to expose a JSON API later on via the same controllers, but with new views? Now, you will need to unsanitize the data, and resanitize it for JavaScript output! You have inadvertently snuck view information into the database! Your data is pigeonholed as HTML data, and it now takes double effort to use the data in another matter (such as JSON data).

This last point deserves some extra attention. Consider if our transform on the data was a lossy transform. This case isn't, because you can easily unsanitize sanitized HTML, but forget that for a second. For example, let's say we wanted all data to be sanitized and censored, such that words like "ass" and "crap" got changed to "***". If we had a bug that caused "crass" to be changed to "cr***", we have just lost information that is irretrievable. If we saved the sanitizing and censoring for the view, where it belongs, we could always fix the censoring code and our "high fidelity" representation will allow us to now correctly show "crass." Let me quote a Stack Overflow podcast, where Joel explains this same position:

Spolsky: Here's my point. Uhh, in general, my design philosophy, which I have learned over many years, is to try and keep the highest fidelity and most original document in the database, and anything that can be generated from that, just regenerate it from that. Every time I've tried to build some kind of content management system or anything that has to generate HTML or anything like that. Or, for example, I try not to have any kind of encoding in the database because the database should be the most fidelitous, (fidelitous?) highest fidelity representation of the thingamajiggy, and if it needs to be encoded, so that it can be safely put in a web page then you run that encoding later, rather than earlier because if you run it before you put the thing in the database, now you've got data that is tied to HTML. Does that make sense? So for example, if you just have a field that's just their name, and you're storing it in the database, they can type HTML in the name field, right? They could put a < in there. So, the question is what do you store in the database, if they put a < as their name. It should probably just be a < character, and it's somebody else's job, whoever tries to render an HTML page, it's their job to make sure that that HTML page is safe, and so they take that string, and that's when you convert it to HTML. And the reason I say that is because, if you try to convert the name to HTML by changing the less than to < before you even put it in the database. If you ever need to generate any other format with that name, other than HTML - for example you get to dump it in HTML to an Excel file, or convert it to Access, or send it to a telephone using SMS, or anything else you might have to do with that, or send them an email, for example, where you're putting their name on the "to" line, and it's not HTML - in all those cases, you'd rather have the true name. You don't want to have to unconvert it from HTML.

Yes, it is tedious and error prone to use "h" everywhere, but that is the exact same problem I was trying to address in my post. However, I feel training myself to use <%: foo %> over <%= h foo %> is a better muscle memory than marking all input as sanitizing. Let's consider the consequences if you forget to apply the new scriptlet versus if you forget to apply the sanitizing of inputs. If you forget the new scriptlet, you have a new XSS hole that needs to be closed by simply changing "=" to ":" (or alternatively adding a call to "h"). If you forget to use the sanitizing of inputs, you have 2 major problems. You have an unknown amount of XSS exploits (everywhere you display that data, which could be in many places). You also have a bunch of data that is now invalid. You now need to either add sanitizing to all the view locations you output the information (which would be tedious and contrary to the whole point of Maurício's approach), or you need to update all existing records to be sanatized, just before enabling sanitizing of input.

There is another issue with this approach that a new scriptlet tag avoids. By making the sanitize decision in the view layer, you have the option of exactly what you will sanitize. Let's consider a site like a blog or Stack Overflow. In such applications, you want some amount of HTML displayed, though not necessarily on all fields of the model. You might want to whitelist sanitize the blog post, question text, or answer text, yet fully sanitize the labels or tags. Granted, you could update the plugin to allow such complexity, but it will be just that... complexity that will bleed through how you invoke the plugin. You will now need to not only specify which actions or controllers use this sanitizing, but also which parameters are excluded from being sanitized.

All of the above pales in comparison to the biggest sin of sanitizing the parameters, and it is one of my biggest pet peeves. It is one of the key points for why Maurício chose the path he did. Premature optimization.

The argument goes that rather than waste the CPU cycles every time you load the page (which is hopefully a lot), you should waste the cycles once as the input is being passed in and saved to the database. Premature optimization usually rears its ugly head in the form of much more insane choices, like insisting on how you should concat your strings. Thankfully, Jeff Atwood has already done metrics showing us that it doesn't matter.

Is sanitizing as quick as string concatenation? Probably not. I would be willing to bet, though, that it is fast enough for a small website. Why waste extra consideration on it until you have the awesome problem of having too many users?

Let's take a step back. Is pushing View logic into Model/Controller territory even worth the possible performance benefit? If I am going to throw proper MVC separation concerns out the window, it better be for a damn good reason. Allowing us to get orders of magnitude more pageviews might be worth it (if metrics proved that it was the best possible improvement, which is doubtful, but let's consider it). The whole concept is to cache something you are doing a lot by doing it once, before it even goes to the database. Let's extrapolate that concept. Instead of sanitizing first so we don't have to sanitize on every page view, why don't we cache the result of the action invocation itself? For every action/params pair that produces a view based strictly on data in the database, we could invoke the action once and just cache the rendered view for future use. Then, simply blow the cache away and re-render when you change the related database row(s). It is typically much more common to view than to update, so I expect this approach would give significantly better performance benefits than simply avoiding a bunch of sanitization calls.

With some careful thinking, we now have a much better solution to remove all the redundant sanitize invocations... and we've even removed redundant calls to the database, and any other costly algorithms we have done within our actions! All while preserving proper MVC separation of concerns. You can bet that I will explore this space when I have the fortunate problem of having too many users (and I wouldn't be surprised if there are available solutions that match my description).

Sorry for going off so much on your very well-meaning post, Maurício! I think you brought a very interesting possible solution, and it's always great to see code brought to the table. However, I do feel we should all seriously consider all approaches, and fully consider the consequences of the path we choose... not just to this particular problem, but any problem. It's best to drill down early and think about what issues our code may cause for us in the future. Don't consider this an excuse to dwell on issues to the point of failing to release useful functionality, though.

Thursday, November 5, 2009

Easy Partials in Rails

I created something at my job that has proven to be extremely useful that I think many people could benefit from. I like to call it Easy Partials, and the goal is to make using partials a bit easier (in case the name didn't make that glaringly obvious). The problem is that rendering a partial requires method calls when a little extra work will allow simpler and more readable partial invocation via convention.

You are probably lost, so let me give you some examples. As it stands today, to render a partial you would do:

<%= render :partial => "my_partial" %>

Which would take the partial "_my_partial.erb" from the same directory and render it.

It works, but what if you could take the whole "convention over configuration" idea and change it into:

<% _my_partial %>

A lot simpler, and pretty intuitive, right? Since partials by Rails conventions start with "_", it makes sense to just name a method as such to render the partial. Note that there is no "=" in the scriptlet, the _my_partial method will concat the partial, so you won't obtain a string to render.

We could create a helper method for every single partial we want to render like that, but that's rather cumbersome, isn't it? It's also not very DRY. You won't have exactly the same code, but you will find yourself with a lot of helpers that look rather similar. Instead, let's try overriding method_missing in our application_helper so we can avoid all those repeated helpers!

module ApplicationHelper
  alias_method :method_missing_without_easy_partials, :method_missing

  def method_missing_with_easy_partials(method_name, *args, &block)
    method_str = method_name.to_s

    if method_str =~ /^_.+$/
      partial_name = method_str[/^_(.+)$/, 1]
      concat_partial partial_name
    else
      method_missing_without_easy_partials method_name, *args, &block
    end
  end

  alias_method :method_missing, :method_missing_with_easy_partials

  # Concat the given partial.
  def concat_partial(partial_name)
    content = render :partial => partial_name
    concat content
    nil
  end
end

What we've done here is check on method_missing to see if the method name starts with "_", and, if so, treat it as a partial and concat it. If the method doesn't start with "_", we fall back to the original method_missing implementation.

This works, but what if the partial needs some local variables? Before you would do:

<%= render :partial => "my_partial", :locals => { :var => "123" } %>

Instead, let's do:

<% _my_partial :var => "123" %>

Again, a lot simpler, and quite intuitive. To achieve this, our code will look like this:

module ApplicationHelper
  alias_method :method_missing_without_easy_partials, :method_missing

  def method_missing_with_easy_partials(method_name, *args, &block)
    method_str = method_name.to_s

    if method_str =~ /^_.+$/
      partial_name = method_str[/^_(.+)$/, 1]
      concat_partial partial_name, *args
    else
      method_missing_without_easy_partials method_name, *args, &block
    end
  end

  alias_method :method_missing, :method_missing_with_easy_partials

  # Concat the given partial.
  def concat_partial(partial_name, options = {})
    content = render :partial => partial_name, :locals => options
    concat content
    nil
  end
end

Now we are passing the Hash passed in from the view on to concat_partial so we can specify the locals we want to render. We could check that there is no more than 1 argument passed into method_missing, but I prefer not to (feel free to use and improve anything you see here, in case that wasn't clear).

The next improvement we can make is to allow blocks to be passed in. There is no direct correlation for this next part, that I know of, except to build it yourself with helper methods. It was inspired by Ilya Grigorik.

Here is an example of what we will build:

<% _my_partial :var => "123" do %>
  <p>
    Some block content.
  </p>
<% end %>

This will allow us to effectively pass in a block to the partial, so we can abstract some of the content in the partial so that the caller can define it. And now for the code:

module ApplicationHelper
  alias_method :method_missing_without_easy_partials, :method_missing

  def method_missing_with_easy_partials(method_name, *args, &block)
    method_str = method_name.to_s

    if method_str =~ /^_.+$/
      partial_name = method_str[/^_(.+)$/, 1]
      concat_partial partial_name, *args, &block
    else
      method_missing_without_easy_partials method_name, *args, &block
    end
  end

  alias_method :method_missing, :method_missing_with_easy_partials

  # Concat the given partial.
  def concat_partial(partial_name, options = {}, &block)
    unless block.nil?
      options.merge! :body => capture(&block)
    end

    content = render :partial => partial_name, :locals => options
    concat content
    nil
  end
end

Within your partial, you will use a "body" variable to output the contents of the block passed in. If you try to use a variable named "body" along with partial blocks, the block will override the body variable, so keep that in mind.

For the final improvement, consider a partial that belongs to more than one controller. What do we do then? Well, how about we maintain a shared directory that we pull from if the partial cannot be found within the local directory. Thus:

module ApplicationHelper
  alias_method :method_missing_without_easy_partials, :method_missing

  def method_missing_with_easy_partials(method_name, *args, &block)
    method_str = method_name.to_s

    if method_str =~ /^_.+$/
      partial_name = method_str[/^_(.+)$/, 1]

      begin
        concat_partial partial_name, *args, &block
      rescue ActionView::MissingTemplate
        partial_name = "shared/#{partial_name}"
        concat_partial partial_name, *args, &block
      end
    else
      method_missing_without_easy_partials method_name, *args, &block
    end
  end

  alias_method :method_missing, :method_missing_with_easy_partials

  # Concat the given partial.
  def concat_partial(partial_name, options = {}, &block)
    unless block.nil?
      options.merge! :body => capture(&block)
    end

    content = render :partial => partial_name, :locals => options
    concat content
    nil
  end
end

So, if you use the _my_partial examples from above within the views for "person_controller", but there is no views/person/_my_partial.erb, it will fall back on views/shared/_my_partial.erb.

Using Easy Partials, you can avoid redundant helper methods, keep html in easy to access ERB templates, improve the readability of your views that use partials, and make your code more accessible for your non-programmer UI designers. Note that you can even invoke Easy Partials from within helper methods.

Special thanks to Ilya Grigorik's post on block helpers, which planted the seeds for Easy Partials.

Wednesday, November 4, 2009

Dev Days: Mobile

Let's revisit Dev Days San Francisco. Last time, I talked about the Microsoft talk, and how I was able to incorporate auto sanitization into Rails. This time, I would like to discuss the three mobile talks that were presented. Just for full disclosure, I am a big fan of Android, so you will likely see that bias come through. I know I felt the bias while taking in the three talks.

iPhone

First up was Rory Blythe talking about iPhone development. Of the three presenters, he was the most at ease speaking in front of a crowd. I'm sure Rory is a nice guy, but he had a swagger that... felt very Apple-esque. He was the hipster Apple guy smugly looking at us and telling us why we should cow down to the almighty Steve Jobs, and why we should be happy to do so. He made the audience laugh the most, most often with flaws in the iPhone development platform that we all know is an issue. Cheekily telling us three or four times that you cannot develop for the iPhone on anything but an Apple machine sure doesn't sit well with an Ubuntu fan like me. I don't think he was trying to sell the platform, though, just give us a taste for what it was like.

Most of the talk was him walking through the steps to develop a simple application in Xcode. It was very visual. Click here and drag over there to hook up a click handler of some kind. Drag this widgit in line with these others. Ultimately, we got to peek at some actual Objective C. If you weren't at Dev Days and you have never seen Objective C, be thankful you didn't have to, because it ain't pretty. He joked about how many places you have to adjust the code just to set up a simple property. As a Rails web developer, it made me want to claw my eyes out. I would never want to develop in such a redundant, cumbersome language.

There was good news, though. He introduced us to MonoTouch, a development environment for producing iPhone applications written in C# with the .NET platform. I'll take Ruby any day, but between the choices of C# and Objective C, it is a no brainer (C# of course... from someone who generally despises Microsoft technology). Too bad I can't use Ruby to develop an iPhone app on an Ubuntu machine. I might consider iPhone development then, though not seriously. If you wonder what is wrong with me... well, I choose an open platform, because I feel in the long run it will win out to a closed platform. I am convinced all the bad press towards the App Store approval process fiasco, and Apple's draconian attitude towards the 3rd party development community will ultimately be the iPhone's undoing.

Maemo/QT

The next mobile talk came from Daniel Rocha of Nokia. He talked about Qt, specifically in the context of Maemo. He was very dry. Dryer than a desert when compared to Rory. Some people just aren't built for speaking... I'm sure I would freeze up and be just as bad, but thankfully I don't seek such torture for myself.

It was a running theme to do some actual development on stage to demo the platform for the audience. The core message of his talk was "Look! Cross platform! Same code for all the major platforms, including mobile!" Alright, C++ cross platform I guess is kinda cool, except it's not that impressive in today's context. Java has been highly cross platform from the beginning, and all the primary scripting languages are quite cross platform as well.

The most impressive aspect of his cross platform demo is that the UI is among this cross-platformness, which I suppose is the whole point. However, he admitted about still needing macros to segregate the platforms now and then. And I really don't believe that a mobile UI should be the same as a desktop UI. With the state of mobile devices, I think it's wiser to develop a UI with a small screen (and touch screen capabilities) in mind. Negative points also for running Linux in a VM inside Windows instead of the other way around.

The Qt IDE he was showing off was also quite unimpressive. It looked like Visual Studio circa 2000. The UI designer looked exactly like an old Visual Studio UI designer, with that grid of dots and all. I got the feeling that anyone used to a good IDE like Eclipse would feel shackled in this thing. Anyone used to an awesome editor like Emacs will... well, an Emacs user would never stoop to a lesser editor of any kind.

The funny thing about this talk is that it was followed by an Android presenter who pointed out how most major manufacturers, except Nokia, were coming out with devices with Android. All I can say is, Nokia is dropping the ball if they stick with this platform over hopping on Android. I saw no compelling reason to develop for Qt/Maemo.

Android

James Yum finished the Mobile hat trick with an Android talk. I wanted to be blown away by this talk, but unfortunately Rory was the only awesome speaker of the three. To be fair, James was up front that he was asked to do this talk last minute, replacing the original Google speaker, who probably would have crushed it.

He started off by showing the "Droid Does" Verizon commercial, which was an awesome commercial (despite not actually showing the device, and despite coming from a company I'm not a fan of). He then read some of the YouTube comments for the commercial, to humorous effect. I really hope T-Mobile comes out with a phone this compelling. According to James and the commercial, it has an 800x480 screen with a physical keyboard and a powerful chipset.

Great start, lousy finish. He followed this up with an uninspiring demo of how to deal with threads in Android, with the specific goal of making a snappier UI. He went from a simplistic but completely incorrect way to do threading to an object oriented way using the Android API. It made developing with Java look almost as bad as Objective C. I'm not exactly a fan of Java (though I work in it almost every day), but it really isn't this bad to work with Java, and specifically Android. I think he would have better shown off the platform by taking a real (though small) application idea, and implementing it on stage. This is what Rory did, and it had a much greater effect than slogging through the most complicated aspect of modern programming you could possibly think of.

Overall, James was far too green for the speech. He was (understandably) clearly nervous, and he was unable to answer most of the questions at the end of his talk. I was silently rooting for him, hoping he could show all the would-be iPhone developers what they are missing, but I was disappointed. Maybe Google will learn from this and keep a good speaker on hand as a backup, should the primary speaker have to drop out last minute.

Tuesday, November 3, 2009

Accidental Complexity

I thought I had some interesting insights in my last post about Postgres and why I was going back to MySQL, at least for the time being. So much so, that I posted it on Reddit. Alas, it didn't go over well. It had a lot of downvotes, and most of the comments were fairly negative. I think I'm still right about what I did and why, but I think I can better articulate why I think so in two words:

Accidental Complexity.

It's definitely not a new concept, and I'm sure many people have talked about it before, but in reading the responses on Reddit, it became clear that this is precisely the reason why I switched to MySQL, and precisely why I probably won't switch back anytime soon.

Wikipedia defines accidental complexity as "complexity that arises in computer programs or their development process (computer programming) which is non-essential to the problem to be solved."

This perfectly describes the issues I was having. Think about if I could sudo apt-get install postgres, hook it up in Rails, and be on my way. I could focus nearly all of my time on essential complexity (ie developing features for my actual application), or at the very least, accidental complexity arising from other applications or from my coding choices.

The choice may bite me later on with other accidental complexity that Postgres tackles well, but right now I want to actually get the project to a point where I can start showing people, and maybe start getting some users. After all, getting users is the whole point. To that end, reducing the hassle being caused by the database is a Good Thing.

So, I really think this should be a lesson for all the applications I write, and all the applications you write as well. Make sure your application reduces the accidental complexity you are forcing on your users. If a significant portion of them are spending a lot of their time configuring your application instead of solving the problem they are trying to solve, you have failed them.

Don't take this to the extreme, though... you still have to see the big picture. Who knows, maybe the default user settings is good for the majority of users, and I just happen to fall in the unlucky minority, but comments like "pg_hba.conf can be a bitch, but updating it to allow transparent local access is just a couple of lines away" tells me that I'm not the only person who has had trouble configuring Postgres, even if the configuration changes are small.

Sunday, November 1, 2009

Postgres... is it worth it?

In working on my side projects, I needed to pick a database. I decided to pick Postgres, and it has been nothing but problems since the beginning. It sucks, because half the problems don't really seem to be the fault of Postgres, though they all seem to stem from the issue that IS their fault.

So, when I first started with Postgres, right off the bat it is a pain. Every single time I need to install Postgres (usually on a new development machine), it is a pain in the butt! They provide a lot of flexibility for database user security. However, the last time I installed MySQL (which is admittedly a long time ago), you set it up, and it asks you for a root user password, and that's it. The rest seemed a snap, at least in the rose colored glasses of the past. With Postgres, each and every install requires me to search to figure out what settings to change in which config file just to get the damn users to be able to connect! Give me some reasonable defaults! If I try to connect to my database locally with a specific user, it would be awesome if the system would give full permissions to any databases I create with that user. Then things like rails would work out of the box, no configuration necessary. I will lock down the system to my hearts content if I need to, but make it JUST WORK first.

That was probably the extent of what I can actually blame Postgres for. It's a big fault in my opinion, though. Software that just works out of the box with no configuration necessary is nirvana. That is why Rails typically feels like heaven to me.

The next issue was probably HostMonster's fault. They give Postgres a back-seat treatment. They run an old Postgres version (I think 8.1), with the excuse that they need to wait till cPanel (their site management web app) upgrades what they support. Ok, I can live with that, but it sucks. When I actually tried to create a database, though, Rails couldn't connect to it! Grrr. HostMonster was on it though, and actually fixed my configuration issue. That was cool of them, go HostMonster! I have been a fan of their support, but that's not what this blog post is about. The issue had been a Rails configuration, I think changing how it was connecting... probably because HostMonster's user connection settings were set up a certain way, contrary to my Rails configuration.

Things were smooth until I needed to create a new database for a new website I wanted to create. It just didn't work. I couldn't create the database, and there was nothing I could do about it... possibly something HostMonster had done, but I didn't want to wait for a fix, so I took the dive and just used a MySQL database.

But I stuck with Postgres on my development machines. Why change? I still wanted it more than MySQL.

Until now.

I just upgraded to Karmic Koala on my netbook. I think it went well, except the upgrade from Postgres 8.3 to 8.4 didn't go so well (which was part of the Karmic upgrade). Both versions are now installed, and when I try to load Rails, I can't connect to the database. Running rake db:migrate even fails. No nirvana, no more Postgres. I'm fed up with the issues. I'm going to the dark side of MySQL, and I'm not going to come back until it is dead simple to set up a Postgres database with Rails.

Postgres intrigued me so much because of the recent drama surrounding MySQL... namely that the Evil Empire (Oracle) now owns them. Perhaps it's a bit superficial, but it's drama I didn't want to get caught up in. I want to know my database system will be open and free throughout the life of my products. However, MySQL makes things so easy that I just can't resist switching back. I will reconsider if it becomes easier to set up a database on Postgres out of the box.

Lesson? Make defaults that work towards achieving your users goals without them needing to dig into any documentation. If it's not easy to start, your users won't start, so you won't have users (at least not as many as you could have).

Saturday, October 31, 2009

A Nerd's Proposal

I got out of the car, and then helped her out. I was a bit nervous as we walked towards the restaurant, arm in arm. She reached her arm around me, edging her hand close to my inner left breast pocket. Uh oh! I stiffened my arm and blocked a potential revelation on her part. Safe for now (so I thought, anyway).

We walked into the restaurant and took a booth. It was New Year's Day, and not really that full. I had wanted a more significant location, but my first choice place was closed that day... drat! After some idle chat and ordering our meal, I figured it was now or never... or at least now or some other day. I wanted to do it now, so I took out my phone.

"I found this really cool Will and Grace app!" Her Christmas present was the Will and Grace box set. She was a big fan, so I figured a Will and Grace app would be interesting to her.

"Check it out, it's pretty cool. Click through that splash screen." I reached down into my inner coat pocket and grabbed the box. I opened it and waited.

She clicked through, selected a season and episode. She hit the item to play the episode, and after a brief pause, "Can't Take My Eyes Off You" from the Jersey Boys musical started playing. That's our special song, so I figured it was a good pick. I pulled up the box and held it out as she looked at a graphic with a couple hearts with the words "Will you marry me?"

Each moment that passed after that was stretched out like an eternity. She finally looked up and simply said "yes." That's it! I'm now engaged. She's the most wonderful woman in the world, and now I get to share my life with her.

Now that you know how it went down, let me tell you what went wrong. I settled on the day a few weeks before, while I was at my parent's house for Christmas vacation. I realized my Android phone would be the perfect way for me to pop the question... I could write a customized little app just for her! Google images proved useful for the "Will you marry me" graphic. It was a snap to rip our song from my CD musical recording. The hardest part was gathering all the content for all the Will and Grace episodes. Episode names was all I needed, but I had to take them from IMDB and convert them into Java strings. It was time consuming, but once all the content was set up, the Java app itself was a snap.

I guess that was what went well. The blunders came closer to the day. First came when I was asking her for dinner on the first. "Let's have dinner New Year's Day to ring in the new year!" Whoops. Very bad choice of words, especially since we picked the ring together... she knew it was coming, just not when. Why the hell did I say THAT??? She was of course suspicious, but I assured her it was just for dinner... I totally hadn't meant to give her a hint so early, but what's done is done.

We decided on the perfect place for dinner. Our 6 month anniversary had been at a nice fondu place. Perfect! I called them up, hoping to make a reservation, but alas! They are closed on the first! I expressed my disappointment to her... perhaps too emphatically. We agreed on another place. Not ideal, but I was determined to do this on the first.

New Year's Eve was at her parent's place. It was a lot of fun, and we were all packed to head over to Sacramento, which was our destination. I spilled some drink or something on my pants and all I had to spare was my nice clothes prepared for dinner. She suggested I change, and when she saw I only had nice clothes, she looked at me odd "we are dressing up for dinner?" I gave an excuse that it was New Year's... while wishing I had planned this a little better. All my effort went into the program to propose, why hadn't I worked out these details a little better? I'm not exactly good at thinking ahead and planning things well. If I could do it all over again, you can bet I would figure it out a lot better, but in the end it's the marriage that excites me more than the proposal.

With all those hints, she knew it was coming. In fact, she confessed she was reaching around while we were walking in because she wanted to test my inner pocket for the box. Sneaky! She saw me fiddling under the table when I passed her the phone, and thought "this is it." Thankfully, my application fooled her enough to second guess her inclination, so I salvaged a bit of a surprise.

I'm almost embarrassed to post to the world how unromantic my proposal was, but I think all the blunders made it memorable in it's own right... and it's very much me. I focused on the part I liked most, the programming, and assumed the rest would just go smoothly... because... you know, I had wrote a PROGRAM to PROPOSE! ON MY PHONE! I'm now just excited to become a part of her wonderful family, and know that I can spend the rest of my life trying to make her as happy as she makes me.

Wednesday, October 21, 2009

Quick Lesson from DevDays

Ok, so I went to Stack Overflow DevDays in San Francisco this last Monday, and it was a lot of fun. I went with a friend from work, and we both walked away happy, stickers in hand.

I took some notes, and I would like to share what I thought of all the speakers and their presentations, but we'll save that for another day... if I actually end up writing it (post a comment if you are interested, and hopefully that will push me to write it up).

What I want to talk about is a small bit I learned from one of the presenters. If you don't know me, I'm a big Linux fan, I heart Google, I have an Android G1, I love open source and Ruby and Rails, and Microsoft is practically the devil. Given that, you might find it interesting that the thing that struck me most... enough to actually write about it... came from Microsoft. That's right, it came from the near-devil. Well, it came from Scott Hanselman, who was a pretty good speaker.

Scott talked about Microsoft .NET MVC. It seems like a relatively cool product, if you can bear to be in the Microsoft world, and if you can bear to deal with such a clunky language as C#. Ok, I gotta give props that C# is a decent language... for a statically typed language... but the platform restrictions (excluding Mono), and it not being nearly as awesome as Ruby, will keep me from ever dealing with it again (I worked in C# professionally for a few years). Yes, .NET provides access to a fleet of languages, but C# is still their flagship. Anyway, from the discussion, MVC seems like it does a decent job at mimicking Rails, though it still didn't seem nearly as elegant.

That was a long lead-up for a simple point I extracted from the Big M... <%: value %>. That's it. I have wished the default of scriptlets was to sanitize html output for a long time, and I'm pretty sure Jeff Atwood had a post on it at one point. That tag, if I understood Scott right, differs from <%= value %> in that it will sanitize the output automatically.

On the train ride back to the south bay, I set it upon myself to implement this auto sanitization in Rails for my current side project. Initially I thought I could do something like:

def :(arg)
 h arg
end

But alas, that doesn't work. Ruby won't let you define ":" as a method... it throws a syntax error... drat!

So, I figured, maybe I could monkeypatch the ERB handling so that <%: value %> works as I want, so I looked around the Rails source for how ERB compiles the views. It didn't take long to find erb.rb, which deals with the compilation. I inspected it for a bit, and toyed with some ways to deal with the problem, and ultimately came up with:

class ERB::Compiler::Scanner
  alias_method :initialize_without_sanitize, :initialize

  def initialize_with_sanitize(src, trim_mode, percent)
    initialize_without_sanitize src, trim_mode, percent
    @src.gsub! /<%:(.*?)%>/, "<%= h(\\1) %>"
  end

  alias_method :initialize, :initialize_with_sanitize
end

If the above code is cryptic for you, it basically overrides the initialize method (which is the constructor method for all you non-Rubyists) and uses a regex to replace all of the new sanitizing scriptlets with what would actually work to sanitize it. So, <%: value %> gets transformed to <%= h( value ) %>. I can't vouch for the performance (because I haven't done any performance testing), but it works.

Put the above snippet wherever you put your monkeypatches that you want run once. I have mine in a monkeypatches.rb file set to load after Rails has initialized (so it runs just once, at startup time).

Go forth in sanitized goodness.

Thursday, September 10, 2009

Why Newspapers Can Go Away

Joel Spolsky and Jeff Atwood's most recent podcast struck a nerve with me. They go on for a period of time about Craig's List and the effect it has had on newspapers. Joel in particular tries to argue that Craig's List has essentially killed off the public good of investigative journalism, and by refusing to cash out on Craig's List in some way, has not returned the public with anything other than the marginal usefulness of free classified ads.

Normally I think very highly of Joel and his opinions, but in this instance, I think he has completely missed the boat. Jeff tries to poke holes in his argument, but he didn't articulate the argument that I most strongly feel....

Investigative journalism is now a distributed affair, and no longer needs the legwork of the newspapers that Joel alludes to. On a side note, I want to point out that the linking to eachother that Joel accuses bloggers of is quite prevalent in newspapers as well... aren't popular Associated Press articles picked up and run nationally all the time?

I just want to point out one recent case study to illustrate my point... the shooting of Oscar Grant on New Years Day in an Oakland BART station.

I admit, I haven't followed up on how the case against the officer panned out (or if it is even through yet), but the point remains that something like excessive police force cannot go unnoticed in todays world. We don't need newspapers to do the legwork anymore because the ubiquity of the web, combined with a computer in almost everyone's pocket in the form of a smart phone capable of snapping a photo, shooting a video, or recording a conversation and then posting it online immediately has put the power of investigative journalism into the hands of everyone, everywhere.

If something corrupt happens within earshot (or eyesight) of someone, there is a much greater chance nowadays that someone is listening and could easily publish the wrongdoings within minutes. This is not to say that we can do without real journalists going out and finding corruption and shining the light on it, but I firmly believe the Internet and technology is our salvation, not the newspapers.

Jeff, I think you are right that we are in a transition, and the spotlight on the bugs under the rock is being lit by everyone now.

Friday, April 17, 2009

Dealing with Errors in Rails

Jeff Atwood's recent blog post on "Exception-Driven Development" has driven me to write a post that I've been thinking about writing for the last couple weeks.

I'm working on a Rails application in my spare time, and one of the concepts I wanted to drill into the codebase from the beginning was easy management of the system from within the system. This means I can log in and, given that I have the proper access, view errors that have happened in the site, migrate the database, restart the application, and perform various other actions that would otherwise require tedious activities like digging in log files, or obtaining shell access and executing various commands. While I am quite comfortable on the shell (I run Linux on all the machines that I use with any regularity), I would much rather have an easy to use GUI that gives me access in the context of when I need it (ie, when I am actually checking up on the website... from within it).

So, this post will basically show you how to add error logging within your Rails application, as I've done in mine.

First up: the database. Here's the database migration script you will need:

class CreateErrors < ActiveRecord::Migration
 def self.up
   create_table :errors do |t|
     t.string :level
     t.string :title
     t.boolean :handled
     t.text :description

     t.timestamps
   end
 end

 def self.down
   drop_table :errors
 end
end

The level column is so you can keep track of errors vs warnings, or have informational messages as well. It's not strictly necessary, but I want the flexibility, plus I'm used to the concept from all the logging frameworks that support multiple levels of messages.

The title and description should be self explanatory. In case you are slow though, the title is just a header so you can at a glance see what errors have cropped up, and the description provides a detailed look at the error (such as a backtrace).

The handled column is critical so you can keep track of which errors you have dealt with, and which need looking at. It acts as a soft delete.

Since you've seen the database end, you might as well peak at the model next:

class Error < ActiveRecord::Base
  def self.error(title, description)
    log :error, title, description
  end

  def self.warn(title, description)
    log :warn, title, description
  end

  def self.info(title, description)
    log :info, title, description
  end

  def self.log(level, title, description)
    Error.new(:level => level.to_s, :title => title,
        :description => description, :handled => false).save
  end
end

The error, warn and info methods are to easily log something at the given level, much like a typical logging library. All of those methods use the one logging method which will actually save the error as a row in the database. Pretty simple, as is the usual case with ActiveRecord classes! You could make a case for making the log method private, but I don't really want to, so I won't.

Next up, the errors controller:

class ErrorsController < ApplicationController
  title "Errors"
  require_admin

  def index
    @total = Error.count :all, :conditions => ["handled = ?", false]
    @errors = Error.find :all,
        :conditions => ["handled = ?", false],
        :order => "created_at DESC", :limit => 50
    @start = [1, @total].min
    @amount = @errors.size
  end

  def show
    @error = Error.find params[:id]
  end

  def handle
    @error = Error.find params[:id]
    @error.update_attributes :handled => true
    redirect_to errors_url
  end
end

The title and require_admin methods are some utility methods I baked into the base controller... I may publish those at some point, but they aren't the focus of this post, so not now. If I get any comments requesting the code, I will probably follow up with the code, so let me know if it is intriguing you! Their purpose are to set the title on all views associated with this controller, and check for administrator privileges respectively.

If you will notice, the index will only load the most recent 50 errors... I figured it would be possible for the application to get in a weird loop where you trigger tons and tons of errors. I wouldn't want to haplessly check out the current list of errors, only to watch in horror as I am downloading a list of 100,000 errors. The handle method is the only means to update an error... and it simple turns on the soft delete so the error won't show anymore. It might make sense to set up a mass handled action which would handle a group of selected errors all at once... but I don't really need that yet... I will build it when the need arises.

The rest in index should make sense as you check out the index.html.erb view:

<h1>Listing errors</h1>

<p>
  <%= @start %> - <%= @amount %> of <%= @total %> errors.
</p>

<table>
  <tr>
    <th>Date</th>
    <th>Level</th>
    <th>Title</th>
  </tr>

<% @errors.each do |error| %>
  <tr>
    <td><%= h format_datetime(error.created_at) %></td>
    <td><%= h error.level %></td>
    <td><%= h truncate(error.title, :length => 100) %></td>
    <td><%= h error.handled %></td>
    <td><%= link_to "Show", error %></td>
    <td><a href="/errors/handle/<%= error.id %>">Handled</a></td>
  </tr>
<% end %>
</table>

Pretty straightforward, huh? Well, here's show.html.erb

<p>
  <b>Title:</b>
  <%= h @error.title %>
</p>

<p>
  <b>Time:</b>
  <%= h format_datetime(@error.created_at) %>
</p>

<p>
  <b>Level:</b>
  <%= h @error.level %>
</p>

<p>
  <b>Description:</b>

  <br/>

  <%= format_multilines h(@error.description) %>
</p>

<a href="/errors">Back</a>

Also pretty simple. If you are curious about format_datetime and format_multilines, they are simple helpers I created to respectively format a date as I want them formatted, and take a string with newlines and convert the newlines to <br/> elements. Pretty simple helpers, but I like keeping as much logic off my views as possible.

That's it! Or wait... that's all the stuff that rails practically generates for you (tweaked for simplicity, but still). How can this stuff get put into practice? Well, I'll give you that code too... since you've gotten this far. In your base application controller:

class ApplicationController < ActionController::Base
  RENDER_LIMIT = 0.5
  around_filter :time_page

...

  private
  def time_page
    before = Time.new
    yield
    after = Time.new
    delta = after - before

    if delta >= RENDER_LIMIT
      title = "Page took #{delta} seconds to render"
      description = "Execution of #{request.path} took #{delta} seconds to render."
      Error.warn title, description
    end
  end

  def rescue_action(e)
    title = "Caught exception '#{e}'"
    description = "Exception was caught: '#{e}'\n\nBacktrace:\n#{e.backtrace.join("\n")}"
    Error.error title, description
    super
  end
end

With this addition, your new in-site error reporting mechanism will report any page that takes over half a second to load, or absolutely any exception that crops up. This includes failing to parse a Ruby file, actions or controllers that were browsed to that don't exist, or any random exception the user encounters. You may want to adjust the RENDER_LIMIT constant to the threshold of your liking. At some point I would like to add a separate timer for all database queries, but I haven't had the time or interest to do it yet... and besides, the render limit should encompass all queries anyways, so it would be slightly redundant (though I would like to have a much lower threshold of query time than render time).

And as a caveat, this will not report a failure to parse the application controller Ruby file... because... well... think about it and you should realize why it can't :-)

Go forth and practice Exception-Driven Development in Rails.