We recently underwent a round of hiring at work. Our first hire was a junior developer. Previously, for the almost two years when we were understaffed and had trouble hiring, we weren’t looking for junior developers. We reasoned that we could not afford the training cost because of how busy we were. Still, after well over a year of running the ship on a skeleton crew, those fears eventually wore away. Whatever the cost was, we had to pay it.
Bringing a new junior developer onboard is a huge investment for a team, especially one strapped for time. The time investment goes beyond just getting them used to the tools and the codebase (which in our case is fairly substantial). A junior developer by definition will not have a lot of experience in professional software development, maintaining production systems, etc. Teaching a junior developer all of the best practices you’ve learned over the years is a long journey. I for one have a hard time remembering what it was like to be in their shoes, which is necessary for setting expectations on how quickly they should be absorbing knowledge and how much I need to explain.
I’ve never had to train a developer who was junior to me so this is still a learning experience for me, but I have identified a few unexpected benefits that have occurred along the way.
Training a new developer involves a lot of talking. At work we have a dedicated pairing station for the new developer. I sit with a pair of monitors, keyboard, and mouse plugged into her machine and we pair program on problems. After starting this training, I’ve really come to realize how much I rely on jargon in day-to-day programming. I constantly catch myself saying things like “memoize that”, “delegate this”, and “let’s write a DSL for this”.
This jargon is extremely valuable if you are working with somebody who understands it. A big part of pair programming is staying on track. Even more important, you should realize when you are moving ahead of your pair and be able to quickly cast a line out with a carefully selected pattern and reel them back onto the boat.
When working with a junior developer, I’m suddenly forced to prove that I really know what these terms mean by reducing them to simpler terms that someone can understand without ever having experienced the need to use them. I even find myself seeking to justify our style guide in practical terms. Why do we prefer &&/|| over and/or? Why should we prefer method access to instance variables rather than referencing them directly? I never had to verbalize method-to-proc in Ruby before. How could I explain it to someone who was on shaky ground with procs to begin with?
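To make a couple of those questions concrete, here is a small sketch in plain Ruby. The `shout` method and the sample words are made up for illustration; the only point is the operator precedence and what `&method(...)` does.

```ruby
# `&&` binds tighter than `=`, so the whole expression is assigned.
a = true && false   # a is false

# `and` binds looser than `=`, so this parses as `(b = true) and false`.
b = true and false  # b is true

# Hypothetical helper used to demonstrate method-to-proc.
def shout(word)
  word.upcase + '!'
end

words = %w[memoize delegate]

# An explicit block that just forwards its argument...
with_block = words.map { |word| shout(word) }

# ...can be replaced by turning the method itself into a block.
with_method = words.map(&method(:shout))
```

Both `map` calls produce the same array. The second simply hands `map` a proc built from the method object.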
When I started at my company as an intern I was essentially put in a separate room and left to my own devices. While the independence was nice sometimes, it also lengthened how long it took me to get to proficiency. For a good solid month, I fell in love with the #returning method introduced by ActiveSupport, which later got reimagined as #tap. I used it all over the place. I convinced myself that getting rid of local variables in exchange for block parameters was a Good Thing. If just one senior developer had challenged me to explain why I believed this, my enthusiasm for #tap would have been appropriately curtailed.
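For anyone who hasn’t been through a #tap phase, this is roughly the trade I was making. The method names here are invented for the example.

```ruby
# The style I was enamored with: no local variable, just a block parameter.
def build_with_tap
  [].tap do |list|
    list << 1
    list << 2
  end
end

# The plain version: one local variable and an explicit return value.
def build_with_local
  list = []
  list << 1
  list << 2
  list
end
```

The two methods return the same array, since `tap` yields its receiver to the block and then returns that receiver. The block parameter is still a name you have to read and track, so the local variable hasn’t really gone anywhere.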
One of the nice things about training a junior developer is that you can see cargo culting and misguided habits start to form and hopefully set the dev back on the right track again. You may even identify and eliminate bad habits and magical thinking you’ve acquired over the years while doing so.
This one was really surprising to me. After you’ve been at a company for a while, you begin to develop a high pain tolerance. Many developers, most notably DHH, have discussed how pain can be used as a formative tool for your software. A developer should be tuned to what things in the code pain them and should use this to guide refactorings. James Edward Gray in his keynote at Ruby Midwest talked about how he has a higher tolerance for complexity and hacks than most and how much of a curse this was as it dulled his ability to feel pain from code (to paraphrase).
There are literally hundreds of areas in the code at work that just don’t make sense, require workarounds and slow development (or, even worse, testing). I could rattle off dozens of them off the top of my head. It is so easy to use this institutional knowledge as a crutch and never fix it. It doesn’t hurt and it lets me bitch about how wrong someone else before me got it. Why change it?
Then I found myself explaining this code to a new developer. I saw the excitement drain from her eyes as we spent nearly an hour trying to tiptoe around a litany of overreaching ActiveRecord callbacks just to write a fucking test for a reporting tool. She put up with it because she figured that was just the way it was. It was embarrassing.
There are no savings to be had from not addressing confusing, poorly-designed code when you have a new developer. This code lies to the developers that rely on it. New devs will spin their wheels, ask for help (hopefully), and might even get themselves stuck. Ultimately, they’ll become used to it. Their pain sensors will dull and you’ll be back to having a full team building new shinies on top of a rotting foundation. Some devs will burn out and leave. I’ve definitely felt close to that point several times.
Enough is enough. I’m making a conscious effort to heighten my sense of pain. If I come across something that’s been a thorn in my side for 3 years, I’m going to test it, refactor it, and unceremoniously defenestrate the remainder. If that takes extra time then that’s how long it is going to take. I will stand my ground to anyone who would argue that we don’t have the time for refactoring. Software is maintenance. If I don’t improve maintainability then I’m not doing my job. If you honestly advocate taking on more technical debt in projects as large and impoverished as some of ours, you can no longer count yourself as having an interest in its long term success.
A little while back at work we needed to write a feature that would test some fairly complicated session exchanging logic. I dreaded the task of testing it because I was certain it was going to be painful. I’d probably have to monkey around in Capybara internals to mutate the cookie jar the way an attacker would. It turns out Capybara handles this quite elegantly.
In short, our problem is that all of our clients have their own domain without their own SSL cert. That’s fine, but when it comes to checkout and sensitive account actions, users need to be transferred over to our secure domain, which will have a different, secure-only session cookie.
One of the attack vectors we focused on was an attacker sniffing out a token from when a user was browsing insecure pages on the site and parlaying that token into the secure parts of the site to steal private information, check out, etc. An attacker would sniff that session, set it as their own in their own browser and then try to do nefarious stuff.
After doing it the hard way, we learned that Capybara comes with multiple session support out of the box. In any given test, Capybara can maintain multiple browser sessions.
A security test you’d write in Cucumber may look like this:
Given I am browsing the site
And I add some stuff to my cart
When an attacker attempts to steal my cart
Then they should not have a cart
But I should still see my cart
And some of the steps could look like:
When "an attacker attempts to steal my cart" do
  stolen_session_id = Capybara.current_session.driver.request.cookies['_session_id']
  Capybara.using_session(:attacker) do
    page.driver.browser.set_cookie("_session_id=#{stolen_session_id}")
  end
end
If you have lots of different steps featuring the attacker, you may even want to extract it to a method:
def as_attacker(&block)
  Capybara.using_session(:attacker, &block)
end

Just a quick tip as I had to try a couple of different ways to get it working.
First off, you’ll need to install the temporary package.
I needed to test parsing an actual file on disk. I still find some difficulty writing good tests that perform IO in Haskell, but this worked well enough.
Buster has a function to parse a YAML config file that looks like:
loadConfig :: FilePath -> IO (Either String Config)
loadConfig = decodeFileEither
decodeFileEither :: FromJSON a => FilePath -> IO (Either String a)
decodeFileEither fp = decodeHelper (Y.decodeFile fp) >>= either throwIO return
I’ve already thoroughly tested the pure parsing component but I was not completely certain of my implementation of the file one.
Here’s an excerpt from the test suite:
spec :: Spec
spec = describe "parsing from file" $
  it "parses a full config successfully" $
    withPreloadedFile fullConfigStr $ \path ->
      loadConfig path `shouldReturn` Right fullConfig
  where fullConfigStr = "....."
        fullConfig = Config { ... }

withPreloadedFile :: ByteString -> (FilePath -> IO a) -> IO a
withPreloadedFile content action = withSystemTempFile filenameTemplate callback
  where filenameTemplate = "buster_fixture.yml"
        callback path handle = BS.hPut handle content >> hFlush handle >> action path
withSystemTempFile will open up a file in your system-specific temporary dir (such as /tmp) using the filename template you specify. That template basically gets treated like a prefix, so it may generate a file like buster_fixture.ymlAAA, or something close to it.
Take special note of the flush. You want to ensure the write gets flushed before the test runs or the file will appear to be empty.

Hello brother
Buster is a utility for hitting a list of URLs periodically.
At work, we hit certain sites with a special URL parameter that causes the full page cache to be rebuilt every 5 minutes. Previously, this was done in a cron task. That’s fine but I felt this could be a good learning experience and would end up being more configurable. By the way, if you do want to use curl/cron to accomplish this task, be sure to redirect output to /dev/null or you’ll end up with a multi gigabyte spool file from cron!
Either clone from my GitHub or download it from HackageDB. It will install an executable called buster.
Run buster config.yml
See the README and examples directory for what a config should look like. Buster supports optional automatic config file reloading or reloading via a HUP signal.
At this stage in my Haskell career, if I embark on a new project, it’s because I want to learn some new things along the way. Buster itself isn’t that compelling of a utility, but the learning experience was.
Since I work in a Ruby shop, I thought it would be a good idea to have the configuration be in YAML. That way, coworkers could pretend they are using a normal Ruby executable without noticing Haskell’s slow infiltration into day to day operations.
The yaml package is interesting because it uses the mighty aeson parsing combinators. That was kind of weird but it made writing the config parser simple:
instance FromJSON Config where
  parseJSON (Object v) = Config <$> v .:? "verbose" .!= False
                                <*> v .:? "monitor" .!= False
                                <*> v .:  "urls"
                                <*> v .:? "log_file"
  parseJSON _          = fail "Expecting Object"

instance FromJSON UrlConfig where
  parseJSON (Object v) = UrlConfig <$> v .:  "url"
                                   <*> v .:  "interval"
                                   <*> v .:? "method" .!= "GET"
  parseJSON _          = fail "Expecting Object"
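For my Ruby-minded coworkers, here is a rough sketch of what that parser accepts, written as plain Ruby. The key names come from the instances above. Everything else is an assumption made for illustration: the example URLs, the interval values (and their unit), and the `normalize_config` helper, which is not part of Buster.

```ruby
require 'yaml'

# Hypothetical config: only the key names are taken from the parser.
EXAMPLE_CONFIG = <<~YAML
  verbose: true
  urls:
    - url: http://example.com/
      interval: 300
    - url: http://example.com/other
      interval: 60
      method: HEAD
YAML

# Roughly mirrors the parser: optional keys fall back to a default,
# required keys raise KeyError when they are missing.
def normalize_config(raw)
  urls = raw.fetch('urls').map { |entry|
    {
      url:      entry.fetch('url'),
      interval: entry.fetch('interval'),
      method:   entry.fetch('method', 'GET')
    }
  }

  {
    verbose:  raw.fetch('verbose', false),
    monitor:  raw.fetch('monitor', false),
    log_file: raw['log_file'],
    urls:     urls
  }
end

config = normalize_config(YAML.safe_load(EXAMPLE_CONFIG))
```

Here `monitor` falls back to `false`, `log_file` is `nil`, and the first URL picks up the default `GET` method.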
hinotify is a pretty thin abstraction over Linux’s inotify feature for monitoring files. I’d used inotify before in Ruby, and I was pleased to be able to monitor the config without resorting to polling.
I started writing a section here and then realized there was too much in my head to confine it to a section in an introductory post. Keep an eye out for a separate post on exceptions in Haskell and why I hate trifling with them. Long story short: the errors library and the EitherT monad transformer helped a great deal. I tried my best to intercept exception-throwing interfaces (though who knows since the goddamn type doesn’t communicate exceptional behavior) at low levels and normalize them to an Either interface. I have decided that if there are going to be errors in my code, the compiler should not allow my program to compile without confronting them in some manner.
Keep an eye out for a small post about this. I wanted to be able to test the parsing of actual config files without having to worry about explicit cleanup. The temporary package is really nice for tasks like this.
I was recently given the task of parsing and processing a huge XML file of data. The format looks something like:
<release id="1">...</release>
<release id="2">...</release>
<release id="3">...</release>
After some experimentation, I decided on Nokogiri’s XML::Reader. The standard Nokogiri::XML parser was out because the files are approximately 1GB in size. SAX was out because it requires you to basically build your own state machine and piece together the parsed nodes from events. I don’t know about you but the last thing I want to do is do a bunch of work to parse XML. XML::Reader can incrementally parse a stream and iterate through each parsed node, which is perfect.
The problem is that XML does not support multiple root nodes. If you run this through the standard Nokogiri parser, it will just parse the first node and stop. We need a way to pad the XML with an enclosing tag, say “releases”. But we also want to use an IO stream rather than a full string since the file is upwards of 1GB.
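There is a great gem called filter_io that fits the bill. filter_io wraps an IO object and gives you a stateful stream modifier. We’re interested in attaching 2 transformations to the stream: prepending the opening root tag at the beginning of the stream and appending the closing root tag at the end.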
Here’s the class I came up with to wrap the IO stream:
require 'filter_io'

class RootTagWrapper
  attr_reader :root_tag_name

  def initialize(root_tag_name)
    @root_tag_name = root_tag_name
  end

  def wrap(io)
    FilterIO.new(io) do |data, state|
      if state.bof?
        open_tag + data
      elsif state.eof?
        data + close_tag
      else
        data
      end
    end
  end

  private

  def open_tag
    "<#{root_tag_name}>"
  end

  def close_tag
    "</#{root_tag_name}>"
  end
end
Here is a usage example:
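require 'nokogiri'

io = RootTagWrapper.new('releases').wrap(ARGF)
# XML::Reader walks the stream node by node instead of loading the whole document.
Nokogiri::XML::Reader(io).each do |node|
  # The reader also reports closing tags, so only act on opening elements.
  if node.name == 'release' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    yield Nokogiri::XML.fragment(node.outer_xml)
  end
end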
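If you would rather not pull in another gem, the same padding idea can be sketched with only the standard library by chaining three streams together: the opening tag, the real input, and the closing tag. This is not how filter_io works internally. `ConcatenatedIO` and `wrap_in_root` are names I made up for this sketch, and it only implements `read`.

```ruby
require 'stringio'

# Hypothetical helper: presents several IO-like objects as one read-only stream.
class ConcatenatedIO
  def initialize(*ios)
    @ios = ios
  end

  # Mimics IO#read: with no length it returns everything that is left;
  # with a length it returns up to that many bytes, or nil at end of stream.
  def read(length = nil, outbuf = nil)
    data = length ? read_chunk(length) : read_rest
    outbuf.replace(data.to_s) if outbuf
    data
  end

  private

  def read_rest
    rest = @ios.map { |io| io.read.to_s }.join
    @ios.clear
    rest
  end

  def read_chunk(length)
    chunk = String.new
    until @ios.empty? || chunk.bytesize >= length
      piece = @ios.first.read(length - chunk.bytesize)
      if piece.nil? || piece.empty?
        @ios.shift # current source is exhausted, move on to the next one
      else
        chunk << piece
      end
    end
    chunk.empty? && length > 0 ? nil : chunk
  end
end

# Surrounds the contents of `io` with <root_tag_name>...</root_tag_name>
# without reading the underlying stream into memory up front.
def wrap_in_root(io, root_tag_name)
  ConcatenatedIO.new(
    StringIO.new("<#{root_tag_name}>"),
    io,
    StringIO.new("</#{root_tag_name}>")
  )
end

wrapped = wrap_in_root(StringIO.new('<release id="1"/><release id="2"/>'), 'releases')
xml = wrapped.read
```

An object like this should work with anything that only needs `read`, but I have not checked it against every parser, so treat that as something to verify. filter_io remains the more complete option.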