I came across this site quite a while back and have been meaning to blog about it. The site is CodeKata, and the premise is essentially to give reasonably interesting problems that can be solved using code in order to practice and hone your skills. Not everything is a coding problem, but where I can see these coming in handy are using the katas to learn new languages. I always learn better when I have something to implement as I learn. Anyway, my original intention was to just link to the site, however it looks like the site hasn’t been updated in 4 years! It concerns me that it might not be around if I merely link to it, so I am going to copy the katas here for future reference. I hope Dave doesn’t mind.
Kata 11 – Sorting it Out
Just because we need to sort something doesn’t necessarily mean we need to use a conventional sorting algorithm.
We use sorting routines all the time; putting customer records in to name order, arranging orders by value (and even sorting the letters in a word back in KataSix). Most of the time we (wisely) use one of the sort routines built in to our language’s library (such as C’s qsort and Java’s java.Collections.sort). After all, very clever folks spent a lot of time getting these library routines tuned for speed and/or memory usage.
However, there are times when whipping up a sort of our own can outperform these generic routines. Our challenge this week is to implement a couple of different sorts. (However, at the risk of giving the game away, these sorts both have something in common).
Sorting Balls
In the Pragmatic Lottery (motto: There’s One Born Every Minute, and it Might Just Be You!), we select each week’s winning combination by drawing balls. There are sixty balls, numbered (not surprisingly, as we are programmers) 0 to 59. The balls are drawn by the personable, but somewhat distracted, Daisy Mae. As a result, some weeks five numbers are drawn, while other weeks seven, nine, or even fifteen balls make it to the winner’s rack. Regardless of the number of balls drawn, our viewers need to see the list of winning numbers in sorted order just as soon as possible. So, your challenge is to come up with some code that accepts each number as it is drawn and presents the sorted list of numbers so far. The tests might look something like:
rack = Rack.new
assert_equal([], rack.balls)
rack.add(20)
assert_equal([ 20 ], rack.balls)
rack.add(10)
assert_equal([ 10, 20 ], rack.balls)
rack.add(30)
assert_equal([ 10, 20, 30 ], rack.balls)
Sorting Characters
Our resident conspiracy expert, Dr. X, is looking for hidden messages in the collected publications of Hugh Hefner. Dr. X believes the message is hidden in the individual letters, so rather than get distracted by the words, he’s asked us to write a program to take a block of text and return the letters it contains, sorted. Given the text:
When not studying nuclear physics, Bambi likes to play
beach volleyball.
our program would return:
aaaaabbbbcccdeeeeeghhhiiiiklllllllmnnnnooopprsssstttuuvwyyyy
The program ignores punctuation, and maps upper case to lower case.
Are there any ways to perform this sort cheaply, and without using built-in libraries?
Kata 12 – Best Sellers
Consider the implementation of a top-ten best sellers list for a high volume web store.
A GedankenKata this week: no code needed (although writing short prototypes might help you come to a conclusion).
Say you’re writing code for an online site that sells things (something like Amazon). Your site is wildly popular, and you sell millions of items each day.
The marketing department wants the home page to display a top-ten list of the best selling items over the last 24 hours, with the list being updated each hour.
- How would you implement this?
- Are there any changes you could ask for to make the implementation easier?
- What would be the impact if they later came back and said:
- only update the list once per day; or
- we need the list updated in real time: each time the home page is displayed we need the list to reflect the 24 hours up until that point.
This kata might be deeper than it first appears. You might want to consider database vs. in-memory solutions, data structures that allow aging, time-space tradeoffs, and the like.
Kata 13 – Counting Code Lines
Counting lines of code in Java source is not quite as simple as it seems.
This week let’s write something vaguely useful: a utility that counts
the number of lines of actual code in a Java source file. For the purpose
of this exercise, a line is counted if it contains something other than
whitespace or text in a comment. Some simple examples:
- // This file contains 3 lines of code
1 public interface Dave {
- /**
- * count the number of lines in a file
- */
2 int countLines(File inFile); // not the real signature!
3 }
and…
- /*****
- * This is a test program with 5 lines of code
- * \/* no nesting allowed!
- //*****//***/// Slightly pathological comment ending...
-
1 public class Hello {
2 public static final void main(String [] args) { // gotta love Java
- // Say hello
3 System./*wait*/out./*for*/println/*it*/("Hello/*");
4 }
-
5 }
Remember that Java comments are either "//" to the end of line, or "/*" to the next "*/". The block comments do not nest. There may be multiple /*…*/ comments on a line. Whitespace includes tabs, spaces, carriage returns, and vertical tabs. Oh, and remember that comment start sequences that appear inside Java strings should be ignored.
Goals of the Kata
The mixture of line-based things (single line comments, blank lines, and so on) with the stream-based block comments can make solutions slightly ugly. While coding your solution, consider the structure of your code, and see how well it fits the structure of the problem. As with most of these kata, consider coding multiple alternative implementations. Does what you learned on the first tries affect your approach to subsequent ones?
Kata 14 – Tome Swift Under Milk Wood
Trigrams can be used to mutate text into new, surreal, forms. But what heuristics do we apply to get a reasonable result?
As a boy, one of my treats was go to the shops on a Saturday and spend part of my allowance on books; for a nine-year old, I had quite a collection of Tom Swift and Hardy Boys. Wouldn’t it be great to be able to create more and more of these classic books, to be able to generate a new Tom Swift adventure on demand?
OK, perhaps not. But that won’t stop us trying. I coded up a quick program to generate some swash-buckling scientific adventure on demand. It came up with:
- …it was in the wind that was what he thought was his companion. I think would be a good one and accordingly the ship their situation improved. Slowly so slowly that it beat the band! You’d think no one was a low voice. "Don’t take any of the elements and the inventors of the little Frenchman in the enclosed car or cabin completely fitted up in front of the gas in the house and wringing her hands. "I’m sure they’ll fall!"
- She looked up at them. He dug a mass of black vapor which it had refused to accept any. As for Mr. Swift as if it goes too high I’ll warn you and you can and swallow frequently. That will make the airship was shooting upward again and just before the raid wouldn’t have been instrumental in capturing the scoundrels right out of jail."
Stylistically, it’s Victor Appleton meets Dylan Thomas. Technically, it’s all done with trigrams.
Trigram analysis is very simple. Look at each set of three adjacent words in a document. Use the first two words of the set as a key, and remember the fact that the third word followed that key. Once you’ve finished, you know the list of individual words that can follow each two word sequence in the document. For example, given the input:
I wish I may I wish I might
You might generate:
"I wish" => ["I", "I"]
"wish I" => ["may", "might"]
"may I" => ["wish"]
"I may" => ["I"]
This says that the words "I wish" are twice followed by the word "I", the words "wish I" are followed once by "may" and once by "might" and so on.
To generate new text from this analysis, choose an arbitrary word pair as a starting point. Use these to look up a random next word (using the table above) and append this new word to the text so far. This now gives you a new word pair at the end of the text, so look up a potential next word based on these. Add this to the list, and so on. In the previous example, we could start with "I may". The only possible next word is "I", so now we have:
I may I
The last two words are "may I", so the next word is "wish". We then look up "I wish", and find our choice is constrained to another "I".
I may I wish I
Now we look up "wish I", and find we have a choice. Let’s choose "may".
I may I wish I may
Now we’re back where we started from, with "I may." Following the same sequence, but choosing "might" this time, we get:
I may I wish I may I wish I might
At this point we stop, as no sequence starts "I might."
Given a short input text, the algorithm isn’t too interesting. Feed it a book, however, and you give it more options, so the resulting output can be surprising.
For this kata, try implementing a trigram algorithm that generates a couple of hundred words of text using a book-sized file as input. Project Gutenberg is a good source of online books (Tom Swift and His Airship is here). Be warned that these files have DOS line endings (carriage return followed by newline).
Objectives
Kata’s are about trying something many times. In this one, what we’re experimenting with is not just the code, but the heuristics of processing the text. What do we do with punctuation? Paragraphs? Do we have to implement backtracking if we chose a next word that turns out to be a dead end?
Kata 15 – Playing With Bits
At my son's karate school, they spend most of their time doing various exercises, with the occasional round of sparring thrown in. But every now and then the teacher finds a way to break the routine by injecting some kind of game or surprise into the mix. This kata is one of those. It's nothing serious, and unlikely to have practical benefits (unless you're working in some fairly specialized areas).
Think of binary numbers: sequences of 0's and 1's. How many n-digit binary numbers are there that don't have two adjacent 1 bits? For example, for three-digit numbers, five of the possible eight combinations meet the criteria: 000, 001, 010, 011, 100, 101, 110, 111. What is the number for sequences of length 4, 5, 10, n?
Having worked out the pattern, there's a second part to the question: can you prove why that relationship exists?