The other day I was thinking about a young semi-student programmer that I know, and thought about presenting him a small set of “teeth cutting” exercises: small tasks that would serve a double purpose, helping me evaluate his present aptitude for development tasks, and preparing him a wee bit for what his future education is likely to throw at him. Unlike what seems to be the norm in college environments, I can also gently push in more, ahem, practical directions than what most students I’ve met have learned. I have yet to find out if the number of stupid programmers on earth is due to the schooling or the students. Alas, that’s drifting off topic.

When I stopped thinking about the whole teeth cutting thing, I had done so because no ideas of what to use as a starting exercise had come to mind. Today while chatting, one did: a bare bones version of the UNIX tail program.

(06:30:27 PM) Spidey01: A first exercise:
 language: your choice
 description: implement a program called ‘tail’ that displays the last N lines of a file, where N is supplied by the user. It need not be a GUI, but can be if you wish.
 goals:
  A/ Minimise the scope your variables are accessible from.
  B/ Describe the procedure (algorithm) you came up with for finding the last N lines in the file.
  C/ Think and discuss, is there a way to improve on your algorithm?

Tail is complex enough that some C implementations are horrendously overcomplicated, yet simple enough that it is easily completed without a gruelling mental challenge, especially if the -n option is the only one you care about. Goal A was chosen because it’s a very common foul up among programmers, young and old alike.

I wrote a more complex program than that in C years ago as a learning process; it was more or less a fusion of the unix cat, head, and tail programs. Since the student in question was using Visual Basic .NET (oi), I opted to use C# so as to keep things, at least, in the same runtime family. Here is a listing of the example code I wrote. The display here was done by feeding it into gvim and using :TOhtml to get syntax highlighted HTML to post here, then clipping a few things, hehe. The gvim theme is github.


/**
 * comments having // style, are notes to young readers.
 *
 * CAVEATS:
 *  line numbers are represented by int, and thus have a size limit imposed by
 *  the 32-bit integer representation of the CLR.  Whether the user's computer
 *  will run out of memory before that is irrelevant.
 *
 *  If there are fewer lines read than requested by the user, all lines are
 *  displayed without error message. I chose this because the error message
 *  would be more annoying than useful.
 */

using System;
using System.IO;
using System.Collections.Generic;

class Tail {
    enum ExitCode { // overkill
        Success=0,
        Failure=1,
        NotFound=127,
    }

    static void Main(string[] args) {
        if (args.Length != 2) {
            usage();
        }

        using (var s = new StreamReader(args[1])) {
            try {
                var n = Convert.ToInt32(args[0]);
                foreach (var line in tail(n, s)) {
                    Console.WriteLine(line);
                }
            } catch (FormatException) {
                die(ExitCode.Failure, args[0] + " is not a usable line number");
            } catch (OverflowException) {
                die(ExitCode.Failure, args[0] + " is too big a number!");
            }
        }
    }

    static void usage() {
        Console.WriteLine("usage: tail.exe number file");
        Console.WriteLine("number = number of lines to display from "
                          + "end of file");
        Console.WriteLine("file = file to read from tail");
        Environment.Exit((int)ExitCode.Success);
    }

    // Instead of doing the display work itself, returns a sequence of lines
    // to be displayed. This means this function could be easily used to fill
    // in a textbox in a GUI.
    //
    // It could also take a delegate object to do the display work, thus
    // improving runtime performance but that would be less flexible.  In this
    // particular program's case, just doing Console.WriteLine() itself would
    // be OK. See the foreach loop over tail() up in Main() for reference.
    //
    // This method also sucks up memory like a filthy whore because it stores
    // the whole file in memory as an IList<T>.  That's fine for a quick and
    // dirty prototype. In real life, this should use a string[] array of
    // length 'n' and only store that many lines. That way it could handle
    // files 5 billion lines long just as efficiently as files 5 lines long.
    //
    // I chose not to make that change in this example, in order to make the
    // code as simple to read as possible.
    //
    // Incremental development + code review = good idea.
    //
    static IEnumerable<string> tail(int n, TextReader s) {
        string line;
        var list = new List<string>();

        try {
            while ((line = s.ReadLine()) != null) {
                list.Add(line);
            }
        } catch (OutOfMemoryException) {
            die(ExitCode.Failure, "out of memory");
        } catch (IOException) {
            die(ExitCode.Failure, "error reading from file");
        }

        if (n > list.Count) {  // a smart bounds check!
            n = list.Count;
        }

        // implications of GetRange() using a shallow copy rather than a
        // deep copy, are left as an exercise to the reader.
        return list.GetRange(list.Count - n, n);
    }

    static void die(ExitCode e, string message) {
        Console.Error.WriteLine("tail.exe: " + message);
        Environment.Exit((int)e);
    }
}


This is a backtrace of the development process involved.
The program started simple: with the Main() method in Tail. The first thing I did was a simple check to see if args.Length was == 0, and exiting with a usage message. Then I remembered while writing the Console.WriteLine() calls for the usage message, that I really wanted (exactly) two arguments. That’s how the test became what’s written above in the code listing. A couple minutes later I moved the usage message code from inside that if statement to a usage() method. I did that to keep the Main() method more concise: unless you have reasons to groan over function call overhead, doing that is often a good idea. (Up to a point that is.)
From the get go, I knew I wanted the meat and potatoes to be in a method named tail() instead of Main(), for the same reasons that I created usage(). So that became a short using statement over a new StreamReader object.
First up was converting args[0] from a string representation of a number to an integral representation of a number. At first I used uint (Unsigned Integer, 32-bit length) but later decided to make it plain int (Signed Integer, 32-bit length) because that’s what subscripts into a collection are defined in terms of. I don’t care if the user wants to display more than ~2,147,483,647 lines, it’s only an example program, damn it! Because tail() shouldn’t give a fuck about converting the program’s argument vector to a type it can use (which obviously needs to be numeric), the caller (Main()) does the conversion. I first tried args[0].ToUInt32() and when that failed to compile, I hit Google. That gave me the System.Convert documentation on MSDN, from which it was easy to find the proper method. Because MSDN lists what exceptions System.Convert.ToInt32 can throw, and I know from experience that testing for such things is necessary ^_^, I quickly stubbed out catch clauses for FormatException and OverflowException. I wrote a simple set of messages to the standard output stream and an exit for each one. Then I converted them to using the standard error stream and wrote an enum called ErrorCodes, complete with casts to int when needed.

It was about this time that I decided that implementing a simple method like Perl’s die() or BSD’s err() would be convenient. Thus I implemented die() and replaced the repetitive error code. Functions are almost like a reusable template in that way. Then I decided that ExitCode was a better name for the enumeration than ErrorCodes, since it was being used more generally as an exit status (code) than an error report; unlike Microsoft I do not consider Success to be an error code ;). That was a simple global search and replace, or :%s/ErrorCodes/ExitCode/g in vim, followed by a quick write (save) and recompile to test. Job done.

While I was at it, I also had an intentional bug encoded into the exception handlers for Convert: originally the n variable was in a higher scope than the Convert call (the using block instead of the try block). The error message for handling FormatException used n.ToString() and the one for OverflowException used args[0]. The bug here was subtle food for thought: one displays the result of the conversion, which might not match what the user supplied, thus confusing the user. The other displays what the user entered, which might not be what the program had tried to use. That also pushes an interesting thought on your stack: since the same data is used by both die() calls, why do we have to write and maintain it twice? Alas, I realised the n variable was in too wide a scope and thus made that mind-play a moot point (by removing n from the scope of the catch statements). If you recall, using minimal scope for variables was actually the intent of the exercise, not error handling and code reuse.

Next I focused on implementing tail(). At first it was simple: just take a number and a StreamReader, and do a little loop over reading lines, for a quick test. When I checked the documentation on MSDN, I noticed that StreamReader was an implementation of TextReader rather than a base class. I always find that weird, but that’s outside the scope of this journal entry. Thus I made the using statement in Main() create a StreamReader and pass it to tail(), which now takes a TextReader. Originally it also had a void return type, and simply printed out its data. I did that to make testing easier. The comments above make a sufficient explanation of why IEnumerable is used, and what I’ve already written about StreamReader/TextReader may suggest why it doesn’t return a straight string[] (i.e. an array of strings).

The heart of it of course, is just feeding lines from a file into a generic List of strings. Since the exceptional possibilities are more straightforward, I wrote the catch blocks first. After that it is merely a question of extracting the correct lines from the tail end of the list. That’s a simple one to one (1:1) abstraction of how you might do it manually. I believe simple is the best way to make a prototype. Since the student in question was joking about how his implementation would likely crash if the line numbers were out of whack from what’s really in the file, I was sure to include a simple check. If the # of lines requested is greater than what really is there, just scale down. Voilà. The comments at the top of the listing above show why there is no error message displayed.

Extracting the items was a bit more of a question. My first implementation was a simple C-style for loop over the list using Console.WriteLine(). In the conversion to returning the data to be displayed, the tail() call in Main() became the above foreach loop. I added the comment about GetRange() more so as food for thought (from a code reuse and optimisation perspective). The math needed to extract the correct range of lines is trivial.

I then took a few moments to look things over, doing a sort of code review. A few things were rearranged for clarity. I also introduced a bug, breaking the specification goals. If you look close enough at tail(), you will see that the variable line is only used inside the try block, yet it is declared at method scope. The #1 goal of the exercise was to avoid such things, hehe. I also thought about adjusting things to use an n sized cache of lines, rather than slurping the entire file into memory, but decided against it. To keep the code easier to read, since the target-reader knows neither C# nor a lot of programming stuff, I just left comments noting the pros and cons of the matter.
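For the curious, the n sized cache idea is easy to sketch. What follows is not the C# from the listing but a rough Python equivalent, assuming the input is any iterable of lines; the deque-based window is my swap-in for the string[] array mentioned in the comments, and the function name simply mirrors the listing.

```python
from collections import deque
import io


def tail(n, stream):
    """Return the last n lines of stream, holding at most n lines in memory."""
    if n <= 0:
        return []
    # A deque with maxlen=n silently drops its oldest entry once it is full,
    # so memory use depends on n rather than on the length of the file.
    window = deque(maxlen=n)
    for line in stream:
        window.append(line.rstrip("\n"))
    # Fewer than n lines read? Then everything read is returned, no error.
    return list(window)


sample = io.StringIO("line one\nline two\nline three\nline four\nline five\n")
last_two = tail(2, sample)  # ['line four', 'line five']
```

The bounds check from the listing disappears here, because a window that never filled up is already the whole file.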

Some people might find the method naming style odd for C#, but it’s one that I’ve come to like, thanks to Go and C#. I.e. publicly exposed functions get NamesLikeThis and those that ain’t, get namesLikeThis. Although personally I prefer C style names_like_this, aesthetically speaking.

The test file I used during the run was this:

line one
line two
line three
line four
line five

and most tests were done using various adjustments on:

terry@dixie$ gmcs tail.cs && mono tail.exe 2 test.txt

After sending the files over, I also whipped up a Visual Studio solution in MonoDevelop, and then realised that I left a rather unprofessional bug. If the filename in args[1] didn’t exist, the program would crash. That was easily fixed on the fly.

Overall the program took about an hour and a half to write. For such a simple program, that’s actually kind of a scar on my pride lol. But hey, I’ve barely written any code this month and I had to look up most of the system library calls in MSDN as I went along, I also tried to make it more polished than your typical example code. Which usually smells.

I can also think of a few ways to incrementally adapt that first exercise into several other exercises. That might be useful.

I’ve been experimenting with window managers lately: fluxbox, openbox, awesome, and musca. Fluxbox and openbox are pretty much just generic window managers, at least in my eyes. That said, they are well worth using, for most people. Awesome and Musca are tiling window managers, and a lot more, eh, minimalist. I used to collect window managers, among quite a few other odds and ends, back when I had time for it, but I have never done the “Tiling thing” beyond a very brief test drive of dwm.

Awesome and Musca create an interesting experience: you create the windows, the window manager, manages them. It’s almost alien lol. Normally you create a window, the window manager figures out where to draw it. You do the rest, e.g. by resizing and moving it around as necessary. In these tiling window managers however, newly created windows are automatically arranged and sized by dividing containers.

Launching your first window is like maximising; launching a second window causes everything to resize and give each program half the screen, and so on based on some tiling pattern. The most used seems to be 1 half sized window left and 2 quarter sized windows right; works better than you might think. Rather than resizing the windows individually, you resize the containers. So if the screen is laid out as:



|         | term |
| Firefox |------|
|         | chat |



Selecting either the term or chat window and attempting to resize will resize all three windows. Try to enlarge the chat window horizontally, and Firefox will shrink and term grow, horizontally. Try to shrink the term window vertically and the chat window grows vertically, and so on.
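To make the container idea concrete, here is a minimal sketch of a master/stack layout in Python. It is not how awesome or musca actually compute geometry; the function name, the master_ratio parameter and the integer pixel maths are all assumptions made for illustration.

```python
def tile(screen_w, screen_h, windows, master_ratio=0.5):
    """Return {name: (x, y, width, height)} for a master/stack layout."""
    if not windows:
        return {}
    if len(windows) == 1:
        # A lone window behaves as if it were maximised.
        return {windows[0]: (0, 0, screen_w, screen_h)}
    # The first window gets the left container, full height.
    master_w = int(screen_w * master_ratio)
    layout = {windows[0]: (0, 0, master_w, screen_h)}
    # Every other window shares the right container, stacked vertically.
    stack = windows[1:]
    slot_h = screen_h // len(stack)
    for i, name in enumerate(stack):
        layout[name] = (master_w, i * slot_h, screen_w - master_w, slot_h)
    return layout


even = tile(1920, 1080, ["Firefox", "term", "chat"])
# Enlarging chat horizontally is really moving the container boundary:
wider_stack = tile(1920, 1080, ["Firefox", "term", "chat"], master_ratio=0.4)
```

In this sketch a horizontal resize of either right-hand window is just a change to master_ratio, which is why Firefox shrinks and term grows along with chat.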

It’s mind blowingly better than the style of window management people are used to these days, which dates back to like Mac OS 2 or Mac OS 3 back in the ’80s. It is also a little bit awkward to let the computer take care of something that you’ve been doing by hand for almost twenty years!

Relishing the experience however, has made me think of something different. I was just experimenting with the MinOverlapPlacementPenalties and MinOverlapPercentPlacementPenalties settings in FVWM, and it hit me. What if you could dynamically define what windows are important? I.e. what screen space should have more “Don’t cover this up unless necessary”, and how big a frame (i.e. for auto-tiling) should be, and so on?

It is technically possible, if perhaps computationally ‘interesting’ to figure out at the machine level. The windows that spend the most time focused or are most often gaining the focus, would be prime candidates. If the user ‘uses’ the window more than others, give it a larger chunk of available space scaled to its idea of how much space it needs, then prefer minimising the percentage of those windows being covered over or shrunken to absorb other windows in the same screen space.
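As a back of the envelope sketch of that idea, assuming the window manager could hand us the number of seconds each window has spent focused (a hypothetical input, not something FVWM exposes in this form), the share of space per window might be worked out like so:

```python
def space_shares(focus_seconds):
    """Map per-window focus time to a fraction of the available space."""
    count = len(focus_seconds)
    if count == 0:
        return {}
    total = sum(focus_seconds.values())
    if total == 0:
        # Nothing has been used yet, so split the space evenly.
        return {name: 1 / count for name in focus_seconds}
    # More time focused means a proportionally larger chunk of the screen.
    return {name: seconds / total for name, seconds in focus_seconds.items()}


shares = space_shares({"editor": 300, "chat": 100})
```

A real implementation would still have to reconcile these shares with each window's own idea of how much space it needs, which is where the 'interesting' part comes in.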

It is food for thought!

Hmm I must admit that custom configuring a Linux kernel, seems to offer three possibilities:

  • Lean, mean, and sexy kernel build
  • More modules than you can shake a stick at
  • Major headaches

I’m tempted to configure for a balance between the first and second, it is an interesting idea though. If I tuned a kernel build for my very specific system, it would strip out most of the usual bloat. The downside is there are so many configuration options, that making the config might take longer than compiling Linux!

A reminder of why I don’t miss Ruby

After about two minutes of thumbing around the official documentation for Ruby’s standard library, I quickly realised it would be faster to Google how to use the module for MD5/SHA* stuff than read the official stuff:



    def encrypt_password
      # This is just a place holder implementation for now
      require 'digest'
      self.password = Digest::SHA512.hexdigest(password)
    end
Compare the documentation for Ruby’s Digest to Python’s hashlib. Even an uncommented C++ header file would be more useful than the docs for Ruby’s Digest module :-S.
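For comparison, roughly the same placeholder written against Python's hashlib looks like this. The function name is made up for the example, and like the Ruby above it is only a bare SHA-512 of the input, nothing more.

```python
import hashlib


def hash_password(password):
    """Return the SHA-512 hex digest of password, as Digest::SHA512.hexdigest does."""
    # hashlib works on bytes, so the text has to be encoded first.
    return hashlib.sha512(password.encode("utf-8")).hexdigest()


digest = hash_password("secret")
```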

Programming: Things We Should Take For Granted

Since an anniversary is coming up, and I’ve accumulated a nice chunk of experience in the years since I started learning how to program, I’ve been reflecting on what languages have taught me about getting stuff done. This is also, obviously, a rant.

That’s the cardinal rule. I don’t give a shit if it has more fan boys than Apple, or if it’s the most brilliant idea since they blamed the invention of fire on Prometheus. It has to be useful for getting stuff done!

Here are some knick-knacks that you should expect from programming languages since the 1970s, oh how little we have moved forward…







Serious Scoping

The ability to define modular scopes. In most cases this simply has to do with managing namespaces, either via built in language constructs (C++, PHP5.3, C#) or the natural order of modules in general (C, Java, Python), but really it’s trivial. It kind of has to do with building out of the whole ‘scope’ concept in general.  In other words, we could say that a namespace is a kind of scope. Looking at the Wikipedia page, it seems someone agrees with that notion. Using this lingual functionality, you essentially have object oriented programming at your disposal: sans the ‘tors hidden behind the curtain. Some languages even throw those in, making OOP a real snap. While the inheritance model is popular and the less well known prototyping model is interesting, they are not required for achieving the core aims of object oriented programming; at least not the ones touted ;).

Anonymous Executable Code Blocks

Basically anonymous functions. This means we can write higher order functions, in other words functions can take and return functions. We can even do part of this in C through the use of function pointers (and C++ also has function objects, which are nicer if more verbose). But we can’t define those functions on the fly, right there and then, making things harder. C++ and Java will at long fucking last be gaining a solution to this, if we live long enough to see it happen. Most languages worth learning have made doing all this pretty easy.

Abuse of Lexical Scoping

This usually means lexical closures, which in turn makes the aforementioned anonymous functions a prerequisite for happiness in normal cases. They could also cook up something different but equivalent to that for a change, but alas don’t count on that! So the anon’ func’ thing is where it’s at for now. If the language lacks closures, it’s obviously a fucking moron, or it had better be old. C at least has that excuse. What about you Java? The road to 7 is a sick joke.  Anyway as I was saying, if you have both serious scoping and a way to define executable blocks without a name, you ought to be able to treat it just like anything else that has a value. Here’s an example in pseudo code:

var x = function {
  var y = 0
  return function { y = y+1; return y }
}

test1 = x()
test2 = x()

do { test1(); test2() } for 0 to 5

You can expect either of two things to happen here:

  1. Every function returned by x will increment the instance of y stored in x.
  2. Every function returned by x will get its own private copy of y.

The latter is the norm. I rather think the former is better, if you also provide means of doing the latter, syntactically; this makes the closing over of data more explicit. In terms of how it would look, think about the static and new keywords in many statically typed OO languages and copy constructors, then you’ll get some sort of idea. I blame my point of view on this on spending too much time with Python. Ditto for my thoughts on memory allocation, courtesy of C.
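Both outcomes can be tried out in Python, which makes for a handy sanity check of the pseudo code. This is only a sketch: the function names and the module-level dict used to fake a "shared y" are my own inventions for the example, not anything from a real language spec.

```python
def make_private_counter():
    """Outcome 2: every returned function closes over its own y."""
    y = 0

    def increment():
        nonlocal y  # rebind the y belonging to this particular call
        y = y + 1
        return y

    return increment


shared = {"y": 0}  # one y living outside any single call


def make_shared_counter():
    """Outcome 1: every returned function increments the same y."""
    def increment():
        shared["y"] += 1
        return shared["y"]

    return increment


test1, test2 = make_private_counter(), make_private_counter()
private_results = [(test1(), test2()) for _ in range(3)]

test3, test4 = make_shared_counter(), make_shared_counter()
shared_results = [(test3(), test4()) for _ in range(3)]
```

With private copies the two counters march in step, giving (1, 1), (2, 2), (3, 3); with the shared y they interleave, giving (1, 2), (3, 4), (5, 6).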

While we’re at it, if the vaguely EcmaScript like pseudo code above was a real language: you should expect function x {} and test1(), test2() for … to be sugar for the relevant snippets out of the above, or else you ought to break out the tar and feathers! I wrote them as I did to make it painfully obvious what’s happening to people still struggling at var x = function {}. Kudos if you figured out why I wrote do { } for instead of for {}. If you’re laughing after reading the above paragraph, yippee :-). If you’re now scrolling back up, then sorry.

Just shuddup & build it!

Which is so sorely missed in most compiled languages. There should be no need to define the relationship between modules or how to build it; it should be inferable from the source code. So should the type of most things, if the Lisp, SML, C#, and C++0x people are anyone to listen to. Building a cross platform C or C++ app is a bitch. Java is no better. Most dynamic languages are fine, so long as you keep everything in that language and have no C/C++ extensions to build. The closest that is possible to this “Just shuddup & build it” concept in most main stream languages depends on dedicated build infrastructures built on top of make tools (FreeBSD, Go, etc) or IDEs. Either case is a crock of shit until it comes as part of the language, not a language implementation or a home brewed system. Build tools like CMake and SCons can kiss my ass. It’s basically a no win situation. JVM/CLI-languages seem to take a stab at things but still fall flat; Go manages better.

Dependency management also complicates things, because building your dependencies and using them becomes part of building your project. Most dynamic languages have grown some method of doing this; compiled ones are still acting like it’s the 70s. For what it’s worth, Google’s goinstall is the least painful solution I’ve encountered. Ruby’s gems would come in second place if it wasn’t for Microsoft Windows and C/C++ issues cropping in here and there. Python eggs and Perl modules are related.

Here’s an example of brain damage:

using Foo;

class X {
  public void test() {
    var x = new Foo.Bar();
    // do something to x
  }
}

Which should be enough to tell the compiler how to use Foo. Not just that you can type new Bar() instead of new Foo.Bar(), should you wish to do so. Compiling the above would look something like:

dot-net> csc -r:Foo.dll /r:SomeDependencyOfFoo.dll
mono$ mcs -r:Foo.dll -r:SomeDependencyOfFoo.dll

Which is a lazy crock of shit excuse for not getting creative with C#’s syntax (See also the /r:alias=file syntax in Microsoft’s compiler documentation for an interesting idea). The only thing that a using statement lacks for being able to tell the compiler, is what file is being referenced, i.e. where to find the Foo namespace at run time. Some languages like Java (and many .NET developers seem to agree) impose a convention about organising namespaces and file systems. It’s one way, but I don’t care for it.

What is so fucking hard about this:

using "Foo-2.1" as Foo;

in order to say that Foo is found in Foo-2.1.dll instead of Foo.dll. If that sounds like weird syntax, just look closer at P/Invoke, how Java does packages, and Python’s import as syntax.

So obviously we can use the language’s syntax to provide whatever linking info should be needed at compile time; figuring out where to find the needed files at compile and run time is left as an exercise for the reader (and easily thunk about if you know a few language implementations).

In short, beyond exporting an environment variable saying where else to look for dependencies at compile/runtime, we should not have to be arsed about any of that bull shit. If you’re smart, you will have noted most of what I just talked about has to do with the compiling and linking against dependencies, not building them. Good point. If we’re talking about building things without a hassle, we also have to talk about building dependencies.

Well guess what folks, if you can build a program without having to bitch fuck about with linking to your libraries, you shouldn’t have to worry about building them either. That’s the sense behind the above. What’s so hard about issuing multiple commands (one per dependency and then again for your program) or doing it recursively? Simple. But no, just about every programming language has to suck at this. Some language implementations (Visual C++) make a respectable crack at it, so long as you rely on their system. Being outside the language standards, such things can suck my rebel dick before I’ll consider them in this context.
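The "doing it recursively" part really is simple. Here is a minimal sketch in Python, assuming the dependency information has already been scraped out of the source into a plain dict (the dict, the target names and the function are all hypothetical, borrowed from the Foo example above):

```python
def build_order(target, deps):
    """Return targets in the order they must be built, dependencies first."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled by an earlier visit
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)  # recurse: dependencies get built before dependants
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


deps = {
    "app": ["Foo"],
    "Foo": ["SomeDependencyOfFoo"],
    "SomeDependencyOfFoo": [],
}
steps = build_order("app", deps)
```

Each entry in steps is then one compiler invocation, issued in that order.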

Let’s take a fairly typical language as an example. C# and Java for starters, and to the lesser extent of C and C++ as far as their standards allow, we can infer a few things by looking at source code. Simply put if the main method required for a program is not there, obviously the bundle of files refer to a library or libraries. If it is there, you’re building a program. Bingo, we have a weener! Now if we’re making life easier, you might ask about a case where the code is laid out in the file system in a way that makes that all harder to figure out. Well guess what, that means you laid it out wrong, or you shouldn’t be caring about the static versus dynamic linking stuff. D’uh.
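As a crude illustration of that inference, here is a Python sketch that only looks at the text of C# sources. The regular expression is a textual heuristic of my own, nowhere near a real parser, and the function and file names are made up for the example.

```python
import re

# Matches declarations such as "static void Main(" or "static int Main(".
ENTRY_POINT = re.compile(r"\bstatic\s+(?:void|int)\s+Main\s*\(")


def classify(sources):
    """Return 'program' if any source text declares Main, else 'library'."""
    if any(ENTRY_POINT.search(text) for text in sources.values()):
        return "program"
    return "library"


program = classify({"tail.cs": "class Tail { static void Main(string[] args) { } }"})
library = classify({"x.cs": "class X { public void test() { } }"})
```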

Run down of the Main Stream

Just about every language has had serious scoping built in from day one, or has grown it over the years, like FORTRAN and COBOL. Some languages get a bit better at it, C# for example is very good at it. C++ and Python get the job done but are a bit faulty, since you can circumvent the scoping in certain cases; although I reckon that’s possible in any language that can compile down to a lingual level below such scoping concepts. Some might also wonder why I had listed PHP 5.3 earlier when talking about “Serious scoping”, well 5.3 basically fixes the main faults PHP had, the only problem is in common practice PHP is more often written like an unstructured BASIC. Die idiots Die. Languages like Java and Ruby basically run it fairly typical. By contrast Perl mostly puts it in your hands by abusing lexical scope. I love Perl.

In terms of anonymous functions, Perl, Lisp, and SML are excellent at it and modern C# manages quite nicely (if you don’t mind the type system). Whereas C, C++, and Java are just hopeless on this subject. Python and Ruby also make things somewhat icky: functions are not first class objects in Ruby, so you have to deal with Proc to have much fun; likewise Python’s lambdas are limited, so you usually have to resort to Pascalesque scope abuse to manage something useful, in the way of nesting things in the scope. Lisp is queen here; Perl and JavaScript are also very sexy beasts when getting into anonymous functions.

In terms of lexical closures across languages, I’ll just leave it to the text books.

As far as being able to shout “Just shuddup & build it!”, they all suck!!! Most of the build related stuff is not iron clad defined by the C99/C++03 standards documents, and you are a fucking moron if you expect the entire world to use any given tool X to build their C/C++ software o/. That rules out things like Visual Studio (vcbuild/msbuild), Make, and all the wide world of build tools you can think of. The most common dynamic languages in the main stream are better. In order of least to most suckyness, Perl, Ruby, and Python make the process less painful; my reason for rating them thus is because of extension modules. They provide tools to help you build and distribute your extension modules and your native modules. The only gripes come from the principal language implementations being implemented in C. For pure modules (e.g. no C extensions), they are excellent! PHP and Java, I’m not even going to rate. The least painful language that I’ve gotten to use has been Go, which has the best thing since CPAN. Which is still harder than it should be o/.

The question I wonder, is if it took until like the 2000s to get closures going strong after the serious scoping stuff took off by the late 1960s/early 1970s; will I have to survive until the 2030s-2050s to be able to bask in the wake of just being able to build stuff in peace? Most likely old friend C will still be there, but other languages should reach main stream in another 30+ years… not just the ones we have now. That, or we could go back to lisp…. hehehe.

Fun with linenoise

Well, it’s been a fairly productive night; I’ve managed to conclude the day’s research, take care of a few odds and ends, plus wrap up a few changes to linenoise. It’s essentially a micro-library for giving console programs capabilities for line editing, like GNU readline, sans the migraine headache.

On/off I’ve been tinkering with it since mid August. The linenoise API has almost everything you could want except for a few minor things, and it is insanely easy to use. As to what’s lacking: some common keybindings, direct access to the loaded history, and a completion hook. None of which is hard to add. Since it’s so much nicer to use than readline, but lacking a few keybindings that I’ve come to rely on, I’ve been adding them to my own fork on github. Pull requests periodically sent to the author of course. It will likely be my first choice whenever requiring line editing support in one of my applications.

Two things that also interest me are a Windows port and implementing completion. The former is a bit of work, whereas the latter is only an intellectual issue: what keystroke to use. Using tab is an issue, in that one has to be able to figure out when to insert a tab and when to trigger completion. Something more like the (real) Korn shell’s “escape escape” provides the simplest means.
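To show why “escape escape” sidesteps the tab problem, here is a toy key handler in Python. It has nothing to do with the actual linenoise code: the handle_keys function and the complete callback are hypothetical, and a real terminal would also need to tell a lone escape apart from the escape sequences sent by arrow keys, which this sketch ignores.

```python
ESC = "\x1b"


def handle_keys(keys, complete):
    """Feed keystrokes into a line buffer; two escapes in a row complete it."""
    buffer = []
    pending_escape = False
    for key in keys:
        if key == ESC:
            if pending_escape:
                # Second escape in a row: hand the line to the completer.
                buffer = list(complete("".join(buffer)))
                pending_escape = False
            else:
                pending_escape = True
            continue
        pending_escape = False
        buffer.append(key)  # a tab is just another character to insert
    return "".join(buffer)


def complete(text):
    """Stand-in completer that only knows a single word."""
    return "hello" if "hello".startswith(text) else text


completed = handle_keys("he" + ESC + ESC, complete)
with_tab = handle_keys("a\tb", complete)
```

Since tab never means anything special here, there is no guessing about when the user wanted a literal tab.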

Linenoise is just one of the many things on the back burner that I would like to bring closer to the front flame.

Too tired to focus on code at the moment, yet much too awake to sleep o/. Been tinkering with an old project: tpsh. Mostly I’ve been polishing the codebase and doing a bit of refactoring; there’s now a debug mode, which reduces some of the cruft that’s crept in from testing. Sometime I also need to import the test scripts lol.

One of the changes is adding support for running off the Strawberry Perl distribution rather than ActiveState’s Perl distro for Windows. The downside is, most of tpsh’s biggest problems have to do with Perl’s portability, namely Windows quirks. Gotta love’em.

Since the merge of the “Code generator”, tpsh has supported a very limited subset of sh script. Not very usefully however, since the shell doesn’t have a real concept of $? yet. Likewise there need to be some changes in the handling of environment variables. Most ideal IMHO is a tied hash wrapping %ENV (Perl’s magically tied hash of environment variables) with the ability to hook reads and writes, etc. Preferably in a user-exposable way that can be extended by shell functions, rather than being limited to the script’s own Perl code.
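
To make the “hook reads and writes” idea concrete, here is a minimal sketch of the same concept. It is written in Python rather than Perl, so it is not a tied hash and not tpsh code; the class name `HookedEnv` and the hook signatures are made up for illustration.

```python
import os
from collections.abc import MutableMapping


class HookedEnv(MutableMapping):
    """Hypothetical wrapper: an environment mapping with read/write hooks."""

    def __init__(self, backing=None):
        # Default to the real process environment (the analogue of %ENV).
        self._backing = os.environ if backing is None else backing
        self._read_hooks = []   # each called as hook(name, value)
        self._write_hooks = []  # each called as hook(name, old, new)

    def on_read(self, hook):
        self._read_hooks.append(hook)

    def on_write(self, hook):
        self._write_hooks.append(hook)

    def __getitem__(self, name):
        value = self._backing[name]
        for hook in self._read_hooks:
            hook(name, value)
        return value

    def __setitem__(self, name, value):
        old = self._backing.get(name)
        for hook in self._write_hooks:
            hook(name, old, value)
        self._backing[name] = value

    def __delitem__(self, name):
        del self._backing[name]

    def __iter__(self):
        return iter(self._backing)

    def __len__(self):
        return len(self._backing)


# Use a plain dict as the backing store so the demo does not touch os.environ.
env = HookedEnv({"HOME": "/home/user"})
writes = []
reads = []
env.on_write(lambda name, old, new: writes.append((name, old, new)))
env.on_read(lambda name, value: reads.append(name))

env["EDITOR"] = "vi"
home = env["HOME"]
print(writes)  # [('EDITOR', None, 'vi')]
print(reads)   # ['HOME']
```

A shell would register its own functions as hooks, which is roughly what “user-exposable” would mean here.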

It’s the more general case however, of having to build up the language rather than use its building blocks. The problemo isn’t in maintainability, but in portability and development time.

Having refactored a chunk of the shell’s initialization code, I know that I’ll have to retackle the readline stuff. To say that I hate Term::ReadLine would be an understatement. I’ve no love for the GNU Readline API either (I would use linenoise), but the GRL C API isn’t as nasty in practice as the pluggable Term::ReadLine system Perl uses. The only good thing I can say is that there is a semi-useless stub version included. Licensing issues of just having a GRL backend aside >_>. In my experience, mileage varies quite a lot between Perl readline packages, even within the scope of the fscking manual. It’s just a load of crap.

Actually, if there’s a way of accessing the API functions needed for unix termios / Windows console, I would be tempted to throw out the fucking thing and just write a Term::ReadLine::SANE module based on linenoise (a C library). I’m more familiar with the unix parts of that.

The STAMAN Project: Phase IV,

Having thought of tasks and storage formats, it’s now time to figure out an implementation language, i.e. what programming language am I going to write the task manager in.

O.K. based on what we’ve got so far, it is easy to infer the following is worth having:

  • Portable between systems—a must for me 😉
  • Easy access to SQLite—usually trivial.
  • Better tools than gmtime().

That means this page is rather useful, for what languages can be ruled out. In my répertoire this means Go (aww), Scheme, AWK, and shell languages can be skipped. Reason being the portability of Scheme bindings in general, and the others lacking sufficient portability (for my taste) at this time. That still leaves about 13 languages, lol. PHP, Java, Lua, JavaScript, and X86 assembly are easy for me to rule out. Reasons for that can all be easily guessed; at least if you remember how much I enjoy Sun’s Java tools. JS/Lua are great choices but I don’t want to screw with the bindings and stuff.

I’m not very interested in compiling SQLite in C/C++ on Windows, or the CLI binding everywhere. So this effectively makes the choice Perl, Python, or Ruby. Out of those three, none is perfect either: Perl doesn’t come with the required database code, it just has the definitive interface for databases that everybody mimics. Python and Ruby, on the other hand, come with SQLite bindings, which many distributions split out into separate packages. It’s just a lose-lose situation when you think about dependencies, but it does beat writing your own everything for every program. Sometimes. Setup with these three dynamic languages would be easy though, in so far as we’ve gotten with the above.

Time handling is another issue. Perl has fairly minimalist handling of time built in, but on the upside, if you need it, it’s probably three abreast on CPAN. Time::Format and the core Time::Piece module each come to mind. What isn’t built into Perl, often comes with it or can be added to it. Ruby provides a simple but effective Time class, that makes for more natural code than you may expect. More complex operations will require Googling for a Ruby gem, or hand coding it on demand. Python on the other hand provides a comprehensive datetime module, and supporting time and calendar modules, all out of the box! I would say Python takes the lead here.
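
For a taste of what “out of the box” buys you, here is a small sketch using only Python’s standard datetime module. The helper names `due_in` and `is_overdue` and the dates are invented for the example; STAMAN has no code yet.

```python
from datetime import datetime, timedelta


def due_in(created, days):
    """Return the due date for a task created at `created`, `days` later."""
    return created + timedelta(days=days)


def is_overdue(due, now):
    """A task is overdue once the current time has passed its due date."""
    return now > due


created = datetime(2010, 9, 1, 9, 30)
due = due_in(created, 7)

print(due.isoformat())                          # 2010-09-08T09:30:00
print(is_overdue(due, datetime(2010, 9, 9)))    # True
print(is_overdue(due, datetime(2010, 9, 2)))    # False
```

Date arithmetic, comparison, and formatting all come from the standard library here, with nothing to install.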

Rule one of getting work done: know how to leverage libraries.

In terms of programming languages, Perl, Ruby, and Python are generally equal enough for most tasks, so long as you don’t shoot yourself in the foot. Some subtle differences that personally irk me:

  • Perl’s autovivification can be almost as much a misfeature as it can be a convenience. You’ve just got to learn the damn language :-P.
  • Ruby functions are not first class objects! Some things can also be weird if you’re not used to Ruby.
  • Python doesn’t always stand up well to typos, especially if they involve indentation o/.

Because of how many lines of code I’ve done in Python over the years, I am more familiar with its set of “Irks” than Ruby’s; likewise I know Sh, C, and Perl more intimately than other languages, so I really know their irks. For Perl, it’s mostly little things that get inconvenient when combining the warnings pragma with the nature of Perl syntax. They spiritually conflict at times. Under Ruby, I mostly find gripes that have a bigger place in programmer culture. My issues with Python generally have to do with trade-offs that I disagree with as a matter of my convenience, even if it usually results in a Good Thing overall. It comes from a C-oriented background meshed with a love for the Perl programming language.

This is a fact: you will always be irked by your programming language, if you use it enough. What can I say, nothing is perfect. Shoot!

For this particular application, there are some things worth noting also: language portability. If the machine doesn’t run Perl, it’s not a real computer. Most systems you’re likely to care about will run Ruby and Python, and there’s probably a crusty old version of Python for those you don’t (nor directly should). In contrast however, Perl is often a lower level of “Cross platform” behaviour than Ruby/Python. You’ll find this highlighted well in the Camel book. One reason I use Python frequently: it always behaves as expected without so many subtle hiccups.

How much does this pertain to the current matter, i.e. implementing STAMAN? Perl is the most universally available language, and I’m more prone to need such a feature than most people. A plus over Ruby is no crappy 1.8.x/1.9.x porting issues…! Of course, I also have a camel to ask about minor details, hehe. In my experience the Python 2k/3k thing is less of an issue than Ruby’s for writing code yourself, and more of an issue in leveraging existing code.

So I reckon that means Perl or Ruby is best called for here. I exclude Python, because I just use the frick’n thing too often.

The STAMAN Project: Phase III, of tasks and storage formats

At least for me, there are only two pieces to STAMAN that are not trivial to work out before writing the code: choosing the storage format and implementation language. Both also happen to be areas where experience strongly augments one’s intuition, more so than the rest of the app’.

In the design outline, I noted that YAML would work quite nicely, yet an exposition of the outline suggests that something closer to SQL could better serve the applications design. The reasons behind it should be fairly obvious, if you’ve ever worked with textual data before.

During Phase I, I concentrated on the data involved with task management. It’s not hard to implement an SQL schema capable of representing that. Even better, most dialects offer useful features for handling times/dates. Virtually every programming language has a way of interfacing with such an SQL database, either through native bindings or by calling out to scriptable client programs. SQLite, MySQL, and PostgreSQL in fact provide both means; I’m not familiar with MSSQL. So that’s a big set of pluses all the way around. We even get a reusable DSL to help without having to write it!
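
As a rough sketch of what that could look like, here is a made-up task table in SQLite, driven from Python’s bundled sqlite3 module. The table and column names are assumptions for illustration, not the actual Phase I design; the point is only that the database’s own date functions do the time handling.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: due dates stored as ISO 8601 text.
conn.execute("""
    CREATE TABLE tasks (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        due   TEXT,
        done  INTEGER NOT NULL DEFAULT 0
    )
""")

conn.executemany(
    "INSERT INTO tasks (title, due) VALUES (?, ?)",
    [("Write Phase IV", "2010-09-10"), ("Pick a language", "2010-09-03")],
)

# Ask the database which open tasks fall due within three days of 'today'.
today = "2010-09-01"
rows = conn.execute(
    "SELECT title FROM tasks "
    "WHERE done = 0 AND date(due) <= date(?, '+3 days') "
    "ORDER BY due",
    (today,),
).fetchall()

print(rows)  # [('Pick a language',)]
```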

The problem, however, becomes one of migration paths: what happens if you need to change the data structures, perhaps heavily? That means having a lot more work whenever restructuring is needed, and it’s, IMHO, less scriptable than a little Perl golf: sufficiently so that I’m not going to screw with it. Insert shameless plug for Ruby on Rails here ;).

In a commercial environment, i.e. one oriented on making money off the program, XML would be more likely than any other textual format, but it is not very convenient for me. I also hate XML parsing with a passion. It is however sufficient for getting the job done, if, ahem, at the cost of jacking the amount of internal documentation you need to write (or later wish you had) several notches higher than it need be.

Someone might think of a simple Comma Separated Values (CSV) format, but CSV is anything but simple. Don’t believe me? Just think about data that may contain commas. That being said, the only good thing I can say about CSV, from a programming perspective, is that CPAN rocks. Unless you’re munging address books or spreadsheet data around and need an LCD, it is best to avoid CSV, period.
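
To see the comma problem bite, here is a quick sketch with Python’s standard csv module (the sample data is invented). The naive join/split approach mangles a field containing a comma, while a real CSV library has to quote and escape to survive the round trip.

```python
import csv
import io

rows = [["name", "note"], ["Doe, John", 'said "hi"']]

# Naive approach: join on commas, split on commas. The field boundary is lost.
naive = ",".join(rows[1])
print(naive.split(","))  # ['Doe', ' John', 'said "hi"']

# A real CSV writer quotes the comma and doubles the embedded quotes.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())

# Reading it back with the matching parser recovers the original fields.
round_trip = list(csv.reader(io.StringIO(buf.getvalue())))
print(round_trip == rows)  # True
```

Two fields went in and three came out of the naive version, which is exactly the kind of thing that makes “simple” CSV not so simple.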

The best bet, in terms of structured text, is one sufficiently able to represent the data set and be easily edited by hand. What is really needed is a dedicated format: enter YAML. It’s basically a hierarchical way of recording data as sequences of elements and key/value mappings. Works excellently.

The SQL solution relinquishes fine control over the operations, whereas the YAML method is assured to slurp up memory in proportion to the input. It’s a lot more like DOM oriented XML, only the translation between the code and textual representation is a hell of a lot more natural. When working with program generated output, it also doesn’t need to be fed through a pretty printer to be comprehensible, which can’t be said of XML without more pain for someone.

Pro YAML:

  • Easily edited by hand (notepad) and many unix tools.
  • So simple you can skip reading the spec [0].
  • If you have to write your own parser, make it YAML and save grey hairs.
  • It’s easy to serialize/marshal data around, as easy as it gets without eval().
  • More likely to benefit from compression.

Pro SQL:

  • Less imperative-style code to be written.
  • The hardest processing code is already in the database engine.
  • Can focus on querying data, not parsing it.
  • Languages/frameworks are more likely to ship SQLite bindings than a YAML parser.

Con’ YAML:

  • It really is as simple as it looks.
  • You have to write your own list/dictionary handling code.
  • Scales less.

Con’ SQL:

  • You have to learn basic SQL.
  • Not the most fun in some languages (C, C++, Java, and C#).
  • Can’t really get at the data, short of a database client.

Note that I haven’t said anything about separating the data store from the client application: using an SQL server is just as viable as storing YAML files on a network drive. It really is that simple.

My personal view? SQL’s virtues likely outweigh YAML’s here, unless you’re going to be designing by exploration. I’m not in this case, and I am also competent enough not to shoot myself in the foot. If I was smart, I would make the application wide interface to the data store more abstract than writing SQL queries all over the place like an asshole. Yes, I can be that smart. Don’t tell your neighbours.
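
One way to picture that more abstract interface: a small class that owns every query, so the rest of the program only ever calls methods. This is a sketch in Python with an invented `TaskStore` class and a deliberately tiny made-up schema, not a decision about how STAMAN will actually be written.

```python
import sqlite3


class TaskStore:
    """Hypothetical application-wide interface; callers never see SQL."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY, "
            "title TEXT NOT NULL, "
            "done INTEGER NOT NULL DEFAULT 0)"
        )

    def add(self, title):
        """Store a new task and return its id."""
        cur = self._conn.execute(
            "INSERT INTO tasks (title) VALUES (?)", (title,))
        self._conn.commit()
        return cur.lastrowid

    def complete(self, task_id):
        """Mark the task with the given id as done."""
        self._conn.execute(
            "UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
        self._conn.commit()

    def pending(self):
        """Return the titles of all tasks not yet done, oldest first."""
        rows = self._conn.execute(
            "SELECT title FROM tasks WHERE done = 0 ORDER BY id")
        return [title for (title,) in rows]


store = TaskStore()
first = store.add("choose storage format")
store.add("choose language")
store.complete(first)
print(store.pending())  # ['choose language']
```

If the storage ever changed to YAML files, only the inside of this class would need rewriting, which is the whole point of not scattering queries around.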
[0] I read the YAML specification the first time I used it for a project, which was for a rake-based build system. How else could I expect to hand write my build spec’s in YAML? :-).

Somehow, I’m really not sure what is worse: the curse of experience or a gringo’s rush.

Concept: tried Quassel IRC, didn’t like it – good software but not my bag. Switched to ircII – love the interface, don’t want to screw with hacking it. IRC clients are simple creatures but tend to be crap. While I could live with (or suitably script) ircII to my heart’s content, I also want a more Windows usable client.

Problem: When it comes to programming languages and what I want (something very ircII like, yet rather lisp like in a way), I can see all the pluses and minuses of any given implementation. If I was a nub, I would just pick a language, rush into it, and try and dig myself out.

Knowing so much can sometimes be a real let down o/.

For those that don’t know it, ircII is a very old school IRC client, even by the best CLI-whorish standards.

The typical IRC client is arranged as a text display area, for the current channel; a line edit for your messages; modern ones include a panel to list names in channel and some “Tab” like interface for marking the channels you’re chatting in. Text mode IRC clients work this way too.

ircII, on the other hand, routes everything into a central display area and places a line edit under a “Status line”. Rather than jockeying between tabs to see what’s up in other channels (which is very wasteful, even when using keystrokes), in ircII you simply use a command to change your current channel. Exempli gratia:

Typical:

  1. Click #chan1 tab
  2. Read what’s going on
  3. Reply if desired
  4. Change back to #chan0

Becomes:

  1. See what’s going on both in #chan0 and #chan1
  2. Use /j #chan1 to make your subsequent messages go to #chan1 instead of #chan0 until the next /j[oin] command.

It’s just more convenient for me than the ‘modern’ user interface. I like efficiency.
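
The single-window flow described above can be modelled in a few lines. This is a toy sketch in Python with an invented `Session` class: it only tracks where messages go, does no networking, and treats /j purely as “change the current target”, which is a simplification of what a real /join does.

```python
class Session:
    """Hypothetical model of the ircII-style single-window flow."""

    def __init__(self, channel):
        self.current = channel  # where outgoing messages are sent
        self.display = []       # one shared scrollback for every channel
        self.sent = []          # (channel, text) pairs for the network layer

    def incoming(self, channel, nick, text):
        # Everything lands in the same display, tagged with its channel.
        self.display.append(f"{channel} <{nick}> {text}")

    def input_line(self, line):
        if line.startswith("/j "):
            # Switch the target channel; the display itself does not change.
            self.current = line.split(None, 1)[1].strip()
        else:
            self.sent.append((self.current, line))


s = Session("#chan0")
s.incoming("#chan0", "bob", "hello")
s.incoming("#chan1", "alice", "anyone around?")  # visible without switching

s.input_line("/j #chan1")
s.input_line("yep")
s.input_line("/j #chan0")
s.input_line("back")

print(s.display)  # ['#chan0 <bob> hello', '#chan1 <alice> anyone around?']
print(s.sent)     # [('#chan1', 'yep'), ('#chan0', 'back')]
```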

In terms of implementing something like this portably (unix/win), the issue is simply line editing. That’s not a subject I enjoy. Having worked on a unix shell, I know it’s a bitch of a subject. Colour support is another, but minor, one: cmd.exe doesn’t understand the escape sequences a DEC terminal does.

I also want something dynamically reprogrammable on the fly, basically access to a REPL. O.K. so lisp spoils you. This makes dynamic languages more convenient, which is also its own can of worms.

That’s the fact of Programming, it’s all a Kobayashi Maru problem: you’ve just got to deal with it.