The STAMAN Project: Phase IV,

Having thought about tasks and storage formats, it’s now time to figure out an implementation language, i.e. which programming language I’m going to write the task manager in.

O.K., based on what we’ve got so far, it’s easy to infer that the following are worth having:

  • Portable between systems—a must for me 😉
  • Easy access to SQLite—usually trivial.
  • Better tools than gmtime().
That makes this page rather useful for working out which languages can be ruled out. In my répertoire, that means Go (aww), Scheme, AWK, and the shell languages can be skipped: Scheme because of the portability of its bindings in general, and the others for lacking sufficient portability (for my taste) at this time. That still leaves about 13 languages, lol. PHP, Java, Lua, JavaScript, and x86 assembly are easy for me to rule out. The reasons can all be easily guessed, at least if you remember how much I enjoy Sun’s Java tools. JS and Lua are great choices, but I don’t want to screw with the bindings and stuff.

I’m not very interested in compiling SQLite in C/C++ on Windows, or in calling out to the CLI client everywhere. So this effectively narrows the choice to Perl, Python, or Ruby. None of those three is perfect either: Perl doesn’t come with the required database code, it just has the definitive database interface that everybody mimics. Python and Ruby, on the other hand, come with SQLite bindings, which many distributions split out into separate packages. It’s a lose-lose situation when you think about dependencies, but it does beat writing your own everything for every program. Sometimes. Setup with any of these three dynamic languages would be easy, though, as far as the requirements above go.

Time handling is another issue. Perl has fairly minimalist time handling built in, but on the upside, if you need something more, there are probably three modules for it on CPAN; Time::Format and the core Time::Piece module both come to mind. What isn’t built into Perl often comes with it or can be added to it. Ruby provides a simple but effective Time class that makes for more natural code than you may expect. More complex operations will require Googling for a Ruby gem, or hand-coding them on demand. Python, on the other hand, provides a comprehensive datetime module, plus supporting time and calendar modules, all out of the box! I would say Python takes the lead here.
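To give a feel for what “out of the box” means on the Python side (Python is used here purely for illustration), here’s a small sketch using only the standard datetime module. The helper names and the idea of an hours-based time estimate are my own assumptions; the point is that parsing, date arithmetic, and comparison need no extra packages, unlike juggling gmtime() results by hand.

```python
from datetime import datetime, timedelta


def latest_start(due: str, estimate_hours: float) -> datetime:
    """Last moment a task can be started and still be finished by `due`.

    `due` is assumed to be an ISO 8601 timestamp such as "2011-05-02T17:00".
    """
    return datetime.fromisoformat(due) - timedelta(hours=estimate_hours)


def is_overdue(due: str, now: datetime) -> bool:
    """True if the due time already lies in the past relative to `now`."""
    return datetime.fromisoformat(due) < now


start = latest_start("2011-05-02T17:00", 2.5)
print(start.isoformat())                                      # 2011-05-02T14:30:00
print(is_overdue("2011-05-02T17:00", datetime(2011, 5, 3)))   # True
```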

Rule one of getting work done: know how to leverage libraries.

In terms of programming languages, Perl, Ruby, and Python are generally equal enough for most tasks, so long as you don’t shoot yourself in the foot. Some subtle differences that personally irk me:

  • Perl’s autovivification can be almost as much a misfeature as a convenience. You’ve just got to learn the damn language :-P.
  • Ruby methods are not first-class objects! Some things can also be weird if you’re not used to Ruby.
  • Python doesn’t always stand up well to typos, especially if they involve indentation o/.
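For anyone who hasn’t been bitten by autovivification: in Perl, merely writing $tasks{home}{urgent}++ silently creates the intermediate hash, and even a test like exists $tasks{work}{urgent} creates $tasks{work}. Python’s plain dict refuses with a KeyError, and you opt in to similar behaviour with collections.defaultdict. A small sketch of the contrast, with made-up list and priority names:

```python
from collections import defaultdict


def tree():
    """A dict that creates nested dicts on first access, roughly like Perl."""
    return defaultdict(tree)


counts = tree()
counts["home"]["urgent"] = 1        # intermediate dict is created silently

plain = {}
try:
    plain["home"]["urgent"] = 1     # a plain dict refuses instead
except KeyError as err:
    print("KeyError:", err)

# The misfeature side: merely *looking* creates an entry, much like
# `exists $tasks{work}{urgent}` autovivifying $tasks{work} in Perl.
print("urgent" in counts["work"])   # False
print(sorted(counts))               # ['home', 'work']
```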

Because of how many lines of code I’ve written in Python over the years, I am more familiar with its set of “irks” than Ruby’s; likewise, I know sh, C, and Perl more intimately than other languages, so I really know their irks. For Perl, it’s mostly small things that get inconvenient when combining the warnings pragma with the nature of Perl syntax. The two spiritually conflict at times. Under Ruby, I mostly find gripes that have a bigger place in programmer culture. My issues with Python generally have to do with trade-offs that I disagree with as a matter of my own convenience, even if they usually result in a Good Thing overall. That comes from a C-oriented background meshed with a love for the Perl programming language.

This is a fact: you will always be irked by your programming language, if you use it enough. What can I say, nothing is perfect. Shoot!

For this particular application, there’s also something worth noting: language portability. If the machine doesn’t run perl, it’s not a real computer. Most systems you’re likely to care about will run Ruby and Python, and there’s probably a crusty old version of Python for the ones you don’t (and shouldn’t directly) care about. In contrast, however, Perl’s “cross-platform” behaviour is often lower level than Ruby’s or Python’s; you’ll find this highlighted well in the Camel book. That’s one reason I use Python frequently: it always behaves as expected without so many subtle hiccups.

As for how much this pertains to the current matter, i.e. implementing STAMAN: Perl is the most universally available language, and I’m more prone to need such a feature than most people. A plus over Ruby is no crappy 1.8.x/1.9.x porting issues…! Of course, I also have a camel to ask about minor details, hehe. In my experience, the Python 2k/3k split is less of an issue than Ruby’s when writing code yourself, and more of one when leveraging existing code.

So I reckon Perl or Ruby is best called for here. I exclude Python because I just use the frick’n thing too often.

The STAMAN Project: Phase III, of tasks and storage formats

At least for me, there are only two pieces of STAMAN that are not trivial to work out before writing the code: choosing the storage format and the implementation language. Both also happen to be areas where experience strongly augments one’s intuition, more so than the rest of the app.

In the design outline, I noted that YAML would work quite nicely, yet an exposition of the outline suggests that something closer to SQL could better serve the application’s design. The reasons behind that should be fairly obvious if you’ve ever worked with textual data before.

During Phase I, I concentrated on the data involved in task management. It’s not hard to implement an SQL schema capable of representing that. Even better, most dialects offer useful features for handling times and dates. Virtually every programming language has a way of interfacing with such an SQL database, either through native bindings or by calling out to scriptable client programs. SQLite, MySQL, and PostgreSQL in fact provide both means; I’m not familiar with MSSQL. So that’s a big set of pluses all the way around. We even get a reusable DSL without having to write it!
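As a rough illustration of “not hard”, here is one possible SQLite schema for a subset of the fields from the Phase I outline, driven from Python’s bundled sqlite3 module. The table and column names are my own assumptions, as is storing times as ISO 8601 text, which SQLite’s built-in date and time functions understand; the query lets the database do the date arithmetic.

```python
import sqlite3

# Hypothetical schema covering part of the Phase I outline.
SCHEMA = """
CREATE TABLE tasks (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,      -- task short name
    notes         BLOB,               -- raw data: text, doc, photo...
    location      TEXT,
    due           TEXT,               -- ISO 8601, e.g. '2011-05-02 17:00'
    estimate_min  INTEGER,            -- time estimate in minutes
    priority      INTEGER DEFAULT 0,
    list_name     TEXT,
    project       TEXT
);
CREATE TABLE task_urls (              -- a task may have several URLs
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    url     TEXT NOT NULL
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute(
    "INSERT INTO tasks (name, due, estimate_min) VALUES (?, ?, ?)",
    ("Pay rent", "2011-05-02 17:00", 30),
)

# When must work start to finish on time? SQLite's datetime() does the maths.
row = db.execute(
    "SELECT name, datetime(due, '-' || estimate_min || ' minutes') FROM tasks"
).fetchone()
print(row)  # ('Pay rent', '2011-05-02 16:30:00')
```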

The problem, however, becomes one of migration paths: what happens if you need to change the data structures, perhaps heavily? That means a lot more work whenever restructuring is needed, and it is, IMHO, less scriptable than a little Perl golf; sufficiently so that I’m not going to screw with it. Insert shameless plug for Ruby on Rails here ;).
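To put the migration worry in concrete terms, one common pattern with SQLite is to keep a schema version number in PRAGMA user_version and apply numbered migration steps in order. This is a sketch under my own assumptions (the steps and column names are hypothetical). SQLite’s ALTER TABLE supports only a limited set of changes, adding a column being the easy one, so heavier restructuring means copying data into a new table: exactly the extra work described above.

```python
import sqlite3

# Hypothetical migration steps: entry i upgrades the schema from version i to i+1.
MIGRATIONS = [
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE tasks ADD COLUMN progress INTEGER DEFAULT 0",
]


def migrate(db: sqlite3.Connection) -> int:
    """Apply any pending migrations and return the resulting schema version."""
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for step, sql in enumerate(MIGRATIONS[version:], start=version):
        db.execute(sql)
        # PRAGMA values can't be bound with ?, so the integer is formatted in.
        db.execute(f"PRAGMA user_version = {step + 1}")
        db.commit()
    return db.execute("PRAGMA user_version").fetchone()[0]


db = sqlite3.connect(":memory:")
print(migrate(db))  # 2
print(migrate(db))  # 2 (nothing left to apply)
```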

In a commercial environment, i.e. one oriented on making money off the program, XML would be more likely than any other textual format, but it’s not very convenient for me. I also hate XML parsing with a passion. It is, however, sufficient for getting the job done, even if it does, ahem, jack the amount of internal documentation you need to write (or later wish you had) several notches higher than it needs to be.

Someone might think of a simple Comma Separated Values (CSV) format, but CSV is anything but simple. Don’t believe me? Just think about data that may contain commas. That being said, the only good thing I can say about CSV, from a programming perspective, is that CPAN rocks. Unless you’re munging address books or spreadsheet data around and need a lowest common denominator, it is best to avoid CSV, period.
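A quick demonstration of why CSV isn’t simple: a naive split on commas breaks as soon as a field contains one, so a real CSV library has to quote that field. Python’s csv module stands in here for the CPAN modules only because it’s handy; the task data is made up.

```python
import csv
import io

row = ["Call landlord", "Re: rent, deposit, and keys", "2011-05-02"]

# Naive approach: join and split on commas. The note field gets torn apart.
naive = ",".join(row)
print(naive.split(","))   # 5 fields instead of 3

# The csv module quotes the awkward field and round-trips it correctly.
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()
print(line, end="")       # Call landlord,"Re: rent, deposit, and keys",2011-05-02
print(next(csv.reader(io.StringIO(line))) == row)   # True
```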

The best bet in terms of structured text is a format sufficiently able to represent the data set while still being easy to edit by hand. What is really needed is a dedicated format: enter YAML. It’s basically a hierarchical way of recording data as sequences of elements and key/value mappings. It works excellently.
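To show how little ceremony the format needs, here’s a deliberately tiny Python sketch that writes and reads a single task as a YAML mapping whose values are plain scalars or lists of scalars. It is not a YAML parser: it ignores quoting, comments, deeper nesting, and typing (everything comes back as a string), and the field names are hypothetical. For real work you’d use a proper library, such as PyYAML’s yaml.safe_load().

```python
def dump_task(task: dict) -> str:
    """Emit a flat mapping as block-style YAML (scalars and lists of scalars)."""
    lines = []
    for key, value in task.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def load_task(text: str) -> dict:
    """Read back what dump_task() writes; values come back as strings."""
    task, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("  - "):
            if current is None:
                raise ValueError(f"list item outside a list: {line!r}")
            task[current].append(line[4:])
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            task[key], current = value, None
        else:
            task[key], current = [], key   # a bare "key:" opens a list
    return task


task = {"name": "Pay rent", "due": "2011-05-02 17:00",
        "urls": ["http://example.com/landlord"]}
text = dump_task(task)
print(text, end="")
assert load_task(text) == task
```

The dumped text reads exactly like a hand-written YAML file: one "key: value" per line, with the URL list as indented "- " items under "urls:".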

The SQL solution relinquishes fine control over the operations to the database engine, whereas the YAML method is assured to slurp up memory in proportion to the input. It’s a lot more like DOM-oriented XML, only the translation between the code and the textual representation is a hell of a lot more natural. When working with program-generated output, it also doesn’t need to be fed through a pretty printer to be comprehensible, which can’t be said of XML without more pain for someone.

Pro YAML:

  • Easily edited by hand (notepad) and many unix tools.
  • So simple you can skip reading the spec (0).
  • If you have to write your own parser, make it YAML and save grey hairs.
  • It’s easy to serialize/marshal data around, as easy as it gets without eval().
  • More likely to benefit from compression.
Pro SQL:
  • Less imperative-style code to be written.
  • The hardest processing code is already in the database engine.
  • Can focus on querying data, not parsing it.
  • Languages/frameworks are more likely to ship SQLite bindings than a YAML parser.
Con’ YAML: 
  • It really is as simple as it looks.
  • You have to write your own list/dictionary handling code.
  • Scales less.
Con’ SQL:
  • You have to learn basic SQL.
  • Not the most fun in some languages (C, C++, Java, and C#).
  • Can’t really get at the data, short of a database client.
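The “less imperative-style code” point is easiest to see side by side. Below, the same question (which unfinished tasks on the home list are due before Friday, most urgent first?) is answered once by hand over a list of dicts, as you would after loading a YAML file, and once as a single SQL query. Python, the field names, and the sample data are all assumptions for illustration.

```python
import sqlite3

tasks = [
    {"name": "Pay rent",  "list": "home", "due": "2011-05-02", "priority": 1, "done": 0},
    {"name": "Mow lawn",  "list": "home", "due": "2011-05-04", "priority": 3, "done": 0},
    {"name": "Fix build", "list": "work", "due": "2011-05-03", "priority": 2, "done": 0},
    {"name": "Buy milk",  "list": "home", "due": "2011-05-01", "priority": 2, "done": 1},
]
CUTOFF = "2011-05-06"  # ISO dates compare correctly as plain strings

# YAML-style: you write the filtering and sorting yourself.
wanted = (t for t in tasks
          if t["list"] == "home" and not t["done"] and t["due"] < CUTOFF)
by_hand = [t["name"] for t in sorted(wanted, key=lambda t: t["priority"])]

# SQL-style: describe the result and let the engine work it out.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (name, list, due, priority, done)")
db.executemany(
    "INSERT INTO tasks VALUES (:name, :list, :due, :priority, :done)", tasks)
by_sql = [name for (name,) in db.execute(
    "SELECT name FROM tasks WHERE list = 'home' AND NOT done AND due < ? "
    "ORDER BY priority", (CUTOFF,))]

print(by_hand)  # ['Pay rent', 'Mow lawn']
print(by_sql)   # ['Pay rent', 'Mow lawn']
```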
Note that I haven’t said anything about separating the data store from the client application: using an SQL server is just as viable as storing YAML files on a network drive. It really is that simple.
My personal view? SQL’s virtues likely outweigh YAML’s here, unless you’re going to be designing by exploration. I’m not in this case, and I am also competent enough not to shoot myself in the foot. If I were smart, I would make the application-wide interface to the data store more abstract than writing SQL queries all over the place like an asshole. Yes, I can be that smart. Don’t tell your neighbours.
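A minimal sketch of that “more abstract” interface, in Python for illustration: the rest of the application talks to a small store object, and only that object knows SQL exists. The class and method names (Task, TaskStore, add, due_before) are hypothetical; a YAML-backed class offering the same methods could be swapped in without touching the callers.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    due: Optional[str] = None    # ISO 8601 text
    id: Optional[int] = None


class TaskStore:
    """The only place in the (hypothetical) application that writes SQL."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tasks "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, due TEXT)"
        )

    def add(self, task: Task) -> Task:
        with self.db:  # commits the INSERT on success, rolls back on error
            cur = self.db.execute(
                "INSERT INTO tasks (name, due) VALUES (?, ?)", (task.name, task.due)
            )
        task.id = cur.lastrowid
        return task

    def due_before(self, when: str) -> List[Task]:
        rows = self.db.execute(
            "SELECT id, name, due FROM tasks WHERE due < ? ORDER BY due", (when,)
        )
        return [Task(name=name, due=due, id=id_) for id_, name, due in rows]


store = TaskStore()
store.add(Task("Pay rent", "2011-05-02"))
store.add(Task("Mow lawn", "2011-05-09"))
print([t.name for t in store.due_before("2011-05-06")])  # ['Pay rent']
```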
0: I read the YAML specification the first time I used it for a project, which was for a rake-based build system. How else could I expect to hand-write my build specs in YAML? :-).

The STAMAN Project: Phase II, version control

First things first: I created the project on my hosting site of choice, then prepped the local repository:


terry@dixie$ cd ~/proj;git init STAMAN; cd STAMAN; touch README
Initialized empty Git repository in /home/terry/proj/STAMAN/.git/
terry@dixie$ ls
README
terry@dixie$ git add README
terry@dixie$ git commit -m 'first commit'
[master (root-commit) f012d8e] first commit
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README
terry@dixie$ git remote add origin path spec to the repo
terry@dixie$ git push origin master
Counting objects: 3, done.
Writing objects: 100% (3/3), 222 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To path spec to the repo
 * [new branch]      master -> master
terry@dixie$
In essence, create a new git repo in ~/proj/STAMAN, add a blank readme file and push it to the mirror. Simple.
If you’re not using version control, you’re brain damaged. It’s that simple. 
I recommend Git and Mercurial, because I consider CVS, Subversion, and increasingly Bazaar as well, to be flawed models of doing version control. Git is what I use the most, so obviously I’m using it here :-P.
Much too tired to go into the sanity that is using version control, and using it right. Learn how to Google.

The STAMAN Project: Phase I, brainstorming an outline

The first thing I did was sit down and spend about 5-10 minutes brainstorming; here’s what I came up with on Monday:

terry@dixie$ cat ~/Dropbox/todo-structure.outline
todo's contain
        task short name
        task notes (raw data)
        location
        assoicated url's
        due date
        time estimate
        associated contacts?
        reminder preferences
        list
        project (tag)
        priority

Shebang! YAML would work WELL
terry@dixie$

I like to dive in when designing a program, try to get a good big-picture understanding of it, and try to identify the lower-level issues that might crop up. The latter gets easier with experience, particularly experience with your tools rather than the art/science. After writing that file, I took a few more minutes to focus on the implications of its contents.
The purpose of a task management program is obviously to manage tasks. It’s a fairly data-centric program, so a good place to start is by thinking about what the data is. In this case, I took a couple of minutes to think about what represents a task: what data it reflects. The short name is provided as a convenience for listing tasks.
We can’t know for sure what sort of tasks will have to be managed, so the data attached to them should be kept abstract: it could be anything from a simple cat > notefile style stream of text to an uploaded doc or photo. The important thing is not to shoot off a foot by making it restrictive. Since *I* am the principal user, I know the content will be quite variable. Excessively so, the more I utilise it.
Tagging a task with data like a location, associated URLs and contact info would likely be a good thing. You can easily imagine that going somewhere, talking to someone, or referencing a file off the web are all things that might go hand in hand with reviewing and completing a task.
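One way to honour “keep the attached data abstract” is to treat notes as opaque bytes plus a media-type hint, so a typed note and a photo go through the same path. The sketch below (Python, for illustration) is built on my own assumptions: the Attachment and Task classes and their fields are hypothetical, and the type guess uses the standard mimetypes module, whose answer can vary slightly between platforms.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attachment:
    """Opaque task data: whatever bytes the user supplied, plus a type hint."""
    data: bytes
    media_type: str = "application/octet-stream"

    @classmethod
    def from_file(cls, filename: str, data: bytes) -> "Attachment":
        guessed, _encoding = mimetypes.guess_type(filename)
        return cls(data, guessed or "application/octet-stream")


@dataclass
class Task:
    name: str                                   # short name, for listings
    notes: List[Attachment] = field(default_factory=list)
    location: Optional[str] = None
    urls: List[str] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)


task = Task("Fix the fence", location="back yard")
task.notes.append(Attachment(b"two posts are rotten\n", "text/plain"))
task.notes.append(Attachment.from_file("fence.jpg", b"\xff\xd8\xff"))
print([n.media_type for n in task.notes])
```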
Another frequent issue is keeping track of when xyz needs to get done, how often it needs to be done, how long it’s expected to take, and being able to set per-task preferences for the “Nag me about it” problem. Come to think of it, a way to note a task’s progress is a good idea too. Changes like these are one of the main reasons I want something custom rather than continuing with my beloved RTM: more control over the tasking details.
Keeping a flexible outline of the project helps you identify spots where it can grow and/or change ‘in flight’, just like that realisation about progress tracking. Of course, that assumes you will have time to think about the project, not just write its code like a drone.
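The scheduling side of that wish list boils down to a little date arithmetic. A Python sketch, assuming a task carries a due time, an optional repeat interval, per-task reminder offsets (“nag me a day and an hour before”), and a progress percentage; none of these names come from the outline, they’re made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Schedule:
    due: datetime
    repeat: Optional[timedelta] = None                   # e.g. every 7 days
    nag_before: List[timedelta] = field(default_factory=list)
    progress: int = 0                                    # percent complete

    def reminders(self) -> List[datetime]:
        """When to nag about the current occurrence, earliest first."""
        return sorted(self.due - offset for offset in self.nag_before)

    def complete(self) -> bool:
        """Mark done; roll a repeating task forward to its next occurrence.

        Returns True if the task is finished for good.
        """
        if self.repeat is None:
            self.progress = 100
            return True
        self.due += self.repeat
        self.progress = 0
        return False


chore = Schedule(datetime(2011, 5, 2, 17, 0), repeat=timedelta(days=7),
                 nag_before=[timedelta(hours=1), timedelta(days=1)])
print(chore.reminders())   # May 1 at 17:00, then May 2 at 16:00
chore.complete()
print(chore.due)           # 2011-05-09 17:00:00
```

A fixed timedelta can’t express “the same day every month”; that is where calendar-aware code would have to come in.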

Next up to the plate is the issue of organising tasks. I’ve got so much shit piled into RTM that I have to periodically triage my task lists, almost like sorting them into a Trove. Notions of lists, priorities, and “Projects” are useful for more easily creating ad-hoc hierarchical lists based on such criteria. This is somewhat analogous to what’s possible using the SQL SELECT and JOIN statements. Database normalisation can actually be a good thing to learn about.
SQL is not a general-purpose programming language; rather, it targets the narrower domain of querying and manipulating rows and tables in a database. Although it’s less needed outside web applications, knowing SQL is worth it, much like the concept of relational algebra in general. I mention Structured Query Language here because it’s a useful train of thought to explore. Take some time and ponder the possible formats, and what the code to manipulate them might look like.
A serious portion of programming is about solving problems; that’s what we use our languages for. If changing the rules makes solving the problem easier, that’s what we do. Tidbits like declarative languages are valuable tools to know about, if you remember to program more like Captain Kirk than a dry textbook. Don’t bend yourself to the language… bend the language to your problem, or find a tool or architecture that can help fill the gap.
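To make the SELECT/JOIN analogy concrete, here’s a sketch of a normalised layout where projects live in their own table and tags hang off tasks through a many-to-many link table; an ad-hoc “list” is then just a query. The tables, names, and rows are hypothetical, and SQLite via Python’s sqlite3 is used purely for convenience.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE projects  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE tasks     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        priority INTEGER NOT NULL,
                        project_id INTEGER REFERENCES projects(id));
CREATE TABLE tags      (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE task_tags (task_id INTEGER REFERENCES tasks(id),
                        tag_id  INTEGER REFERENCES tags(id),
                        PRIMARY KEY (task_id, tag_id));

INSERT INTO projects VALUES (1, 'STAMAN'), (2, 'House');
INSERT INTO tasks VALUES (1, 'Pick a language', 1, 1),
                         (2, 'Write schema',    2, 1),
                         (3, 'Fix the fence',   3, 2);
INSERT INTO tags VALUES (1, 'computer'), (2, 'errand');
INSERT INTO task_tags VALUES (1, 1), (2, 1), (3, 2);
""")

# An ad-hoc list: every task with a given tag, grouped by project,
# most urgent first.
QUERY = """
    SELECT p.name, t.name
      FROM tasks t
      JOIN projects  p  ON p.id = t.project_id
      JOIN task_tags tt ON tt.task_id = t.id
      JOIN tags      g  ON g.id = tt.tag_id
     WHERE g.name = ?
  ORDER BY p.name, t.priority
"""
rows = db.execute(QUERY, ("computer",)).fetchall()
print(rows)  # [('STAMAN', 'Pick a language'), ('STAMAN', 'Write schema')]
```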

Data storage formats are potentially a lengthy issue, so I’ll go into that later.

With how often people have solicited my advice/opinions on programming matters and CS ed, I’ve been thinking about exposing the craft behind a project and posting it as a journal entry. Well, the way my mother monkey-wrenches things, I rarely have the time, brain, and inclination all at once to focus on detailing that much.

So instead, it’s probably best that I either decide never to go into the details of creating a program, or just stream it through more haphazardly across my journal. I’ll take a crack at the former, since I would like to work on the program.

One of the things on my “To roll own someday” list is replacing my Remember The Milk setup with a local solution. The perks being that I can make custom changes without having to get hired by RTM o/, as well as integrate it with my workflow more naturally. It’s also a good series of mental exercises.

Since I’m not good at naming things, I’ll just call it STAMAN: Spidey01’s TAsk MANager. Which, uniquely, isn’t far off from Stamina, exactly the rate-limiting factor for getting shit done. Especially under my living conditions.