Something I wish I had time to do

Write a conversational IRC bot, lol.

I figure: load two processes, a communications layer and an automation layer. In this case the communications layer would be a simple relay between an IRC channel and the automaton’s standard I/O streams: messages on the channel would get written to the automaton’s standard input, and messages to send to the channel would be read from the automaton’s standard output.
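A minimal sketch of that relay idea, in Python. The IRC side is left out entirely: `channel_messages` is just a list standing in for lines arriving from the channel, and the automaton is a throwaway child process that echoes what it hears. The names `STAND_IN_BOT`, `exchange()`, and `chat()` are made up for illustration, and expecting exactly one reply line per message is a simplifying assumption that a real bot would not get away with.

```python
import subprocess
import sys

# Hypothetical stand-in for the automation layer: it answers every line it
# reads on standard input with one line on standard output.
STAND_IN_BOT = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('bot heard: ' + line.strip(), flush=True)\n"
)


def exchange(bot, message):
    """Write one channel message to the bot's stdin and read one reply line."""
    bot.stdin.write(message + "\n")
    bot.stdin.flush()
    return bot.stdout.readline().rstrip("\n")


def chat(channel_messages):
    """Relay each message to a freshly started automaton; collect the replies."""
    bot = subprocess.Popen(
        [sys.executable, "-c", STAND_IN_BOT],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        return [exchange(bot, message) for message in channel_messages]
    finally:
        bot.stdin.close()  # end of input lets the stand-in bot exit
        bot.wait()
        bot.stdout.close()


if __name__ == "__main__":
    for reply in chat(["hello", "anyone there?"]):
        print(reply)
```

Because the automaton only ever sees standard input and output, it could be swapped for any other program without the communications layer noticing.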

I would want the thing to have some concept of learning, maybe build a dictionary of language; perhaps start off with a limited knowledge of words, storing words it doesn’t know into a database for later analysis. Then, once a number of words have been manually entered into it with attached metadata, program it to perform the analysis itself: trying to figure out what kind of words it sees but doesn’t know about yet, and then writing out plugin code with the bot’s “best guess”. If it gets it wrong, I could manually change it, and it would have to study the differences between my correction and its choice, and modify its guessing based on experience. (Implementing that would be fun in its own right, lol.)

That would logically be easy enough to design and study; the question is how to make it educated enough that people can actually chatter with it.

Haha, I always laugh when people mistake bots for humans on IRC xD

What I hate about programming

Some months ago, when it reached Kris Moore’s attention (late as usual) that I had brought up security issues with his Firefox3 PBI, he changed it to something almost as bad. A couple of weeks ago, I heard back from Kris that he had [naively] changed the code for making Fx3 the user’s default browser so that it would no longer run as “root”. After a little more conversation he split it off into something better.

Originally it was a part of the script that runs during PBI installation (and worse than the script below). Probably tired of my replies, he made an extra wrapper around Firefox3 that asks the user whether they want Firefox3 set as the default or not, rather than workin’ the user database at install time. (I refuse to comment on the following script’s predecessors: if you want to know more, read his SVN.) The solution he came up with for that wrapper was to invoke the following code as the user when necessary:

#!/bin/sh
# Helper script to make FF the default browser for a user
##############################################################################

# Get the users homedir
USER="`whoami`"
HOMEDIR="`cat /etc/passwd | grep ^${USER}: | cut -d ":" -f 6`"

if [ -e "${HOMEDIR}/.kde4" ]
then
  KDEDIR=".kde4"
else
  KDEDIR=".kde"
fi

if [ ! -e "${HOMEDIR}/${KDEDIR}/share/config/kdeglobals" ]
then
  echo "ERROR: No kdeglobals file for $USER"
  exit 1
fi


TMPKFILE="${HOMEDIR}/.kdeglobals.$$"
TMPKFILE2="${HOMEDIR}/.kdeglobals2.$$"
rm ${TMPKFILE} >/dev/null 2>/dev/null

cat ${HOMEDIR}/${KDEDIR}/share/config/kdeglobals | grep -v '^BrowserApplication' > ${TMPKFILE}

rm ${TMPKFILE2} >/dev/null 2>/dev/null
touch ${TMPKFILE2}
while read line
do
  if [ "$line" = "[General]" ]
  then
    echo "$line" >> ${TMPKFILE2}
    if [ "${KDEDIR}" = ".kde4" ]
    then
      echo "BrowserApplication[$e]=!/Programs/bin/firefox3" >> ${TMPKFILE2}
    else
      echo "BrowserApplication=!/Programs/bin/firefox3" >> ${TMPKFILE2}
    fi
  else
    echo "$line" >> ${TMPKFILE2}
  fi
done < ${TMPKFILE}

# all finished, now move it back over kdeglobals
rm ${TMPKFILE}
mv ${TMPKFILE2} ${HOME}/${KDEDIR}/share/config/kdeglobals

exit 0

which is more secure than the original implementation, and more efficient too. Tonight I sent Kris a casual (read: adapt to need, don’t take as is) suggestion from yours truly:

#!/bin/sh
# Helper script to make FF the default browser for a user
# Should work for KDE3 and KDE4.
##############################################################################

PROG="!/Programs/bin/firefox3"
FILE="./share/config/kdeglobals"

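for D in "${HOME}/.kde" "${HOME}/.kde4"
do
  # skip a missing directory rather than abandoning the whole loop
  cd "$D" 2>/dev/null || continue

  if [ ! -e "$FILE" ]
  then
    echo "ERROR: No kdeglobals file, unable to set $PROG as default"
    exit 1
  fi

  # '|' delimits the substitution because $PROG itself contains slashes;
  # the brackets are escaped so ed matches the literal [General] header
  ed -s "$FILE" <<EOF
/\[General\]/
/BrowserApplication.*=/
s|=.*|=${PROG}|
wq
EOF
  # write your own error handlers
done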

exit 0

which should work as far as I can test, since I lack a working KDE install (compiling KDE4.2+ is on my todo list). It’s not perfect, but it sure is nicer than what he had a few months back. I included a diff of the two scripts in my last message, which may very well go against my decision to “never” send these people patches. But I really don’t care if he accepts it or not, because while I believe in being helpful, I also do not like doing people’s jobs for them.

I’m a lazy good for nothing creep, but I am lazy of muscle – not lazy of mind. The most productive code I have ever written, is the code I was smart enough /not/ to write in the first place.

merging the new code into master

After 3~4 days of coding, I’ve just merged the parserlexer branch back into master; I love coding 🙂

 commit 9857e5e9556f31543075fb4a74350dbda97a42e5
Merge: c9a8ae4... bb425eb...
Author: Terry ....snip...
Date: Wed Apr 8 07:23:19 2009 +0000

Merge branch 'parserlexer'

The new parser, lexer, and quote expansion code (+ a few bugfixes) has
finally been merged into the mainline of development (branch 'master').
This marks the new sh_eval(), tpsh_parse(), expand_quotes, tpsh_lex()
functions in such a way that they should be considered 'stable' for
general usage.

some nice things that come with this:

a sane way of quoting stuff, but not sh compatible (''' = ', not an error); more than one set of quotes on the line; and things like `cd /foo; vi bar` finally work xD. In the course of the necessary bug smashin’ for the merge, I’ve also cleaned out a few pains in the todo file that have been there since last month++ lol.

things that remain to be done: pluggable completion; make completion play nicer with perl/gnu readline backends; restore support for pipes (critical); handling of keybinding (likely painful across perl/gnu/zoid Term::ReadLine backends; but at least zoid is nice…). In the more long term: control flow, (subshells), better `handling`, {anonymous macro/grouping}, more advanced I/O redirections (i.e. only >, >>, and < currently work lol); making `fc -l` and `history list` use a format for display rather than print(); make &do_getopt able to be configured by callers (so fc can accept negative indexes, etc); give a way to turn off shell options (set - and set + currently only turn options on, lol); and who knows what else that I can't remember atm.
and to abuse Perl’s idea of objects and verbs: eval { $spidey sleep $now };

I wonder if a programmer goes to heaven, does GOD let him study HIS assembly language?

tpsh: test of expand_quotes()

$ echo 'hi bye' foo "$USER" and "~" or ~
expand_quotes ': echo | hi bye | foo "$USER" and "~" or ~
expand_quotes ": foo | $USER | and "~" or ~
expand_quotes ": and | ~ | or ~
hi bye foo $USER and ~ or /usr/home/Terry
$

# note:
# the 2 spaces /displayed/ between hi and bye are a bug in
# tpsh; echo'ing things to file via I/O redirection works
# properly. "$USER" is not expanded because expand_parameters()
# still needs adjustments.
#

tpsh_parse() invokes expand_quotes() to break up its input line based on the shell’s quoting rules, and proceeds to go about its business. tpsh_lex() then accepts the token buffer and begins building a new data structure from it. The tokens from tpsh_parse() get analyzed and reassembled “on the quotes”, i.e. it will do its check on ‘hi ‘ and ‘bye’ and the rest as separate elements, then reassemble the argument vectors as an array reference: becoming ‘hi bye’ again (id est, quote expansions add escapes to tell the lex phase where to rejoin things). After everything is said and done between parse and lex, the queue-like data structure is ready, and the argument vectors contained therein are ready to be mapped onto resolve_cmd() calls for execution.

To hunt down any other booboos in the expand_quotes() subroutine, I’ve made it display its work, so I can see how it detects what when testing the shell; basically as “expand_quotes QUOTE: unquoted | quoted | remainder”.

As one can guess from what the above shell snippet implies: quoting is handled recursively. Because I’m used to languages with finite stack space and no reliable tail call optimizations; I almost never write recursive functions of any kind, whether they are tco’able or not. Algorithmically, expand_quotes() is a very simple procedure.

It expects to be called with an input line, and treats multiple arguments accordingly (for now). Internally a dispatch table and token stack are maintained; the table contains references to anonymous subroutines, to which the scanned elements are delegated for the proper expansions.

If no quotes are detected on the line, return the result of expanding it with the default delegate (for unquoted text).

Otherwise break the line on the first set of (matching) quotes.

Any text defined before the beginning quote must be unquoted; apply the default expansion from the table.

The text between the matching quotes is quoted; apply the appropriate expansion from the table (i.e. ‘, “, or `).

Any text remaining after the matching quotes may or may not be quoted; invoke expand_quotes() on the remainder to find out, and apply the result.

Each expansion applied is pushed onto the token stack in the escaped form it expanded to (i.e. “‘hi bye'” becomes “hi bye”), and the stack is returned to the caller once processing is completed.
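The steps above can be sketched in a few lines of Python. This is not the tpsh code (that is Perl, and is not reproduced in this post), just an illustration of the same shape under some assumptions: the `DISPATCH` table, the tag names, and the tuple tokens are invented for the example, the delegates only label their text instead of really expanding it, and escaped quotes are not handled at all.

```python
QUOTE_CHARS = "'\"`"

# Hypothetical dispatch table: one delegate per quote type, plus "" for
# unquoted text. Real delegates would expand parameters, ~, commands, etc.
DISPATCH = {
    "": lambda text: ("unquoted", text),
    "'": lambda text: ("single", text),
    '"': lambda text: ("double", text),
    "`": lambda text: ("command", text),
}


def expand_quotes(line):
    """Split a line on its first matching pair of quotes, recursing on the rest."""
    start = next((i for i, ch in enumerate(line) if ch in QUOTE_CHARS), None)
    if start is None:
        # No quotes at all: the whole line goes to the default delegate.
        return [DISPATCH[""](line)] if line else []

    quote = line[start]
    end = line.find(quote, start + 1)
    if end == -1:
        raise ValueError("unclosed quote: " + quote)

    stack = []
    if start > 0:
        # Text before the opening quote must be unquoted.
        stack.append(DISPATCH[""](line[:start]))
    # Text between the matching quotes gets that quote's delegate.
    stack.append(DISPATCH[quote](line[start + 1:end]))
    # The remainder may or may not contain quotes: recurse to find out.
    stack.extend(expand_quotes(line[end + 1:]))
    return stack


if __name__ == "__main__":
    for token in expand_quotes("echo 'hi bye' foo \"$USER\""):
        print(token)
```

Each level of recursion deals with exactly one pair of quotes, which is why the debug output shown earlier prints one “expand_quotes” line per pair.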

With refactoring, the procedure could likely be made tail recursive, but I don’t think perl does TCO. Either way, the user’s fingers or (likely) the machine generating the inputs should run out of stack space before tpsh could pop a cork at the number of quotes lol. An earlier design for expand_quotes() had more in common with finite state machines (in so far as I’ve seen them implemented), but was a lot more contorted than expand_quotes()’ present shape :-/.
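For comparison, here is roughly what the same procedure looks like once the recursion on the remainder is turned into a loop, which is what a tail call optimizer would effectively do. Again this is a hedged Python sketch, not tpsh code: `expand_quotes_loop()` is a made-up name, tokens are tagged with the quote character itself rather than being expanded, and escaped quotes are ignored.

```python
QUOTE_CHARS = "'\"`"


def expand_quotes_loop(line):
    """Iterative form: consume one pair of quotes per pass instead of recursing."""
    stack = []
    while line:
        start = next((i for i, ch in enumerate(line) if ch in QUOTE_CHARS), None)
        if start is None:
            # Nothing left but unquoted text.
            stack.append(("unquoted", line))
            break
        quote = line[start]
        end = line.find(quote, start + 1)
        if end == -1:
            raise ValueError("unclosed quote: " + quote)
        if start > 0:
            stack.append(("unquoted", line[:start]))
        stack.append((quote, line[start + 1:end]))
        # What used to be the recursive call: carry on with the remainder.
        line = line[end + 1:]
    return stack


if __name__ == "__main__":
    print(expand_quotes_loop("a 'b c' \"d\" e"))
```

The loop uses constant stack space no matter how many quotes are on the line, at the cost of reading a little less like the description above.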

Current bugs are handling nested escaped quotes or multiple empty quotes (the splitter) and removing unquoted quotes (an addition to the delegate sub for unquoted text).

# bugs in expand_quotes
$ echo 'foo "bar'
expand_quotes ': echo | foo "bar |
foo "bar
$ echo "foo "bar"
expand_quotes ": echo | foo | bar"
foo bar"
#
# correct result would have been equal to the previous command
#
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo | | "" '' "" '""' '' "" '"' "'"
expand_quotes ": " | '' | " '""' '' "" '"' "'"
expand_quotes ': " | "" | '' "" '"' "'"
expand_quotes ': ' | "" | "' "'"
expand_quotes ": "' | ' |
" '' " "" ' "" "' '
#
# correct result would have been: "" " '
# at least, that's how all bourne based shells I
# know about treat it; I would prefer: "" " '
# i.e. without leading whitespace.
#

For some reason this makes me curious: has anyone ever explained why shell syntax allows """ but not '''? (the results being " and an unclosed quote or syntax error, respectively)

When trying to solve a programming problem, I generally try the simplest solution before I try something more complex, and then evaluate a neater method. I consider the implications solutions have on efficiency, but that is about trying to avoid shooting myself in the foot later, rather than trying to optimize the code for a machine.

Somehow, I think expanding quotes is just naturally recursive in my crazy brain :-D.

EDIT


commit aeac14bd177a93b84c138a0c62e2cda49e5fe15c
Author: Terry <***snip***>
Date: Tue Apr 7 22:24:35 2009 +0000

bugfix: parameters now expand within quotes via expand_quotes and may be escaped

commit 089fda7cca0049dcabdc8b9659f94dcae417074b
Author: Terry <***snip***>

bugfix: escaped quotes witihn quotes and multiple quotes handled correctly

previous behaviour:

$ echo 'foo "bar'
expand_quotes ': echo | foo "bar |
foo "bar
$ echo "foo "bar"
expand_quotes ": echo | foo | bar"
foo bar"
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo | | "" '' "" '""' '' "" '"' "'"
expand_quotes ": " | '' | " '""' '' "" '"' "'"
expand_quotes ': " | "" | '' "" '"' "'"
expand_quotes ': ' | "" | "' "'"
expand_quotes ": "' | ' |
" '' " "" ' "" "' '
$

new behaviour:

$ echo 'foo "bar'
expand_quotes ': echo | foo "bar |
foo "bar
$ echo "foo "bar"
expand_quotes ": echo | foo "bar |
foo "bar
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo | | "" '' "" '""' '' "" '"' "'"
expand_quotes ": | | '' "" '""' '' "" '"' "'"
expand_quotes ': | | "" '""' '' "" '"' "'"
expand_quotes ": | | '""' '' "" '"' "'"
expand_quotes ': | "" | '' "" '"' "'"
expand_quotes ': | | "" '"' "'"
expand_quotes ": | | '"' "'"
expand_quotes ': | " | "'"
expand_quotes ": | ' |
"" " '
$

rofl

http://thedailywtf.com/Articles/The-Super-Hacker.aspx

Oh man, this one’s got me rolling on the floor laughing my butt off – what a great way to make a buck

Two schools of thought: random thinkings from a spider.

DRAFT todo: fix footnote indexes and further proof reading / copy editing.

Two schools of thought

random thinkings from a spider.

There is certainly more than one way to solve most problems; it’s just a matter of their merits. This paper serves to compare and contrast two common methods of solving a simple problem.

In our example, let us say we have a large and complex e-mail message. We began editing it on one computer, then copied the draft to a USB stick[0] and took it home, where we made some further adjustments. Later, we return to the first machine to finish the draft, but realize we didn’t copy the file back to the USB stick! We continue to edit the draft anyway, and take it home again. We now have two different but related forms of our message. The message is quite big, and we don’t want to rewrite any of the good parts, so what do we do? We have to compare and merge the two different messages into a single final draft.

How might we solve this problem?

Most people I know[1] would open each version in a mail client (e.g. Outlook, Thunderbird) or an editor (e.g. notepad.exe, gedit), place the windows side by side, and then go about visually comparing the files, either merging one into the other or using a third window. This is slow, error prone, and rather clumsy.

Since I’m not willing to take that much time, I would apply software to fix this problem[2]. This means we would need software to compare, merge, and edit the files. How might these programs work?

Common software that comes to mind: diff, patch, diff3, kdiff3, kompare, meld, windiff, and winmerge. Most decent (programmers’) editors support the task as well (vim, emacs, jedit, etc.), along with decent Version Control Systems (VCSs) and Integrated Development Environments (IDEs). Some programs are textual, some are graphical, some are an amalgamation of parts, while others are heavy lifters[3] in their own right.

We’ll take a look at how two different styles might result in software to complete these simple tasks:

  1. Compare two or more files.[4]
  2. Merge two files into a third.
  3. Allow the editing of changes to be done.

I’ll call them styles A and B.

Let’s start off easy: how can we compare the files? It’s rather easy for a program to do a byte by byte or character by character[5] comparison of a file’s contents, but it would be nice to be able to see what actually changed, in some format we can understand.

In Style A, we write a simple program that can pretty print the differences to a text file (or better yet, an output stream[6]). If we want a more interactive interface, we can view the output file in an editor or pager, or write such a program of our own: one that understands our pretty printed format and can display it to the user. So let’s say we now have a simple ‘compare’ program that outputs text, and a simple ‘view’ program that accepts the result of the compare and allows the user to browse it on their display.
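As a rough illustration of what a style A ‘compare’ might boil down to, here is a Python sketch that leans on the standard library’s difflib to do the real work and simply writes unified diff text to standard output. The function name `compare()`, the default file labels, and the sample drafts are made up for the example; the essay itself does not prescribe any particular output format.

```python
import difflib
import sys


def compare(old_text, new_text, old_name="work-draft", new_name="home-draft"):
    """Return the differences between two drafts as plain unified-diff text."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )


if __name__ == "__main__":
    work = "Hi Bob,\nThe report is late.\nRegards\n"
    home = "Hi Bob,\nThe report is nearly done.\nRegards\n"
    # Plain text on stdout: pipe it into a pager, an editor, or another tool.
    sys.stdout.write(compare(work, home))
```

Since the result is only a stream of text, the ‘view’ half can be anything that reads text, from a pager to a purpose-built browser.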

In Style B, we write a program ‘compview’ that generates a format suitable for its own internal use; perhaps a list of change-nodes with data on where and how to display each one. Then we set out and write a viewer to show this on the user’s display, in essence creating a pager-like program, whether it is textual or graphical in nature. For ease of viewing the differences over time, someday we may add an “Export to file” feature that dumps the data to a suitable format.

Now let’s take a step back, and look at what we can do with these kinds of solution. Since style A developed ‘compare’ to output a very simple textual stream, we can view the file in any program that we like, without having to use the supplied ‘view’ program. We might even develop a program (or change compare) to [re]generate the output in HTML, so that we can view the comparison in a web browser instead.

In short, the design choice makes the view program almost superfluous, not to mention that it can be kept quite simple: something for people without better tools[7], rather than the whole kit and caboodle. The ‘compview’ program built in style B will likely have a close relationship between the file comparison and viewing operations, perhaps to the point of excluding the ability to do the view in an external program without exporting to a suitable format[8], which may or may not be easy to use with other tools. In style B, even if the internal format was XML or HTML, compview would still likely contain half a web browser[9].
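The HTML idea mentioned above can be sketched as a small filter that sits after ‘compare’. This Python example assumes the comparison arrives as unified diff text, which is my assumption rather than anything the essay fixes; the CSS class names and the `classify()` helper are likewise invented for illustration.

```python
import html


def classify(line):
    """Pick a CSS class for one line of unified-diff text."""
    # Caveat: a removed line whose own text starts with "--" also looks like
    # a header here; a sturdier filter would track its position in the diff.
    if line.startswith(("---", "+++", "@@")):
        return "header"
    if line.startswith("+"):
        return "added"
    if line.startswith("-"):
        return "removed"
    return "context"


def compare2html(diff_text):
    """Wrap each diff line in a span so a web browser can act as the viewer."""
    rows = [
        '<span class="{}">{}</span>'.format(classify(line), html.escape(line))
        for line in diff_text.splitlines()
    ]
    return "<pre>\n" + "\n".join(rows) + "\n</pre>"


if __name__ == "__main__":
    print(compare2html("--- old\n+++ new\n-Hi <Bob>\n+Hi Robert\n"))
```

The escaping lives in the filter, so neither ‘compare’ nor the browser has to know about the other.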

Now that our software can compare two files in a way we like, let’s move on to the process of merging the two files.

Style A might create a program, ‘merge’, that understands the output of compare, or a filter that can convert the output into instructions for another program (‘edit’) to complete the changes itself. Some operating systems (e.g. UNIX and DOS) provide suitable editors for this task; some people might be inclined to implement their own batch or stream editor (I would suggest installing ‘ex’ and ‘sed’, or making it a *really* expressive line editor). The upside of the latter approach is that the compare format and the file merging can be more readily separated; the user could even find other ‘edit’ programs or intersperse the chain of commands with other compatible tools.
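To make the “instructions for another program” idea concrete, here is a hedged Python sketch in which one function turns a comparison into a list of editor instructions and a second, completely separate function applies them. The names `merge_instructions()` and `edit()`, and the `(start, end, replacement)` instruction format, are assumptions for the example; difflib’s SequenceMatcher stands in for whatever ‘compare’ really does.

```python
import difflib


def merge_instructions(old_lines, new_lines):
    """Turn a comparison into (start, end, replacement) editor instructions.

    Instructions are listed last change first, so applying one never shifts
    the line numbers that the remaining instructions refer to.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    script = []
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag != "equal":
            script.append((i1, i2, new_lines[j1:j2]))
    return script


def edit(old_lines, script):
    """A tiny batch 'editor': apply each instruction to a copy of the lines."""
    result = list(old_lines)
    for start, end, replacement in script:
        result[start:end] = replacement
    return result


if __name__ == "__main__":
    old = ["Hi Bob,\n", "The report is late.\n", "Regards\n"]
    new = ["Hi Bob,\n", "The report is nearly done.\n", "Regards\n", "Terry\n"]
    script = merge_instructions(old, new)
    print(script)
    print(edit(old, script) == new)
```

The editor half knows nothing about how the instructions were produced, so either half could be replaced on its own.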

Style B would likely take it upon itself to conduct the merge operation directly. One downside of this would be the means by which we save or ‘Export’ the comparisons from compview. If a format based on compview’s internal data structures was used, different versions of compview might not even be able to understand it; oh joy, now we have to remember which versions of our compare & merge app knew what! But all in all, the program will probably develop some good interactive comparison and merging features, if the programmer doesn’t go postal first.

What about editing the file or adjusting the file comparisons to create a more complex merge? If style A provided an ‘edit’ back end, we don’t even need another text editor to work with compare’s output, but we could use our own[11] and feed the result back into the tool chain. Style B might provide support for an external editor or build one into compview’s interface. Since an external editor would mean ‘compview’ would have to translate its data to text for the editor, then back into the internal data, it would be a major pain. Building in an editor might be a fairly simple task or a problem; most GUI toolkits provide an editable text area, some TUI toolkits may not. Depending on the libraries being used, the programmer might have to hash out an HTML WYSIWYW[12] text editor component built into compview[13], assuming compview isn’t half a web browser in its own right yet.

Bonus: change of interface

Let’s say we want to convert from a graphical to a textual or from a textual to a graphical interface.

The compare and related utilities developed in style A could have been written in a largely display-agnostic way; what does a file comparator need to know about user interfaces? In this case, only the ‘view’ program would need replacing, with a tool that supports the other interface style. Another benefit: because of the separation between tools, even the changed interface may be changed yet again in some strange new way[21]. The possibilities are virtually endless, and shared libraries may be utilized to ease related tasks.

compview, on the other hand, developed with style B, is likely chained to its interface. If the code monkey was smart, as much of the code base as possible would be abstracted[20] away from the interface, and kept simple enough to be used as a library. If the library didn’t exist previously, or was not user interface agnostic, it will be much more labour intensive to make it so (or create it) after the fact than to have done it in the first place[22]. The ever increasing bloat and complexity of compview may be its untimely downfall, because it must either adapt to the changing world or be replaced. If it cannot do so easily, it will either fall by the wayside or restrict its users through its own (lame) limitations.

Discourse on the Results

Either of these programs, the compare, view, merge, and edit suite or the compview jack of trades would be suitable for completing the task at hand, but what comes of these two courses or styles of solving it?

Style A may have a tendency to fragment things or, depending on the programmer’s mind, fall into more of an ‘odds and ends’ grouping than a useful tool set. A great advantage is that, because each element is a separate program, they have very minimal knowledge of one another’s internal workings. Because all communication is accomplished through simple streams of input and output, you can even replace parts of the suite with other tools[14]. Small or big changes could be done with a change of program, and have very little impact on how the operation is performed by the tool set overall. The only major issues are the comparison format and editor instructions[15]. Properly documenting the protocols and relationships between the programs would make things all the easier, both for maintainers and for other monkeys who may need to replace or tweak parts of it. Adding features can be done fairly easily, but making big changes may break compatibility with the last decade’s version.

Style B takes a Swiss army knife point of view: whatever needs to be accomplished should be done by ‘compview’ whenever possible. Depending on the competency of the programmer[16], compview can fall into many subtle traps[17]. In all probability, compview will either be troublesome to interoperate with other such tools, or require irksome filters, or worse, build the filters into compview in a way that may or may not be user serviceable. How closely entwined the various elements are would likely depend on the attentiveness and skill of the programmer; some people create maintainable tools, others create balls of horse dung that they may end up hating years later. Adding new features may also require massive restructuring of the program, depending on how it was implemented.

In my opinion, compview would likely be capable of becoming more efficient at what it does than the compare/view/merge/edit suite, but is more likely to become inefficient and more difficult to maintain in the long run, because it is more difficult to engineer such a complex program correctly. In a way, you could say it just has too many moving parts… Why cram the engine, transmission, and power steering into one huge moving part, when you can have 3 smaller moving parts?[18] One interesting side effect: the software created by style A may be fairly easy to script, but the program from style B would have to embed its own scripting language[23].

I generally opt for pieces that work together on a simple protocol, because it helps keep me from shooting off my own foot later[19].

Footnotes

0. Rather than using a USB stick, I would actually use webmail or some other network solution, and avoid this kind of problem altogether.

1. Most people that I know would first have to figure out how to get the e-mail message shuffled between computers, let alone view it side by side.

2. I would probably use Vi IMproved’s ‘diff’ mode to interactively compare, edit, and merge the files.

3. I believe the modern GNU diffutils and friends have grown horns of their own.

4. We might want to compare our final draft against the two old drafts!

5. These may or may not be the same thing, depending on who, what, where, and when.

6. I.e. allow feeding the program’s output into another program, without the need for shared memory or (insecure) temporary files.

7. I like less, but using vi as a readonly ‘view’ is sometimes fun.

8. XML, CSV, Binary dump of internal data; etc.

9. And just like ‘compare’ would without a ‘compare2html’ filter, bloat out with having to escape various character data into the format, or risk breaking interoperability with other programs (e.g. Internet Explorer, if compatibility with it is possible in the first place).

10. In point of fact, because of the flexibility that pipes and redirection offered the UNIX system, it was possible to use the early ‘diff’ program and ‘ed’ editor to carry out this kind of solution. To deal with the early system’s simplicity, as more useful ‘diff’ output formats became the norm, Larry Wall’s ‘patch’ program was created to heuristically apply the changes to the file set more effectively than was previously possible, replacing the ed program and simple ‘ed diffs’ once and for all (actually, patch could feed ed diffs into ed if that format was used). Since non-scriptable screen editors had become more common by then, I can’t help but wonder what shape Larry Wall’s patch might have taken if a more expressive program than ‘ed’ had been available.

11. I use Vi IMproved (vim); Emacs, KATE, jEdit, and TextMate are also good choices. I have little love for tools like Notepad, Edit, or feature-packed clones.

12. I say What You See Is What You Want because What You See Is Not Always What You Get.

13. A particularly poor programmer, or poor engineer, might make this editor component very tightly integrated with compview, rather than something that could be reused on other projects and plugged into our current one. Given the nature of compview, I think the former is a more likely psychological trap than the latter.

14. Much like patch has superseded ed for batch processing of diffs.

15. A smarter programmer will make it easy to retool ‘merge’ to feed instructions into a new editor; a brilliant programmer might anticipate the need ahead of time, and choose to supply an instruction set telling merge how to generate said instructions for the associated ‘edit’ tool, rather than designing it for any specific editor component.

16. And the stress to get it done ‘on time’

17. Subtle traps to most, but obvious to me. I’ve had to deal with too much software that just ‘sucks’ over the years not to notice ;-}.

18. I’m scared to think about the auto-industry.

19. That, and I’ve found many more powerful and flexible tools that can be used that way than I have ever found Swiss army knives that can match such flexibility.

20. Note that I do not mean a group of abstract base classes.

21. Such as from Tcl/Tk to C/Gtk+ or Java/Swing.

22. This is one of the traps new or casual programmers seem to fall into.

23. I would suggest a language like JavaScript or Python if possible for such a task. Unless it resembles a common language or is suitably domain specific, I dislike programs that create their own scripting or extension languages just for a specific application’s plugin/automata. The last thing a user needs to do is learn YOUR app’s language, one so specific to your program that it is useless everywhere else. A customized dialect of LISP or a class library mated with a known language is much better.