Too tired to focus on code at the moment, yet much too awake to sleep o/. Been tinkering with an old project: tpsh. Mostly I’ve been polishing the codebase and doing a bit of refactoring; there’s now a debug mode, which reduces some of the cruft that’s crept in from testing. Sometime I also need to import the test scripts lol.

One of the changes is adding support for running off the Strawberry Perl distribution rather than ActiveState’s Perl distro for Windows. The downside is that most of tpsh’s biggest problems have to do with Perl’s portability, namely Windows quirks. Gotta love’em.

Since the merge of the “Code generator”, tpsh has supported a very limited subset of sh script. Not very usefully however, since the shell doesn’t have a real concept of $? yet. Likewise there need to be some changes in the handling of environment variables. Most ideal IMHO is a tied hash wrapping %ENV (Perl’s magically tied hash of environment variables) with the ability to hook reads and writes, etc. Preferably in a user-exposable way that can be extended by shell functions, rather than being limited to the script’s own Perl code.
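A minimal sketch of the tied-hash idea, assuming nothing about tpsh's real internals: the package name ShellVars, the add_hook method, and the hook names on_read/on_write are all made up for illustration. It wraps %ENV with Perl's standard tie interface so that reads and writes pass through a list of hooks.

```perl
use strict;
use warnings;

{
    # Hypothetical package; not part of tpsh.
    package ShellVars;

    sub TIEHASH {
        my ($class) = @_;
        return bless { on_read => [], on_write => [] }, $class;
    }

    # Register a hook; $kind is 'on_read' or 'on_write'.
    # Each hook receives (key, value) and returns the (possibly changed) value.
    sub add_hook {
        my ($self, $kind, $code) = @_;
        push @{ $self->{$kind} }, $code;
        return;
    }

    sub FETCH {
        my ($self, $key) = @_;
        my $value = $ENV{$key};
        $value = $_->($key, $value) for @{ $self->{on_read} };
        return $value;
    }

    sub STORE {
        my ($self, $key, $value) = @_;
        $value = $_->($key, $value) for @{ $self->{on_write} };
        $ENV{$key} = $value;
        return;
    }

    sub EXISTS   { exists $ENV{ $_[1] } }
    sub DELETE   { delete $ENV{ $_[1] } }
    sub FIRSTKEY { my $reset = keys %ENV; return scalar each %ENV }
    sub NEXTKEY  { return scalar each %ENV }
}

# tie returns the underlying object, which is where hooks get registered.
my $vars = tie my %SHELL, 'ShellVars';

my @log;
$vars->add_hook(on_write => sub { my ($k, $v) = @_; push @log, "set $k"; return $v });
$vars->add_hook(on_read  => sub { my ($k, $v) = @_; return defined $v ? $v : '' });

$SHELL{TPSH_DEMO} = 'hello';             # goes through the write hook into %ENV
print "$SHELL{TPSH_DEMO} / $ENV{TPSH_DEMO}\n";
```

Exposing something like add_hook to shell functions would be the "user-exposable" part; how that would look in tpsh is left open here.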

It’s the more general case however, of having to build up the language rather than use its building blocks. The problemo isn’t in maintainability, but in portability and development time.

Having refactored a chunk of the shell’s initialization code, I know that I’ll have to retackle the readline stuff. To say that I hate Term::ReadLine would be an understatement. I’ve no love for the GNU Readline API either (I would use linenoise), but the GRL C API isn’t as nasty in practice as the pluggable Term::ReadLine system Perl uses. The only good thing I can say is that there is a semi-useless stub version included. Licensing issues of just having a GRL backend aside >_>. In my experience, mileage varies quite a lot between Perl readline packages, even within the scope of the fscking manual. It’s just a load of crap.

Actually if there’s a way of accessing the API functions needed for unix termios / Windows console, I would be tempted to throw out the fucking thing and just write a Term::ReadLine::SANE module based on linenoise (a C library). I’m more familiar with the unix parts of that.

Oooh, I’ve just had a little brain fart of interest.

For lack of anything more intellectually interesting to do with my sleep-deprived mind, I’ve been thinking a bit about some coding I could do on tpsh; then this hit me. The shell expands ', ", and ` quotes using a simple table to map the symbols to appropriate transformations, so why not use the same code for () and {} grouping?

Internally the lookup table for quote expansions is hard coded into the principal tokenization subroutine, because that is the code’s only designated purpose in the program; it didn’t need to be any more general than being easily converted to something more generic. After getting a working implementation that I could drool over, I thought about posting a modified version on a forum, as a demonstration of how to do quote handling in config files and such.

Now I’m thinking of more places I can use that little blighter with a few minor changes lol.

Several hours of testing have shown where the problem lies with regard to the kinks in tpsh’s basic control flow…. a strict ficken implementation lol.

Without a few extra ‘;’ between language elements, the parser doesn’t see the keywords as keywords, and instead they get fed into the code generator as arguments to the preceding keyword. Testing the same test code against the version of bash that comes with MSYS 1.0, their shell doesn’t give a hoot, nor should any other Bourne-style shell I have access to. So in a way, you could say my code took a stricter interpretation of the sh script syntax than what is required (and desired).

Somehow this actually makes me feel better, lol.

Also something that needs working on, is dealing with crappy old-style paths like CP/M, DOS, and Windows NT use.

$ P:/Editors/Vim/vim-personal/gvim.exe
tpsh: command not found: P:/Editors/Vim/vim-personal/gvim.exe at S:\Visual Studio 2008\Projects\tpsh-dir\tpsh line 883.


$ cd P:

$ P:/Editors/Vim/vim-personal/gvim.exe
tpsh: command not found: P:/Editors/Vim/vim-personal/gvim.exe at S:\Visual Studio 2008\Projects\tpsh-dir\tpsh line 883.


$ /Editors/Vim/vim-personal/gvim.exe

$

It seems to have a bit of an issue about drive letters.

In more detail, the search_path() function that converts an external command name into the direct path to it (done in order to avoid a dependency on the system’s shell, e.g. /bin/sh or %COMSPEC%) fails to find a valid file when the drive-letter: notation is used. Since the function returns undef to indicate program not found, simply put, search_path() needs fixing. The change should be trivial though.
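A rough sketch of what a drive-letter-aware lookup could look like. This is not tpsh's search_path(): the name search_path_sketch, its parameters, and the injectable $is_exec check are assumptions for illustration, and so is the guess that the failure comes from treating a drive-letter name like a bare command name. The idea is simply to test such names directly instead of joining them onto each PATH entry.

```perl
use strict;
use warnings;

# Hypothetical re-sketch of a search_path()-style lookup.
# $path_dirs is an array ref of directories; $is_exec is an optional
# code ref so the logic can be exercised without real files.
sub search_path_sketch {
    my ($name, $path_dirs, $is_exec) = @_;
    $is_exec ||= sub { -f $_[0] && -x _ };

    # A drive letter ("P:/...") or any slash means the name is already
    # a path: check it as given rather than searching PATH for it.
    if ($name =~ m{^[A-Za-z]:} or $name =~ m{[/\\]}) {
        return $is_exec->($name) ? $name : undef;
    }

    # Bare command name: try each directory in order.
    for my $dir (@$path_dirs) {
        my $candidate = "$dir/$name";
        return $candidate if $is_exec->($candidate);
    }

    return undef;    # mirrors the "undef means not found" convention
}
```

Called as search_path_sketch('P:/Editors/Vim/vim-personal/gvim.exe', [ split /;/, $ENV{PATH} ]) on Windows, the drive-letter branch would decide the result without ever touching the PATH list.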

I managed to get some coding in today, maybe that’s why I’m feeling a bit better. On the down side though, my cable connection seems to be running half as fast as normal lately, games were nearly unplayable for much of this afternoon for some reason.

I think my shell’s codegen branch has lived to the virtual extent of its usefulness. Today I sorted a few minor things and enabled nested control flow for statements using the ‘then’ and ‘do’ keywords, i.e. control flow statements can contain control flow statements as well as commands within their block. There are still a few kinks to iron out but the most important are done; I’m currently uncertain if the remaining problem lies in the code generation phase or somewhere further up the processing chain, but I expect it lies in generating the Perl code.

Basic conditional flow and looping is ready, which was the primary purpose of this branch of development: to explore implementing the shell’s scripting language by dynamically generating Perl code for it, as well as replacing the static command sequence executor by generating it as well.

Oddly, the main things that need doing in the short-term are actually script related, rather than interactive. But then again, tpsh has always been meant for interactive usage first, with support for batch jobs as a secondary concern. The main thing lacking at the moment is a concept of “exit status”: in Bourne shell parlance, the $? variable isn’t used for a hill of beans yet. Likewise all $variable expansion is done on the fly during tokenization, so:

for X in 1 2 3; do echo $X; done

The value of $X in the loop body would be expanded before the code is evaluated, resulting in the following:

for X in 1 2 3; do echo; done

because X is not defined at the time the statements are being parsed, the expansion is “”.

I haven’t decided how this will be solved yet. Solving it, however, is well outside the scope of the codegen branch. I would say the only thing left to do in the branch is to implement break num and continue num, to enable a way out of the while/until loops. The former will take a bit more testing but the latter is not likely to be too hard. Perl supports a robust way of breaking out of loops that can be easily used. Once that stuff is done, I’ll likely merge the branch into master and move on to other tasks.
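For reference, the Perl feature in question is presumably loop labels with last and next. A small sketch, assuming generated loops get a label per nesting level; the loop_label helper and its clamping to the outermost loop are hypothetical, not something tpsh is known to do.

```perl
use strict;
use warnings;

# Hypothetical helper: which label "break $n" (or "continue $n") would
# target when written inside $depth nested loops labelled LOOP1..LOOPn.
sub loop_label {
    my ($depth, $n) = @_;
    my $target = $depth - $n + 1;
    $target = 1 if $target < 1;    # assumption: clamp to the outermost loop
    return "LOOP$target";
}

my @seen;
LOOP1: for my $x (1 .. 3) {
    LOOP2: for my $y (1 .. 3) {
        next LOOP1 if $x == 1 && $y == 2;    # like "continue 2"
        last LOOP1 if $x == 3;               # like "break 2"
        push @seen, "$x$y";
    }
}

print "@seen\n";                  # prints "11 21 22 23"
print loop_label(2, 2), "\n";     # prints "LOOP1"
```

A code generator would emit the label text into the generated loop, so the generated "last LOOP1" needs no runtime bookkeeping.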

Since I’ve been stuck working more often on the NT machine, development of tpsh gets a slightly higher priority lol.

In sleeping on it and having a job, that frankly leaves my mind free: I’ve come up with an idea for the tpsh problem.

The tokenizer is beautiful; the only problem is that it does the expansions through a table of callbacks. Now if I were to modify that so that it attaches meta-data to each token instead, those expansions could be delayed until later without major redesign. It will be necessary to refactor the lexer in order to handle the change in data structure. The code generator can then be tasked with completing any expansions during generation, by expanding lexemes to corresponding code as it goes. Afterwards, redo the code generator into something more elegant and voila…. we have what it should’ve been in the first place!
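A toy illustration of the meta-data idea, under loud assumptions: the token shape (a hash with text and kind), the %vars table, and expand_token are invented for this sketch and say nothing about tpsh's real data structures. The point is only that a token tagged as a parameter can be expanded at execution time, after the loop has set X.

```perl
use strict;
use warnings;

# Hypothetical token shape: the lexer records what kind of expansion a
# word needs instead of performing it on the spot.
my @body = (
    { text => 'echo', kind => 'literal'   },
    { text => 'X',    kind => 'parameter' },    # was "$X" in the input
);

my %vars;    # stand-in for the shell's variable table

# Expansion happens here, at execution time.
sub expand_token {
    my ($token) = @_;
    return $token->{text} if $token->{kind} eq 'literal';
    my $value = $vars{ $token->{text} };
    return defined $value ? $value : '';
}

# Roughly what "for X in 1 2 3; do echo $X; done" would boil down to.
my @output;
for my $value (1, 2, 3) {
    $vars{X} = $value;
    push @output, join ' ', map { expand_token($_) } @body;
}

print "$_\n" for @output;    # echo 1, echo 2, echo 3 on separate lines
```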

A good night’s sleep always helps the code[r]

A small shoot yourself in the foot, coders moment.

Being bored and lacking further ops I can get done before bed, I picked up on tpsh again. Looking for a quick challenge, I noted that the git repo was still on the ‘codegen’ branch. Basically, a branch to test the idea of generating the execution code on the fly per command sequence.

As a quickie of interest, I picked up the generation phase for the for-loop. Then I hit a road block, since my shell expands variables, globs, and aliases quickly during tokenization. The reason being, the input field separator ($IFS) and quoting rules determine how this shell splits text into ‘words’ or tokens for the execution mechanisms; you could say it’s “on the way there”, and thus deals with it as it comes. Currently tpsh handles environment variables by fooling around with the program’s own idea of the referenced environment variables (%ENV) without any distinction between exported and unexported variables. My intentions have been to use a more controllable interface for shell variables at a later date, since it is kind of a low-yield concern at this stage of development.

I see several choices:

a/ redesign how things work (obviously this is the whole story, lol), saving the issue until later when other components have matured to match.

b/ leave expanding environment variables (etc) until the last minute, I don’t like this idea.

c/ have some way of retaining things that can not be confirmed until later, with an indicator to strip out or expand the remainders at the last minute (this gives me visions of ugly code)

d/ incorporate the code generator closer into the process, so that things get expanded only after they have been confirmed, but generated as soon as possible (a more multiple pass focused design comes to mind).

a is basically a worry-later, and see if other things that need doing either fix or exacerbate the problem (this is a two edged flaming sword). b is possible but would take a lot of reworking crap, and IMHO result in an ugly LA phase and become prone to introducing bugs into the final results. c sounds simple enough at first glance but I do not see a method that I’m willing to live with. I worry about how easily d could confuse readers, and what damage a slip up on it could do to the results.

For now, I intend to not worry about the minor issue until after variable handling matures, because I really love how expand_quotes() works, and that is the best part of the whole program IMHO. Needless to say, tpsh has poor handling of shell/environment variables and has had it throughout its development, since growing that code can wait longer than the other parts.

Not to mention the fact that tpsh has mostly been developed under sleep deprivation in the first place…. lol

edit

In the time it took to submit the entry, type ‘shutdown -p now’, put away the computer, and take a quick leak: I came up with another solution. Give the code that expands variables an understanding of how they are defined, rather than only how they are referenced. Not only would checking if a referenced variable was just defined in the same set of input work with ‘for X’ and ‘for X in …’ like constructs, it could also be used to implement the ‘VAR=… … command …’ syntax at a later stage 🙂
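A crude sketch of that idea, working on a raw input line with regexes rather than on tpsh's tokens. The function name expand_params_sketch and the variable table passed as a hash ref are hypothetical, and only the ‘for NAME’ form is recognised as a definition. References to a name the same input defines are left alone; everything else is expanded (to an empty string when unset).

```perl
use strict;
use warnings;

# Hypothetical: expand $NAME references in $line from %$vars, except for
# names that the same line binds with "for NAME ...".
sub expand_params_sketch {
    my ($line, $vars) = @_;

    # Names defined by the input itself.
    my %bound = map { $_ => 1 } $line =~ /\bfor\s+([A-Za-z_]\w*)/g;

    $line =~ s{\$([A-Za-z_]\w*)}{
        $bound{$1} ? "\$$1"
                   : (defined $vars->{$1} ? $vars->{$1} : '')
    }ge;

    return $line;
}

print expand_params_sketch(
    'for X in 1 2 3; do echo $X $GREETING; done',
    { GREETING => 'hi' },
), "\n";
# prints: for X in 1 2 3; do echo $X hi; done
```

The kept ‘$X’ would still need expanding once the loop actually runs, so this only defers the problem to whatever executes the loop body.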

The way expand_quotes() invokes the other expand_* procedures would need adjustment before the syntax of prefixing commands with variable settings could work, yet implementing the for-loop in it would be trivial, since anything that would cause the statement to get broken into an unusable token set before defining said variable and attempting variable expansion on it would also be a usage that gets around for’s keyword status!

Problem solved with less fuss, maybe? It is amazing what you can think of while taking a piss!

quick note to self

once tpsh’s implementation of shell script is more mature: transition the windows machine to running a shell script on init, rather than the Startup system used by Windows NT, and compare performance.

more tpsh: control flow stuff

I’ve been trying to find a way of hooking proper shell control flow keywords into tpsh, without uglifying the existing code. At the moment, tpsh understands executing singular command sequences, scripts, and a queue of command sequences. It’s fairly easy to modify the parsing/lexing portions to adjust the internal data structures IAW control flow keywords; the problem is how to handle the execution phase.

Originally, I had in mind setting up nested data structures and doing a delayed execution: evaluate the control flow statement and modify the data, then execute the remaining commands (e.g. if CMD0; then CMD1; else CMD2; fi, would execute CMD0 when the statement needs to be evaluated at execution phase, then reshape the structure so that only CMD1 or CMD2 remains, and then feed that back into the executor).

Last night, I had an interesting idea… on the case specific code generation.

Shell control flow basically amounts to very simple if, while, for, and case statements; and the more modern until and select statements. Normal execution patterns amount to using a single command sequence (e.g. cat f0 f1 f2 f3 | sort > f), or a queue of such command sequences. Why not replace that executor with a section of code that understands how to handle those as well as control flow (etc), and then generate the desired code to execute the result?

Exempli gratia:

tpsh_cgen( ( [ 'if', 'test command' ],
[ 'then', 'other commands' ],
....
[ 'else', 'other other commands' ],
....
[ 'fi' ] )
)

might return something like:

sub { 
if (evaluate the 'test command' and test the exit status)
{
execute the 'other commands'
}
else {
execute the 'other other commands'
}
}

and so on and so forth; so that if we call the generated code ref, we have a set of code that will execute the correct commands, whatever they may be, and with quite a lot less fuss.
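To make the idea concrete, here is a small runnable sketch. It builds the code ref out of closures rather than generating Perl source text, which is a swapped-in technique chosen to keep the example short. The names cgen_if_sketch and run_command are hypothetical, and run_command only pretends to run things (the command ‘false’ fails, everything else succeeds).

```perl
use strict;
use warnings;

my @ran;    # record of "executed" commands, for demonstration only

# Hypothetical stand-in for running a command; returns an exit status.
sub run_command {
    my ($cmd) = @_;
    push @ran, $cmd;
    return $cmd eq 'false' ? 1 : 0;
}

# Build a code ref from [ keyword, command ] pairs describing a single
# if/then/else/fi statement.
sub cgen_if_sketch {
    my @parts = @_;
    my (%block, $current);

    for my $part (@parts) {
        my ($keyword, $cmd) = @$part;
        last if $keyword eq 'fi';
        $current = $keyword;
        push @{ $block{$current} }, $cmd if defined $cmd;
    }

    return sub {
        my $status = 0;
        $status = run_command($_) for @{ $block{'if'} || [] };

        # Exit status 0 means success, so it selects the 'then' branch.
        my $branch = $status == 0 ? 'then' : 'else';
        $status = 0;
        $status = run_command($_) for @{ $block{$branch} || [] };
        return $status;
    };
}

my $code = cgen_if_sketch(
    [ 'if',   'false'    ],
    [ 'then', 'echo yes' ],
    [ 'else', 'echo no'  ],
    [ 'fi' ],
);
$code->();
print join(', ', @ran), "\n";    # prints "false, echo no"
```

A source-generating version would return the same kind of code ref via eval on a built-up string; the calling side would not change.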

merging the new code into master

After 3~4 days of coding, I’ve just merged the parserlexer branch back into master; I love coding 🙂

 commit 9857e5e9556f31543075fb4a74350dbda97a42e5
Merge: c9a8ae4... bb425eb...
Author: Terry ....snip...
Date: Wed Apr 8 07:23:19 2009 +0000

Merge branch 'parserlexer'

The new parser, lexer, and quote expansion code (+ a few bugfixes) has
finally been merged into the mainline of development (branch 'master').
This marks the new sh_eval(), tpsh_parse(), expand_quotes, tpsh_lex()
functions in such a way that they should be considered 'stable' for
general usage.

some nice things that come with this:

a sane way of quoting stuff; but not sh compatible (''' = ', not an error), more than one set of quotes on the line; and things like `cd /foo; vi bar` finally work xD. In the course of the necessary bug smashin’ for the merge, I’ve also cleaned out a few pains in the todo file, that have been there since last month++ lol.

things that remain to be done: pluggable completion; make completion play nicer with perl/gnu readline backends; restore support for pipes (critical); handling of keybinding (likely painful across perl/gnu/zoid Term::ReadLine backends; but at least zoid is nice…). In the more long term: control flow, (subshells), better `handling`, {anonymous macro/grouping}, more advanced I/O redirections (i.e. only >, >>, and < currently work lol); making `fc -l` and `history list` use a format for display rather than print(); make &do_getopt able to be configured by callers (so fc can accept negative indexes, etc); give a way to turn off shell options (set - and set + currently turn on, only lol); and who knows what else that I can't remember atm.
and to abuse Perl’s idea of objects and verbs: eval { $spidey sleep $now };

tpsh: test of expand_quotes()

$ echo 'hi bye' foo "$USER" and "~" or ~
expand_quotes ': echo | hi bye | foo "$USER" and "~" or ~
expand_quotes ": foo | $USER | and "~" or ~
expand_quotes ": and | ~ | or ~
hi bye foo $USER and ~ or /usr/home/Terry
$

# note:
# the 2 spaces /displayed/ between hi and bye are a bug in
# tpsh; echo'ing things to file via I/O redirection works
# properly. "$USER" is not expanded because expand_parameters()
# still needs adjustments.
#

tpsh_parse() invokes expand_quotes() to break up its input line based on the shell’s quoting rules, and proceeds to go about its business. tpsh_lex() then accepts the token buffer and begins building a new data structure from it. The tokens from tpsh_parse() get analyzed and reassembled “on the quotes”, i.e. it will do its check on ‘hi ‘ and ‘bye’ and the rest as separate elements; then reassemble the argument vectors as an array reference: becoming ‘hi bye’ again (id est quote expansions add escapes to tell the lex phase where to rejoin things). After everything is said and done between parse and lex, the queue-like data structure is ready, and the argument vectors contained therein are ready to be mapped onto resolve_cmd() calls for execution.

To hunt down any other booboos in the expand_quotes() subroutine, I’ve made it display its work, so I can see how it detects what when testing the shell, basically as “expand_quotes QUOTE: unquoted | quoted | remainder”.

As one can guess from what the above shell snippet implies: quoting is handled recursively. Because I’m used to languages with finite stack space and no reliable tail call optimizations; I almost never write recursive functions of any kind, whether they are tco’able or not. Algorithmically, expand_quotes() is a very simple procedure.

It expects to be called with an input line; and treats multiple arguments accordingly (for now). Internally a dispatch table and token stack are maintained; the table contains references to anonymous subroutines, to which the scanned elements are delegated for the proper expansions.

If no quotes are detected on the line, return the result of expanding it with the default delegate (for unquoted text).

Otherwise break the line on the first set of (matching) quotes.

Any text defined before the beginning quote must be unquoted; apply the default expansion from the table.

The text between the matching quotes is quoted; apply the appropriate expansion from the table (i.e. ', ", or `).

Any text remaining after the matching quotes may or may not be quoted; invoke expand_quotes() on the remainder to find out, and apply the result.

Each expansion applied is pushed onto the token stack in the escaped form it expanded to (i.e. “‘hi bye'” becomes “hi bye”), and the stack is returned to the caller once processing is completed.
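The steps above can be sketched in a few lines of Perl. This is a simplified reconstruction and not tpsh's expand_quotes(): the %expand table holds placeholder delegates (no parameter, tilde, or command expansion is actually performed), escaped quotes are ignored, and the escape/rejoin step for the lex phase is left out, so a word like foo'bar' would come out as two tokens here.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one delegate per quote type, plus a
# default for unquoted text. Real delegates would do far more.
my %expand = (
    q{'}     => sub { $_[0] },                   # literal text
    q{"}     => sub { $_[0] },                   # would expand $VARS here
    q{`}     => sub { "<output of $_[0]>" },     # would run the command here
    unquoted => sub { split ' ', $_[0] },        # split bare text into words
);

# Returns the list of tokens (the "token stack") for one input line.
# Must be called in list context.
sub expand_quotes_sketch {
    my ($line) = @_;

    # Break the line on the first set of matching quotes, if any.
    if ($line =~ /^(.*?)(['"`])(.*?)\2(.*)$/s) {
        my ($before, $quote, $inside, $rest) = ($1, $2, $3, $4);
        return (
            $expand{unquoted}->($before),    # text before the quote
            $expand{$quote}->($inside),      # text between the quotes
            expand_quotes_sketch($rest),     # remainder: recurse
        );
    }

    # No matching quotes: the whole line is unquoted.
    return $expand{unquoted}->($line);
}

my @tokens = expand_quotes_sketch(q{echo 'hi bye' foo "$USER" and "~" or ~});
print join(' | ', @tokens), "\n";
# prints: echo | hi bye | foo | $USER | and | ~ | or | ~
```

Adding () and {} grouping to a table like this would also need the regex to know that those delimiters close with a different character than they open with.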

With refactoring, the procedure could likely be made tail recursive but I don’t think perl does TCO. Either way, the user’s fingers or (likely) the machine generating the inputs should run out of stack space before tpsh could pop a cork at the number of quotes lol. An earlier design for expand_quotes() had more in common with finite state machines (in so far as I’ve seen them implemented), but was a lot more contorted than expand_quotes()’ present shape :-/.

Current bugs are handling nested escaped quotes or multiple empty quotes (the splitter) and removing unquoted quotes (addition to delegate sub for unquoted text).

# bugs in expand_quotes
$ echo 'foo "bar'
expand_quotes ': echo | foo "bar |
foo "bar
$ echo "foo "bar"
expand_quotes ": echo | foo | bar"
foo bar"
#
# correct result would have been equal to the previous command
#
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo | | "" '' "" '""' '' "" '"' "'"
expand_quotes ": " | '' | " '""' '' "" '"' "'"
expand_quotes ': " | "" | '' "" '"' "'"
expand_quotes ': ' | "" | "' "'"
expand_quotes ": "' | ' |
" '' " "" ' "" "' '
#
# correct result would have been: "" " '
# at least, that's how all bourne based shells I
# know about treat it; I would prefer: "" " '
# i.e. without leading whitespace.
#

For some reason this makes me curious, has anyone ever explained why shell syntax allows """ but not ''' ? (the results being " and unclosed quote /or syntax error respectively)

When trying to solve a programming problem, generally I try the most simple solution before I try something more complex, and then evaluate a neater method. I consider the implications solutions have on efficiency, but that is trying to avoid shooting myself in the foot later, rather than trying to optimize the code for a machine.

Somehow, I think expanding quotes is just naturally recursive in my crazy brain :-D.

EDIT


commit aeac14bd177a93b84c138a0c62e2cda49e5fe15c
Author: Terry <***snip***>
Date: Tue Apr 7 22:24:35 2009 +0000

bugfix: parameters now expand within quotes via expand_quotes and may be escaped

commit 089fda7cca0049dcabdc8b9659f94dcae417074b
Author: Terry <***snip***>

bugfix: escaped quotes witihn quotes and multiple quotes handled correctly

previous behaviour:

$ echo 'foo "bar'
expand_quotes ': echo | foo "bar |
foo "bar
$ echo "foo "bar"
expand_quotes ": echo | foo | bar"
foo bar"
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo | | "" '' "" '""' '' "" '"' "'"
expand_quotes ": " | '' | " '""' '' "" '"' "'"
expand_quotes ': " | "" | '' "" '"' "'"
expand_quotes ': ' | "" | "' "'"
expand_quotes ": "' | ' |
" '' " "" ' "" "' '
$

new behaviour:

$ echo 'foo "bar'
expand_quotes ': echo | foo "bar |
foo "bar
$ echo "foo "bar"
expand_quotes ": echo | foo "bar |
foo "bar
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo | | "" '' "" '""' '' "" '"' "'"
expand_quotes ": | | '' "" '""' '' "" '"' "'"
expand_quotes ': | | "" '""' '' "" '"' "'"
expand_quotes ": | | '""' '' "" '"' "'"
expand_quotes ': | "" | '' "" '"' "'"
expand_quotes ': | | "" '"' "'"
expand_quotes ": | | '"' "'"
expand_quotes ': | " | "'"
expand_quotes ": | ' |
"" " '
$