a text file processing example in gst

When first trying Smalltalk I always found it difficult to obtain simple complete programs to use as a reference tool. Smalltalkers just love showing code 'snippets', but learners need complete examples so they can see the total program. Hopefully this will be of use to somebody.

The Problem: I have a simple database system created & maintained by a REBOL program ( REBOL, incidentally, is a very interesting GUI development language), and the collected data has to be imported into an existing MS-Access database.

This REBOL database is really a simple ASCII text file that is read into, and written from, REBOL internal structures for processing by an end-user GUI interface. ( REBOL is not the problem BTW :-) ) The REBOL file is similar to, but not the same as, the CSV ( Comma Separated Values) format so loved by MS products.

Importing this data required some small transformations per record, so a simple script program was needed. I almost wrote it in Ruby, but just by chance I noticed a reference on the web to GNU Smalltalk's latest version, so I decided to have a quick look. ( I normally write my Smalltalk programs in Cincom Smalltalk, but as I needed to supply this program to a third party I did not expect them to install the huge overhead that VWST entails. GNU ST fitted nicely.)

the record layout

Looking at the REBOL record there are a number of interesting features :

  1. each record is braced by [ & ]
  2. string fields are braced by " & "
  3. memo ( multi line) fields are braced by { & }
  4. fields are separated by a single space between field delimiters eg: } { or " { or " " .. etc


REBOL uses [ ], " " and { } to delimit records and fields when it reads and writes the file. Some records have no { } fields at all, and this program must allow for that.
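
The layout rules above can be checked with a short sketch that walks one record and collects its fields. This is only an illustration, not part of the conversion program: the sample record is a shortened, made-up variant of the real one, and the sketch assumes that { } fields never contain nested braces.

```smalltalk
| rec in fields ch |
rec := '["Bendigo Advertiser Supplement" "26/09/1984" "6" {Sandhurst Town} ""]'.
in := ReadStream on: rec.
fields := OrderedCollection new.
[ in atEnd ] whileFalse: [
    ch := in next.
    "a string field: take everything up to the closing quote"
    ch = $" ifTrue: [ fields add: (in upTo: $") ].
    "a memo field: take everything up to the closing brace"
    ch = ${ ifTrue: [ fields add: (in upTo: $}) ] ].
"the brackets and the separating spaces are simply skipped"
Transcript showCr: fields size printString.
fields do: [ :f | Transcript showCr: f ].
```

For this sample the count shown should be 5, followed by one field per line ( the last field is empty, so it shows as a blank line).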

NOTE: the displayed records below are actually one long line, there is only 1 LF ( line feed character) per record.

There are 8 fields defined -- can you find them ??
["Bendigo Advertiser Supplement" "26/09/1984" "6" {Sandhurst Town Tourist attraction celebrates 10th anniversary} 
 "Everybody's Talking" {Article by Ian Green summaries the attractions available to Bendigo visitors } 
 {Bendigo Pottery; Central Deborah Gold Mine; Chinese Joss House; Cherry Berry Farm; Di Gum San Wax Musuem;
Dudley House; Epsom Market; Hartland's Distillery; Mandurang Orchid Nursery; Woodstock Pottery;
Sweeneys Pottery; Chateau Le Amon; Chateua Dore and Balgownie Wineries} ""]

The output transformed CSV record :
"Bendigo Advertiser Supplement","26/09/1984","6","Sandhurst Town Tourist attraction celebrates 10th anniversary",
"Everybody's Talking","Article by Ian Green summaries the attractions available to Bendigo visitors ","
Bendigo Pottery; Central Deborah Gold Mine; Chinese Joss House; Cherry Berry Farm; Di Gum San Wax Musuem;
Dudley House; Epsom Market; Hartland's Distillery; Mandurang Orchid Nursery; Woodstock Pottery;
Sweeneys Pottery; Chateau Le Amon; Chateua Dore and Balgownie Wineries",""

The Smalltalk code below was written in the SciTE editor, saved as ReboltoCsv.st, and executed with the command gst -f ReboltoCsv.st from the same directory as the program.


the complete gst program

| inStream utStream inRec inFilename utFilename s1 s2 rcnt |
" a simple program to convert a  Rebol file to a CSV file written in GNU Smalltalk
                Brett S Hallett (c) 2008 "

inFilename := 'testdb.txt' .
utFilename := 'testdb.csv' .
Transcript show: ' *** ReboltoCsv2.st '.
Transcript show: Date today printString; show: '  at '; show: Time now printString; nl. 
Transcript tab; show: 'Input file :' ; show: inFilename ; show: ' Output file :' ;  show:utFilename; nl.

utStream := FileStream open: utFilename  mode: FileStream write.
inStream :=FileStream open: inFilename mode: FileStream read.
rcnt := 0.
 " scan thru the text lines, using a traditional i/o loop "
[ inStream atEnd ] whileFalse: [
        rcnt := rcnt + 1.
        inRec := inStream  upTo: Character nl.     " // we have 1 record here "
        Transcript show: '@'; show: inRec; nl.  
                
                s1 := inRec findFirst: [ :e | e = ${ ].    " get next { index, exit loop on zero "
                [ s1 > 0 ] whileTrue: [
                        s1 := inRec findFirst: [ :e | e = ${ ].
                        s2 := inRec findFirst: [ :e | e =  $} ].
                        " Transcript show: s1 printString; tab; show: s2 printString; nl.  " " print { } indexs "
                        
                        s1 to: s2 do: [ :x |  ( (inRec at: x) = $" ) ifTrue: [ inRec at: x put: $' ]. ].
                        inRec at: s1 put: $".
                        inRec at: s2 put: $".    " suppress current { } pair "
                       s1 := inRec findFirst: [ :e | e = ${ ].    " get next { index, exit loop on zero "
                ].
                inRec at: 1 put: $ .
                inRec at: ( inRec size) put: $ .    " suppress current [ ] pair "
                inRec := inRec copyReplaceAll: '" "' with: '","'.        
                Transcript show: '&'; show: inRec; nl. 
                utStream nextPutAll: inRec; nl.  " write converted CSV record to file "
] .

inStream close.
utStream close.
Transcript  tab;show: 'Records Read/Written : '; show: rcnt printString; nl.
================= end =====================================
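
A possible variation on the open/close handling, offered as a sketch only: wrapping the work in ensure: means the file is closed even if an error interrupts the loop. The scratch file name is hypothetical, and the sketch writes its own two-line input so that it can be run on its own.

```smalltalk
| name out in count |
name := 'sketch-demo.txt'.    "hypothetical scratch file, not one of the article files"

"write two sample records, closing the file whatever happens"
out := FileStream open: name mode: FileStream write.
[ out nextPutAll: 'first record'; nl; nextPutAll: 'second record'; nl ]
    ensure: [ out close ].

"read them back with the same atEnd / upTo: loop used in the program"
count := 0.
in := FileStream open: name mode: FileStream read.
[ [ in atEnd ] whileFalse: [
      count := count + 1.
      Transcript showCr: (in upTo: Character nl) ] ]
    ensure: [ in close ].
Transcript showCr: count printString.
```

Run with gst -f, this should show the two records followed by the count 2.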


================== the  program  explained ===========================
This is the same code as above with special comments inserted ( // ) which hopefully explain
the workings of this program. The // comments are AFTER the code lines.

Overview : the program reads each REBOL record ( text line) into the variable inRec. inRec is then repeatedly scanned
for { } pairs, as these REBOL fields require some internal processing. As each pair is processed its { } are
replaced with " ", and the scanning repeats until all { } fields in this record have been processed.
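
The brace-pair step described in this overview can be tried on its own, without any files. The following is a minimal sketch on a made-up sample record; it uses indexOf: ( which answers 0 when the character is absent) in place of findFirst:, and takes a copy of the literal because string literals may be read-only in some gst versions.

```smalltalk
| rec open close |
rec := '["Title" {He said "hi"} {second memo} ""]' copy.
open := rec indexOf: ${.
[ open > 0 ] whileTrue: [
    close := rec indexOf: $}.
    "inside the pair, turn every double quote into a single quote"
    open to: close do: [ :i |
        (rec at: i) = $" ifTrue: [ rec at: i put: $' ] ].
    "then overwrite the braces themselves with double quotes"
    rec at: open put: $".
    rec at: close put: $".
    open := rec indexOf: ${ ].
Transcript showCr: rec.
```

The line shown should be ["Title" "He said 'hi'" "second memo" ""] : both memo fields are now quoted, and the quotes that were inside the first memo have become apostrophes.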

====================

| inStream utStream inRec inFilename utFilename s1 s2 rcnt |

// define local variables; Smalltalk variables are untyped, so each one can hold any kind of object

" a simple program to convert a  Rebol file to a CSV file written in GNU Smalltalk
                Brett S Hallett (c) 2008 "

// just a comment 
                
inFilename := 'testdb.txt' .
utFilename := 'testdb.csv' .

// assign filenames to local variables

Transcript show: ' *** ReboltoCsv2.st '.
Transcript show: Date today printString; show: '  at '; show: Time now printString; nl. 
Transcript tab; show: 'Input file :' ; show: inFilename ; show: ' Output file :' ;  show:utFilename; nl.

// print ( on console) a starting message showing the program name, the date & time of the run, and the filenames.
          
utStream := FileStream open: utFilename  mode: FileStream write.
inStream :=FileStream open: inFilename mode: FileStream read.
rcnt := 0.
// open the input & output files, set record counter


 " scan thru the text lines, using a traditional i/o loop "
 
[ inStream atEnd ] whileFalse: [

// loop until End of File is reached on the input file

        rcnt := rcnt + 1.
        
        inRec := inStream  upTo: Character nl.     " // we have 1 record here "
        
// read text from inStream up to ( but not including) the next NewLine character; one record is now in inRec

        Transcript show: '@'; show: inRec; nl.  
                
                s1 := inRec findFirst: [ :e | e = ${ ].    " get next { index, exit loop on zero "
                
// find the first { character in the record, s1 will have  first {'s index OR 0 if not found        

                [ s1 > 0 ] whileTrue: [
                
//  if no { was found ( s1 = 0) the loop body is skipped, so records without { } fields pass straight through

                        s1 := inRec findFirst: [ :e | e = ${ ].
                        s2 := inRec findFirst: [ :e | e =  $} ].
                        
// find the current { } brace pair. s1 is unchanged since the previous search, so re-finding it here is redundant but harmless;
// s2 is the first } still present, which matches s1 because the earlier { } pairs have already been converted to " "

                        " Transcript show: s1 printString; tab; show: s2 printString; nl.  " " print { } indexs "
                        
// uncomment the above line to get a trace of the s1 & s2 indexes
                        
                        s1 to: s2 do: [ :x |  ( (inRec at: x) = $" ) ifTrue: [ inRec at: x put: $' ]. ].
                        
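// within { } any " must be suppressed, because an unescaped " inside a quoted CSV field would end the field early; this program
// converts each one to ' ( the more usual CSV convention is to double it as "" ). As the code loops it finds each { } pair in turn,
// so s1 & s2 are different on each pass through the loop, until s1 = 0 ( not found)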

                        inRec at: s1 put: $".
                        inRec at: s2 put: $".    " suppress current { } pair "
                        
// overwrite the current { } pair with " "


                        s1 := inRec findFirst: [ :e | e = ${ ].    " get next { index, exit loop on zero "

// s1 controls the record processing loop

                ]. // end whileTrue loop

                
// loop until s1 = 0                
                
                inRec at: 1 put: $ .
                inRec at: ( inRec size) put: $ .    " suppress current [ ] pair "
                
// convert [ ] to blanks

                inRec := inRec copyReplaceAll: '" "' with: '","'.        
                
// convert field separators to "," for CSV format.

                 Transcript show: '&'; show: inRec; nl. 
                utStream nextPutAll: inRec; nl.  " write converted CSV record to file "
                
 //  write converted CSV record to file 
                                
]. // readStream loop

// loop to get next record

inStream close.
utStream close.
Transcript  tab;show: 'Records Read/Written : '; show: rcnt printString; nl.

// close files and exit program
============================================================
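
One detail of the closing steps: as written, the program overwrites [ and ] with blanks, which would appear to leave a leading and a trailing blank on each output line. A possible variation, sketched here on a made-up record, drops the brackets with copyFrom:to: before converting the separators. It assumes the record starts with [ and ends with ] , with no trailing carriage return.

```smalltalk
| rec csv |
rec := '["a" "b" ""]'.
"drop the first and last characters, then turn each quote-space-quote into quote-comma-quote"
csv := (rec copyFrom: 2 to: rec size - 1) copyReplaceAll: '" "' with: '","'.
Transcript showCr: csv.
```

The line shown should be "a","b","" with no surrounding blanks.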

Hope you find this useful

Brett

Hi Brett,

Nice article. Out of curiosity, why didn't you just LOAD the data with REBOL, saving yourself the trouble of parsing it? If it was for fun or exercise, very cool. Cool in any case. Thanks for taking the time to write it up.

For those who don't know REBOL, here's one way to do it:

rec: [
"Bendigo Advertiser Supplement" "26/09/1984" "6" {Sandhurst Town Tourist attraction celebrates 10th anniversary} "Everybody's Talking" {Article by Ian Green summaries the attractions available to Bendigo visitors } {Bendigo Pottery; Central Deborah Gold Mine; Chinese Joss House; Cherry Berry Farm; Di Gum San Wax Musuem;Dudley House; Epsom Market; Hartland's Distillery; Mandurang Orchid Nursery; Woodstock Pottery;
Sweeneys Pottery; Chateau Le Amon; Chateua Dore and Balgownie Wineries} ""
]
out: ""                        ; our output buffer
new-line/all rec off           ; turn off newline markers in the block, just 'cause
append out mold first rec      ; put the first field in the string
foreach fld next rec [repend out ["," mold fld]] ; put the rest in, with comma seps
replace/all out "{" {"}        ; fixup fields that got curly braced.
replace/all out "}" {"}        ; ditto
print out

It's naive, and deals with problems by fixing them. For example, MOLD may put curly braces around a string, rather than quotes, so we just replace those. Put that in a func, load with READ/LINES, add a FOREACH loop, CHANGE in place if you want, and WRITE/LINES to save it.

Hmmm, it can be even simpler, since all fields are strings:

out: mold/only rec
replace/all out "{" {"}
replace/all out "}" {"}
replace/all out {" "} {","}

--Gregg Irwin

Gregg, thanks for your interest and your REBOL code.

I wrote this Smalltalk code to demonstrate a complete text file handling program to Smalltalk newcomers,
because this simple but useful task is often not explained in Smalltalk documentation, and of course for fun as well. Using Smalltalk as a scripting tool is quite interesting, and GNU ST shapes up very nicely
for this task when compared with other scripting tools like Ruby, Python, etc

I mentioned REBOL was not the problem :-) , but it did offer a nice collection of text based records to be processed by my Smalltalk code, the fact that they are 'real' records probably makes this example more interesting.

Also, I doubt that I would have written such an elegant REBOL solution, as I'm still very much a REBOL learner. I'll definitely be using your REBOL code in my program -- thanks :-)
