Article: 13566 of comp.protocols.kermit.misc
Subject: Case Study: Lynx/Kermit Coordination (Part II)
From: "Dallas E. Legan" <Use-Author-Address-Header@[127.1]>
Author-Address: dallasii AT kincyb DOT com
Date: Wed, 31 Jul 2002 04:44:57 -0700
Newsgroups: comp.protocols.kermit.misc
Lines: 378
***************************************************************************
Note: This article is not intended either to endorse or disparage
http://www.dice.com, but simply to describe the author's use of the service.
***************************************************************************
I had one of the leaders of the local USENIX chapter review my resume for suggested changes and he urged me to check for jobs on www.dice.com.
At first I looked at Dice a little, somewhat discouraged by previous experiences with online job boards. Then I had the idea of automating the process of sending resumes to the finds that came up, and really getting after things. Sophisticated time/motion studies weren't needed to realize that only 3 or 4 keystrokes were needed to navigate from one search find to another. In vi cursor control mode these were barely more than finger twitches. Actually sending information to the firm in question, though, chewed up several times that many keystrokes and constituted the real bottleneck in using the service. Just see the commented-out 'mailto:' sequence for Lynx in the script below. Then there were editor commands to read in a resume.
One thing I noticed was that the 'mailto:' links were not truly 'mailto:'s. While displaying as mail addresses, in fact they were http:...cgi?... style links, apparently doing some things in the background. What exactly, I didn't know. Trying to go to one directly from a command line invocation of the browsers I have seemed to paralyze the browser.
The first stab at this was a Rexx script (using the Regina GPL'd interpreter) to be run as a Lynx Extern program (see http://www.columbia.edu/kermit/case20.html, Part I of this series), tearing apart the CGI URL and mailing a resume to the firm listing the position. (See the comments in the listing below.) I picked this language because the job seemed like a simple-minded use of the Rexx Parse command. I ran this for a few days on what turned up, but the mailings seemed to do nothing whatever. Maybe something important was going on with the CGI script at the server?
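As a rough illustration of the Parse step described above, here is a minimal sketch in Python (not the Rexx of the original) that splits a URL of the shape shown in the comments of the listing below. The function name parse_dice_mailto and the dictionary keys are invented for this example; the field names simply follow the Rexx template, and the sample URL is the masked one from the listing.

```python
EXPECTED_BASE = "http://www.dice.com/mailto.cgi"


def parse_dice_mailto(url):
    """Split a mailto.cgi URL the way the Rexx PARSE template does.

    Template being mirrored:
        URL '?' eddress '&' corpcode '.' jobid '&' somenumber
    Each partition() takes everything up to the next separator; if the
    separator is missing, the field gets the rest and later fields are empty.
    """
    base, _, query = url.partition("?")
    if base != EXPECTED_BASE:
        raise ValueError("invalid URL =" + base + "=")
    eddress, _, rest = query.partition("&")
    corpcode, _, rest = rest.partition(".")
    jobid, _, somenumber = rest.partition("&")
    return {
        "eddress": eddress,
        "corpcode": corpcode,
        "jobid": jobid,
        "somenumber": somenumber,
    }


# Masked sample URL copied from the comments in the listing.
sample = "http://www.dice.com/mailto.cgi?xxxxxx@xxxxx.com&tttttt.EEE1111111191"
fields = parse_dice_mailto(sample)
print(fields["eddress"], fields["corpcode"], fields["jobid"])
```

Because the '&' split happens before the '.' split, the periods inside the mail address do not confuse the later fields, which is the same left-to-right behavior the Rexx template relies on.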
The next try at this would use the actual http URL, going through the whole process of punching in the browser's questions for sending mail. Working on this on a Friday afternoon, I ran Lynx inside a C-Kermit pseudoterminal session (C-Kermit 8.0.200, 12 Dec 2001, running on Debian Linux 2.2) and roughed out the first version of a script that would set a 'trigger string' and react when Kermit detected it in the browsing session. In this first version, I would activate the 'mail' CGI link, and the trigger string 'Using mailto:' would show up on the Lynx advanced user mode status line. At this point the C-Kermit script would take over, making sure my name was correctly entered, adding some text to the subject line, erasing my address from the 'Cc:' option, and then reading my resume into the vi edit session my Lynx configuration was set up to use for editing tasks. To my delight it worked! Being encouraged, sometime that afternoon, I decided to go for broke. I brought up the Dice.com advanced query page, set it to hunt for jobs in the last 30 days, 50 hits per page, first 2000 hits, for the keyword 'Perl', hit the 'search' link and 1,399 finds popped up. 1,399, a number I will never forget. :-) I proceeded to plow away.
At Dice 'perl' hit 700, I disconnected my dialup connection, leaving the browser on the page where I had left off, and went to pick up a couple of newspapers and my mail, get a meal, and do some of my daily reading.
Returning, I decided to make an improvement to the script, so that I didn't have to manually activate the 'mailto:' CGI link. Instead, I used part of the CGI link URL, which would show up on the Lynx status line, as the trigger string, so that all I had to do was move the browser cursor over the link, and then C-Kermit would take over and send the e-mail. A little bit of experimentation was needed for this, documented in the commented-out tries. With this change, I pushed ahead until finishing all of the finds, sometime Saturday morning.
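The comments in the script below attribute the trouble with longer trigger strings to Lynx sending only a fragment of the URL, wrapped in ANSI escape sequences. The following sketch, in Python rather than Kermit, shows why a shorter fragment can match where the full word does not. The sample byte stream is invented for illustration (it is not captured Lynx output), and the sketch assumes the trigger is a plain substring comparison against the raw incoming characters.

```python
import re

# Matches ANSI CSI sequences such as ESC [ 24 ; 22 H (cursor positioning).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove CSI escape sequences, leaving only the printable characters."""
    return ANSI_CSI.sub("", text)


def trigger_fires(stream, trigger):
    """Assumed model of a trigger: a plain substring test on the raw stream."""
    return trigger in stream


# Hypothetical status-line update: a cursor move lands between the 'm'
# and the rest of the URL, so 'mailto.cgi' never appears contiguously.
sample = "\x1b[24;22Hm\x1b[24;23Hailto.cgi?xxxxxx@xxxxx.com"

for candidate in ("mailto.cgi", "ailto.cgi"):
    print(candidate, trigger_fires(sample, candidate))

# With the escape sequences removed, the full word is visible again.
print(strip_ansi(sample))
```

In this made-up stream only the shorter 'ailto.cgi' fires. Stripping escape sequences recovers the text here, but cursor-positioning sequences can rearrange what ends up on screen, so it is not a general way to reconstruct the status line.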
The script below is presented warts and all, only with some email addresses mangled to prevent their being spam targets, and comments added later denoted with '# *'. Typically, you might start the session with:
pty lynx -your -choice -of -switches -here http://www.dice.com
at the Kermit command line and you will be put in interactive mode with Lynx. There was an accompanying macro,
define dicecgi take /home..../dicedgi.ksc
to get it started with the script. This could be put in the Kermit startup files. From the interactive Lynx session, hit 'C-\ c' to return to the Kermit command line and use the macro 'dicecgi', instead of the usual 'connect' command, to go interactive with browsing again.
The script might be made a bit cleaner by using the '/trigger:' switch on the 'connect' command instead of setting the trigger with 'set terminal ...'. As it is, the string is cleared as soon as it triggers a return from the connect session, and is not reset until Lynx has been moved to the end of the page before reconnecting. All of this is to avoid going into an infinite loop: if the script returned directly to the page as it was before triggering, the same string would still be on the status line. There are probably many parallels with the handling of interrupts and signals.
One obvious improvement would be to have it bookmark the links it processes, so a log of what was done could easily be kept. Of course the whole thing could be automated even more; taking on chunks of 50 finds (one web page) at a time would probably be fairly easy. On the other hand, leaving things only semi-automated allowed seeing a lot of what would probably be good to have seen before trying to automate even more. Some find pages pulled up a server error message when I tried to view them. Some had no mail link. Some seem to have had mangled email addresses as far as Lynx was concerned. Some pages had the mail link as the first link on the page, and so immediately started sending a resume.
By Saturday night, I estimate about 70 responses of various kinds had shown up. Some were mangled address results. Some were automated responses from the ad placers. Some were duplicate responses. For some, the address no longer existed. Etc. A few seemed to actually be from people :-) Not too bad for a weekend.
As of this writing nothing definite, but the resume changes suggested by the local USENIX program manager will have an effect in the long run. The main thing is the resume is out there. All over, out there.
In order to bring this up to a mandatory buzzword content level, I thought of using the term 'Client Side Dynamic HTML' for the idea. The server provides its normal content, but the client is able to carry out the wishes of the user, not even having to wait for him to activate links. But the plain fact is that HTML was only peripherally involved in the process. A more accurate description is 'Automated client response'. The various servers being interacted with are all 'automated', and this simply evens things up somewhat on the other, client side of the equation.
# * <= indicates comments added for this article
# script for automating Dice.com resume submission
# * Start of the original Rexx script, here for documentation purposes
# *
# #!/usr/local/bin/rexx
# /* ReXX */
#
# /*
# http://www.dice.com/mailto.cgi?xxxxxx@xxxxx.com&tttttt.EEE1111111191
# Subject: Job EEE1111 on DICE
# */
#
#
# PARSE ARG URL '?' eddress '&' corpcode '.' jobid '&' somenumber ;
#
#
# IF 'http://www.dice.com/mailto.cgi' >< URL THEN
#   DO
#     SAY 'invalid URL ='URL'=' ;
#     exit ;
#   END ;
#
#
# /* Testing stuff:
# SAY "mailx -s 'Subject: Job "jobid" on DICE or any other appropriate job' " ,
#   || " "eddress" < /home/dallas/download/resume12x.txt "
# 'cat /home/dallas/download/resume12x.txt '
#
# For real:
# */
#
# SAY "mailx -s 'Subject: Job "jobid" on DICE or any other appropriate job' " ,
#   || " "eddress" < /home/dallas/download/resume12x.txt "
#
# EXIT ;
# * End of the Rexx script
# * Listing of what Lynx queries on when processing a 'mailto:' link:
#
#
# You are sending a comment to:
# xxxx.xxxxx@xxxxxxxxxxxxxx.com
#
# Use Ctrl-G to cancel if you do not want to send a message
#
# Please enter your name, or leave it blank to remain anonymous
# Personal Name: Dallas E. Legan II
#
# Please enter a mail address or some other
# means to contact you, if you desire a response.
# Use Control-U to erase the default.
# From: xxxxxxx@xxxxxxx.com
#
# Please enter a subject line.
# Use Control-U to erase the default.
# Subject: Job MMMM_CCCC_Mrr. on DICE
#
# Enter a mail address for a CC of your message.
# Use Control-U to erase the default.
# (Leave blank if you don't want a copy.)
# Cc: xxxxxxx@xxxxxxx.com
#
# * End of Lynx console interactions for 'mailto:'s
# * Real start of the Kermit script:
:recycle
clear input-buffer
# set terminal trigger {Using mailto:}
# set terminal trigger {www\.dice\.com/mailto\.cgi}
# set terminal trigger {www\.dice\.com/mailto\.cgi}
# set terminal trigger {http://www\.dice\.com/mailto\.cgi\?}
# set terminal trigger {www.dice.com/mailto.cgi}
set terminal trigger {ailto.cgi}
# something, escape sequences or double 'mm' in string,
# seemed to throw this trigger string off when it was
# 'mailto.cgi'
# * When working on final version of the script, went through
# * quite a few desperate tries at the trigger string
# * before realizing that Lynx was only sending a fragment
# * of the CGI script URL wrapped in ANSI terminal
# * escape sequences - just what was needed to change the displayed
# * URL on the status line.
# * Escaping the periods should normally never be needed.
# * (Like I said, this script is presented warts and all! :-) )
# set terminal trigger {mailto.cgi}
connect
# * -- Here is where the browsing session starts,
# * and when the trigger string is encountered,
# * it pops back 'up' to this Kermit command line
# * session and this script resumes execution
set terminal trigger # -- turn off the trigger
if equal {\v(trigger)} {} end 0
else output \013
# * -- If leaving the browsing session via 'C-\ c'
# * end this script, otherwise activate the link
# * that was under the cursor with a carriage return
# * Normally, the 'input' commands, such as follow from here on out
# * in this script, should be accompanied by 'if success ...' or
# * 'if failure ...', taking appropriate actions to build a robust script.
# * I skipped this on this project for several reasons:
# * 1) I was in a hurry
# * 2) The interacting entities, in this case, were two programs
# * running on my computer. When these prompts come up,
# * the CGI script has already run and fed a 'mailto:' to the
# * local browser, and it is feeding these query prompts
# * locally, not over the net. If there is a communication
# * foulup, it is indicative of a local problem.
# * 3) The 15 seconds allotted to wait for responses
# * in most cases was an eternity
# * compared with how fast things were actually happening.
# * If anything went wrong, I could just 'C-C' and abort the script,
# * and try over. This never proved necessary after things were
# * debugged. The purpose of the 'input's is to wait for strings
# * from Lynx to the 'console' (really Kermit in this case,
# * through the pseudo-terminal connecting them)
# * to keep the commands Lynx is *given* synchronized
# * with what it is in a state to receive and act on.
# *
# * Anyway, these checks would probably be a good thing to add before
# * moving to higher levels of automation.
input 15 {Personal Name:}
pause 1
output \021Dallas E. Legan II\013
# input 15 {From: xxxxxxx@xxxxxxx.com}
input 15 {From:}
input 15 {xxxxxxx.com}
output \13
input 15 {Subject:}
input 15 {on DICE}
output {\ or any position\13}
input 15 {Cc}
# input 15 {com}
# temp to test: output \021\013
output \021\013
# output \013
# Do you wish to include the original message? (y/n)
input 15 {(y/n)}
output n
# * verify that vi is started up:
input 15 {~}
pause 1
# * The next few commands, 'output ....\13'
# * can be replaced with simply 'lineout ....' :
output {:read /home/dallas/download/resume12x.txt\13}
# pause 2
pause 1
output {:wq\!\013}
# Send this comment? (y/n)
input 15 {(y/n)}
output y
# Append '/home/dallas/.lynxsig'? (y/n)
input 15 {(y/n)}
output y
input 1 qqq # pause 5
# * clearing Lynx/pty to accept a command, with an unlikely string
output \005 # -- move to the end of the web page
# * with C-E
input 1 qqq # pause 1
output \012 # -- resetting the console
# * with C-L
goto recycle
# * End of the script
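The script comments note that each 'input' should normally be followed by a success or failure check, and that the point of 'input' is to keep what Lynx is sent in step with what it is ready to receive. As a hedged illustration of that idea (in Python, not Kermit), the sketch below waits for an expected prompt with a timeout and raises an error when it does not arrive. The names wait_for and PromptTimeout are invented for this example, and the canned chunks stand in for a real pseudoterminal.

```python
import time


class PromptTimeout(Exception):
    """Raised when the expected prompt does not arrive in time."""


def wait_for(read_chunk, expected, timeout=15.0):
    """Collect output until `expected` appears or `timeout` seconds pass.

    `read_chunk` is any callable returning newly arrived text, or an
    empty string when nothing is available yet.
    """
    deadline = time.monotonic() + timeout
    seen = ""
    while time.monotonic() < deadline:
        chunk = read_chunk()
        if chunk:
            seen += chunk
            if expected in seen:
                return seen
        else:
            time.sleep(0.01)  # nothing yet; avoid spinning the CPU
    raise PromptTimeout("no %r within %s seconds" % (expected, timeout))


# Stand-in for the Lynx side: canned chunks, with the prompt split in two.
canned = iter(["Please enter your name\n", "Personal Na", "me: "])
text = wait_for(lambda: next(canned, ""), "Personal Name:", timeout=1.0)
print(repr(text))

# Failure path: the prompt never shows up, so the caller can stop cleanly.
try:
    wait_for(lambda: "", "Subject:", timeout=0.05)
except PromptTimeout as err:
    print("gave up:", err)
```

A caller would send its reply only after wait_for returns, and abort or retry when PromptTimeout is raised, which is roughly the role an 'if failure ...' after each 'input' would play in the script.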
Regards,
Dallas E. Legan II / leganii@surfree.com / dallasii@kincyb.com
Powered by......Lynx, the Internet at hyperkinetic speed.
Kermit Case Study 24 / Columbia University / kermit@kermitproject.org / 31 Jul 2002