
Web Robot

We need a web robot to grab specific information from the web.

We believe that web robots can be classified as follows:

  • browser / web grabber
  • web robot
  • intelligent web robot

Sample Code for Browser / Web Grabber

A browser / web grabber takes a single HTTP page URL/HREF?PARAMS. The module browses through the content and grabs some information. To get the HTML content we

#!/usr/bin/perl -w

use strict;

# We have some input text content
my $input= "Ralf Schaer";

# This will be our text content output
my $output= "";

#
# Fork a child process: open() with "-|" connects the child's STDOUT
# to the FROM handle, so the parent can read what the child prints
#
my $pid= open(FROM, "-|");
die "FATAL ERROR: Can't open the pipe FROM $! \n" unless defined($pid);

if ( $pid ) {
   # Parent: collect everything the child writes to its STDOUT
   while ( <FROM> ) {
      chomp($_);
      $output.= $_;
   }
}
else {
   #
   # Child: we pipe the input content into a sed pipe chain
   #
   open(PARSER, "| sed -n -f my.sed | sed -n -f second.sed")
      or die "FATAL ERROR: Can't open the pipe PARSER $! \n";
   print PARSER $input;
   close(PARSER);
   exit(0);   # the child is done; 0 signals success to the parent
}

close(FROM);

# We check the parsed content
#
print "The output is $output\n";

1;
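
The sample above starts from text that is already held in $input. As a minimal sketch of the earlier step, fetching the HTML content of a single URL and grabbing some information from it, the following may help. It assumes the core module HTTP::Tiny is available (Perl 5.14 or later). The helper names fetch_html and grab_hrefs are hypothetical and not part of the original sample, and choosing link targets as the information to grab is only an illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch the HTML content of a single URL.
# Returns the response body as a string, or dies on failure.
sub fetch_html {
    my ($url) = @_;
    my $response = HTTP::Tiny->new( timeout => 10 )->get($url);
    die "FATAL ERROR: Can't fetch $url: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}

# Hypothetical helper: grab the HREF value of every anchor tag.
# A regular expression is enough for a sketch; a real grabber
# would be better served by a proper HTML parser.
sub grab_hrefs {
    my ($html) = @_;
    my @hrefs = $html =~ /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
    return @hrefs;
}

# Only touch the network when a URL is given on the command line.
if ( @ARGV ) {
    print "$_\n" for grab_hrefs( fetch_html( $ARGV[0] ) );
}
```

Saved under a name of your choice and called with one URL as its argument, the script prints one link target per line. The fetched content could equally be assigned to $input and sent through the sed pipe chain shown above.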

-- TWikiAdminUser - 2009-11-12

Topic revision: r3 - 2009-11-17 - 20:50:01 - TWikiAdminUser
 
