Circuits Of Imagination

Class Extractor

01 Sep 2007

Ok so I wanted to do a little something in perl because it’s been a while since I’ve written anything in it. Since I still have the code from my myspace profile I decided to write a little script to extract the css classes from it. Well here it is:

#use strict ;
use HTML::Parser ;

sub handle_start{
  shift ;
  #print $tag."\n" ;  
  my @attrs = @_ ;
  for (@attrs) {
    my %hash = %$_ ;
    #print "\t".%hash."\n" ;
     for(keys %hash){
       if(m/^class$/){
  $class = $hash{$_} ;
  if(defined $class_count{$class}){
    $class_count{$class} += 1;
  }
  else{
    $class_count{$class} = 0;
  }
       }
     }
  }
}

$parser = HTML::Parser->new ;
$parser->handler(start=>\&handle_start,  'tag, attr') ;
while(){
  $parser->parse($_) ;
}

for(keys %class_count){
  print $_." : ".$class_count{$_}."\n" ;
}

It seems to work pretty well. I tried to write the same script in python but it’shtml parser it too uptight or myspace so…. I fell kinda guilty because I like python more than perl but sometimes perl’s just kinda cool to play with.

Here is what I got when I ran my unedited profile page throught it.

nametext : 0
redtext : 1
bodyContent : 0
txt : 0
redlink : 0
redbtext : 0
extendedNetwork : 0
contactTable : 0
tdborder : 2
text : 28
btext : 1
navbar : 14
friendSpace : 0
blacktext12 : 0
text tdborder : 1
whitetext12 : 1
commentlinks : 1
navigationBar : 0
userProfileDetail : 0
friendsComments : 0
blacktext10 : 1
userProfileURL : 0
lightbluetext8 : 8
latestBlogEntry : 0
profileInfo : 0
blurbs : 0
orangetext15 : 4
ImgOnlineNow : 1

So I hope that a list of all the classes in myspace is usefull. I think a little more usefull is the a script that extracts classes and count from a web page