Use PHP DOM Parser for more robust screen scraping

I’d just like to put this out there, as I just “failed” a “do-at-home” interview assignment which was to implement a screen scraper using Java/PHP. I had previously (1-2 years ago) written screen scrapers in PHP, so I approached this assignment the same way: using regexes. Little did I know that using regexes would be one of the weak points of my submission; they wanted me to use a DOM parser instead. In hindsight, I guess I should have looked into that, but it never occurred to me because I had used other methods in the past.

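So the moral of the story is to use DOM parsers when writing screen scrapers. They should be more robust than regex parsing in most cases, because they work on the document's tree of elements and attributes rather than on the exact text of the markup, so changes like reordered attributes or extra whitespace don't break them. Here is an example tutorial.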
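To make the difference concrete, here's a minimal sketch in Java (one of the two languages the assignment allowed) that pulls link targets out of the same page twice: once with a naive regex and once with the JDK's built-in DOM parser. The class name `DomScrapeDemo` and the sample markup are hypothetical, made up for illustration. It also assumes the page is well-formed XHTML, because the JDK's XML parser rejects typical real-world tag soup; for messy HTML you'd want a lenient parser such as PHP's `DOMDocument::loadHTML` or the jsoup library in Java.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class; assumes the input is well-formed XHTML.
public class DomScrapeDemo {

    // Regex approach: silently assumes href is the first attribute
    // and that exactly one space separates "<a" from "href".
    static List<String> regexLinks(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = Pattern.compile("<a href=\"([^\"]*)\">").matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    // DOM approach: parse the markup into a tree, then ask for every <a> element
    // and read its href attribute, regardless of attribute order or whitespace.
    static List<String> domLinks(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList anchors = doc.getElementsByTagName("a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            hrefs.add(a.getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Three links: a plain one, one with an extra attribute first,
        // and one whose tag is split across a line break.
        String page = "<html><body>"
                + "<a href=\"/one\">One</a>"
                + "<a class=\"nav\" href=\"/two\">Two</a>"
                + "<a\n   href=\"/three\">Three</a>"
                + "</body></html>";
        System.out.println("regex: " + regexLinks(page));
        System.out.println("dom:   " + domLinks(page));
    }
}
```

Running `main` should print `regex: [/one]` followed by `dom:   [/one, /two, /three]`. The regex misses the link with a reordered attribute and the one whose tag spans a line break, while the DOM query finds all three, which is exactly the kind of fragility a reviewer would pick up on.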
