Pull meta and title from a page using PHP

This is a simple PHP script I compiled for a little side project.

The script returns the title and metadata of a page, using PHP’s built-in ‘get_meta_tags’ function for the meta tags and a regular-expression search for the title.

I imagine it is similar to what many social bookmarking sites and content aggregators do to pull information when you register a site or submit a link. I reckon this, coupled with an RSS aggregator and the image pulling script, could form the basis of a good content aggregator (I have never made one, so I don’t know too much about this). I would love to know if Afrigator.com or Amatomu.com would be interested in pulling images along with posts (I believe Technorati does this).

Anyway, I thought this was a useful script that might come in handy.

If you have trouble reading it below, you can download the script by clicking here.

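<?php
// Suppress warnings from failed remote reads (file_get_contents(), get_meta_tags()).
error_reporting(0);
$url = "http://www.noboxmedia.com";
// Accept a URL with a whitelisted TLD, or a dotted IPv4 address, plus an optional path.
if (preg_match('#^((http|https|ftp)://)?(((www\.)?[^ ]+\.(com|org|net|edu|gov|us))|([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+))([^ ]+)?$#', $url))
{
    // ereg()/eregi() were removed in PHP 7, so the preg_* functions are used instead.
    $contents = file_get_contents($url);

    if ($contents)
    {
        // Capture the text between <title> and </title>, case-insensitively.
        if (preg_match('#<title[^>]*>([^<]*)</title[^>]*>#i', $contents, $regs))
        {
            // Collapse runs of whitespace in the title into single spaces.
            echo $title = trim(preg_replace('/\s+/', ' ', $regs[1]));

            $metaTags = get_meta_tags($url);

            // Fall back to an empty string when a page omits the tag.
            echo $keywords = $metaTags['keywords'] ?? '';

            echo $description = $metaTags['description'] ?? '';
        }
    }
}
?>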
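
The URL check above only accepts a handful of top-level domains. One possible way around that is PHP’s ‘parse_url()’ function, which splits a URL into its parts without caring what the TLD is. The following is only a rough sketch, not the script’s actual fix: the function name ‘is_fetchable_url’ is made up for illustration, and unlike the regex it insists on a scheme, on the assumption that the page cannot be fetched remotely without one.

```php
<?php
// Hypothetical helper (not part of the original script): accepts any
// http, https or ftp URL that has a host, whatever its top-level domain.
function is_fetchable_url(string $url): bool
{
    $parts = parse_url($url);

    // parse_url() returns false for seriously malformed URLs; a usable
    // remote URL also needs both a scheme and a host.
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return false;
    }

    // Only allow the same schemes the original regex allowed.
    return in_array(strtolower($parts['scheme']), ['http', 'https', 'ftp'], true);
}

var_dump(is_fetchable_url('http://www.example.cz/page')); // bool(true)
var_dump(is_fetchable_url('www.example.com'));            // bool(false), no scheme
```

Note that this only checks the shape of the URL. It does not confirm that the host exists or that the page can actually be retrieved.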

By Robin / Mar 17th, 2008 / Category:

3 Responses

  1. Useful, but there will be problem if domain name is name of some European state :) So you will have to add all world domain names or solve it with [a-z]{2-4} (not sure if it’s right expression). Anyway, this is the best version for stripping domain :) GJ


  2. @danaketh: Well spotted :-) I was going to use the ‘parse_url()’ function to try get around that. As soon as I have it nailed I’ll post an update.

  3. Hey Robin, good idea! We just need some time to do all these nifty stuff… BTW, we use Python to parse feeds, but the same could be achieved easily enough with the mighty snake!
