How-To Geek
PHP: Get the contents of a web page, RSS feed, or XML file into a string variable
You will often need to access data that resides on another server, whether you are writing an online RSS aggregator or screen scraping for a search tool. PHP makes pulling this data into a string variable a simple process.
You can go with the really short method:
$url = "http://www.howtogeek.com";
$str = file_get_contents($url);
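A bare file_get_contents() call returns false on failure and gives you no say over how long it waits. The sketch below is one way to handle both, assuming allow_url_fopen is enabled; the helper name fetch_with_timeout and the five-second default are illustrative assumptions, not part of PHP or of the original code.

```php
<?php
// Hypothetical helper (not a PHP built-in): fetch a URL or local path with a
// read timeout, returning null instead of false when the fetch fails.
function fetch_with_timeout($url, $timeout = 5)
{
    // The 'http' context options are used for http:// and https:// URLs;
    // other wrappers (such as local files) simply ignore them.
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout),
    ));

    // @ hides the warning PHP raises on failure; the return value is checked instead.
    $str = @file_get_contents($url, false, $context);

    return ($str === false) ? null : $str;
}
```

Call it as fetch_with_timeout($url) and compare the result with null before using it, since an empty page and a failed fetch are different things.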
The only problem with that method is that some web hosts disable URL access in the file functions for security reasons (the allow_url_fopen setting). You may be able to use this cURL workaround instead:
function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5; // seconds to wait while connecting
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl); // false if the request failed
    curl_close($crl);
    return $ret;
}
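The function above returns false when anything goes wrong, without saying why. A minimal sketch of a variant that also reports the cURL error message and the HTTP status code follows; the name get_url_contents_checked, the array it returns, and the overall transfer timeout are assumptions made for illustration.

```php
<?php
// Hypothetical variant of get_url_contents() that reports why a fetch failed.
// Returns array('body' => string|null, 'status' => int, 'error' => string).
function get_url_contents_checked($url, $timeout = 5)
{
    $crl = curl_init();
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, true);     // return the body as a string
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds allowed for connecting
    curl_setopt($crl, CURLOPT_TIMEOUT, $timeout * 2);    // assumed cap on the whole transfer

    $body   = curl_exec($crl);                               // false on failure
    $error  = curl_error($crl);                              // empty string when nothing went wrong
    $status = (int) curl_getinfo($crl, CURLINFO_HTTP_CODE);  // 0 if no HTTP response arrived

    // The handle is released when $crl goes out of scope at the end of the function.
    return array(
        'body'   => ($body === false) ? null : $body,
        'status' => $status,
        'error'  => $error,
    );
}
```

A caller would typically treat a null body or a status of 400 and above as a failure and log the error string.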
You should now have the contents of the page in a string variable. Note that this doesn’t pull down supporting files such as JavaScript or CSS. You will have to parse the page further and retrieve those separately if you need the whole thing.
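To find those supporting files, one option is to parse the fetched string with PHP's DOM extension and collect the URLs that scripts, stylesheets, and images point at. This is a rough sketch, assuming the DOM extension is available; find_supporting_files is a hypothetical helper name.

```php
<?php
// Hypothetical helper: list the URLs of scripts, stylesheets, and images
// referenced by an HTML string, exactly as they are written in the markup.
function find_supporting_files($html)
{
    if (trim($html) === '') {
        return array(); // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    // tag name => attribute holding the file's URL
    $sources = array('script' => 'src', 'link' => 'href', 'img' => 'src');
    foreach ($sources as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // Only <link rel="stylesheet"> tags point at CSS files.
            if ($tag === 'link' && strtolower($node->getAttribute('rel')) !== 'stylesheet') {
                continue;
            }
            $url = $node->getAttribute($attribute); // '' when the attribute is missing
            if ($url !== '') {
                $urls[] = $url;
            }
        }
    }
    return array_values(array_unique($urls));
}
```

The URLs come back as written in the page, so relative ones still need to be resolved against the page's own address before you fetch them.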
Comments (25)
Programmer by day, geek by night, The Geek, also known as Lowell Heddings, spends all his free time bringing you fresh geekery on a daily basis. You can follow him on Google+ if you'd like.
- Published 09/25/06




Problem with RSS Feed in WordPress.
I have a subdomain where I installed WordPress for another blog site, but the subdomain site's RSS feed points to my parent site.
Can anyone come up with any suggestions?
Hi,
I’ve tried your hack but I always get the same result:
“Destination host forbidden”
How can I solve this issue?
Thanks.
Thanks a lot for your simple function. This really gives a lot of power to user to reuse the internet!
Of course the content should be pulled with prior approval!
i love u!
Hi
From this code, is it possible for me to fetch Google News on my site?
hi,
I’ve been trying to get the web page content of a forum which requires a login to display the contents. What should I do to log in first and then get the content of this web page?
Thanks a lot
Hi
How do I get particular contents from a file or web page?
i luvs u 2 :)
can, can you please post the answer to ur question if you have found it?
I am trying to get the contents of a web page that requires authentication.
You cannot use this to hack, lol. The first example is disabled by not just some but almost ALL hosting companies, so it is mostly good for local work.
The second example is the key to grabbing the contents of a web page; it uses cURL almost like a proxy.
After you run the code to get the page content, you have to do something with it. The code above is a good example but is incomplete for people just starting out.
function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5;
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
}
echo get_url_contents($url); // This is just to display the results of the function above
You can even go further and scan the contents
Pretty much what just happened is that you created a proxy good for only one look. The links will not work, scripts are disabled, and blah blah blah. You can get hold of me if you have any questions about it.
jgreer2009 at G Mail dot com
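As a rough illustration of the "scan the contents" idea mentioned above, the sketch below pulls the page title and the link targets out of a fetched HTML string using DOMXPath. It assumes the DOM extension is available; scan_page and the shape of its return value are hypothetical, not something the commenter posted.

```php
<?php
// Hypothetical helper: extract the page title and link targets from fetched HTML.
function scan_page($html)
{
    $result = array('title' => null, 'links' => array());
    if (trim($html) === '') {
        return $result; // nothing to scan
    }

    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // First <title> element, if the page has one.
    $titles = $xpath->query('//title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    // Every <a> element that actually carries an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $result['links'][] = $anchor->getAttribute('href');
    }
    return $result;
}
```

You would pass it the string returned by get_url_contents($url) and then loop over the 'links' entry, or swap in a different XPath expression for whatever part of the page you are after.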
I take that back. The links do work! But you cannot use it to hack. You can use it as a proxy, though, for getting around filter systems at school or at your workplace. The script needs a little more work but could be used that way.
OK, I am back with a solution for seeing SOME pages that are not supposed to be seen by users.
Here is a modified version of the script above:
function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5;
    $useragent = "Googlebot/2.1 ( http://www.googlebot.com/bot.html)";
    curl_setopt($crl, CURLOPT_USERAGENT, $useragent); // identify the request as Googlebot
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
}
What I did was add a $useragent variable, which tricks the page you are looking at into thinking the request comes from a Google bot. On SOME sites this is a security issue (poor coding).
Think about it! Who wants to block their site from one of the biggest traffic handlers on the internet?
Got to love Google hacking!
More information at jgreer2009@gamil.com
Thanks guys, hope this helps you out.
Very useful information.. thx
great! gonna try it!!
@Myister Development: great work!
tried this code:
It works with most websites but not the one in this sample. I have tried everything possible with no luck; can anyone help? I am trying to create bookmarks to the content, and I have done it for other websites but not this one.
$options = array(
    CURLOPT_USERAGENT => "Mozilla/4.0", // who am i
    CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
    CURLOPT_RETURNTRANSFER => true, // return the page instead of printing it
);
$ch = curl_init();
curl_setopt_array($ch, $options);
curl_setopt($ch, CURLOPT_URL, 'http://top-channel.tv/video.php');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); // overrides the 120 set above
$content = curl_exec($ch);
curl_close($ch);
Does anyone know how to clean up the code once you fetch the data? Any example of where to start? Thanks in advance!
Thanks for the code !
I luvs you 3!!!!!!! :)
Thank you Myister Development. You are a lifesaver.
nice
nice info.. love it :D
Thank you Myister Development,
You have said: “You can even go further and scan the contents”
How do i scan and show specific or ?
Thanks,
Where do I put this code?
Dear admin, really cool trick, I like it. How can I access another website’s web form via my own website’s web form? I mean, I want my site to process the form and produce results the same way theirs does. Please help me.