How-To Geek

PHP: Get the contents of a web page, RSS feed, or XML file into a string variable

You will often need to access data that resides on another server, whether you are writing an online RSS aggregator or screen scraping for a search mechanism. PHP makes pulling this data into a string variable an extremely simple process.

You can go with the really short method:

$url = "http://www.howtogeek.com";

$str = file_get_contents($url);
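
Note that file_get_contents() returns false rather than a string when the request fails, so it is worth checking the result before using it. Here is a minimal sketch of that check; the helper name read_contents_or_null is made up for this example and is not part of PHP:

```php
<?php
// Hypothetical helper: returns the contents as a string, or null on failure.
function read_contents_or_null(string $source): ?string
{
    // The @ silences the warning PHP emits on failure;
    // the explicit false check below handles the failure instead.
    $str = @file_get_contents($source);
    return $str === false ? null : $str;
}
```

Calling read_contents_or_null($url) should then give you either the page contents or null.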

 

The only problem with that method is that some web hosts disable URL access in the file functions for security reasons (PHP's allow_url_fopen setting). You may be able to use this cURL-based workaround instead:

 

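function get_url_contents($url){
        // Fetch $url with cURL and return the response body as a string (false on failure)
        $crl = curl_init();
        $timeout = 5; // seconds to wait while connecting
        curl_setopt($crl, CURLOPT_URL, $url);
        curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
        curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
        $ret = curl_exec($crl);
        curl_close($crl);
        return $ret;
}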
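
If you want to know when the request failed, something along these lines may help. This is only a sketch: the function name get_url_contents_or_null is hypothetical, and treating every HTTP status of 400 or above as a failure (via CURLOPT_FAILONERROR) is an assumption that may not suit every site:

```php
<?php
// Hypothetical variant of get_url_contents() that returns null on failure.
function get_url_contents_or_null(string $url, int $timeout = 5): ?string
{
    $crl = curl_init();
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Assumption: HTTP status codes of 400 and above count as failures.
    curl_setopt($crl, CURLOPT_FAILONERROR, true);

    $ret = curl_exec($crl);
    if ($ret === false) {
        // curl_error() describes what went wrong (timeout, DNS failure, HTTP error, ...).
        error_log('Fetch failed for ' . $url . ': ' . curl_error($crl));
        return null;
    }
    // The handle is released when $crl goes out of scope.
    return $ret;
}
```

The caller can then test the result against null before doing anything with the string.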

You should now have the contents of the website in a string variable. Note that this doesn’t pull down supporting files such as JavaScript or CSS. You will have to parse the page further and retrieve those separately if you need the whole thing.
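
As a rough starting point for that further parsing, PHP's DOMDocument class can list the script and stylesheet URLs a page refers to. The helper name find_supporting_files is invented for this sketch, and it assumes the DOM extension is enabled:

```php
<?php
// Hypothetical helper: list the script and stylesheet URLs referenced by an HTML string.
function find_supporting_files(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; @ suppresses the parser warnings.
    @$doc->loadHTML($html);

    $urls = [];
    // External scripts: <script src="...">
    foreach ($doc->getElementsByTagName('script') as $script) {
        if ($script->hasAttribute('src')) {
            $urls[] = $script->getAttribute('src');
        }
    }
    // Stylesheets: <link rel="stylesheet" href="...">
    foreach ($doc->getElementsByTagName('link') as $link) {
        if (strtolower($link->getAttribute('rel')) === 'stylesheet' && $link->hasAttribute('href')) {
            $urls[] = $link->getAttribute('href');
        }
    }
    return $urls;
}
```

The returned URLs are often relative, so you would still need to resolve them against the page's address before fetching each one.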

Lowell Heddings, better known online as the How-To Geek, spends all his free time bringing you fresh geekery on a daily basis.

  • Published 09/25/06

Comments (25)

  1. Demetrius

    Problem with RSS Feed in WordPress.
    I have a subdomain that I installed wordpress for another blog site, but the subdomain site's rss feed points to my parent site.
    Can anyone come up with any suggestions?

  2. Davide

    Hi,

    I’ve tried your hack but I always get the same result:

    “Destination host forbidden”

    How can I solve this issue?

    Thanks.

  3. ypi prem

    Thanks a lot for your simple function. This really gives users a lot of power to reuse the internet!
    Of course, the content should be pulled with prior approval!

  4. anonymous

    i love u!

  5. vasu

    Hi

    With this code, is it possible to fetch Google News on my site?

  6. can

    hi,

    i’ve been trying to get the web page content of a forum which requires log in to display the contents. what should I do to first login and then get the content of this webpage?

    thanks alot

  7. Praveenkumar

    Hi

    How do I get particular contents from a file or web page?

  8. spiggy

    i luvs u 2 :)

  9. user_flex

    can, can you please post the answer to ur question if you have found it?

  10. user_flex

    I am trying to get the contents of a webpage that requires authentication.

  11. Myister Development

    You cannot use this to hack, lol. The first example is disabled by not just some but almost ALL hosting companies, so it is mainly good for local work.
    The second example is the key to grabbing the contents of a web page, using cURL almost like a proxy.

    After you run the code to get the page content, you have to do something with it. The code above is a good example but is incomplete for people just starting out.

    function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5;
    curl_setopt ($crl, CURLOPT_URL,$url);
    curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
    }

    echo get_url_contents($url); // This is just to display the results of the Function above

    You can even go further and scan the contents

    Pretty much what just happened is that you create a proxy good for only one look. The links will not work and scripts are disabled and blah blah blah.. You can get a hold of me if you have any questions about it.

    jgreer2009 at G Mail dot com

  12. Myister Development

    I take that back. The links do work!!!!! but you can not use it to hack.. But you can use it for a proxy! getting around school filter systems or your workplace…. Script needs a little more work but could be used as so

  13. Myister Development

    OK, I am back with a solution for seeing SOME pages that are not supposed to be seen by users.

    Here is a modified version of the script above:
    function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5;
    $useragent = "Googlebot/2.1 ( http://www.googlebot.com/bot.html)";
    curl_setopt ($crl, CURLOPT_USERAGENT, $useragent);
    curl_setopt ($crl, CURLOPT_URL,$url);
    curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
    }

    What I did was add a $useragent variable that tricks the page you are looking at into thinking the request comes from a Google bot. On SOME sites, this is a security issue. ( Poor Coding )
    Think about it! Who wants to block their site from one of the biggest traffic handlers on the internet?

    Got to Love google hacking!!!!!!!!

    More information at jgreer2009@gamil.com

    Thanks guys, hope this helps you out

  14. loeis

    very useful information.. thx

  15. max

    great! gonna try it!!

    @Myister Development: great work!

  16. Al

    I tried this code. It works with most websites but not the one in this sample. I have tried everything possible with no luck; can anyone help? I am trying to create bookmarks to the content, and I have done it with other websites but not this one.

    $options = array(
    CURLOPT_USERAGENT => "Mozilla/4.0", // who am i
    CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
    );

    $ch = curl_init();
    curl_setopt_array( $ch, $options );
    curl_setopt($ch, CURLOPT_URL, 'http://top-channel.tv/video.php');
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 30);

    $content = curl_exec($ch);
    curl_close( $ch );

  17. Peter

    Anyone know how to clean code once you fetch data? any example where to start? thanks in advance!

  18. Ban Tay So

    Thanks for the code !

  19. Anonymous

    I luvs you 3!!!!!!! :)

  20. Tony

    Thank you Myister Development. You are a lifesaver.

  21. AD

    nice

  22. Nunu

    nice info.. love it :D

  23. Gal_op

    Thank you Myister Development,

    You have said: “You can even go further and scan the contents”

    How do i scan and show specific or ?

    Thanks,

  24. ClulessDude

    where do i put this code?

  25. saroj

    Dear admin, really cool trick, i like it. How can i access another website’s web form via my website’s web form?? I mean the way they process their form, i want to make result in the same way through my website. Please help me
