How-To Geek

Stupid Geek Tricks: Extract Links Off Any Webpage Using PowerShell

PowerShell 3 has a lot of new features, including some powerful new web-related ones. They dramatically simplify automating the web, and today we are going to show you how to extract every single link from a webpage and, optionally, download the resource it points to.

Scraping The Web With PowerShell

There are two new cmdlets that make automating the web easier: Invoke-WebRequest, which makes parsing human-readable content easier, and Invoke-RestMethod, which makes machine-readable content easier to work with. Since links are part of a page's HTML, they count as human-readable content. All you have to do to get a webpage is call Invoke-WebRequest and give it a URL.

Invoke-WebRequest -Uri 'http://howtogeek.com'
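
Both web cmdlets only exist in PowerShell 3 and later, so it can be worth checking the session before running the command above. The following is a minimal sketch; the function name Test-WebCmdletSupport is made up for illustration and is not part of PowerShell.

```powershell
# Hypothetical helper: reports whether this session can run the web cmdlets.
function Test-WebCmdletSupport {
    # $PSVersionTable is built in; PSVersion.Major is 3 or higher on PowerShell 3+.
    $versionOk = $PSVersionTable.PSVersion.Major -ge 3

    # With SilentlyContinue, Get-Command returns nothing when the cmdlet is missing.
    $cmdlet = Get-Command -Name Invoke-WebRequest -ErrorAction SilentlyContinue

    return ($versionOk -and ($null -ne $cmdlet))
}

if (Test-WebCmdletSupport) {
    'Invoke-WebRequest is available in this session.'
} else {
    'Invoke-WebRequest is missing; install PowerShell 3 or later.'
}
```

If the check fails, the "is not recognized as the name of a cmdlet" error is what you would see when calling Invoke-WebRequest directly.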

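If you scroll down you will see that the response has a Links property. We can wrap the command in parentheses to pull out just that property, and PowerShell 3's new member enumeration feature will later let us read a property from every link at once.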

(Invoke-WebRequest -Uri 'http://howtogeek.com').Links
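
Member enumeration is easiest to see on a small, made-up collection. The sketch below uses hypothetical stand-in objects rather than real scraped links (the example.com URLs and the innerText values are assumptions), so it runs without touching the network.

```powershell
# Made-up stand-ins for link objects; real ones come from (Invoke-WebRequest ...).Links.
$links = @(
    [pscustomobject]@{ href = 'http://example.com/a'; innerText = 'First' }
    [pscustomobject]@{ href = 'http://example.com/b'; innerText = 'Second' }
)

# The array itself has no href property, so PowerShell 3+ reads it from each element.
$hrefs = $links.href

# Equivalent pipeline form, the style that was needed before PowerShell 3.
$hrefsOldStyle = $links | ForEach-Object { $_.href }

$hrefs
```

Both variables end up holding the same two URLs; the dotted form is simply shorter to type at the prompt.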

As you can see, you get a lot of links back. This is where you need to use your imagination to find something unique that picks out the links you are looking for. Let's suppose we want a list of all the articles on the front page.

((Invoke-WebRequest -Uri 'http://howtogeek.com').Links | Where-Object {$_.href -like "http*"} | Where class -eq "title").Title
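
To see what each stage of that pipeline does, here is a small offline sketch of the same two filters. The link objects, their class names, and their titles are invented for illustration; a real page will have its own markup to filter on.

```powershell
# Made-up stand-ins for scraped links; the class and title values are assumptions.
$links = @(
    [pscustomobject]@{ href = 'http://example.com/post-1'; 'class' = 'title'; title = 'Post One' }
    [pscustomobject]@{ href = '/about';                    'class' = 'nav';   title = 'About' }
    [pscustomobject]@{ href = 'http://example.com/post-2'; 'class' = 'title'; title = 'Post Two' }
    [pscustomobject]@{ href = 'http://example.com/login';  'class' = 'nav';   title = 'Log In' }
)

# Script-block form: keep only links whose href starts with "http".
$absolute = $links | Where-Object { $_.href -like 'http*' }

# Simplified PowerShell 3 form: keep only links whose class is exactly "title".
$articles = $absolute | Where-Object class -eq 'title'

# Member enumeration pulls the title off every remaining link.
$articles.title
```

The first filter drops the relative /about link, the second drops the navigation links, and what remains are the two article titles.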

Another great thing you can do with the new cmdlets is automate everyday downloads. Let's look at automatically grabbing the image of the day from the Nat Geo website. To do this, we will combine the new web cmdlets with Start-BitsTransfer.

$IOTD = ((Invoke-WebRequest -Uri 'http://photography.nationalgeographic.com/photography/photo-of-the-day/').Links | Where innerHTML -like "*Download Wallpaper*").href
Start-BitsTransfer -Source $IOTD -Destination C:\IOTD\
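
One thing to watch for when downloading scraped links is that an href is not always a full URL; pages often use relative paths. The sketch below shows one way to normalize an href before handing it to a downloader. Resolve-LinkUri is a hypothetical helper name, and the example.com URLs are placeholders, not values from the Nat Geo page.

```powershell
# Hypothetical helper: turns a scraped href into an absolute URL string.
function Resolve-LinkUri {
    param(
        [Parameter(Mandatory = $true)][string]$BaseUrl,  # page the link was found on
        [Parameter(Mandatory = $true)][string]$Href      # raw href value from the page
    )

    # System.Uri's (baseUri, relativeUri) constructor resolves relative references
    # and uses an already absolute href unchanged.
    $base = New-Object -TypeName System.Uri -ArgumentList $BaseUrl
    $resolved = New-Object -TypeName System.Uri -ArgumentList $base, $Href
    return $resolved.AbsoluteUri
}

# Example values only; neither URL comes from the article.
Resolve-LinkUri -BaseUrl 'http://example.com/photos/today/' -Href '/images/wallpaper.jpg'
```

The returned string could then be passed as the -Source argument of Start-BitsTransfer. Because both parameters are mandatory strings, the helper also fails early if the filter matched nothing and the href came back empty.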

That’s all there is to it. Have any neat tricks of your own? Let us know in the comments.

Taylor Gibb is a Microsoft MVP and all-round geek who loves everything from Windows 8 to Windows Server 2012, and even C# and PowerShell. You can also follow him on Google+.

  • Published 11/22/12

Comments (7)

  1. dima

    PowerShell is an incredibly powerful tool.

  2. Subhash Debnath

    I tried it in Windows 7 Ultimate with both versions of PowerShell, but it didn't work.
    Would you please tell me where I went wrong? It gives me the message below:

    PS C:\Users\User> Invoke-WebRequest -Uri `http://howtogeek.com’
    The term ‘Invoke-WebRequest’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
    At line:1 char:18
    + Invoke-WebRequest <<<< -Uri `http://howtogeek.com'
    + CategoryInfo : ObjectNotFound: (Invoke-WebRequest:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

  3. Sam J

    Really great article!
    I use it to get a listing of all of the HTML input fields on the screen. However, if Javascript is used to add a field, it is not included. Also, Invoke-WebRequest doesn’t get the final HTML page if the page is redirected to another URL.
    Is there any way to get the final HTML – after any redirection and Javascript fields have been applied?
    Thanks!

  4. Barton

    Taylor, what is the transparent Windows theme you are using in this tutorial?
    Where did you get it from?

    Thanks in advance!

  5. Taylor Gibb

    @Barton it's actually the default theme in Windows 8 :)

  6. Taylor Gibb

    @everybody else, this requires PowerShell 3. If you are getting errors that the cmdlet doesn't exist, download the latest version of PoSH.

  7. Barton

    @Taylor: How do I enable it? I have just the one with a non-transparent background & thicker outline (Win8 RTM). I think this one was part of Win8 RP, am I right? Sorry for spamming the discussion…
