Free Website Scraper Using Google Sheets

I have been using this website scraper for awhile to build my startup list directory and so I wanted to share this post and hack with you.

You can extract specific information from a website and show it in your Google Sheet using some of Sheets’ special formulas.

We will be mostly be using Google Sheet’s IMPORTXML function 🙂

IMPORTXML is a very helpful function that can be used in Google Sheets to effectively crawl and scrape website data in small quantities (especially useful for grabbing titles and meta descriptions, etc.). It can be faster and more convenient that using Screaming Frog or other tools, especially if you only need to pull data for a handful of URLs. This post will show you how to use IMPORTXML with XPath to crawl website data including: metadata, Open Graph markup, Twitter Cards, canonicals and more.

01
of 06
Setting Up The IMPORTXML Formula

This is the IMPORTXML formula: =IMPORTXML(url,xpath_query)

1=IMPORTXML(url,xpath_query)

Y

The first half of the formula just indicates what URL is going to be crawled. This can be an actual URL – but it’s much easier to reference a cell in the spreadsheet and paste the URL there.

02
of 06
Crawling Metadata with IMPORTXML

The following XPath formulas will scrape some of the most commonly desired SEO data like metadata, canonical tags, and H headings. Note that you can scrape any level of H heading by replacing the “h1” with whichever heading you want to scrape (h2, h3, etc.) Title Tags: //title/text() Meta Descriptions: //meta[@name=’description’]/@content Canonical Tags: //link[@rel=’canonical’]/@href H1 Heading(s): //h1/text() H2 Heading(s): //h2/text()

12345Title Tags: //title/text()Meta Descriptions: //meta[@name=’description’]/@contentCanonical Tags: //link[@rel=’canonical’]/@hrefH1 Heading(s): //h1/text()H2 Heading(s): //h2/text()

03
of 06
Social Markup

While social markup has no immediate SEO benefit, it is very important for sites that have active audiences on social media, and implementation of social markup often falls under the umbrella of SEO because of its technical nature. The following XPath formulas will allow you to scrape Open Graph and Twitter Card markup.

Open Graph Markup

Open Graph is used by Facebook, LinkedIn and Pinterest, so all the more reason to make sure it’s implemented correctly. OG Title: //meta[@property=’og:title’]/@content OG Description: //meta[@property=’og:description’]/@content OG Type: //meta[@property=’og:type’]/@content OG URL: //meta[@property=’og:url’]/@content OG Image: //meta[@property=’og:image’]/@content OG Site Name: //meta[@property=’og:site_name’]/@content OG Locale: //meta[@property=’og:locale’]/@content

1234567OG Title: //meta[@property=’og:title’]/@contentOG Description: //meta[@property=’og:description’]/@contentOG Type: //meta[@property=’og:type’]/@contentOG URL: //meta[@property=’og:url’]/@contentOG Image: //meta[@property=’og:image’]/@contentOG Site Name: //meta[@property=’og:site_name’]/@contentOG Locale: //meta[@property=’og:locale’]/@content

Twitter Card Data

Twitter Card markup is only for….Twitter. Still important though! Twitter Title: //meta[@name=’twitter:title’]/@content Twitter Description: //meta[@name=’twitter:description’]/@content Twitter Image: //meta[@name=’twitter:image’]/@content Twitter Card Type: //meta[@name=’twitter:card’]/@content Twitter Site: //meta[@name=’twitter:site’]/@content Twitter Creator: //meta[@name=’twitter:creator’]/@content

123456Twitter Title: //meta[@name=’twitter:title’]/@contentTwitter Description: //meta[@name=’twitter:description’]/@contentTwitter Image: //meta[@name=’twitter:image’]/@contentTwitter Card Type: //meta[@name=’twitter:card’]/@contentTwitter Site: //meta[@name=’twitter:site’]/@contentTwitter Creator: //meta[@name=’twitter:creator’]/@content

04
of 06
Limits

Unfortunately, IMPORTXML & Sheets cannot be used to scrape large quantities of data at scale, or it will stop functioning. For more than a handful of URLs, it’s recommended to use a more robust program like Screaming Frog (Screaming Frog does not have a URL limit when using it in list mode).

05
of 06
Free Website Scraper Template

Free website scraper using google sheets 1 large |  Tips and Tricks On Anything But Net | google sheet, website scraper
Download the free website scraper template today 🙂

Download the template : You can see how this works firsthand by making a copy of this Free website scraper Template

06
of 06
This Post is Brought To You By

Enter sponsor title and e-mail.
Enter website URL. Sponsor title will be hyperlinked with this URL.
loading |  Tips and Tricks On Anything But Net | google sheet, website scraper
0 0 vote
Article Rating

About Maida
She is Zipsite's all around Zipsiter. Only clocks out when she can barely stand. She sings, bakes, and tries so hard to be fit. But don't judge, she can literally strong arm you no problem. She's currently involved with eCorp, Contrib and VNOC, besides being a Brazilian Jiu Jit Su practitioner.

Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments
ajax-loader
0
Would love your thoughts, please comment.x
()
x
%d bloggers like this: