Scrapebox Automator Guide – Easily Scrape 24/7

In this article I'm going to give you the exact method I use to completely automate all my scraping. A couple of minutes of setup and you won't have to touch Scrapebox for a month while it scrapes millions of links each day! If you haven't already, I recommend that you read my last article about how to get the most out of Scrapebox before trying to implement this method.

Meet Scrapebox Automator

Automator is one of the official paid plugins for Scrapebox.

As its name suggests, it gives you the ability to automate different tasks in Scrapebox by creating "jobs".


In order to use Automator with any degree of success, you need to be able to scrape reliably.

That means you need to be able to scrape with as few errors as possible for long periods of time.

How would you go about this?

The short answer is...use private proxies and scrape from Bing.

Why?

If you want the long answer then go check out my last article Scrapebox Scraping Tutorial – Easy 56 Million Links / Day.

We won't go over each feature in detail in this article since loopline from Scrapebox has already made a video explaining everything about Automator.

Also, while the Automator can work with most of the tasks in Scrapebox, in this article we will be focusing exclusively on the harvesting part.

How to Use Automator Properly

There's not much setup needed, but there are a couple of things to keep in mind while using it.

The most important thing you should do is create a folder for your Automator. I personally put it in the root of C on my VPS and recommend that you do the same:

Automator Folder

The second most important thing is...don't use public proxies!

I've already talked about this but the tl;dr version is: public proxies suck.

They won't live long enough for you to complete a single scraping run. And if Scrapebox gets stuck on the first run without working proxies, the rest of your scrape naturally won't work.

Example Automator Job #1

So you can create a pretty simple job for scraping that does the following:

Scrapebox Automator Example Job 1

Pretty cool, right?

This is just a basic example but you get the idea. And you can do so much more...

Right now the job is handy but it will only do a single scraping run and then stop.

So let's expand it a bit...

Example Automator Job #2

Let's create a more automated version of the previous job by adding a loop at the end:

Example Job Expanded

Now we're getting somewhere. This job can run indefinitely and save the scraped list to a new file each time it runs.

But we still have a problem. It can't import a different list of keywords each time it runs.

How Not to Do Jobs

The solution that we might think of at first is to copy the steps 1-4 multiple times, each time importing a different keyword file and then loop all that:

How Not To Do Automator Jobs

Now you have a job that runs a couple of different scrapes but it's still not what we're looking for.

What we want here is a true set-and-forget job. One that you can set up in minutes and walk away from for weeks (even months), knowing that everything still works.

How to Make Scrapebox TRULY Automated

Our example jobs are missing 1 crucial thing in order to make them properly automated.

They're missing a way to automatically use different keywords on each job run, no matter how many times the job runs.

This is a problem that requires some batch scripting. And don't worry if you've never done any batch scripting, I'm going to show you the code needed and explain exactly what it does.

The idea is that you should be able to use the same keyword file in the job but swap the keywords in that file before each run using the script.

Easy 5-min Setup

Become a SEOSpartan today and you instantly get access to:

  • ALL of the files mentioned in this tutorial
  • Quickstart guide that will get this set up for you in a couple of minutes
  • PDF version of this post for easy reference
  • + all the other features that come with a SEOSpartan account...

Step 1 - Organizing Your Automator Files

Remember when I told you to create an "Automator" folder in C drive? It's kinda a big deal.

You see, we need to keep all the files we need in that folder so our file paths (which we use when setting up a job) always remain the same.

Here's how you should set up this folder:

Automator Folder File Structure

Folders:

  • Jobs - Folder for Automator job files
  • Keywords - Folder where all the keywords are stored (very important, we'll talk about this in a bit)
  • Scraped - Folder where all the scraped results go

Files:

  • footprints.txt - A list of footprints to merge with keywords (used in each run)
  • keywords.txt - List of keywords used in CURRENT scraping run (no need to modify this - it's automatically modified before each run)
  • log.txt - Simple log for keeping track of when / which keyword files are being used
  • proxies.txt - List of all your proxies (ip:port:username:pass)
  • SetKeywords.bat - Batch script that updates keywords.txt before each run (very important)
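If you'd rather not create all of this by hand, the layout above can be sketched in a few lines of Python. This is only an illustration: the function name `create_automator_layout` is made up for this example, the placeholder files are created empty, and SetKeywords.bat is deliberately left out because it has to be the real script. The USED subfolder inside Keywords is the one described in Step 2.

```python
from pathlib import Path

# Folder and file names taken from the layout described above.
FOLDERS = ["Jobs", "Keywords", "Keywords/USED", "Scraped"]
FILES = ["footprints.txt", "keywords.txt", "log.txt", "proxies.txt"]


def create_automator_layout(root):
    """Create the Automator folders and empty placeholder files.

    Existing folders and files are left untouched, so it is safe to re-run.
    SetKeywords.bat is not created here; copy the real script in yourself.
    """
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_automator_layout(r"C:\Automator")` would build the structure in the root of C, which is where I keep mine. After that you still need to fill in footprints.txt and proxies.txt yourself.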

Step 2 - The Keywords

After we set up our Automator folder properly, we need to set up our Keywords folder as well (don't worry, it's stupid easy to do).

What we want to put here is many small .txt files with keywords (unlike everything else in Automator folder, it doesn't matter what they're called).

What matters is that you have many of them. The more you have, the longer your Scrapebox will be able to scrape without you having to touch it.

1 scraping run = 1 keyword file.

I recommend that you keep these files small (500-5,000 keywords each) to minimize the risk of crashing Scrapebox. This shouldn't be a problem on dedicated servers running 64-bit Scrapebox, but the number of keywords per file won't affect your speed, so you might as well keep it low to maximize reliability.

Also, don't put more than 30,000 files in here for performance reasons. Just for reference, I run with about 1,000 and it lasts for more than a month.
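If all you have is one huge keyword list, you can chop it into small files yourself. Here's a minimal Python sketch of that idea. The function name `split_keywords`, the `keywords_0001.txt` naming pattern and the default of 1,000 keywords per file are my own choices for the example (the file names don't matter anyway).

```python
from pathlib import Path


def split_keywords(source_file, keywords_dir, chunk_size=1000):
    """Split one big keyword list into small numbered .txt files.

    Blank lines are skipped. Returns the list of files written.
    """
    lines = Path(source_file).read_text(encoding="utf-8").splitlines()
    keywords = [line.strip() for line in lines if line.strip()]

    out_dir = Path(keywords_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for start in range(0, len(keywords), chunk_size):
        chunk = keywords[start:start + chunk_size]
        out_file = out_dir / f"keywords_{start // chunk_size + 1:04d}.txt"
        out_file.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        written.append(out_file)
    return written
```

For example, a list of 2,500 keywords with `chunk_size=1000` ends up as three files: two with 1,000 keywords and one with 500. Since 1 scraping run = 1 keyword file, that's three runs.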

To get these keywords you have 2 options:

It's PERFECT for this as it gives you access to over 1,800 niche-based keyword files, organized in 25 major niches / categories for a total of over 1.3 mil unique level 1 keywords.

Just for reference, I run about 950 files with 10 footprints at a time which is more than 30 days of completely automated scraping.

You just copy the files in your niche (or all of them if you want) and you're good to go:

How To Organize Your Keywords Folder

And the last thing you need to do here is to create an empty folder called "USED" like so:

Add Empty USED Folder

This is where all the keywords you already used will go...but we will get to that in a minute when we talk about the script.

Step 3 - The Secret Automation Ingredient

The little thing that makes all this tick is the SetKeywords.bat script.

The script looks like this:

SetKeywords Batch Script

If you find this daunting, don't worry, here's what the script does in layman's terms:

  1. It deletes the keywords.txt file
  2. It chooses a random file within Keywords folder
  3. It creates the keywords.txt from that file
  4. It moves the chosen file to the USED folder
  5. It adds a line to log.txt noting the time & filename of the chosen file

The script won't work if it's not in the root of Automator folder OR if you change the names of ANY of the files / folders (except keyword files in Keywords folder - these can be called whatever you want).
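To make those five steps easier to follow, here they are again as a short Python sketch. To be clear, this is NOT the SetKeywords.bat script itself (that one is a Windows batch file), just the same logic written out in a readable form. The function name `set_keywords` and the log line format are assumptions for the example, and unlike the batch script this sketch stops with an error when the Keywords folder is empty.

```python
import random
import shutil
from datetime import datetime
from pathlib import Path


def set_keywords(root):
    """Swap keywords.txt for a random unused keyword file (illustrative only)."""
    root = Path(root)
    keywords_dir = root / "Keywords"
    used_dir = keywords_dir / "USED"
    target = root / "keywords.txt"
    log_file = root / "log.txt"

    # 1. Delete the keywords.txt left over from the previous run.
    if target.exists():
        target.unlink()

    # 2. Choose a random file in the Keywords folder.
    #    glob() does not look inside USED, so used files are never re-picked.
    candidates = sorted(keywords_dir.glob("*.txt"))
    if not candidates:
        raise FileNotFoundError("No unused keyword files left in " + str(keywords_dir))
    chosen = random.choice(candidates)

    # 3. Create keywords.txt from the chosen file.
    shutil.copyfile(chosen, target)

    # 4. Move the chosen file to the USED folder.
    used_dir.mkdir(exist_ok=True)
    shutil.move(str(chosen), str(used_dir / chosen.name))

    # 5. Log the time and the name of the chosen file.
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{datetime.now():%Y-%m-%d %H:%M:%S} {chosen.name}\n")
    return chosen.name
```

The important design point is step 4: because every chosen file is moved out of the way, the job can call the same script on every loop and still get fresh keywords each time, until the folder runs dry.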

The next part is the part that ties it all together.

Step 4 - The Job

Here it is. The job that uses all that we've done so far to turn Scrapebox 100% automated.

You can set up your footprints, add a thousand keyword files, start this job and leave for a good couple of weeks, no problem.

Here's what the job looks like:

1. We run our SetKeywords script in order to prepare keywords.txt:

Automator Job - Execute Script

2. We clear all URLs & Keywords from past runs
3. We load our proxies from proxies.txt:

Automator Job - Load Proxies

4. This is where we set the search engine to scrape, footprint and keyword files we'll be using and the file in which to save the results after each run:

Automator Job - Harvest

Couple of things to note here:

  • Make sure to click "Platform footprints: None". If you wish to use Scrapebox's platform footprints instead of your own - just select the platforms you want in the list, uncheck the "Enable merging of footprints" checkbox and delete the footprints.txt path
  • If your keyword files are already merged with footprints -  uncheck "enable merging of footprints" and delete the footprints path.
  • Make sure that the "Add timestamp" is checked. If not, Scrapebox will overwrite the results file after each run.
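In case "merging footprints" is a new idea for you: every footprint gets combined with every keyword, so the number of search queries per run is footprints × keywords. Here's a tiny Python sketch of the concept. The `footprint keyword` format with a single space in between is an assumption for illustration, not a statement about what Scrapebox builds internally, and `merge_footprints` is a made-up name.

```python
def merge_footprints(footprints, keywords):
    """Combine every footprint with every keyword into one search query each."""
    return [f"{footprint} {keyword}" for footprint in footprints for keyword in keywords]


footprints = ['"powered by wordpress"', '"leave a comment"']
keywords = ["dog training", "cat food", "fish tanks"]
queries = merge_footprints(footprints, keywords)
# 2 footprints x 3 keywords = 6 queries
```

This is also why small keyword files are enough: with 10 footprints, a file of just 1,000 keywords already turns into 10,000 queries for a single run.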

5. We loop this for as long as there are keywords in Keywords folder (after the last keyword file has been used, your Scrapebox will give you an error):

Automator Job - Loop

Tips / Notes

Since we're using "Harvest urls" to export our list (rather than using an export function which doesn't support adding timestamps to files), you should enable "Auto remove duplicate urls..." option:

Scrapebox Options - Auto Remove Duplicates

While this will delete duplicates from each individual file after each run, you will still have a lot of duplicates across all lists from different scraping runs.

One way to handle this (automatically) is to use GSA Platform Identifier: set up a "delete duplicates" project and set it to automatically remove duplicates from the "Scraped" folder every day or so.
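If you just want to see what removing duplicates across files means, here's a rough Python sketch of the idea. It's an illustration only, not a replacement for the tool: it assumes the results are plain .txt files with one URL per line, the name `dedupe_across_files` is made up, and it keeps every unique URL in memory, so it won't cope with truly massive folders. Don't run something like this while Scrapebox is still writing to the folder.

```python
from pathlib import Path


def dedupe_across_files(scraped_dir):
    """Remove URLs that already appeared in an earlier file (or earlier in the same file).

    Files are processed in name order and rewritten in place.
    Returns the number of duplicate lines removed.
    """
    seen = set()
    removed = 0
    for path in sorted(Path(scraped_dir).glob("*.txt")):
        kept = []
        for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
            url = line.strip()
            if not url:
                continue
            if url in seen:
                removed += 1
            else:
                seen.add(url)
                kept.append(url)
        path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return removed
```

The first file that contains a URL keeps it, and every later file loses it, so the total across the folder is one copy of each URL.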

Conclusion

That's it! You now have everything you need to make Scrapebox scrape over a BILLION links for you with minimal babysitting required.

If you're already a SEOSpartan and want to download the whole Automator folder ready-to-go with a quickstart checklist, please log in and go to "SEOSpartan Goodies". If you're not a SEOSpartan, you can become a member today for free.

If I didn't explain something properly or if you have any questions, please post them in the comments and I'll answer them as soon as possible.

6 Comments

  1. This is awesome.

    For anyone that spends money buying lists, combined with your previous tutorial, they’ve now got all they need to build their own.

    Many thanks.

    • John

      Yup, that’s automated scraping for you. But in order to build something like a GSA SER list 100% automated, we’re still missing a crucial component…GSA Platform Identifier. That’s what the next article will be about. 😉

  2. squadron

    I tried using the job file but it just loops a couple of times without actually scraping anything and then crashes. Any ideas what the issue might be?

    • John

      If it does a couple of loops before crashing, you’ve probably not set the keyword files in the “Keywords” folder properly and the footprints.txt, check those 2. Check the log file and see what it says. If there’s just blank times in there that means it was trying to find a keyword file to use but didn’t find anything. Also check the USED folder after a run.

  3. mda1125

    Looks like it’s working as promised EXCEPT when the “custom harvester” is running, it doesn’t show any footprints? When it’s completed, I see loads of URLS but nothing that explicitly matches the footprints I loaded.

  4. Indy

    Hello John

    Thanks for the very powerful info.

    I would like to know that is there any way to scrape only Google Adwords urls for given keyword ?

    Regards
    Indy
