In this article I'm going to give you the exact method I use to completely automate all my scraping. Couple minutes of setup and you won't have to touch Scrapebox for a month while it scrapes millions of links each day! If you haven't already, I recommend that you read my last article about how to get the most out of Scrapebox before trying to implement this method.
Table Of Contents
- 1 Meet Scrapebox Automator
- 2 How to Make Scrapebox TRULY Automated
- 3 Tips / Notes
- 4 Conclusion
Meet Scrapebox Automator
Automator is one of the official paid plugins for Scrapebox.
As its name suggests, it gives you the ability to automate different tasks in Scrapebox by creating "jobs".
We won't go over each feature in detail in this article since there's already a video about this made by loopline from Scrapebox explaining everything about Automator:
Also, while the Automator can work with most of the tasks in Scrapebox, in this article we will be focusing exclusively on the harvesting part.
How to Use Automator Properly
There's not much setup needed, but there are a couple of things to keep in mind while using it.
The most important thing you should do is create a folder for your Automator. I personally put it in the root of C on my VPS and recommend that you do the same:
The second most important thing is...don't use public proxies!
I've already talked about this, but the tl;dr version is: public proxies suck.
They won't live long enough for you to complete a single scraping run. And if Scrapebox gets stuck on the first run without working proxies, the rest of your scrape naturally won't work.
Example Automator Job #1
So you can create a pretty simple job for scraping that does the following:
Pretty cool, right?
This is just a basic example but you get the idea. And you can do so much more...
Right now the job is handy but it will only do a single scraping run and then stop.
So let's expand it a bit...
Example Automator Job #2
Let's create a more automated version of the previous job by adding a loop at the end:
Now we're getting somewhere. This job can run indefinitely and save the scraped list to a new file each time it runs.
But we still have a problem. It can't import a different list of keywords each time it runs.
How Not to Do Jobs
The solution that we might think of at first is to copy the steps 1-4 multiple times, each time importing a different keyword file and then loop all that:
Now you have a job that runs a couple of different scrapes, but it's still not what we're looking for.
What we want here is a true set-and-forget job. One that you can set up in minutes and walk away from for weeks (even months), knowing that everything still works.
How to Make Scrapebox TRULY Automated
Our example jobs are missing 1 crucial thing in order to make them properly automated.
They're missing a way to automatically use different keywords on each job run, no matter how many times the job runs.
This is a problem that requires some batch scripting. And don't worry if you've never done any batch scripting, I'm going to show you the code needed and explain exactly what it does.
The idea is that you should be able to use the same keyword file in the job but swap the keywords in that file before each run using the script.
Step 1 - Organizing Your Automator Files
Remember when I told you to create an "Automator" folder in C drive? It's kinda a big deal.
You see, we need to keep all the files we need in that folder so our file paths (which we use when setting up a job) always remain the same.
Here's how you should set up this folder:
- Jobs - Folder for Automator job files
- Keywords - Folder where all the keywords are stored (very important, we'll talk about this in a bit)
- Scraped - Folder where all the scraped results go
- footprints.txt - A list of footprints to merge with keywords (used in each run)
- keywords.txt - List of keywords used in CURRENT scraping run (no need to modify this - it's automatically modified before each run)
- log.txt - Simple log for keeping track of when / which keyword files are being used
- proxies.txt - List of all your proxies (ip:port:username:pass)
- SetKeywords.bat - Batch script that updates keywords.txt before each run (very important)
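If you'd rather not create all of this by hand, the layout above can be sketched in a few lines of Python. This is only an illustration: the function name `create_automator_layout` is made up for this example, and it assumes you pass in whatever path you chose for your Automator folder. It does not create SetKeywords.bat, since that's a script you add yourself.

```python
from pathlib import Path

# Folder and file names are taken from the listing above.
SUBFOLDERS = ["Jobs", "Keywords", "Scraped"]
FILES = ["footprints.txt", "keywords.txt", "log.txt", "proxies.txt"]


def create_automator_layout(root):
    """Create the Automator folders and empty placeholder files if missing.

    Hypothetical helper for illustration; not part of Scrapebox or Automator.
    """
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file; the content of an existing file is kept
        (root / name).touch(exist_ok=True)
    return root
```

For example, `create_automator_layout(r"C:\Automator")` would build the structure in the root of C. Running it again is harmless because existing folders and file contents are left alone.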
Step 2 - The Keywords
After we set up our Automator folder properly, we need to set up our Keywords folder as well (don't worry, it's stupid easy to do).
What we want to put here is many small .txt files with keywords (unlike everything else in Automator folder, it doesn't matter what they're called).
What matters is that you have many of them. The more you have, the longer your Scrapebox will be able to scrape without you having to touch it.
1 scraping run = 1 keyword file.
I recommend that you keep these files small (500-5,000 keywords) to minimize the risk of crashing Scrapebox. This shouldn't be a problem on dedicated servers running 64-bit Scrapebox, but the number of keywords per file won't affect your speed, so you might as well keep it low to maximize reliability.
Also, don't put more than 30,000 files in here for performance reasons. Just for reference, I run with about 1,000 and it lasts for more than a month.
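If you're starting from one big keyword list, chopping it into small files is easy to script. Here's a minimal Python sketch, assuming one keyword per line in a UTF-8 text file. The function name `split_keywords`, the `keywords_00001.txt` naming pattern and the default of 1,000 keywords per file are all choices made for this example (remember, the file names don't matter).

```python
from pathlib import Path


def split_keywords(source_file, out_dir, per_file=1000):
    """Split one big keyword list into numbered .txt files.

    Hypothetical helper: each output file holds at most `per_file` keywords.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = Path(source_file).read_text(encoding="utf-8").splitlines()
    # Drop blank lines and duplicates while keeping the original order
    keywords = list(dict.fromkeys(line.strip() for line in lines if line.strip()))
    written = []
    for start in range(0, len(keywords), per_file):
        chunk = keywords[start:start + per_file]
        target = out_dir / f"keywords_{start // per_file + 1:05d}.txt"
        target.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Calling `split_keywords("all_keywords.txt", r"C:\Automator\Keywords", per_file=1000)` on a list of 1,000,000 unique keywords would give you 1,000 files, which is right around the number I run with. The input file name here is just a placeholder.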
To get these keywords you have 2 options:
- Scrape them yourself & split them in small files
- Get a ready-to-go list of niche based keywords like this one
It's PERFECT for this as it gives you access to over 1,800 niche-based keyword files, organized in 25 major niches / categories for a total of over 1.3 mil unique level 1 keywords.
Just for reference, I run about 950 files with 10 footprints at a time which is more than 30 days of completely automated scraping.
You just copy the files in your niche (or all of them if you want) and you're good to go:
And the last thing you need to do here is create an empty folder called "USED" like so:
This is where all the keywords you already used will go...but we will get to that in a minute when we talk about the script.
Step 3 - The Secret Automation Ingredient
The little thing that makes all this tick is the SetKeywords.bat script.
The script looks like this:
If you find this daunting, don't worry, here's what the script does in layman's terms:
- It deletes the keywords.txt file
- It chooses a random file within Keywords folder
- It creates the keywords.txt from that file
- It moves the chosen file to the USED folder
- It adds a line to log.txt noting the time & filename of the chosen file
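To make those five steps concrete, here's the same logic sketched in Python. To be clear, this is not the SetKeywords.bat script itself, just an illustration of what it does. It assumes the folder layout from Step 1, a USED folder inside Keywords, and keyword files ending in .txt. The function name `set_keywords` and the log line format are made up for this example.

```python
import random
import shutil
from datetime import datetime
from pathlib import Path


def set_keywords(root):
    """Pick a random unused keyword file and make it the current keywords.txt.

    Illustrative sketch only; the real job uses a batch script for this.
    """
    root = Path(root)
    keywords_dir = root / "Keywords"
    used_dir = keywords_dir / "USED"
    current = root / "keywords.txt"

    # Only .txt files directly inside Keywords count; USED is a subfolder,
    # so files already moved there are never picked again.
    candidates = sorted(p for p in keywords_dir.glob("*.txt") if p.is_file())
    if not candidates:
        # Assumption: stop with an error once every keyword file has been used
        raise FileNotFoundError(f"no unused keyword files left in {keywords_dir}")

    current.unlink(missing_ok=True)          # 1. delete the old keywords.txt (Python 3.8+)
    chosen = random.choice(candidates)       # 2. choose a random keyword file
    shutil.copyfile(chosen, current)         # 3. create keywords.txt from it
    used_dir.mkdir(exist_ok=True)
    shutil.move(str(chosen), str(used_dir / chosen.name))  # 4. move it to USED
    with open(root / "log.txt", "a", encoding="utf-8") as log:
        # 5. note the time and the file name in the log
        log.write(f"{datetime.now():%Y-%m-%d %H:%M:%S} {chosen.name}\n")
    return chosen.name
```

Because every chosen file ends up in USED, each call works through the Keywords folder one file at a time and never repeats a file, which is what lets the same keywords.txt path be reused on every run.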
The next part is the part that ties it all together.
Step 4 - The Job
Here it is. The job that uses all that we've done so far to turn Scrapebox 100% automated.
You can set up your footprints, add a thousand keyword files, start this job and leave for a good couple of weeks, no problem.
Here's what the job looks like:
1. We run our SetKeywords script in order to prepare keywords.txt:
2. We clear all URLs & Keywords from past runs
3. We load our proxies from proxies.txt:
4. This is where we set the search engine to scrape, footprint and keyword files we'll be using and the file in which to save the results after each run:
Couple of things to note here:
- Make sure to click "Platform footprints: None". If you wish to use Scrapebox's platform footprints instead of your own - just select the platforms you want in the list, uncheck the "Enable merging of footprints" checkbox and delete the footprints.txt path
- If your keyword files are already merged with footprints - uncheck the "Enable merging of footprints" checkbox and delete the footprints.txt path.
- Make sure that the "Add timestamp" is checked. If not, Scrapebox will overwrite the results file after each run.
5. We loop this for as long as there are keywords in Keywords folder (after the last keyword file has been used, your Scrapebox will give you an error):
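If you're wondering what "merging" footprints with keywords in step 4 actually produces, the idea is that every footprint gets paired with every keyword, giving one search query per pair. Here's a tiny Python sketch of that idea. The function name `merge_footprints` is made up, and joining footprint and keyword with a single space is an assumption about the output format, not something taken from Scrapebox itself.

```python
def merge_footprints(footprints, keywords):
    """Pair every footprint with every keyword, one search query per pair.

    Illustrative only; the "footprint keyword" format is an assumption.
    """
    return [f"{fp} {kw}" for fp in footprints for kw in keywords]


queries = merge_footprints(
    ['"powered by wordpress"', "inurl:blog"],
    ["dog training", "yoga"],
)
# 2 footprints x 2 keywords = 4 queries
print(len(queries))
```

The same arithmetic is why small keyword files are enough: 10 footprints merged with a 1,000-keyword file already means 10,000 queries in a single run.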
Tips / Notes
Since we're using "Harvest urls" to export our list (rather than using an export function which doesn't support adding timestamps to files), you should enable "Auto remove duplicate urls..." option:
While this will delete duplicates from each individual file after each run, you will still have a lot of duplicates across all lists from different scraping runs.
One way to handle this (automatically) is to use GSA Platform Identifier: set up a "delete duplicates" project and have it automatically remove duplicates from the "Scraped" folder every day or so.
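If you don't have GSA Platform Identifier, removing duplicates across files is also something a short script can do. The following Python sketch is a swapped-in alternative, not what I use in the job. It assumes one URL per line in .txt files inside the Scraped folder, and the function name `dedupe_across_files` is made up for this example. It keeps every URL it has seen in memory, so it's only suitable while your lists are of a manageable size.

```python
from pathlib import Path


def dedupe_across_files(scraped_dir, output_file):
    """Write every unique URL found in scraped_dir/*.txt to output_file once.

    Hypothetical helper; returns the number of unique URLs written.
    """
    output_file = Path(output_file)
    # Collect the source files first, skipping the output file in case it
    # lives in the same folder from an earlier run.
    sources = [
        p for p in sorted(Path(scraped_dir).glob("*.txt"))
        if p.resolve() != output_file.resolve()
    ]
    seen = set()
    with open(output_file, "w", encoding="utf-8") as out:
        for path in sources:
            with open(path, encoding="utf-8", errors="ignore") as src:
                for line in src:
                    url = line.strip()
                    if url and url not in seen:
                        seen.add(url)
                        out.write(url + "\n")
    return len(seen)
```

Note that this only removes exact duplicate lines. It leaves the original scraped files untouched and writes the combined unique list to a separate file.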
That's it! You now have everything you need to make Scrapebox scrape over a BILLION links for you with minimal babysitting required.
If you're already a SEOSpartan and want to download the whole Automator folder ready-to-go with a quickstart checklist, please log in and go to "SEOSpartan Goodies". If you're not a SEOSpartan, you can become a member today for free.
If I didn't explain something properly or if you have any questions, please post them in the comments and I'll answer them as soon as possible.