This script performs a reverse DNS lookup against a list of IP addresses. I use it to identify genuine Googlebot requests when carrying out log file analysis.
It takes an Excel file (.xlsx) called logs.xlsx with a sheet named ‘Sheet1’ and looks for IPs in a column called ip. It then performs a reverse lookup on the unique values and exports an Excel file called validated_logs.xlsx, which contains all of the data from logs.xlsx.
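Below is a minimal sketch of what such a script might look like, assuming it is written in Python with pandas and the standard-library socket module (the original implementation is not shown here). The hostname and is_googlebot columns are illustrative additions, not necessarily what the script writes out.

```python
import socket
import pandas as pd

# Read the exported logs; file name, sheet name, and column name as described above.
df = pd.read_excel("logs.xlsx", sheet_name="Sheet1")

def reverse_lookup(ip):
    """Return the PTR hostname for an IP address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

# Resolve each unique IP once, then map the results back onto every row.
hostnames = {ip: reverse_lookup(ip) for ip in df["ip"].dropna().unique()}
df["hostname"] = df["ip"].map(hostnames)

# Assumption: flag rows whose reverse DNS points at Google's crawler hostnames.
df["is_googlebot"] = df["hostname"].fillna("").apply(
    lambda h: h.endswith(".googlebot.com") or h.endswith(".google.com")
)

# Export all of the original data plus the new columns.
df.to_excel("validated_logs.xlsx", index=False)
```

Note that Google also recommends confirming each match with a forward DNS lookup on the returned hostname; the sketch above omits that step for brevity.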
Technical
- Last year I moved this blog from WordPress to Hugo, hosted on Netlify. As part of this move, I wanted to make the site as fast as possible and made a number of improvements, including adding support for WebP images. This can be achieved with Hugo by creating the following shortcode: `{{ $image := .Params.src }} {{ $type_arr := split $image "." }} {{ $srcbase := index $type_arr 0 }} {{ $srcext := index $type_arr 1 }} …`
- This page describes some ways to extract search engine hits from a website's log files. To extract just the Googlebot hits from an Apache log using the GNU/Linux terminal, try `grep 'Googlebot\/' access.log > googlebot_access.log`. That will write the Googlebot hits to a new log file called googlebot_access.log. You can also pipe that output into another command, for example to extract only the URLs that Googlebot is requesting (see the sketch after this list).
- Using log files for SEO analysis is a great way to uncover issues that you may have otherwise missed. This is because, unlike third-party spiders, they allow you to see exactly how Googlebot is crawling a site. If you’re an SEO professional looking to carry out your own log file analysis, then the chances are you’ll have to request the files through your own, or your client's, dev team.
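The original post's shell pipeline for that last step isn't reproduced here, but here is a rough Python sketch of the same idea (an assumption, not the original command): read the filtered log and print only the requested URLs, assuming the common combined log format in which the request path is the seventh whitespace-separated field.

```python
# Print only the URLs Googlebot requested, from the filtered log produced above.
# Assumes the combined log format, e.g.:
#   66.249.66.1 - - [01/Jan/2024:00:00:00 +0000] "GET /some-page/ HTTP/1.1" 200 1234
with open("googlebot_access.log") as log:
    for line in log:
        fields = line.split()
        if len(fields) > 6:
            print(fields[6])  # the path from the "METHOD /path HTTP/x" request line
```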