In this guide, you will learn:
- What Gospider is and how it works
- What features it offers
- How to use it for web crawling
- How to integrate it with Colly for web scraping
- Its main limitations and how to bypass them
Let’s dive in!
What Is Gospider?
Gospider is a fast and efficient web crawling CLI tool written in Go. It is built to scan websites and extract URLs in parallel, handling multiple requests and domains at the same time. Additionally, it respects `robots.txt` and can discover links even in JavaScript files.
Gospider offers several customization flags to control crawling depth, request delays, and more. It also supports proxy integration, along with various other options for greater control over the crawling process.
What Makes Gospider Unique for Web Crawling?
To better understand why Gospider is special for web crawling, let’s explore its features in detail and examine the supported flags.
Features
Below are the main features provided by Gospider when it comes to web crawling:
- Fast web crawling: Efficiently crawl single websites at high speed.
- Parallel crawling: Crawls multiple sites concurrently for faster data collection.
- `sitemap.xml` parsing: Automatically handles sitemap files for enhanced crawling.
- `robots.txt` parsing: Complies with `robots.txt` directives for ethical crawling.
- JavaScript link parsing: Extracts links from JavaScript files.
- Customizable crawl options: Adjust crawl depth, concurrency, delay, timeouts, and more with flexible flags.
- `User-Agent` randomization: Randomizes between mobile and web User-Agents for more realistic requests. Discover the best `User-Agent` for web crawling.
- Cookie and header customization: Allows custom cookies and HTTP headers.
- Link finder: Identifies URLs and other resources on a site.
- Find AWS S3 buckets: Detects AWS S3 buckets from response sources.
- Find subdomains: Discovers subdomains from response sources.
- Third-party sources: Extracts URLs from services like the Wayback Machine, Common Crawl, VirusTotal, and Alien Vault.
- Easy output formatting: Outputs results in formats that are easy to `grep` and analyze.
- Burp Suite support: Integrates with Burp Suite for easier testing and crawling.
- Advanced filtering: Blacklists and whitelists URLs, including domain-level filtering.
- Subdomain support: Includes subdomains in crawls from both the target site and third-party sources.
- Debug and verbose modes: Enables debugging and detailed logging for easier troubleshooting.
Command Line Options
This is what a generic Gospider command looks like:
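The exact flags depend on your use case; the following is just an illustrative sketch with placeholder values, using flags documented below:

```bash
gospider -s "https://example.com/" -o output -c 10 -d 1
```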
In particular, the supported flags are:
- `-s, --site`: Site to crawl.
- `-S, --sites`: List of sites to crawl.
- `-p, --proxy`: Proxy URL.
- `-o, --output`: Output folder.
- `-u, --user-agent`: User Agent to use (e.g., `web`, `mobi`, or a custom user-agent).
- `--cookie`: Cookie to use (e.g., `testA=a; testB=b`).
- `-H, --header`: Header(s) to use (you repeat the flag multiple times for multiple headers).
- `--burp string`: Load headers and cookies from a Burp Suite raw HTTP request.
- `--blacklist`: Blacklist URL Regex.
- `--whitelist`: Whitelist URL Regex.
- `--whitelist-domain`: Whitelist Domain.
- `-t, --threads`: Number of threads to run in parallel (default: `1`).
- `-c, --concurrent`: Maximum concurrent requests for matching domains (default: `5`).
- `-d, --depth`: Maximum recursion depth for URLs (set to `0` for infinite recursion, default: `1`).
- `-k, --delay int`: Delay between requests (in seconds).
- `-K, --random-delay int`: Extra randomized delay before making requests (in seconds).
- `-m, --timeout int`: Request timeout (in seconds, default: `10`).
- `-B, --base`: Disable all and only use HTML content.
- `--js`: Enable link finder in JavaScript files (default: `true`).
- `--subs`: Include subdomains.
- `--sitemap`: Crawl `sitemap.xml`.
- `--robots`: Crawl `robots.txt` (default: `true`).
- `-a, --other-source`: Find URLs from 3rd party sources like Archive.org, CommonCrawl, VirusTotal, AlienVault.
- `-w, --include-subs`: Include subdomains crawled from 3rd party sources (default: only main domain).
- `-r, --include-other-source`: Include URLs from 3rd party sources and still crawl them.
- `--debug`: Enable debug mode.
- `--json`: Enable JSON output.
- `-v, --verbose`: Enable verbose output.
- `-l, --length`: Show URL length.
- `-L, --filter-length`: Filter URLs by length.
- `-R, --raw`: Show raw output.
- `-q, --quiet`: Suppress all output and only show URLs.
- `--no-redirect`: Disable redirects.
- `--version`: Check version.
- `-h, --help`: Show help.
Web Crawling with Gospider: Step-by-Step Guide
In this section, you will learn how to use Gospider to crawl links from a multipage site. Specifically, the target site will be Books to Scrape:
The site contains a list of products spread across 50 pages. Each product entry on these listing pages also has its own dedicated product page. The steps below will guide you through the process of using Gospider to retrieve all those product page URLs!
Prerequisites and Project Setup
Before you start, ensure you have the following:
- Go installed on your computer: If you have not installed Go yet, download it from the official website and follow the installation instructions.
- A Go IDE: Visual Studio Code with the Go extension is recommended.
To verify that Go is installed, run:
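```bash
go version
```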
If Go is installed correctly, you should see output similar to this (on Windows):
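```
go version go1.24.1 windows/amd64
```

The exact Go version and platform in the output will match your installation.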
Great! Go is set up and ready to go.
Create a new project folder and navigate to it in the terminal:
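For example, assuming a project folder named `gospider-project` (any name works):

```bash
mkdir gospider-project
cd gospider-project
```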
Now, you are ready to install Gospider and use it for web crawling!
Step #1: Install Gospider
Run the following `go install` command to compile and install Gospider globally:
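At the time of writing, Gospider is distributed from the `jaeles-project` GitHub repository, so the command should look like this:

```bash
go install github.com/jaeles-project/gospider@latest
```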
After installation, verify that Gospider is installed by running:
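```bash
gospider -h
```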
This should print the Gospider usage instructions, as shown below:
Amazing! Gospider has been installed, and you can now use it to crawl one or more websites.
Step #2: Crawl URLs on the Target Page
To crawl all links on the target page, run the following command:
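Based on the flag breakdown that follows, the command should be:

```bash
gospider -s "https://books.toscrape.com/" -o output -d 1
```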
This is a breakdown of the Gospider flags used:
-s "https://books.toscrape.com/"
: Specifies the target URL.-o output
: Saves the crawl results inside theoutput
folder.-d 1
: Sets the crawling depth to1
, meaning that Gospider will only detect URLs on the current page. In other words, it will not follow found URLs for deeper link discovery.
The above command will produce the following structure:
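Assuming Gospider's default output naming (a file named after the target domain, as described next), the structure should resemble:

```
output/
└── books_toscrape_com
```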
Open the `books_toscrape_com` file inside the `output` folder, and you will see output similar to this:
The generated file contains different types of detected links:
- `[url]`: The crawled pages/resources.
- `[href]`: All `<a href>` links found on the page.
- `[javascript]`: URLs to JavaScript files.
- `[linkfinder]`: Extracted links embedded in JavaScript code.
Step #3: Crawl the Entire Site
From the output above, you can see that Gospider stopped at the first pagination page. It detected the link to the second page but did not visit it.
You can verify this because the `books_toscrape_com` file contains:
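The entry in question should look roughly like this (the exact output formatting may differ slightly):

```
[href] - https://books.toscrape.com/catalogue/page-2.html
```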
The `[href]` tag indicates that the link was discovered. However, since there is no corresponding `[url]` entry with the same URL, the link was found but never visited.
If you inspect the target page, you will see that the above URL corresponds to the second pagination page:
To crawl the entire website, you need to follow all pagination links. As shown in the image above, the target site contains 50 pagination pages (note the “Page 1 of 50” text). Set Gospider’s depth to `50` to ensure it reaches every page.
Since this will involve crawling a large number of pages, it is also a good idea to increase the concurrency rate (i.e., the number of simultaneous requests). By default, Gospider uses a concurrency level of `5`, but increasing it to `10` will speed up execution.
The final command to crawl all product pages is:
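Combining the depth and concurrency settings discussed above:

```bash
gospider -s "https://books.toscrape.com/" -o output -d 50 -c 10
```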
This time, Gospider will take longer to execute and produce thousands of URLs. The output will now contain entries like:
The key detail to check in the output is the presence of the URL of the last pagination page:
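That is the URL below (shown here on its own; in the file it will appear inside one of the tagged entries):

```
https://books.toscrape.com/catalogue/page-50.html
```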
Wonderful! This confirms that Gospider successfully followed all pagination links and crawled the entire product catalog as intended.
Step #4: Get Only the Product Page URLs
In just a few seconds, Gospider collected all URLs from an entire site. That could be the end of this tutorial, but let’s take it a step further.
What if you only want to extract product page URLs? To understand how these URLs are structured, inspect a product element on the target page:
From this inspection, you can notice how product page URLs follow this format:
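In general terms, that format is (the placeholder names below are purely descriptive):

```
https://books.toscrape.com/catalogue/<product-slug>_<product-id>/index.html
```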
To filter out only product pages from the raw crawled URLs, you can use a custom Go script.
First, create a Go module inside your Gospider project directory:
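For example (the module name is arbitrary):

```bash
go mod init gospider-project
```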
Next, create a `crawler` folder inside the project directory and add a `crawler.go` file to it. Then, open the project folder in your IDE. Your folder structure should now look like this:
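Assuming the module and folder names used above, the layout should resemble:

```
gospider-project/
├── crawler/
│   └── crawler.go
└── go.mod
```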
The `crawler.go` script should:
- Run the Gospider command from a clean state.
- Read all URLs from the output file.
- Filter only product page URLs using a regex pattern.
- Export the filtered product URLs to a .txt file.
Below is the Go code to accomplish the goal:
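A minimal sketch, consistent with the breakdown that follows, could look like this (the Gospider flag values and the product-URL regex are assumptions based on the earlier steps):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"os/exec"
	"regexp"
	"slices"
)

func main() {
	// remove the output folder, if it exists, to guarantee a clean start
	if err := os.RemoveAll("output"); err != nil {
		log.Fatalf("failed to remove the output folder: %v", err)
	}

	// build and run the Gospider command to crawl the entire target site
	cmd := exec.Command(
		"gospider",
		"-s", "https://books.toscrape.com/",
		"-o", "output",
		"-d", "50",
		"-c", "10",
	)
	if err := cmd.Run(); err != nil {
		log.Fatalf("gospider execution failed: %v", err)
	}

	// open the output file generated by Gospider
	file, err := os.Open("output/books_toscrape_com")
	if err != nil {
		log.Fatalf("failed to open the Gospider output file: %v", err)
	}
	defer file.Close()

	// regex matching product page URLs (assumed format: .../catalogue/<slug>_<id>/index.html)
	productRegex := regexp.MustCompile(`https://books\.toscrape\.com/catalogue/[^/\s]+_\d+/index\.html`)

	// read the output file line by line, extracting unique product page URLs
	var productURLs []string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		for _, match := range productRegex.FindAllString(scanner.Text(), -1) {
			if !slices.Contains(productURLs, match) {
				productURLs = append(productURLs, match)
			}
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("error while reading the output file: %v", err)
	}

	// export the filtered product page URLs to a .txt file
	outFile, err := os.Create("product_urls.txt")
	if err != nil {
		log.Fatalf("failed to create product_urls.txt: %v", err)
	}
	defer outFile.Close()

	writer := bufio.NewWriter(outFile)
	for _, url := range productURLs {
		fmt.Fprintln(writer, url)
	}
	writer.Flush()

	fmt.Printf("Exported %d product page URLs to product_urls.txt\n", len(productURLs))
}
```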
The Go program automates web crawling by utilizing:
- `os.RemoveAll()` to delete the output directory (`output/`), if it already exists, to guarantee a clean start.
- `exec.Command()` and `cmd.Run()` to construct and execute a Gospider command-line process to crawl the target website.
- `os.Open()` and `bufio.NewScanner()` to open the output file generated by Gospider (`books_toscrape_com`) and read it line by line.
- `regexp.MustCompile()` and `FindAllString()` to use a regex to extract product page URLs from each line, employing `slices.Contains()` to prevent duplicates.
- `os.Create()` and `bufio.NewWriter()` to write the filtered product page URLs to a `product_urls.txt` file.
Step #5: Crawling Script Execution
Launch the `crawler.go` script with the following command:
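Assuming you run it from the project root so that the relative paths resolve correctly:

```bash
go run crawler/crawler.go
```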
The script will log the following in the terminal:
The Gospider crawling script successfully found 1,000 product page URLs. As you can easily verify on the target site, that is exactly the number of product pages available:
Those URLs will be stored in a `product_urls.txt` file generated in your project folder. Open that file, and you will see:
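Something along these lines (the entries below are illustrative):

```
https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html
https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html
https://books.toscrape.com/catalogue/soumission_998/index.html
...
```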
Congrats! You just built a Gospider script to perform web crawling in Go.
[Extra] Add the Scraping Logic to the Gospider Crawler
Web crawling is generally just one step in a larger web scraping project. Learn more about the difference between these two practices by reading our guide on web crawling vs. web scraping.
To make this tutorial more complete, we will also demonstrate how to use the crawled links for web scraping. The Go scraping script we are about to build will:
- Read the product page URLs from the `product_urls.txt` file, which was generated earlier using Gospider and custom logic.
- Visit each product page and scrape product data.
- Export the scraped product data to a CSV file.
Time to add web scraping logic to your Gospider setup!
Step #1: Install Colly
The library used for web scraping is Colly, an elegant scraper and crawler framework for Golang. If you are not familiar with its API, check out our tutorial on web scraping with Go.
Run the following command to install Colly:
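At the time of writing, the current major version is v2, so the module path should be:

```bash
go get github.com/gocolly/colly/v2
```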
Next, create a `scraper` folder within your project directory and add a `scraper.go` file to it. Your project structure should now look like this:
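With the assumed folder names, the layout becomes:

```
gospider-project/
├── crawler/
│   └── crawler.go
├── scraper/
│   └── scraper.go
├── output/
├── product_urls.txt
└── go.mod
```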
Open `scraper.go` and import Colly:
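Assuming the v2 module path installed above:

```go
import (
	"github.com/gocolly/colly/v2"
)
```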
Fantastic! Follow the steps below to use Colly for scraping data from the crawled product pages.
Step #2: Read the URLs to Scrape
Use the following code to retrieve the URLs of the product pages to scrape from the `product_urls.txt` file, which was generated by `crawler.go`:
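A sketch of that logic, reading the file line by line into a `urls` slice (variable names are assumptions):

```go
// open the file containing the crawled product page URLs
file, err := os.Open("product_urls.txt")
if err != nil {
	log.Fatalf("failed to open product_urls.txt: %v", err)
}
defer file.Close()

// read the file line by line and collect the URLs to scrape
var urls []string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	if url := scanner.Text(); url != "" {
		urls = append(urls, url)
	}
}
```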
To make the above snippet work, include these imports at the beginning of your file:
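```go
import (
	"bufio"
	"log"
	"os"
)
```

These match the standard-library calls used in the sketch above.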
Great! The `urls` slice will contain all the product page URLs ready for scraping.
Step #3: Implement the Data Extraction Logic
Before implementing the data extraction logic, you must understand the structure of the product page’s HTML.
To do that, visit a product page in your browser in incognito mode—to ensure a new session. Open DevTools and inspect the page elements, starting with the product image HTML node:
Next, inspect the product information section:
From the inspected elements, you can extract:
- The product title from the `<h1>` tag.
- The product price from the first `.price_color` node on the page.
- The product rating (stars) from the `.star-rating` class.
- The product image URL from the `#product_gallery img` element.
Given these attributes, define the following Go struct to represent the scraped data:
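For example (the field names are assumptions):

```go
// Product represents the data scraped from a single product page
type Product struct {
	Title    string
	Price    string
	Rating   string
	ImageURL string
}
```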
Since multiple product pages will be scraped, define a slice to store the extracted products:
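```go
var products []Product
```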
To scrape the data, start by initializing a Colly `Collector`:
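```go
c := colly.NewCollector()
```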
Use the `OnHTML()` callback in Colly to define the scraping logic:
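A possible implementation, based on the selectors identified above (the root `"html"` selector and the rating-to-number mapping are assumptions):

```go
c.OnHTML("html", func(e *colly.HTMLElement) {
	// scrape the product title from the <h1> tag
	title := e.ChildText("h1")

	// scrape the product price from the first .price_color node on the page
	price := e.DOM.Find(".price_color").First().Text()

	// map the .star-rating class attribute to a star rating
	rating := ""
	starClass := e.ChildAttr(".star-rating", "class")
	if strings.Contains(starClass, "One") {
		rating = "1"
	} else if strings.Contains(starClass, "Two") {
		rating = "2"
	} else if strings.Contains(starClass, "Three") {
		rating = "3"
	} else if strings.Contains(starClass, "Four") {
		rating = "4"
	} else if strings.Contains(starClass, "Five") {
		rating = "5"
	}

	// convert the relative image URL to an absolute URL
	imageURL := e.ChildAttr("#product_gallery img", "src")
	imageURL = strings.Replace(imageURL, "../../", "https://books.toscrape.com/", 1)

	// store the scraped product
	products = append(products, Product{
		Title:    title,
		Price:    price,
		Rating:   rating,
		ImageURL: imageURL,
	})
})
```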
Note the `else if` structure used to get the star rating based on the class attribute of `.star-rating`. Also, see how the relative image URL is converted to an absolute URL using `strings.Replace()`.
Add the following required import:
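```go
import (
	"strings"
)
```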
Now your Go scraper is set up to extract product data as desired!
Step #4: Connect to the Target Pages
Colly is a callback-based web scraping framework with a specific callback lifecycle. That means you can define the scraping logic before retrieving the HTML, which is an unusual but powerful approach.
Now that the data extraction logic is in place, instruct Colly to visit each product page:
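A sketch of the visiting loop, with the 50-URL cap mentioned in the note below:

```go
// visit up to the first 50 product pages to avoid overwhelming the target site
limit := 50
if len(urls) < limit {
	limit = len(urls)
}
for _, url := range urls[:limit] {
	c.Visit(url)
}
```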
Note: The number of URLs has been limited to 50 to avoid overwhelming the target website with too many requests. In a production script, you can remove or adjust this limitation based on your needs.
Colly will now:
- Visit each URL in the list.
- Apply the `OnHTML()` callback to extract product data.
- Store the extracted data in the `products` slice.
Amazing! All that is left is to export the scraped data to a human-readable format like CSV.
Step #5: Export the Scraped Data
Export the `products` slice to a CSV file using the following logic:
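A sketch using `encoding/csv` (the column names are assumptions):

```go
// create the output CSV file
csvFile, err := os.Create("products.csv")
if err != nil {
	log.Fatalf("failed to create products.csv: %v", err)
}
defer csvFile.Close()

// initialize the CSV writer
writer := csv.NewWriter(csvFile)
defer writer.Flush()

// write the header row
writer.Write([]string{"title", "price", "rating", "image_url"})

// write one record per scraped product
for _, product := range products {
	writer.Write([]string{
		product.Title,
		product.Price,
		product.Rating,
		product.ImageURL,
	})
}
```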
The above snippet creates a `products.csv` file and populates it with the scraped data.
Do not forget to import the CSV package from Go’s standard library:
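```go
import (
	"encoding/csv"
)
```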
This is it! Your Gospider crawling and scraping project is now fully implemented.
Step #6: Put It All Together
`scraper.go` should now contain:
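Assembling the snippets above, the full script would look roughly like this (selectors, field names, and the 50-page cap are the assumptions already discussed):

```go
package main

import (
	"bufio"
	"encoding/csv"
	"log"
	"os"
	"strings"

	"github.com/gocolly/colly/v2"
)

// Product represents the data scraped from a single product page
type Product struct {
	Title    string
	Price    string
	Rating   string
	ImageURL string
}

func main() {
	// read the product page URLs produced by the Gospider crawler
	file, err := os.Open("product_urls.txt")
	if err != nil {
		log.Fatalf("failed to open product_urls.txt: %v", err)
	}
	defer file.Close()

	var urls []string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		if url := scanner.Text(); url != "" {
			urls = append(urls, url)
		}
	}

	// slice where the scraped products will be stored
	var products []Product

	// initialize the Colly collector
	c := colly.NewCollector()

	// define the data extraction logic
	c.OnHTML("html", func(e *colly.HTMLElement) {
		title := e.ChildText("h1")
		price := e.DOM.Find(".price_color").First().Text()

		rating := ""
		starClass := e.ChildAttr(".star-rating", "class")
		if strings.Contains(starClass, "One") {
			rating = "1"
		} else if strings.Contains(starClass, "Two") {
			rating = "2"
		} else if strings.Contains(starClass, "Three") {
			rating = "3"
		} else if strings.Contains(starClass, "Four") {
			rating = "4"
		} else if strings.Contains(starClass, "Five") {
			rating = "5"
		}

		imageURL := e.ChildAttr("#product_gallery img", "src")
		imageURL = strings.Replace(imageURL, "../../", "https://books.toscrape.com/", 1)

		products = append(products, Product{title, price, rating, imageURL})
	})

	// visit up to 50 product pages to avoid overwhelming the target site
	limit := 50
	if len(urls) < limit {
		limit = len(urls)
	}
	for _, url := range urls[:limit] {
		c.Visit(url)
	}

	// export the scraped products to a CSV file
	csvFile, err := os.Create("products.csv")
	if err != nil {
		log.Fatalf("failed to create products.csv: %v", err)
	}
	defer csvFile.Close()

	writer := csv.NewWriter(csvFile)
	defer writer.Flush()

	writer.Write([]string{"title", "price", "rating", "image_url"})
	for _, p := range products {
		writer.Write([]string{p.Title, p.Price, p.Rating, p.ImageURL})
	}
}
```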
Launch the scraper with the command below:
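Again assuming you launch it from the project root:

```bash
go run scraper/scraper.go
```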
The execution may take some time, so be patient. Once it completes, a `products.csv` file will appear in the project folder. Open it, and you will see the scraped data neatly structured in a tabular format:
Et voilà! Gospider for crawling + Colly for scraping is a winning duo.
Limitations of Gospider’s Approach to Web Crawling
The biggest limitations of Gospider’s crawling approach are:
- IP bans due to making too many requests.
- Anti-crawling technologies used by websites to block crawling bots.
Let’s see how to tackle both!
Avoid IP Bans
The consequence of too many requests from the same machine is that your IP address may get banned by the target server. This is a common issue in web crawling, especially when it is not well-configured or ethically planned.
By default, Gospider respects `robots.txt` to minimize this risk. However, not all websites have a `robots.txt` file. Also, even when they do, it might not specify valid rate-limiting rules for crawlers.
To limit IP bans, you could try using Gospider’s built-in `--delay`, `--random-delay`, and `--timeout` flags to slow down requests. Still, finding the right combination can be time-consuming and may not always be effective.
A more effective solution is to use a rotating proxy, which guarantees that each request from Gospider will originate from a different IP address. That prevents the target site from detecting and blocking your crawling attempts.
To use a rotating proxy with Gospider, pass the proxy URL with the `-p` (or `--proxy`) flag:
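For example (replace the placeholder with your actual proxy URL):

```bash
gospider -s "https://books.toscrape.com/" -o output -p "<YOUR_ROTATING_PROXY_URL>"
```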
If you do not have a rotating proxy URL, retrieve one for free!
Bypass Anti-Crawling Tech
Even with a rotating proxy, some websites implement strict anti-scraping and anti-crawling measures. For example, running this Gospider command against a Cloudflare-protected website:
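The command would be something like the following, with a placeholder standing in for the protected site:

```bash
gospider -s "https://<cloudflare-protected-site>/" -o output
```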
The result will be:
As you can see, the target server responded with a `403 Forbidden` response. This means the server successfully detected and blocked Gospider’s request, preventing it from crawling any URLs on the page.
To avoid such blocks, you need an all-in-one web unlocking API. That service can bypass anti-bot and anti-scraping systems, giving you access to the unblocked HTML of any webpage.
Note: Bright Data’s Web Unlocker not only handles these challenges but can also operate as a proxy. So, once configured, you can use it just like a regular proxy with Gospider using the syntax shown earlier.
Conclusion
In this blog post, you learned what Gospider is, what it offers, and how to use it for web crawling in Go. You also saw how to combine it with Colly for a complete crawling and scraping tutorial.
One of the biggest challenges in web scraping is the risk of being blocked—whether due to IP bans or anti-scraping solutions. The best ways to overcome these challenges are using web proxies or a scraping API like Web Unlocker.
Integration with Gospider is just one of many scenarios that Bright Data’s products and services support. Explore our other web scraping tools:
- Web Scraper APIs: Dedicated endpoints for extracting fresh, structured web data from over 100 popular domains.
- SERP API: An API that handles all the unlocking management for search engine results pages and extracts SERP data.
- Scraping Functions: A complete scraping interface that allows you to run your scrapers as serverless functions.
- Scraping Browser: A Puppeteer-, Selenium-, and Playwright-compatible browser with built-in unlocking capabilities.
Sign up now to Bright Data and test our proxy services and scraping products for free!
No credit card required