In today's world, the internet puts information at your fingertips, and web scraping is like a superpower that lets you collect a lot of it quickly and automatically. For example, you can gather the links from Google search results and put them into a tidy list. In this guide, we'll learn how to do exactly that: turn Google results into a helpful list you can actually use.
We'll use Python for this task, together with a few of its libraries: requests, BeautifulSoup, and csv. These tools work together to help us explore the internet and turn information into something useful and understandable.
In web scraping, each library has an important job. requests fetches pages from the web, BeautifulSoup parses the HTML so we can easily pick out what we need, and the csv library stores the results in a tidy CSV file. With Python and these three libraries working together, we can explore the web and gather data automatically.
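To see how the three fit together before we tackle Google, here's a minimal sketch of the pipeline. The URL and filename here are just placeholders:

import csv

import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")         # requests: fetch the page
soup = BeautifulSoup(response.content, "html.parser")  # BeautifulSoup: parse the HTML
links = [a.get("href") for a in soup.find_all("a")]    # collect every link on the page

with open("links.csv", "w", newline="") as f:          # csv: save the links to a file
    csv.writer(f).writerows([link] for link in links)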
The Framework for Web Scraping
Before we dive into code, let's outline our approach:
- URL Formation: Google search URLs follow a specific pattern. For instance, a search for "web scraping" looks like:
https://www.google.com/search?q=web+scraping
- User-Agent Header: We send a "User-Agent" header to act like a normal web browser when getting information from a website. This helps us avoid looking like a robot and keeps our request from getting blocked. (There's a quick sketch of these first two steps right after this list.)
- Extracting Links: We'll use the BeautifulSoup library to find and collect the links from the search results on each page. This way, we can gather the website addresses of the search results that matter to us.
- CSV Creation: After we've extracted the links, we'll store them in a CSV file. This file will hold all the website addresses we found, so we can quickly find and organize them whenever we need.
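To make the first two points concrete, here's a quick sketch of how the paginated URL and headers can be built. The urllib.parse.quote_plus helper (part of Python's standard library) encodes the query for the URL, and the shortened User-Agent string is just an illustration:

from urllib.parse import quote_plus

query = "web scraping"
page = 2
results_per_page = 10

# Google's "start" parameter offsets the results: page 2 starts at result 10
url = f"https://www.google.com/search?q={quote_plus(query)}&start={results_per_page * (page - 1)}"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # mimic a normal browser

print(url)  # https://www.google.com/search?q=web+scraping&start=10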
Now that you're familiar with the key steps involved, let's dive into the code implementation and see how each step translates into practical Python code.
Installing the Required Libraries
Before we jump into the code, let's make sure we have the necessary tools installed:
- requests: To send HTTP requests and retrieve web pages. To install it, open your terminal or command prompt and run:
pip install requests
- BeautifulSoup: To parse and navigate HTML content. Install the beautifulsoup4 package with:
pip install beautifulsoup4
- csv: To work with CSV files. There's nothing to install here: csv is part of Python's standard library and comes bundled with every Python installation.
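With everything in place, you can quickly confirm the libraries import cleanly (the exact version numbers will vary on your machine):

import csv

import bs4
import requests

print(requests.__version__)  # e.g. 2.31.0
print(bs4.__version__)       # e.g. 4.12.2
print("csv is built in, nothing to install")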
The Code Unveiled
Let's begin with the code that scrapes Google search results and saves links as a CSV:
import csv

import requests
from bs4 import BeautifulSoup

search_query = "web scraping"
num_pages = 3  # The number of result pages to scrape
results_per_page = 10
csv_filename = "google_search_links.csv"

def get_google_links(query, page):
    # Google paginates with the "start" parameter: page 1 starts at 0, page 2 at 10, ...
    url = f"https://www.google.com/search?q={query}&start={results_per_page * (page - 1)}"
    # Pretend to be a regular browser so the request is less likely to be blocked
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    links = []
    # "tF2Cxc" is the class Google uses for organic result blocks at the time of
    # writing; Google changes its markup periodically, so update it if you get no results
    for result in soup.find_all("div", class_="tF2Cxc"):
        link = result.find("a").get("href")
        links.append(link)
    return links

with open(csv_filename, "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["Link"])  # header row
    for page in range(1, num_pages + 1):
        links = get_google_links(search_query, page)
        csv_writer.writerows([[link] for link in links])
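To double-check the result, you can read the file back with the same csv library:

import csv

# Print every link saved by the scraper
with open("google_search_links.csv", newline="") as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the "Link" header row
    for row in csv_reader:
        print(row[0])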
Running the Script
Follow these steps to run the script and execute the web scraping process:
- Create a .py File: Open your preferred text editor (I personally recommend VS Code) and create a new file named something like google_search_scrape.py.
- Copy and Paste the Code: Open the .py file you created and paste in the code above. Then modify the following variables as needed:
  - search_query: Update this variable with the desired search query (e.g., "web scraping").
  - num_pages: Set the number of pages you want to scrape (e.g., 3).
  - csv_filename: Specify the desired name for the CSV file (e.g., "google_search_links.csv").
- Run the Script: In your terminal or command prompt, navigate to the directory where the .py file is located and execute the following command:
python google_search_scrape.py
Conclusion
And there you go! With Python as your trusty programming tool and the help of requests and BeautifulSoup, you've mastered the art of collecting Google search results and saving the links as a CSV. This new skill lets you gather valuable information with just a pinch of code. But always remember, it's vital to use this newfound power responsibly and to abide by Google's guidelines.
Happy scraping and happy linking!