Introduction

This post walks through a first Python 3 project. Starting with the classic "Hello, World!" example, we fetch a web page with the Requests library, tidy the fetched HTML with Beautiful Soup 4, and save the result to a local file.

Requirements

  • Python 3 installed on your system. You can download it from the official Python website (python.org)
  • A text editor or IDE of your choice
  • An active internet connection

Install and Setup

  1. Install Beautiful Soup 4 by running the following command in your terminal or command prompt:
pip install beautifulsoup4
  2. Install the Requests library to handle HTTP requests:
pip install requests
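
To confirm that both installs succeeded, a short check like the one below can help. It is a minimal sketch, not part of the original setup steps: the helper name `installed_version` is made up for illustration, and it relies on the standard-library `importlib.metadata` module (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper name used for this sketch.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# These are the distribution names passed to pip above.
for name in ("beautifulsoup4", "requests"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", re-run the matching pip command with the same Python interpreter you use to run your scripts.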

Hello, World!

  • Summary of use case: Writing a simple Python program to print "Hello, World!"
  • Benefits of use case: Understanding basic Python syntax and running a Python script
print("Hello, World!")
  • Run the code and observe the output: "Hello, World!"
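
As scripts grow, the one-line example is usually wrapped in functions. The sketch below shows one common layout, assuming the code is saved in a file such as hello.py (a hypothetical name) and run with `python3 hello.py`. The `greeting` function and its `name` parameter are illustrative additions, not part of the original example.

```python
def greeting(name="World"):
    """Build the greeting text; `name` is a hypothetical parameter."""
    return f"Hello, {name}!"


def main():
    # Same output as the one-line version above.
    print(greeting())


# Run main() only when the file is executed directly, not when imported.
if __name__ == "__main__":
    main()
```

Keeping the logic in `greeting` and the printing in `main` makes the text easy to reuse from other modules.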

Fetching a Web Page

  • Summary of use case: Fetching a web page's HTML content using the Requests library
  • Benefits of use case: Learning how to use Python to fetch content from the internet
  • Additional requirements: Requests library
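import requests

# Fetch the page; the timeout keeps the script from hanging on a slow server.
url = "https://www.example.com"
response = requests.get(url, timeout=10)

# response.text holds the response body decoded to a string.
print(response.text)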
  • Run the code and observe the output: The HTML content of the web page
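
The snippet above assumes the request succeeds. A slightly more defensive version might look like the following sketch. The function names `fetch_html` and `fetch_or_none` and the 10-second default timeout are assumptions made for illustration. `raise_for_status()` and `requests.RequestException` are part of the Requests library.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP or network errors.

    The name and the default timeout are illustrative choices.
    """
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into an exception instead of returning an error page.
    response.raise_for_status()
    return response.text


def fetch_or_none(url):
    """Like fetch_html, but report the failure and return None instead of raising."""
    try:
        return fetch_html(url)
    except requests.RequestException as exc:
        print(f"Could not fetch {url}: {exc}")
        return None
```

Calling `fetch_or_none("https://www.example.com")` returns the same text as `response.text` on success, and `None` if the server is unreachable or answers with an error status.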

Web Scraping with Beautiful Soup

  • Summary of use case: Parsing the fetched HTML with Beautiful Soup and re-indenting it for readability
  • Benefits of use case: Learning how to extract relevant information from fetched HTML content
  • Additional requirements: Beautiful Soup 4
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")

print(soup.prettify())
  • Run the code after fetching the web page and observe the output: the same HTML, re-indented with each tag on its own line
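
Beyond re-indenting, Beautiful Soup can pull specific pieces out of a page. The sketch below illustrates a few common lookups. It uses a small made-up HTML string (`SAMPLE_HTML`) in place of `response.text` so that it runs without a network connection. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for response.text.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>First paragraph.</p>
    <a href="https://www.example.com/about">About</a>
    <a href="https://www.example.com/contact">Contact</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text of the <title> element.
page_title = soup.title.string

# Text of the first <h1>, with surrounding whitespace removed.
heading = soup.find("h1").get_text(strip=True)

# href of every <a> tag that has one.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(page_title)
print(heading)
print(links)
```

The same calls work on a `soup` built from a fetched page, although real pages may lack a `<title>` or `<h1>`, in which case `soup.title` and `soup.find("h1")` return `None` and need a check first.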

Saving Cleaned Data to a Local File

  • Summary of use case: Saving the prettified HTML to a local file
  • Benefits of use case: Learning how to write data to a file using Python
with open("output.html", "w", encoding="utf-8") as file:
    file.write(soup.prettify())
  • Run the code after parsing the HTML and observe the output: a new file named output.html containing the prettified HTML
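
If the goal is readable text rather than re-indented markup, one possible approach is to drop `<script>` and `<style>` tags and save only the visible text. The following is a sketch under stated assumptions: `SAMPLE_HTML` is made-up markup, `extract_text` and `save_text` are hypothetical helper names, and the file is written to a temporary directory so the example leaves nothing behind.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Made-up markup standing in for a fetched page.
SAMPLE_HTML = (
    "<html><body><script>track();</script><style>p{color:red}</style>"
    "<h1>Title</h1><p>Body text.</p></body></html>"
)


def extract_text(html):
    """Return the visible text of `html`, one text fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove tags whose contents are code or styling, not readable text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)


def save_text(text, path):
    """Write `text` to `path` as UTF-8 and return the path."""
    path = Path(path)
    path.write_text(text, encoding="utf-8")
    return path


# Write to a temporary directory, then read the file back to confirm its contents.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_text(extract_text(SAMPLE_HTML), Path(tmp) / "output.txt")
    round_trip = saved.read_text(encoding="utf-8")

print(round_trip)
```

To keep the file, pass a regular path such as "output.txt" to `save_text` instead of one inside the temporary directory.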

Summary

This post covered a first pass through Python 3: printing "Hello, World!", fetching a web page with Requests, re-indenting its HTML with Beautiful Soup 4, and saving the result to a local file. We hope you enjoyed this adventure in Python programming! If you found this post helpful, please support us by liking and subscribing to our YouTube channel.

