Introduction
This post walks through a few Python 3 basics. Starting with the classic "Hello, World!" example, we will fetch a web page with the Requests library, parse and tidy its HTML with Beautiful Soup 4, and finally save the result to a local file.
Requirements
- Python 3 installed on your system. You can download it from the official Python website (python.org)
- A text editor or IDE of your choice
- An active internet connection
Install and Setup
- Install Beautiful Soup 4 by running the following command in your terminal or command prompt:
pip install beautifulsoup4
- Install the Requests library to handle HTTP requests:
pip install requests
Hello, World!
- Summary of use case: Writing a simple Python program to print "Hello, World!"
- Benefits of use case: Understanding basic Python syntax and running a Python script
print("Hello, World!")
- Run the code and observe the output: "Hello, World!"
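As a small next step in basic syntax, the same greeting can be wrapped in a function. This is only a sketch: the names `greet` and `name` are illustrative and not part of the original example.

```python
def greet(name="World"):
    """Return a greeting for the given name."""
    # An f-string substitutes the value of `name` into the text.
    return f"Hello, {name}!"


print(greet())          # prints: Hello, World!
print(greet("Python"))  # prints: Hello, Python!
```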
Fetching a Web Page
- Summary of use case: Fetching a web page's HTML content using the Requests library
- Benefits of use case: Learning how to use Python to fetch content from the internet
- Additional requirements: Requests library
import requests
url = "https://www.example.com"
response = requests.get(url)
print(response.text)
- Run the code and observe the output: The HTML content of the web page
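The snippet above assumes the request succeeds. A slightly more defensive variant, sketched here, adds a timeout and raises an exception on HTTP error statuses. The function name `fetch_html` and the 10-second default are assumptions chosen for illustration, not part of the original code.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the body of `url` as text, raising on network or HTTP errors.

    `fetch_html` and its default timeout are illustrative choices.
    """
    # Without a timeout, requests.get can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://www.example.com")` should return the same HTML as the earlier snippet. Connection problems, timeouts, and HTTP errors all surface as subclasses of `requests.RequestException`, which the caller can catch in one place.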
Web Scraping with Beautiful Soup
- Summary of use case: Parsing the fetched HTML content with Beautiful Soup and tidying its formatting
- Benefits of use case: Learning how to extract relevant information from fetched HTML content
- Additional requirements: Beautiful Soup 4
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())
- Run the code after fetching the web page and observe the output: The same HTML, re-indented with each tag on its own line (prettify() reformats the markup; it does not remove any content)
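To go from re-indented markup to actually extracting information, Beautiful Soup can pull out individual pieces such as the page title, link targets, and visible text. The sketch below runs on a small inline sample so it works without a network connection. `SAMPLE_HTML` and `extract_summary` are hypothetical names introduced for this illustration; in the tutorial's flow you would pass `response.text` instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for response.text.
SAMPLE_HTML = """
<html>
  <head><title>Example Domain</title></head>
  <body>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples.</p>
    <a href="https://www.iana.org/domains/example">More information...</a>
  </body>
</html>
"""


def extract_summary(html):
    """Return the title, link targets, and visible text of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag.
    title = soup.title.get_text(strip=True) if soup.title else None
    # href=True skips <a> tags that have no href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    # get_text drops all tags and joins the remaining text fragments.
    text = soup.get_text(separator=" ", strip=True)
    return {"title": title, "links": links, "text": text}


summary = extract_summary(SAMPLE_HTML)
print(summary["title"])  # prints: Example Domain
print(summary["links"])  # prints: ['https://www.iana.org/domains/example']
```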
Saving Cleaned Data to a Local File
- Summary of use case: Saving the cleaned HTML content to a local file
- Benefits of use case: Learning how to write data to a file using Python
with open("output.html", "w", encoding="utf-8") as file:
    file.write(soup.prettify())
- Run the code after cleaning the HTML content and observe the output: A new file named output.html containing the cleaned HTML content
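One way to confirm the write worked is to read the file back. The sketch below uses `pathlib` from the standard library as an alternative to `open()`. The helper name `save_text` is an assumption for illustration, and the demo writes to a temporary directory so that it does not overwrite an existing output.html.

```python
import tempfile
from pathlib import Path


def save_text(text, path):
    """Write `text` to `path` as UTF-8 and return the number of characters written."""
    return Path(path).write_text(text, encoding="utf-8")


# Demo in a throwaway directory; in the tutorial you would pass
# soup.prettify() and "output.html" instead.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.html"
    written = save_text("<p>Hello</p>\n", out)
    read_back = out.read_text(encoding="utf-8")

print(written)                       # prints: 13
print(read_back == "<p>Hello</p>\n")  # prints: True
```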
Summary
In this post, we covered a few basics of Python 3 programming, starting with the simple "Hello, World!" example. We fetched a web page from the internet with Requests, parsed and tidied its HTML using Beautiful Soup 4, and saved the result to a local file. We hope you enjoyed this adventure in Python programming!