
Exploring Numpy and Pandas in Python 3.9: Installation, Benefits, Data Import, Manipulation, and Export

Wing Tang Wong
SRE/DevOps/Platform Engineer/Software Engineer

Introduction #

Numpy and Pandas are fundamental libraries for data manipulation and analysis in Python. Numpy provides large, multi-dimensional arrays and the mathematical functions that operate on them, while Pandas excels at handling structured, labeled data. In this article, we’ll explore both libraries in Python 3.9, covering installation, a few caveats, and the benefits each offers. We’ll import data from CSV, Excel, and tab-delimited files, compute summary metrics, manipulate and reshape the data, and finally export it in the same formats.

Installation of Numpy and Pandas on Python 3.9 #

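To install Numpy and Pandas for Python 3.9, you can use pip, the package installer for Python. If several Python versions are installed, make sure the pip you run belongs to the Python 3.9 environment (for example, by working inside a virtual environment created with Python 3.9). pip then selects releases of both libraries that are compatible with that interpreter:

pip install numpy pandas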
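
After installing, a quick check along these lines can confirm that both libraries import from the interpreter you expect. This is only a minimal sketch; the versions it prints depend on what pip resolved for your environment.

```python
import sys

import numpy as np
import pandas as pd

# Report the interpreter and library versions actually in use.
print("Python:", sys.version.split()[0])
print("Numpy:", np.__version__)
print("Pandas:", pd.__version__)
```

If the Python version shown is not 3.9.x, the packages were installed into a different interpreter than the one you intended.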

Benefits of Numpy and Pandas #

Numpy: #

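  1. Efficiency: Numpy arrays store elements of a single data type in a compact block of memory and run operations in compiled code, which makes them more memory-efficient and faster for numerical computations than standard Python lists.

  2. Multidimensional Arrays: Numpy arrays can have any number of dimensions, so vectors, matrices, and higher-dimensional data all share the same indexing and slicing syntax.

  3. Broadcasting: Numpy supports broadcasting, which applies an operation to arrays of different but compatible shapes without explicit loops or manually repeating the smaller array.

  4. Mathematical Functions: Numpy comes with a wide range of mathematical functions, covering element-wise arithmetic, statistics, linear algebra, and random number generation.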
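
To make the broadcasting point concrete, here is a small sketch using the same 3x3 array that appears later in this article. The variable names are illustrative only.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A shape (3,) array is broadcast across every row of the (3, 3) array.
col_means = data.mean(axis=0)
centered = data - col_means

# A shape (3, 1) array is broadcast across every column instead.
row_totals = data.sum(axis=1, keepdims=True)
row_shares = data / row_totals
```

Neither operation needs a Python loop: Numpy stretches the smaller operand along the missing dimension for you.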

Pandas: #

  1. Data Structures: Pandas provides two primary data structures: Series, a one-dimensional labeled array, and DataFrame, a two-dimensional table with labeled rows and columns. Both are well suited to structured data.

  2. Data Alignment: Pandas aligns data automatically based on index and column labels, so operations on datasets with missing or misaligned entries match up the right values and mark the gaps as NaN.

  3. Data Wrangling: Pandas offers powerful tools for data wrangling, including filtering, transforming, grouping, and aggregating data.
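
Label-based alignment is easiest to see with two Series whose indexes only partly overlap. The fruit labels and numbers below are made up for illustration.

```python
import pandas as pd

q1 = pd.Series({"apples": 10, "pears": 4})
q2 = pd.Series({"pears": 6, "plums": 3})

# Addition matches values by label; a label missing on one side gives NaN.
total = q1 + q2

# fill_value=0 treats a missing label as zero instead.
total_filled = q1.add(q2, fill_value=0)
```

Only "pears" exists in both Series, so it is the only label with a value in total; total_filled keeps all three labels.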

Importing Data from Files into Numpy and Pandas #

Importing CSV Files #

Numpy:

import numpy as np

# skip_header=1 skips the header row; fields that cannot be parsed as numbers become nan
data_np = np.genfromtxt('data.csv', delimiter=',', skip_header=1)

Pandas:

import pandas as pd

# The first row is used as column names, and each column's type is inferred
data_pd = pd.read_csv('data.csv')
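
The two readers behave differently when a file mixes text and numbers. The sketch below uses a small, hypothetical CSV held in memory (via io.StringIO) rather than a real data.csv, so it can run on its own.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents, kept in memory so the example is self-contained.
csv_text = "name,score\nalice,90\nbob,85\n"

# With the default float dtype, the text column cannot be parsed and becomes nan.
data_np = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Pandas keeps the names as text and infers an integer type for the scores.
data_pd = pd.read_csv(io.StringIO(csv_text))
```

For files with mixed column types, loading with Pandas and converting the numeric columns to Numpy afterwards is usually the simpler route.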

Importing Excel Files #

Numpy:

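Numpy does not have direct support for reading Excel files. You can use Pandas to read the data and then convert it to a Numpy array. Note that pd.read_excel relies on a separate engine package; for .xlsx files this is typically openpyxl, which you can install with pip install openpyxl: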

import pandas as pd
import numpy as np

data_pd = pd.read_excel('data.xlsx')
data_np = data_pd.to_numpy()

Importing Tab-Delimited Files #

Numpy:

import numpy as np

data_np = np.genfromtxt('data.txt', delimiter='\t', skip_header=1)

Pandas:

import pandas as pd

# sep='\t' tells read_csv that fields are separated by tabs
data_pd = pd.read_csv('data.txt', sep='\t')
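
If you want to keep the column names on the Numpy side rather than skipping the header, genfromtxt can read them with names=True. The following sketch assumes a hypothetical two-column tab-delimited file, held in memory here so it runs without data.txt.

```python
import io

import numpy as np

# Hypothetical tab-delimited contents, kept in memory for the example.
tsv_text = "width\theight\n1.5\t2.0\n3.0\t4.5\n"

# names=True takes field names from the first row instead of skipping it.
data_np = np.genfromtxt(io.StringIO(tsv_text), delimiter="\t", names=True)

# The result is a structured array, so columns can be selected by name.
widths = data_np["width"]
heights = data_np["height"]
```

A structured array is one-dimensional (one record per row), so it does not support the 2-D slicing shown later in this article.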

Generating Useful Metrics with Numpy and Pandas #

Numpy:

import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

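# Mean of all elements (no axis given, so the array is flattened)
mean_np = np.mean(data_np)

# Population standard deviation of all elements (ddof=0 by default)
std_dev_np = np.std(data_np)

# axis=1 sums across each row; axis=0 sums down each column
sum_rows_np = np.sum(data_np, axis=1)
sum_cols_np = np.sum(data_np, axis=0)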

Pandas:

import pandas as pd

data_pd = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

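# Mean of each column (DataFrame reductions default to axis=0)
mean_pd = data_pd.mean().values

# Sample standard deviation of each column (ddof=1 by default, unlike np.std)
std_dev_pd = data_pd.std().values

# axis=1 sums across each row; the default (axis=0) sums down each column
sum_rows_pd = data_pd.sum(axis=1).values
sum_cols_pd = data_pd.sum().values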
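
Because the two libraries use different defaults (Numpy reduces over the whole array with ddof=0, Pandas reduces per column with ddof=1), the snippets above do not return the same numbers. A minimal sketch of how the results can be lined up:

```python
import numpy as np
import pandas as pd

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
data_pd = pd.DataFrame(data_np)

# Column-wise means in both libraries: pass axis=0 to Numpy.
col_mean_np = np.mean(data_np, axis=0)
col_mean_pd = data_pd.mean().to_numpy()

# Matching standard deviations: set ddof=1 on the Numpy side.
col_std_np = np.std(data_np, axis=0, ddof=1)
col_std_pd = data_pd.std().to_numpy()

# Overall mean of a DataFrame, computed on its underlying array.
overall_mean_pd = data_pd.to_numpy().mean()
```

Being explicit about axis and ddof avoids surprises when results from the two libraries are compared.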

Data Manipulation and Massaging with Numpy and Pandas #

Numpy:

import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Transpose
transposed_data_np = np.transpose(data_np)

# Reshape into 1 row and 9 columns
reshaped_data_np = data_np.reshape((1, 9))

# Slicing: rows 0-1, columns 1-2
subset_np = data_np[0:2, 1:3]

# Element-wise operations
doubled_data_np = data_np * 2
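
One caveat with Numpy slicing is that a basic slice is a view onto the original array, not a copy. The sketch below shows the difference; the names view and independent are illustrative.

```python
import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Basic slicing returns a view: writing to it also changes data_np.
view = data_np[0:2, 1:3]
view[0, 0] = 99

# .copy() produces an independent array that can be modified safely.
independent = data_np[0:2, 1:3].copy()
independent[0, 0] = -1

print(data_np[0, 1])
```

The final print shows 99: the write through view reached the original array, while the write to independent did not.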

Pandas:

import pandas as pd

data_pd = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Transpose
transposed_data_pd = data_pd.transpose()

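# Reshape - DataFrame has no reshape() method; convert with to_numpy() first,
# or restructure the table with methods such as melt(), stack(), or pivot()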

# Slicing
subset_pd = data_pd.iloc[0:2, 1:3]

# Element-wise operations
doubled_data_pd = data_pd * 2
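
Although a DataFrame cannot be reshaped the way an array can, there are two common workarounds. The sketch below assumes illustrative column names "a", "b", and "c", which are not part of the original example.

```python
import pandas as pd

# Column names are illustrative; the original example uses default labels.
data_pd = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

# Option 1: drop to Numpy and reshape there, as in the Numpy example.
flat = data_pd.to_numpy().reshape((1, 9))

# Option 2: go from wide to long, with one row per (column, value) pair.
long_pd = data_pd.melt(var_name="column", value_name="value")
```

melt keeps the labels attached to each value, which a plain array reshape discards.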

Exporting Data in Various Formats using Numpy and Pandas #

Exporting to CSV #

Numpy:

import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# fmt='%d' writes integers as-is; the default format is scientific notation ('%.18e')
np.savetxt('output_np.csv', data_np, delimiter=',', fmt='%d')

Pandas:

import pandas as pd

data_pd = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

data_pd.to_csv('output_pd.csv', index=False)
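
The output of np.savetxt depends heavily on its fmt, header, and comments arguments. This sketch writes to in-memory buffers instead of files so the text can be inspected directly; the header "a,b,c" is a made-up example.

```python
import io

import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Default format: every value is written with '%.18e'.
default_buf = io.StringIO()
np.savetxt(default_buf, data_np, delimiter=",")

# Integer format plus a header row; comments="" drops the leading "# ".
int_buf = io.StringIO()
np.savetxt(int_buf, data_np, delimiter=",", fmt="%d", header="a,b,c", comments="")

print(default_buf.getvalue().splitlines()[0])
print(int_buf.getvalue())
```

The first print shows values in scientific notation, while the second shows a plain integer CSV with a header row, closer to what Pandas writes with to_csv.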

Exporting to Excel #

Numpy:

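Numpy does not have direct support for writing to Excel files. You can wrap the array in a Pandas DataFrame and write that instead. As with reading, writing .xlsx files requires an Excel engine such as openpyxl to be installed: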

import pandas as pd
import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

data_pd = pd.DataFrame(data_np)
data_pd.to_excel('output.xlsx', index=False)

Exporting to Tab-Delimited Text File #

Numpy:

import numpy as np

data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# fmt='%d' writes integers as-is; the default format is scientific notation ('%.18e')
np.savetxt('output.txt', data_np, delimiter='\t', fmt='%d')

Pandas:

import pandas as pd

data_pd = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

data_pd.to_csv('output.txt', sep='\t', index=False)
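
A quick way to check an export is to read it back and compare. The sketch below does this round trip in memory, assuming illustrative column names; with the default integer labels (0, 1, 2) the header would come back as the strings '0', '1', '2', so the frames would no longer compare equal.

```python
import io

import pandas as pd

# Column names are illustrative and make the round trip exact.
data_pd = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["a", "b", "c"])

# Write tab-delimited text to an in-memory buffer instead of a file.
buf = io.StringIO()
data_pd.to_csv(buf, sep="\t", index=False)

# Rewind the buffer and read the same text back.
buf.seek(0)
restored = pd.read_csv(buf, sep="\t")

print(restored.equals(data_pd))
```

The same pattern works with a real file path in place of the buffer.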

Conclusion #

Both Numpy and Pandas are powerful libraries for data manipulation and analysis in Python. Numpy excels at numerical computations with multi-dimensional arrays, while Pandas is ideal for handling structured data. Having walked through installation, data import, metrics generation, data manipulation, and export, you should be well-equipped to work on a wide range of data-centric projects. Whether you’re performing scientific computing, data analysis, or data cleaning, these libraries will streamline your workflow. Happy data crunching!
