NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a comprehensive collection of high-level mathematical functions to operate on these arrays efficiently.

Overview

NumPy is the foundation of the Python scientific computing ecosystem. It offers:

  • N-dimensional array object (ndarray): Fast and versatile array structure
  • Broadcasting functions: Perform operations on arrays of different shapes
  • Linear algebra operations: Matrix operations, decompositions, eigenvalues
  • Random number generation: Comprehensive suite of random sampling functions
  • Integration with C/C++/Fortran: Ability to wrap compiled code
  • Memory efficiency: Optimized data storage and computation

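NumPy arrays are typically faster and more memory-efficient than Python lists for numerical work: they store elements of a single type in one contiguous buffer and run operations in compiled code. This makes them well suited to numerical computation and data analysis.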
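
As a rough illustration of the memory claim, the sketch below compares the bytes held by a Python list of integers with the data buffer of an equivalent array. The variable names and the use of `sys.getsizeof` are assumptions made for this example, and the exact list figure depends on the Python build and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values in one contiguous buffer.
array_bytes = np_arr.nbytes  # n elements * 8 bytes each

print(f"list: {list_bytes} bytes, ndarray data: {array_bytes} bytes")
```

The array figure covers only the data buffer; the array object itself adds a small fixed overhead.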

Installation

Install NumPy using pip:

pip install numpy

For scientific computing environments, consider installing via Anaconda:

conda install numpy

Verify installation:

import numpy as np
print(np.__version__)

Array Creation

Creating Basic Arrays

NumPy provides multiple ways to create arrays:

import numpy as np

# From Python list
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1)  # [1 2 3 4 5]

# 2D array from nested lists
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
# [[1 2 3]
#  [4 5 6]]

# Specify data type
arr3 = np.array([1, 2, 3], dtype=np.float64)
print(arr3)  # [1. 2. 3.]

# Complex numbers
arr4 = np.array([1+2j, 3+4j])
print(arr4)  # [1.+2.j 3.+4.j]

Array Initialization Functions

# Array of zeros
zeros = np.zeros((3, 4))  # 3x4 array of zeros

# Array of ones
ones = np.ones((2, 3, 4))  # 3D array of ones

# Empty array (uninitialized)
empty = np.empty((2, 2))

# Array with constant value
full = np.full((3, 3), 7)  # 3x3 array filled with 7

# Identity matrix
identity = np.eye(4)  # 4x4 identity matrix

# Array from range
arange_arr = np.arange(0, 10, 2)  # [0 2 4 6 8]

# Evenly spaced values
linspace_arr = np.linspace(0, 1, 5)  # [0. 0.25 0.5 0.75 1.]

# Logarithmically spaced values
logspace_arr = np.logspace(0, 2, 5)  # [1. 3.16 10. 31.62 100.]

Creating Arrays Like Existing Arrays

x = np.array([[1, 2], [3, 4]])

# Create zeros with same shape
zeros_like = np.zeros_like(x)

# Create ones with same shape
ones_like = np.ones_like(x)

# Create empty with same shape
empty_like = np.empty_like(x)

Array Properties and Attributes

Understanding array properties is essential for effective NumPy usage:

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Shape - dimensions of array
print(arr.shape)  # (2, 4)

# Number of dimensions
print(arr.ndim)  # 2

# Total number of elements
print(arr.size)  # 8

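# Data type of elements
print(arr.dtype)  # int64 on most platforms (the default integer type is platform-dependent)

# Size of each element in bytes
print(arr.itemsize)  # 8 (for int64)

# Total bytes consumed
print(arr.nbytes)  # 64 (8 elements * 8 bytes)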

# Transpose
print(arr.T)
# [[1 5]
#  [2 6]
#  [3 7]
#  [4 8]]

Array Indexing and Slicing

Basic Indexing

arr = np.array([10, 20, 30, 40, 50])

# Single element access
print(arr[0])  # 10
print(arr[-1])  # 50 (last element)

# Slicing [start:stop:step]
print(arr[1:4])  # [20 30 40]
print(arr[::2])  # [10 30 50] (every other element)
print(arr[::-1])  # [50 40 30 20 10] (reverse)

Multi-dimensional Indexing

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Access element at row 1, column 2
print(arr2d[1, 2])  # 6

# Row slicing
print(arr2d[0, :])  # [1 2 3] (first row)

# Column slicing
print(arr2d[:, 1])  # [2 5 8] (second column)

# Subarray
print(arr2d[0:2, 1:3])
# [[2 3]
#  [5 6]]

Boolean Indexing

arr = np.array([1, 2, 3, 4, 5, 6])

# Boolean mask
mask = arr > 3
print(mask)  # [False False False True True True]

# Filter using mask
filtered = arr[mask]
print(filtered)  # [4 5 6]

# Inline filtering
print(arr[arr % 2 == 0])  # [2 4 6] (even numbers)

# Modify elements using boolean indexing
arr[arr > 3] = 0
print(arr)  # [1 2 3 0 0 0]

Fancy Indexing

arr = np.array([10, 20, 30, 40, 50])

# Index with array of integers
indices = np.array([0, 2, 4])
print(arr[indices])  # [10 30 50]

# 2D fancy indexing
arr2d = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])
print(arr2d[rows, cols])  # [1 11]
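
Paired integer arrays select individual elements, here (0, 1) and (2, 3), not a rectangular block. When the block formed by rows 0 and 2 crossed with columns 1 and 3 is wanted, `np.ix_` is one way to build the index. A minimal sketch, with `paired` and `block` as illustrative names:

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired indexing: picks elements (0, 1) and (2, 3).
paired = arr2d[rows, cols]

# np.ix_ builds an open mesh: rows {0, 2} crossed with columns {1, 3}.
block = arr2d[np.ix_(rows, cols)]

print(paired.tolist())  # [1, 11]
print(block.tolist())   # [[1, 3], [9, 11]]
```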

Array Operations

Arithmetic Operations

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Element-wise operations
print(a + b)  # [11 22 33 44]
print(a - b)  # [-9 -18 -27 -36]
print(a * b)  # [10 40 90 160]
print(a / b)  # [0.1 0.1 0.1 0.1]
print(a ** 2)  # [1 4 9 16]

# Operations with scalars (broadcasting)
print(a + 10)  # [11 12 13 14]
print(a * 2)  # [2 4 6 8]

Comparison Operations

a = np.array([1, 2, 3, 4, 5])

print(a > 3)  # [False False False True True]
print(a == 3)  # [False False True False False]
print(a <= 2)  # [True True False False False]

# Compare arrays
b = np.array([5, 4, 3, 2, 1])
print(a < b)  # [True True False False False]

Universal Functions (ufuncs)

arr = np.array([1, 4, 9, 16, 25])

# Mathematical functions
print(np.sqrt(arr))  # [1. 2. 3. 4. 5.]
print(np.exp(arr))  # Exponential
print(np.log(arr))  # Natural logarithm
print(np.log10(arr))  # Base-10 logarithm

# Trigonometric functions
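angles = np.array([0, np.pi/2, np.pi])
print(np.sin(angles))  # approximately [0. 1. 0.] (sin(pi) prints as ~1.2e-16)
print(np.cos(angles))  # approximately [1. 0. -1.] (cos(pi/2) prints as ~6.1e-17)
print(np.tan(angles))  # tan(pi/2) is a very large number (~1.6e16), not infinity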

# Rounding functions
arr_float = np.array([1.23, 4.56, 7.89])
print(np.round(arr_float))  # [1. 5. 8.]
print(np.floor(arr_float))  # [1. 4. 7.]
print(np.ceil(arr_float))  # [2. 5. 8.]

Broadcasting

Broadcasting allows NumPy to perform operations on arrays of different shapes:

# 1D array + scalar
a = np.array([1, 2, 3])
print(a + 5)  # [6 7 8]

# 2D array + 1D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])
print(matrix + vector)
# [[11 22 33]
#  [14 25 36]]

# 2D array + column vector
matrix = np.array([[1, 2, 3], [4, 5, 6]])
col_vector = np.array([[10], [20]])
print(matrix + col_vector)
# [[11 12 13]
#  [24 25 26]]

# Broadcasting rules example
a = np.arange(3).reshape(3, 1)  # Shape (3, 1)
b = np.arange(3)  # Shape (3,)
print(a + b)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

Broadcasting Rules

  1. If the arrays differ in number of dimensions, the shape of the lower-dimensional array is padded with 1s on the left
  2. Two dimensions are compatible when they are equal or when one of them is 1
  3. After broadcasting, each array behaves as if its shape were the element-wise maximum of the input shapes
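
The rules above can be checked without allocating any data. The sketch below uses `np.broadcast_shapes`, which is assumed to be available (it was added in NumPy 1.20); the `compatible` flag is an illustrative name.

```python
import numpy as np

# Shapes are aligned from the right; missing leading dimensions count as 1.
print(np.broadcast_shapes((3, 1), (3,)))    # (3, 3)
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)
print(np.broadcast_shapes((2, 3), (2, 1)))  # (2, 3)

# Incompatible shapes raise ValueError: the trailing dimensions 3 and 2
# differ and neither is 1.
try:
    np.broadcast_shapes((2, 3), (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```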

Array Manipulation

Reshaping

arr = np.arange(12)

# Reshape to 2D
reshaped = arr.reshape(3, 4)
print(reshaped)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Reshape to 3D
reshaped_3d = arr.reshape(2, 3, 2)

# Flatten array
flattened = reshaped.flatten()  # Always returns a copy
raveled = reshaped.ravel()  # Returns a view when possible (no copy here)

# Auto-calculate dimension
auto_reshape = arr.reshape(3, -1)  # -1 means "figure it out"
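
Whether a reshaping call returned a view or a copy can be checked with `np.shares_memory`. A small sketch, with illustrative variable names:

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

flat_copy = base.flatten()  # always a new array
flat_view = base.ravel()    # a view here, because base is contiguous

print(np.shares_memory(base, flat_copy))  # False
print(np.shares_memory(base, flat_view))  # True

# On a non-contiguous input such as a transpose, ravel() has to copy.
ravel_of_transpose = base.T.ravel()
print(np.shares_memory(base, ravel_of_transpose))  # False
```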

Transposing and Swapping Axes

arr = np.arange(12).reshape(3, 4)

# Transpose
transposed = arr.T
print(transposed.shape)  # (4, 3)

# Swap axes
arr3d = np.arange(24).reshape(2, 3, 4)
swapped = arr3d.swapaxes(1, 2)  # Swap axes 1 and 2
print(swapped.shape)  # (2, 4, 3)

# Transpose with axes specification
transposed_3d = arr3d.transpose(2, 1, 0)
print(transposed_3d.shape)  # (4, 3, 2)

Stacking and Splitting

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Vertical stack (row-wise)
v_stack = np.vstack((a, b))
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Horizontal stack (column-wise)
h_stack = np.hstack((a, b))
# [[1 2 5 6]
#  [3 4 7 8]]

# Concatenate along axis
concat = np.concatenate((a, b), axis=0)  # Same as vstack

# Split array
arr = np.arange(9)
split1, split2, split3 = np.split(arr, 3)
print(split1)  # [0 1 2]

# Split 2D array
arr2d = np.arange(16).reshape(4, 4)
upper, lower = np.vsplit(arr2d, 2)
left, right = np.hsplit(arr2d, 2)

Adding and Removing Elements

arr = np.array([1, 2, 3, 4, 5])

# Append
appended = np.append(arr, [6, 7])
print(appended)  # [1 2 3 4 5 6 7]

# Insert
inserted = np.insert(arr, 2, 99)
print(inserted)  # [1 2 99 3 4 5]

# Delete
deleted = np.delete(arr, [1, 3])
print(deleted)  # [1 3 5]

# Unique values
arr_dup = np.array([1, 2, 2, 3, 3, 3, 4])
unique = np.unique(arr_dup)
print(unique)  # [1 2 3 4]

# Unique with counts
unique, counts = np.unique(arr_dup, return_counts=True)
print(counts)  # [1 2 3 1]

Aggregation Functions

Statistical Functions

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Sum
print(np.sum(arr))  # 45
print(arr.sum(axis=0))  # [12 15 18] (column sums)
print(arr.sum(axis=1))  # [6 15 24] (row sums)

# Mean
print(np.mean(arr))  # 5.0
print(arr.mean(axis=0))  # [4. 5. 6.]

# Median
print(np.median(arr))  # 5.0

# Standard deviation and variance
print(np.std(arr))  # Standard deviation
print(np.var(arr))  # Variance

# Min and max
print(np.min(arr))  # 1
print(np.max(arr))  # 9
print(arr.min(axis=1))  # [1 4 7]

# Argmin and argmax (indices)
print(np.argmin(arr))  # 0 (flattened index)
print(np.argmax(arr, axis=0))  # [2 2 2]

# Cumulative sum and product
print(np.cumsum([1, 2, 3, 4]))  # [1 3 6 10]
print(np.cumprod([1, 2, 3, 4]))  # [1 2 6 24]

# Percentiles
print(np.percentile(arr, 50))  # 5.0 (median)
print(np.percentile(arr, [25, 50, 75]))  # [3. 5. 7.]

Logical Operations

a = np.array([True, False, True, False])
b = np.array([True, True, False, False])

# Logical AND, OR, NOT, XOR
print(np.logical_and(a, b))  # [True False False False]
print(np.logical_or(a, b))  # [True True True False]
print(np.logical_not(a))  # [False True False True]
print(np.logical_xor(a, b))  # [False True True False]

# Any and all
arr = np.array([1, 2, 0, 4])
print(np.any(arr > 3))  # True
print(np.all(arr > 0))  # False

Linear Algebra

NumPy provides comprehensive linear algebra operations through numpy.linalg:

Matrix Operations

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = np.dot(A, B)
# or
C = A @ B
print(C)
# [[19 22]
#  [43 50]]

# Element-wise multiplication
element_wise = A * B
# [[5 12]
#  [21 32]]

# Matrix power
print(np.linalg.matrix_power(A, 3))  # A^3

# Transpose
print(A.T)

# Trace (sum of diagonal)
print(np.trace(A))  # 5

# Diagonal elements
print(np.diag(A))  # [1 4]

# Create diagonal matrix
print(np.diag([1, 2, 3]))
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]

Matrix Decompositions

A = np.array([[1, 2], [3, 4], [5, 6]])

# Singular Value Decomposition (SVD)
U, s, Vt = np.linalg.svd(A)
print(f"U shape: {U.shape}, s shape: {s.shape}, Vt shape: {Vt.shape}")

# QR Decomposition
Q, R = np.linalg.qr(A)

# Eigenvalues and eigenvectors
A_square = np.array([[1, 2], [2, 1]])
eigenvalues, eigenvectors = np.linalg.eig(A_square)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Cholesky decomposition (for positive definite matrices)
pos_def = np.array([[4, 2], [2, 3]])
L = np.linalg.cholesky(pos_def)
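
Each decomposition can be sanity-checked by multiplying the factors back together. The sketch below does this for the same matrices; the `*_ok` flags and the use of `full_matrices=False` for the SVD are choices made for this example.

```python
import numpy as np

# SVD: with full_matrices=False, U is (3, 2) and A == U @ diag(s) @ Vt.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
svd_ok = np.allclose(U @ np.diag(s) @ Vt, A)

# QR: Q has orthonormal columns and A == Q @ R.
Q, R = np.linalg.qr(A)
qr_ok = np.allclose(Q @ R, A)

# Eigen-decomposition: each column v of V satisfies S @ v == w * v.
S = np.array([[1.0, 2.0], [2.0, 1.0]])
w, V = np.linalg.eig(S)
eig_ok = np.allclose(S @ V, V * w)

# Cholesky: lower-triangular L with P == L @ L.T.
P = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(P)
chol_ok = np.allclose(L @ L.T, P)

print(svd_ok, qr_ok, eig_ok, chol_ok)  # True True True True
```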

Matrix Properties

A = np.array([[1, 2], [3, 4]])

# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}")  # approximately -2.0 (floating-point result)

# Inverse
try:
    A_inv = np.linalg.inv(A)
    print(f"Inverse:\n{A_inv}")
    
    # Verify: A @ A_inv should equal identity
    print(np.allclose(A @ A_inv, np.eye(2)))  # True
except np.linalg.LinAlgError:
    print("Matrix is singular")

# Rank
rank = np.linalg.matrix_rank(A)
print(f"Rank: {rank}")

# Norm
norm_fro = np.linalg.norm(A, 'fro')  # Frobenius norm
norm_2 = np.linalg.norm(A, 2)  # Spectral norm
print(f"Frobenius norm: {norm_fro}")

Solving Linear Systems

# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

x = np.linalg.solve(A, b)
print(f"Solution: {x}")  # [2. 3.]

# Verify solution
print(np.allclose(A @ x, b))  # True

# Least squares solution (overdetermined system)
A = np.array([[1, 0], [1, 1], [1, 2]])
b = np.array([0, 1, 2])

x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Least squares solution: {x}")

Random Number Generation

NumPy provides powerful random number generation capabilities:

Basic Random Generation

# Legacy random (still widely used)
np.random.seed(42)  # For reproducibility

# Random floats in [0, 1)
random_floats = np.random.random(5)
print(random_floats)

# Random integers
random_ints = np.random.randint(0, 10, size=5)
print(random_ints)

# Random choice from array
choices = np.random.choice(['a', 'b', 'c'], size=10)
print(choices)

# Random permutation
arr = np.arange(10)
np.random.shuffle(arr)  # In-place
print(arr)

permuted = np.random.permutation(10)  # Returns new array
print(permuted)

# Generator API (recommended for new code)
# Create a random generator
rng = np.random.default_rng(seed=42)

# Random floats
random_floats = rng.random(5)

# Random integers
random_ints = rng.integers(0, 10, size=5)

# Random choice
choices = rng.choice(['a', 'b', 'c'], size=10)

# Random permutation
permuted = rng.permutation(10)

Probability Distributions

rng = np.random.default_rng(seed=42)

# Normal (Gaussian) distribution
normal = rng.normal(loc=0, scale=1, size=1000)  # mean=0, std=1

# Uniform distribution
uniform = rng.uniform(low=0, high=1, size=1000)

# Binomial distribution
binomial = rng.binomial(n=10, p=0.5, size=1000)

# Poisson distribution
poisson = rng.poisson(lam=5, size=1000)

# Exponential distribution
exponential = rng.exponential(scale=1.0, size=1000)

# Beta distribution
beta = rng.beta(a=2, b=5, size=1000)

# Gamma distribution
gamma = rng.gamma(shape=2, scale=1, size=1000)

# Chi-square distribution
chisquare = rng.chisquare(df=2, size=1000)

# Multivariate normal
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
multivariate = rng.multivariate_normal(mean, cov, size=100)
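
Seeding matters mainly for reproducibility. The sketch below shows that two generators created with the same seed produce the same draws, while repeated draws from one generator keep advancing its stream. Variable names are illustrative.

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.normal(loc=0, scale=1, size=5)
sample_b = rng_b.normal(loc=0, scale=1, size=5)
same = np.array_equal(sample_a, sample_b)

# A second draw from the same generator continues the stream.
next_a = rng_a.normal(size=5)
repeated = np.array_equal(sample_a, next_a)

print(same, repeated)  # True False
```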

Advanced Topics

Memory Views and Copies

# View vs. copy
original = np.arange(10)

# View (shares memory)
view = original[::2]
view[0] = 999
print(original)  # [999 1 2 3 4 5 6 7 8 9] - original changed!

# Copy (independent)
original = np.arange(10)
copy = original[::2].copy()
copy[0] = 999
print(original)  # [0 1 2 3 4 5 6 7 8 9] - original unchanged

# Check if array owns its data
print(original.flags['OWNDATA'])  # True
print(view.flags['OWNDATA'])  # False

Structured Arrays

# Define structured data type
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Create structured array
people = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.0),
    ('Charlie', 35, 80.2)
], dtype=dt)

# Access fields
print(people['name'])  # ['Alice' 'Bob' 'Charlie']
print(people['age'])  # [25 30 35]

# Access individual record
print(people[0])  # ('Alice', 25, 55.5)
print(people[0]['name'])  # Alice

Masked Arrays

# Create masked array with invalid values
data = np.array([1, 2, -999, 4, 5, -999])
masked = np.ma.masked_equal(data, -999)

print(masked)  # [1 2 -- 4 5 --]
print(masked.mean())  # 3.0 (ignores masked values)

# Create mask manually
mask = np.array([False, False, True, False, False, True])
masked2 = np.ma.array(data, mask=mask)

Advanced Indexing with np.where

arr = np.array([1, 2, 3, 4, 5, 6])

# Find indices where condition is true
indices = np.where(arr > 3)
print(indices)  # (array([3, 4, 5]),)

# Conditional replacement
result = np.where(arr > 3, arr * 2, arr)
print(result)  # [1 2 3 8 10 12]

# Multiple conditions
result = np.where((arr > 2) & (arr < 5), arr * 2, arr)
print(result)  # [1 2 6 8 5 6]
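
When there are several mutually exclusive conditions, nesting `np.where` calls gets hard to read. `np.select` is one alternative; a minimal sketch, with thresholds chosen only for illustration:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

conditions = [arr < 3, arr > 4]
choices = [arr * 10, arr * 100]

# Elements matching no condition fall through to `default`.
result = np.select(conditions, choices, default=0)
print(result.tolist())  # [10, 20, 0, 0, 500, 600]
```

Conditions are tested in order, so the first match wins when more than one is true for an element.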

Performance Optimization

# Use vectorized operations instead of loops
import time

# Slow: Python loop
n = 1000000
arr = np.arange(n)
start = time.time()
result = []
for x in arr:
    result.append(x ** 2)
loop_time = time.time() - start

# Fast: NumPy vectorization
start = time.time()
result = arr ** 2
vectorized_time = time.time() - start

print(f"Loop: {loop_time:.4f}s, Vectorized: {vectorized_time:.4f}s")
print(f"Speedup: {loop_time / vectorized_time:.1f}x")

# Use in-place operations when possible
arr = np.arange(1000000)
arr += 1  # In-place (faster)
# vs
arr = arr + 1  # Creates new array (slower)

# Use appropriate data types
arr_float64 = np.arange(1000000, dtype=np.float64)  # 8 bytes per element
arr_float32 = np.arange(1000000, dtype=np.float32)  # 4 bytes per element
print(f"float64: {arr_float64.nbytes / 1e6:.1f} MB")
print(f"float32: {arr_float32.nbytes / 1e6:.1f} MB")
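
Single `time.time()` measurements can be noisy. As a hedged alternative, the sketch below times the same comparison with the standard-library `timeit` module, which repeats each call. The helper names `square_loop` and `square_vectorized` are assumptions for this example, and the measured times will vary by machine.

```python
import timeit
import numpy as np

arr = np.arange(100_000, dtype=np.int64)

def square_loop():
    # Element-by-element squaring in the Python interpreter.
    return [x ** 2 for x in arr]

def square_vectorized():
    # One call that runs in compiled code.
    return arr ** 2

# timeit runs each function `number` times and returns the total seconds.
loop_time = timeit.timeit(square_loop, number=5)
vec_time = timeit.timeit(square_vectorized, number=5)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")

# Both approaches compute the same values.
same_result = np.array_equal(np.array(square_loop()), square_vectorized())
```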

Working with Files

Saving and Loading Arrays

# Save single array
arr = np.array([1, 2, 3, 4, 5])
np.save('array.npy', arr)

# Load single array
loaded = np.load('array.npy')

# Save multiple arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.savez('arrays.npz', a=arr1, b=arr2)

# Load multiple arrays
loaded = np.load('arrays.npz')
print(loaded['a'])
print(loaded['b'])

# Compressed save (for large arrays)
np.savez_compressed('compressed.npz', a=arr1, b=arr2)

Text File I/O

# Save to text file
arr = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('data.txt', arr, delimiter=',', fmt='%d')

# Load from text file
loaded = np.loadtxt('data.txt', delimiter=',')

# Load CSV with header
# data.csv:
# x,y,z
# 1,2,3
# 4,5,6
data = np.genfromtxt('data.csv', delimiter=',', names=True)
print(data['x'])  # [1. 4.]
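
The header example assumes a `data.csv` file already exists. A self-contained sketch that writes the file into a temporary directory first (the path handling is an assumption for illustration):

```python
import os
import tempfile
import numpy as np

csv_text = "x,y,z\n1,2,3\n4,5,6\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as f:
        f.write(csv_text)

    # names=True reads the header row into field names.
    data = np.genfromtxt(path, delimiter=",", names=True)

print(data.dtype.names)     # ('x', 'y', 'z')
print(data["x"].tolist())   # [1.0, 4.0]
```

The result is a structured array, so columns are accessed by field name instead of by integer position.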

Common Use Cases and Patterns

Image Processing Basics

# Represent image as NumPy array
# Grayscale image: 2D array (height, width)
# Color image: 3D array (height, width, channels)

height, width = 100, 100
grayscale_image = np.random.randint(0, 256, (height, width), dtype=np.uint8)
color_image = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)

# Image operations
flipped = np.flipud(grayscale_image)  # Flip vertically
rotated = np.rot90(grayscale_image)  # Rotate 90 degrees
cropped = grayscale_image[10:50, 10:50]  # Crop region

# Normalize pixel values
normalized = grayscale_image / 255.0

Time Series and Signal Processing

# Generate time series
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

# Add noise
noise = 0.2 * np.random.randn(len(t))
noisy_signal = signal + noise

# Moving average (simple smoothing)
window_size = 10
smoothed = np.convolve(noisy_signal, np.ones(window_size)/window_size, mode='valid')

# Calculate differences (derivatives)
diff = np.diff(signal)

# Calculate cumulative sum (integration)
cumsum = np.cumsum(signal)

Data Normalization

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Min-Max normalization (scale to [0, 1])
min_val = data.min()
max_val = data.max()
normalized = (data - min_val) / (max_val - min_val)

# Z-score normalization (standardization)
mean = data.mean()
std = data.std()
standardized = (data - mean) / std

# Column-wise normalization
col_min = data.min(axis=0)
col_max = data.max(axis=0)
col_normalized = (data - col_min) / (col_max - col_min)
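
Z-score normalization can be applied per column in the same way. The sketch below also guards against constant columns, whose standard deviation is zero; the `safe_std` substitution is one possible convention, not the only one.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Replace a zero std with 1 so constant columns map to 0 instead of nan.
safe_std = np.where(col_std == 0, 1.0, col_std)
col_standardized = (data - col_mean) / safe_std

print(col_standardized.mean(axis=0))  # approximately [0. 0. 0.]
print(col_standardized.std(axis=0))   # approximately [1. 1. 1.]
```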

Best Practices

Performance Tips

  1. Vectorize operations: Avoid Python loops, use NumPy operations
  2. Use appropriate data types: Choose smallest type that fits your data
  3. Avoid unnecessary copies: Use views when possible
  4. Use in-place operations: arr += 1 instead of arr = arr + 1
  5. Preallocate arrays: Create array with final size instead of appending
  6. Use built-in functions: NumPy's C implementations are much faster

# Bad: Growing array
result = np.array([])
for i in range(1000):
    result = np.append(result, i)

# Good: Preallocate
result = np.empty(1000)
for i in range(1000):
    result[i] = i

# Better: Vectorize
result = np.arange(1000)

Memory Management

# Check memory usage
arr = np.arange(1000000)
print(f"Memory: {arr.nbytes / 1e6:.2f} MB")

# Delete large arrays when done
del arr

# Use memory mapping for huge arrays
mmap_arr = np.memmap('large_array.dat', dtype='float32', mode='w+', shape=(1000000,))
mmap_arr[:] = np.arange(1000000)
del mmap_arr  # Flush to disk
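
Reading a memory-mapped file back requires repeating the dtype and shape, because the raw file stores no header. A sketch using a temporary directory and a smaller array (both assumptions for illustration):

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "large_array.dat")

    writer = np.memmap(path, dtype="float32", mode="w+", shape=(1000,))
    writer[:] = np.arange(1000)
    writer.flush()  # write pending changes to disk
    del writer

    # Reopen read-only with the same dtype and shape.
    reader = np.memmap(path, dtype="float32", mode="r", shape=(1000,))
    total = float(reader.sum())
    del reader  # release the mapping before the directory is removed

print(total)  # 499500.0
```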

Code Style

# Import convention
import numpy as np

# Prefer built-in methods
arr = np.arange(20, dtype=float).reshape(4, 5)  # Example 2D array (illustrative)
arr.sum()  # Good
np.sum(arr)  # Also fine

# Use axis parameter for clarity
arr.mean(axis=0)  # Column means
arr.mean(axis=1)  # Row means

# Chain operations for readability
result = (arr
    .reshape(10, -1)
    .mean(axis=1)
    .round(2))

Common Pitfalls and Solutions

Pitfall 1: View vs Copy Confusion

# Problem
arr = np.arange(10)
subset = arr[::2]
subset[0] = 999  # Modifies original!

# Solution: Explicit copy when needed
subset = arr[::2].copy()
subset[0] = 999  # Original unchanged

Pitfall 2: Integer Division

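# Problem: / always performs true division, so integer input gives float output
arr = np.array([1, 2, 3])
result = arr / 2  # [0.5 1. 1.5] (float64)

# Solution: Be explicit about the operation you want
result = arr // 2  # [0 1 1] (floor division keeps the integer dtype)
result = arr.astype(float) / 2  # [0.5 1. 1.5] (float division, stated explicitly)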

Pitfall 3: Broadcasting Mistakes

# Problem: Unintended broadcasting
a = np.array([1, 2, 3])  # Shape (3,)
b = np.array([[1], [2]])  # Shape (2, 1)
result = a + b  # Shape (2, 3) - probably not intended

# Solution: Verify shapes
print(f"a.shape: {a.shape}, b.shape: {b.shape}")
print(f"result.shape: {result.shape}")

Pitfall 4: Floating Point Precision

# Problem
a = np.array([0.1, 0.2, 0.3])
print(a.sum() == 0.6)  # False (floating point error)

# Solution: Use allclose for comparisons
print(np.allclose(a.sum(), 0.6))  # True
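
For element-wise comparisons, `np.isclose` returns one boolean per element, and its `rtol` and `atol` parameters control how much difference is tolerated. A short sketch with values chosen for illustration:

```python
import numpy as np

a = np.array([0.1, 0.5])
b = np.array([0.2, 0.25])
expected = np.array([0.3, 0.75])

# Exact comparison: 0.1 + 0.2 is not exactly 0.3, but 0.5 + 0.25 is exactly 0.75.
exact = (a + b) == expected
print(exact.tolist())  # [False, True]

# Tolerant comparison (defaults: rtol=1e-05, atol=1e-08).
close = np.isclose(a + b, expected)
print(close.tolist())  # [True, True]
```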

Integration with Other Libraries

NumPy arrays are the foundation for the scientific Python ecosystem:

# Pandas integration
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
arr = df.to_numpy()  # Convert to NumPy array (preferred over df.values in current pandas)

# Matplotlib integration
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
plt.plot(x, y)

# SciPy integration
from scipy import stats
arr = np.random.randn(1000)
print(stats.describe(arr))

# scikit-learn integration
from sklearn.preprocessing import StandardScaler
data = np.random.randn(100, 5)
scaler = StandardScaler()
scaled = scaler.fit_transform(data)

Performance Comparison

Understanding when to use NumPy vs. pure Python:

import time

# Test: Sum 1 million numbers
n = 1000000

# Python list
python_list = list(range(n))
start = time.time()
result = sum(python_list)
python_time = time.time() - start

# NumPy array
numpy_array = np.arange(n)
start = time.time()
result = numpy_array.sum()
numpy_time = time.time() - start

print(f"Python: {python_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"NumPy is {python_time / numpy_time:.1f}x faster")

See Also

  • SciPy - Scientific computing library built on NumPy
  • Pandas - Data analysis library using NumPy arrays
  • Matplotlib - Plotting library compatible with NumPy
  • scikit-learn - Machine learning library using NumPy
