NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a comprehensive collection of high-level mathematical functions to operate on these arrays efficiently.
Overview
NumPy is the foundation of the Python scientific computing ecosystem. It offers:
- N-dimensional array object (ndarray): Fast and versatile array structure
- Broadcasting functions: Perform operations on arrays of different shapes
- Linear algebra operations: Matrix operations, decompositions, eigenvalues
- Random number generation: Comprehensive suite of random sampling functions
- Integration with C/C++/Fortran: Ability to wrap compiled code
- Memory efficiency: Optimized data storage and computation
NumPy arrays are faster and more memory-efficient than Python lists, making them ideal for numerical computations and data analysis tasks.
Installation
Install NumPy using pip:
pip install numpy
For scientific computing environments, consider installing via Anaconda:
conda install numpy
Verify installation:
import numpy as np
print(np.__version__)
Array Creation
Creating Basic Arrays
NumPy provides multiple ways to create arrays:
import numpy as np
# From Python list
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1) # [1 2 3 4 5]
# 2D array from nested lists
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
# [[1 2 3]
# [4 5 6]]
# Specify data type
arr3 = np.array([1, 2, 3], dtype=np.float64)
print(arr3) # [1. 2. 3.]
# Complex numbers
arr4 = np.array([1+2j, 3+4j])
print(arr4) # [1.+2.j 3.+4.j]
Array Initialization Functions
# Array of zeros
zeros = np.zeros((3, 4)) # 3x4 array of zeros
# Array of ones
ones = np.ones((2, 3, 4)) # 3D array of ones
# Empty array (uninitialized)
empty = np.empty((2, 2))
# Array with constant value
full = np.full((3, 3), 7) # 3x3 array filled with 7
# Identity matrix
identity = np.eye(4) # 4x4 identity matrix
# Array from range
arange_arr = np.arange(0, 10, 2) # [0 2 4 6 8]
# Evenly spaced values
linspace_arr = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1.]
# Logarithmically spaced values
logspace_arr = np.logspace(0, 2, 5) # [1. 3.16 10. 31.62 100.]
Creating Arrays Like Existing Arrays
x = np.array([[1, 2], [3, 4]])
# Create zeros with same shape
zeros_like = np.zeros_like(x)
# Create ones with same shape
ones_like = np.ones_like(x)
# Create empty with same shape
empty_like = np.empty_like(x)
Array Properties and Attributes
Understanding array properties is essential for effective NumPy usage:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# Shape - dimensions of array
print(arr.shape) # (2, 4)
# Number of dimensions
print(arr.ndim) # 2
# Total number of elements
print(arr.size) # 8
# Data type of elements
print(arr.dtype) # int64
# Size of each element in bytes
print(arr.itemsize) # 8
# Total bytes consumed
print(arr.nbytes) # 64
# Transpose
print(arr.T)
# [[1 5]
# [2 6]
# [3 7]
# [4 8]]
Array Indexing and Slicing
Basic Indexing
arr = np.array([10, 20, 30, 40, 50])
# Single element access
print(arr[0]) # 10
print(arr[-1]) # 50 (last element)
# Slicing [start:stop:step]
print(arr[1:4]) # [20 30 40]
print(arr[::2]) # [10 30 50] (every other element)
print(arr[::-1]) # [50 40 30 20 10] (reverse)
Multi-dimensional Indexing
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access element at row 1, column 2
print(arr2d[1, 2]) # 6
# Row slicing
print(arr2d[0, :]) # [1 2 3] (first row)
# Column slicing
print(arr2d[:, 1]) # [2 5 8] (second column)
# Subarray
print(arr2d[0:2, 1:3])
# [[2 3]
# [5 6]]
Boolean Indexing
arr = np.array([1, 2, 3, 4, 5, 6])
# Boolean mask
mask = arr > 3
print(mask) # [False False False True True True]
# Filter using mask
filtered = arr[mask]
print(filtered) # [4 5 6]
# Inline filtering
print(arr[arr % 2 == 0]) # [2 4 6] (even numbers)
# Modify elements using boolean indexing
arr[arr > 3] = 0
print(arr) # [1 2 3 0 0 0]
Fancy Indexing
arr = np.array([10, 20, 30, 40, 50])
# Index with array of integers
indices = np.array([0, 2, 4])
print(arr[indices]) # [10 30 50]
# 2D fancy indexing
arr2d = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])
print(arr2d[rows, cols]) # [1 11]
Array Operations
Arithmetic Operations
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
# Element-wise operations
print(a + b) # [11 22 33 44]
print(a - b) # [-9 -18 -27 -36]
print(a * b) # [10 40 90 160]
print(a / b) # [0.1 0.1 0.1 0.1]
print(a ** 2) # [1 4 9 16]
# Operations with scalars (broadcasting)
print(a + 10) # [11 12 13 14]
print(a * 2) # [2 4 6 8]
Comparison Operations
a = np.array([1, 2, 3, 4, 5])
print(a > 3) # [False False False True True]
print(a == 3) # [False False True False False]
print(a <= 2) # [True True False False False]
# Compare arrays
b = np.array([5, 4, 3, 2, 1])
print(a < b) # [True True False False False]
Universal Functions (ufuncs)
arr = np.array([1, 4, 9, 16, 25])
# Mathematical functions
print(np.sqrt(arr)) # [1. 2. 3. 4. 5.]
print(np.exp(arr)) # Exponential
print(np.log(arr)) # Natural logarithm
print(np.log10(arr)) # Base-10 logarithm
# Trigonometric functions
angles = np.array([0, np.pi/2, np.pi])
print(np.sin(angles)) # [0. 1. 0.]
print(np.cos(angles)) # [1. 0. -1.]
print(np.tan(angles)) # Tangent values
# Rounding functions
arr_float = np.array([1.23, 4.56, 7.89])
print(np.round(arr_float)) # [1. 5. 8.]
print(np.floor(arr_float)) # [1. 4. 7.]
print(np.ceil(arr_float)) # [2. 5. 8.]
Broadcasting
Broadcasting allows NumPy to perform operations on arrays of different shapes:
# 1D array + scalar
a = np.array([1, 2, 3])
print(a + 5) # [6 7 8]
# 2D array + 1D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([10, 20, 30])
print(matrix + vector)
# [[11 22 33]
# [14 25 36]]
# 2D array + column vector
matrix = np.array([[1, 2, 3], [4, 5, 6]])
col_vector = np.array([[10], [20]])
print(matrix + col_vector)
# [[11 12 13]
# [24 25 26]]
# Broadcasting rules example
a = np.arange(3).reshape(3, 1) # Shape (3, 1)
b = np.arange(3) # Shape (3,)
print(a + b)
# [[0 1 2]
# [1 2 3]
# [2 3 4]]
Broadcasting Rules
- If arrays don't have the same rank, prepend shape of lower rank array with 1s
- Arrays are compatible when dimensions are equal or one of them is 1
- After broadcasting, each array behaves as if it had shape equal to element-wise maximum
Array Manipulation
Reshaping
arr = np.arange(12)
# Reshape to 2D
reshaped = arr.reshape(3, 4)
print(reshaped)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
# Reshape to 3D
reshaped_3d = arr.reshape(2, 3, 2)
# Flatten array
flattened = reshaped.flatten() # Copy
raveled = reshaped.ravel() # View (no copy)
# Auto-calculate dimension
auto_reshape = arr.reshape(3, -1) # -1 means "figure it out"
Transposing and Swapping Axes
arr = np.arange(12).reshape(3, 4)
# Transpose
transposed = arr.T
print(transposed.shape) # (4, 3)
# Swap axes
arr3d = np.arange(24).reshape(2, 3, 4)
swapped = arr3d.swapaxes(1, 2) # Swap axes 1 and 2
print(swapped.shape) # (2, 4, 3)
# Transpose with axes specification
transposed_3d = arr3d.transpose(2, 1, 0)
print(transposed_3d.shape) # (4, 3, 2)
Stacking and Splitting
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Vertical stack (row-wise)
v_stack = np.vstack((a, b))
# [[1 2]
# [3 4]
# [5 6]
# [7 8]]
# Horizontal stack (column-wise)
h_stack = np.hstack((a, b))
# [[1 2 5 6]
# [3 4 7 8]]
# Concatenate along axis
concat = np.concatenate((a, b), axis=0) # Same as vstack
# Split array
arr = np.arange(9)
split1, split2, split3 = np.split(arr, 3)
print(split1) # [0 1 2]
# Split 2D array
arr2d = np.arange(16).reshape(4, 4)
upper, lower = np.vsplit(arr2d, 2)
left, right = np.hsplit(arr2d, 2)
Adding and Removing Elements
arr = np.array([1, 2, 3, 4, 5])
# Append
appended = np.append(arr, [6, 7])
print(appended) # [1 2 3 4 5 6 7]
# Insert
inserted = np.insert(arr, 2, 99)
print(inserted) # [1 2 99 3 4 5]
# Delete
deleted = np.delete(arr, [1, 3])
print(deleted) # [1 3 5]
# Unique values
arr_dup = np.array([1, 2, 2, 3, 3, 3, 4])
unique = np.unique(arr_dup)
print(unique) # [1 2 3 4]
# Unique with counts
unique, counts = np.unique(arr_dup, return_counts=True)
print(counts) # [1 2 3 1]
Aggregation Functions
Statistical Functions
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Sum
print(np.sum(arr)) # 45
print(arr.sum(axis=0)) # [12 15 18] (column sums)
print(arr.sum(axis=1)) # [6 15 24] (row sums)
# Mean
print(np.mean(arr)) # 5.0
print(arr.mean(axis=0)) # [4. 5. 6.]
# Median
print(np.median(arr)) # 5.0
# Standard deviation and variance
print(np.std(arr)) # Standard deviation
print(np.var(arr)) # Variance
# Min and max
print(np.min(arr)) # 1
print(np.max(arr)) # 9
print(arr.min(axis=1)) # [1 4 7]
# Argmin and argmax (indices)
print(np.argmin(arr)) # 0 (flattened index)
print(np.argmax(arr, axis=0)) # [2 2 2]
# Cumulative sum and product
print(np.cumsum([1, 2, 3, 4])) # [1 3 6 10]
print(np.cumprod([1, 2, 3, 4])) # [1 2 6 24]
# Percentiles
print(np.percentile(arr, 50)) # 5.0 (median)
print(np.percentile(arr, [25, 50, 75])) # [3. 5. 7.]
Logical Operations
a = np.array([True, False, True, False])
b = np.array([True, True, False, False])
# Logical AND, OR, NOT, XOR
print(np.logical_and(a, b)) # [True False False False]
print(np.logical_or(a, b)) # [True True True False]
print(np.logical_not(a)) # [False True False True]
print(np.logical_xor(a, b)) # [False True True False]
# Any and all
arr = np.array([1, 2, 0, 4])
print(np.any(arr > 3)) # True
print(np.all(arr > 0)) # False
Linear Algebra
NumPy provides comprehensive linear algebra operations through numpy.linalg:
Matrix Operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
C = np.dot(A, B)
# or
C = A @ B
print(C)
# [[19 22]
# [43 50]]
# Element-wise multiplication
element_wise = A * B
# [[5 12]
# [21 32]]
# Matrix power
print(np.linalg.matrix_power(A, 3)) # A^3
# Transpose
print(A.T)
# Trace (sum of diagonal)
print(np.trace(A)) # 5
# Diagonal elements
print(np.diag(A)) # [1 4]
# Create diagonal matrix
print(np.diag([1, 2, 3]))
# [[1 0 0]
# [0 2 0]
# [0 0 3]]
Matrix Decompositions
A = np.array([[1, 2], [3, 4], [5, 6]])
# Singular Value Decomposition (SVD)
U, s, Vt = np.linalg.svd(A)
print(f"U shape: {U.shape}, s shape: {s.shape}, Vt shape: {Vt.shape}")
# QR Decomposition
Q, R = np.linalg.qr(A)
# Eigenvalues and eigenvectors
A_square = np.array([[1, 2], [2, 1]])
eigenvalues, eigenvectors = np.linalg.eig(A_square)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")
# Cholesky decomposition (for positive definite matrices)
pos_def = np.array([[4, 2], [2, 3]])
L = np.linalg.cholesky(pos_def)
Matrix Properties
A = np.array([[1, 2], [3, 4]])
# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}") # -2.0
# Inverse
try:
A_inv = np.linalg.inv(A)
print(f"Inverse:\n{A_inv}")
# Verify: A @ A_inv should equal identity
print(np.allclose(A @ A_inv, np.eye(2))) # True
except np.linalg.LinAlgError:
print("Matrix is singular")
# Rank
rank = np.linalg.matrix_rank(A)
print(f"Rank: {rank}")
# Norm
norm_fro = np.linalg.norm(A, 'fro') # Frobenius norm
norm_2 = np.linalg.norm(A, 2) # Spectral norm
print(f"Frobenius norm: {norm_fro}")
Solving Linear Systems
# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = np.linalg.solve(A, b)
print(f"Solution: {x}") # [2. 3.]
# Verify solution
print(np.allclose(A @ x, b)) # True
# Least squares solution (overdetermined system)
A = np.array([[1, 0], [1, 1], [1, 2]])
b = np.array([0, 1, 2])
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(f"Least squares solution: {x}")
Random Number Generation
NumPy provides powerful random number generation capabilities:
Basic Random Generation
# Legacy random (still widely used)
np.random.seed(42) # For reproducibility
# Random floats in [0, 1)
random_floats = np.random.random(5)
print(random_floats)
# Random integers
random_ints = np.random.randint(0, 10, size=5)
print(random_ints)
# Random choice from array
choices = np.random.choice(['a', 'b', 'c'], size=10)
print(choices)
# Random permutation
arr = np.arange(10)
np.random.shuffle(arr) # In-place
print(arr)
permuted = np.random.permutation(10) # Returns new array
print(permuted)
New Random Generator API (Recommended)
# Create a random generator
rng = np.random.default_rng(seed=42)
# Random floats
random_floats = rng.random(5)
# Random integers
random_ints = rng.integers(0, 10, size=5)
# Random choice
choices = rng.choice(['a', 'b', 'c'], size=10)
# Random permutation
permuted = rng.permutation(10)
Probability Distributions
rng = np.random.default_rng(seed=42)
# Normal (Gaussian) distribution
normal = rng.normal(loc=0, scale=1, size=1000) # mean=0, std=1
# Uniform distribution
uniform = rng.uniform(low=0, high=1, size=1000)
# Binomial distribution
binomial = rng.binomial(n=10, p=0.5, size=1000)
# Poisson distribution
poisson = rng.poisson(lam=5, size=1000)
# Exponential distribution
exponential = rng.exponential(scale=1.0, size=1000)
# Beta distribution
beta = rng.beta(a=2, b=5, size=1000)
# Gamma distribution
gamma = rng.gamma(shape=2, scale=1, size=1000)
# Chi-square distribution
chisquare = rng.chisquare(df=2, size=1000)
# Multivariate normal
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
multivariate = rng.multivariate_normal(mean, cov, size=100)
Advanced Topics
Memory Views and Copies
# View vs. copy
original = np.arange(10)
# View (shares memory)
view = original[::2]
view[0] = 999
print(original) # [999 1 2 3 4 5 6 7 8 9] - original changed!
# Copy (independent)
original = np.arange(10)
copy = original[::2].copy()
copy[0] = 999
print(original) # [0 1 2 3 4 5 6 7 8 9] - original unchanged
# Check if array owns its data
print(original.flags['OWNDATA']) # True
print(view.flags['OWNDATA']) # False
Structured Arrays
# Define structured data type
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
# Create structured array
people = np.array([
('Alice', 25, 55.5),
('Bob', 30, 75.0),
('Charlie', 35, 80.2)
], dtype=dt)
# Access fields
print(people['name']) # ['Alice' 'Bob' 'Charlie']
print(people['age']) # [25 30 35]
# Access individual record
print(people[0]) # ('Alice', 25, 55.5)
print(people[0]['name']) # Alice
Masked Arrays
# Create masked array with invalid values
data = np.array([1, 2, -999, 4, 5, -999])
masked = np.ma.masked_equal(data, -999)
print(masked) # [1 2 -- 4 5 --]
print(masked.mean()) # 3.0 (ignores masked values)
# Create mask manually
mask = np.array([False, False, True, False, False, True])
masked2 = np.ma.array(data, mask=mask)
Advanced Indexing with np.where
arr = np.array([1, 2, 3, 4, 5, 6])
# Find indices where condition is true
indices = np.where(arr > 3)
print(indices) # (array([3, 4, 5]),)
# Conditional replacement
result = np.where(arr > 3, arr * 2, arr)
print(result) # [1 2 3 8 10 12]
# Multiple conditions
result = np.where((arr > 2) & (arr < 5), arr * 2, arr)
print(result) # [1 2 6 8 5 6]
Performance Optimization
# Use vectorized operations instead of loops
import time
# Slow: Python loop
n = 1000000
arr = np.arange(n)
start = time.time()
result = []
for x in arr:
result.append(x ** 2)
loop_time = time.time() - start
# Fast: NumPy vectorization
start = time.time()
result = arr ** 2
vectorized_time = time.time() - start
print(f"Loop: {loop_time:.4f}s, Vectorized: {vectorized_time:.4f}s")
print(f"Speedup: {loop_time / vectorized_time:.1f}x")
# Use in-place operations when possible
arr = np.arange(1000000)
arr += 1 # In-place (faster)
# vs
arr = arr + 1 # Creates new array (slower)
# Use appropriate data types
arr_float64 = np.arange(1000000, dtype=np.float64) # 8 bytes per element
arr_float32 = np.arange(1000000, dtype=np.float32) # 4 bytes per element
print(f"float64: {arr_float64.nbytes / 1e6:.1f} MB")
print(f"float32: {arr_float32.nbytes / 1e6:.1f} MB")
Working with Files
Saving and Loading Arrays
# Save single array
arr = np.array([1, 2, 3, 4, 5])
np.save('array.npy', arr)
# Load single array
loaded = np.load('array.npy')
# Save multiple arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.savez('arrays.npz', a=arr1, b=arr2)
# Load multiple arrays
loaded = np.load('arrays.npz')
print(loaded['a'])
print(loaded['b'])
# Compressed save (for large arrays)
np.savez_compressed('compressed.npz', a=arr1, b=arr2)
Text File I/O
# Save to text file
arr = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('data.txt', arr, delimiter=',', fmt='%d')
# Load from text file
loaded = np.loadtxt('data.txt', delimiter=',')
# Load CSV with header
# data.csv:
# x,y,z
# 1,2,3
# 4,5,6
data = np.genfromtxt('data.csv', delimiter=',', names=True)
print(data['x']) # [1. 4.]
Common Use Cases and Patterns
Image Processing Basics
# Represent image as NumPy array
# Grayscale image: 2D array (height, width)
# Color image: 3D array (height, width, channels)
height, width = 100, 100
grayscale_image = np.random.randint(0, 256, (height, width), dtype=np.uint8)
color_image = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
# Image operations
flipped = np.flipud(grayscale_image) # Flip vertically
rotated = np.rot90(grayscale_image) # Rotate 90 degrees
cropped = grayscale_image[10:50, 10:50] # Crop region
# Normalize pixel values
normalized = grayscale_image / 255.0
Time Series and Signal Processing
# Generate time series
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)
# Add noise
noise = 0.2 * np.random.randn(len(t))
noisy_signal = signal + noise
# Moving average (simple smoothing)
window_size = 10
smoothed = np.convolve(noisy_signal, np.ones(window_size)/window_size, mode='valid')
# Calculate differences (derivatives)
diff = np.diff(signal)
# Calculate cumulative sum (integration)
cumsum = np.cumsum(signal)
Data Normalization
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
# Min-Max normalization (scale to [0, 1])
min_val = data.min()
max_val = data.max()
normalized = (data - min_val) / (max_val - min_val)
# Z-score normalization (standardization)
mean = data.mean()
std = data.std()
standardized = (data - mean) / std
# Column-wise normalization
col_min = data.min(axis=0)
col_max = data.max(axis=0)
col_normalized = (data - col_min) / (col_max - col_min)
Best Practices
Performance Tips
- Vectorize operations: Avoid Python loops, use NumPy operations
- Use appropriate data types: Choose smallest type that fits your data
- Avoid unnecessary copies: Use views when possible
- Use in-place operations:
arr += 1instead ofarr = arr + 1 - Preallocate arrays: Create array with final size instead of appending
- Use built-in functions: NumPy's C implementations are much faster
# Bad: Growing array
result = np.array([])
for i in range(1000):
result = np.append(result, i)
# Good: Preallocate
result = np.empty(1000)
for i in range(1000):
result[i] = i
# Better: Vectorize
result = np.arange(1000)
Memory Management
# Check memory usage
arr = np.arange(1000000)
print(f"Memory: {arr.nbytes / 1e6:.2f} MB")
# Delete large arrays when done
del arr
# Use memory mapping for huge arrays
mmap_arr = np.memmap('large_array.dat', dtype='float32', mode='w+', shape=(1000000,))
mmap_arr[:] = np.arange(1000000)
del mmap_arr # Flush to disk
Code Style
# Import convention
import numpy as np
# Prefer built-in methods
arr.sum() # Good
np.sum(arr) # Also fine
# Use axis parameter for clarity
arr.mean(axis=0) # Column means
arr.mean(axis=1) # Row means
# Chain operations for readability
result = (arr
.reshape(10, -1)
.mean(axis=1)
.round(2))
Common Pitfalls and Solutions
Pitfall 1: View vs Copy Confusion
# Problem
arr = np.arange(10)
subset = arr[::2]
subset[0] = 999 # Modifies original!
# Solution: Explicit copy when needed
subset = arr[::2].copy()
subset[0] = 999 # Original unchanged
Pitfall 2: Integer Division
# Problem
arr = np.array([1, 2, 3])
result = arr / 2 # Result is float
# Solution: Be explicit about types
result = arr // 2 # Integer division
result = arr.astype(float) / 2 # Float division
Pitfall 3: Broadcasting Mistakes
# Problem: Unintended broadcasting
a = np.array([1, 2, 3]) # Shape (3,)
b = np.array([[1], [2]]) # Shape (2, 1)
result = a + b # Shape (2, 3) - probably not intended
# Solution: Verify shapes
print(f"a.shape: {a.shape}, b.shape: {b.shape}")
print(f"result.shape: {result.shape}")
Pitfall 4: Floating Point Precision
# Problem
a = np.array([0.1, 0.2, 0.3])
print(a.sum() == 0.6) # False (floating point error)
# Solution: Use allclose for comparisons
print(np.allclose(a.sum(), 0.6)) # True
Integration with Other Libraries
NumPy arrays are the foundation for the scientific Python ecosystem:
# Pandas integration
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
arr = df.values # Convert to NumPy array
# Matplotlib integration
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
# SciPy integration
from scipy import stats
arr = np.random.randn(1000)
print(stats.describe(arr))
# scikit-learn integration
from sklearn.preprocessing import StandardScaler
data = np.random.randn(100, 5)
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
Performance Comparison
Understanding when to use NumPy vs. pure Python:
import time
# Test: Sum 1 million numbers
n = 1000000
# Python list
python_list = list(range(n))
start = time.time()
result = sum(python_list)
python_time = time.time() - start
# NumPy array
numpy_array = np.arange(n)
start = time.time()
result = numpy_array.sum()
numpy_time = time.time() - start
print(f"Python: {python_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"NumPy is {python_time / numpy_time:.1f}x faster")
See Also
Official Documentation
Learning Resources
Related Libraries
- SciPy - Scientific computing library built on NumPy
- Pandas - Data analysis library using NumPy arrays
- Matplotlib - Plotting library compatible with NumPy
- scikit-learn - Machine learning library using NumPy