SciPy Cheatsheet
Installation
| Platform | Command |
| --- | --- |
| Ubuntu/Debian | `sudo apt-get install python3-scipy` or `pip install scipy` |
| macOS | `pip3 install scipy` or `brew install python3 && pip3 install scipy` |
| Windows | `pip install scipy` or `conda install scipy` (recommended) |
| Conda (all platforms) | `conda install scipy` or `conda install -c conda-forge scipy` |
| Virtual environment | `python -m venv env && source env/bin/activate && pip install scipy` |
| Specific version | `pip install scipy==1.11.4` |
| With dependencies | `pip install scipy[all]` |
| Verify installation | `python -c "import scipy; print(scipy.__version__)"` |
Basic Commands
Import and Setup
| Command | Description |
| --- | --- |
| `import scipy` | Import the SciPy library |
| `import numpy as np` | Import NumPy (required for SciPy) |
| `from scipy import optimize` | Import the optimization module |
| `from scipy import stats` | Import the statistics module |
| `from scipy import linalg` | Import the linear algebra module |
| `scipy.__version__` | Check the SciPy version |
| `scipy.show_config()` | Display the SciPy build configuration |
Optimization Basics
| Command | Description |
| --- | --- |
| `minimize(func, x0)` | Find the minimum of a scalar function |
| `minimize(func, x0, method='BFGS')` | Minimize using the BFGS algorithm |
| `root(func, x0)` | Find a root of a function |
| `curve_fit(func, xdata, ydata)` | Fit a curve to data points |
| `fsolve(func, x0)` | Solve a system of nonlinear equations |
| `minimize_scalar(func)` | Minimize a single-variable function |
| `linprog(c, A_ub, b_ub)` | Linear programming optimization |
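A minimal sketch of two calls from the table above; the quadratic objective and the cubic whose root we seek are made-up test functions, not from the original.

```python
from scipy.optimize import minimize, root

# Minimize a simple quadratic: f(x, y) = (x - 1)^2 + (y + 2)^2
def f(v):
    x, y = v
    return (x - 1)**2 + (y + 2)**2

res = minimize(f, x0=[0.0, 0.0], method='BFGS')
print(res.x)        # approximately [1, -2]
print(res.success)  # True if the optimizer converged

# Find a root of g(x) = x^3 - 1 starting from x0 = 0.5
sol = root(lambda x: x**3 - 1, x0=0.5)
print(sol.x)        # approximately [1]
```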
Integration
| Command | Description |
| --- | --- |
| `quad(func, a, b)` | Integrate a function from a to b |
| `dblquad(func, a, b, gfun, hfun)` | Double integration |
| `odeint(func, y0, t)` | Solve ordinary differential equations (legacy interface) |
| `solve_ivp(func, t_span, y0)` | Modern ODE solver with event support |
| `trapezoid(y, x)` | Trapezoidal rule integration (`trapz` in older releases) |
| `simpson(y, x)` | Simpson's rule integration (`simps` in older releases) |
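A small hedged example combining `quad` and `solve_ivp`; the sine integrand and the exponential-decay ODE are illustrative choices.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

# Definite integral of sin(x) on [0, pi]; exact value is 2
value, abs_err = quad(np.sin, 0, np.pi)
print(value, abs_err)

# Exponential decay dy/dt = -0.5 * y with y(0) = 1, solved on t in [0, 10]
sol = solve_ivp(lambda t, y: -0.5 * y, t_span=(0, 10), y0=[1.0],
                t_eval=np.linspace(0, 10, 5))
print(sol.t)
print(sol.y[0])  # should follow exp(-0.5 * t)
```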
Interpolation
| Command | Description |
| --- | --- |
| `interp1d(x, y)` | 1-D interpolation function |
| `interp1d(x, y, kind='cubic')` | Cubic spline interpolation |
| `griddata(points, values, xi)` | Interpolate scattered data |
| `UnivariateSpline(x, y)` | Univariate smoothing spline |
| `interp2d(x, y, z)` | 2-D interpolation (deprecated; prefer `RegularGridInterpolator`) |
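A minimal sketch of `interp1d`; the coarse sine curve is assumed test data.

```python
import numpy as np
from scipy.interpolate import interp1d

# Sample a coarse sine curve, then interpolate onto a finer grid
x = np.linspace(0, 10, 11)
y = np.sin(x)

linear = interp1d(x, y)               # default: piecewise linear
cubic = interp1d(x, y, kind='cubic')  # cubic spline

x_fine = np.linspace(0, 10, 101)
print(cubic(x_fine)[:5])              # interpolated values near x = 0
```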
Statistics
| Command | Description |
| --- | --- |
| `stats.norm.rvs(loc, scale, size)` | Generate normally distributed random variables |
| `stats.ttest_1samp(data, popmean)` | One-sample t-test |
| `stats.ttest_ind(data1, data2)` | Independent two-sample t-test |
| `stats.pearsonr(x, y)` | Pearson correlation coefficient |
| `stats.norm.pdf(x, loc, scale)` | Normal probability density function |
| `stats.norm.cdf(x, loc, scale)` | Normal cumulative distribution function |
| `stats.kstest(data, 'norm')` | Kolmogorov-Smirnov test |
| `stats.chi2_contingency(table)` | Chi-square test of independence |
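For orientation, a short sketch of the distribution and test functions above; the sample parameters (mean 5.2, hypothesized mean 5.0) are made up.

```python
from scipy import stats

# Probability that a standard normal value falls below 1.96
print(stats.norm.cdf(1.96))          # about 0.975

# Draw a reproducible sample and test it against a hypothesized mean of 5
sample = stats.norm.rvs(loc=5.2, scale=1.0, size=40, random_state=0)
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(t_stat, p_value)
```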
Linear Algebra
| Command | Description |
| --- | --- |
| `linalg.solve(A, b)` | Solve the linear system Ax = b |
| `linalg.inv(A)` | Matrix inverse |
| `linalg.det(A)` | Determinant |
| `linalg.eig(A)` | Eigenvalues and eigenvectors |
| `linalg.svd(A)` | Singular value decomposition |
| `linalg.lstsq(A, b)` | Least-squares solution |
| `linalg.norm(A)` | Matrix or vector norm |
| `linalg.qr(A)` | QR decomposition |
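A minimal sketch of solving a linear system and computing eigenvalues; the 2x2 matrix and right-hand side are made up for illustration.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)   # solve A x = b
print(x)
print(A @ x - b)         # residual should be ~0

w, v = linalg.eig(A)     # eigenvalues (possibly complex) and eigenvectors
print(w.real)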
Signal Processing
| Command | Description |
| --- | --- |
| `signal.butter(N, Wn, btype)` | Butterworth filter design |
| `signal.filtfilt(b, a, data)` | Zero-phase digital filtering |
| `signal.find_peaks(data)` | Find peaks in a signal |
| `signal.correlate(sig1, sig2)` | Cross-correlation of two signals |
| `signal.convolve(sig1, sig2)` | Convolution of two signals |
| `signal.spectrogram(data, fs)` | Compute a spectrogram |
| `signal.resample(data, num)` | Resample a signal to num samples |
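A quick sketch of `find_peaks` on a synthetic 3 Hz sine; the sampling rate, noise level, and peak-detection thresholds are assumed values.

```python
import numpy as np
from scipy import signal

# A noisy sine wave sampled at 100 Hz
fs = 100
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

# Locate the peaks of the oscillation
peaks, props = signal.find_peaks(x, height=0.5, distance=fs // 6)
print(t[peaks])   # peak times, roughly one per cycle of the 3 Hz sine
```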
Sparse Matrices
| Command | Description |
| --- | --- |
| `sparse.csr_matrix(data)` | Compressed sparse row matrix |
| `sparse.csc_matrix(data)` | Compressed sparse column matrix |
| `sparse.eye(n)` | Sparse identity matrix |
| `sparse.diags(diagonals, offsets)` | Sparse matrix built from diagonals |
| `matrix.toarray()` | Convert a sparse matrix to a dense ndarray |
| `matrix.todense()` | Convert a sparse matrix to a dense matrix |
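A minimal sketch of building and inspecting a sparse matrix; the tiny 3x3 array is made-up data.

```python
import numpy as np
from scipy import sparse

# Build a small sparse matrix from a mostly-zero dense array
dense = np.array([[0, 0, 1],
                  [2, 0, 0],
                  [0, 3, 0]])
A = sparse.csr_matrix(dense)

print(A.nnz)                    # number of stored non-zeros (3)
print(A.toarray())              # back to a dense ndarray
print(sparse.eye(3).toarray())  # 3x3 sparse identity, densified for display
```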
Advanced Usage
Advanced Optimization
| Command | Description |
| --- | --- |
| `minimize(func, x0, method='SLSQP', constraints=cons, bounds=bnds)` | Constrained optimization with bounds |
| `differential_evolution(func, bounds)` | Global optimization by differential evolution |
| `basinhopping(func, x0, minimizer_kwargs)` | Global optimization with basin-hopping |
| `least_squares(func, x0, bounds=bounds)` | Nonlinear least squares with bounds |
| `minimize(func, x0, jac=gradient_func)` | Optimization with an analytical gradient |
| `shgo(func, bounds)` | Simplicial homology global optimization |
| `dual_annealing(func, bounds)` | Dual annealing global optimization |
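A small sketch of `differential_evolution` on the Rastrigin function, a standard multimodal benchmark chosen here for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Rastrigin function: many local minima, global minimum at the origin
def rastrigin(v):
    return 10 * len(v) + np.sum(v**2 - 10 * np.cos(2 * np.pi * v))

bounds = [(-5.12, 5.12), (-5.12, 5.12)]
result = differential_evolution(rastrigin, bounds, seed=1)
print(result.x, result.fun)   # should be close to (0, 0) and 0
```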
Advanced Integration
| Command | Description |
| --- | --- |
| `solve_ivp(func, t_span, y0, method='RK45', events=event_func)` | ODE solver with event detection |
| `solve_ivp(func, t_span, y0, dense_output=True)` | ODE solution with continuous (dense) output |
| `nquad(func, ranges)` | N-dimensional integration |
| `odeint(func, y0, t, Dfun=jacobian)` | ODE solver with a user-supplied Jacobian |
| `solve_bvp(func, bc, x, y)` | Boundary value problem solver |
| `quad(func, a, b, weight='cos', wvar=omega)` | Integration with a weight function |
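A hedged sketch of event detection with `solve_ivp`; the free-fall model (drop height 10 m, g = 9.81 m/s²) is a made-up toy problem.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Free fall: dy/dt = v, dv/dt = -g; stop the integration when y hits the ground
def fall(t, state):
    y, v = state
    return [v, -9.81]

def hit_ground(t, state):
    return state[0]          # event fires when y == 0
hit_ground.terminal = True   # stop integration at the event
hit_ground.direction = -1    # only trigger while y is decreasing

sol = solve_ivp(fall, t_span=(0, 10), y0=[10.0, 0.0], events=hit_ground,
                dense_output=True)
print(sol.t_events[0])       # time of impact, about sqrt(2*10/9.81) ≈ 1.43 s
```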
Advanced Interpolation
| Command | Description |
| --- | --- |
| `RBFInterpolator(points, values)` | Radial basis function interpolation |
| `BSpline(t, c, k)` | B-spline from knots, coefficients, and degree |
| `make_interp_spline(x, y, k=3)` | Build an interpolating B-spline |
| `Akima1DInterpolator(x, y)` | Akima interpolation (smooth curves) |
| `PchipInterpolator(x, y)` | PCHIP monotonic cubic interpolation |
| `RegularGridInterpolator((x, y), z)` | Interpolation on a regular grid |
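A minimal comparison of a monotone PCHIP interpolant and a cubic B-spline; the data points are made up.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, make_interp_spline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 1.0, 3.0, 4.0])

pchip = PchipInterpolator(x, y)          # monotone, no overshoot between points
spline = make_interp_spline(x, y, k=3)   # cubic B-spline, smoother but may overshoot

x_fine = np.linspace(0, 4, 9)
print(pchip(x_fine))
print(spline(x_fine))
```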
Advanced Statistics
| Command | Description |
| --- | --- |
| `stats.gaussian_kde(data)` | Kernel density estimation |
| `stats.multivariate_normal.rvs(mean, cov, size)` | Multivariate normal samples |
| `stats.bootstrap((data,), statistic, n_resamples=9999)` | Bootstrap confidence intervals |
| `stats.anderson(data, dist='norm')` | Anderson-Darling test |
| `stats.mannwhitneyu(x, y)` | Mann-Whitney U test |
| `stats.kruskal(*groups)` | Kruskal-Wallis H-test |
| `stats.wilcoxon(x, y)` | Wilcoxon signed-rank test |
| `stats.spearmanr(x, y)` | Spearman rank correlation |
| `stats.linregress(x, y)` | Simple linear regression |
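A short sketch of kernel density estimation and a non-parametric two-sample test; both samples are synthetic normal draws with assumed parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.4, scale=1.0, size=200)

# Kernel density estimate of sample a, evaluated on a grid
kde = stats.gaussian_kde(a)
grid = np.linspace(-3, 3, 7)
print(kde(grid))

# Non-parametric comparison of the two samples
u_stat, p_value = stats.mannwhitneyu(a, b)
print(u_stat, p_value)
```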
Advanced Linear Algebra
| Command | Description |
| --- | --- |
| `linalg.cholesky(A)` | Cholesky decomposition |
| `linalg.lu(A)` | LU decomposition |
| `linalg.schur(A)` | Schur decomposition |
| `linalg.expm(A)` | Matrix exponential |
| `linalg.logm(A)` | Matrix logarithm |
| `linalg.sqrtm(A)` | Matrix square root |
| `linalg.solve_triangular(A, b)` | Solve a triangular system |
| `linalg.pinv(A)` | Moore-Penrose pseudoinverse |
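A minimal sketch of a few decompositions from the table; the small symmetric positive definite matrix is made up.

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])      # symmetric positive definite

L = linalg.cholesky(A, lower=True)
print(L @ L.T)                  # reconstructs A

P, Lu, U = linalg.lu(A)         # permutation, lower and upper triangular factors
print(P @ Lu @ U)               # also reconstructs A

print(linalg.expm(np.zeros((2, 2))))   # matrix exponential of 0 is the identity
```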
Advanced Signal Processing
| Command | Description |
| --- | --- |
| `signal.stft(data, fs)` | Short-time Fourier transform |
| `signal.istft(Zxx, fs)` | Inverse short-time Fourier transform |
| `signal.welch(data, fs)` | Power spectral density (Welch's method) |
| `signal.coherence(x, y, fs)` | Coherence between two signals |
| `signal.hilbert(data)` | Hilbert transform (analytic signal) |
| `signal.savgol_filter(data, window, polyorder)` | Savitzky-Golay smoothing filter |
| `signal.detrend(data)` | Remove a linear trend from data |
| `signal.wiener(data)` | Wiener filter for noise reduction |
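A quick sketch of Welch spectral estimation and Savitzky-Golay smoothing on a synthetic 60 Hz tone; the sampling rate, noise level, and window settings are assumptions.

```python
import numpy as np
from scipy import signal

fs = 500                                   # sampling frequency in Hz
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 60 * t) + 0.5 * np.random.default_rng(0).normal(size=t.size)

# Power spectral density: the peak should appear near 60 Hz
f, Pxx = signal.welch(x, fs, nperseg=256)
print(f[np.argmax(Pxx)])

# Savitzky-Golay smoothing: window of 21 samples, cubic polynomial
smoothed = signal.savgol_filter(x, window_length=21, polyorder=3)
print(smoothed[:5])
```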
Advanced Sparse Operations
| Command | Description |
| --- | --- |
| `sparse.linalg.spsolve(A, b)` | Solve a sparse linear system |
| `sparse.linalg.eigsh(A, k)` | k eigenvalues of a symmetric sparse matrix |
| `sparse.linalg.svds(A, k)` | Sparse SVD for k singular values |
| `sparse.linalg.cg(A, b)` | Conjugate gradient solver |
| `sparse.linalg.gmres(A, b)` | GMRES iterative solver |
| `sparse.kron(A, B)` | Kronecker product of sparse matrices |
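A sketch of a sparse direct solve and a partial eigendecomposition on a tridiagonal matrix (a common 1-D finite-difference pattern); the size n = 100 is arbitrary.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve, eigsh

# Symmetric tridiagonal system
n = 100
A = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

x = spsolve(A, b)              # sparse direct solve
print(x[:3])

w = eigsh(A, k=3, which='LM', return_eigenvectors=False)
print(w)                       # three largest eigenvalues
```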
Spatial Operations
| Command | Description |
| --- | --- |
| `spatial.distance.pdist(X, metric='euclidean')` | Pairwise distances between observations |
| `spatial.distance.cdist(XA, XB)` | Distances between two collections of points |
| `spatial.KDTree(data)` | k-d tree for fast nearest-neighbor queries |
| `spatial.ConvexHull(points)` | Convex hull of a point set |
| `spatial.Delaunay(points)` | Delaunay triangulation |
| `spatial.Voronoi(points)` | Voronoi diagram |
| `spatial.distance_matrix(x, y)` | Full distance matrix between two point sets |
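A minimal sketch of nearest-neighbor queries, pairwise distances, and a convex hull on random points in the unit square (made-up data).

```python
import numpy as np
from scipy import spatial

rng = np.random.default_rng(0)
points = rng.random((100, 2))        # 100 random points in the unit square

# k-d tree nearest-neighbor query
tree = spatial.KDTree(points)
dist, idx = tree.query([0.5, 0.5], k=3)
print(dist, idx)                     # the three points closest to the center

# Condensed pairwise distance matrix (upper triangle as a flat array)
d = spatial.distance.pdist(points, metric='euclidean')
print(d.shape)                       # 100*99/2 = 4950 distances

hull = spatial.ConvexHull(points)
print(hull.vertices)                 # indices of the points on the hull
```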
Image Processing
| Command | Description |
| --- | --- |
| `ndimage.gaussian_filter(image, sigma)` | Gaussian smoothing filter |
| `ndimage.median_filter(image, size)` | Median filter for noise reduction |
| `ndimage.rotate(image, angle)` | Rotate an image by an angle (degrees) |
| `ndimage.zoom(image, zoom)` | Zoom an image by a factor |
| `ndimage.binary_erosion(image)` | Binary morphological erosion |
| `ndimage.binary_dilation(image)` | Binary morphological dilation |
| `ndimage.label(image)` | Label connected components |
| `ndimage.sobel(image)` | Sobel edge detection |
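A tiny sketch of a few `ndimage` calls on a synthetic image (a bright square on a dark background); see Use Case 5 below for a fuller pipeline.

```python
import numpy as np
from scipy import ndimage

image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0             # a bright square on a dark background

smoothed = ndimage.gaussian_filter(image, sigma=2)
rotated = ndimage.rotate(image, angle=45, reshape=False)
zoomed = ndimage.zoom(image, zoom=2)  # 128x128 output

labels, n = ndimage.label(image > 0.5)
print(n)                              # 1 connected component
print(zoomed.shape)
```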
Configuration
NumPy/SciPy Configuration
```python
# Display build and configuration information
import scipy
scipy.show_config()

# Set NumPy print options (affects how SciPy results are displayed)
import numpy as np
np.set_printoptions(precision=4, suppress=True, linewidth=100)

# Configure warning filters
import warnings
warnings.filterwarnings('ignore', category=RuntimeWarning)
```
Optimization Configuration
```python
# Configure optimization options (L-BFGS-B accepts all four options below)
from scipy.optimize import minimize

options = {
    'maxiter': 1000,   # maximum iterations
    'disp': True,      # display convergence messages
    'ftol': 1e-8,      # function-value tolerance
    'gtol': 1e-8,      # gradient tolerance
}
result = minimize(func, x0, method='L-BFGS-B', options=options)
```
Integration Tolerances
```python
from scipy.integrate import quad, solve_ivp

# Configure integration accuracy
result, error = quad(func, a, b,
                     epsabs=1e-10,   # absolute error tolerance
                     epsrel=1e-10,   # relative error tolerance
                     limit=100)      # subdivision limit

# Configure the ODE solver
sol = solve_ivp(func, t_span, y0,
                method='RK45',
                rtol=1e-6,      # relative tolerance
                atol=1e-9,      # absolute tolerance
                max_step=0.1)   # maximum step size
```
Sparse Matrix Formats
```python
from scipy import sparse

# Choose the format based on the use case
# CSR: efficient row slicing and matrix-vector products
A_csr = sparse.csr_matrix(data)
# CSC: efficient column slicing and matrix-vector products
A_csc = sparse.csc_matrix(data)
# COO: efficient construction and conversion
A_coo = sparse.coo_matrix(data)
# LIL: efficient incremental construction
A_lil = sparse.lil_matrix((1000, 1000))
```
Random Number Generation
```python
from scipy import stats
import numpy as np

# Set the global random seed for reproducibility
np.random.seed(42)

# Or use a dedicated RandomState so the stream is independent of global state
rng = np.random.RandomState(42)
data = stats.norm.rvs(loc=0, scale=1, size=1000, random_state=rng)
```
Common Use Cases
Use Case 1: Curve Fitting and Model Selection
```python
import numpy as np
from scipy.optimize import curve_fit

# Generate noisy data
x = np.linspace(0, 10, 100)
y_true = 2.5 * np.exp(-0.5 * x) + 1.0
y_noisy = y_true + 0.2 * np.random.normal(size=len(x))

# Define the model
def exponential_model(x, a, b, c):
    return a * np.exp(-b * x) + c

# Fit the curve
params, covariance = curve_fit(exponential_model, x, y_noisy)
y_fit = exponential_model(x, *params)

# Calculate R-squared
residuals = y_noisy - y_fit
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y_noisy - np.mean(y_noisy))**2)
r_squared = 1 - (ss_res / ss_tot)

print(f"Parameters: a={params[0]:.3f}, b={params[1]:.3f}, c={params[2]:.3f}")
print(f"R-squared: {r_squared:.4f}")
```
Use Case 2: Signal Filtering and Analysis
```python
import numpy as np
from scipy import signal

# Create a noisy signal
fs = 1000                                   # sampling frequency in Hz
t = np.linspace(0, 1, fs, endpoint=False)   # one second of samples at fs
clean_signal = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 120 * t)
noisy_signal = clean_signal + 0.5 * np.random.normal(size=len(t))

# Design and apply a Butterworth low-pass filter
sos = signal.butter(10, 100, btype='low', fs=fs, output='sos')
filtered_signal = signal.sosfilt(sos, noisy_signal)

# Find peaks
peaks, properties = signal.find_peaks(filtered_signal,
                                      height=0.5,
                                      distance=20)

# Compute the power spectral density
f, Pxx = signal.welch(filtered_signal, fs, nperseg=256)

print(f"Found {len(peaks)} peaks")
print(f"Dominant frequency: {f[np.argmax(Pxx)]:.2f} Hz")
```
Use Case 3: Statistical Hypothesis Testing
```python
import numpy as np
from scipy import stats

# Generate two sample datasets
np.random.seed(42)
group1 = stats.norm.rvs(loc=100, scale=15, size=50)
group2 = stats.norm.rvs(loc=105, scale=15, size=50)

# Test for normality
_, p_norm1 = stats.shapiro(group1)
_, p_norm2 = stats.shapiro(group2)

# Test for equal variances
_, p_var = stats.levene(group1, group2)

# Perform the appropriate t-test
if p_var > 0.05:
    # Equal variances
    t_stat, p_value = stats.ttest_ind(group1, group2)
    test_type = "Independent t-test (equal variances)"
else:
    # Unequal variances (Welch's t-test)
    t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
    test_type = "Welch's t-test (unequal variances)"

# Calculate the effect size (Cohen's d) from sample standard deviations
pooled_std = np.sqrt((np.std(group1, ddof=1)**2 + np.std(group2, ddof=1)**2) / 2)
cohens_d = (np.mean(group1) - np.mean(group2)) / pooled_std

print(f"Test: {test_type}")
print(f"t-statistic: {t_stat:.4f}, p-value: {p_value:.4f}")
print(f"Cohen's d: {cohens_d:.4f}")
```
Use Case 4: Optimization with Constraints
```python
import numpy as np
from scipy.optimize import minimize

# Portfolio optimization: maximize return while penalizing risk
def portfolio_objective(weights, returns, cov_matrix, risk_aversion=1.0):
    portfolio_return = np.sum(returns * weights)
    portfolio_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    return -(portfolio_return - risk_aversion * portfolio_risk)

# Sample data
n_assets = 4
returns = np.array([0.10, 0.12, 0.15, 0.08])
cov_matrix = np.array([[ 0.005, -0.001, 0.001, 0.000],
                       [-0.001,  0.008, 0.002, 0.001],
                       [ 0.001,  0.002, 0.012, 0.003],
                       [ 0.000,  0.001, 0.003, 0.004]])

# Constraint: weights sum to 1
constraints = {'type': 'eq', 'fun': lambda x: np.sum(x) - 1}

# Bounds: each weight between 0 and 1
bounds = tuple((0, 1) for _ in range(n_assets))

# Initial guess: equal weights
x0 = np.array([1 / n_assets] * n_assets)

# Optimize
result = minimize(portfolio_objective, x0,
                  args=(returns, cov_matrix),
                  method='SLSQP',
                  bounds=bounds,
                  constraints=constraints)

optimal_weights = result.x
print("Optimal portfolio weights:")
for i, weight in enumerate(optimal_weights):
    print(f"Asset {i + 1}: {weight * 100:.2f}%")
```
Use Case 5: Image Processing Pipeline
```python
import numpy as np
from scipy import ndimage

# Load or create a grayscale image
image = np.random.rand(256, 256) * 255

# Step 1: Denoise with a Gaussian filter
denoised = ndimage.gaussian_filter(image, sigma=2)

# Step 2: Edge detection
edges_x = ndimage.sobel(denoised, axis=0)
edges_y = ndimage.sobel(denoised, axis=1)
edges = np.hypot(edges_x, edges_y)

# Step 3: Threshold to create a binary image
threshold = np.mean(edges) + np.std(edges)
binary = edges > threshold

# Step 4: Morphological cleanup
struct = ndimage.generate_binary_structure(2, 2)
cleaned = ndimage.binary_opening(binary, structure=struct)
cleaned = ndimage.binary_closing(cleaned, structure=struct)

# Step 5: Label connected components
labeled, num_features = ndimage.label(cleaned)

# Step 6: Calculate object properties
sizes = ndimage.sum(cleaned, labeled, range(num_features + 1))
centers = ndimage.center_of_mass(cleaned, labeled, range(1, num_features + 1))

print(f"Found {num_features} objects")
print(f"Average object size: {np.mean(sizes[1:]):.2f} pixels")
```
Best Practices
- **Choose the Right Method**: Select optimization algorithms based on problem characteristics (gradient-based methods for smooth functions, global optimizers for multimodal problems).
- **Vectorize Operations**: Use NumPy array operations instead of loops for better performance; SciPy functions are optimized for array inputs.
- **Handle Numerical Stability**: Use appropriate tolerances (`rtol`, `atol`) for integration and optimization; be aware of condition numbers in linear algebra operations.
- **Leverage Sparse Matrices**: For large matrices with many zeros, use sparse formats (`csr_matrix`, `csc_matrix`) to save memory and speed up computation.
- **Provide Good Initial Guesses**: Optimization and root-finding algorithms converge faster from reasonable starting points; use domain knowledge when possible.
- **Use Appropriate Statistical Tests**: Verify assumptions (normality, equal variance) before applying parametric tests; use non-parametric alternatives when assumptions are violated.
- **Set Random Seeds**: Ensure reproducibility in stochastic algorithms by setting random seeds with `np.random.seed()` or using dedicated `RandomState`/`Generator` objects.
- **Profile Performance**: Use `%timeit` in Jupyter or `cProfile` to identify bottlenecks; consider Numba or Cython for critical sections if SciPy functions are insufficient.
- **Check Convergence**: Always examine optimization results (`result.success`, `result.message`) and integration error estimates before trusting outputs.
- **Document Units and Scales**: Clearly document physical units and data scales; normalize data when appropriate to improve numerical stability.