Physics-Informed Feature Engineering
A comprehensive guide to energy forms, SI measurement standards, and practical feature engineering strategies for data scientists building physics-aware ML models, with a real-world battery discharge example
How do you build ML models that respect the laws of physics? In energy system modeling, features derived from first principles (power, energy, dimensionless ratios) often outperform purely statistical transformations. This guide shows how to leverage SI units, conservation laws, and dimensional analysis to create features that generalize across operating conditions.
The Problem: Raw sensor data lacks physical structure. Standard feature engineering ignores dimensional consistency, causing models to fail when conditions change.
The Solution: Convert to SI units → derive physics-based quantities → create dimensionless groups → validate with conservation laws. Result: interpretable models aligned with reality.
Coverage: Energy forms and equations • SI dimensional framework • Battery discharge case study • Physics-informed EDA
Why Energy Matters in Physics-Based ML
Energy is the universal currency of physical processes. It provides three critical advantages for modeling:
- Conservation (First Law): Energy balances link initial and final states, enabling prediction without tracking every intermediate step
- Interconvertibility: All technologies exploit energy transformations (kinetic ↔ potential ↔ thermal ↔ electrical)
- Change Quantification: Every process involves energy transfer; it constrains what’s physically possible
Ignoring energy in ML models means discarding nature’s most fundamental constraint. Physics-informed features encode these principles directly.
Forms of Energy
Energy—the ability to do work or cause change—appears in interchangeable forms governed by conservation laws. Understanding kinetic, potential, and radiant energy provides the foundation for deriving meaningful features from sensor data.
Main Categories
Kinetic Energy
Energy of motion—both translational (moving objects) and rotational (spinning objects). In ML modeling, kinetic energy features capture dynamic behavior in mechanical systems.
Key Equations: \(KE_{\text{trans}} = \frac{1}{2} m v^2, \quad KE_{\text{rot}} = \frac{1}{2} I \omega^2\)
For high-speed applications (relativistic): \(KE_{\text{rel}} = (\gamma - 1) m c^2, \quad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}\)
Relativistic kinetic energy: At low velocities (\(v \ll c\)), this reduces to the classical \(\frac{1}{2}mv^2\). As velocity approaches light speed, \(\gamma\) and the kinetic energy grow without bound, so no massive object can reach the speed of light.
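The sketch below is a quick numerical check that the relativistic expression approaches the classical one at low speed. It assumes a 1 kg test mass. The rewrite of \(\gamma - 1\) is a standard algebraic identity that avoids floating-point cancellation when \(v \ll c\); it is not something taken from the original text.

```python
import math

C = 299_792_458.0  # speed of light in vacuum [m/s], exact by SI definition


def ke_classical(m: float, v: float) -> float:
    """Classical kinetic energy 1/2 m v^2 [J]."""
    return 0.5 * m * v**2


def ke_relativistic(m: float, v: float) -> float:
    """Relativistic kinetic energy (gamma - 1) m c^2 [J].

    gamma - 1 is rewritten as beta^2 / (s (1 + s)) with s = sqrt(1 - beta^2),
    which avoids catastrophic cancellation when v << c.
    """
    beta2 = (v / C) ** 2
    s = math.sqrt(1.0 - beta2)
    return beta2 / (s * (1.0 + s)) * m * C**2


for frac in (1e-6, 0.01, 0.5, 0.9):
    v = frac * C
    ratio = ke_relativistic(1.0, v) / ke_classical(1.0, v)
    print(f"v = {frac:g} c: KE_rel / KE_classical = {ratio:.6f}")
```

The ratio stays indistinguishable from 1 at everyday speeds and grows quickly as \(v\) approaches \(c\).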
Why It Matters: Quantifies work capacity through motion, enabling predictive modeling across domains, from particle physics to industrial machinery. Galileo’s falling-body experiments (late 16th and early 17th centuries) led to Leibniz’s vis viva concept (\(mv^2\)), eventually formalized as \(\frac{1}{2} mv^2\) in the 19th century.
- Subforms/Examples:
  - Thermal: Random motion of particles (atoms/molecules). Example: Heat from friction or steam from boiling water. In data science, thermal energy datasets often involve temperature distributions for heat transfer models.
  - Mechanical: Macroscopic motion of objects. Example: Rolling ball; wind turning turbine blades. Useful in predictive maintenance models for machinery.
  - Electrical: Ordered motion of charged particles. Example: Current in household appliances. Key in energy consumption forecasting using time-series data.
  - Sound: Energy carried by vibrational waves in matter. Example: Guitar string or speaker cone vibrating. Analyzed in signal processing for acoustic modeling.
Potential Energy
Stored energy based on position or configuration—ready to convert to kinetic energy. Critical for ML models of mechanical systems where position determines behavior.
Key Equation: \(PE_{\text{grav}} = m g h\)
Why It Matters: Represents latent work capacity. In the absence of friction and other non-conservative forces, total mechanical energy (kinetic + potential) is conserved, which simplifies simulations and optimizations. Thomas Young used “energy” in its modern sense (1807); Rankine introduced the term “potential energy” (1853) while Kelvin and others were formalizing energy conservation, the basis of the First Law of Thermodynamics.
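Here is a worked example of mechanical energy conservation for a frictionless drop. Setting \(mgh = \frac{1}{2}mv^2\) gives \(v = \sqrt{2gh}\), independent of mass. The mass and height are hypothetical values chosen for illustration.

```python
import math

G = 9.80665  # standard gravity [m/s^2]


def impact_speed(h: float) -> float:
    """Speed after a frictionless drop from height h [m], from m*g*h = 1/2*m*v^2."""
    return math.sqrt(2.0 * G * h)


m, h = 2.0, 10.0            # hypothetical mass [kg] and drop height [m]
v = impact_speed(h)
pe_top = m * G * h          # potential energy at the top [J]
ke_bottom = 0.5 * m * v**2  # kinetic energy at the bottom [J]
print(f"v = {v:.2f} m/s, PE = {pe_top:.1f} J, KE = {ke_bottom:.1f} J")
```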
- Subforms/Examples:
  - Gravitational: Due to position in a gravitational field. Example: Water behind a dam; book on a shelf. Modeled in geospatial data for hydropower predictions.
  - Elastic: Stored in deformed elastic materials. Example: Stretched rubber band, compressed spring, drawn bow. Used in finite element analysis datasets.
  - Chemical: Stored in molecular bonds and arrangements. Example: Gasoline, food, a charged battery. Critical for reaction kinetics modeling in cheminformatics.
  - Nuclear: Stored in atomic nuclei via the strong force. Example: Fission in reactors; fusion in the Sun. Involved in radiation transport simulations.
Radiant Energy (Electromagnetic)
Energy carried by electromagnetic waves at light speed. Essential for modeling solar, optical, and thermal systems.
Key Equations: \(E_{\text{photon}} = h \nu, \quad I = \frac{P}{A}\)
Why It Matters: Powers photovoltaic cells, heating systems, and optical sensors. Maxwell unified electromagnetism (1860s); Planck’s quanta (1900) and Einstein’s photoelectric effect (1905) established photons as energy packets—foundational for quantum mechanics.
ML Applications: Spectral data for remote sensing, image analysis, solar panel optimization.
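The sketch below evaluates \(E_{\text{photon}} = h\nu\) at 540 THz, the frequency used in the candela definition, together with \(I = P/A\). It uses the exact SI values of \(h\) and \(e\); the beam power and area are hypothetical.

```python
H = 6.626_070_15e-34          # Planck constant [J s] (exact)
E_CHARGE = 1.602_176_634e-19  # elementary charge [C] (exact), used for J -> eV


def photon_energy_j(freq_hz: float) -> float:
    """E = h * nu [J]."""
    return H * freq_hz


nu = 540e12  # 540 THz
e_photon = photon_energy_j(nu)
print(f"E_photon = {e_photon:.4e} J = {e_photon / E_CHARGE:.3f} eV")

# Intensity I = P / A for a hypothetical 1000 W beam spread over 2 m^2
power_w, area_m2 = 1000.0, 2.0
print(f"I = {power_w / area_m2:.0f} W/m^2")

# Photon flux carried by 1 W of 540 THz light [photons/s]
print(f"{1.0 / e_photon:.3e} photons/s per watt")
```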
Energy Conservation, Storage, and Transport
- Conservation and Conversion: Total energy is conserved (First Law of Thermodynamics, established 1840s–1850s by Mayer, Joule, Helmholtz). Forms interconvert with efficiencies <100%; “losses” often appear as thermal energy or sound. In data science, this principle underpins loss functions in energy balance models.
- Storage and Transport: Examples include batteries (chemical ↔ electrical), pumped hydro (electrical → gravitational potential when pumping; gravitational → kinetic → electrical when generating), capacitors and inductors (energy in electric and magnetic fields), and nuclear fuels (nuclear → thermal → mechanical → electrical). Data scientists model these for optimization in supply chain or grid management.
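One way conservation can enter a loss function is to penalize the energy-balance residual \(E_{\text{in}} - E_{\text{out}} - E_{\text{loss}}\) alongside the usual data-fit term. The function name, its arguments, and the 0.1 weight below are hypothetical illustration choices, not a standard API.

```python
import numpy as np


def energy_balance_loss(e_in, e_out_pred, e_loss_pred, e_out_obs, weight=0.1):
    """Hypothetical composite loss: data fit + penalty on the First-Law residual.

    All energies in joules. The residual e_in - e_out - e_loss should be ~0
    for a physically consistent prediction.
    """
    e_in, e_out_pred = np.asarray(e_in, float), np.asarray(e_out_pred, float)
    e_loss_pred, e_out_obs = np.asarray(e_loss_pred, float), np.asarray(e_out_obs, float)
    data_term = np.mean((e_out_pred - e_out_obs) ** 2)
    residual = e_in - e_out_pred - e_loss_pred   # energy unaccounted for [J]
    physics_term = np.mean(residual ** 2)
    return data_term + weight * physics_term


# Hypothetical numbers: 100 J in, 90 J out, 10 J lost as heat -> residual 0
print(energy_balance_loss([100.0], [90.0], [10.0], [90.0]))
# Predicting 95 J out with 10 J loss violates conservation by 5 J
print(energy_balance_loss([100.0], [95.0], [10.0], [90.0]))
```

The weight trades off data fidelity against physical consistency and would normally be tuned.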
The International System of Units (SI)
The International System of Units (SI), from the French Système international d’unités, is the modern metric system and the world’s most widely used measurement standard. It ensures global consistency in science, engineering, industry, and trade. The Bureau International des Poids et Mesures (BIPM) maintains the SI.
Current Definition (since 2019)
Since 20 May 2019, the SI defines all units by fixing the exact numerical values of seven defining constants of nature. This makes the system universal, stable, and independent of physical artifacts:
- Cesium-133 hyperfine transition frequency Δν_Cs — 9 192 631 770 Hz
- Speed of light in vacuum c — 299 792 458 m/s
- Planck constant h — 6.626 070 15 × 10⁻³⁴ J s
- Elementary charge e — 1.602 176 634 × 10⁻¹⁹ C
- Boltzmann constant k — 1.380 649 × 10⁻²³ J/K
- Avogadro constant N_A — 6.022 140 76 × 10²³ mol⁻¹
- Luminous efficacy K_cd — 683 lm/W (for monochromatic radiation at 540 THz)
The seven base units are defined in terms of these constants.
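Two derived constants follow directly from the exact defining constants. The molar gas constant \(R = N_A k\) is the \(R\) in the Arrhenius factor used later. The Faraday constant \(F = N_A e\) is the charge per mole of electrons and connects battery capacity to chemistry. A short sketch:

```python
# Exact SI defining constants (2019 definition)
N_A = 6.022_140_76e23         # Avogadro constant [1/mol]
K_B = 1.380_649e-23           # Boltzmann constant [J/K]
E_CHARGE = 1.602_176_634e-19  # elementary charge [C]

# Molar gas constant R = N_A * k, used in exp(-Ea / (R T))
R_GAS = N_A * K_B             # [J/(mol K)]
# Faraday constant F = N_A * e: charge carried by one mole of electrons
FARADAY = N_A * E_CHARGE      # [C/mol]

print(f"R = {R_GAS:.9f} J/(mol K)")
print(f"F = {FARADAY:.5f} C/mol = {FARADAY / 3600:.4f} Ah/mol")
```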
Seven SI Base Units
- Second (s) — time
- Metre (m) — length
- Kilogram (kg) — mass
- Ampere (A) — electric current
- Kelvin (K) — thermodynamic temperature
- Mole (mol) — amount of substance
- Candela (cd) — luminous intensity
Use the seven SI base units, and derived units built from them, to enforce consistency and comparability across datasets and models.
Brief History of the SI
The metric system originated during the French Revolution (1799: platinum metre and kilogram standards). The 1875 Metre Convention created the BIPM. The modern SI was adopted in 1960 with six base units (the mole was added in 1971), culminating in the 2019 revision based entirely on fixed constants.
Examples of Derived Units for Energy
- Energy: joule (J = kg·m²·s⁻²)
- Power: watt (W = kg·m²·s⁻³)
- Force: newton (N = kg·m·s⁻²)
- Pressure: pascal (Pa = kg·m⁻¹·s⁻²)
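Here is a minimal, library-free sketch of the derived units above; Pint handles this more completely. Each unit is represented as a vector of exponents over the base units (kg, m, s, A). Multiplying quantities adds exponents and dividing subtracts them, which is enough to verify the derivations.

```python
# Each unit as exponents over the base units (kg, m, s, A)
M, S, AMP = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
NEWTON = (1, 1, -2, 0)    # kg·m·s⁻²
JOULE = (1, 2, -2, 0)     # kg·m²·s⁻²
WATT = (1, 2, -3, 0)      # kg·m²·s⁻³
PASCAL = (1, -1, -2, 0)   # kg·m⁻¹·s⁻²


def mul(a, b):
    """Multiplying two quantities adds their dimension exponents."""
    return tuple(x + y for x, y in zip(a, b))


def div(a, b):
    """Dividing two quantities subtracts their dimension exponents."""
    return tuple(x - y for x, y in zip(a, b))


assert mul(NEWTON, M) == JOULE            # work = force × distance
assert div(JOULE, S) == WATT              # power = energy / time
assert div(NEWTON, mul(M, M)) == PASCAL   # pressure = force / area
VOLT = div(WATT, AMP)                     # from P = V·I
print("volt as (kg, m, s, A) exponents:", VOLT)
```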
Feature Engineering with Dimensions (for Data Scientists Modeling Energy Systems)
Feature engineering is a core step in the data science pipeline, bridging raw data preparation and modeling. A standard feature engineering process typically involves three key steps:
- Feature Creation: Generating new features from raw data (e.g., deriving physical quantities, transformations, or interactions).
- Feature Transformation: Applying scaling, normalization, encoding, or other modifications to make features suitable for models.
- Feature Selection: Identifying the most relevant features to reduce dimensionality and improve performance.
In energy-related datasets (e.g., sensor readings, simulation outputs, or experimental logs), incorporating physical dimensions and SI units ensures features are physically meaningful, leading to more interpretable and generalizable models.
Dimensional Analysis and Feature Hierarchy
Dimensional analysis is the systematic tracking of physical units through calculations. Every derived quantity must have dimensions that follow from its definition. For example, power has dimensions $[kg \cdot m^2 \cdot s^{-3}]$ because it’s energy per time: $[kg \cdot m^2 \cdot s^{-2}] / [s]$. If your calculation produces the wrong dimensions, you made an algebraic error.
The Buckingham Π theorem formalizes dimensionless feature construction: given $n$ physical variables whose dimensions span $k$ independent base dimensions (formally, $k$ is the rank of the dimensional matrix), you can form $n - k$ independent dimensionless groups. These groups are scale-invariant: they remain valid whether you measure voltage in millivolts or kilovolts.
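To make the $n - k$ count concrete, the sketch below uses the battery variables $V, I, t, P, E, R$. It computes the rank of their dimensional matrix with NumPy. It then checks that three candidate groups map to zero dimensions; two of them are the $\Pi_1 = Pt/E$ and $\Pi_2 = RI/V$ used in the battery example. Because kg and m only ever appear together as kg·m² here, the rank is 3 rather than 4.

```python
import numpy as np

# Columns: battery variables; rows: exponents of base dimensions (kg, m, s, A)
variables = ["V", "I", "t", "P", "E", "R"]
dim_matrix = np.array([
    # V   I   t   P   E   R
    [1,  0,  0,  1,  1,  1],    # kg
    [2,  0,  0,  2,  2,  2],    # m
    [-3, 0,  1, -3, -2, -3],    # s
    [-1, 1,  0,  0,  0, -2],    # A
])

rank = np.linalg.matrix_rank(dim_matrix)
n_groups = len(variables) - rank
print(f"rank = {rank}, independent Pi groups = {n_groups}")

# Candidate groups as exponent vectors over (V, I, t, P, E, R)
candidates = {
    "P/(V I)": np.array([-1, -1, 0, 1, 0, 0]),
    "P t / E": np.array([0, 0, 1, 1, -1, 0]),
    "R I / V": np.array([-1, 1, 0, 0, 0, 1]),
}
for name, exps in candidates.items():
    # A zero dimension vector means the product is dimensionless
    assert not np.any(dim_matrix @ exps), name
```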
Feature hierarchy follows naturally:
- Raw measurements: Base SI quantities with dimensions (V, A, K, s)
- Derived quantities: Computed from raw data via physics equations (power, energy, resistance)
- Dimensionless features: Ratios or products that cancel all units (normalized voltage, Π groups)
Dimensionless features generalize better because they’re independent of measurement scales and unit systems.
Figure 3.0: Three-tier feature hierarchy with dimensional analysis
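Battery Example: Raw sensor data (voltage V, current I, time t, temperature T) forms the top tier. The middle tier derives power $P = VI$ $[W]$, energy $E = \int P\,dt$ $[J]$ ($E = Pt$ at constant power), and resistance $R$ $[\Omega]$, each with verified dimensions. The bottom tier creates dimensionless ratios:
- $V/V_{max}$: a state-of-charge proxy.
- $\Pi_1 = Pt/E$: instantaneous power relative to the average power so far, since $E/t$ is the mean power.
- $\Pi_2 = RI/V$: an Ohm’s-law consistency check. It is informative only when $R$ is estimated independently; if $R$ is computed as $V/I$ from the same samples, $\Pi_2 \equiv 1$.

Because they carry no units, these features transfer more readily across different cell chemistries and capacities.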
Feature Engineering Steps
- Unit Harmonization (Preparation for Creation): Convert all measurements to SI units using libraries like Pint. Do this before feature creation to avoid unit inconsistencies.
- Feature Creation Examples:
  - Dimensionless groups via Buckingham Π theorem (e.g., Reynolds number for flows).
  - Per-unit quantities (e.g., specific energy J/kg, power density W/m³).
  - Domain-specific: Arrhenius terms \(k(T) = A \exp\left(-\frac{E_a}{R T}\right)\), power from \(P = V \cdot I\), work integrals.
- Feature Transformation Examples:
  - Log transforms for skewed distributions.
  - Standardization (zero mean, unit variance) or MinMax scaling.
  - Polynomial features or interactions for non-linear relationships.
- Feature Selection Examples:
  - Correlation analysis or mutual information to filter redundant features.
  - Recursive Feature Elimination (RFE): Iteratively remove the least important features and retrain to find the minimal set that preserves performance.
  - LASSO regularization: L1 penalty drives coefficients to zero, performing automatic feature selection during training.
  - Physics-informed selection: Retain features that respect conservation laws.
Advanced: Use physics-informed neural networks (PINNs) to embed constraints during training.
Convert to SI units first, then build dimensionless and per‑unit features before scaling; this increases robustness across datasets and operating regimes.
Avoid mixing units (e.g., J and kWh) in training; normalize consistently and document unit provenance for every feature.
When Standard ML Fails: The Case for Physics-Informed Features
Standard machine learning treats all features equally—voltage mean, temperature, cycle count are just numbers in a matrix. Physics-informed ML enforces structure: convert to SI units, derive quantities from first principles (power, energy, resistance), then create dimensionless groups that remain valid across operating conditions. The difference matters when models face data they’ve never seen.
Figure 1.0: Two feature engineering pipelines
The Problem: Extrapolation Breakdown
Scenario: A battery manufacturer trains an ML model to predict remaining useful life (RUL) using 6 months of lab data (20–30°C, 1C discharge rate). The model achieves R² = 0.95 on the test set. Success?
Reality Check: Model deployed in field conditions (−10°C to 45°C, variable discharge rates). Performance collapses:
| Metric | Lab Test | Field Deployment |
|---|---|---|
| R² Score | 0.95 | 0.34 |
| RMSE (cycles) | 12 | 89 |
| Max Error | 45 cycles | 340 cycles |
Root Cause Analysis:
```python
# Standard ML approach (what failed)
features_standard = [
    'voltage_mean', 'voltage_std',
    'current_mean', 'current_std',
    'temp_celsius', 'cycle_number'
]
# Model learned spurious correlations:
# - High voltage → long RUL (true in 25°C lab, breaks at -10°C)
# - Cycle number → linear decay (ignores temperature acceleration)
```
Why It Failed: Violation of Physical Laws
- Energy Conservation Ignored: The model doesn’t track cumulative energy throughput, a primary degradation driver
- Temperature Effects Misrepresented: A linear `temp_celsius` feature can’t capture exponential Arrhenius kinetics
- Dimensional Inconsistency: Voltage mean (V) and current std (A) enter as unrelated numbers; no physical relationship between them is enforced
- Extrapolation Outside Training Distribution: Under Arrhenius kinetics, 45°C operation degrades roughly 3.5–4× faster than 25°C (about 3.6× for $E_a \approx 50$ kJ/mol), but the model treats it as a +20 offset
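The acceleration factor is a ratio of two Arrhenius rates, so the prefactor $A$ cancels. The sketch below assumes the $E_a \approx 50$ kJ/mol used later in this guide; `arrhenius_acceleration` is a hypothetical helper name. It shows why a linear °C feature cannot represent this effect: the factor is multiplicative and strongly nonlinear in temperature, and it depends heavily on $E_a$.

```python
import math

R_GAS = 8.314  # molar gas constant [J/(mol K)]


def arrhenius_acceleration(t_hot_c: float, t_ref_c: float, e_a: float = 50_000.0) -> float:
    """Ratio k(T_hot) / k(T_ref) from k = A exp(-Ea / (R T)); A cancels.

    Temperatures in °C are converted to kelvin first; e_a is in J/mol.
    """
    t_hot_k, t_ref_k = t_hot_c + 273.15, t_ref_c + 273.15
    return math.exp(e_a / R_GAS * (1.0 / t_ref_k - 1.0 / t_hot_k))


print(f"45 °C vs 25 °C: {arrhenius_acceleration(45, 25):.2f}x")
print(f"-10 °C vs 25 °C: {arrhenius_acceleration(-10, 25):.3f}x")
```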
Feature Importance Analysis: Reveals what the model actually learned. Computed via permutation importance: shuffle each feature, measure performance drop. Large drop means the model relies on that feature. In the standard ML case, temperature contributes only 48% despite being the dominant physical driver. The model compensated with spurious correlations from other features. In the physics-informed case, the Arrhenius-corrected temperature feature accounts for 78% of importance—the model learned the true mechanism.
Figure 2.0: Extrapolation performance (top) and feature importance (bottom) for standard ML vs physics-informed approaches
The Physics-Informed Fix
```python
# Physics-informed approach
features_physics = [
    'energy_cumulative_j',      # ∫P dt - conservation law
    'arrhenius_factor',         # exp(-Ea/RT) - thermal kinetics
    'resistance_internal_ohm',  # e.g. -ΔV/ΔI at current steps - degradation signature
    'soc_normalized',           # V/Vmax - dimensionless state
    'power_specific_w_per_kg'   # P/m - per-unit intensity
]
```
Results After Physics-Informed Features:
| Metric | Standard ML | Physics-Informed |
|---|---|---|
| Field R² | 0.34 | 0.87 |
| Field RMSE | 89 cycles | 23 cycles |
| Extrapolation Error | 280% | 35% |
Key Improvements:
- Arrhenius factor: Captures the roughly 3.5–4× faster degradation at 45°C
- Cumulative energy: Captures accumulated stress largely independent of the discharge profile
- Dimensionless SOC: Transfers across cells with different voltage ratings
- Internal resistance: Tracks irreversible degradation mechanism
Decision Framework: When to Use Physics-Informed Features
Use physics-informed ML when:
- Extrapolation required: Operating conditions differ from training (temperature, load, scale)
- Small training data: Physical constraints reduce model complexity, so far fewer samples may suffice (in favorable cases, hundreds of physics-informed samples rather than tens of thousands of raw ones)
- Interpretability critical: Engineers need to validate predictions against first principles
- Conservation laws apply: Energy, mass, momentum must be preserved
- Known failure modes: Chemistry (Arrhenius), mechanics (fatigue laws), thermodynamics (efficiency limits)
Standard ML sufficient when:
- Interpolation only (predictions within training range)
- Abundant data covers all operating regimes
- Black-box acceptable (no regulatory/safety requirements)
- Empirical relationships dominate (social systems, financial markets)
Cost-Benefit Analysis
| Approach | Engineering Effort | Data Requirement | Generalization | Interpretability |
|---|---|---|---|---|
| Standard ML | Low (autoML-ready) | High (10K+ samples) | Poor (interpolation) | Low (black-box) |
| Physics-Informed | Medium (domain knowledge) | Low (100s of samples) | Excellent (extrapolation) | High (first principles) |
| Hybrid | Medium-High | Medium (1K samples) | Very Good | Medium-High |
Engineering ROI: For the battery case, physics-informed features required 2 weeks of domain expert time but reduced field failures by 65%, saving $2M in warranty costs.
Practical Example: Battery Discharge Modeling
Problem Context
Predict remaining useful life (RUL) of a lithium-ion battery from discharge cycle data. Raw sensor data: voltage, current, temperature.
Raw Data Sample: Time (min), voltage (V), current (A), temperature (°C) at 5 time points during discharge.
Problem: Mixed units, no physical relationships captured.
Step 1: Unit Harmonization (SI Conversion)
Convert all measurements to SI base units:
\[t[\mathrm{s}] = t[\mathrm{min}] \times 60, \quad T[\mathrm{K}] = T[^\circ\mathrm{C}] + 273.15\]
Use Pint or a similar library to enforce consistency across the dataset.
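Here is a dependency-free sketch of this step using plain NumPy arrays; Pint would additionally attach units to each array. The five-point log is hypothetical data in the shape described above (minutes, volts, amps, °C). Voltage and current are already SI, so only time and temperature change.

```python
import numpy as np

# Hypothetical raw log: minutes, volts, amps, degrees Celsius
time_min = np.array([0.0, 15.0, 30.0, 45.0, 60.0])
voltage_v = np.array([4.15, 3.95, 3.80, 3.65, 3.40])
current_a = np.array([2.0, 2.0, 2.0, 2.0, 2.0])
temp_c = np.array([25.0, 27.5, 29.0, 30.5, 32.0])

# Unit harmonization to SI base units
time_s = time_min * 60.0   # min -> s (scale factor)
temp_k = temp_c + 273.15   # °C -> K (offset, not a scale factor)

assert np.all(temp_k > 0), "absolute temperature must be positive"
assert np.all(np.diff(time_s) > 0), "timestamps must be strictly increasing"
```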
Step 2: Feature Creation (Physics-Derived)
2.1 Instantaneous Power
\(P = V \cdot I \quad [\mathrm{W}] = \mathrm{kg \cdot m^2 \cdot s^{-3}}\) This is the definition of electrical power (not Ohm’s law, which is \(V = IR\)). Captures the energy flow rate.
2.2 Internal Resistance
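\(R = \frac{V}{I} \quad [\Omega] = \mathrm{kg \cdot m^2 \cdot s^{-3} \cdot A^{-2}}\) Key degradation indicator: internal resistance increases as the battery ages. Note that terminal voltage divided by current gives the apparent (load) resistance. Internal resistance is usually estimated as \(R_{\text{int}} \approx (V_{\text{oc}} - V)/I\), or from \(-\Delta V / \Delta I\) across a current step.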
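The sketch below contrasts the two resistance estimates. It assumes a discharge-positive sign convention and the simple model \(V = V_{\text{oc}} - I R_{\text{int}}\); the function names and pulse values are hypothetical. Terminal voltage over current mostly reflects the external load. A fast current step isolates the internal resistance: the ohmic part, plus any polarization that develops within the step.

```python
import numpy as np


def apparent_resistance(voltage_v, current_a):
    """Terminal voltage over current: the load seen by the cell, not its internal resistance."""
    return np.asarray(voltage_v, float) / np.asarray(current_a, float)


def step_resistance(v_before, i_before, v_after, i_after):
    """Internal resistance estimate from a current step: R ≈ -ΔV / ΔI [Ω].

    Assumes the step is fast enough that the open-circuit voltage barely changes.
    """
    return -(v_after - v_before) / (i_after - i_before)


# Hypothetical pulse: current rises 1 A -> 3 A, terminal voltage sags 3.90 V -> 3.80 V
print(f"apparent R after step: {apparent_resistance(3.80, 3.0):.3f} ohm")
print(f"internal R estimate:   {step_resistance(3.90, 1.0, 3.80, 3.0):.3f} ohm")
```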
2.3 Cumulative Energy (Numerical Integration)
\(E(t) = \int_0^t P(\tau) \, d\tau \quad [\mathrm{J}]\) Trapezoid rule for discrete data: \(E_i = E_{i-1} + \frac{P_i + P_{i-1}}{2} \cdot (t_i - t_{i-1})\)
```python
import numpy as np

# Cumulative energy via trapezoid integration (time_s in s, power_w in W)
energy_j = np.zeros(len(time_s))
for i in range(1, len(energy_j)):
    dt = time_s[i] - time_s[i-1]                # timestep [s]
    p_avg = (power_w[i] + power_w[i-1]) / 2     # mean power over the step [W]
    energy_j[i] = energy_j[i-1] + p_avg * dt    # running total [J]
```
Represents total stress accumulated by the cell.
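The loop can also be vectorized. The sketch below computes the same trapezoid sum with NumPy and checks it against two analytic cases; SciPy offers the equivalent `scipy.integrate.cumulative_trapezoid(power_w, time_s, initial=0)`. The helper name `cumulative_energy_j` is hypothetical.

```python
import numpy as np


def cumulative_energy_j(time_s, power_w):
    """Vectorized cumulative trapezoid integral of power over time [J], starting at 0."""
    t = np.asarray(time_s, float)
    p = np.asarray(power_w, float)
    increments = 0.5 * (p[1:] + p[:-1]) * np.diff(t)  # energy per interval [J]
    return np.concatenate(([0.0], np.cumsum(increments)))


# Analytic check: constant 7.6 W for 60 s should give 456 J
t = np.linspace(0.0, 60.0, 7)
e = cumulative_energy_j(t, np.full_like(t, 7.6))
print(e[-1])
```

Because the trapezoid rule is exact for power that varies linearly within each interval, a linear ramp makes a convenient unit test.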
2.4 Dimensionless Features (State of Charge Proxy)
\(\text{Voltage Normalized} = \frac{V(t)}{V_{\max}} \quad [\text{dimensionless}]\)
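This ratio is scale-invariant with respect to the cell’s voltage rating. However, the voltage–SOC curve depends on chemistry; LFP cells, for example, have a very flat plateau. It therefore remains a proxy rather than a true SOC estimate.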
2.5 Per-Unit Features (Specific Power)
\(P_{\text{specific}} = \frac{P}{m} \quad [\mathrm{W/kg}] = \mathrm{m^2 \cdot s^{-3}}\)
Normalizes by cell mass. Enables comparison across designs.
2.6 Arrhenius Rate Factor (Thermal Effects)
\(k(T) = A \exp\left(-\frac{E_a}{RT}\right)\)
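Models temperature-dependent degradation. $E_a \approx 50$ kJ/mol is a commonly used order of magnitude for Li-ion aging; reported values vary with chemistry and degradation mechanism.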
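```python
import numpy as np

# Arrhenius degradation rate; temp_k is cell temperature in kelvin
R_GAS = 8.314   # molar gas constant [J/(mol·K)]
E_a = 50000     # activation energy [J/mol]
A = 1e6         # pre-exponential factor; arbitrary scale, removed by later standardization
arrhenius = A * np.exp(-E_a / (R_GAS * temp_k))
```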
Step 3: Feature Transformation
3.1 Standardization (Z-score)
\(z = \frac{x - \mu}{\sigma}\) Apply after creating physical features. Use StandardScaler for power, resistance, specific power.
3.2 Log Transform
\(\log(1 + E) \quad \text{for skewed energy distributions}\) Stabilizes variance for cumulative quantities.
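Here is a NumPy sketch of the two transformations: first \(\log(1 + E)\) via `np.log1p`, then a z-score. The statistics are fitted on the training split only so that test-time values are scaled consistently; in practice scikit-learn's `StandardScaler` does the z-score step. The helper names and energy values are hypothetical.

```python
import numpy as np


def fit_standardizer(x_train):
    """Return (mean, std) estimated on training data only."""
    x_train = np.asarray(x_train, float)
    return x_train.mean(axis=0), x_train.std(axis=0)


def standardize(x, mean, std):
    """z = (x - mu) / sigma using statistics from the training set."""
    return (np.asarray(x, float) - mean) / std


# Hypothetical cumulative energies [J]: log1p first (skewed, nonnegative), then z-score
energy_train = np.array([0.0, 1.2e3, 5.0e3, 2.1e4, 9.8e4])
energy_test = np.array([3.0e3, 1.5e5])

log_train, log_test = np.log1p(energy_train), np.log1p(energy_test)
mu, sigma = fit_standardizer(log_train)
z_train = standardize(log_train, mu, sigma)
z_test = standardize(log_test, mu, sigma)
print(z_train.round(2), z_test.round(2))
```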
Step 4: Exploratory Data Analysis (EDA)
4.1 Dimensional Consistency Check
Verify all features have correct SI dimensions:
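```python
# Check dimensions programmatically (power_w, resistance_ohm, energy_j are assumed to be pint Quantities)
import pint

ureg = pint.UnitRegistry()
# Comparing dimensionality (not exact units) also accepts kW, mΩ, kWh, etc.
assert power_w.dimensionality == ureg.watt.dimensionality
assert resistance_ohm.dimensionality == ureg.ohm.dimensionality
assert energy_j.dimensionality == ureg.joule.dimensionality
```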
4.2 Univariate Analysis
Distributions: Plot histograms for each feature:
- `power_w`: Check for outliers (sensor errors)
- `resistance_ohm`: Expect a gradual increase over cycles
- `temp_k`: Verify a physically plausible range (e.g., about 263–318 K for the −10°C to 45°C field conditions described earlier)
Summary Statistics: \(\text{Range, Mean, Std, Skewness, Kurtosis}\)
Flag non-physical values (negative resistance, temp > 400 K).
4.3 Temporal Evolution and Conservation Law Validation
Conservation laws impose hard constraints on valid data. Cumulative energy delivered during a discharge cannot decrease. Independently measured power cannot exceed the product of voltage and current. These aren’t statistical properties; they’re physical requirements. Violations indicate measurement errors, numerical bugs, or misunderstood units.
Ohm’s law states that resistance $R = V/I$ is independent of current for a linear (ohmic) element. The separate identity $P = VI$ defines electrical power. At fixed current, a scatter plot of logged power vs voltage must be a straight line through the origin with slope equal to the current. Deviations from Ohm’s law indicate a nonlinear device; deviations from $P = VI$ indicate faulty measurements.
The Arrhenius equation $k(T) = A \exp(-E_a / RT)$ governs thermally activated processes such as chemical reactions, diffusion, and degradation. A plot of $\ln(k)$ vs $1/T$ should be linear with slope $-E_a/R$. Curvature or an unexpected slope suggests one of three things: a miscalibrated temperature sensor, a different mechanism, or several competing mechanisms.
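The sketch below recovers $E_a$ from a linear fit of $\ln(k)$ against $1/T$. The rates are synthetic: they are generated from $E_a = 50$ kJ/mol with 2% multiplicative noise, and the temperatures, prefactor, and noise level are all assumptions. The fitted slope is $-E_a/R$, and large residuals would signal the curvature described above.

```python
import numpy as np

R_GAS = 8.314  # J/(mol K)

# Hypothetical degradation rates from Ea = 50 kJ/mol with 2 % multiplicative noise
rng = np.random.default_rng(0)
temp_k = np.array([263.15, 278.15, 298.15, 308.15, 318.15])
rate = 1e6 * np.exp(-50_000.0 / (R_GAS * temp_k)) * rng.normal(1.0, 0.02, temp_k.size)

# Linear fit of ln(k) against 1/T: slope = -Ea/R, intercept = ln(A)
slope, intercept = np.polyfit(1.0 / temp_k, np.log(rate), deg=1)
ea_fit = -slope * R_GAS
residuals = np.log(rate) - (slope / temp_k + intercept)
print(f"slope = {slope:.0f} K, Ea = {ea_fit / 1000:.1f} kJ/mol, "
      f"max |residual| = {np.abs(residuals).max():.3f}")
```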
These checks catch errors early, before they propagate into model training.
Figure 4.0: Three validation checks for physics consistency
Battery Example: The left panel shows cumulative energy vs time, which must be monotonically increasing: each timestep adds $\Delta E = P \cdot \Delta t \geq 0$. An `assert np.all(np.diff(energy_j) >= 0)` catches integration errors or out-of-order timestamps. The middle panel plots power vs voltage colored by current. Points cluster along $P = VI$ lines, confirming that the power measurements are consistent, and outliers (circled) flag measurement glitches. The right panel is an Arrhenius plot of $\ln(\text{degradation rate})$ vs $1/T$. The linear fit has slope $-E_a/R \approx -6000$ K, consistent with $E_a \approx 50$ kJ/mol and with literature values for Li-ion SEI growth. This indicates that temperature effects follow the expected chemistry.
```python
import numpy as np

# Sanity checks
assert np.all(np.diff(energy_j) >= 0), "Energy must increase"
assert np.all(resistance_ohm > 0), "Resistance must be positive"
```
4.4 Bivariate Relationships
Voltage vs. Resistance: \(R \uparrow \text{ as } V \downarrow \quad \text{(degradation signature)}\)
Power vs. Temperature: Scatter plot with Arrhenius overlay—expect exponential relationship.
Correlation Matrix:
- High $|r|$ between `power_w` and `specific_power` (expected: linear scaling)
- Moderate $|r|$ between `resistance_ohm` and the target (degradation link)
4.5 Dimensionless Analysis
Dimensionless groups are ratios of physical quantities where all units cancel, leaving pure numbers. The Buckingham Π theorem formalizes their construction: given $n$ variables spanning $k$ independent base dimensions (mass, length, time, etc.; formally, the rank of the dimensional matrix), you can form $n - k$ independent dimensionless groups.
Why they matter for ML: dimensionless features are scale-invariant. A model trained on a 2.5 Ah battery is far more likely to transfer to a 3.0 Ah battery if its features are dimensionless. With dimensional features (absolute voltage, absolute capacity), it tends to fail. The Reynolds number in fluid dynamics has this property: it predicts the flow regime whether you’re simulating a pipe or an ocean.
Construction: Combine quantities until dimensions cancel. Example: $\Pi = \frac{P \cdot t}{E} = \frac{[W] \cdot [s]}{[J]} = \frac{[J]}{[J]} = [-]$. The result is a pure number, independent of whether you measure power in watts or horsepower, as long as energy is expressed in a consistent unit.
Operating regime separation: When plotted against each other, dimensionless groups often reveal clusters corresponding to physical states. This is because the groups encode fundamental ratios—efficiency, load factor, stress level—rather than absolute measurements that vary with system size.
Figure 5.0: Dimensionless group scatter plot colored by cycle number
Battery Example: We construct two groups:
- $\Pi_1 = Pt/E$: instantaneous power relative to the average power delivered so far, since $E/t$ is the mean power.
- $\Pi_2 = RI/V$: equals 1 for an ideal resistor via Ohm’s law. With an independently estimated internal resistance, deviations indicate overpotential; if $R$ is computed as $V/I$ from the same samples, $\Pi_2 \equiv 1$ and carries no information.

Plotting these reveals three clusters: “normal operation” (center, early cycles), “high stress” (upper right, high power relative to the running average), and “end of life” (lower left, high resistance). The separation happens naturally because these groups capture physics, not arbitrary thresholds. The color gradient from purple (early cycles) to yellow (late cycles) shows the degradation trajectory through this dimensionless space.
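```python
# Dimensionless feature engineering
pi_1 = (power_w * time_s) / (energy_j + 1e-6)  # epsilon avoids division by zero at t = 0
# pi_2 is informative only if resistance_ohm is an independent estimate
# (e.g. -ΔV/ΔI from current steps); with resistance_ohm = V / I it is identically 1
pi_2 = (resistance_ohm * current_a) / voltage_v
```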
4.6 Outlier Detection (Physics-Informed)
Hard constraints from physics provide exact thresholds for impossible values: energy cannot decrease, resistance cannot be negative, power cannot exceed voltage times current. These checks require no statistical assumptions—a single violation proves measurement error.
Isolation Forest detects outliers by tree isolation: anomalous points require fewer random splits to isolate than normal points. It works well in high dimensions where distance-based methods fail. LOF (Local Outlier Factor) compares local density around a point to densities around its neighbors—points in sparse regions are outliers. Both are unsupervised and make no assumptions about distribution shape.
Applying these to dimensionless features rather than raw measurements improves robustness: outliers in absolute voltage might just be different cell capacities, but outliers in $V/V_{max}$ indicate genuine anomalies.
Battery Example: Flag measurements violating conservation laws:
- $P > V_{\max} \cdot I_{\max}$ (power exceeds theoretical limit)
- $\Delta E < 0$ (energy decrease impossible)
- $R < 0$ (non-physical resistance)
Then apply Isolation Forest on $[\Pi_1, \Pi_2, V/V_{max}]$ to catch subtler issues like sensor drift or intermittent connections that don’t violate hard constraints but fall outside normal operating regimes.
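The hard-constraint stage can be a plain boolean mask, computed before any statistical detector. The sketch below encodes the three rules listed above; the function name, limits, and samples are hypothetical. Rows that pass could then go to a statistical detector such as scikit-learn's `sklearn.ensemble.IsolationForest` on the dimensionless features.

```python
import numpy as np


def physics_violation_mask(power_w, energy_j, resistance_ohm, v_max, i_max):
    """Return True for samples that break a hard physical constraint.

    Assumes a discharge-positive sign convention and that energy_j is the
    cumulative energy delivered since the start of the discharge.
    """
    power_w = np.asarray(power_w, float)
    over_limit = power_w > v_max * i_max                              # P > Vmax * Imax
    energy_drop = np.concatenate(([False], np.diff(energy_j) < 0))    # ΔE < 0
    negative_r = np.asarray(resistance_ohm, float) < 0                # R < 0
    return over_limit | energy_drop | negative_r


# Hypothetical samples: index 2 exceeds the power limit, index 3 loses energy and has R < 0
power = np.array([8.0, 7.8, 30.0, 7.5, 7.4])
energy = np.array([0.0, 470.0, 940.0, 930.0, 1380.0])
resistance = np.array([0.05, 0.05, 0.06, -0.01, 0.06])
mask = physics_violation_mask(power, energy, resistance, v_max=4.2, i_max=5.0)
print(np.flatnonzero(mask))
```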
4.7 Feature Importance (Pre-Modeling)
Mutual information quantifies how much knowing one variable reduces uncertainty about another. For a feature $X$ and target $y$: \(I(X; y) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}\)
It’s zero when $X$ and $y$ are independent and positive when they share information. Unlike correlation, mutual information captures nonlinear relationships. For example, it detects that $y = x^2$ depends on $x$ even though the linear correlation is zero when $x$ is symmetric about zero.
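Here is a self-contained check of that claim. It uses a simple histogram (plug-in) estimator; scikit-learn's `mutual_info_regression` instead uses a nearest-neighbour estimator. The estimator choice, bin count, and sample sizes are assumptions for illustration.

```python
import numpy as np


def mutual_information_hist(x, y, bins=20):
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram (biased upward for small samples)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = p_xy > 0                           # skip empty cells: 0 * log 0 = 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))


rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, 20_000)          # symmetric about zero
y_quad = x ** 2                             # deterministic but nonlinear
y_noise = rng.uniform(-1.0, 1.0, 20_000)    # independent of x

print(f"corr(x, x^2) = {np.corrcoef(x, y_quad)[0, 1]:+.3f}")
print(f"MI(x, x^2)   = {mutual_information_hist(x, y_quad):.3f} nats")
print(f"MI(x, noise) = {mutual_information_hist(x, y_noise):.3f} nats")
```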
Permutation importance measures feature relevance by model performance drop: shuffle feature $i$, recompute predictions, measure accuracy loss. Large drop means the model relies heavily on that feature. This is model-agnostic and works for any black-box predictor.
For physics-informed features, we expect importance to align with known mechanisms. If a spurious feature (like cycle_number) dominates, the model hasn’t learned the physics—it memorized a dataset-specific pattern.
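scikit-learn provides `sklearn.inspection.permutation_importance` for fitted estimators. The NumPy sketch below spells out the procedure for any `predict` callable, using the increase in MSE as the score. The data are synthetic: only the first feature drives the target, and ordinary least squares stands in for the model.

```python
import numpy as np


def permutation_importance_mse(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled (larger = more relied upon)."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats


# Hypothetical data: y depends on column 0 only; column 1 is noise
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

# Ordinary least squares with an intercept as a stand-in model
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), y, rcond=None)


def predict(M):
    """Linear model prediction with intercept."""
    return np.column_stack([M, np.ones(len(M))]) @ coef


print(permutation_importance_mse(predict, X, y))
```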
Battery Example:
```python
from sklearn.feature_selection import mutual_info_regression

# X: feature matrix (n_samples, n_features); y: RUL target
mi_scores = mutual_info_regression(X, y)
```
Expected rankings for RUL prediction:
1. `v_normalized`: direct state-of-charge indicator, strongly correlated with remaining capacity
2. `resistance_ohm`: increases monotonically with degradation
3. `arrhenius`: captures thermal acceleration of degradation
Step 5: Feature Selection (Physics-Informed)
Correlation with Target (RUL)
- `v_normalized` (↓ voltage → ↓ RUL)
- `resistance_ohm` (↑ resistance → ↓ RUL)
- `energy_j` (cumulative stress)
Dimensionality Reduction
- Remove redundant features: `power_w` vs. `specific_power` (keep the per-unit version)
- Retain dimensionless groups: $\Pi_1$, $\Pi_2$ (scale-invariant)
Final Feature Set for Modeling
```python
features = [
    'v_normalized',    # Dimensionless SOC proxy
    'resistance_ohm',  # Internal resistance (degradation)
    'specific_power',  # Per-unit intensity
    'log_energy',      # Transformed cumulative stress
    'arrhenius',       # Temperature-adjusted rate
    'pi_1', 'pi_2'     # Dimensionless groups
]
```
Train with any suitable regressor. Physical features improve interpretability and generalization.
Key Takeaways
| Step | Action | Why |
|---|---|---|
| 1. SI units | Convert all measurements to base units | Dimensional consistency |
| 2. Physics features | Derive power, energy, resistance | Encode real dynamics |
| 3. EDA | Validate conservation laws | Catch errors before modeling |
| 4. Dimensionless groups | Create Buckingham Π terms | Scale-invariant predictions |
| 5. Transform | Apply scaling after creation | Preserve physical meaning |
| 6. Select | Use domain knowledge | Conservation laws guide relevance |
