Control Systems That Learn — Inside a Physics-Based Simulation

Most control systems rely on hard-coded logic or PID tuning.

But what if the controller could learn — from direct interaction with a physically accurate model?

In this project, I trained a Q-learning agent to control an electric kettle inside a symbolic simulation — built in System Modeler (Modelica) and controlled via Wolfram Language.

[Figure: Model of the kettle in Wolfram System Modeler.]

Core Idea

  • The agent interacts directly with a first-principles, continuous model — not a surrogate or discrete simulator.
  • The entire training loop — from reward logic to Q-value updates — is symbolic, declarative, and fully inspectable.
  • Once trained, the agent’s policy is embedded as a 2D lookup table inside the same model — interpretable, deterministic, and inference-free.

Because it’s table-based, the policy can be reviewed, bounded, and verified — a critical feature for safety and deployment in real systems.
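
As a rough illustration of what that review can look like, here is a minimal Wolfram Language sketch. The Q-table structure, the state encoding, and the 368 K bound used in the check are assumptions made for this example, not the exact structures from the linked post.

    (* Hypothetical Q-table: {temperature bin (K), heater state} -> {Q(OFF), Q(ON)} *)
    qTable = <|{350, 0} -> {-0.4, 1.8}, {366, 1} -> {2.1, -0.9}|>;  (* toy values *)

    (* Greedy policy read off the table: 0 = heater OFF, 1 = heater ON *)
    greedyPolicy[state_] := First[Ordering[qTable[state], -1]] - 1;

    (* A finite table can be enumerated and checked exhaustively, e.g. confirming
       the heater is never switched ON at or above the upper temperature bound *)
    violations = Select[Keys[qTable], First[#] >= 368 && greedyPolicy[#] == 1 &]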

How the Agent Learns

The learning process is based on Q-learning, a trial-and-error method where the agent discovers what actions lead to good outcomes.

At each time step:

  • It observes the water temperature and whether the heater is ON or OFF.
  • It chooses an action: turn the heater ON or OFF.
  • The simulation advances, and the agent receives a reward:
      ◦ ✅ +2 if the temperature is in the target range (363–368 K)
      ◦ 🔁 +1 if it is moving in the right direction
      ◦ ❌ –1 otherwise

Over 1,500 training episodes, the agent builds a Q-table that maps each situation to the most rewarding action — forming a transparent, learned controller.
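
To make that loop concrete, here is a minimal Wolfram Language sketch of the reward rule and the standard tabular Q-update. The function names, the learning rate, the discount factor, and the use of the target-band midpoint to test "moving in the right direction" are placeholder choices; the actual implementation is in the linked Community post.

    (* Reward rule following the ranges above; 365.5 K is the midpoint of the target band *)
    reward[tNew_, tOld_] := Which[
      363 <= tNew <= 368, 2,
      Abs[tNew - 365.5] < Abs[tOld - 365.5], 1,
      True, -1]

    (* Tabular Q-learning update:
       Q(s,a) <- Q(s,a) + alpha (r + gamma max over a' of Q(s',a') - Q(s,a)) *)
    qUpdate[q_, s_, a_, r_, sNew_, alpha_ : 0.1, gamma_ : 0.95] := Module[
      {old = Lookup[q, Key[{s, a}], 0.],
       best = Max[Lookup[q, {Key[{sNew, 0}], Key[{sNew, 1}]}, 0.]]},
      Append[q, {s, a} -> old + alpha (r + gamma best - old)]]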

🔗 Full simulation + Q-learning code: https://community.wolfram.com/groups/-/m/t/3494353

What the Agent Learned

Q-Value Curve

[Figure: Q-value curve, showing when the agent prefers to heat.]

  • X-axis: Water Temperature (K)
  • Y-axis: Q-value of taking an action
  • ✅ Positive values mean no action is needed
  • ❌ Sharp drop near 363 K shows when the agent chooses to stop heating

This isn't hand-tuned logic — it’s learned through physical interaction.

How the System Behaves With the Learned Controller

Once trained, the Q-table is deployed back into the simulation as a 2D lookup table controller.
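
A rough sketch of what that deployment step can look like in Wolfram Language, assuming a simple temperature grid, a stand-in policy function, and a tab-separated file that a Modelica 2D table block can read; the exact export and wiring are shown in the linked post.

    (* Temperature axis for the lookup table, in kelvin *)
    tempGrid = Range[330., 380., 1.];

    (* Stand-in for the greedy action read from the trained Q-table:
       1 = heat, 0 = don't heat, switching off near the 363 K target *)
    toyPolicy[temp_, heaterOn_] := If[temp < 363., 1, 0];

    (* One row per temperature, one column per current heater state (OFF, ON) *)
    policyTable = Table[toyPolicy[t, h], {t, tempGrid}, {h, {0, 1}}];

    (* Tab-separated text with the temperature grid as the first column *)
    Export["kettlePolicy.tsv", MapThread[Prepend, {policyTable, tempGrid}], "TSV"]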

System Response Curve

[Figure: System response curve, showing convergence across random start temperatures.]

  • X-axis: Time (minutes)
  • Y-axis: Water Temperature (K)
  • The system converges to 363 K consistently — from a wide range of initial conditions
  • No overshoot, no oscillation — behavior is stable and learned, not hard-coded

This validates the core idea: the agent isn’t just learning values — it’s learning safe, physical control behavior.
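
A hedged sketch of how such a convergence check can be run from Wolfram Language, assuming the trained model is available as "KettleWithQPolicy" and exposes a water temperature variable named "water.T" (both names are placeholders for the model in the linked post):

    (* Simulate the closed-loop kettle model from a few random starting temperatures *)
    startTemps = RandomReal[{300, 350}, 5];
    sims = Table[
      SystemModelSimulate["KettleWithQPolicy", 1800,
        <|"InitialValues" -> {"water.T" -> t0}|>],
      {t0, startTemps}];

    (* Overlay the temperature trajectories; each should settle near 363 K *)
    SystemModelPlot[sims, "water.T"]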

Why This Matters

This kind of hybrid control — learning inside symbolic simulation — is rare.

But it’s what real-world systems demand:

  • Physically grounded models
  • Transparent, traceable control logic
  • No hand-tuning or opaque inference

This workflow generalizes across thermal, electrical, and fluid systems — wherever intelligent, explainable control is needed.


🔗 Full simulation + Q-learning code: https://community.wolfram.com/groups/-/m/t/3494353
