Control Systems That Learn — Inside a Physics-Based Simulation
Most control systems rely on hard-coded logic or PID tuning.
But what if the controller could learn — from direct interaction with a physically accurate model?
In this project, I trained a Q-learning agent to control an electric kettle inside a symbolic simulation — built in System Modeler (Modelica) and controlled via Wolfram Language.
Core Idea
- The agent interacts directly with a first-principles, continuous model — not a surrogate or discrete simulator.
- The entire training loop — from reward logic to Q-value updates — is symbolic, declarative, and fully inspectable.
- Once trained, the agent’s policy is embedded as a 2D lookup table inside the same model — interpretable, deterministic, and inference-free.
Because it’s table-based, the policy can be reviewed, bounded, and verified — a critical feature for safety and deployment in real systems.
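To give a sense of how the Wolfram Language side drives the Modelica model, here is a minimal sketch of one interaction step. The model and variable names ("ElectricKettle", "waterTemp", "heaterOn") are placeholders, and restarting the simulation from initial values each step is just one possible shape of the loop; the actual implementation is in the linked notebook.

```
(* Sketch: one environment step, driven from Wolfram Language.
   "ElectricKettle", "waterTemp", and "heaterOn" are placeholder names. *)
stepSimulation[temp0_, heaterState_, dt_] :=
  Module[{sim},
    sim = SystemModelSimulate[
      "ElectricKettle", dt,
      <|"InitialValues" -> {"waterTemp" -> temp0},
        "ParameterValues" -> {"heaterOn" -> heaterState}|>];
    (* return the water temperature at the end of the step *)
    sim["waterTemp"]["LastValue"]
  ]
```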
How the Agent Learns
The learning process is based on Q-learning, a trial-and-error method in which the agent discovers which actions lead to good outcomes.
At each time step:
- It observes the water temperature and whether the heater is currently ON or OFF.
- It chooses an action: turn the heater ON or OFF.
- The simulation advances, and the agent receives a reward (sketched in code below):
  - ✅ +2 if the temperature is in the target range (363–368 K)
  - 🔁 +1 if the temperature is moving toward the target range
  - ❌ –1 otherwise
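A minimal sketch of that reward logic, assuming it is computed from the temperature before and after a step; the exact formulation in the linked notebook may differ:

```
(* Reward sketch: +2 inside the 363-368 K band, +1 when moving toward it,
   -1 otherwise. *)
reward[tempNew_, tempOld_] :=
  Which[
    363 <= tempNew <= 368, 2,                (* inside the target band *)
    tempNew < 363 && tempNew > tempOld, 1,   (* below the band and heating up *)
    tempNew > 368 && tempNew < tempOld, 1,   (* above the band and cooling down *)
    True, -1                                 (* everything else *)
  ]
```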
Over 1,500 training episodes, the agent builds a Q-table that maps each situation to the most rewarding action — forming a transparent, learned controller.
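The table is filled in with the standard Q-learning update rule. The sketch below discretizes the state into 1 K temperature bins paired with the heater flag; alpha, gamma, and the binning are illustrative choices, not the tuned values from the notebook.

```
alpha = 0.1; gamma = 0.95; actions = {"ON", "OFF"};

(* State: temperature rounded to 1 K bins, paired with the heater flag (0/1). *)
state[temp_, heater_] := {Round[temp], heater}

(* Q is an association mapping {state, action} -> value, defaulting to 0. *)
qLookup[Q_, s_, a_] := Lookup[Q, Key[{s, a}], 0.]

(* One update: Q <- Q + alpha (r + gamma max over a' of Q(s', a') - Q). *)
qUpdate[Q_, s_, a_, r_, sNew_] :=
  Module[{best = Max[qLookup[Q, sNew, #] & /@ actions], old = qLookup[Q, s, a]},
    Append[Q, {s, a} -> old + alpha (r + gamma best - old)]
  ]
```

During training, the agent picks an action (typically epsilon-greedily), steps the model, and applies qUpdate with the observed reward; repeated over the 1,500 episodes, this fills in the Q-table.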
🔗 Full simulation + Q-learning code: https://community.wolfram.com/groups/-/m/t/3494353
What the Agent Learned
Q-Value Curve
- X-axis: Water Temperature (K)
- Y-axis: Q-value of taking an action
- ✅ Positive values mean no action is needed
- ❌ Sharp drop near 363 K shows when the agent chooses to stop heating
This isn't hand-tuned logic — it’s learned through physical interaction.
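A curve like this can be read straight out of the table. The sketch below reuses qLookup and state from the training sketch and plots Q-values over a temperature sweep for the heater-ON slice of the state; the 330–380 K range is illustrative.

```
(* Plot learned Q-values per action against temperature, for states where
   the heater is currently ON (an illustrative slice of the state space). *)
temps = Range[330, 380];
ListLinePlot[
  Table[Transpose[{temps, qLookup[Q, state[#, 1], a] & /@ temps}], {a, actions}],
  PlotLegends -> actions,
  AxesLabel -> {"Water temperature (K)", "Q-value"}]
```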
How the System Behaves With the Learned Controller
Once trained, the Q-table is deployed back into the simulation as a 2D lookup table controller.
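One way to sketch that deployment step: take the greedy action in each discretized state and flatten the result into rows of {temperature, heater flag, action}, which can then feed a table-lookup block in the model. The names, ranges, and export file below are illustrative; the actual wiring into the Modelica model follows the linked notebook.

```
(* Greedy policy: the highest-valued action in each state. *)
greedyAction[s_] := First[MaximalBy[actions, qLookup[Q, s, #] &]]

(* Rows of {temperature, heater flag, action as 0/1} for a lookup-table block. *)
policyTable = Flatten[
   Table[{t, h, If[greedyAction[state[t, h]] === "ON", 1, 0]},
     {t, 330, 380}, {h, 0, 1}], 1];

Export["kettlePolicy.csv", policyTable]   (* hypothetical file, e.g. for import *)
```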
System Response Curve
- X-axis: Time (minutes)
- Y-axis: Water Temperature (K)
- The system converges to 363 K consistently — from a wide range of initial conditions
- No overshoot, no oscillation — behavior is stable and learned, not hard-coded
This validates the core idea: the agent isn’t just learning values — it’s learning safe, physical control behavior.
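A quick way to reproduce this kind of check is to simulate the closed-loop model from several starting temperatures and overlay the trajectories. "ElectricKettleControlled" and "waterTemp" are placeholder names for the model with the embedded lookup-table controller.

```
(* Simulate the controlled model from several initial temperatures and
   overlay the resulting temperature trajectories. *)
sims = Table[
   SystemModelSimulate["ElectricKettleControlled", 1800,
     <|"InitialValues" -> {"waterTemp" -> T0}|>],
   {T0, {290, 310, 330, 350}}];

ListLinePlot[#["waterTemp"] & /@ sims,
  AxesLabel -> {"Time (s)", "Water temperature (K)"},
  PlotLegends -> {"290 K", "310 K", "330 K", "350 K"}]
```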
Why This Matters
This kind of hybrid control — learning inside symbolic simulation — is rare.
But it’s what real-world systems demand:
- Physically grounded models
- Transparent, traceable control logic
- No hand-tuning or opaque inference
This workflow generalizes across thermal, electrical, and fluid systems — wherever intelligent, explainable control is needed.
🔗 Full simulation + Q-learning code: https://community.wolfram.com/groups/-/m/t/3494353