machine learning · IoT · edge computing

Machine Learning Is the Key to Unlocking IoT's Potential

Ajay Malik · Founder & CEO
October 11, 2018

The room where I stopped defending batch retraining

It was a grey Seattle morning in the spring of 2018, and I was standing backstage at the Predictera Summit trying to remember the one thing I always forget before a talk: to slow down. The room was a few hundred people — data engineers, plant-automation folks, a cluster of product managers from companies that made sensors for things I had never thought about, like the vibration signature of an industrial pump. The organizers had put me on a panel track titled, a little grandly, "The Intelligent Edge." I had prepared a talk about a single, unglamorous idea, and I was worried it was too small for the title.

The idea was this: most of the machine learning we were shipping into IoT in 2018 was trained the wrong way for the job we were asking it to do. We were collecting streams of sensor data, hauling them back to a data center, training a model in a batch, freezing it, and pushing it out to the edge. Then we acted surprised when the world drifted out from under the model six weeks later.

I opened with a confession. At that point I had spent a good part of my career building networks and systems that lived at the edge of things — access points, controllers, gear that had to make decisions locally because the round trip to the cloud was too slow or too expensive or simply not there. And I had watched teams treat the machine-learning model as the one component in that whole system that was allowed to be frozen in time. Everything else adapted. The model did not.

The argument: learning should not stop at deployment

The point I wanted to land was about online and incremental learning: models that keep updating from streaming data instead of being retrained in occasional batches. In 2018 this was not exotic research. Stochastic gradient descent already updated weights one mini-batch at a time, which is online learning at heart. Streaming decision trees, other incremental learners, and reinforcement learning all assumed a world that arrives one observation at a time. What was rare was seeing any of it in production on a device.

So I drew the distinction plainly for the room. Batch training asks: given all the data I have collected, what is the best model? Online and incremental learning ask a different question: given the model I have and the single new reading that just arrived, how should I adjust? The first question is answered in a data center on a schedule. The second is answered on the device, continuously, as the pump ages and the season changes and the factory floor gets rearranged.
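A minimal sketch of that second question, assuming a plain linear model updated by single-sample stochastic gradient descent on squared error. The class name, the learning rate, and the simulated stream are all hypothetical and chosen only for illustration; a real device model would look different.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OnlineLinearModel:
    """Linear regressor adjusted one reading at a time (plain SGD)."""
    n_features: int
    learning_rate: float = 0.1  # hypothetical value; tune per sensor
    weights: List[float] = field(default_factory=list)
    bias: float = 0.0

    def __post_init__(self) -> None:
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict(self, x: List[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: List[float], y: float) -> float:
        """Adjust the current model using one new reading.

        Returns the prediction error measured before the update.
        """
        error = self.predict(x) - y
        # The gradient of 0.5 * error**2 with respect to weight i is error * x[i].
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        return error


# Simulated stream (an assumption): the target follows y = 2x + 1.
model = OnlineLinearModel(n_features=1)
for _ in range(200):
    for step in range(10):
        x = step / 10.0
        model.update([x], 2.0 * x + 1.0)
```

There is no stored dataset and no training run here: the model only ever sees the reading in front of it. That is what makes it cheap enough for a device, and also what leaves it exposed to the hazards of noise and bad data.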

I was careful not to oversell it. Online learning has real hazards, and I said so. A model that updates on every reading can chase noise. It can be walked off a cliff by a run of bad sensor data or by an adversary who understands that the model is listening. Catastrophic forgetting was a genuine problem — update too aggressively toward recent data and the network quietly loses what it knew. So the honest version of my argument was not "always learn online." It was: decide deliberately where on the spectrum between fully frozen and fully online each model should sit, and build the guardrails for that choice, rather than defaulting to frozen because it was easier to reason about.

Why enterprises in the room actually cared

The business case in 2018 was not abstract. Three forces made continuous edge learning matter to the people in that room.

First, bandwidth and latency. If you have ten thousand sensors on a plant floor, you cannot stream every raw reading to the cloud, and you certainly cannot wait for a cloud round trip to decide whether a machine is about to fail. The decision has to be local.

Second, drift. Physical systems change. A model trained on last quarter's data describes last quarter's factory. Concept drift and data drift were the terms we used, and monitoring for them was becoming part of what we were starting to call MLOps — the discipline of treating a deployed model as a living production system with versioning, monitoring, and rollback, not a research artifact you threw over a wall.
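One simple way to watch for data drift, sketched under stated assumptions: compare the mean of a sliding window of recent readings against the mean of the training-time reference data, and raise a flag when the gap is large relative to the reference spread. The class name, window size, and threshold are hypothetical, and this is a plain mean-shift check, not a specific detector from any MLOps toolkit.

```python
from collections import deque
from statistics import mean, stdev
from typing import Deque, Sequence


class MeanShiftDriftMonitor:
    """Flags drift when the recent window mean departs from the reference mean."""

    def __init__(self, reference: Sequence[float],
                 window: int = 50, threshold: float = 3.0) -> None:
        self.ref_mean = mean(reference)
        self.ref_std = stdev(reference)
        self.window = window
        self.threshold = threshold  # hypothetical cut-off, in standard errors
        self.recent: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one reading; return True if the window looks drifted."""
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False  # not enough recent data to judge yet
        # Standard error of a window mean under the reference distribution.
        standard_error = self.ref_std / (self.window ** 0.5)
        z = abs(mean(self.recent) - self.ref_mean) / standard_error
        return z > self.threshold


# Simulated temperatures (an assumption): same pattern, then a 5-degree shift.
reference = [20.0 + 0.5 * (i % 5) for i in range(100)]
monitor = MeanShiftDriftMonitor(reference, window=20)
stable = [monitor.observe(20.0 + 0.5 * (i % 5)) for i in range(40)]
shifted = [monitor.observe(25.0 + 0.5 * (i % 5)) for i in range(20)]
```

A check like this only says that the inputs moved. It does not say whether the model's predictions got worse, so in practice it would be one signal among several feeding the monitoring and rollback process.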

Third, and this one got the most nods: interpretability. If a model on a turbine is going to update itself in the field, an engineer has to be able to ask why it flagged a failure. In 2018 we leaned on the tools we had — feature-importance methods, and newer techniques like LIME and the SHAP work that had just come out — to open the box a little. I told the room that an edge model that learns continuously but cannot explain itself is not an asset; it is a liability with good reflexes.
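To make "ask why it flagged a failure" concrete, here is a deliberately simple stand-in: rank each sensor feature by how many baseline standard deviations the flagged reading sits from its normal value. This is not LIME or SHAP, only a rough illustration of the kind of answer an engineer would want. The function name, feature names, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def explain_flag(reading: Dict[str, float],
                 baseline_mean: Dict[str, float],
                 baseline_std: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank features by their signed deviation from baseline, in standard deviations."""
    scores = {
        name: (reading[name] - baseline_mean[name]) / baseline_std[name]
        for name in reading
    }
    # Largest absolute deviation first: the most likely reason for the flag.
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical flagged reading from a pump.
ranked = explain_flag(
    reading={"vibration_mm_s": 7.0, "temperature_c": 41.0},
    baseline_mean={"vibration_mm_s": 3.0, "temperature_c": 40.0},
    baseline_std={"vibration_mm_s": 1.0, "temperature_c": 2.0},
)
```

In this made-up case the ranking puts vibration first, at four standard deviations above its baseline, with temperature barely off normal. That is a statement an engineer can check against the machine.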

[Figure: batch retraining (the old default) versus continuous edge learning (the argument). Batch retraining: sensors → haul to cloud, retrain in batch → freeze and ship model to edge → model drifts weeks later. Continuous edge learning: sensor stream → model at the edge updates per reading → decide locally and explain the call, with a feedback loop in which the outcome updates the model.]

A concrete example: the pump that taught itself the season

The example I used was deliberately mundane, because mundane is where this actually pays off. Imagine a fleet of pumps in a municipal water system, each with a cheap vibration and temperature sensor. We had trained an anomaly-detection model on a few months of data. It worked — until summer arrived, ambient temperatures climbed, and the model started flagging perfectly healthy pumps as anomalous because it had never seen a warm baseline. Every one of those false alarms cost a truck roll and a technician's afternoon.

The batch answer was to wait, collect summer data, retrain, and redeploy — by which point it was autumn. The incremental answer was to let each pump's model slowly adapt its baseline to its own local, seasonal normal, while a supervisory layer watched for the difference between "the world changed" and "the pump is failing." That distinction — normal drift versus genuine fault — was the whole game, and it is exactly the kind of judgment you can only make well if the model is allowed to keep learning and is instrumented well enough to explain what it learned. I stressed the guardrails: bounded update rates, a frozen reference model to fall back to, and human review on anything that changed the model's behavior sharply.
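The guardrails can be sketched in a few lines, assuming a single temperature-like signal and an exponentially weighted baseline. Every name and threshold below is hypothetical; the sketch only shows how a bounded update rate, a frozen reference, and a review trigger might fit together, not how any particular deployment did it.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveBaseline:
    reference: float            # frozen baseline from the original training data
    alpha: float = 0.05         # adaptation rate of the moving baseline
    max_step: float = 0.1       # bound on how far one reading may move the baseline
    fault_band: float = 5.0     # deviation from the adapted baseline treated as a possible fault
    review_band: float = 10.0   # divergence from the frozen reference that needs a human
    baseline: float = field(init=False)

    def __post_init__(self) -> None:
        self.baseline = self.reference

    def observe(self, reading: float) -> str:
        deviation = reading - self.baseline
        if abs(deviation) > self.fault_band:
            # Sharp departure: report it and do not learn from it.
            return "possible_fault"
        # Slow departure: adapt, but never by more than max_step per reading.
        step = max(-self.max_step, min(self.max_step, self.alpha * deviation))
        self.baseline += step
        if abs(self.baseline - self.reference) > self.review_band:
            return "needs_review"
        return "normal"

    def fall_back(self) -> None:
        """Discard what was learned and return to the frozen reference."""
        self.baseline = self.reference


# Simulated season (an assumption): readings warm slowly from 40.0 to 49.95.
pump = AdaptiveBaseline(reference=40.0)
statuses = [pump.observe(40.0 + 0.05 * t) for t in range(200)]
spike_status = pump.observe(58.0)  # sudden jump, unlike the slow warming
```

In this simulation the slow warming is absorbed as normal, while the sudden jump is reported as a possible fault and leaves the baseline untouched. A frozen model with the same fault band would have flagged the late-summer readings, which sit almost ten units above the original reference.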

What I got right, and where the bridge to today runs

I did not have all the answers in that Seattle room, and I said as much. What I had was a conviction that the hard part of machine learning in the real world was not the training run; it was everything after — keeping a model honest, current, and explainable while the world moved. The tooling of 2018 made that genuinely difficult, and I did not pretend otherwise.

Those same principles — models that live in production, that keep learning from what actually happens, and that can explain the decisions they make — are exactly what we operationalize at StudioX today as autonomous AI workers and Missions. The scene has changed enormously since 2018; the discipline underneath has not.

