One model to predict them all (failures, that is)

Having worked on various predictive maintenance scenarios over the course of three years, I have noticed two common pitfalls. First, a lack of business understanding, in particular of the cost of being right versus the cost of being wrong (I will write about that later). Second, the idea of a single model that can predict it all, under all conditions, no matter what. The latter I often see when working with novice data scientists. They spend most of their time re-training a model to reach the perfect F1 or accuracy score instead of knowing when to stop and rethink the strategy. This problem becomes even more severe when you deal with highly imbalanced datasets.

Why does this matter?

Each device and each sensor has its own unique data footprint with its own noise distribution. If your pool of devices is sufficiently large, you will be able to detect generic trends and make generic predictions. However, you may have lost the subtle differences between machines, and with them predictive power. Had you done some clustering analysis beforehand, you might have decided that three or five models would serve the problem much better. Or perhaps simply knowing whether a data point is an anomaly is already enough to drive business value.
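To make that clustering idea concrete, here is a minimal sketch. The data, the feature names, and the candidate range of cluster counts are all my own synthetic assumptions, not something from a real fleet; in practice you would compute per-device summary statistics from actual sensor streams.

```python
# Sketch: cluster devices by their sensor "footprint" to suggest how many
# models the fleet needs. All data below is synthetic (three made-up
# device families with different operating profiles).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)

# Columns: [mean vibration, vibration variance, mean temperature]
families = [
    rng.normal(loc=[0.2, 0.05, 40.0], scale=0.02, size=(30, 3)),
    rng.normal(loc=[0.8, 0.30, 55.0], scale=0.02, size=(30, 3)),
    rng.normal(loc=[0.5, 0.10, 70.0], scale=0.02, size=(30, 3)),
]
X = np.vstack(families)

# Try a few candidate model counts; keep the best silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"suggested number of models: {best_k} (silhouette={best_score:.2f})")
```

With real fleet data you would normally standardize the features first, since a raw temperature column can dominate the distance metric; the point here is only that a cheap clustering pass can tell you whether one global model is the right granularity at all.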

What could be a solution?

As always, there is no single best solution. There will always be a trade-off in getting a model sufficiently generalized to bring it to production. What I have observed so far at several customers is that a combination of weak learners outperforms most of these highly specialized models, even more so once the models reach field trials or production. Odds are your training data was never complete, so you would rather have the flexibility to deal with that incompleteness through multiple weak models.
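A minimal sketch of that trade-off, on synthetic data of my own choosing (the dataset, class imbalance, and model settings are illustrative assumptions, not the setup from any customer project): a single deep decision tree stands in for the "highly specialized model", and a forest of shallow trees stands in for the combination of weak learners.

```python
# Sketch: single specialized model vs. an ensemble of weak learners on a
# noisy, imbalanced problem standing in for failure prediction.
# All data is synthetic; settings are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced (90/10) binary problem with 5% label noise, mimicking
# incomplete/noisy maintenance training data.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.9, 0.1], flip_y=0.05, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# One highly specialized model: a fully grown tree, prone to fitting noise.
deep = DecisionTreeClassifier(max_depth=None, random_state=0)
deep.fit(X_train, y_train)
deep_acc = deep.score(X_test, y_test)

# Many weak learners: 50 shallow trees, combined by averaging.
forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
forest.fit(X_train, y_train)
ens_acc = forest.score(X_test, y_test)

print(f"single deep tree accuracy:      {deep_acc:.3f}")
print(f"weak-learner ensemble accuracy: {ens_acc:.3f}")
```

On imbalanced data like this, raw accuracy flatters both models; in a real project you would also look at precision and recall on the failure class, which ties back to the cost of being right versus the cost of being wrong.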
