Switching contexts: Will Machine Learning work on my data?

There has been a lot about diagrams recently. This post is about the machine learning itself.

I don’t know about you, but one of the big problems I have is taking a guess at whether the latest ML approach is likely to work on the data I’ve got. Wouldn’t it be cool if we could test that, without having to implement the whole system?

A very brief summary of the paper

Take a CNN cat-photo classifier in Computer Vision. It works for photos from Wikipedia, but will this approach work for my personal cat photos? We might expect the underlying features to be similar (e.g. fundamental features of cats, such as the outline of their faces), but some important aspects might differ (e.g. something about my own camera or photographing style might be different from what is found on Wikipedia). This makes it hard to know if it will work. “Luckily”, in real life, we’ve given corporations so much access to our personal data that the classifier is already trained on real-life photos :/.

The paper explores how the complexity of the data impacts the effectiveness of the approach, for the same task. What that means is that we can just look at the data itself (rather than train and run the algorithms) in order to take a guess at how good a particular approach might be. In business, this can save huge amounts of time and energy. The paper doesn’t get all the way to a full testing rig for this, instead laying theoretical groundwork and conducting a series of trials.
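To give a feel for what "looking at the data itself" could mean, here is a minimal sketch of one crude complexity proxy: how well the raw bytes compress. This is my own stand-in and not the measure used in the paper; the function name compression_ratio and the two toy datasets are made up for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size; lower means more regular data."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


# Two toy "datasets" of equal size (both invented for this example).
rng = random.Random(0)
regular = bytes(i % 16 for i in range(4096))            # short repeating pattern
noisy = bytes(rng.randrange(256) for _ in range(4096))  # pseudo-random bytes

regular_score = compression_ratio(regular)
noisy_score = compression_ratio(noisy)
print(f"regular: {regular_score:.3f}, noisy: {noisy_score:.3f}")
```

The repeating pattern compresses far better than the pseudo-random bytes, so it scores as "simpler". Whether a score like this says anything about how well a given ML approach will do is exactly the kind of question the paper is investigating, so treat it as an illustration of the idea and not as a test you can rely on.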

Exec summary

We might be able to guess the effectiveness of ML approaches based on the data alone.

Get in touch with fuza.co.uk if you’re interested in exploring this space further.
