What is Machine Learning?
Updated: Apr 30, 2020
Here at Bio Conscious, machine learning is at the core of what we do. Thus far, we’ve primarily used machine learning techniques to predict blood sugar in people with insulin-dependent type 1 diabetes. If you’re a Diabits user who wants to learn more about the magic that powers your app, or a curious passerby looking into who we are, you may be asking yourself: what is machine learning?
Here is our short definition.
Broadly speaking, machine learning serves as an umbrella term for an assortment of automated techniques and tools, rooted in computer science, that find patterns (perhaps implicitly) in large amounts of data, often with the goal of making predictions.
There are two methods we’ll dive into: supervised and unsupervised learning.
Diabits uses a supervised learning approach to predict blood glucose. It is the most common and intuitive form of machine learning. Supervised learning starts with known outcomes, y, and features, X. Say, for example, we’re training an algorithm that classifies images of dogs and cats (a simple prediction problem available on the data science platform Kaggle). We feed the algorithm images of dogs and cats, telling the algorithm whether each image is of a dog or a cat. The algorithm then learns which features are associated with each species: for example, floppy ears might be associated with dogs, while bright orange fur might be associated with cats. When we feed the algorithm a new image without telling it whether the image is of a dog or a cat, a well-generalized model (more on this later!) will correctly determine the species.
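To make the dogs-and-cats example concrete, here is a minimal sketch of supervised learning in plain Python. It assumes each image has already been reduced to two hypothetical numeric features, ear floppiness and fur orangeness (both made up for illustration), and it uses a nearest-centroid classifier, one of the simplest supervised techniques, rather than anything Diabits actually runs. Training averages the labelled examples of each species; prediction assigns a new, unlabelled example to the species whose average it is closest to.

```python
# Toy supervised learning: a nearest-centroid classifier.
# Each example is a HYPOTHETICAL feature vector (ear_floppiness, fur_orangeness),
# both on a 0-1 scale; the features and values are made up for illustration.
from statistics import mean


def train(X, y):
    """Learn one centroid (the average feature vector) per known label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Labelled training data: features X together with known outcomes y.
X_train = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.0), (0.1, 0.9), (0.2, 0.8), (0.0, 0.7)]
y_train = ["dog", "dog", "dog", "cat", "cat", "cat"]

model = train(X_train, y_train)
print(predict(model, (0.85, 0.15)))  # floppy ears, little orange fur -> dog
print(predict(model, (0.05, 0.95)))  # upright ears, bright orange fur -> cat
```

The new examples were never labelled; the model infers their species purely from what it learned on the labelled training set.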
In unsupervised machine learning, features X are fed into a model without telling it the outcomes y. In our cats and dogs example, this would mean feeding an algorithm images without telling it whether the images are of cats or dogs. The algorithm will automatically detect differences between the images and group them along those lines. In this example, a well-generalized model will sort the images into two groups, one of which happens to contain the dogs and the other the cats. Unlike our supervised learning model, we haven’t told the unsupervised model which species each image corresponds to.
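The unsupervised version of the same toy problem can be sketched with k-means clustering, a standard clustering algorithm chosen here purely for illustration. It uses the same kind of hypothetical two-feature vectors (ear floppiness, fur orangeness), but no labels are passed in. The algorithm alternates between assigning each point to its nearest cluster centre and moving each centre to the mean of the points assigned to it.

```python
# Toy unsupervised learning: k-means clustering with k = 2.
# HYPOTHETICAL features (ear_floppiness, fur_orangeness); note that no labels are given.


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(points):
    """Mean feature vector of a group of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(X, k=2, iterations=10):
    """Return a cluster index (0..k-1) for every point in X."""
    # Deterministic initialisation: start from the first point, then repeatedly
    # add the point farthest from all centres chosen so far.
    centres = [X[0]]
    while len(centres) < k:
        centres.append(max(X, key=lambda x: min(sq_dist(x, c) for c in centres)))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        assignments = [min(range(k), key=lambda j: sq_dist(x, centres[j])) for x in X]
        # Update step: move each centre to the mean of its assigned points.
        for j in range(k):
            members = [x for x, a in zip(X, assignments) if a == j]
            if members:
                centres[j] = centroid(members)
    return assignments


# Unlabelled images, already reduced to made-up feature vectors.
X = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8), (0.7, 0.0), (0.0, 0.7)]
print(kmeans(X))  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers 0 and 1 carry no meaning on their own: the algorithm has found two groups, and it is up to a person to notice that one contains the dogs and the other the cats.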
Most often, machine learning is deployed to predict some future outcome. This is certainly true for us at Bio Conscious. As such, a ‘good’ model is one that generalizes well, meaning that when you introduce it to new data, it accurately predicts the outcome. This is different from other data-driven fields, such as statistics, where researchers are often less interested in predicting an unknown outcome than in finding true causal relationships in data. The pursuit of causal inference forces statisticians to make assumptions about the underlying structure of their data, which, if not met, can lead to wildly inaccurate causal conclusions. In machine learning, due to the nature of the questions we ask, most of these assumptions are not required.
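In practice, generalization is usually estimated by holding some data back: the model is trained on one portion and scored only on examples it never saw during training. The sketch below assumes made-up two-feature dog/cat data and uses a simple nearest-centroid classifier; the 25% test fraction and the random seed are arbitrary choices for illustration, not settings taken from Diabits.

```python
# Estimating generalization: train on one part of the data, score on a held-out part.
# The feature vectors (ear_floppiness, fur_orangeness) and labels are made up.
import random


def train(X, y):
    """Nearest-centroid training: average the feature vectors of each label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(centroids, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return correct / len(y)


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly, then hold out a fraction of them for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return pick(train_idx, X), pick(train_idx, y), pick(test_idx, X), pick(test_idx, y)


X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.0), (0.85, 0.15),
     (0.1, 0.9), (0.2, 0.8), (0.0, 0.7), (0.15, 0.85)]
y = ["dog"] * 4 + ["cat"] * 4

X_tr, y_tr, X_te, y_te = train_test_split(X, y)
model = train(X_tr, y_tr)
# Only the held-out examples are scored, so this estimates performance on new data.
print(f"held-out accuracy: {accuracy(model, X_te, y_te):.2f}")  # held-out accuracy: 1.00
```

Scoring the model on its own training data would tend to overstate how well it generalizes, which is why the held-out examples are kept out of training entirely.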
A representation of a neural network classifier. Graphic taken from an MIT-IBM Watson article.
Machine Learning in Healthcare
At Bio Conscious, we’re using these tools to delay the onset of disease. We’ve built algorithms that achieve unparalleled accuracy in predicting blood sugar, and we’re expanding this knowledge to other areas. Using machine learning in healthcare also comes with a unique set of challenges that we must navigate.
While this is in no way unique to healthcare, a machine learning model is only as good as the data you put into it. Due to privacy concerns, it is difficult to collect complete data on free-living patients. Machine learning models generally do not perform well when they are fed incomplete or inaccurate data. Even when complete and accurate data is available, models will still perform poorly if there isn’t enough of it to train them adequately. This is why Diabits starts out with 15-minute predictions and can take up to 7 days to show you full 60-minute predictions!
Another issue is the nature of the machine learning models themselves. Some of the most powerful predictive algorithms, like neural networks, are ‘black-box’ models, meaning that it can be very difficult to figure out exactly how the model assesses the data you feed it to create its predictions. In healthcare, models often need to be transparent: if a model assesses an individual’s risk of cancer, for example, scientists, doctors and researchers must clearly understand how its conclusions are reached. For companies driving innovation in the healthcare space, this raises the additional challenge of protecting intellectual property. Companies and researchers must publish reproducible results using sufficiently transparent algorithms, all while protecting their IP.
From predicting coronavirus outbreaks to delaying the onset of diabetes complications, companies around the globe are leveraging machine learning to make huge strides in improving people’s health.