Listen Up

Monday, March 19, 2018

Big Data and Machine Learning in Health Care | Clinical Decision Support | JAMA | JAMA Network


Nearly all aspects of modern life are in some way being changed by big data and machine learning. Netflix knows what movies people like to watch and Google knows what people want to know based on their search histories. Indeed, Google has recently begun to replace much of its existing non–machine learning technology with machine learning algorithms, and there is great optimism that these techniques can provide similar improvements across many sectors.
It is no surprise, then, that medicine is awash with claims of revolution from the application of machine learning to big healthcare data. Recent examples have demonstrated that big data and machine learning can create algorithms that perform on par with human physicians.1 Though machine learning and big data may seem mysterious at first, they are in fact deeply related to traditional statistical models that are recognizable to most clinicians. It is our hope that elucidating these connections will demystify these techniques and provide a set of reasonable expectations for the role of machine learning and big data in healthcare.

Machine learning was originally described as a program that learns to perform a task or make a decision automatically from data, rather than having the behavior explicitly programmed. However, this definition is very broad and could cover nearly any form of data-driven approach. For instance, consider the Framingham cardiovascular risk score, which assigns points to various factors and produces a number that predicts 10-year cardiovascular risk. Should this be considered an example of machine learning? The answer might obviously seem to be no. Closer inspection of the Framingham risk score reveals that the answer might not be as obvious as it first seems. The score was originally created2 by fitting a proportional hazards model to data from more than 5300 patients, and so the “rule” was in fact learned entirely from data. Designating a risk score as a machine learning algorithm might seem a strange notion, but this example reveals the uncertain nature of the original definition of machine learning.

It is perhaps more useful to imagine an algorithm as existing along a continuum between fully human-guided vs fully machine-guided data analysis. To understand the degree to which a predictive or diagnostic algorithm can be said to be an instance of machine learning requires understanding how much of its structure or parameters were predetermined by humans. The trade-off between the human specification of a predictive algorithm’s properties vs learning those properties from data is what is known as the machine learning spectrum. Returning to the Framingham study, to create the original risk score statisticians and clinical experts worked together to make many important decisions, such as which variables to include in the model, the relationship between the dependent and independent variables, and variable transformations and interactions. Since considerable human effort was used to define these properties, the risk score would place low on the machine learning spectrum (#19 in the Figure and Supplement). Many evidence-based clinical practices are based on a statistical model of this sort, and so many clinical decisions in fact exist on the machine learning spectrum (middle left of Figure). On the extreme low end of the machine learning spectrum would be heuristics and rules of thumb that do not directly involve the use of any rules or models explicitly derived from data (bottom left of Figure).
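
The contrast between a rule of thumb and a risk score learned from data can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the patients are synthetic, the variables (age and smoking status) and point values are hypothetical, and a logistic regression stands in for the proportional hazards model used for the actual Framingham score. It is not the Framingham score, only a toy version of the same idea.

```python
import math
import random


def heuristic_points(age, smoker):
    """Rule of thumb: a human picked these points; no data was involved."""
    points = 0
    if age >= 60:
        points += 2
    if smoker:
        points += 1
    return points


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_patients(n, seed=0):
    """Generate made-up patients; the generating process is invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(30, 80)
        smoker = rng.random() < 0.3
        true_logit = -6.0 + 0.08 * age + 1.0 * smoker
        event = rng.random() < sigmoid(true_logit)
        rows.append((age, smoker, event))
    return rows


def fit_logistic(rows, steps=500, lr=1.0):
    """Learn the weights from data by plain gradient descent.

    Humans still chose the variables and the model form, so this sits
    low on the spectrum: only the weights are learned.
    Returns [intercept, weight for standardized age, weight for smoker].
    """
    n = len(rows)
    mean_age = sum(r[0] for r in rows) / n
    sd_age = math.sqrt(sum((r[0] - mean_age) ** 2 for r in rows) / n)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for age, smoker, event in rows:
            x = (1.0, (age - mean_age) / sd_age, 1.0 if smoker else 0.0)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            err = p - (1.0 if event else 0.0)
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    patients = make_synthetic_patients(800)
    print("hand-chosen points for a 65-year-old smoker:", heuristic_points(65, True))
    print("weights learned from data:", fit_logistic(patients))
```

The first function never sees any data, while the second derives its "points" from the data it is given. Both produce a simple additive rule, which is why the line between a traditional risk score and machine learning is blurrier than it first appears.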

There is no doubt that machine learning and artificial intelligence will gradually intrude upon our routines almost unnoticed, just as chatbots already have.




