Can Analytics Tame Health Information Overload?

We are living in an age of information overload. In the medical realm, thousands of new papers are published every day: PubMed/MEDLINE listed 806,326 new citations in 2015 alone, an average of roughly 2,200 a day.

That’s great news for medicine, but how can researchers and clinicians keep up? Even within a narrow field, it is not humanly possible to read, absorb and remember all of the available research.

Fortunately, humans no longer have to do it alone. Data analytics and machine learning can help doctors and researchers quickly extract meaningful information from very large corpora such as PubMed. Analytics can also help researchers make relevant connections between articles and make novel inferences that could suggest new avenues for research.

A new cognitive analytical system called Battelle Sematrix™ does exactly that.

How it Works

Sematrix vastly reduces the time it takes to locate and evaluate the most relevant and usable articles for a given field of inquiry. Unlike simple keyword search programs, Sematrix uses advanced machine learning and natural language processing to “read” technical text at a very granular level and extract information in ways that allow for in-depth analysis. It uses this “understanding” to create context and identify connections with other articles in its database.

Natural language processing allows users to ask complex, context-dependent questions about scientific or technical information. For example, users may want to know how a particular gene variation is related to outcomes for a lung cancer treatment. Instead of pulling up every article tagged with lung cancer and the gene variation, Sematrix identifies the most relevant articles and extracts information from the text to answer the question. The Centers for Medicare & Medicaid Services (CMS) is already using Sematrix to streamline healthcare quality measure development.

Cognitive processing can also be used to draw inferential conclusions by combining knowledge extracted from multiple documents. In other words, if one article says A=B, and another says B=C, then the algorithm can infer that A must equal C. For example, using a large corpus of documents describing genetic aspects of antibiotic resistance and a database of bacterial genomic sequences, the program identified specific genes associated with resistance to an antibiotic. Researchers were able to infer that if a particular gene is present in a bacterium’s genome, that bacterium is likely to be resistant to the antibiotic – even if the specific bacterium was not part of any of the studies in the document set.
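
The chaining described above can be illustrated with a minimal Python sketch. It is not Sematrix's actual method, which the article does not describe. It assumes the hard part (extracting facts from text) has already been done, and that the results sit in two small lookup tables. All gene, antibiotic and bacterium names, and the function `infer_likely_resistance`, are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical facts, as if extracted from the literature:
# gene -> antibiotics that the gene is associated with resistance to.
GENE_RESISTANCE = {
    "geneX": {"antibioticA"},
    "geneY": {"antibioticB"},
}

# Hypothetical facts, as if read from a genome database:
# bacterium -> genes found in its genome.
GENOME_GENES = {
    "bacterium1": {"geneX", "geneZ"},
    "bacterium2": {"geneZ"},
}


def infer_likely_resistance(genome_genes, gene_resistance):
    """Chain 'genome contains gene' with 'gene is associated with resistance'.

    Returns a dict mapping each bacterium to a set of
    (antibiotic, supporting_gene) pairs. A bacterium with no
    matching gene is left out of the result.
    """
    inferred = defaultdict(set)
    for bacterium, genes in genome_genes.items():
        for gene in genes:
            for antibiotic in gene_resistance.get(gene, ()):
                inferred[bacterium].add((antibiotic, gene))
    return dict(inferred)


result = infer_likely_resistance(GENOME_GENES, GENE_RESISTANCE)
for bacterium, hits in sorted(result.items()):
    for antibiotic, gene in sorted(hits):
        print(f"{bacterium}: likely resistant to {antibiotic} (carries {gene})")
```

Here bacterium1 is flagged even though no source document mentions it, because its genome contains a gene that the literature links to resistance. A real system would also need to weigh the strength of the evidence behind each extracted fact, which this sketch ignores.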

What it Means

These analytical tools could help researchers accelerate the pace of scientific discovery. They could also reduce the time that it takes for promising discoveries to translate into clinical practice. It can take 17 years or more for new findings to make their way to the doctor and patient level. Data analytics could make the process of identifying evidence-based practices faster and easier for doctors and policymakers as well as researchers.

The pace of discovery shows no signs of slowing. Fortunately, doctors and researchers don’t need to be superhuman to keep track of it all. The answers they need may already be hiding in the scientific literature – and data analytics can help find them.  

October 17, 2016
Battelle Insider