Innovative Machine-Learning Tool Enables Researchers to Identify Patients for Clinical Trials Using Clinical Notes


A natural language processing tool developed at Northwestern Medicine improves trial recruitment efforts by sorting through unstructured data to find eligible participants with rare cancers.


Using modern technology like machine learning to make better use of existing data is becoming more mainstream among healthcare providers. With virtually no limit yet in sight for the technology’s potential, a team at Northwestern Medicine decided to explore how machine learning could be applied to an unstructured and untapped area of patient data: clinician notes. The question they sought to answer was as much about investigating the capabilities of machine learning as it was about overcoming challenges to advancing patient care.

According to the National Institutes of Health, 55% of clinical trials worldwide are terminated because of low accrual rates.

Finding the Right Patients

Clinical trials are an integral part of improving disease research and patient outcomes across the world. But recruiting eligible participants is a difficult and time-consuming process that places a heavy burden on healthcare organizations. Without the right number of participants, trials fail, which limits the opportunity to explore new treatments. In the spring of 2021, the Northwestern Medicine team set out to study this problem.
For this project, the team collaborated with the principal investigators leading clinical trials for a variety of less common cancers: stomach, lip, oral cavity, pharynx, trachea and cervical. Their goal was to determine whether Structured Query Language (SQL) queries, used to narrow down relevant populations with criteria like patient demographics and documented diagnoses, could be enhanced by natural language processing, used to sift through patient notes for harder-to-find details, to home in on the individuals eligible for the trials.
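The two-stage approach described above can be illustrated with a minimal sketch. It is not the Northwestern Medicine implementation: the table names, columns, sample records and the "HER2-positive" criterion are all hypothetical, and a simple regular expression stands in for the natural language processing step.

```python
import re
import sqlite3

# Hypothetical schema and toy data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER, diagnosis TEXT);
    CREATE TABLE notes (patient_id INTEGER, note_text TEXT);
""")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", [
    (1, 64, "stomach cancer"),
    (2, 58, "stomach cancer"),
    (3, 47, "hypertension"),
])
conn.executemany("INSERT INTO notes VALUES (?, ?)", [
    (1, "Biopsy confirms HER2-positive gastric adenocarcinoma."),
    (2, "Patient reports nausea; HER2 status not yet tested."),
    (3, "Blood pressure well controlled on current regimen."),
])

# Stage 1: a structured query narrows the population by age and diagnosis.
rows = conn.execute(
    """
    SELECT p.patient_id, n.note_text
    FROM patients AS p
    JOIN notes AS n ON n.patient_id = p.patient_id
    WHERE p.diagnosis = ? AND p.age >= ?
    ORDER BY p.patient_id
    """,
    ("stomach cancer", 18),
).fetchall()

# Stage 2: search the free-text notes for a detail with no structured field.
# A regex is a deliberately simple stand-in for a real NLP model.
pattern = re.compile(r"\bHER2[- ]positive\b", re.IGNORECASE)
candidates = [patient_id for patient_id, text in rows if pattern.search(text)]

print(candidates)
```

In this toy data set, the structured query keeps the two patients with a documented stomach cancer diagnosis, and the note search then keeps only the one whose note mentions the hypothetical criterion.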
Over the course of a few months, the team built an algorithm that searches for certain elements within clinician notes, and it successfully identified those patients. It uses topic modeling, a technique that pulls out recurring themes from text, to find a specific subpopulation of patients in a fraction of the time a manual search would take, reducing the resources needed for recruitment. The algorithm required continuous validation from the principal investigators, who confirmed or rejected patients as the tool pulled out potential candidates.
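The investigator review step can be pictured as a simple log of confirm and reject decisions. The sketch below is an assumption about how such feedback might be recorded, not a description of the team's tool: the `ReviewLog` class, the patient IDs and the decisions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLog:
    """Hypothetical record of investigator decisions on suggested patients."""
    confirmed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def record(self, patient_id, is_eligible):
        # File the candidate under the investigator's decision.
        target = self.confirmed if is_eligible else self.rejected
        target.append(patient_id)

    def precision(self):
        # Share of reviewed candidates the investigator confirmed.
        reviewed = len(self.confirmed) + len(self.rejected)
        return len(self.confirmed) / reviewed if reviewed else None


# Toy example: four candidates suggested by the tool, three confirmed.
decisions = {101: True, 102: False, 103: True, 104: True}

log = ReviewLog()
for patient_id, is_eligible in decisions.items():
    log.record(patient_id, is_eligible)

print(log.confirmed, log.rejected, log.precision())
```

Tracking the share of confirmed candidates over time is one plausible way to judge whether the search criteria need adjusting between review rounds.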

Customizing a Solution

Solutions to problems like recruiting patients for clinical trials are rarely one-size-fits-all. While commercial vendors offer similar artificial intelligence tools, the Northwestern Medicine algorithm is custom-tailored to fit the format of its data, making it that much more powerful and accurate. Building the technology in-house also allows researchers and clinicians to adapt it for additional contexts.
Prior to this new algorithm, trial recruitment efforts relied on approaches that often missed less prominently documented information, such as details found in patient notes. The ability to quickly find eligible trial participants who would otherwise go unnoticed is a significant step in advancing patient care, especially for people with rare diseases, which typically have fewer treatment options.