
Study is First to Validate Application of Hybrid OCR/NLP Technology For Large-Scale Data Extraction from Scanned Colonoscopy and Pathology Reports

Rochester, N.Y., June 3, 2021 – A study published in the May issue of Gastrointestinal Endoscopy demonstrates and validates for the first time that optical character recognition (OCR) combined with natural language processing (NLP) technology analyzes scanned procedure and pathology reports accurately and efficiently – eliminating the time and cost of manual data extraction by delivering electronically processable clinical information.

In the retrospective study, conducted at Cleveland Clinic, Cleveland, Ohio, and the University of Minnesota, Minneapolis, researchers randomly sampled outpatient screening colonoscopy procedure reports and pathology reports. Two researchers first manually reviewed the reports for the desired variables. The OCR/NLP algorithm was then used to obtain the same variables from 3 different electronic health records: Epic, ProVation, and Sunquest PowerPath.

Among the key results of the study: The OCR/NLP technology extracted desired variables from reports contained in an image format with an accuracy of >95%. Compared with manual data extraction, the accuracy of the hybrid approach to detect polyps was 95.8%, adenomas 98.5%, sessile serrated polyps 99.3%, advanced adenomas 98%, inadequate bowel preparation 98.4%, and failed cecal intubation 99%. A comparison of the dataset collected via NLP alone versus that collected using the hybrid OCR/NLP approach showed the accuracy for almost all variables was >99%.
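As an illustration of how accuracy figures of this kind are typically derived, the following is a minimal sketch that treats manual review as the reference and reports, for each variable, the share of reports on which the automated extraction agrees. The function name, the variable names, and the four sample records are hypothetical and are not taken from the study; the study's own evaluation method may differ.

```python
from typing import Dict, List

Report = Dict[str, bool]  # one report: variable name -> finding present?


def per_variable_accuracy(manual: List[Report], extracted: List[Report]) -> Dict[str, float]:
    """Fraction of reports where the extracted value matches manual review.

    Hypothetical helper for illustration; manual review is the reference.
    """
    if len(manual) != len(extracted):
        raise ValueError("both lists must cover the same reports")
    if not manual:
        return {}
    accuracy = {}
    for variable in manual[0]:
        matches = sum(
            1 for m, e in zip(manual, extracted) if m[variable] == e[variable]
        )
        accuracy[variable] = matches / len(manual)
    return accuracy


# Made-up example data: four reports, two variables.
manual_review = [
    {"polyp": True, "adenoma": True},
    {"polyp": False, "adenoma": False},
    {"polyp": True, "adenoma": False},
    {"polyp": True, "adenoma": True},
]
ocr_nlp_output = [
    {"polyp": True, "adenoma": True},
    {"polyp": False, "adenoma": False},
    {"polyp": False, "adenoma": False},  # disagrees with manual review on "polyp"
    {"polyp": True, "adenoma": True},
]

print(per_variable_accuracy(manual_review, ocr_nlp_output))
```

In this made-up sample the two methods agree on three of four reports for "polyp" and on all four for "adenoma".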

“The results of this proof-of-concept study create a new frontier in the use of large-scale data extraction from scanned reports, which was previously limited by lack of appropriate technology,” said Maged Rizk, MD, a gastroenterologist and associate director for the Cleveland Clinic Medicare Accountable Care Organization.

Dr. Rizk, lead author of the research, explained that while data shows colonoscopy screening has led to lower colorectal cancer incidence and mortality, increasing evidence suggests that examination quality may impact its effectiveness. The information needed to assess exam quality is often embedded in non-standardized procedure reports of varying formats within EHRs, requiring time-consuming and costly data extraction for accurate reporting. As a result, this information is not readily available for streamlining quality management, participating in endoscopy registries, or reporting of patient- and center-specific risk factors predictive of outcomes.

“A process which was previously expensive and time-consuming can now potentially be done accurately in a time- and labor-efficient manner,” explained Dr. Rizk, who was among the 11-member team of physicians and researchers that co-authored the study.

The team also included the late Colin Rhodes, former Chief Technology Officer of eHealth Technologies, who colleagues say was “vital” in the development of the OCR/NLP hybrid technology. Mr. Rhodes passed away prior to the study’s publication.

“The contributions of Mr. Rhodes to this collaboration support our company’s mission to provide seamless access to health care information. eHealth Technologies is continuously seeking ways to streamline the critical data that physicians need to deliver lifesaving care for their patients,” said Jeff Markin, CEO, eHealth Technologies.

Future multicenter studies elaborating the use of OCR in combination with validated commercially available NLP tools will help substantiate the use of this novel technology on a larger scale – not only for measurement of procedure quality indicators but potentially for multiple other venues in health care as well.

About eHealth Technologies™

eHealth Technologies is the leading provider of medical record retrieval and organization services and image-enabled Health Information Exchanges (HIEs). With customers across the country, eHealth Technologies works with prominent HIEs and top-ranked hospitals, including 17 of the 20 U.S. News & World Report Honor Roll Hospitals for 2020-2021. The company’s eHealth Connect® solutions enhance patient and physician satisfaction by streamlining care transitions and assuring physicians have the right information to care for their patients. eHealth Connect® Image Exchange gives HIE subscribers access to full diagnostic quality medical records in the context of the patient record. Follow us on Twitter, Facebook and LinkedIn.


Media Contact: Kathleen Dutton-Fanning, 585-242-1000 (ext. 565)
