JMIR Publications Blog

Fixing the Foundation: High-Fidelity Lab Data for Oncology Research

Reviewed by Kayleigh-Ann Clegg, PhD | May 7, 2026 1:48:36 PM

Real-world data (RWD) are increasingly a key engine behind modern drug discovery and clinical decision-making. However, anyone who has worked with electronic health records (EHRs) knows that this engine often runs on fragmented and inconsistent data. Laboratory results, which are the backbone of oncology care, are frequently plagued by inaccurate codes and missing units.


Key Takeaways
The Problem: Laboratory data in real-world oncology datasets are fragmented and often unusable for aggregation: LOINC codes can be inaccurate, units of measure can be missing, and up to 19% of laboratory tests cannot be accurately mapped.
The Solution: A precise, automated three-step framework (Code Correction, Unit Normalization, and Integrity Validation) was developed to fix these errors at the source.
The Impact: When tested on 6.34 billion lab records, the framework dramatically improved data accuracy, boosting correct unit assignment from 73.1% to 99.7%.


In a new study published in JMIR Medical Informatics, researchers Parvati Naliyatthaliyazchayil and Travis Stenerson from ConcertAI introduce a scalable, system-agnostic framework designed to fix these errors at the source, ensuring that lab data are both accurate and ready for large-scale analysis.

The Interoperability Gap

While LOINC (Logical Observation Identifiers Names and Codes) is the global standard for lab tests, mapping local data to these codes is notoriously difficult. Studies show that up to 19% of laboratory tests cannot be accurately mapped due to incomplete information. Even when a code is correct, the unit of measure (e.g., mg/dL vs. mmol/L) might be missing or recorded incorrectly, rendering the data useless for aggregation.
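To see why a missing or wrong unit blocks aggregation, consider glucose, which may be reported in mg/dL or mmol/L. The following is a minimal Python sketch, not taken from the study: the function name and the sample readings are hypothetical, and the only outside fact it relies on is the molar mass of glucose (about 180.16 g/mol).

```python
# Illustrative sketch only; names and sample values are hypothetical.
GLUCOSE_G_PER_MOL = 180.16  # approximate molar mass of glucose


def glucose_to_mmol_l(value, unit):
    """Return a glucose value in mmol/L, or None if the unit is unusable."""
    if unit == "mmol/L":
        return value
    if unit == "mg/dL":
        # mg/dL -> mg/L (multiply by 10), then mg/L -> mmol/L (divide by g/mol)
        return value * 10 / GLUCOSE_G_PER_MOL
    return None  # missing or unrecognized unit: the number cannot be interpreted


# Three made-up readings: two with units, one without.
readings = [(90.0, "mg/dL"), (5.0, "mmol/L"), (5.2, None)]

harmonized = [glucose_to_mmol_l(value, unit) for value, unit in readings]
usable = [x for x in harmonized if x is not None]
```

Here 90 mg/dL and 5.0 mmol/L turn out to be nearly the same concentration, while the unitless 5.2 has to be dropped because nothing says which scale it is on.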

A Three-Step Solution

The framework proposed by Naliyatthaliyazchayil and Stenerson addresses these challenges through a precise, automated process driven by three knowledge tables:

  1. Code Correction: The system uses the unit of measure to sanity-check the LOINC code. If the units do not match the test, the framework assigns a more compatible LOINC code.
  2. Unit Normalization: Once the code is finalized, the system populates missing units or corrects erroneous ones, such as fixing g/dL to mg/dL.
  3. Integrity Validation: Crucially, every change is validated against a Laboratory Reasonable Range (LRR) table. If a transformed result falls outside of what is biologically plausible, the change is rejected, maintaining a high standard of data integrity.
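The three steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation: the three dictionaries stand in for the knowledge tables, and their structure, the function names, and the range values are all hypothetical. The two LOINC codes are real glucose codes (2345-7 for mass/volume, 14749-6 for moles/volume), but the ranges are placeholders, not clinical reference values.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LabRecord:
    loinc: str
    value: float
    unit: Optional[str]


# Hypothetical stand-ins for the three knowledge tables (contents are assumptions).
CODE_BY_UNIT = {  # (recorded code, observed unit) -> more compatible code
    ("2345-7", "mmol/L"): "14749-6",
    ("14749-6", "mg/dL"): "2345-7",
}
EXPECTED_UNIT = {"2345-7": "mg/dL", "14749-6": "mmol/L"}
REASONABLE_RANGE = {  # placeholder bounds, NOT clinical reference values
    ("2345-7", "mg/dL"): (10.0, 2000.0),
    ("14749-6", "mmol/L"): (0.5, 110.0),
}


def correct_code(rec: LabRecord) -> LabRecord:
    """Step 1: if the unit conflicts with the code, pick a compatible code."""
    if rec.unit is None or rec.unit == EXPECTED_UNIT.get(rec.loinc):
        return rec
    new_code = CODE_BY_UNIT.get((rec.loinc, rec.unit))
    return replace(rec, loinc=new_code) if new_code else rec


def normalize_unit(rec: LabRecord) -> LabRecord:
    """Step 2: fill in a missing unit from the finalized code."""
    if rec.unit is None and rec.loinc in EXPECTED_UNIT:
        return replace(rec, unit=EXPECTED_UNIT[rec.loinc])
    return rec


def within_reasonable_range(rec: LabRecord) -> bool:
    """Step 3 helper: check the value against the plausible range."""
    bounds = REASONABLE_RANGE.get((rec.loinc, rec.unit))
    return bounds is not None and bounds[0] <= rec.value <= bounds[1]


def harmonize(rec: LabRecord) -> LabRecord:
    """Apply steps 1 and 2, then keep the change only if step 3 passes."""
    candidate = normalize_unit(correct_code(rec))
    if candidate == rec:
        return rec  # nothing was changed, so there is nothing to validate
    return candidate if within_reasonable_range(candidate) else rec


fixed_code = harmonize(LabRecord("2345-7", 5.4, "mmol/L"))
filled_unit = harmonize(LabRecord("2345-7", 95.0, None))
rejected = harmonize(LabRecord("14749-6", 950.0, None))
```

In this toy run, the first record is moved to the moles/volume code, the second gains a mg/dL unit, and the third is left untouched because 950 would be implausible in mmol/L under the placeholder range. Returning the original record on a failed check mirrors the idea that a rejected change should leave the data as it was.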

Strong Performance on 6 Billion Records

To test the framework's scalability, the authors applied it to the ConcertAI database, encompassing approximately 10 million oncology patients and 6.34 billion lab records. The results were dramatic:

  • Accuracy Boost: Correct unit assignment in the full dataset jumped from 73.1% to 99.7%.
  • Completeness: Unit completeness (the presence of any unit) rose from 92.7% to 99.8%.
  • Source Agnostic: Similar improvements were seen across three different EHR vendor datasets, demonstrating that the system works regardless of where the data originates.
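For readers who want to see how the two headline metrics differ, here is a small Python sketch on made-up data. Completeness follows the definition given above (the presence of any unit); treating correctness as "the recorded unit equals the expected unit" is an assumption for illustration, and the function names and toy values are hypothetical rather than drawn from the study.

```python
def unit_completeness(units):
    """Share of records that carry any unit at all."""
    return sum(u is not None for u in units) / len(units)


def unit_correctness(units, expected):
    """Share of records whose unit equals the expected unit (assumed definition)."""
    return sum(u == e for u, e in zip(units, expected)) / len(units)


# Toy data: four records that should all be in mg/dL.
observed = ["mg/dL", None, "g/dL", "mg/dL"]
expected = ["mg/dL", "mg/dL", "mg/dL", "mg/dL"]

completeness = unit_completeness(observed)        # 3 of 4 have a unit
correctness = unit_correctness(observed, expected)  # 2 of 4 have the right unit
```

A record with the wrong unit counts toward completeness but not correctness, which is why the two figures can diverge.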

Why This Matters for Oncology

In cancer care, lab values help treating teams select therapies, monitor disease progression, and determine clinical trial eligibility, so the quality of laboratory data is crucial. By ensuring semantic consistency, this framework removes the manual burden of data cleaning and allows researchers to focus on generating potentially life-saving insights.

In this video, Parvati Naliyatthaliyazchayil from ConcertAI presents a groundbreaking framework for standardizing and correcting laboratory data in real-world oncology datasets.

Why JMIR?

The authors chose JMIR Medical Informatics to share these findings due to the journal's focus on the intersection of digital health and professional practice. As the health care industry looks to build a more data-informed system, this study helps provide the evidence base needed to shape the next generation of health care analytics.

Curious to see how data standardization is reshaping the future of oncology research? Watch the video featuring the study's insights and read the full research paper to explore the framework and the strategic roadmap for high-quality real-world evidence.


Naliyatthaliyazchayil P, Stenerson T
Harmonizing Logical Observation Identifiers Names and Codes (LOINC) Codes and Units in Real-World Oncology Data: Method Development and Evaluation
JMIR Med Inform 2026;14:e81254
URL: https://medinform.jmir.org/2026/1/e81254
DOI: 10.2196/81254