Real-world data (RWD) is increasingly a key engine behind modern drug discovery and clinical decision-making. However, anyone who has worked with electronic health records (EHRs) knows that this engine often runs on data that are fragmented and inconsistent. Laboratory results, the backbone of oncology care, are frequently plagued by inaccurate codes and missing units.
| Key Takeaways |
|---|
| The Problem: Lab data used in oncology real-world data (RWD) research are highly fragmented and often unusable (up to 19% of lab tests cannot be mapped) because of inaccurate LOINC codes and missing units of measure. |
| The Solution: A precise, automated three-step framework (Code Correction, Unit Normalization, and Integrity Validation) was developed to fix these errors at the source. |
| The Impact: When tested on 6.34 billion lab records, the framework substantially improved data accuracy, raising the rate of correct unit assignment from 73.1% to 99.7%. |
In a new study published in JMIR Medical Informatics, researchers Parvati Naliyatthaliyazchayil and Travis Stenerson of ConcertAI introduce a scalable, system-agnostic framework designed to fix these errors at the source, ensuring that lab data are both accurate and ready for large-scale analysis.
While LOINC (Logical Observation Identifiers Names and Codes) is the global standard for lab tests, mapping local data to these codes is notoriously difficult. Studies show that up to 19% of laboratory tests cannot be accurately mapped due to incomplete information. Even when a code is correct, the unit of measure (e.g., mg/dL vs. mmol/L) might be missing or recorded incorrectly, rendering the data useless for aggregation.
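As a rough illustration of what unit normalization involves, the sketch below (an assumption for illustration, not the framework described in the paper) first maps free-text unit spellings to a canonical form and then rescales a value only when the source and target units measure the same kind of quantity. The alias table, the conversion factors, and the function names `normalize_unit` and `to_canonical` are all hypothetical.

```python
from typing import Optional

# Hypothetical alias table: lowercased, whitespace-collapsed spellings -> canonical unit.
UNIT_ALIASES = {
    "mg/dl": "mg/dL",
    "mg per dl": "mg/dL",
    "mg/l": "mg/L",
    "g/dl": "g/dL",
    "g/l": "g/L",
    "mmol/l": "mmol/L",
}

# Hypothetical conversion factors, only between units of the same property
# (mass concentration here): value_in_target = value_in_source * factor.
CONVERSIONS = {
    ("g/L", "g/dL"): 0.1,   # 10 g/L == 1 g/dL
    ("mg/L", "mg/dL"): 0.1,  # 10 mg/L == 1 mg/dL
}


def normalize_unit(raw: Optional[str]) -> Optional[str]:
    """Return the canonical spelling of a unit string, or None if missing or unknown."""
    if raw is None:
        return None
    key = " ".join(raw.strip().lower().split())
    return UNIT_ALIASES.get(key)


def to_canonical(value: float, raw_unit: Optional[str], canonical_unit: str) -> Optional[float]:
    """Express value in canonical_unit, or return None when no safe conversion exists."""
    unit = normalize_unit(raw_unit)
    if unit is None:
        return None  # missing or unrecognized unit: flag for review instead of guessing
    if unit == canonical_unit:
        return value
    factor = CONVERSIONS.get((unit, canonical_unit))
    if factor is None:
        return None  # e.g. mmol/L -> mg/dL changes the property, not just the scale
    return value * factor


if __name__ == "__main__":
    print(to_canonical(135, "G/L", "g/dL"))       # hemoglobin-style g/L value rescaled
    print(to_canonical(5.5, "mmol/l", "mg/dL"))   # refused: different property
```

The sketch deliberately refuses to convert between mmol/L and mg/dL: in LOINC, mass concentration and molar concentration are different properties with different codes, so that kind of mismatch is better handled by correcting the code than by silently rescaling the value.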
The framework proposed by Naliyatthaliyazchayil and Stenerson addresses these challenges through a precise, automated process driven by three knowledge tables:
Code Correction: The system uses the unit of measure to sanity-check the LOINC code. If the unit does not match the test the code describes, the framework assigns a more compatible LOINC code.
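The unit-based code check can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `LOINC_UNITS`, `SIBLING_CODES`, and `correct_code` are hypothetical names; the tables hold only two glucose codes (2345-7, glucose mass concentration in serum or plasma, and 14749-6, glucose molar concentration in serum or plasma), which should be verified against the LOINC database before any real use; and the unit is assumed to have already been normalized to a canonical spelling.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge table: LOINC code -> units compatible with that code.
LOINC_UNITS: Dict[str, Set[str]] = {
    "2345-7": {"mg/dL"},    # Glucose [Mass/volume] in Serum or Plasma
    "14749-6": {"mmol/L"},  # Glucose [Moles/volume] in Serum or Plasma
}

# Hypothetical table of "sibling" codes: same analyte and specimen, different property.
SIBLING_CODES: Dict[str, List[str]] = {
    "2345-7": ["14749-6"],
    "14749-6": ["2345-7"],
}


def correct_code(loinc: str, unit: str) -> Tuple[str, str]:
    """Keep the code if the unit fits it; otherwise try a sibling code whose units fit.

    Returns (code, status), where status is one of
    "ok", "corrected", "unresolved", or "unknown_code".
    """
    allowed = LOINC_UNITS.get(loinc)
    if allowed is None:
        return loinc, "unknown_code"  # code not in the knowledge table
    if unit in allowed:
        return loinc, "ok"  # unit agrees with the code: nothing to change
    for candidate in SIBLING_CODES.get(loinc, []):
        if unit in LOINC_UNITS.get(candidate, set()):
            return candidate, "corrected"  # unit points to a more compatible code
    return loinc, "unresolved"  # no compatible code: leave for review


if __name__ == "__main__":
    # A glucose result coded as mass concentration but reported in mmol/L.
    print(correct_code("2345-7", "mmol/L"))
```

Returning a status alongside the code keeps the decision auditable: records marked "unresolved" or "unknown_code" can be routed to review instead of being silently changed.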
To test the framework's scalability, the authors applied it to the ConcertAI database, encompassing approximately 10 million oncology patients and 6.34 billion lab records. The results were dramatic:
In cancer care, lab values guide therapy selection, track disease progression, and determine clinical trial eligibility, so the quality of that laboratory data is crucial. By ensuring semantic consistency, this framework removes the manual burden of data cleaning and allows researchers to focus on generating potentially life-saving insights.
| In this video, Parvati Naliyatthaliyazchayil from ConcertAI presents a groundbreaking framework for standardizing and correcting laboratory data in real-world oncology datasets. |
Why JMIR?
The authors chose JMIR Medical Informatics to share these findings because of the journal's focus on the intersection of digital health and professional practice. As the healthcare industry works to build a more data-informed system, this study helps provide the evidence base needed to shape the next generation of healthcare analytics.
Curious to see how data standardization is reshaping the future of oncology research? Watch the video featuring the study's insights and read the full research paper to explore the framework and the strategic roadmap for high-quality real-world evidence.