Framework for visualizing study designs


There is growing interest in using evidence generated from routinely collected data contained in administrative claims, charge master, patient registry, and electronic health record (EHR) databases to support regulatory, coverage, and other healthcare decision-making.1–4 The methods used to conduct studies with these types of longitudinal data streams can be complex, and lengthy narratives make reviewing difficult and error-prone. A graphical framework for depicting longitudinal study designs was introduced to mitigate this barrier and has found wide acceptance.5,6 The goal was to make studies conducted with secondary healthcare data more reproducible and to facilitate clear communication of longitudinal study design choices to readers. Since publication, this graphical framework has been incorporated into STaRT-RWE,7 a structured study implementation template that was developed in a public-private collaboration including members from international regulatory agencies and that was endorsed by two international professional societies.

The original graphical framework introduced standard terminology for the temporal anchors required for causal study designs on treatment effectiveness, including base anchors and first-order (primary) and second-order (secondary) anchors defined in calendar or patient event time.5 Examples of design diagrams for different study designs were provided to illustrate the flexibility of the framework to handle a variety of study questions using different methods. The original graphical framework did not contain information to help readers assess the appropriateness of the source data to which the study design was applied: for example, whether data were fully observable during the study period or whether there was (partially) unobservable information, as is expected for many EHR databases.8–11 This is an important issue because, when conducting research with secondary data, the investigator is not in control of what, when, or how data are captured. Thus, clear communication of data completeness and quality within the temporal context of a specific study is critically important for evaluating methodologic validity.

In this paper, we update the original graphical framework for depicting study design to include a simple visualization of data type and observability. This revision is important because evaluation of study validity needs to consider the triad of study question, design, and data together.12 Any given data source or study design may be appropriate for some questions but not others.

There are many aspects involved in evaluating whether a data source is fit-for-purpose, including the ability to measure key study parameters such as exposure, outcome, inclusion criteria, covariates, and follow-up.13,14 Other aspects include how selected versus representative the data are, the amount of missing or unreliable data (eg, implausible lab values), and the degree of transparency in data provenance or in the history of data transformations used to create the research database.

Although many aspects of evaluating data fitness cannot be visually summarized in a graphical framework (eg, quality control, data curation practices), some fundamental features can. The objective of this update to the graphical framework for study design is to help researchers working with longitudinal secondary data think about data observability when designing a study. The framework can also be used in reporting, to help reviewers understand, through a high-level visual summary, whether and how key study parameters can be captured in the data source.


In this section, we define data observability and propose a framework for visualizing data observability in the context of study design.

Data Observability

Healthcare databases are largely composed of longitudinal data streams generated by the routine operation of a healthcare system that tracks encounters, including medical, billing, and other information.15 Administrative claims databases capture longitudinal data on patient encounters covered by insurance across providers and have a defined start and end of the data stream: the enrollment period. However, they lack the richness of clinical detail that can be found in EHRs, which may be useful for enhanced confounding adjustment and for identifying select endpoints. In contrast, EHR-based data sources offer greater depth of clinical information, but information is lost to researchers when patients seek care from physicians who operate outside the index EHR system, a problem that is particularly acute in countries with fragmented healthcare systems such as the US.16 Integrated healthcare systems or linked claims-EHR data sources benefit from both longitudinal capture of all healthcare encounters and the ability to leverage detailed clinical measurements such as vital signs, laboratory, imaging, and other test results, as well as physician notes.16

Hence, as has been previously described,8,9 a key distinction between claims and EHR data sources is the concept of patient data observability. We define observable time as time windows during which patient healthcare events are routinely captured and stored in the database accessible to the researcher. This is related to, but not the same as, the classical definition of missing data, which occurs when a measurement was taken but no value was recorded for a variable that is part of the routinely captured data.
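
To make the distinction between unobservable time and missing data concrete, the following Python sketch classifies a single lab measurement for one patient. It assumes, purely for illustration, that observable time is available as a list of inclusive date intervals and that a lab order recorded without a result value indicates that a measurement was taken; the names `ObsStatus`, `is_observable`, and `classify_lab` are hypothetical.

```python
from datetime import date
from enum import Enum
from typing import List, Optional, Tuple

Interval = Tuple[date, date]  # inclusive (start, end) of observable time


class ObsStatus(Enum):
    RECORDED = "recorded"          # observable time, value present
    MISSING = "missing"            # observable time, measurement taken, no value stored
    NOT_MEASURED = "not measured"  # observable time, no measurement appears in the data
    UNOBSERVABLE = "unobservable"  # outside observable time: the database could not capture it


def is_observable(day: date, windows: List[Interval]) -> bool:
    """True if `day` falls inside any observable interval."""
    return any(start <= day <= end for start, end in windows)


def classify_lab(day: date, windows: List[Interval],
                 order_present: bool, value: Optional[float]) -> ObsStatus:
    # Observability is checked first: outside observable time, absence of a
    # record says nothing about whether the test was done.
    if not is_observable(day, windows):
        return ObsStatus.UNOBSERVABLE
    if value is not None:
        return ObsStatus.RECORDED
    if order_present:
        return ObsStatus.MISSING
    return ObsStatus.NOT_MEASURED
```

The ordering of the checks carries the point of the definition: missingness is only meaningful inside observable time.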

In claims data, this characteristic of observable time can be measured through enrollment periods, whose start and end dates are assiduously captured by insurers because this information is critical to their business function. When measuring baseline characteristics or determining follow-up, these enrollment windows can be used to ensure that the assessment windows in the study design are restricted to time during which patient healthcare data are observable. That said, some claims-based data sources may have windows of only partially observable data for structural reasons. For example, medication administration data for patients during time spent in hospitals or other institutions are often not available in claims-based research databases because these medications are included in bundled payments.
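
The enrollment-based restriction described above can be sketched as a coverage check. This is a minimal sketch, assuming enrollment periods are stored as inclusive date intervals and that any gap, even a single day, breaks observability (real studies often tolerate short administrative gaps, which this sketch does not). The functions `merge_periods` and `window_is_observable` are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

Interval = Tuple[date, date]  # inclusive (start, end)


def merge_periods(periods: List[Interval]) -> List[Interval]:
    """Merge overlapping or back-to-back enrollment periods into continuous spans."""
    merged: List[Interval] = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Contiguous or overlapping: extend the current span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def window_is_observable(day0: date, rel_start: int, rel_end: int,
                         periods: List[Interval]) -> bool:
    """True if days day0+rel_start .. day0+rel_end lie inside one continuous enrollment span."""
    win_start = day0 + timedelta(days=rel_start)
    win_end = day0 + timedelta(days=rel_end)
    return any(s <= win_start and win_end <= e for s, e in merge_periods(periods))
```

For a 90-day baseline window ending on day 0, the call would be `window_is_observable(day0, -90, 0, periods)`.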

In EHR or specialty registry data, the issue of data observability can be more complicated. These types of data often do not have clear windows defining when a patient is actively engaged with a provider within the system. Furthermore, it can be difficult to determine how much of the patient's healthcare activity is captured within the network versus in other systems. Linkage to claims or other data sources that track all encounters across the care delivery continuum can help elucidate the degree to which patients seek healthcare in other systems. Alternatively, researchers can make assumptions about when patient healthcare events are observable, or use algorithms to identify and restrict study populations to patients who consistently use a single healthcare system.8–10 For some EHR-based research databases, periods of structurally unobservable patient time may be clear. For example, some databases contain only inpatient hospitalization data, meaning anything outside the hospital is unobservable.17 Others may focus on primary care only or on specialty clinics only. Any of these types of EHR data sources may be fit-for-purpose, depending on the study question and design.

Visual Vocabulary for Data Observability and Type

The original design visualization framework was logically centered on a primary anchor that denoted time zero for entry into the study population, with secondary time anchors defined relative to the primary anchor.5 The primary anchor was visually represented by a vertical gray arrow, with horizontal boxes representing secondary time anchors such as inclusion-exclusion assessment windows, covariate assessment windows, washout windows for defining incident exposure or outcome, an exposure assessment window, and follow-up. The length and placement of the horizontal boxes visually represented the timing of the secondary temporal anchors relative to time zero, with bracketed numbers explicitly denoting the exact timing.5

To bring the concept of data observability into the design visualization framework, we propose overlaying a solid or dashed line on each horizontal box, using a simple two-color palette to denote windows of observable patient data (solid line, dark color), partial observability (dashed line, dark color), and no observability (solid line, light color). These three categories can provide only crude indications of data observability and may need additional explanation in footnotes. Below the design diagram, we therefore use the same palette to break down the observability of common data types as well as of data that may be idiosyncratic to the research database.
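
One way to keep the three observability categories consistent across diagrams is to encode them once as a lookup table. The sketch below is hypothetical: the category names, the `LineStyle` fields, and the "dark"/"light" shade labels simply restate the convention described above and are not part of any published template.

```python
from dataclasses import dataclass
from enum import Enum


class Observability(Enum):
    OBSERVABLE = "observable"
    PARTIAL = "partially observable"
    UNOBSERVABLE = "not observable"


@dataclass(frozen=True)
class LineStyle:
    dash: str   # "solid" or "dashed"
    shade: str  # "dark" or "light" tone of the two-color palette


# Overlay line drawn on each horizontal window box of the design diagram.
STYLE = {
    Observability.OBSERVABLE: LineStyle(dash="solid", shade="dark"),
    Observability.PARTIAL: LineStyle(dash="dashed", shade="dark"),
    Observability.UNOBSERVABLE: LineStyle(dash="solid", shade="light"),
}


def overlay_for(category: Observability) -> str:
    """Describe the overlay line for a given observability category."""
    style = STYLE[category]
    return f"{style.dash} {style.shade} line"
```

Because every combination of dash pattern and shade is used at most once, each category stays distinguishable even when printed in grayscale.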

To illustrate this addition to the design diagram framework, we walk through two examples of comparative effectiveness studies using secondary data sources.


Example 1: Comparative Effectiveness of Famotidine versus Non-Use on Risk of Death for Hospitalized COVID-19 Patients

In this hypothetical design of a study of the risk of death for patients initiating famotidine versus non-use on the day of admission to a hospital for COVID-19 (based on a real study18), day 0 was defined as the hospital admission date. The exposure assessment window for famotidine exposure versus non-exposure was the date of admission. Patients were included if they had lab-confirmed COVID-19 from 2 weeks prior up through 3 days after hospital admission. Patients were excluded if there was evidence of exposure to famotidine within 90 days prior to day 0, or evidence of intensive service use within 90 days prior to and including day 0. Baseline covariates for confounding adjustment were assessed over the 90 days prior to and including day 0. Follow-up for the outcome of death began the day after admission and continued for up to 30 days, with censoring upon the outcome, discharge, or end of the study period.
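
The temporal anchors of this hypothetical design can be written down as data, which makes the bracketed day ranges shown in the diagram explicit. In this sketch, days are relative to day 0 (hospital admission) and both endpoints are inclusive; reading "within 90 days prior to day 0" as days -90 to -1 is our interpretation, and the `Window` class and `FAMOTIDINE_DESIGN` list are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    """A secondary temporal anchor: inclusive day range relative to day 0."""
    label: str
    start: int
    end: int

    def bracket(self) -> str:
        # Bracketed numbers as they would appear next to the window's box.
        return f"{self.label} [{self.start}, {self.end}]"


# Day 0 = hospital admission date.
FAMOTIDINE_DESIGN: List[Window] = [
    Window("Exposure assessment (famotidine vs non-use)", 0, 0),
    Window("Inclusion: lab-confirmed COVID-19", -14, 3),
    Window("Exclusion: famotidine use", -90, -1),
    Window("Exclusion: intensive service use", -90, 0),
    Window("Covariate assessment", -90, 0),
    Window("Follow-up for death (maximum)", 1, 30),
]

for window in FAMOTIDINE_DESIGN:
    print(window.bracket())
```

Keeping the design in one structure like this means the diagram labels and the cohort-building code can be generated from the same definitions.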

The original design visualization framework does not provide information about data observability. A generic design diagram might appear as panel A in Figure 1. If we were to layer in information about data observability, we would see different issues highlighted for different types of data sources.

Figure 1 Comparative effectiveness of famotidine versus non-use on risk of death for hospitalized COVID-19 patients. (A) Original design visualization framework, (B) design applied in a commercial claims database with data observability lines, (C) design applied in a hospital EHR-based research database, (D) design applied in linked EHR-claims data.

If this hypothetical study were conducted with a United States commercial claims database, the design would typically also include an inclusion requirement of a minimum amount of enrollment prior to day 0. This would ensure a window of observability for measured baseline characteristics. Similarly, follow-up would be censored when the enrollment window ends (Figure 1B). Given this data source and this design, the data observability lines would indicate that patient healthcare events are observable for all assessment windows except for in-hospital medication administration and lab test results. Lab test results are partially observable in the outpatient setting and unobservable in the inpatient setting. The critical piece of unobservable information on inpatient exposure status makes this combination of data and design not fit-for-purpose. The clarity of the visualization brings this particular study limitation to the forefront for a reviewer.
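
Under the claims implementation, both eligibility and follow-up depend on enrollment. The sketch below combines the censoring rules of the design (outcome, discharge, day 30, end of the study period) with the claims-specific additions of a minimum pre-index enrollment and censoring at enrollment end. The 90-day minimum is an assumption chosen to match the covariate window, since the text does not specify the required amount, and the names `eligible_claims` and `follow_up` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

MIN_PRIOR_ENROLLMENT_DAYS = 90  # assumption: amount not specified in the design


def eligible_claims(day0: date, enroll_start: date, enroll_end: date) -> bool:
    """Require continuous enrollment covering the baseline window and day 0."""
    return (enroll_start <= day0 - timedelta(days=MIN_PRIOR_ENROLLMENT_DAYS)
            and enroll_end >= day0)


def follow_up(day0: date, enroll_end: date, study_end: date,
              death: Optional[date] = None,
              discharge: Optional[date] = None) -> Tuple[date, date, bool]:
    """Return (start, end, died); follow-up starts the day after admission."""
    start = day0 + timedelta(days=1)
    # Censoring candidates: day 30, end of study period, end of enrollment, discharge.
    candidates = [day0 + timedelta(days=30), study_end, enroll_end]
    if discharge is not None:
        candidates.append(discharge)
    end = min(candidates)
    died = death is not None and start <= death <= end
    if died:
        end = death  # the outcome ends follow-up
    return start, end, died
```

Enrollment end enters the same `min` as the other censoring reasons, which mirrors how the diagram draws follow-up ending at the earliest of these events.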

If the design were implemented in a hospital EHR-based research database, there would be a clear window of time during which patient events were observable: the hospitalization. However, depending on the data source, while prior hospitalizations in the same system may be observable for each patient, events from outpatient healthcare contacts or from other hospital systems may not be linkable to the index hospitalization through which patients entered the analytic study cohort. This is reflected by the data observability lines in Figure 1C, where patient information before the index hospitalization is observable only if the index hospitalization is a rehospitalization within the same system. The inability to observe outpatient drug exposure prior to hospitalization makes it difficult to exclude patients who had prior exposure to the drug of interest. Similarly, data on intensive service use or baseline characteristics are only partially observable prior to the index admission (observable only through prior hospitalization records). Without the data observability lines overlaid on the design diagram, the use of extended baseline assessment windows prior to the index date may give the false impression that health events prior to the index admission are measurable and contribute to the definition of incident use or other inclusion-exclusion criteria. To tailor the design to the data, assessment windows could be modified to focus measurement on admitting diagnoses or procedures occurring on the date of admission, thus acknowledging limited observability outside the index hospitalization.
Decision rules may be needed to determine whether conditions were likely present on admission or were new conditions that developed during hospitalization and ended up on the discharge record. For example, inpatient hypertension diagnoses likely reflect pre-existing conditions, whereas inpatient myocardial infarction codes might be considered either pre-existing or indicative of a new event if they were an admitting diagnosis or the primary discharge diagnosis.

If the design were implemented using a linked claims-EHR database, the study could benefit from clear enrollment windows to capture observable time in the inclusion criteria, with data observability lines reflecting that patient health events before hospital admission are measurable, as is drug exposure status upon hospitalization (Figure 1D). Depending on the linked data sources, EHR-based patient data may be fully, partially, or not observable outside the index hospitalization (eg, a fully integrated healthcare system, a hospital-based system with some outpatient clinics, or a hospital-only EHR, respectively).

Example 2: Comparative Effectiveness of Chemotherapy Regimens in Specialty Oncology Registry Data

In the design of a study, based on a previously developed master protocol,19,20 of the risk of death for patients initiating programmed cell death protein 1 or programmed death-ligand 1 (PD-(L)1) inhibitor monotherapy (pembrolizumab, nivolumab, atezolizumab) compared with patients initiating a PD-(L)1 inhibitor plus a doublet chemotherapy combination, day 0 was defined as the first observed treatment episode with either course of therapy (Figure 2A). To identify patients whose healthcare events would likely be observable, patients were required to have at least 2 visits to the same community clinic at any time prior to day 0. Patients were restricted to those who had either a recurrence or a progression to advanced non-small cell lung cancer (aNSCLC) within 120 days prior to initiating either course of therapy under investigation. Patients were included if the community clinic at which they were treated collected and provided records of historical treatment to the research database.
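
The three restrictions above can be expressed as a single eligibility check. This is a sketch under stated assumptions: the hypothetical `Patient` record holds visit dates keyed by clinic, the date of recurrence or progression to aNSCLC, and a flag for whether the treating clinic contributes historical treatment records; "within 120 days prior" is read as 0 to 120 days before day 0, inclusive. The name `eligible_oncology` is also hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Patient:
    day0: date                                                    # first observed treatment episode
    visits: Dict[str, List[date]] = field(default_factory=dict)  # clinic id -> visit dates
    advanced_dx: Optional[date] = None                            # recurrence or progression to aNSCLC
    clinic_reports_history: bool = False                          # treating clinic supplies treatment history


def eligible_oncology(p: Patient) -> bool:
    # 1. At least 2 visits to the same community clinic before day 0
    #    (proxy for patients whose care is observable in the database).
    repeat_user = any(sum(v < p.day0 for v in dates) >= 2 for dates in p.visits.values())
    # 2. Recurrence or progression to aNSCLC within 120 days before treatment start.
    recent_advanced = (p.advanced_dx is not None
                       and 0 <= (p.day0 - p.advanced_dx).days <= 120)
    # 3. Treating clinic provided historical treatment records.
    return repeat_user and recent_advanced and p.clinic_reports_history
```

Note that two visits split across two different clinics do not satisfy the first condition, which is the point of requiring visits to the same clinic.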

Figure 2 Comparative effectiveness of chemotherapy regimens in specialty oncology registry data. (A) Original design visualization framework, (B) design applied in a community cancer clinic-based EHR database with no external linkage of death data, (C) design applied in a community cancer clinic-based EHR database with external linkage of death data.

Baseline covariates were assessed over several different windows. Demographics were assessed on day 0 or as the most recent value observed across all available data. Lab tests and results were assessed using all available time prior to day 0, using only time within 30 days before day 0, or including time within 30 days after day 0. Non-cancer-related comorbid conditions were assessed using all available data prior to and including day 0. Follow-up for the outcome of death began the day after treatment initiation and continued until the end of the study period.
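
The three lab assessment windows differ only in their bounds relative to day 0. The sketch below assumes lab results are (date, value) pairs and selects the in-window result closest to day 0, so that for windows ending before day 0 it returns the most recent value. Reading "including time within 30 days after day 0" as days -30 to +30 is our assumption, as are the window constants and the function `lab_value`.

```python
from datetime import date
from typing import List, Optional, Tuple

LabResult = Tuple[date, float]
RelWindow = Tuple[Optional[int], int]  # (start, end) days relative to day 0; None = unbounded

ALL_PRIOR: RelWindow = (None, -1)   # all available time prior to day 0
PRIOR_30: RelWindow = (-30, -1)     # only the 30 days before day 0
AROUND_30: RelWindow = (-30, 30)    # assumption: "including 30 days after" read as [-30, +30]


def lab_value(results: List[LabResult], day0: date, window: RelWindow) -> Optional[float]:
    """Return the in-window result closest to day 0 (earlier result wins ties), or None."""
    start, end = window
    in_window = []
    for d, value in results:
        rel = (d - day0).days
        if (start is None or rel >= start) and rel <= end:
            in_window.append((abs(rel), rel, value))
    if not in_window:
        return None
    # Tuples sort by distance from day 0, then by relative day (earlier first).
    return min(in_window)[2]
```

Allowing post-index values, as in `AROUND_30`, increases capture but risks measuring a lab that was affected by the treatment itself, which is a trade-off the design diagram makes visible.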

If this study design were applied to a data source composed primarily of community-based oncology clinics, then, after applying restrictions in the design phase to include patients who are likely to receive all of their cancer care in a clinic that contributes to the research database, the observability lines might indicate that cancer-related baseline measures are largely observable, whereas non-cancer-related baseline characteristics are only partially observable (through non-cancer-related clinical codes or notes captured in the cancer clinic EHR; all external healthcare transactions would be unobservable). If this data source had no linkage to external sources of mortality information, the observability lines during follow-up for overall survival might indicate that patient events are only partially observable (Figure 2B). The observability lines during follow-up could look different if the study were investigating a different outcome, or investigating overall survival in a similar data source with linkage to additional data. For example, under the assumption that restricting to patients with a history of receiving their cancer care from the same community clinic effectively identifies patients with observable data, if the outcome were treatment discontinuation instead of survival, the research database would include the necessary information and the follow-up window for this outcome would be considered observable. Alternatively, if the research database included not only EHR data from community-based oncology clinics but also death information from the Social Security Death Index, linked claims, state death records, etc., then the follow-up window for overall survival could be labeled as largely observable (Figure 2C).


Working with secondary data requires clear communication of data completeness and quality in the temporal context of a specific study, because the investigator is not in control of what to measure, how, or when to measure it. Using visualizations to help researchers be precise about when and how exposure, baseline characteristics, inclusion-exclusion criteria, and outcomes are captured in the data and design can help readers better interpret the results of a study. In this paper, we introduce terminology and a visual language for data observability that are particularly relevant for electronic health record and disease registry databases based on routinely collected healthcare data. We provide updated PowerPoint templates with a color-blind-friendly palette that can be used to create publication-quality figures. The enhancement to the visualization framework can only draw attention to an observability issue in the data; accompanying text is needed to explain the details of the issue and how it may affect the validity of the study.


A limitation of the proposed enhancement to the design visualization framework is that it does not capture all aspects of assessing whether a data source is reliable, relevant, and fit-for-purpose. Furthermore, studies with very complex designs may result in complex diagrams.7,21 Nevertheless, supplementing the free-text description of the methods for such studies with a summary visualization can still be a useful aid for planning a study as well as for interpreting and evaluating its methods.


Study design diagrams can be used as a tool for planning a study, reporting on it, and facilitating validity assessment. For investigators, using this framework for graphical representation of study design and data observability during the study planning phase will help them consider and account for important data features or limitations in their study design. Sharing this diagram with a publication or report will help communicate these considerations to reviewers. For reviewers, the presence of a study design and data observability diagram will facilitate efficient evaluation of study validity as well as relevance for decision-making.


We would like to acknowledge Judy Maro, Michael Nguyen, Joshua K Lin, and Jeremy Rassen for early discussions on this revision to the graphical framework for study design.


The authors were supported by funding from the NIH: NHLBI R01HL141505 and NIA R01AG053302. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Dr. Wang has no conflicts of interest to report. Dr. Schneeweiss is principal investigator of the FDA Sentinel Innovation Center, funded by the FDA, and co-principal investigator of an investigator-initiated grant to the Brigham and Women's Hospital from UCB and Boehringer Ingelheim, unrelated to the topic of this study. He is a consultant to Aetion Inc., a software manufacturer in which he owns equity. His interests were declared, reviewed, and approved by the Brigham and Women's Hospital and Partners HealthCare System in accordance with their institutional compliance policies.


1. Eichler HG, Baird LG, Barker R, et al. From adaptive licensing to adaptive pathways: delivering a flexible life-span approach to bring new drugs to patients. Clin Pharmacol Ther. 2015;97(3):234–246. doi:10.1002/cpt.59

2. Ball R, Robb M, Anderson SA, Dal Pan G. The FDA's Sentinel Initiative: a comprehensive approach to medical product surveillance. Clin Pharmacol Ther. 2016;99(3):265–268. doi:10.1002/cpt.320

3. Sun X, Tan J, Tang L, Guo JJ, Li X. Real world evidence: experience and lessons from China. BMJ. 2018;360:j5262. doi:10.1136/bmj.j5262

4. Makady A, Ham RT, de Boer A, et al. Policies for use of real-world data in Health Technology Assessment (HTA): a comparative study of six HTA agencies. Value Health. 2017;20(4):520–532. doi:10.1016/j.jval.2016.12.003

5. Schneeweiss S, Rassen JA, Brown JS, et al. Graphical depiction of longitudinal study designs in health care databases. Ann Intern Med. 2019;170(6):398–406. doi:10.7326/m18-3079

6. Happe LE, Brown JD, Gatwood J, Schneeweiss S, Wang S. Application of a graphical depiction of longitudinal study designs to managed care pharmacy research. J Manag Care Spec Pharm. 2020;26(3):268–274. doi:10.18553/jmcp.2020.26.3.268

7. Wang SV, Pinheiro S, Hua W, et al. STaRT-RWE: structured template for planning and reporting on the implementation of real world evidence studies. BMJ. 2021;372:m4856. doi:10.1136/bmj.m4856

8. Rassen JA, Bartels DB, Schneeweiss S, Patrick AR, Murk W. Measuring prevalence and incidence of chronic conditions in claims and electronic health record databases. Clin Epidemiol. 2019;11:1–15. doi:10.2147/CLEP.S181242

9. Lin KJ, Glynn RJ, Singer DE, Murphy SN, Lii J, Schneeweiss S. Out-of-system care and recording of patient characteristics critical for comparative effectiveness research. Epidemiology. 2018;29(3):356–363. doi:10.1097/EDE.0000000000000794

10. Lin KJ, Rosenthal GE, Murphy SN, et al. External validation of an algorithm to identify patients with high data-completeness in electronic health records for comparative effectiveness research. Clin Epidemiol. 2020;12:133–141. doi:10.2147/CLEP.S232540

11. Lin KJ, Singer DE, Glynn RJ, Murphy SN, Lii J, Schneeweiss S. Identifying patients with high data completeness to improve validity of comparative effectiveness research in electronic health records data. Clin Pharmacol Ther. 2018;103(5):899–905. doi:10.1002/cpt.861

12. Wang SV, Schneeweiss S. Assessing and interpreting real-world evidence studies: introductory points for new reviewers. Clin Pharmacol Ther. 2022;111(1):145–149. doi:10.1002/cpt.2398

13. Daniel G, Silcox C, Bryan J, et al. Characterizing RWD quality and relevancy for regulatory purposes; 2018. Available from: Accessed April 26, 2022.

14. U.S. Food & Drug Administration. Framework for FDA's real-world evidence program; 2018. Available from: Accessed January 31, 2019.

15. Schneeweiss S, Patorno E. Conducting real-world evidence studies on the clinical outcomes of diabetes treatments. Endocr Rev. 2021;42(5):658–690. doi:10.1210/endrev/bnab007

16. Lin KJ, Schneeweiss S. Considerations for the analysis of longitudinal electronic health records linked to claims data to study the effectiveness and safety of drugs. Clin Pharmacol Ther. 2016;100(2):147–159. doi:10.1002/cpt.359

17. Makadia R, Ryan PB. Transforming the Premier Perspective Hospital Database into the Observational Medical Outcomes Partnership (OMOP) common data model. EGEMS. 2014;2(1):1110. doi:10.13063/2327-9214.1110

18. Shoaibi A, Fortin SP, Weinstein R, Berlin JA, Ryan P. Comparative effectiveness of famotidine in hospitalized COVID-19 patients. Am J Gastroenterol. 2021;116(4):692–699. doi:10.14309/ajg.0000000000001153

19. Friends of Cancer Research. FOCR rwEndpoints use case: assessing frontline treatment regimens in real-world patients with advanced non-small cell lung cancer (aNSCLC); 2021. Available from: Accessed March 31, 2021.

20. Stewart M, Norden AD, Dreyer N, et al. An exploratory analysis of real-world end points for assessing outcomes among immunotherapy-treated patients with advanced non-small-cell lung cancer. JCO Clin Cancer Inform. 2019;3:1–15. doi:10.1200/cci.18.00155

21. Franklin JM, Pawar AS. Replication of the EMPAREG diabetes trial in healthcare claims. US National Library of Medicine. Available from: Accessed March 4, 2020.
