Data Integrity — Don't Forget the Arrows

James Ladyman, December 2021

Illustrations (unless otherwise stated): Old Data Bloke

Data flow diagrams have been a growing trend in the clinical trial environment, partly through encouragement by the regulatory authorities.

My QA colleague shared the simplified diagram below with me; it was first used some years ago to visualise the data life-cycle in a clinical trial.

Simplified Diagram
A data-flow diagram with limited systems (illustration: © J V Birnie)

This diagram is fairly simple: we can easily see where to focus our efforts and where data is flowing internally and externally.

Commonly, however, data flow diagrams look more like this:

Modern Diagram
A more modern (and messier) data-flow diagram

This is certainly busier visually, but it is still clearer than trying to describe the same situation in words alone. We can also still see a few obvious systems where a lot of critical efficacy and safety data is clustered, which helps us focus our efforts on those systems (the boxes).


Actually, it's all about the arrows

Whenever data is transferred between two organisations there is extra risk.

Until now, when considering data integrity, the focus has often been on the areas represented by the boxes: systems or activities. However, one thing that’s often lost in this process is the data flow itself, in other words the arrows. Recall our ‘bottleneck’ system: what should make this stand out to us is the arrows into and out of the system, not the system itself. Each of these arrows is itself a data transfer or process, and whenever data is transferred there’s a risk to both its integrity and its confidentiality. Indeed, recent data integrity guidance and regulations stress the importance of data integrity and encryption in transit and at rest, but the issues go far beyond encryption. (See the References below.)

So, why care about the arrows?

Arrows that cross organisational boundaries require special attention

Of course, it is nothing new that arrows indicate data transfer. Many of us will be familiar with electronic data transfer (EDT), for example, when moving data between two systems, as in:
Simple data transfer
Simple EDT
…but this arrow isn't meaningless: it indicates a data transfer process, likely involving both the CRO’s and the sponsor’s EDT SOP(s), and potentially specialised teams or negotiation between IT departments. Whenever data is transferred, the transfer process is as important to data integrity as the systems at both ends. Indeed, this is an area in which we see plenty of regulatory interest.
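One small part of what such a transfer process might check can be sketched in code. The following is a minimal illustration, not a description of any particular EDT SOP: it assumes the sender publishes a SHA-256 digest alongside the file and the receiver recomputes it on arrival. The function names and the sample data are hypothetical.

```python
import hashlib


def sha256_of(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def verify_transfer(received: bytes, expected_digest: str) -> bool:
    """Check that the received bytes match the digest the sender published."""
    return sha256_of(received) == expected_digest


# Hypothetical export: the sender computes the digest before sending.
sent = b"subject_id,visit,systolic_bp\n001,1,120\n002,1,135\n"
digest = sha256_of(sent)

# The receiver recomputes the digest on what actually arrived.
intact = verify_transfer(sent, digest)
# Simulate a single value being altered somewhere along the arrow.
corrupted = verify_transfer(sent.replace(b"135", b"153"), digest)
print(intact, corrupted)
```

A digest like this only shows that the bytes did not change between the two ends. It says nothing about confidentiality, about whether the right file was sent, or about who is responsible for the check, which is where the SOPs come in.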

As a second example, take this area of our second data-flow diagram:

A back-water system

Whenever data is transferred between two organisations, even if it is only a ‘simple’ data transfer, there is always extra risk — don't forget, queries are important data in their own right.
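If the diagram is also held as data, the arrows that cross an organisational boundary can be listed automatically. The sketch below assumes a made-up set of systems, owners and arrows; none of these names are taken from the diagrams in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrow:
    """One arrow in the data-flow diagram: data moves from source to target."""
    source: str
    target: str


# Hypothetical systems and the organisation that owns each one.
OWNER = {
    "EDC": "CRO",
    "Central Lab": "Lab vendor",
    "Safety Database": "Sponsor",
    "Statistics": "CRO",
}

# Hypothetical arrows between those systems.
ARROWS = [
    Arrow("Central Lab", "EDC"),
    Arrow("EDC", "Statistics"),
    Arrow("EDC", "Safety Database"),
]


def crosses_boundary(arrow: Arrow) -> bool:
    """True when the two ends of the arrow belong to different organisations."""
    return OWNER[arrow.source] != OWNER[arrow.target]


cross_org = [a for a in ARROWS if crosses_boundary(a)]
for a in cross_org:
    print(f"{a.source} ({OWNER[a.source]}) -> {a.target} ({OWNER[a.target]})")
```

With these assumed inputs, the two arrows touching the lab vendor and the sponsor are flagged, while the transfer that stays inside the CRO is not.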

What’s in an arrow? When data transfer becomes data transformation.

As we've already seen, not all arrows are created equal. For another example, take the simple EDT figure we saw above:

Simple data transfer

This is a well-understood process in both the IT world and the clinical world and, given proper SOPs and training, should be relatively painless. But take this:

Simple data transfer

Here we won’t simply be sending raw data to the statistician (not twice, anyway!): this arrow is as much a transformation as a transfer. Another good example is this system and its associated arrows:

Simple data transfer

Note the arrow coming ‘in’ to this process. Sometimes this might be a simple data transfer; at other times there are additional considerations, for example preserving the blind when transferring data for blinded data review, or ensuring that only the correct subset of data is sent to the Data Monitoring Committee. Both are data transformation as well as data transfer.
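To make the difference between transfer and transformation concrete, here is a deliberately simplified sketch of an arrow that changes the data on its way through. The field names (`treatment_arm`, `kit_number`, `alt_u_l`) are hypothetical, and removing columns is an assumption made for illustration only; preserving the blind in a real study involves much more than this.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical field names; a real study would define these in its own specification.
UNBLINDING_FIELDS = {"treatment_arm", "kit_number"}


def blind(records: List[Record]) -> List[Record]:
    """Return copies of the records with potentially unblinding fields removed."""
    return [
        {key: value for key, value in rec.items() if key not in UNBLINDING_FIELDS}
        for rec in records
    ]


source = [
    {"subject_id": "001", "treatment_arm": "A", "kit_number": "K17", "alt_u_l": "32"},
    {"subject_id": "002", "treatment_arm": "B", "kit_number": "K02", "alt_u_l": "41"},
]

# What leaves the source system is not what arrives at the review.
blinded = blind(source)
print(blinded)
```

The point is that the output differs from the input by design, so the rules applied along the arrow need specifying, validating and documenting just as a system would.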

A final concern: arrows that don’t go anywhere

Most worrying of all are arrows that don’t go anywhere, or which end in a ‘dead-end’ system — for example, in the last diagram above, neither the Blinded Data Review nor the Data Monitoring Committee meetings have any output. Look out for an upcoming article by JVB on this subject.
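Dead ends of this kind are easy to find once the arrows are written down as data. The sketch below uses an assumed list of arrows, loosely modelled on the situation just described, together with an assumed set of systems that are allowed to be final destinations; anything else with no outgoing arrow is reported.

```python
from typing import List, Set, Tuple

# Hypothetical arrows (source, target), loosely based on the description above.
ARROWS: List[Tuple[str, str]] = [
    ("EDC", "Data Extract"),
    ("Data Extract", "Blinded Data Review"),
    ("Data Extract", "Data Monitoring Committee"),
    ("Data Extract", "Statistics"),
    ("Statistics", "Clinical Study Report"),
]

# Systems that are expected to be final destinations (an assumption).
EXPECTED_END_POINTS: Set[str] = {"Clinical Study Report"}


def dead_ends(arrows: List[Tuple[str, str]], expected_end_points: Set[str]) -> List[str]:
    """List systems that receive data but have no outgoing arrow and are not expected end points."""
    sources = {source for source, _ in arrows}
    nodes = sources | {target for _, target in arrows}
    return sorted(n for n in nodes if n not in sources and n not in expected_end_points)


print(dead_ends(ARROWS, EXPECTED_END_POINTS))
# → ['Blinded Data Review', 'Data Monitoring Committee']
```

A system on this list is not necessarily wrong, but it does prompt the question of where its outputs, such as meeting minutes or decisions, are recorded.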


There is growing regulatory interest in all aspects of the data life-cycle and data integrity. Data-flow diagrams are a helpful tool for visualising the modern clinical trial and its many computer systems. Often, however, we focus on the boxes and not the arrows, assuming that the arrows are ‘only’ simple data transfers. As we have seen, this is not always the case: not all arrows are created equal.