Proficiency Testing and Interlaboratory Comparisons

02 February 2021

Article by James Zjalic (Verden Forensics) and Foclar

Why are they required?

As the probative value of forensic science is intrinsically linked to its reliability, it is essential that the performance of the tools used, the methods implemented and the application of both by examiners are consistent from case to case. Without assurance of these factors, how can the trier of fact be sure that no crucial information has been lost during an imagery enhancement? In the first instance, it must be demonstrated that the tools used are appropriate and function as expected. An examiner may be proficient, but if increasing the brightness by 5% using a fader produces a different change in brightness depending on the day or the format of the imagery being enhanced, the outcome may be unreliable. Confidence in the tools can be obtained through validation, a process which will be covered in a future article. In the second instance, the method itself must be reliable: the tool may function as expected and the examiner may be competent, but the ability to follow a method correctly is of no use if the method produces misleading, poor or unstable results. Finally, the examiner must be proficient in performing the method, which can be evidenced by taking part in some form of controlled assessment, or ‘proficiency test’ [1].

Proficiency Tests

Proficiency testing: The determination of the calibration or testing performance of a laboratory or the testing performance of an inspection body against pre-established criteria by means of interlaboratory comparisons [2].

The advantages of participation for the organisation include, but are not limited to:

  • Training of personnel;
  • Promoting baseline competency;
  • Improving laboratory practices;
  • Identifying risks;
  • Measuring error rates.
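
As an illustration of the last point, the sketch below estimates an error rate from a set of proficiency test results. It is a minimal example and not a prescribed method: the function name, the use of a Wilson score interval and the example counts are all assumptions made for illustration. The interval is included because proficiency tests usually involve few trials, so the observed rate alone can understate the uncertainty.

```python
import math


def error_rate_with_interval(errors: int, trials: int, z: float = 1.96):
    """Return (observed rate, lower bound, upper bound).

    Uses the Wilson score interval; z = 1.96 corresponds to roughly
    95% confidence. All names here are hypothetical.
    """
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    if not 0 <= errors <= trials:
        raise ValueError("errors must be between 0 and trials")

    p = errors / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point rounding.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical figures: 2 erroneous results in 40 test items.
    rate, low, high = error_rate_with_interval(errors=2, trials=40)
    print(f"observed error rate {rate:.3f}, interval {low:.3f} to {high:.3f}")
```

Note that zero errors in a small number of tests does not demonstrate a zero error rate: the upper bound of the interval remains well above zero until many more tests have been completed.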

The tests can be blind (where the source of the tested sample is not revealed until after the test, and the examiners are not aware they are being tested), or open (where the source is known, and the examiner understands they are being tested). They can also be internal, and thus developed by the organisation, or external, and developed by a party with no vested interest in the result. As blind tests mean the examiner performs as they would in a real case, and creation by an external organisation mitigates against bias, external blind tests are held in the highest regard. One way to implement a blind test is to submit it in the same way as casework, so that the examiner believes it to be a real case and acts in their normal manner.

Where performance is poor, the reasons must be investigated. Did the examiner deviate from the method and, if so, was the deviation accidental (due to a lack of competence) or deliberate [3]? Or did the examiner conform to a poor method? The results can then be used to determine whether the methods should be improved or abandoned, and to assess whether further staff training is necessary [4].

Interlaboratory Comparisons

Interlaboratory comparison: The organisation, performance and evaluation of calibration/tests on the same or similar items by two or more laboratories or inspection bodies in accordance with predetermined conditions [2].

These are often performed instead of formal proficiency tests for a number of reasons, including:

  • Lack of proficiency testing availability;
  • The unreasonable burden it would place on the laboratory;
  • The low number of laboratories within the sector.

An interlaboratory comparison (ILC) can involve a large number of laboratories, or only a few. When fewer than seven laboratories take part, the ILC is classed as a ‘small ILC’, which can make the results more difficult to interpret because so few are available [5].

Generally, for these types of comparisons there would be an organiser and a number of participants. The organiser takes the responsibility of creating the test and dataset, and then analysing and reporting the results. The participants are those who take part, which can include the organiser.

There are various factors to consider when developing an interlaboratory comparison exercise, such as the type of data (qualitative or quantitative), how the data is provided (in series, in a round-robin manner where each participant passes it on to the next, or simultaneously) and whether the tests will be continuous or one-off.

In imagery forensics, tests would generally be simultaneous, as digital data can be replicated as an exact bit stream copy for each participant. Quantitative tests are those for which a quantitative result is possible, such as rewrapping or conversion, whereas qualitative tests would be enhancements and morphological comparison exercises [6].
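
A simultaneous test relies on every participant receiving identical data. One way an organiser might confirm this is to compare a cryptographic hash of each distributed copy against the master file. The sketch below is only an illustration under that assumption: the function names and the file paths in the usage comment are hypothetical, and SHA-256 is chosen as an example rather than a requirement.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large video files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match_master(master_path, copy_paths):
    """Map each copy path to True if its hash equals the master's hash."""
    reference = sha256_of_file(master_path)
    return {str(path): sha256_of_file(path) == reference for path in copy_paths}


# Example usage (hypothetical paths):
# results = copies_match_master("master/test_clip.avi",
#                               ["lab_a/test_clip.avi", "lab_b/test_clip.avi"])
# Any False value indicates a copy that differs from the master.
```

Recording the master hash alongside the test instructions would also allow each participant to verify their own copy before starting.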

ILCs must always be performed using the laboratory's own documented methods and procedures, and any unexpected performances are classed as non-conformities [7]. As the test is a comparison, the goal is to determine the performance of the laboratory relative to other laboratories. ILCs do not, therefore, explicitly require known, expected outcomes (which are not possible in qualitative disciplines), but such outcomes are of benefit where suitable, such as in tests with quantitative outputs.
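
For quantitative outputs, relative performance is often summarised with a score that shows how far each laboratory sits from the consensus of the group. The sketch below is one possible approach and is not taken from the cited documents: it assumes a robust z-score based on the median and the scaled median absolute deviation, and uses the commonly applied thresholds of 2 and 3 to label results. The laboratory names and values are invented for illustration.

```python
import statistics


def robust_z_scores(results):
    """Return a robust z-score per laboratory.

    `results` maps a laboratory name to its reported value. The consensus
    is the median, and the spread is the median absolute deviation (MAD)
    scaled by 1.4826 so that it approximates a standard deviation for
    normally distributed data.
    """
    values = list(results.values())
    consensus = statistics.median(values)
    mad = statistics.median([abs(v - consensus) for v in values])
    spread = 1.4826 * mad
    if spread == 0:
        raise ValueError("no spread in results; z-scores are undefined")
    return {lab: (value - consensus) / spread for lab, value in results.items()}


def classify(z):
    """Label a z-score using thresholds of 2 and 3 (an assumed convention)."""
    if abs(z) <= 2:
        return "satisfactory"
    if abs(z) < 3:
        return "questionable"
    return "unsatisfactory"


if __name__ == "__main__":
    # Invented values for five hypothetical participants.
    reported = {"lab_a": 10.1, "lab_b": 9.9, "lab_c": 10.0,
                "lab_d": 10.2, "lab_e": 12.5}
    for lab, z in robust_z_scores(reported).items():
        print(f"{lab}: z = {z:+.2f} ({classify(z)})")
```

With only a handful of participants, as in a small ILC, the consensus value itself is uncertain, so scores like these should be read as a prompt for investigation and not as a verdict.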


Lay people place a strong weighting on expert opinion, so it is vital that it is as reliable as possible. To ensure reliability and improve procedures, a combination of validation, interlaboratory comparisons and proficiency tests should be implemented within the quality management systems of all forensic laboratories.


[1] J. J. Koehler, “Proficiency tests to estimate error rates in the forensic sciences,” Law Probab. Risk, vol. 12, no. 1, pp. 89–98, Mar. 2013.

[2] United Kingdom Accreditation Service, “TPS 47 - UKAS Policy on Participation in Proficiency Testing.” Feb. 04, 2020.

[3] R. Mejia, M. Cuellar, and J. Salyards, “Implementing blind proficiency testing in forensic laboratories: Motivation, obstacles, and recommendations,” Forensic Sci. Int. Synergy, 2020.

[4] W. E. Crozier, J. Kukucka, and B. L. Garrett, “Juror appraisals of forensic evidence: Effects of blind proficiency and cross-examination,” Forensic Sci. Int., vol. 315, 2020.

[5] European Accreditation, “Guidelines for the assessment of the appropriateness of small interlaboratory comparisons within the process of laboratory accreditation.” 2018.

[6] R. M. Voiculescu, M. C. Olteanu, and V. M. Nistor, “Design and Operation of an Interlaboratory Comparison Scheme.” Institute for Nuclear Research Pitesti, Romania, 2013.

[7] Forensic Science Regulator, “Codes of Practice and Conduct FSR-C-100,” no. 5, 2020.
