How the medium influences test development
In 2015, the Centre for Psychological Assessment (Thomas More University College, Department of Applied Psychology) published the Dutch Cognitive Ability Test (CoVaT-CHC), a paper-and-pencil assessment tool and the first cognitive ability test in Flanders to be based on the CHC model of intelligence. In 2017, the decision was made to adapt this test to a digital format, mainly to take advantage of the many benefits that digitization offers (e.g. instant and automated scoring, higher standardization, and adaptive testing).
Our aim was to retain as many of the test’s original features as possible. We wanted to migrate a test originally conceived for paper to a digital format while keeping the two versions as equivalent as possible, with little to no difference in performance or solution process.
For some subtests, the migration process was fairly easy. An example of this is the subtest “Opposites”, which is one of two subtests designed to measure crystallized intelligence. The purpose of this task is to select, from a given number of alternatives, the opposite of a given word.
In this case, the digital medium had very little influence on the way the subtest was translated. First of all, the digital medium doesn’t require item modification: the items as they are presented on paper can be presented on screen in exactly the same way. Secondly, the way the respondent answers in the digital format doesn’t differ greatly from the paper format: there’s very little difference between circling or underlining a word by hand and clicking on one.
In this case, the digital format didn’t influence test adaptation in any significant way. This subtest is therefore presumed to be equivalent across the two formats, paper and digital, both in terms of performance and solving process.
For other subtests, the migration process proved more difficult. An example of this is the subtest “Figure Sequences”, which is one of two subtests designed to measure fluid intelligence (reasoning). Each item consists of a sequence of figures. The purpose of this task is to complete the sequence based on a rule, which the respondent has to induce from the part of the sequence that has been given. In paper format, the respondent completes the sequence by drawing the rest of the sequence by hand.
This poses a first problem: how do we migrate a subtest that relies on free-form drawing to digital format?
From paper format to digital format
The first idea we had was perhaps the most obvious one: the use of graphic touchpads. We had a significant amount of data at our disposal: actual drawings from the paper-based test that we could scan and compile into a database. However, it proved quite difficult to write a scoring system that could automatically read, interpret and score these drawings. Not impossible, perhaps, but very challenging. Unfortunately, this also meant it wasn’t financially feasible for us to pursue and develop this idea further.
A second idea was to use a multiple choice format, which would eliminate the need for the respondent to draw. Many subtests that measure fluid intelligence use a multiple choice format; perhaps the best known example is the Raven PM. The multiple choice format aims to measure “convergent thinking”: the type of thinking that focuses on arriving at the single, well-established answer to a problem. It means eliminating options and working “towards” the right answer, where one and only one answer is correct.
“Divergent thinking” is the opposite, where different answers could be evaluated as “correct” in some form or another. A good example is Vocabulary from the Wechsler scales (or any Gc subtest, for that matter), where the respondent answers in a spontaneous, free-flowing manner and where many answers are generated and evaluated.
However, we wanted to migrate Figure Sequences to digital format without losing its measurement principles. In Figure Sequences, respondents literally have to construct their own answers, with some freedom of expression. Sequence completion also requires three answers per item, all of which are evaluated independently. In that sense, Figure Sequences offers a measure of divergent thinking. In developing our new subtest, we therefore decided against a multiple choice format, because it would not allow us to maintain this measurement principle.
The third idea was to keep the subtest basically as it is: the original items are presented on screen, but the respondent answers in classic paper-and-pencil form. The clinician would then score the responses manually and enter the scores. This method fails to deliver, however. First, by still relying on a paper-and-pencil response format, it would amount to only a partial migration, which defeats the purpose of our aim. Secondly, and perhaps more importantly, this method would not allow for adaptive testing.
Finally, we asked ourselves if there was a way for the original subtest to be translated to some form of “point-and-click” method. We eventually came up with the idea to use matrices.
On the one hand, this method allows for some degree of freedom, which is important in terms of divergent thinking. On the other hand, it also makes automated scoring possible. The combination of those two things meant that this was a method we felt we could work with.
- Item modification
Using the matrix method had a number of consequences, first and foremost in terms of item construction. All the original paper items had to be translated to this new format. Some items translated quite easily and well; others, however, could not be translated to matrix form, because the adaptation lost too much of the ‘original’ information.
Using matrices as a point-and-click method created another problem: during our pilot studies, we learned that clicking tiny boxes apparently is a lot less fun than drawing. A significant number of people actually found it frustrating, especially when parts of the sequence were repetitive. This meant that, in translating or creating items, we had to reduce the number of required clicks as much as possible and eliminate unnecessary repetitiveness.
- Item response format
With the free-space boxes of the paper format, the only relevant factor is the “form” of the drawings; the position of the drawing within the box is not relevant. In digital format, however, free space is replaced by a matrix, and position does become relevant. Answers now have to be correct both in terms of “form” and “position”. The respondent thus has less freedom of expression, which might influence the solving process.
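As a rough illustration of this double criterion (a hypothetical sketch in Python; the cell representation and function names are our own and not part of the CoVaT-CHC scoring system), a clicked pattern can be scored once for exact position and once for form alone, i.e. identity up to translation:

```python
# Hypothetical sketch: scoring a matrix ("point-and-click") response.
# A response is represented as the set of (row, col) cells the respondent clicked.

def normalize(cells):
    """Translate a pattern so its bounding box starts at (0, 0)."""
    min_r = min(r for r, c in cells)
    min_c = min(c for r, c in cells)
    return {(r - min_r, c - min_c) for r, c in cells}

def score(response, key):
    """Return (position_correct, form_correct) for one answer."""
    position_correct = response == key                     # form AND position match
    form_correct = normalize(response) == normalize(key)   # shape alone, ignoring placement
    return position_correct, form_correct

key = {(0, 0), (0, 1), (1, 1)}        # target pattern
shifted = {(2, 3), (2, 4), (3, 4)}    # same shape, different position

print(score(key, key))      # exact reproduction: correct on both criteria
print(score(shifted, key))  # would count as correct on paper (form), but not in the matrix
```

On paper, a scorer effectively checks only the `form_correct` half; the matrix format adds the stricter `position_correct` criterion on top of it.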
We know that in paper format, respondents sometimes rotate the paper to look at items from different angles, and this may help the examinee’s comprehension of the task at hand and may facilitate engaging a correct solving process. By presenting items on a screen, respondents are no longer able to do this. Other manipulations are made impossible too, such as drawing or highlighting certain elements. By using matrices, the respondent now has less freedom of manipulation. Because these kinds of subtests are also influenced by visual information processing, these small differences might hold important consequences.
From Figure Sequences to Pattern Sequences
As we’ve seen, migrating Figure Sequences to digital format led to significant differences in item construction. The original paper items were adapted to matrix format, which irrevocably changed the look and feel of the items. It also meant that some of the original items had to be removed and replaced by novel items. Besides item modification, we’ve also seen differences in terms of item response format. This is why we eventually considered this subtest to be an entirely new subtest, which we renamed Pattern Sequences.
How do the two subtests compare? For example, can we assume equivalence in terms of performance? Performance equivalence presupposes that respondents still implement the same solution process. At first glance, the solution process seems to remain the same: respondents still have to induce a rule from the part of the sequence that has been given, and then complete the sequence by constructing figures or patterns (either by drawing or by clicking).
However, we know from research that the item format can have a strong impact on solution processes and the item format has now changed significantly. Research also suggests that respondents’ attitudes have an effect on the solution process as well, and that attitudes seem to shift, at least in some cases, when there’s a migration from paper-and-pencil to digital environment.
Translating “Figure Sequences” to “Pattern Sequences” is a good illustration of the ways in which the specific medium determines, in large part, what form a test will take, and of how test migration is a balancing effort (What is our goal? What is technologically and financially feasible? How do we maintain our measurement principles?). Balancing those elements means that migrating a test from one format to another can lead you to places you didn’t originally expect, but that’s also what makes test development and adaptation exciting and fun.
“From paper-and-pencil to digital assessment: how the medium influences test development” was a contributing paper to the 15th European Conference on Psychological Assessment (2019), Brussels. All images used are the property of the Centre for Psychological Assessment and cannot be used without authorization.