Glenda Liell and Steve Sireci
Psychological testing

‘We now see tests being used far more often for positive purposes’

The British Psychological Society’s Committee on Test Standards Chair Glenda Liell talks to Professor Stephen Sireci, outgoing President of the International Test Commission.

23 January 2025

Your opening Presidential address at the International Test Commission conference in Granada last summer, 'The Failure and Future of Psychometrics', was a powerful one, with some important reminders of past failures.

As Bob Marley sang, 'In this great future, you can't forget your past'. I think it is important for us all to realise psychometrics grew out of the paradigm of experimental psychology and was a major force in 'proving' psychology was a legitimate science. For that reason, all measurement became heavily standardised, and the focus of all research was on the measures, rather than the people who were being measured. Today, we look at all the effects of educational and psychological testing and we realise tests have serious consequences – both positive and negative – on individuals and society. Thus, the purpose of my talk was to encourage people to focus on how tests are being developed and used; by thinking first about the individuals affected by tests, we will improve test design and validity.

As for past failures, the dark side of educational and psychological testing is rooted in the eugenics movement. Test scores have been used to argue some races are intellectually superior to others. Today we know such differences are essentially a consequence of how the tests were developed. As Cronbach and Meehl stated 70 years ago, constructs are postulated attributes that we made up (i.e. constructed). We cannot pretend test scores that are used in studies of intellectual differences are absolute measures of intelligence. The ITC audience knows very well that such measures are culturally laden, and in the USA, white-dominant perspectives of how to define and measure intelligence have helped support systematic oppression of historically minoritised groups. That is just one reason the American Psychological Association issued an apology 'to People of Color for APA's Role in Promoting, Perpetuating, and Failing to Challenge Racism, Racial Discrimination, and Human Hierarchy in U.S.' in 2021. In that apology, tests were explicitly mentioned eight times as causes of harm.

On the positive side, we now see tests being used far more often for positive purposes such as helping students learn, and the field of culturally responsive assessment is also enormously encouraging. It was great to see so many sessions on these topics at the ITC Conference in Granada.

Arguably the most important aspect of testing and responsible test use is the consideration of both the intended and potential unintended consequences. What are some examples which really demonstrate this point?

One of the examples I gave in my Presidential address was the test used for admissions to the competitive high schools in New York City. New York City uses this norm-referenced test as the sole criterion for admissions and it has severe adverse impact. For example, for an entering class of 1,000 students in one of these competitive high schools, only 10 students were African American. That is 1 per cent of the accepted student population, but about 22 per cent of the population of students in New York City is African American. In my opinion, it doesn't matter how many validity studies were conducted to support that test if it is preventing members of an entire group from accessing high-quality education. And the sad reality is there is not a wide body of evidence to support the use of the test for that purpose.

It's good to hear that recap of some of those historical events which have marred testing. They serve as a good warning!

Psychological testing can come across as a bit 'niche' and stats-heavy – but people need to connect more with testing being a real thing, which psychologists do a lot of, with real consequences. Do you find people are interested in testing in the US, and what do you do to keep the conversation live?

The days of the general public trusting educational and psychological tests are over. Tests should be accountable and demonstrate they are achieving their intended purposes while not causing unintended harm. In the USA, tests are becoming increasingly scrutinised. For example, the 'opt-out' movement sprang up in communities across the USA where parents chose to excuse their children from mandated testing in the schools. Movies like Try Harder and Persona have illustrated the damage that can be done from testing, and tests are a major part of conversations surrounding equity and access in university admissions.

Thus, there is not much we need to do to keep testing in the public conversation, but there is much to do to improve testing for the public good. I think the call for accountability in testing, rather than testing for accountability, is overdue and encourages us to actually do the research professional guidelines in testing call us to do.

CTS is in the process of reviewing its testing standards, which will continue to be aligned with the European Federation of Psychologists' Associations (EFPA). Given your previous points about consequences, what would you say to psychologists in the UK who have not undertaken formal training in psychometric test use?

I don't think one needs formal training in psychometrics to understand we need to evaluate the consequences of testing as part of evaluating the use of a test for a particular purpose. First of all, tests are designed for specific purposes – that is to have one or more intended consequences. Are these consequences, or benefits, being fulfilled by the testing program? And we clearly know unintended negative consequences can occur, such as the adverse impact in testing we discussed earlier. Studying testing consequences is similar to evaluating a program. So, I say to my colleagues in the UK and elsewhere, be sure to study what the intended purpose of a testing program is, look for validity evidence that purpose is being fulfilled, and for any evidence of negative repercussions from use of the test.

You also covered Fairness in Testing and Measurement Justice. What key points did you want to raise about this?

Fairness sounds like a simple concept, but it can be envisioned and defined in different ways. With respect to testing, I like the definition provided in the ITC/ATP Guidelines for Technology-Based Assessments, which stated, 'fairness in testing requires test developers to consider the wide diversity of needs and potential inequities within the tested population in all aspects of testing (e.g., test development, developing test preparation materials, test administration, scoring, etc.)' (p.7). This view of fairness requires us to evaluate the reasons for testing, who defined the construct to be measured, and considerations of how measurement of that construct might interact with the variety of personal characteristics within the populations of people to be tested. Measurement justice requires us to think beyond the dominant culture when designing and developing tests, and to conduct research on the population to be tested to understand how diversity within it will need to be addressed in all aspects of testing from construct definition to scoring and score reporting. I am greatly influenced by the work of Jennifer Randall in this area – she gave a terrific keynote on culturally responsive assessment at the ITC conference in Granada.

Do you see the development of psychometric testing being able to keep pace with practitioners focusing more on individualised approaches so they can be more culturally responsive?

Absolutely. I think personalised assessment is an important evolution in standardised testing and will be the predominant test design in the foreseeable future. For over 100 years we have tried to develop the best test we can for most people, knowing that we cannot design a single test that is best for all people. Personalised assessment requires us to try to assemble the best test for each test taker. I use the word 'assemble' because the testing goal changes from developing a test to developing a testing system that can assemble or compile the best instantiation of the test for each individual, subject to content, linguistic complexity, and other constraints.

What do you see being the future for psychometric testing? Will the increased use of technology remove useful information gleaned from in-person testing?

I think the future is here – personalised assessment made possible through the use of technology. In the 20th century, we were adapting tests based on item difficulty and test taker proficiency. Now, we are expanding that to include adaptation based on test taker characteristics and test taker choices. Technology can be used to develop and deliver variations of content best suited to the test taker, quantify the similarity of that content for scaling and comparability purposes, and provide feedback to test takers to make test results more actionable.

I came along to the ITC conference with my Committee on Test Standards colleagues Nigel Evans and Charlie Eyre. The British Psychological Society are Gold Sponsors of the event. We have long had BPS representation at ITC… what do you see as the key benefits of our mutual engagement?

The ITC was founded by national and international associations that strive to evaluate and produce quality educational and psychological assessments, and support research aimed at improving testing practices broadly defined. The ITC's mission, 'to promote fair, valid, transparent, and efficient testing, assessment, and reporting practices, guidelines, and policies to benefit individuals, institutions, and societies throughout the world', is consistent with the BPS goal of 'promoting excellence in the ethical practice of psychology across science, education and in real life practical situations'.

Through the ITC, BPS members can network with colleagues from professional psychological associations in other countries to share ideas, collaborate, and make connections. BPS is a key member of the ITC and has made many contributions to it over the years. For example, the ITC has been indebted to BPS Past-President Nicky Hayes, who has not only been on the ITC Council for several years but has also been the editor for the Testing International Newsletter for six years. Collaboration between BPS and ITC remains strong. In a sense, the ITC can be viewed as a key resource for BPS on all issues, research, and practices related to testing; and BPS is a key resource for ITC in reaching and interacting with the measurement community in Great Britain.

The theme for this year's conference was 'Working together to improve cross-cultural assessment and research'. How was this theme chosen, and what were the overall aims of the conference?

We were so lucky to have an incredibly talented and dedicated team of conference co-chairs from the University of Granada, Spain. Professors Isabel Benitez Baena, Jose-Luis Padilla, and Luisma Lozano came up with the conference theme, and it really resonated with the international assessment community. The proposals we received focused on that theme and the sessions were super informative with respect to improving cross-cultural assessment and research. The aims of the conference were to re-convene and re-invigorate the international assessment community, since this was our first conference since 2018 (thanks to Covid); and also, to use the ITC community to improve testing research and practices worldwide.

Was it a sad moment stepping down as Chair of ITC? What do you feel the committee has achieved during your term?

It was not particularly sad because the current leadership of the ITC is strong. During the past two years when I was President, we were able to work with the Association of Test Publishers to develop the ITC/ATP Guidelines for Technology-Based Assessments; revise our mission and vision statements; host webinars in Brazil, El Salvador, Indonesia, and Italy; and pull off one of the greatest conferences in the history of educational and psychological measurement. It is incredible what the ITC has been able to do, using only the generosity of time committed by our members!

Going back to the conference, what are your reflections on whether the aims were met, and what are the plans looking like for New Zealand in 2026?

ITC conferences are the best conferences I have attended in my career. To meet people from across the globe who have similar interests to mine, and who handle the same measurement problems in different and innovative ways, is inspiring. If it were not for the ITC Conference, I would not be doing this interview! I expect New Zealand will be equally incredible and I look forward to seeing a large contingent of my BPS colleagues there!