How Public Safety Assessments are Affecting Judges’ Pretrial Decision-Making

The preliminary findings are part of the A2J Lab’s PSA research in six courtrooms throughout the U.S.

By Matt Keyser
National Partnership for Pretrial Justice
Sept. 29, 2022

In courtrooms throughout the United States, judges utilize a pretrial assessment tool known as the PSA to help predict the outcome of a person’s pretrial release.

The instrument uses an algorithm that factors in the person’s age and criminal history and provides three weighted scores representing the risk that a person will fail to appear at a future court date, commit a new crime while released, or commit a new violent crime while released. A body of evidence shows that the vast majority of people released pretrial appear for court and remain arrest-free during the pretrial period.

Although the tool provides judges with scores, it is each jurisdiction’s responsibility to translate those scores into a standard release recommendation, and the judge ultimately decides whether to release or detain the individual.

In a groundbreaking study, researchers at the A2J Lab at Harvard University are conducting six separate randomized evaluations in courtrooms across the U.S. to determine what impact, if any, the algorithmic recommendations have on human decision-making.

Researchers have released preliminary findings on some of the data they’ve analyzed in Dane County, Wis. We spoke with Dr. James Greiner and Dr. Kosuke Imai about those findings.

This interview has been edited for length and clarity.

NPPJ: These public safety assessments are being used in courtrooms throughout the U.S. How are they impacting judges’ pretrial decisions?

Kosuke Imai: The broader question is how the algorithm should be used in public policy, and how those algorithmic recommendations affect public policy decisions and people’s lives. It is important to recognize that in most high-stakes situations, like judicial decisions, humans are the final decision-makers and that we haven’t outsourced those decisions to an algorithm.

Thus, we must understand how algorithmic recommendations interact with human decision-making. An algorithm may be biased, but we also know that humans are biased. The question is how those biases interact with one another. That’s a really important question in the context of the criminal justice system because the goal of the pretrial risk assessment instrument is to help human judges make better decisions.

NPPJ: Right. In the criminal justice system, those decisions have a very tangible effect on people’s lives, and understanding how those decisions are made with the help of an algorithmic tool like the PSA seems vitally important. In what ways has your research shown the PSA is affecting pretrial decision-making?

James Greiner: Our research right now is based on one location — Dane County, Wis. — and only about 20 percent of the total information from there. We have found so far that folks in the criminal justice system, the stakeholders, generally like having the risk assessment present. But so far it doesn’t appear that the availability vs. the non-availability of the risk assessment instrument changes anything.

So far, the risk assessment instrument doesn’t appear to produce better criminal justice outcomes or worse criminal justice outcomes on anything that we’ve looked at. There are two small exceptions to that. First, there may be a very, very mild increase in failure to appear associated with a risk assessment instrument, and that is in some way associated with slightly greater use of release. Second, bond amounts don’t change on average but appear to become more similar across cases.

NPPJ: Generally, what do judges think of the risk assessment instrument?

JG: It varies by location. Most of the judges — but not all — seem to like it. A vocal minority despise it. The bail bond industry generally despises it because they think that the use of the risk assessment instrument, if successful, will result in less reliance on their products.

NPPJ: What has your research shown thus far about gender or racial disparities, if any, with the pretrial risk assessment?

JG: There are slight differences showing up between men and women. Men and women have always been treated differently in the criminal justice system. That is deeply rooted. And the disparities in the way men and women are treated are large. It may be that the risk assessment instrument, based on that one site and without all the data analyzed, is causing a slight increase in the difference between how men and women are treated. And it’s very slight.

There appear to be no differences at all between having a risk assessment instrument and not having one with respect to measures of racial disparity. So many people have alleged publicly that having a risk assessment instrument will worsen racial disparities that already exist in the criminal justice system, and so far the evidence does not in any way support that assertion.

Again, these are very preliminary findings with less than half the data from one of the six sites where we’re conducting our research.

NPPJ: Kosuke, you mentioned claims that the algorithm is biased. In what ways, and what has your research shown about any potential biases?

KI: In our study, we have the treatment group and the control group of cases divided equally. In the treatment group, the judge sees the PSA; in the control group, the judge doesn’t. When we look at the control group, the decisions that the judge ends up making are actually fairly well correlated with the PSA recommendation that’s produced by the instrument. That’s part of the reason, perhaps, that we don’t see a huge effect of the provision of algorithmic recommendation in this case: even without that information, the judge is already making a decision similar to the recommendation the algorithm produces.

So when we talk about algorithmic bias, what we are seeing is that the recommendation doesn’t have a huge impact on the judge’s decision, at least with the data we’ve been able to analyze so far. And even with the absence of a recommendation, the judge’s decision seems to be highly correlated with the algorithmic recommendation.

One thing we have found is that both the algorithmic recommendation and the human decision seem to be much more severe than necessary. Now, I have to be careful because how do you define what’s necessary? That depends on how much you weigh the cost of detaining an arrestee who may be innocent against the cost of potential new crime to society. The vast majority of arrestees don’t go on to commit violent crimes if they are released.

NPPJ: I’m surprised there hasn’t been more in-depth research like you all are conducting considering how widely the instrument is being used in the nation’s courtrooms.

JG: It’s not just courtrooms, but a lot of places in the criminal justice system. They’re obviously used at initial release decisions. Sometimes they’re used for a police decision about whether to stop someone on the street and search them. Prosecutors can use them. I don’t know whether many are yet, but I think their use in charging decisions is coming. Judges certainly use them at sentencing. Parole boards use them when making parole decisions.

As far as we know, this is one of the first studies that is conducting a randomized evaluation of them.

Having said that, for anyone interested in finding out whether these sorts of instruments make a difference, we ask for a little patience. The unfortunate thing about credible scientific information is that it is slow to arrive; you have to invest time in generating it.

We’re almost there — almost in the sense of about a year to 18 months away from being able to share complete findings from more than one location. But we’re not quite there yet. All that’s posted right now is preliminary data on one of six locations. That’s all we have available right now.
