A little over two years ago, I dropped a letter in the mail.
I had begun to wonder, after a series of high-profile criminal cases had ended in acquittals earlier that year – Gerald Stanley and Raymond Cormier’s trials, specifically – if I could collect any data on the racial composition of juries. I shot off a few e-mails to lawyers and activists, and quickly learned this data likely didn’t exist.
But, as part of my poking around, I realized I might be able to look at something else: sentencing. I figured sentences must be tracked in a structured way by correctional authorities; if not, they wouldn’t know when to release inmates. Given the overrepresentation of Indigenous people in the correctional system, it’d be worthwhile to examine sentencing data by race – so I pivoted, from jury composition to sentencing.
Though freedom of information requests are often a shot in the dark, I typed up a letter asking for 20 years of records from the Correctional Service of Canada’s (CSC) database, which I’d learned about by e-mailing yet another set of people. On Aug. 30, 2018, I mailed the CSC my request, and then almost immediately forgot about it.
Weeks later, I heard back from the CSC’s freedom of information officers, and began a months-long negotiation for release of the data I’d requested. In April, 2019, I finally received a CD in the mail, and went to open the spreadsheet it contained.
Microsoft Excel booted up, and then immediately crashed. The spreadsheet the CSC had disclosed was 185 megabytes. In my hands, I realized, was an enormous data set unlike any I’d ever worked with, recording the lives of nearly 50,000 people in the CSC’s custody between 2012 and 2018. I used a statistical programming language called R to open the file, and began digging around.
It often takes me a while to become “comfortable” with new data, and that was especially true now, given this file’s size. I had no idea what kinds of patterns it contained, or how best to summarize it. In my mind, I often picture this phase in any analysis as the point at which I “crawl inside” a data set.
To start, I blindly summarized it, curious to see what it’d tell me. I figured I needed a second opinion, so I sat down with Patrick White, a colleague who’d extensively covered the federal prison system, and showed him some of the charts I’d cooked up. “There’s almost too much interesting stuff in here,” I told him, “and I’m not sure where to start.”
After spending some time with the materials I’d pulled together, he asked simply: “What about these risk assessments?”
From there, exploring the data became easier, and I quickly uncovered some disturbing patterns. Indigenous and Black people seemed to be receiving worse scores across a range of assessments much more frequently than other groups. Two in particular, the “offender security level” and “reintegration potential” scores, sounded especially important. But I had no idea what these scores were supposed to represent, how they were calculated or what impact they had on an inmate’s time in prison.
By now, it was December, 2019, and I began reaching out to anyone I knew who could tell me whether – and how – these scores mattered. At the end of each call, I’d ask them if there was anyone else they’d recommend I speak with. Over a period of 10 months, my network grew from a small handful to nearly 70 people.
With each conversation, I tightened my methodology and honed my analysis. Eventually, after being inspired by a U.S. news outlet’s investigation into risk assessments, I realized I needed to isolate the impact of race from everything else using statistical modelling. So once again I went back to my Rolodex, e-mailing academics, statisticians and data scientists who could help me. Over the winter, spring and summer, guided by their advice, I built the kinds of statistical models I’d need for the analysis.
As I was doing that, I also began looking for inmates who could tell me about their experiences. Finding people who’d speak with me wasn’t easy, given I was looking at something as arcane and specific as risk assessments. I met Nick Nootchtai, for instance, after e-mailing a contact. They put me in touch with someone, who led me to someone else, who finally told me they knew of a person I might want to speak with. The first time I met Nick, at a Tim Hortons in downtown Toronto, he handed me a plastic bag full of his correctional records, which shed light on the process and made it clear how critical these scores were.
My model’s findings were damning – so much so that I spent months trying to find an error in my code that could account for the discrepancies. When dealing with data-driven stories of this size, The Globe has a process for independently verifying findings. This meant handing over my entire analysis to a fellow data journalist, Chen Wang, and a Globe data scientist, Jeremy Gray. Our head of visuals, Matt Frehner, served as a sounding board for the investigation’s major findings.
I began to report the story in earnest in the spring, but the COVID-19 pandemic quickly became a priority. Months later, I was able to return to it, interviewing dozens of academics and experts in the field. Occasionally, I’d get a phone call from an unknown number – an inmate calling me from behind bars.
According to the CSC’s data, in 2018 an average of two inmates each day started serving a two-year sentence. That’s the threshold for sending them to federal prison, where risk assessments undoubtedly left their mark. The odds are good, then, that someone went to prison the same day I dropped my letter in the mail. Today, they are walking free.
Their experiences, along with those of so many others, are what you see today.