What online comments can reveal about the person behind the keyboard

In this series, we explore how our online identities intersect with who we really are.

Justin Cheng has pored over thousands of profanity-riddled online comments, including many he wishes he had never read.

On websites like CNN, Breitbart and gamer-focused IGN, the Stanford PhD student has encountered xenophobic remarks ("you get out of MY country, you f***ing a*******"), racially charged complaints ("Every single touchy feely story is about a black basketball player............YOU GUYS MAKE ME SO SICK AS A READER !") and unintelligible diatribes ("Anybody can get away with anything, side with corrupt! No respect in our country, no values, liars worshipped, even pretty girls making hip hop sounds, many entertainers humping all over the stage, pants on the ground...").

Cheng thinks patterns in the messages – such as readability, frequency of swearing and tendency to veer off topic – offer clues to "who's behind this bad behaviour." And Cheng, whose fellowship is sponsored by Microsoft, isn't the only one who believes our personalities, mental states, and even physical health are reflected in the language we use online.

It turns out, the comments we make online reveal a lot about us. Researchers are now analyzing online comments for a wide array of predictive patterns and signals, using Internet discussions and social media as sources of constant, easy-to-access information about what's going on in people's lives.

Their efforts may eventually allow health professionals to monitor patients' well-being based on their Twitter streams and Facebook entries. Controversially, employers or insurance companies could one day screen job applicants and potential clients based on their social media status updates.

Data sources such as Twitter have surfaced just as new techniques have emerged in specialized fields of computer science, such as natural language processing, which involves translating between human and computer languages, and computational linguistics, which uses computers to analyze human language, says Gregory Park, a software developer in Columbus, Ohio.

"It's led to this whole new field of research," says Dr. Park, who has co-authored multiple studies examining people's personalities through their language on social media.

Dr. Park explains that frequent use of "negative emotion" words, such as "hate," swear words or mentions of violence, "tend to be good indicators of lower emotional stability... higher levels of baseline anger and anxiety and higher levels of depression, higher levels of stress."
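To make that idea concrete, here is a minimal sketch of how the share of "negative emotion" words in a comment might be counted. The word list and the function name are hypothetical and chosen only for illustration; they are not the lexicon or the models used in Dr. Park's studies.

```python
import re

# Hypothetical, deliberately tiny word list for illustration only.
# It is NOT the lexicon used in the studies described in this article.
NEGATIVE_EMOTION_WORDS = {"hate", "angry", "furious", "kill", "sick", "awful"}


def negative_emotion_rate(text: str) -> float:
    """Return the share of words in `text` that appear in the word list."""
    # Lowercase the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in NEGATIVE_EMOTION_WORDS)
    return hits / len(words)
```

A real system would need a far larger, validated vocabulary and would combine many such signals rather than rely on a single rate.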

He has conducted studies using people's survey responses and comments on social media to develop computer models that predict users' personalities based on their language.

In one study, for which Dr. Park was a collaborator, data scientists were able to develop algorithms that could predict participants' so-called "dark triad" personality traits, or their levels of narcissism, Machiavellianism and psychopathy, based on their Twitter histories. (Dr. Park notes this was a "proof-of-concept" study, conducted simply to determine whether it was possible to build such algorithms. While he says the algorithms would need to be refined before they were put to use in real life, the study showed the algorithms did, in fact, work.)

Much of the research in this area is focused on using mass collections of online comments to learn about the health and psychology of entire communities and regions, Dr. Park says, explaining that computer algorithms may one day allow researchers to use social media comments in place of expensive, time-consuming polls and surveys to learn about large groups.

But another, less-developed, area of study focuses on the individual level, he says. Researchers are searching for ways in which individuals' social media comments may predict their moods, personalities or various aspects of their health. A big challenge, however, is that to identify predictive patterns and signals, researchers must be able to compare what individuals say on social media with what is actually happening in their lives, Dr. Park says.

One application of this research that has drawn great interest is monitoring the mental health of consenting patients, says Dr. Andrew Schwartz, an assistant professor of computer science at Stony Brook University in New York State. Mental-health experts often note they lack the time and resources needed to meet the demand for mental-health care, he says.

While social media comments would not likely be used to make diagnoses or to determine individuals' treatments, they could provide a window into patients' mental health in between treatments, and offer information about patients' well-being that is not otherwise easily accessible. "This would be a way for a clinician or a therapist to get a clearer picture of what's going on in a patient's life," Schwartz says.

Schwartz believes that at the community level, the analysis of online comments can help public-health officials tailor their interventions to specific groups. He and his fellow researchers have found, for instance, that it is possible to predict a community's heart-disease mortality rates by using computer algorithms to analyze the language of the Twitter posts of people from that community.

"Basically, the information on Twitter is more predictive than knowing the smoking rates, the obesity rates, the demographics, the income, the education of the community," Schwartz says.

These tell-tale linguistic features include higher use of hateful language, profanity, and mentions of disliking others, he says. But it's not necessarily the people who tweet in this manner who have high rates of heart disease. Rather, Schwartz explains, these individuals act as "canaries" or signals of the type of community in which they live.

In other words, he says, "If your neighbours are hateful, you've got a higher likelihood of dying from heart disease."

Schwartz is now studying whether social media comments can predict other top causes of death in the U.S., including cancer, strokes, accidents and suicide. To establish these correlations, he uses Twitter comments and data from the Centers for Disease Control and Prevention for various counties in the U.S.
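As a rough illustration of what establishing such a correlation involves, the sketch below computes a Pearson correlation coefficient between two county-level lists, for example a language measure and a mortality rate. This is a simplified stand-in, not Schwartz's actual method, which the article describes only as computer algorithms; the function and parameter names are hypothetical.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    In this illustration, `xs` might hold one language measure per county
    and `ys` the matching mortality rate per county (both hypothetical).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would mean the two measures rise together across counties, and a value near -1 would mean one falls as the other rises. As Schwartz notes, such a relationship describes communities, not the individuals doing the tweeting.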

The idea of developing computer algorithms to gauge people's mental and physical health via social media has raised some concerns about how this technology could be used, Dr. Park says. Many people are already aware that what they say online may affect their reputations. But some worry about the potential for users, such as employers or insurance companies, to make automated assessments of individuals' personalities and health statuses, and deny people work or health insurance based solely on their online comments.

For individuals concerned about online privacy, "is this an area that we should be considering a little more closely?" Dr. Park says. "Because maybe we're giving away information that's more revealing than we thought."

Indeed, even without the use of computer algorithms, a lot can be gleaned from how and what people write online.

Dr. Daniel Jones at the University of Texas at El Paso believes some people with psychopathic traits have a distinct way with words. Previous research has suggested psychopathic individuals tend to have difficulty with attention, says Jones, an assistant professor of psychology. They often zero in on a goal, but fail to notice other things, like threats in the environment or emotional stimuli.

This tendency may be reflected in online messages that are confusing and difficult to read, or what Jones describes as lacking in "narrative coherence." It's a characteristic separate from grammar or vocabulary. Rather, he explains, one idea doesn't follow the next. Jones points to the unintelligible diatribe Cheng encountered as an example ("Anybody can get away with anything, side with corrupt! No respect in our country, no values, liars worshipped, even pretty girls making hip hop sounds, many entertainers humping all over the stage, pants on the ground...").

While Jones says one must be careful not to label someone a psychopath without a proper diagnosis, it's possible to identify general language patterns. People with psychopathic traits tend to get so "locked in" on their goal of writing a message, he says, "they don't think about the substance or the style of what they're trying to say."

Jones is interested in studying how the online language of psychopaths differs from that of people who have other dark triad personality traits. He believes, for instance, that psychopaths, who tend to be impulsive, may have different motives than sadists when writing anti-social comments online. He suggests they may be more likely to threaten and intimidate rather than relish other people's emotional responses. As such, they act unlike typical Internet trolls, or individuals who deliberately try to get a rise out of others.

"I doubt a psychopath is going to sit down really think [it] through when they write a really nasty thread or a blog or an e-mail," Jones says. "They're just going to write and fire it off. A sadist may take more time in crafting the message because they really want to savour…the process and to see the [recipient] agitated."

Back at Stanford, Cheng has found there are certain measures online communities can take to deter anti-social behaviour and encourage civil discussions.

Computer algorithms can help website administrators identify and remove inappropriate comments. Moving comment boxes to the bottom of web pages, below published comments, gives users an incentive to read what others have written before crafting their own comments, which may encourage them to make more thoughtful entries, he says.

And even including a note to warn commenters to be civil helps send the message that a particular website does not tolerate anti-social behaviour.

But as interested as he is in studying the language of anti-social users, Cheng says he doesn't respond when they've directed their insults at him.

"My strategy right now is not to engage with them," he says, repeating an adage of online communities: "Just don't feed the trolls."