Hello, friends!  Welcome to Sciency Words, a special series here on Planet Pailly where we talk about science and science-related terminology.  Today on Sciency Words, we’re talking about:

DATA

Okay, are you supposed to say “this data” or “these data”?  Are you supposed to say “the data is” or “the data are”?  In other words, are you supposed to treat “data” like a singular or plural noun?

Well, before I answer those questions, let me tell you something about English grammar that you already know, even if you don’t know that you know it.  English makes a grammatical distinction between count nouns (like shoe, child, or cactus) and mass nouns (like corn, furniture, or homework).

Count nouns have different singular and plural forms (shoe/shoes, child/children, cactus/cacti).  Mass nouns do not.  When was the last time you heard someone walk into a room and say, “Oh, look at all these furnitures”?

Traditionally, “data” has been treated as a count noun, with “datum” as the singular form and “data” as the plural.  This is consistent with the word’s Latin origin.  In Latin, datum meant something like “a thing that is given,” so data would mean “things that are given.”

But of course, Latin is a dead language; English is still living, and in living languages words change.  Right now, “data” is in the process of changing from a count noun to a mass noun.  If I had to guess, I’d point the finger at personal computers for causing this change.  I imagine “datum” and “data” were once rather esoteric, rather academic words.  Then personal computers put the word “data” into the vocabulary of the masses, but not the word “datum.”

I mean, given how much (how many?) data computers process, how often would anyone need to talk about a single datum?  In our daily experience, a single bit of data is akin to a single grain of sand.  And so, much like the word “sand,” many of us have started treating “data” as a mass noun.  Those who still use “data” as a count noun are in the minority.

A few years ago, the statistics blog FiveThirtyEight conducted a survey asking, among other things, whether people preferred “the data is” or “the data are.”  As FiveThirtyEight explains:

To those who prefer the plural, I’ll put this in your terms: The data are pretty conclusive that the vast majority of respondents think we should say “data is.”  The singular crowd won by 58 percentage-points, with 79 percent of respondents liking “data is” to 21 percent preferring “data are.”

There are still some contexts where saying “this data” or “the data is” would be frowned upon.  Basically, the more academic a setting you’re in, the more countable (and less mass-able) your data should be.  That said, I’ve noticed that even the most persnickety academics are more likely to talk about a singular “data point” than to use the word “datum.”

Of course, none of this matters if you’re talking about Lieutenant Commander Data, the android from Star Trek: The Next Generation.  In that context, Data is a proper noun, and therefore a countable noun, and therefore:
