Taxonomy: Big words, big results

You have a pot of information in front of you. A dataset, maybe. Or, more likely, a lot of data, not really in sets. You have to make sense out of it in order to make something out of it.

This isn’t just about data for databases (in fact, if you have a database, you already have basic groupings, by association). It applies just as much to crisis management, when data comes in from multiple sources at once and you have to organize it before you can see the whole field. It might be quantitative. It might be qualitative. No matter what, you have to figure out how to organize it.

Which brings us to the most awesome thing about data: taxonomies. Strictly speaking, a taxonomy is how data is classified and associated (red shirts vs. blue shirts, or shirts vs. pants; in a crisis management context, it might be “what I know as fact vs. what’s in Internet rumors,” or “medical reports vs. police reports”). When you give a collection of data points a taxonomy, you give them a common home.

One of the challenges of creating a taxonomy, though, is defining its rules. A database is only as good as its data structure and its contents. Lots of attention is paid to data quality within the cells of a database, but less is paid to the overall organization of the dataset. If the structure isn’t well defined and the rules aren’t thought through, the data gets messy and consistent input becomes hard to achieve.
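To make the point about structure concrete, here is a minimal sketch, assuming a taxonomy can be treated as a fixed set of allowed categories. It reuses the crisis-management groupings from above; the `ReportType` categories and the `file_report` helper are hypothetical names for illustration, not part of any real system. Anything that doesn’t fit the taxonomy is rejected at the door instead of quietly becoming a new, messy category.

```python
from enum import Enum


class ReportType(Enum):
    """Hypothetical taxonomy for incoming crisis reports."""
    MEDICAL = "medical"
    POLICE = "police"
    CONFIRMED = "confirmed fact"
    RUMOR = "internet rumor"


def file_report(raw_category: str, text: str) -> tuple[ReportType, str]:
    """Normalize the category label, then reject anything outside the taxonomy."""
    key = raw_category.strip().lower()  # " Medical " and "medical" mean the same thing
    try:
        category = ReportType(key)      # Enum lookup by value; raises ValueError if unknown
    except ValueError:
        allowed = ", ".join(t.value for t in ReportType)
        raise ValueError(
            f"unknown category {raw_category!r}; expected one of: {allowed}"
        ) from None
    return category, text
```

With this in place, `file_report("  Medical ", "Two injured at the north gate")` files the report under `ReportType.MEDICAL`, while `file_report("gossip", "...")` raises an error listing the allowed categories.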

Here’s an example of rules done right. It comes from Fringe Focus, a Chicago graphic artist who managed to sell 3,400 posters of all the ACME products from the old Wile E. Coyote cartoons in a Kickstarter campaign. In a post to his backers, he discussed his rules:

Second: Product choices

  • Any object that officially said ACME on it. Obviously.

  • Any product whose box, wrapping, or label said ACME on it.

  • Any product that appeared on an invoice, shipping manifest, slip of paper etc. that said ACME on it.

  • Labeled products that clearly arrived with other ACME orders. It can be assumed these were from ACME as well, even if their box did not read ACME in the name.

  • Any product with a named title on its box or label. Coyote ordered 100% of his items from ACME, so if it had an official name or box I just included it as an ACME item.

  • All books. The books accompanied or preceded ACME purchases, thus are assumed to be published by ACME as well.
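Rules written this explicitly translate almost directly into code. The sketch below is a hypothetical Python encoding of the rules above, not anything Fringe Focus actually used; the `Item` fields and the `acme_reason` function are assumptions chosen to mirror each bullet. The rules are checked in order, and the first one that matches is returned as the reason an item counts as ACME.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical record for one object seen in the cartoons."""
    name: str
    says_acme: bool = False                # the object itself reads "ACME"
    packaging_says_acme: bool = False      # box, wrapping, or label reads "ACME"
    on_acme_paperwork: bool = False        # invoice, manifest, or slip reads "ACME"
    labeled: bool = False                  # has a label or named title on its box
    arrived_with_acme_order: bool = False  # clearly shipped alongside ACME goods
    is_book: bool = False


# Each rule pairs a short reason with a test; the order mirrors the list above.
RULES = [
    ("object says ACME", lambda i: i.says_acme),
    ("packaging says ACME", lambda i: i.packaging_says_acme),
    ("on ACME paperwork", lambda i: i.on_acme_paperwork),
    ("labeled and shipped with ACME order", lambda i: i.labeled and i.arrived_with_acme_order),
    ("named title on box or label", lambda i: i.labeled),
    ("book", lambda i: i.is_book),
]


def acme_reason(item: Item) -> Optional[str]:
    """Return the first rule that classifies the item as ACME, or None."""
    for reason, test in RULES:
        if test(item):
            return reason
    return None
```

For example, `acme_reason(Item("Rocket skates", packaging_says_acme=True))` returns `"packaging says ACME"`, while an unlabeled rock returns `None`. Returning which rule fired, rather than a bare yes or no, makes borderline calls easy to audit later.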

Fringe Focus (real name: Rob Loukotka) does a fabulous job here of explaining his parameters. With clearly defined rules, input becomes simpler and more streamlined, and it stands a much better chance of being clean.