The Categorical Imperative: Dr. Katya Steiner on Math, Logic, and the Joy of Weird Data Viz
# The Categorical Imperative: A Math-Soaked Love Letter to Absurd Data Viz
If you grew up making mixtapes and drawing Venn diagrams on napkins, you’re already guilty of the right crimes: curiosity and low-budget aesthetics. The internet’s dataviz nooks are half curiosity cabinet, half group therapy — people post their dad’s Halloween logs, grocery-store scatter clouds, or the inexplicable time-series where “67” beats “69” and everyone leans in. That mix of earnest weirdness and merciless critique is where interesting questions live.
I want to take that observation — small datasets, messy context, a community that both nurtures and roasts — and riff on it through a few mathematical and logical lenses. Consider this both a field guide and a cheeky ethical manifesto: the Categorical Imperative for plotting the absurd.
## 1. Probability & Statistics: Don’t let randomness masquerade as revelation
Small datasets are nimble but noisy. A spike in a time series is an invitation, not a verdict. Probability theory reminds us to ask: what’s the null model? Bootstrap your confidence intervals, use Bayesian priors when data are sparse, and always show raw points beside smoothed trends. Rolling averages are comforting — but they’re also a soft-focus filter that can hide outliers or artifacts.
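To make the bootstrap advice concrete, here is a minimal sketch using only Python’s standard library. The `visitors` numbers are invented for illustration, and `bootstrap_ci` is a hypothetical helper name. It uses the plain percentile bootstrap, which is only one of several variants and can be rough on very small samples.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical data: trick-or-treaters per year (note the one spiky year).
visitors = [41, 38, 52, 47, 67, 44, 39, 50]
low, high = bootstrap_ci(visitors)
print(f"mean = {statistics.mean(visitors):.1f}, 95% CI roughly ({low:.1f}, {high:.1f})")
```

With eight data points the interval will be wide, and that width is the honest part of the chart.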
Two short rules: (1) annotate — show the events that might explain spikes, and (2) normalize — compare apples to apples. If you’re using Google Trends-style indices, say so. If you’re comparing regions, adjust for population. The math won’t lie for you, but sloppy interpretation will.
## 2. Information Theory & Kolmogorov: What’s the message? Is it compressed or just messy?
Information theory gives us a vocabulary for the useful parts of a viz. Entropy measures uncertainty — a flat, noisy plot has high entropy but low insight. Kolmogorov complexity (don’t panic) asks: how compressible is your pattern? If the most concise description of your dataset is “random noise,” that’s data you should treat like a quirky anecdote, not a headline.
Conversely, a low-entropy pattern in a tiny dataset can be interesting precisely because it resists randomness. That’s where hypotheses live: autocomplete artifacts, meme contagion, or a regional quirk. Ask: how many bits of explanation does this chart buy me?
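A rough way to play with these two ideas, sketched under some loud assumptions: Kolmogorov complexity is not computable, so the sketch below uses `zlib` compressed size as a crude stand-in, and both series are made up. The helper names `shannon_entropy` and `compression_ratio` are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(symbols):
    """Entropy in bits per symbol, based only on symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def compression_ratio(symbols):
    """Compressed size / raw size; a crude proxy for compressibility.

    Assumes every symbol is an integer in 0..255 so it fits in one byte.
    """
    raw = bytes(symbols)
    return len(zlib.compress(raw, 9)) / len(raw)

rng = random.Random(1)
patterned = [0, 1, 2, 3] * 250                   # a strict repeating cycle
noisy = [rng.randrange(4) for _ in range(1000)]  # same symbols, no order

for name, series in [("patterned", patterned), ("noisy", noisy)]:
    print(name, round(shannon_entropy(series), 3), round(compression_ratio(series), 3))
```

Both series use the four symbols about equally often, so their frequency-based entropy is nearly identical. Only the compression ratio notices that one of them has a short description, which is the distinction the Kolmogorov framing is after.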
## 3. Topology & Geometry: Maps and the lie of projections
Maps are seductive liars. Plotting every store logo on a map is wallpaper; turning space into meaningful structure takes a little topology. Isochrones (10/20/30-minute drive zones) convert point data into service areas. Voronoi tessellations and density maps show market saturation; animated openings reveal strategy in motion.
A topological perspective helps when connectivity matters more than coordinates. Are you interested in adjacency (graph theory), reachability (shortest paths), or coverage (set unions)? Use the right space: sometimes a street-graph visualization or a flow map tells a story a scatterplot can’t.
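Reachability is easy to sketch on a toy street graph. The example below assumes a tiny hand-made network (`streets`, with edge weights in drive minutes) and uses Dijkstra’s algorithm from the standard library’s `heapq`. The node names and times are invented, and a real isochrone would need actual road data.

```python
import heapq

def travel_times(graph, source):
    """Shortest travel time from `source` to every reachable node (Dijkstra)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, minutes in graph.get(node, {}).items():
            nd = d + minutes
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def isochrone(graph, source, budget):
    """Set of nodes reachable from `source` within `budget` minutes."""
    return {n for n, t in travel_times(graph, source).items() if t <= budget}

# Hypothetical street graph: node -> {neighbor: drive minutes}
streets = {
    "store": {"a": 4, "b": 7},
    "a": {"store": 4, "c": 5},
    "b": {"store": 7, "c": 2, "d": 12},
    "c": {"a": 5, "b": 2, "d": 9},
    "d": {"b": 12, "c": 9},
}

print(sorted(isochrone(streets, "store", 10)))  # ['a', 'b', 'c', 'store']
print(sorted(isochrone(streets, "store", 20)))  # ['a', 'b', 'c', 'd', 'store']
```

Notice that nothing here uses latitude or longitude. The service area falls out of the graph alone.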
## 4. Graph Theory & Combinatorics: Sankeys, flows, and awkward family stories
Networks are natural for Sankeys, customer journeys, and the dad’s Halloween log where Batman always chooses Snickers. Graph theory gives you metrics (centrality, modularity) that turn anecdotes into testable claims. Combinatorics reminds us of multiple comparisons: if you partition costumes into 24 bins and then chase any correlation you find, you’ll find something — because luck is prolific.
Be honest about multiple testing, and consider pre-registering your hypotheses, even in hobby projects. Tell your viewers which patterns you hypothesized in advance and which you discovered after looking. That isn’t moralizing; it’s what makes the result reproducible.
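How prolific is luck, exactly? A small sketch, assuming the 24 bins are tested independently at the conventional 0.05 level and that p-values under the null are uniform on [0, 1]. Both are simplifications, and the function names are hypothetical.

```python
import random

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests

def simulate_false_alarms(n_tests, alpha=0.05, n_trials=10_000, seed=0):
    """Fraction of pure-noise experiments with at least one 'significant' bin."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))  # uniform null p-values
        for _ in range(n_trials)
    )
    return hits / n_trials

n_bins = 24
analytic = familywise_error_rate(n_bins)
simulated = simulate_false_alarms(n_bins)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(n_bins):.5f}")
```

Under these assumptions the analytic rate comes out near 0.71: with 24 bins of pure noise, you would "find something" in roughly seven projects out of ten.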
## 5. Logic & Category Theory: The punny moral backbone
Here’s where I get petulant and bring my Kantian cat pun to the party. A categorical imperative for dataviz might read: “Act only on visualizations that you would allow everyone to reproduce with the same data and still call honest.” That is, don’t make plots that require secret smoothing, hidden filters, or arbitrary axes to look good.
Category theory’s notion of morphisms (structure-preserving maps) is a nice metaphor: your transformation pipeline should preserve the relevant structure of the data. If your cleaning step destroys the thing you claim to measure, that’s a categorical crime.
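One modest way to take the metaphor literally: pick the structure you care about and test whether a pipeline step preserves it. The sketch below assumes the relevant structure is rank order. The data and the two example transforms are invented, and `preserves_order` is a hypothetical helper, not a category-theoretic construction.

```python
import math

def preserves_order(transform, values):
    """True if a < b implies transform(a) < transform(b) for all pairs in `values`."""
    ordered = sorted(set(values))
    images = [transform(v) for v in ordered]
    # Checking adjacent pairs is enough: strict order is transitive.
    return all(x < y for x, y in zip(images, images[1:]))

counts = [3, 8, 15, 40, 120]  # hypothetical candy counts per costume

def log_scale(v):
    return math.log10(v)   # rescales, but keeps every ranking intact

def clip_at_20(v):
    return min(v, 20)      # "outlier cleaning" that merges 40 and 120

print(preserves_order(log_scale, counts))   # True
print(preserves_order(clip_at_20, counts))  # False
```

If the claim in your caption is about which costume ranks highest, the clipping step has quietly destroyed the thing being measured.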
Modal logic — the logic of possibilities — is handy, too. Ask not just “what is” but “what could be”: could this spike have been caused by a meme, a bot, or a weather event? Developing causal narratives requires counterfactual thinking; explicitly sketch them if you can.
## 6. Aesthetics, Perception, and Cognitive Load
Design and math are in a complicated marriage. Gestalt principles, color perception, and pre-attentive attributes dictate what viewers actually see. A rainbow palette can suggest boundaries the data don’t contain; a bad legend makes your viz unreadable. Optimize for the human: reduce cognitive load, emphasize comparison points, and be consistent.
Sometimes the best math is a heuristic: stick to 3–5 colors, avoid using area to encode ratios unless you want people squinting, and never let a pie chart fight a bar chart, because the bar wins.
## 7. Two Sides of the Argument: Art vs. Science
There’s a delicious tension: dataviz as storytelling and dataviz as evidence. The art side prizes framing, narrative, and emotional punch; the science side demands reproducibility, uncertainty, and rigor. Both are right. Use narrative to motivate, but use statistics to verify. Annotation is your friend: tell the story but hand over the data and code.
Small datasets exacerbate this tension. They reward storytelling because patterns look clearer, but they punish overclaiming. Be generous with caveats and stingy with headlines.
## Practical Imperatives (Yes, rules)
- Show raw points and smoothed lines together.
- Annotate events and data provenance.
- Share a tiny CSV or a pastebin link with every post.
- Prefer reproducibility over prettiness when in doubt.
- If you’re going to normalize, label it boldly.
Do these and you’ll be beloved by both the pedants and the charmingly weird contributors.
## Closing: a polite dare
If you want to learn math through plotting, start with your dad’s Halloween log. Track times, costumes, candy brands — treat it like a tiny experiment. Use a Sankey for costume→candy, a ridgeline for arrivals by year, and a small-multiples map of the block. You’ll practice probability, graph theory, and persuasion all at once.
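The costume→candy Sankey starts life as a plain tally. A minimal sketch, assuming the log is a list of (costume, candy) pairs; the entries below are invented, and the `source`/`target`/`value` row shape is just one common layout that many Sankey tools accept in some form.

```python
from collections import Counter

# Hypothetical Halloween log: one (costume, candy) pair per visitor.
log = [
    ("Batman", "Snickers"),
    ("Batman", "Snickers"),
    ("Batman", "Snickers"),
    ("Witch", "Twix"),
    ("Witch", "Twix"),
    ("Witch", "Snickers"),
    ("Ghost", "Skittles"),
]

# Each distinct pair becomes one Sankey link; its count is the link width.
links = Counter(log)
sankey_rows = [
    {"source": costume, "target": candy, "value": n}
    for (costume, candy), n in sorted(links.items())
]

for row in sankey_rows:
    print(row)
```

Publishing `sankey_rows` next to the chart is the tiny-CSV habit in action: anyone can rebuild the diagram or dispute the tally.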
So here’s my parting cat-and-Kant question (and yes, I’ll be slightly bitchy if you ignore it): when you make a chart that convinces you, how will you convince a skeptic? What would they demand to reproduce your claim — and are you willing to give it to them?