The Categorical Imperative: Words, Wires, and the Math of Asking Better Questions


If linguistics is a messy, beloved party where grammar, pedagogy, and policy awkwardly share a bowl of chips, mathematics and logic are the people who bring labeled Tupperware: neat, often hard to love, but mysteriously calming. The original “Words, Wires, and the War on Good Questions” memo is a field guide for grads who want to be useful without sounding like a forum newbie. Read through a few mathematical lenses and you get not only slightly nerdier metaphors, but practical ways to make your research sturdier, safer, and — dare I say — smarter.

## Precision as formalization: logic, proofs, and good questions

The advice “be precise” is the linguist’s equivalent of “write down your definitions.” In logic-speak, a good question is one that can be mapped to a well-formed formula.

– Predicate logic teaches us to separate objects, properties, and relations. When you ask “Is this a dialect?”, you should implicitly be asking: which features (phonological, syntactic, lexical) count as evidence, and under what threshold? Make those predicates explicit.
– Modal logic gives a neat vocabulary for grad-school hedging: possibility, necessity, belief. Are you asserting that an analysis necessarily holds across communities, or that it is merely possible in noisy data?
– Proof theory reminds us that evidence and inference are distinct. A flashy correlation is not a proof; formalize what would count as one.
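To see what “make those predicates explicit” buys you, here is a toy sketch in Python. Everything in it — the feature scores, the threshold, the two-out-of-three rule — is invented for illustration; the point is that once the predicates are code, the evidential standard is no longer implicit.

```python
# Invented toy scores: how strongly each feature class supports the claim.
FEATURES = {"phonological": 0.8, "syntactic": 0.4, "lexical": 0.7}

def counts_as_evidence(scores, threshold=0.5):
    """The explicit predicate: which feature classes clear the threshold?"""
    return {name for name, score in scores.items() if score >= threshold}

def is_distinct_variety(scores, needed=2, threshold=0.5):
    """The explicit claim: at least `needed` feature classes show evidence."""
    return len(counts_as_evidence(scores, threshold)) >= needed

print(counts_as_evidence(FEATURES))   # which predicates hold
print(is_distinct_variety(FEATURES))  # the overall judgment
```

Arguing about the claim now means arguing about `threshold` and `needed` — which is exactly the argument you wanted to have.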

In short: the act of formalizing a question helps you see what you need to measure.

## Category theory: a highfalutin metaphor with real mileage

Category theory’s charm is that it forces you to think in terms of structure-preserving maps (morphisms) between contexts. If you hate abstract nonsense, think of it as a way to track what your model *translates* from theory into data and back.

– Functorial thinking: your theory is a category, your dataset is another category, and your analysis is a functor. Does the functor preserve the things you care about (e.g., are ordinal relations preserved when you aggregate tokens into types)?
– Adjunctions capture the trade-off between explanation and compression: the left adjoint tries to generate data from theory; the right adjoint summarizes data into theory. Useful when deciding whether a complex model is overfitting noise or capturing a genuine signal.
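The functorial bullet can be made concrete with a toy Python sketch (the token stream and both summaries are invented): aggregating tokens into type counts preserves the ordinal relation we care about, while a lossier binning summary destroys it.

```python
from collections import Counter

# Invented token stream.
tokens = ["the", "cat", "the", "dog", "the", "cat"]

# Summary 1: token stream -> type frequencies (preserves ordinal relations).
types = Counter(tokens)

# Summary 2: a lossier map that bins counts into coarse labels.
def binned(count):
    return "common" if count >= 2 else "rare"

# The ordinal relation we care about: "the" outnumbers "cat".
print(types["the"] > types["cat"])                  # counts preserve it
print(binned(types["the"]), binned(types["cat"]))   # binning collapses it
```

Both summaries are honest maps from data to theory; only one preserves the structure your claim depends on. That is the functorial question in miniature.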

Category theory won’t write your IRB, but it will make you ask: what structure must survive translation for my claim to hold?

## Probability, Bayesianism, and the humility to carry priors

Good questions expect uncertainty. Bayesian reasoning teaches two things: state your prior beliefs (yes, explicitly) and update them with data.

– Priors are not subjective sin: they force you to articulate what you think before seeing the corpus. If your prior implies that infants never generalize irregular pasts, fine — show us the data that changes that belief.
– Model comparison trumps p-value worship. Which model predicts new data better? Cross-validation, Bayes factors, and predictive checks are your friends.
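“State your prior, update with data” fits in a few lines. Here is a minimal Beta-Binomial sketch with invented counts: the prior says regularization is rare, and the (hypothetical) corpus pulls the posterior toward what was actually observed.

```python
# Prior belief about how often speakers regularize an irregular past:
alpha, beta = 2.0, 8.0          # prior mean 0.2 -- we expect it to be rare

# Hypothetical corpus counts:
regularized, total = 15, 30

# Conjugate update: posterior is Beta(alpha + successes, beta + failures).
post_a = alpha + regularized
post_b = beta + (total - regularized)
posterior_mean = post_a / (post_a + post_b)

print(f"prior mean {alpha / (alpha + beta):.2f} "
      f"-> posterior mean {posterior_mean:.2f}")
```

The prior is right there in the code, arguable and replaceable — which is the whole point of stating it explicitly.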

And remember: measurement error in speech data is a mess. Treat it like noise in a probabilistic model, not as an embarrassing footnote.

## Information theory and speech: entropy, redundancy, and why kids trip up

Information theory explains why spoken language and reading are cousins rather than strangers.

– Phoneme inventories are constrained by redundancy and noise: low-entropy systems resist misperception; high-entropy signals carry more information but are more vulnerable to noise.
– Literacy ties into probabilistic inference. Poor phonological representations increase uncertainty when mapping orthography to phonology; that’s a recipe for decoding trouble.

So interventions that boost the signal (clearer articulation, richer phonological exposure) reduce entropy in the child’s mapping problem.
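The entropy claim is easy to check on made-up numbers. A sketch, assuming a four-symbol inventory: a uniform distribution maximizes per-symbol uncertainty, while a skewed (redundant) one lowers it.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; assumes the probabilities sum to 1."""
    return -sum(p * log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty per symbol
skewed  = [0.70, 0.15, 0.10, 0.05]   # redundancy lowers the entropy

print(entropy(uniform))  # 2.0 bits
print(entropy(skewed))   # about 1.32 bits
```

Redundancy costs channel capacity but buys robustness to noise — the trade the bullet points describe.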

## Graphs, networks, and the social spread of features

Language is social. Graph theory gives tools to model how features diffuse across communities.

– Node centrality predicts influencers; edge weights predict transmission probability. Want to understand an accent shift? Map the network, not just the grammar.
– Community structure interacts with identity: pronunciation changes can be contagious — but only within the right network topology.

Networks also remind you to sample wisely. Corpus sampling that ignores community structure will bias your estimates.
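“Map the network” can start very small. A toy sketch on an invented friendship graph: normalized degree centrality picks out the speaker most exposed to (and most likely to transmit) a spreading feature.

```python
# Invented undirected friendship graph: node -> set of neighbors.
graph = {
    "ana": {"ben", "cam", "dee"},
    "ben": {"ana", "cam"},
    "cam": {"ana", "ben"},
    "dee": {"ana"},
}

def degree_centrality(graph):
    """Normalized degree: neighbors / (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

central = degree_centrality(graph)
print(max(central, key=central.get))  # the likeliest 'influencer'
```

Real sociolinguistic networks need richer measures (betweenness, community detection), but even this crude count already tells you whom to record first — and whom your sample is silently missing.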

## Complexity, computability, and practical limits

Parsing, learning, and inference have computational costs.

– Formal language theory explains why some syntactic models are trivially learnable and others are not. Know whether your algorithm is polynomial-time or living in NP-complete hell.
– Kolmogorov complexity (informally: “how much description length do you need?”) helps you think about model parsimony: simpler explanations are better, until they stop fitting.
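Kolmogorov complexity itself is uncomputable, but compressed size is a crude, runnable stand-in for description length. A sketch with invented byte strings: data generated by a tiny rule compresses far better than data with less exploitable structure.

```python
import zlib

rule_like = b"ba" * 500                                    # a two-byte rule
messier = bytes((i * i + 37 * i) % 251 for i in range(1000))  # less regular

print(len(zlib.compress(rule_like)))  # small: the rule is the description
print(len(zlib.compress(messier)))    # larger: less structure to exploit
```

Same raw size in, very different description lengths out — the parsimony intuition, made measurable.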

This is not philosophy for philosophers; it tells you whether your brilliant model will run on an institutional laptop or needs a grant to compute.

## Cryptography and privacy: wires get hacked; protect your corpus

“Lock down your data” is boring until it’s not. Math hands you concrete tools:

– Cryptography: encrypted storage, key management, and zero-knowledge proofs guard participant identity.
– Differential privacy gives a quantifiable privacy budget when releasing statistics from sensitive corpora.
– Homomorphic encryption and secure multi-party computation are getting practical enough to keep collaborative work alive without exposing raw transcripts.
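To give the “privacy budget” some texture, here is a sketch of the classic Laplace mechanism behind differential privacy (the count and epsilon are invented; a production release would need careful budget accounting, not this toy):

```python
import math
import random

def private_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    Smaller epsilon = tighter privacy budget = noisier release.
    Uses the inverse-transform sampler for the Laplace distribution.
    """
    u = rng.random() - 0.5
    noise = -(sensitivity / epsilon) * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_count + noise

# Hypothetical: number of participants using a stigmatized variant.
print(private_count(42, epsilon=0.5))  # a noisy, releasable statistic
```

The released number is deniable for any individual participant, and epsilon tells you exactly how deniable — a quantified promise, not a vibe.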

If you don’t speak the math, at least learn the terms. It’s cheaper than rebuilding a trust relationship after a leak.

## A modest checklist (mathy edition)

– Formalize your question: name predicates, variables, and the space of possible answers.
– State priors and what evidence would change them.
– Think functorially: what structure must survive translation between theory and data?
– Model uncertainty explicitly; don’t hide measurement error.
– Respect network structure in sampling and analysis.
– Consider computational complexity before you commit to a method.
– Use cryptographic tools to protect sensitive data.

Wrap all that in the original field guide’s spirit: be precise, do your homework, and don’t make people grade your homework in public.

I like to end on a slightly provocative note, so here’s the question I can’t stop poking at in my sleep: if every discipline brings a different formal lens (probability, category theory, graph theory, cryptography), which lens should define the problem — and when is insisting on one lens just a way of avoiding a messier, but more realistic, pluralism?
