When external data enters your program, you can’t really be certain of its type unless you validate it. That library output, that API response, and (most of all) that user input…are you sure it is what you think it is?
Until you check, the most accurate type to assign that data is one that means “I don’t actually know”.[1]
Making Assumptions
Let’s say you’re taking input from a user and expecting it to be a string. A naive approach would be to assume the data will always be what you expect:
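A minimal sketch of that naive approach in TypeScript. The `getUserInput` helper is a hypothetical stand-in for wherever the input really comes from, and the `as string` assertion is one assumed way of telling the type checker the value is always a string:

```typescript
// Hypothetical stand-in for a real input source (form field, query string, etc.).
// It is typed as returning `any`, the way many untyped data sources are.
function getUserInput(): any {
  return "hello";
}

// We assert the input is a string without ever checking it.
const data = getUserInput() as string;

// The type checker allows this call because it trusts the assertion above.
const shouted = data.toUpperCase();
console.log(shouted); // prints "HELLO"
```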
No warnings, no problem, right? Not exactly… There are no warnings because we told the type checker the user input is always a `string`. That’s why it’s happy to let us call string methods on `data`.
But what if `data` is sometimes `undefined` (or anything other than `string`)? In that case, this code will throw an uncaught `TypeError` at runtime saying `Cannot read properties of undefined (reading 'toUpperCase')`, which may leave your program in a broken state.
This is where “unknown” can help.[2]
Unknown to the Rescue
Some languages explicitly include a type called “unknown” (e.g. TypeScript has one), while others have a type you can treat similarly (e.g. Python’s `object` type effectively means “unknown”).
Whatever your language gives you, the general approach is the same:
- Assign the “unknown” type to the unverified data
- Explicitly validate the data’s relevant characteristics before you use them
- Be happy when your tooling warns you about unsafe assumptions you’re making
No Assumptions
Let’s ask the type checker to help us be more careful:
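One way this could look in TypeScript, with a hypothetical `getUserInput` helper standing in for a real input source. The `@ts-expect-error` comment is only there so the sketch still compiles; it marks the line the type checker rejects:

```typescript
// Hypothetical stand-in for a real input source; it happens to return a string here.
function getUserInput(): unknown {
  return "hello";
}

// Declare the data as `unknown` until it has been validated.
const data: unknown = getUserInput();

// The type checker reports an error on the call below because `data` is `unknown`.
// The directive records that we expect that error, so the sketch still compiles.
// @ts-expect-error
const shouted = data.toUpperCase();
```

Removing the `@ts-expect-error` line surfaces the error in your editor or compiler output.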
Perfect! We want those type warnings. We’re calling a `string` method (`toUpperCase`) on a value we haven’t confirmed is a `string`. That’s risky.
To resolve the warning, we need to validate the assumption we’re making about `data`’s type:
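A minimal sketch of that validation, assuming a `typeof` check is sufficient for this case. The `shout` function name and the choice to throw on non-string input are illustrative, not the only reasonable design:

```typescript
// Accepts unverified data and only treats it as a string after checking.
function shout(data: unknown): string {
  if (typeof data === "string") {
    // Inside this branch the type checker knows `data` is a string.
    return data.toUpperCase();
  }
  // Anything else is rejected explicitly instead of crashing later.
  throw new TypeError("Expected a string but received " + typeof data);
}

console.log(shout("hello")); // prints "HELLO"
```

Throwing is one option; returning a default value or an error result would also work, as long as the non-string case is handled on purpose.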
With each assumption you validate, the type checker learns more about your data: it narrows the type from its broadest starting point (`unknown`) to a more specific type with known characteristics (e.g. `string`).
And now you can be sure what type of data you have.
Footnotes
1. This post is a response to questions I received on Reddit suggesting we always know a given value’s data type. For any data coming from outside your program, I think that assumption is risky!
2. This is where a lot of people reach for an “any” type, but be careful! “any” is the opposite of “unknown” and effectively disables type checking by telling the type checker all assumptions about a value are safe. You probably don’t want that.