In Rust I find myself gaining a good bit of type safety without losing ergonomics by wrapping types in a newtype then implementing Deref for them. At first it might seem like a waste, but it prevents accidentally passing the wrong type of thing to a function (e.g. a user UUID as a post UUID).
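A minimal sketch of the pattern, with String standing in for a real UUID type (names are illustrative):

    use std::ops::Deref;

    struct UserId(String); // stand-in for a real UUID type
    struct PostId(String);

    impl Deref for UserId {
        type Target = str;
        fn deref(&self) -> &str {
            &self.0
        }
    }

    fn load_posts_for(user: &UserId) -> Vec<PostId> {
        // str methods are still available through Deref (e.g. user.is_empty()),
        // but a &PostId can never be passed here by mistake.
        assert!(!user.is_empty());
        Vec::new()
    }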
On top of this, for the use case from the article we will hopefully gain pattern types in Rust soon.
They do not solve every problem that constructive data modeling does, but in my opinion they cover a large portion of what actually comes up in everyday programs. Since they are zero-cost, I'd say their cost-benefit ratio is pretty good.
Ada and Pascal have also handled the "encode the range in the type" problem nicely for decades.
I want to point out that, technically, using Deref for this is an anti-pattern, as Deref is intended exclusively for smart pointers. Nothing is really wrong with doing this outside of some loss in opacity (and unexpected behaviour if you're writing a library), but it's worth pointing out.
Note that that language has changed:
https://github.com/rust-lang/rust/commit/58645e06d9121ae3765...
I don't really see the issue in providing Deref for a wrapper type like this. Could you elaborate? I'm not trying to gain full encapsulation, just trying to make sure I'm passing the right kind of wrapper, then using it transparently.
IME this is exactly backwards: type safety is mostly about names, everything else is a nice-to-have. Yes, you can bypass your name checks if you want to, but you can bypass any type check if you want to. Most relevant type relationships in most programming are business relationships that would be prohibitively expensive to express in a full formalism if that was even possible. But putting names on them is cheap, easy, and effective. The biggest win from typed languages comes from using these basic techniques.
Names are not cheap. It is hard to come up with a good consistent set of names even for a small system.
Hmm, IME the preferred type systems are structural - a function shouldn't care what the name is of the struct passed to it, it should just work if it has the correct fields.
If someone encodes "Meter" and "Yard", your type system wouldn't provide any errors if a meter is used in a yard calculation or vice versa. If someone encodes "RGBColor" and "LinearRGBColor", both structs with 3 floats, your type system wouldn't provide any errors if a LinearRGB color is passed into an RGB calculation. You also wouldn't have any error if you accidentally passed a Vertex3 (again, struct of 3 floats) into your RGB calculation.
Also, preferred by who?
Preferred by me, I'm not trying to speak for anyone else. In fact I'd say it's a somewhat minority opinion.
When talking about types like `Meter` and `Yard` in a structural system, the "type" of the data is also data. In a nominal system that data is encoded in the type system, but that's not the only place it can be. For example, if I asked you how far the nearest gas station is, you wouldn't respond with "10", but rather "10 minutes", or "10 kilometres", etc. Both the value and unit of measurement are relevant data, and thus both of those would be part of the structural type as well.
    { unit: yard, value: 20 }

This is real, concrete data that you can see all at once. You can feed it into different functions, create aliases for it (unlike objects where you'd need to make snapshots or copies when they might change), compare it with other data to check equality, transmit it across networks, and work with it easily in other programming languages since they all understand basic types. When you stick with this kind of data, you can use general-purpose functions that work on any data rather than being locked into specific methods tied to particular types or interfaces - methods that won't exist when you move to different languages or systems.
In a nominal system you might end up with a generic Measurement<T> type that contains the unit inside, which can help with code reuse, but it's not at the same level as pure data.
The issue with this is that you can make mistakes with the general-purpose functions.
A function `fn convertYardsToKm(value: i32) -> i32` doesn't fail when you give it a weight.
Whereas in Rust you'd write something like this:
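A minimal sketch of those wrappers, assuming f64 as the underlying representation:

    struct Yard(f64);
    struct Km(f64);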
and your function becomes `fn convertYardsToKm(value: Yard) -> Km`.
You can group them in an enum:
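Again assuming f64 payloads, a sketch:

    enum Measurement {
        Yard(f64),
        Km(f64),
    }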
(Note that it would be nice if we could refer to `Measurement::Yard` as a type rather than having to add a distinct `Yard` type.)
That way there is no confusion about what you're putting in and what type the output is - the kind of confusion which has resulted in, for example, an emergency landing https://en.wikipedia.org/wiki/Gimli_Glider#Miscalculation_du... and the loss of a Mars probe: https://en.wikipedia.org/wiki/Mars_Climate_Orbiter
There 100% are unit systems implemented with types. (One example: https://github.com/goldfirere/units/tree/master/units)
Structural types do not preclude having some names that prevent mix-ups. Haskell’s `data` keyword doesn’t let you confuse structurally-identical things.
> If someone encodes "RGBColor" and "LinearRGBColor", both structs with 3 floats, your type system wouldn't provide any errors if a LinearRGB color is passed into an RGB calculation.
It 100% would, unless you were silly enough to use a bare tuple to do it. Again, defining a type with `data` in Haskell wouldn’t get confused.
> Structural types do not preclude having some names that prevent mix-ups. Haskell’s `data` keyword doesn’t let you confuse structurally-identical things.
Haskell doesn't let you confuse structurally-identical things because it is nominal, not structural.
I think that's backwards - ultimately everything on a computer is just bytes, so if you push that philosophy to the limit then you would write untyped functions and they can "just work" on any input (just not necessarily giving results that are sensible or useful if the input is wrong). The point of a type system is to help you avoid writing semantically wrong code, to bring errors forward, and actually the most important and valuable use case is distinguishing values that are structurally identical but semantically different (e.g. customer ID vs product ID, x coordinate vs y coordinate, immutable list vs read view of mutable list, sorted vs unsorted...).
I think the structural type approach leans heavily into the "computation is just data and its transformations", so it makes sense for it to treat data as the most important thing. You end up thinking less about classification and more about the transformations.
I'm not saying the nominal approach to types is wrong or bad, I just find my way of thinking is better suited for structural systems. I'm thinking less about the semantics around product_id vs user_id and more about what transforms are relevant - the semantics show up in the domain layer.
Take a vec3, for example: in a structural system you could apply a function designed for a vec2 to it, which has practical applications.
> I think the structural type approach leans heavily into the "computation is just data and its transformations"
But it's never "just data". My password is different in many ways from my username. Don't you ever log/print it by accident! So even if structurally the same, we MUST treat them differently. Hence any approach that always only looks at things structurally is deeply flawed in the context of safe software development.
Yeah, you bring up a good point. A { name: string } dict needs to be treated differently from a { user_pw: string } dict. The difference is that that happens in the domain layer instead of the type layer.
That's no different than using newtype structs. If you remove the extra layer you are left with `string` for both of them.
> The difference is that that happens in the domain layer instead of the type layer
This view greatly reduces the usefulness of the type layer though, as that's the only automated tool that can help the domain layer with handling cases like this.
It's not really automated though, it's just another layer of code written by a human, prone to the same types of human error.
Bugs in the typechecker are rare (it's widely exercised if the language is at all popular) and generally fixed quickly. If you have an expression of type A you can be pretty confident you're getting a value that's passed through a constructor for type A.
Can a human encode something different by that than what they intended to encode? Certainly. But it's got the highest cost-benefit of any approach to double-checking your code I've found.
That's not what I mean. It's trivial to encode your types in such a way that it incorrectly implements domain logic and there is no meta-meta language enforcing correctness there.
How so? To the extent that you use your types with your code, getting your types wrong will lead to either type errors in correct code (in which case you notice and fix them) or overly loose types that will allow incorrect code to pass (in which case you don't get an actual bug unless you make a corresponding mistake in your code).
> The difference is that that happens in the domain layer instead of the type layer.
What are those layers you are talking about? In my domain-logic code I use types, of course, so there is no dedicated "type layer".
Depending on the language, you can define a PasswordString type that is entirely distinct from the string type. Perhaps it can be explicitly converted to a string (perhaps given a capability token). Then you have:
a { user_pw: PasswordString}
This is what it means to model the domain using types. It is not a separate layer, it is actually using the type system to model domain entities.
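For example, a rough Rust sketch of such a type - the redacting Debug impl and the expose_secret method are illustrative choices:

    use std::fmt;

    pub struct PasswordString(String);

    impl PasswordString {
        pub fn new(raw: String) -> Self {
            PasswordString(raw)
        }

        // The only way to get the raw string back out is an explicit, greppable call.
        pub fn expose_secret(&self) -> &str {
            &self.0
        }
    }

    // Accidental logging prints a placeholder instead of the secret.
    impl fmt::Debug for PasswordString {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.write_str("PasswordString(<redacted>)")
        }
    }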
Structurally typed languages can still support information hiding.
I guess the point is that you can model your domain using data as well as types.
> Structurally typed languages can still support information hiding.
But you haven't hidden the information, it's still a string. You can put the string in a wrapper struct, but in a structural system that's not really any different from putting it in a list or map - the data is still exposed, and if someone writes code to e.g. log objects by just enumerating all their fields (which is a very natural thing to do in those systems) then it will naturally print out your password, and there's not really any way to ever make it not.
> I guess the point is that you can model your domain using data as well as types.
You want both in your toolbox though. Restricting yourself to only having types that are essentially lists, maps, or a handful of primitives has most of the same downsides as restricting yourself to not using types at all.
You can implement the Stringer interface for the type which prevents it from being logged and since it's private, code from outside of the module can't enumerate it. Of course it's still accessible via reflection or memory dumps etc, but isn't that the case with Java etc? Storing a plain text password like this is a bad idea anyways.
I guess my point is that a structural type system can still allow for encapsulation.
> You can implement the Stringer interface for the type which prevents it from being logged and since it's private, code from outside of the module can't enumerate it.
Those sound like decidedly non-structural features. And couldn't you undermine them by passing it to a function that expects a different `struct { pw string }` and logs its contents?
Only in the same module. Outside, it's effectively hidden.
And yeah, structurally typed languages often have nominal features. They come in useful in a lot of scenarios! Unless you're talking about something like Clojure which is not statically typed.
> I'm not saying the nominal approach to types is wrong or bad, I just find my way of thinking is better suited for structural systems. I'm thinking less about the semantics around product_id vs user_id and more about what transforms are relevant - the semantics show up in the domain layer.
But that domain layer should make use of the type system! That's where the type system is most useful!
I've seen this debate play out before. Often the type theory people and the domain modeling people end up talking past each other.
I think type-theoretic safety is a completely different thing to the use of types (and names) in software-as-domain-modeling (for example, but not necessarily OO modelling). At different times, for different people, one perspective is more important than the other. It is important not to confuse the perspectives, and to value both of them, but also to recognise their strengths and weaknesses.
One theme that sometimes emerges is that the type theory people don't care about names at all. Not even field names. Taken to the extreme Customer( name: str; age: int) is just a Tuple[str, int]. The words "Customer", "name", "age" have no place in the code base.
My take is that when you are dealing with computer-scientific abstract things, e.g. a List of T, then there is no need to reference the domain entities; placeholder names like T, x, xs make sense. On the other hand, if you're writing an application that models domain semantics (e.g. business rules), writing software amounts to modelling the domain, and it should be easy to correlate software entities with the real-world entities that they model. To do this we use descriptive words, including names, domain events, activities and so on - e.g. List[Customer], not List[Tuple[str, int]].
Then again, you could replace all of the type names with A, B, C, ... and all the variable names with w, x, y, .... The example would end up as X[Y[Z,W]], the software would work exactly the same, and you might get some insights into the structure of the system. However, if you're in the business of building a user management system, in general this is not going to fly for very long with your workmates or your client. You will have trouble onboarding new developers.
I think this is a superb point. If you look at Clojure and Rich Hickey's justifications for its design, he talks a lot about designing systems that work with a world that is changing constantly, interfacing with systems where you can't always guarantee a stable contract, solving problems that use non-elegant models, and dealing with state, time, and change in ways you can't predict. Eric Normand wrote a great article on this, and he comes from a dual Haskell/Clojure background [1]. Nominal static type systems absolutely have their place, especially with closed systems where the abstractions are different.
1: https://ericnormand.me/article/clojure-and-types
I'm not sure which side you're putting me on, because I think named types are important for exactly the same reason that named fields are. A customer should not just be a pair of str, int in the same way that an age should not just be an int and a name should not just be a string (and using an int field called "age" is a poor substitute for using an actual age type).
I was thinking of you as on the domain modelling side (same side I am usually on).
You are right, I should have gone further with the example and used Customer{ age: Age, name: PersonName}.
Why not go further? Customer should be a subtype of Human, which contains subtypes DoesInteractWithCompany and HasNeverHeardOfUs. Customer should have subtypes for each country they live in, and PersonName should probably be split into FullName<FirstName, LastName> for flexibility.
Structural type systems mostly don’t support encapsulation (private members that store things like account numbers) without some sort of weird add on, while nominal type systems support encapsulation directly (because the name hides structure). The canonical example is a cowboy and picture that both have a draw method.
Both Go and TS are structural and support encapsulation fine, I'm not sure why that would be an issue.
Go's type system in general is not structural. Only checks for whether a type matches an interface are structural; anything else is nominal. And since private methods and public methods in Go have different names, there is no question of needing to decide whether a private method matches a method in a public interface.
TS doesn’t really. TS simply treats private fields as public ones when it comes to structural type checks. TS is unsound anyways, so not providing hard guarantees about field access safety is right up its alley. More to the point, if you specify a class type with private fields as a requirement, whatever you plug into that requirement has to have those private fields, they are part of the type’s public signature.
To see where structural type systems fall down, think about a bad case: dealing with native state, where you have a private long field with a pointer hiding in it that is used in native calls. Any “type” that provides that long will fit the type, leading to seg faults. A nominal type system allows you to make assurances behind the class name.
Anyways, this was a big deal in the late 90s, eg see opaque types https://en.wikipedia.org/wiki/Opaque_data_type.
Typescript had to support JS's quirks... :/
    class Foo {
        public bar = 1;
        private _value = 'hello';

        static doSomething(f: Foo) {
            console.log(f._value);
        }
    }

    class MockFoo { public bar = 1; }

    let mock = new MockFoo();
    Foo.doSomething(mock); // Fails

Which is why you'd generally use interfaces, either declared or inline.
In the pointer example, if the long field is private then it's not part of the public interface and you shouldn't run into that issue, no?
_value is part of the type for Foo; it's as if it was a public field. You can forge a reference to Foo by adding _value to your mock. TS deals with private fields by pretending they are public when it comes to matching. There are more rigorous ways of hiding and then revealing private state in structurally typed languages, but they involve something that is suspiciously like using a name, and really, it makes sense. The only way you can hide something and recover it later is via some kind of name (unless you can somehow capture the private structure in a type variable so it's just passing through the parts that can't see it).
You can do a lot just by hiding the private state and providing methods that operate on that private state in the type (using interfaces for example), but that approach doesn’t allow for binary methods (you need to reveal private state on a non-receiver in a method).
Can you explain the last part more? I don't think I'm grasping what you mean.
You can deal with receiver private state since you are exporting methods on the receiver as part of your type signature. If those methods are called, the receiver can see itself, structural typing works fine and you still have encapsulation. Eg
    interface Foo {
        fun doFoo()
    }

You can call doFoo() on some value, and that value can refer to its private state that doesn't appear in Foo.
However, if you want to see the private data of an argument, private data has to appear in the signature (or use nominal typing). The easiest example is an equality method that compares private state.

    interface Foo {
        fun equals(otherFoo: Foo): Boolean
    }

The receiver of the equals call can still refer to its private data, but whatever you provided for otherFoo is only guaranteed to have the equals method. You might be able to deal with this with an opaque type:

    interface FooModule {
        export type t
        fun equals(foo: t, otherFoo: t): boolean
        makeFoo(): t
    }

But you really aren't doing structural typing anymore, and basically are using t like you would a name.
> everything is represented by a bitstring, a function shouldn't care what the name is of the bitstring passed to it, it should just work if it has the correct bits.
;)
A nominal type system can be built on top of a structural type system with zero runtime overhead, but not vice versa (you'd have to add tags, which take additional memory space).
The problem with nominal type systems is that they need to support parametrized types; otherwise it's hard to impossible to write reusable/generic code.
The critical problem with structural typing is that it requires weird and arbitrary branding when dealing with unions of singletons.
Branding doesn't need to be weird and arbitrary, see Python's NewType: https://docs.python.org/3/library/typing.html#typing.NewType
Reading TFA now, Python's NewType seems to be equal to Haskell's newtype. Yes, it's a hack for the type checker to work around existing language semantics, and it feels unergonomic at times when Parse, Don't Validate needs to fall back to plain validation, but I wouldn't call it either weird or arbitrary.
Python uses nominal typing by default, so it isn't prone to this problem anyway.
The kind of "branding" I'm talking about is a hack only needed for structural typing systems. Consider something inspired by the C locale API, for example:
    class RealLocale:
        name: str

    const C_LOCALE = RealLocale("C");

    # Each of these can be passed to *some*, but not all, locale functions,
    # which will check for them by identity before doing the logic for `RealLocale`.
    singleton NO_LOCALE # used for both "error" and "query"
    singleton THREAD_LOCALE
    singleton GLOBAL_LOCALE
    singleton ENV_LOCALE

In a structural typing system, it is impossible to write a function that takes a union including more than one of `{NO_LOCALE, THREAD_LOCALE, GLOBAL_LOCALE, ENV_LOCALE}`, since they have no contents and thus cannot be distinguished. You have to hack around it, by some kind of casting and/or adding a member that's not actually present/useful at runtime.
And this kind of need is quite common. So I maintain that structural typing is not a serious type system proposal.
(Then again, the main proponent of structural typing is TypeScript, which even in $CURRENTYEAR still lacks even a way to specify ubiquitous needs like "integer" or "valid array index").
I think a type system that permits a type (no_locale=()) | (thread_locale=()) | (global_locale=()) | (env_locale=()) would generally be considered structural. I think newtypes and structure field names are isomorphic in functionality. Where a nominal type system would use newtype(tag,X) a structural one could use struct{tag:X} (and this holds even if X is the unit/singleton type).
You mean like if you have two types which are identical but you want your type system to treat them as distinct? To me that's a data modelling issue rather than something wrong with the type system, but I understand how it can sometimes be unavoidable and you need to work around it.
I think it also makes more sense in immutable functional languages like clojure. Oddly enough I like it in Go too, despite being very different from clojure.
If I understand you correctly - in popular structurally typed languages, sure.
It seems ok in upcoming languages with polymorphic sum types (eg Roc “tags”) though?
What type system are you thinking about? Is there a specific language you have in mind? Haskell is structural—it emphasizes pattern matching on type variants and whatnot—but it wouldn’t confuse two different structs.
Haskell is nominal. In this context structural typing is the ability to confuse two different structs.
> should just work if it has the correct fields.
Correct fields by...name? By structure? I'm trying to understand.
By name, type, and structure. In typescript for example:
Then you can use this function on any data type that satisfies that signature, regardless of whether it's User, Dog, Manager etc.
This is very much backwards from most programming practices. It's much more common to have an operation that makes sense for many structurally different types, such as add(a, b) where a and b could be integers, floating point numbers, complex numbers, 3D vectors, 4D vectors, matrices etc.
Most programming happens in nominally typed languages where the above isn't trivially possible. To be fair that example is contrived, but you can imagine operations happening on large dicts where only a small subset of the fields are necessary.
This is actually a relatively common example of the benefits of either polymorphism (for the a.add(b) version) or generic methods/templates. You'll actually find libraries that have a method or template function like this in C++, Java, C#, and other commonly used static languages.
My peers and I work on a language centered around "constructive data modeling" (the first time I've heard it called that). We implement integers, and indeed things like non-empty lists, using algebraic data types, for example. You can both have a theory of values that doesn't rely on trapdoors like "int32" or "string", and encode invariants, as this article covers.
As I understand it, the primary purpose of newtypes is actually just to work around typeclass issues like in the examples mentioned at the end of the article. They are specifically designed to be zero cost, because you don't want to pay anything when working around the typeclass instance already being taken for the type you want to write an instance for. When you make an abstract data type by not exporting the data constructors, that can be done with or without newtype.
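As a rough parallel, Rust bakes the same trick into its standard library: std::cmp::Reverse is a zero-cost newtype whose only job is to carry a different Ord implementation. A small sketch:

    use std::cmp::Reverse;

    fn main() {
        let mut scores = vec![3, 1, 2];
        // Wrapping the key in Reverse swaps in the other ordering with no runtime cost.
        scores.sort_by_key(|&s| Reverse(s));
        assert_eq!(scores, vec![3, 2, 1]);
    }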
The alternative to newtypes is probably to go the same route as OCaml and have people explicitly bring their own instances for typeclasses, instead of allowing each type only one instance?
I think OCaml calls these things modules or so. But the concepts are similar. For most cases, when there's one obvious instance that you want, having Haskell pick the instance is less of a hassle.
Yes, I may have accidentally spilled, but I prefer ML modules to typeclasses.
I like ML-style modules in principle, but in practice I like the ergonomics of typeclasses.
But while I did nearly half of my career in either OCaml or Haskell, I did all of my OCaml programming and most of my Haskell programming before the recent surge of really good autocompletion systems / AI; and I notice how much they help with Rust.
So the ergonomics of ML-style modules might be perfectly acceptable now, when you have an eager assistant filling in the busy work for obvious cases. Not sure.
The author seems concerned about compile-time range checking: did you handle the full range of inputs?
Range checking can be very annoying to deal with if you take it too seriously. This comes up when writing a property testing framework. It's easy to generate test data that will cause out-of-memory errors - just pass in maximum-length strings everywhere. Your code accepts any string, right? That's what the type signature says!
In practice, setting compile-time limits on string sizes for the inputs to every internal function would be unreasonable. When using dynamically allocated memory, the maximum input size is really a system property: how much memory does the system have? Limits on input sizes need to be set at system boundaries.
I can say that I am not particularly concerned with compile-time range checking. I agree with you that it is a massive headache that is almost always a huge waste of time. Even in dependently-typed languages, tracking ranges and bounds ends up requiring an incredible amount of bookkeeping that definitely does not seem worth the effort in the vast majority of application code.
When I wrote this blog post, I used a very simple datatype because it was an extraordinarily simple example, but given many of the comments here, it seems it may have been too simple (and thus too contrived). It is only an illustration; don’t read into it too much.
What if I want a type called MinusIntMaxToPlusIntMax?
In other words the full range of Int?
Is newtype still bad?
In other words how much of this criticism has to do with newtype not providing sub-ranging for enumerable types?
It seems that it could be extended to do that.
These are possibly situations where I’d resort to a panic on the extra branch rather than complicate the return type.
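A minimal Rust sketch of that trade-off (the function and its invariant are made up for illustration):

    // Assume `n` was validated to be in 1..=3 at the system boundary.
    fn label(n: u8) -> &'static str {
        match n {
            1 => "low",
            2 => "medium",
            3 => "high",
            // Panic loudly on the "impossible" branch instead of returning an Option.
            _ => unreachable!("n was checked at the boundary"),
        }
    }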
Providing a proof of program correctness is pretty challenging even in languages that support it. In most cases careful checking of invariants at runtime (where not possible at compile time) and crashing loudly and early is sufficient for reliable-enough software.
These kinds of types are just a waste of time. It is going to be OneToSix or OneToSeven very soon...
"It's just an example!" Well, if you cannot come up with a good example, maybe you don't have a point.
I take it you didn’t read the whole thing, as the next example is NonEmptyList, which is a good compelling example. It’s also not hard to think of other examples from my own work: I can imagine a URL type that only exposes constructors to create well-formed URLs. Etc etc.
Really good examples will be rather domain-specific, so it’s perfectly understandable why Alexis would trust her readers to be able to imagine uses that suit their needs.
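A rough Rust rendering of the NonEmptyList idea (names and methods are illustrative):

    pub struct NonEmpty<T> {
        head: T,
        tail: Vec<T>,
    }

    impl<T> NonEmpty<T> {
        pub fn new(head: T, tail: Vec<T>) -> Self {
            NonEmpty { head, tail }
        }

        // Unlike Vec::first, this cannot fail: emptiness is ruled out by construction.
        pub fn first(&self) -> &T {
            &self.head
        }

        pub fn len(&self) -> usize {
            1 + self.tail.len()
        }
    }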
Yes, I didn't, and I don't think NonEmptyList should have its own type either. This is just busy work and not a good example either.
Perhaps it's because I'm not a haskeller but I'm not sure if I'm sold on encoding this into the type system. In go (and other languages for example), you would simply use a struct with a hidden Int, and receiver methods for construction/modification/access. I'm not sure I see the benefit of the type ceremony around it.
> you would simply use a struct with a hidden
In such languages that's the equivalent of a newtype in Haskell.
Isn't the whole article a discussion of the kind of guarantees such an approach (which can also be done in Haskell) cannot provide?
Right, I'm just unsure how valuable those guarantees really are. Especially if I'm extracting an Int out of the type to interface with other code.
> Especially if I'm extracting an Int out of the type to interface with other code.
Why would you do this? As soon as you go "outside" the type you lose typechecker guarantees.
The whole point of the article is showing where the compiler can tell you when you're writing code that fails to consider some cases, and how use of `newtype` loses some of these guarantees.
I don't think I'm gonna find a library that supports my OneToFive type.
Your type comes with its own library, and the compiler makes sure you're not misusing it by failing to consider a possible case.
If you need to use your OneToFive someplace that actually wants an integer, that happens at the boundary. Everything else is safer.
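A minimal Rust sketch of that boundary (OneToFive here is just an illustration of the shape, not the article's Haskell code):

    enum OneToFive { One, Two, Three, Four, Five }

    impl OneToFive {
        // Parse once at the boundary; past this point the range can't be violated.
        fn from_int(n: u8) -> Option<OneToFive> {
            match n {
                1 => Some(OneToFive::One),
                2 => Some(OneToFive::Two),
                3 => Some(OneToFive::Three),
                4 => Some(OneToFive::Four),
                5 => Some(OneToFive::Five),
                _ => None,
            }
        }

        // Convert back only where an external API insists on a plain integer.
        fn to_int(&self) -> u8 {
            match self {
                OneToFive::One => 1,
                OneToFive::Two => 2,
                OneToFive::Three => 3,
                OneToFive::Four => 4,
                OneToFive::Five => 5,
            }
        }
    }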
Title should have been "names are not ENOUGH for type-safety", but then no one would have read it, I guess...