Thinking about types.

I’m still a long way from implementing any of this but I keep thinking about the sorts of types that I want “ocean” to have, so I thought I would write stuff down in the hope that it will stop looping around my brain.

Firstly, types will have names. Names will be global to a module (probably a file) and will live in a separate namespace from the names of objects like variables and functions. So yes, that means that types aren’t first class objects. Type equivalence will be name equivalence: if you have two types with the same structure and different names, they are different types.

It may be possible to have anonymous types in some circumstances, and they may well support structural equivalence with named types, but that will only happen if it makes it easier to write code … it is just a vague idea at the moment. Anon types would be a bit like under-specified constants – like 45.0 might be int or float or something else. Type analysis will decide what it must be, then it will always have been that.

There will probably be cases where the syntax allows either an object name or a type name, though only one of those would survive further analysis. In these cases we might want them to be syntactically different. I think I will use a “:” prefix to indicate a type. Specifically, some types will accept parameters, both constants and types being likely sorts of parameters, and variables being possible. To pass “4” and “int” you would write “typename(4, :int)” or similar.


The first group of types are scalars. These are simple things that the language knows about and that are always copied (i.e. no attempt is ever made to track multiple references to the one scalar) and they usually fit in a machine register.

Integers can be signed “int” or unsigned “uint”. These are probably 32bit – maybe 64. If you want a particular size, you can have i8, i16, i32, i64, u8, u16, u32, u64. I will probably support “int(N)” meaning an integer ranging from -N-1 to N, and the same for uint(N). “byte” might be a synonym for u8.

Floats are “float” or “float64” or “float128” etc.

A “number” will be a fraction with arbitrary large numerator and denominator.
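As a rough illustration of what such a “number” could behave like, Python’s standard `fractions.Fraction` is an exact rational built on arbitrary-precision integers. This is only a sketch of the idea in another language, not ocean syntax.

```python
from fractions import Fraction

# An exact rational: numerator and denominator are arbitrary-precision integers.
third = Fraction(1, 3)
total = third + third + third            # exactly 1, with no rounding error

# Binary floating point cannot represent 0.1 exactly; a fraction can.
tenth = Fraction(1, 10)
float_sum_is_exact = (0.1 + 0.2 == 0.3)                   # False with floats
fraction_sum_is_exact = (tenth + 2 * tenth == 3 * tenth)  # True with fractions

# Numerator and denominator can grow far beyond any machine word.
big = Fraction(2**200 + 1, 3**150)
```

The cost is that values no longer fit in a register, which is why it would sit apart from “int” and “float”.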

“Boolean” or “Bool” is “True” or “False” and “Order” is “Less” or “Equal” or “More” or something like that.

A “char” is a UNICODE codepoint.


A “struct” is a collection of fields. Each field is declared much like a variable as “name:type=value” though the initial value is optional.

struct complex:

might define a struct. A struct is for use inside a program only – never for export. The compiler is free to change the order of fields to improve alignment without wasting space. It is not possible to cast a pointer to a structure into some other sort of pointer – the language owns the internals, not the programmer.

A field in a struct can be named “_” (a single underscore) in which case it is treated as anonymous. If it is a scalar, then it must be the only field and the struct works a bit like a typedef in C. If it is a struct, then all the fields in that struct are imported into the parent, and there must be no name conflicts. If it is an array, then there must be only one and it is used whenever an array index operation is attempted on the struct.

Fields are accessed with standard dot notation, “foo.field” is a field of “struct foo”.


A record is similar to a structure, but the internal layout is under programmer control. The way the data is stored in memory is well defined, so that memory can be written to a file or sent over a network or similar. A record can be declared to be “big endian” or “little endian” or “host endian” – though I don’t know yet what the default is. This applies to all fields in the record. If you want different fields to have different endianness, then you need a sub-record which is declared differently. The endianness of the outer record will set the default for the inner, but will not override an explicit setting for the inner.
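To make the endianness idea concrete, here is a small sketch using Python’s standard `struct` module, which already lets a layout be pinned to a byte order. The two-field record (a 16-bit tag and a 32-bit length) is invented purely for illustration.

```python
import struct

# Hypothetical record: an unsigned 16-bit tag ("H") then an unsigned 32-bit length ("I").
# A leading ">" or "<" fixes the byte order and also disables padding.
BIG = struct.Struct(">HI")
LITTLE = struct.Struct("<HI")

tag, length = 0x0102, 0x0A0B0C0D

big_bytes = BIG.pack(tag, length)        # bytes 01 02 0a 0b 0c 0d
little_bytes = LITTLE.pack(tag, length)  # bytes 02 01 0d 0c 0b 0a

# A big-endian outer record containing a sub-record explicitly declared little
# endian: each part is packed with its own byte order, then concatenated.
outer = struct.pack(">H", tag) + struct.pack("<I", length)

# Because the layout is fully defined, the bytes decode the same way anywhere.
decoded = BIG.unpack(big_bytes)
```

Note that the same six bytes come out regardless of the host machine, which is the property that makes a record safe to write to a file or send over a network.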

Charsets for strings and alignment and padding will probably also be controllable somehow.

While a struct can contain anything, a record cannot – it can only contain well-defined things. So a record can contain fixed-size ints and Booleans and chars and other records. It can also contain arrays of these things. It cannot contain pointers or structs or other more esoteric things that we haven’t met yet.

Because the representation of a record is well defined, it is possible to cast the address of a record to a pointer to an array of bytes, or to anything else that is well defined.


An array cannot exist as a named type, but a variable or a field in a record or struct can be an array. If an array is an anonymous field (named with an underscore), the struct will appear, for many practical purposes, just like an array. An array is declared as [type:length] so:

struct months:

is a struct containing two twelve-element arrays. Array elements are indexed using standard name[index] notation.


A class is like a struct, but it can also have methods. A struct can hold function pointers in it which are a bit like methods, or it can hold a pointer to a separate struct of function pointers. A class might use either of these techniques, or it might do something else. It allows methods to be used, but leaves it up to the compiler to worry about implementation details.

Some sort of mechanism will be provided for declaring interfaces and sharing implementations, but I haven’t thought much about what this will be yet. I do expect there to be several internal implementation options, and that the programmer will have some opportunity to suggest a preferred approach.

One approach is that any struct can be “classified” (made into a class) by providing a set of methods, and pointers to objects in the class would be implemented as fat pointers – two pointers together, one to the data, one to the implementation. This is essentially the interface used by the C-library “qsort” function, which is handed the data and the comparison function separately.
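A minimal sketch of the fat-pointer idea, written in Python only because it is easy to run: the data stays a plain struct, the methods live in a separate table, and the “pointer” carries both. The names `Point`, `POINT_METHODS` and `FatPointer` are invented for this example and say nothing about how ocean would actually do it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Point:
    """A plain "struct": data only, it knows nothing about methods."""
    x: float
    y: float

# The implementation lives apart from the data, as ordinary functions.
def point_norm2(p: Point) -> float:
    return p.x * p.x + p.y * p.y

def point_describe(p: Point) -> str:
    return f"({p.x}, {p.y})"

POINT_METHODS: Dict[str, Callable] = {
    "norm2": point_norm2,
    "describe": point_describe,
}

@dataclass(frozen=True)
class FatPointer:
    """Two references travelling together: one to data, one to the method table."""
    data: Any
    methods: Dict[str, Callable]

    def call(self, name: str, *args):
        # Look the method up in the table and pass the data as first argument.
        return self.methods[name](self.data, *args)

p = FatPointer(Point(3.0, 4.0), POINT_METHODS)
norm2 = p.call("norm2")      # 25.0
text = p.call("describe")    # "(3.0, 4.0)"
```

The appeal of this arrangement is that the struct itself pays no space cost for being usable as a class; only the pointers get wider.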

Some fields in a class will be “private” to certain methods, others will be public parts of one or more interfaces.


There will be a number of different sorts of pointers. Some of them will imply “ownership” of the referenced object, and some won’t. Different sorts of ownership will be supported.

In the first instance I suspect that the only sort of ownership that will be supported is refcounting – so only classes and structs with an identified ref counter can be owned. Non-owning (borrowed) references will only be valid while some other designated owning reference remains valid. For example, a borrowed reference can point to a member of a structure as long as there is a valid pointer to the structure that it borrows from.

Later I hope to allow owning references that have an implicit refcount of one, and probably other variations.

Pointer arithmetic will not be supported. If you want to do arithmetic on memory addresses, you need an array. Pointers can only point to scalars and to structs/records/classes. In particular you cannot have a pointer to a pointer, though you could have a pointer to a struct containing just a pointer.

If “foo” is a pointer then most accesses to “foo”, including array member access and field access, access the thing that “foo” points to. Only assignment modifies the pointer itself. If you want to modify the whole of the thing pointed to by a pointer (which is a structure or similar) then the “copy” or “swap” statement will be used. I imagine “copy” and “swap” to be statements in the core language which take 2 variables (or fields or similar) and copy or swap the content. That would mean that swapping pointers isn’t easy … I wonder if that matters.

There is probably a lot more to say about pointers, but their time hasn’t really come yet.


Enumerated types bother me. In C, the values in the type are global names, which feels a bit like name-space pollution. I could require a “type.” prefix, and it is not uncommon to see that sort of thing used in C – a common prefix for an enumeration – but it still feels a bit clumsy. It also introduces the typename into the object namespace, which I didn’t want. I’ll probably need to try things out and see what works. Possibly “:name” will find an enum with that name in any known type, and “:type:name” will disambiguate, when needed.

I suspect enums will look a lot like structs except that the names will be constants, not variables. No type will be needed and the value will still be optional.

In C we often want an enumeration of bits in a bit-field and Go has a syntax to make this easy – it seems like a hack to me though. I suspect I’ll just make the issue irrelevant by making such things unnecessary. One option is a “#” prefix operator which converts a number to a bit, so “#BUSY” is the same as “(1 << BUSY)”. Another option is to have infix operators which operate between a bitset and a bit, so “flags +/ BUSY” and “flags -/ BUSY” will set or clear the “BUSY” bit.
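Spelled out as ordinary functions, the two options amount to the following. This is a Python sketch; `bit`, `set_bit` and `clear_bit` are stand-in names for the proposed “#”, “+/” and “-/” operators, and the bit numbers are made up.

```python
# Hypothetical bit numbers, as an enumeration might assign them.
BUSY = 3
DIRTY = 5

def bit(n: int) -> int:
    """Stand-in for the prefix "#": turn a bit number into a mask."""
    return 1 << n

def set_bit(flags: int, n: int) -> int:
    """Stand-in for "flags +/ n": set bit number n."""
    return flags | bit(n)

def clear_bit(flags: int, n: int) -> int:
    """Stand-in for "flags -/ n": clear bit number n."""
    return flags & ~bit(n)

flags = 0
flags = set_bit(flags, BUSY)     # 0b001000
flags = set_bit(flags, DIRTY)    # 0b101000
flags = clear_bit(flags, BUSY)   # 0b100000
```

Either way the enumeration only ever holds small bit numbers, so no special syntax is needed to declare powers of two.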

Functions and procedures

Functions can be used in arbitrarily complex expressions so they really need to return precisely one value. Procedures can return any number of values so they can only be called in more restricted contexts. I think I want to maintain that distinction that Pascal had, rather than being like C and pretending they are all the same.

A function will be “name(parameters):return_type” while a procedure will have no return type, but (optionally) a second set of parameters separated by “::”. When calling a procedure, a multi-variable assignment statement can be used to collect the return values rather than passing them as special parameters. This can only work if all the names are being declared at this point, or if none of them are. I wonder if that is too restrictive.
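Python does not enforce the function/procedure split, but its tuple unpacking gives a feel for what collecting several results with a multi-variable assignment looks like. The names `square` and `div_mod` are just examples.

```python
def square(x: int) -> int:
    """A "function": exactly one result, so it can sit inside any expression."""
    return x * x

def div_mod(num: int, den: int):
    """A "procedure": two results, collected by a multi-variable assignment."""
    return num // den, num % den

# A function call nests freely in a larger expression.
total = square(3) + square(4)

# A procedure call appears only as the right-hand side of an assignment.
quot, rem = div_mod(17, 5)
```

Here both `quot` and `rem` are being introduced at the assignment, which is the “all names declared at this point” case described above.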

Parameterized types

On top of all this, I want parameterized types – both integers and other types will be appropriate parameters, and when describing a function signature, there might be unbound types for which only an interface is given. Lots to think about there.

Error types

In the Linux kernel we have a practice where a pointer variable can hold an error code instead. An address with a signed-number equivalent between -4095 and -1 is treated as an error. The same thing can be done with positive numbers meaning success and negative meaning an error. Floating point has a somewhat-similar concept where a specific value – NaN – is not a number but is actually an error.

This is very powerful, particularly for function return values. You effectively get a cheap discriminated union which is either a useful value or an error. Providing the caller always checks for an error, things work nicely.

I would like to support this natively in ocean, at least for pointers and numbers that aren’t the full range of the bits used. Some sort of type annotation would say that an error code can be encoded in spare parts of the bit-space. A simple ‘is_err’ test could be used on any error-enhanced type to see if an error is present. The compiler would refuse to let an error-enhanced value be used until the error status has been tested. If I end up adding exception handling, the use of an erroneous value could trigger an exception.
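The encoding can be sketched in a few lines. The version below is Python, models a 64-bit word, and borrows the kernel’s limit of 4095 error codes; the `ErrOr` wrapper and `find_buffer` are invented for illustration, and the wrapper checks at run time what the compiler would ideally check at compile time.

```python
MAX_ERRNO = 4095             # size of the reserved range, as in the Linux kernel
WORD_BITS = 64               # assumption: values are 64-bit machine words
WORD_MASK = (1 << WORD_BITS) - 1
ENOMEM, EINVAL = 12, 22      # conventional errno numbers, used as sample codes

def err_value(errno: int) -> int:
    """Encode an error code in the spare top end of the word (i.e. as -errno)."""
    return (-errno) & WORD_MASK

def is_err(word: int) -> bool:
    """True if the word lies in the reserved range -MAX_ERRNO..-1."""
    return word >= ((-MAX_ERRNO) & WORD_MASK)

def err_code(word: int) -> int:
    """Recover the positive error code from an encoded error word."""
    return (1 << WORD_BITS) - word

class ErrOr:
    """Hypothetical wrapper that refuses to yield its value until tested."""
    def __init__(self, word: int):
        self._word = word
        self._tested = False

    def is_err(self) -> bool:
        self._tested = True
        return is_err(self._word)

    def value(self) -> int:
        if not self._tested:
            raise RuntimeError("error status has not been tested")
        if is_err(self._word):
            raise ValueError(f"value is error {err_code(self._word)}")
        return self._word

def find_buffer(name: str) -> ErrOr:
    """Hypothetical lookup returning either an address or an encoded error."""
    if name == "known":
        return ErrOr(0x7F0000001000)
    return ErrOr(err_value(ENOMEM))

result = find_buffer("known")
if not result.is_err():
    address = result.value()
```

Raising on use of an erroneous value, as `value()` does here, corresponds to the exception-handling variant mentioned above.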


I definitely want ocean to support strings natively, but that is hard – at least the witness of Python 3 seems to suggest that it is hard.

I think I want strings to be utf-8 encoded with a length (rather than nul termination), though there is a strong case for utf-16 in some cases. Working in the ASCII subset needs to be trivial. Probably the difficult part is understanding what an iterator looks like, and if there need to be different sorts of iterators – code bytes, code-points, graphemes, something else? In the first instance, strings will be utf-8 with concatenation only. When I need more, I’ll have to invent something.
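To see why one iterator is not enough, here is the same short string viewed three ways in Python. The grapheme grouping is deliberately naive (it only attaches combining marks to the preceding character) and is nowhere near the full Unicode segmentation rules; the function names are made up for the example.

```python
import unicodedata

def code_bytes(s: str) -> list:
    """The utf-8 encoding, one integer per byte."""
    return list(s.encode("utf-8"))

def code_points(s: str) -> list:
    """One integer per Unicode code point."""
    return [ord(c) for c in s]

def naive_graphemes(s: str) -> list:
    """Group each combining mark with the character before it (a rough approximation)."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT: displays as four characters.
text = "cafe\u0301"

n_bytes = len(code_bytes(text))            # 6: the accent takes two bytes in utf-8
n_points = len(code_points(text))          # 5
n_graphemes = len(naive_graphemes(text))   # 4

# In the ASCII subset all three views coincide.
ascii_text = "abc"
```

For pure ASCII the distinction disappears, which is presumably what makes “trivial” achievable there.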

Posted in Uncategorized | Leave a comment

I’m back…. but no pony.

It seems that I disappeared for a while… various reasons that I won’t go into.  I seem to be motivated again, thanks in part to pony.

Pony is a language which was the topic of a talk in January 2018. It was interesting and it motivated me, so that is good. But I don’t like Pony. Why?

One reason is that Pony doesn’t distinguish between expressions and statements. Any structured statement can be used as an expression if it provides a value. I found I particularly disliked that when I was looking at Rust a while back, and I don’t see any reason to change my mind. My current belief is that if you want statements where expressions are expected (and I sometimes do), then the language syntax should allow you to put statements there. Ocean does that. Alternately, if you want a value to be determined by a construct complex enough to need a statement, then you should just use statements and assign a variable.

I don’t particularly mind the “c?a:b” construct in C, which is an if statement in the form of an expression, because it isn’t trying to be a statement. The “a if c else b” in Python is possibly better, but either is definitely better than “if c then a else b end” which Pony uses. Different things should be different.

The other thing that bothered me about Pony is the concurrency model. It uses an “actor” based model where actors can send messages to other actors, and receive messages in return. I’m sure this is a very powerful model and has an important place, but I wouldn’t want it as the only model in a language. Actors cannot wait for a reply; instead, they must tell their correspondent to send a message back when they are done. This amounts to “callback-oriented programming”, as used in node.js, where all sequencing becomes callbacks. I think callbacks have their place, but it isn’t everywhere. There are times when I would want to be able to wait for a response. As far as I can tell, Pony doesn’t allow for those times.

The up side

Pony isn’t all bad of course. It puts a lot of focus on type safety and proper management of references. The approach, described as “reference capabilities”, seems a little bit different to that taken by Rust, and it would be well worth analysing the two side-by-side to see what lessons can be learned.

So while I’m not sold on using Pony, I may well learn from it.


A multitude of oceans.

I thought, when I chose the name “Ocean” for my new language, that I had checked that it wasn’t obviously in use.  If I had, I wasn’t very thorough.

In the first place, there is Ocean: A new systems programming language which looks like an experiment much like mine.  It is far from complete and the most recent date I can find is over 2 years ago.  Maybe it is still active, maybe not.  If I ever progress far enough that I think I have a serious language instead of just a little toy, I would certainly revisit that project to see if there is any real conflict, but for now we are both at a stage where choosing the same name is much like choosing the same name for our children – not really a problem.

However there is also the Ocean Programming Language Definition which looks more formal.  It dates from 1995 (nearly 20 years ago!) and was targeted at being a language for teaching compiler design with.  It may even still be in use for that, I don’t know.  How long can you hold on to a name for a language without having a portal on the web I wonder?

Finally there is the Ocean Language from Cadence Design Systems (youtube) which seems to be a totally different beast – something for simulations I think. That definitely seems like a viable contender for the name.

So I should probably change the name of my language … not just yet though.  It really is just a toy so there is no need to be hasty.  It might be worth thinking of options though.

The first to occur to me is “Mare”, being the Latin word for “sea”, and importantly the name used for the “seas” on the Moon. It does have the problem that, when spoken aloud, it could be confused with “mere”, and while “a mere language” has some attraction, I would rather avoid confusion. Another possibility is “Oceanus”, again from the Latin and used for Oceanus Procellarum, again on the Moon.

I’ll have to make a decision before my current registration expires, but that is a couple of years yet and I may have lost interest by then…


Hello World

Yesterday I ran my first “ocean” program…

I’m currently working on building a simple interpreter so I can test out the next steps of my language design. Yesterday I got it to the point where it could print out “hello world” and similar totally trivial things. So this isn’t really a statement about the development of the language, but only development of the support software. Even there it isn’t much of a statement.

Still, it is exciting. Hopefully I’ll have a blog post ready in a week or two which describes the next step in my language and provides a link to the code so you can welcome the world too.



Other thoughts of parsing and linebreaks etc.

I just found which also has things to say about parsing with indents and line break.
Of particular interest was a link to by the author of “D”.


Two Dimensional Parsing is done.

It took much longer than I would have liked to get to where I am now. Partly the problem took a while for me to fully understand (though it seems so simple now). Partly other life issues got in the way.

However I finally have a good understanding of how to handle line breaks and indents in a uniform and elegant way. My parser generator can now generate parsers which use indents to resolve ambiguity and detect errors, and use line breaks to terminate some things but not others, as appropriate. All the details are in my main blog.



lwn article on lexical issues

My third article on language issues was published a few days ago and has gathered quite a good collection of comments, some very interesting.

I really hadn’t foreseen the hate which some people seem to have towards the “leading 0 means octal” convention for number literals.  This is clearly something that Ocean will need to be careful about.
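Python 3 is one language that dropped the convention: a bare leading zero is no longer accepted as octal, and an explicit “0o” prefix is required instead. The sketch below shows the same rules through `int()`; it is offered only as one existing approach, not as what Ocean will do.

```python
# Plain decimal parsing simply ignores the leading zero.
decimal_reading = int("010")        # 10

# Octal must be requested, either by base or by an explicit "0o" prefix.
octal_reading = int("010", 8)       # 8
explicit_octal = int("0o10", 0)     # 8; base 0 means "parse like a source literal"

# With literal-style parsing, the ambiguous form is rejected outright.
try:
    int("010", 0)
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
```

Rejecting the ambiguous spelling, rather than silently picking a meaning, avoids surprising anyone who pads decimal numbers with zeros.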

International character-sets also gained a lot of attention. There are probably some subtleties that I’ll need to be careful about, but I really want to keep it all as simple as possible… with the understanding that “just ASCII” is now impossibly simple.


Banner image credit

I should say that the banner image is used by permission and was taken by Heather Paul:

Original image is here.
