Pointers are one of the particular areas where I wanted to innovate in the design of Ocean. Traditionally (in languages like C) pointers are both powerful and dangerous. My goal is to control that power without extinguishing it, so that pointers stay powerful, gain clarity, and lose the danger.

Pointers have always been overloaded - they can store either a pointer to an actual object, or a `nil` (or `NULL`) pointer, which isn't an object and cannot be dereferenced safely. This provides power (it enables simple finite linked lists) and danger (it is too easy to try to dereference the `nil` pointer).

In Ocean, pointers that can be `nil` are a different (though related) type to those which cannot. A pointer which can be `nil` may only be dereferenced after a test confirms that it isn't, in fact, `nil`. This requires more careful notation of the types of all pointers, but is easily checked by the compiler so notation errors are easily caught and corrected.

As well as allowing `nil`, it is sometimes useful to allow other non-pointer values to be stored in a pointer object. The Linux kernel does this a lot, and there are two specific cases that are used.

In the first case, small integers (normally negative) can be stored in a pointer variable. These are used as error codes. As the first page of virtual memory and also the last page are not mapped, any pointer with an absolute numeric value less than the page size cannot be a valid pointer and can be treated as something else. Zero is used for the `nil` pointer, negative numbers are used as errors, positive numbers are largely unused.

Ocean will make this pattern explicit in the types: a pure pointer will not have one of these values and can be dereferenced. A loaded pointer might have a reserved value and can only be dereferenced after a test. If the test fails, the numeric value can be extracted.
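To make the "loaded pointer" idea a little more concrete, here is a minimal sketch in Rust (not Ocean) of the test that has to happen before a dereference. The names `Loaded` and `classify`, and the 4096-byte page size, are assumptions made purely for illustration; they are not part of Ocean or of the Linux kernel.

```rust
/// Assumed page size; the text only says "the page size".
const PAGE_SIZE: isize = 4096;

/// The possible meanings of a pointer-sized word (illustrative names).
#[derive(Debug, PartialEq)]
enum Loaded {
    /// Zero: the `nil` pointer.
    Nil,
    /// A small negative number, used as an error code.
    Error(isize),
    /// A small positive number, largely unused.
    Reserved(isize),
    /// Anything else: an address that may be dereferenced.
    Pure(usize),
}

/// Decide how a raw pointer-sized value must be interpreted.
fn classify(raw: usize) -> Loaded {
    // Reinterpret the bits as signed so that addresses in the last
    // page show up as small negative numbers.
    let signed = raw as isize;
    if signed == 0 {
        Loaded::Nil
    } else if signed < 0 && signed > -PAGE_SIZE {
        Loaded::Error(signed)
    } else if signed > 0 && signed < PAGE_SIZE {
        Loaded::Reserved(signed)
    } else {
        Loaded::Pure(raw)
    }
}
```

Only the `Pure` case hands back something that can be used as an address; every other case yields the numeric value instead, which is the behaviour I want the type system to enforce.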
The second case relies on the fact that on all supported architectures, any pointer to a 16-bit (or larger) value must be (at least) 2-byte aligned, so the numeric value of the pointer will be even. This means that odd values cannot be valid pointers, so those values can be used for something else. This provides a second, or third, level of "loading" that can be declared for a pointer.

The Linux kernel uses these in hash tables that support lockless lookups and movement of entries between chains. The loaded value is treated like a `nil` pointer, but records which hash chain has come to an end. If a search finds the wrong hash value at the end of the chain, it knows that it might have been diverted between chains and needs to re-check.

The odd pointer values can also be used in a different way. Instead of indicating a different interpretation of the whole pointer, the least significant bit can be treated as a separate value that is stored in the pointer - to optimize space usage. A particular use for this is as a lock bit, again in a hash table, though this time in the head of the chain. Before a thread may change a hash chain, it must atomically change the least significant bit from zero to one. After the change, it can clear the bit, possibly as a side effect of updating that first pointer.

The Ocean language shouldn't insist that the various non-pointer values are error codes, hash values, or lock bits, but it will make them available for the programmer to use in a controlled way, and will ensure that a pointer that can contain non-pointer values is always properly checked.

Apart from `nil` pointer dereferences, the main danger with pointers involves the interaction with memory allocation and freeing. If memory is released (and then reused) while a pointer to the memory is held, future dereferences of that pointer are likely to cause problems. In order to avoid these errors, the language must ensure that no such references remain when memory is freed.
Once it does that, it can help the programmer by actively freeing the memory once the number of valid references reaches zero.

This tracking of references is sometimes done using a process referred to as "garbage collection". All pointers that are (or could be) active, and all locations they point to, are marked. Anything not marked can be freed. This has a degree of conceptual simplicity and a degree of practical inelegance. The implementation can supposedly be made quite efficient, but it isn't the approach that I want to take.

The alternate approach is to enhance the type system so that the language "knows" when a pointer is active or not, and how many active pointers there are to any given location. When there is only one pointer and it becomes inactive, the memory can be freed.

The simplest implementation of this pattern is to store a reference counter in each allocated object, and to use it to keep count of all references. This is a useful pattern and in some circumstances it is the best possible pattern, but it is not the only valid pattern. Ocean will support this natively by allowing a field in a structure to be identified as a reference counter, but it will allow other options as well.

Pointers can be classified as either "owned" or "borrowed" references. This will probably be determined automatically by code analysis, though in some cases, such as function parameters and returns, it must be declared.

An "owned" reference is one that holds a reference count, or by some other mechanism is an independent reference which will prevent the object from being freed until some action happens on the owned reference to release it.

A "borrowed" reference is a temporary reference that exists under the umbrella of some identifiable owned reference. The language definition will require that the borrowed reference have limited scope, and that the related owned reference will not be released while that scope is still active.
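The owned/borrowed split can be illustrated with Rust's existing `Rc` type, which behaves like a counted owned reference, and its `&` borrows. This is only a sketch of the idea using Rust's rules, not a statement of how Ocean will work; `Node`, `peek`, `consume` and `demo` are made-up names.

```rust
use std::rc::Rc;

struct Node {
    value: i32,
}

/// Takes a borrowed reference: it can read the object but cannot
/// release it, and the reference count is untouched.
fn peek(n: &Node) -> i32 {
    n.value
}

/// Takes an owned reference: the count it holds is released when
/// `n` goes out of scope at the end of the function.
fn consume(n: Rc<Node>) -> usize {
    Rc::strong_count(&n)
}

/// Returns the reference count before, during and after the calls,
/// together with the value seen through the borrow.
fn demo() -> (usize, i32, usize, usize) {
    let first = Rc::new(Node { value: 7 });
    let second = Rc::clone(&first); // a second owned reference
    let before = Rc::strong_count(&first);
    let seen = peek(&first); // borrowed under the umbrella of `first`
    let during = consume(second); // ownership moves into the call
    let after = Rc::strong_count(&first);
    (before, seen, during, after)
}
```

Calling `demo()` returns `(2, 7, 2, 1)`: the borrow never changes the count, while the owned reference passed to `consume` is released when that function returns.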
A simple example of a borrowed reference happens when an owned reference is passed as a parameter to a function which declares the formal parameter to be a borrowed reference. Inside the function, the borrowed reference cannot be used to release the reference, and the owned reference must remain intact for the duration of the function call.

It may be that the owned reference cannot be uniquely identified. As an example, there might be two owned references, and a conditional statement assigns one or the other to a borrowed pointer. In this case it isn't possible to know at time of analysis which owned pointer is the primary, so both (or all) possible primaries must be preserved.

The "scope" of the borrowed reference, for which the owned reference must be preserved, will often be a lexical scope, but may not always be. I imagine that the language may at some stage be able to describe other temporal relationships between different objects. In that case, the borrowed reference must have a lifetime contained within the guaranteed lifetime of the matching owned pointer.

I should say that "garbage collection" will be an option, just not the only one or the default. The language can potentially identify exactly the references that need to be checked.

Maybe collectable pointers reserve the lsb for the mark, prior to sweep. The compiler would extract a description of places that gc refs can live, and encourage them in specific domains. We might also make an allocation domain a well defined concept.

## Other Ownership

There are a variety of sorts of owned references. They include:

- counted references. These are created by incrementing a reference-count field, which is then decremented when the reference is destroyed.
- single references. Some objects are declared as only ever having a single owned reference. This can be moved from pointer to pointer but never duplicated. When the reference is destroyed, so is the object.
- inherited references. An ....

## Rules for borrowing references.
Borrowed references can live in automatic variables and in structures (including arrays). They can only live there as long as the owned reference is stable.

For automatic variables, we can (hopefully) deduce the relevant owned reference - it is what was copied to get the borrowed ref.

For refs in structures, we need to be explicit about borrowed refs. That means we need a language and a precise understanding. We would need a parent or sibling or child object to have the ref. Maybe we just declare the name of the object, where it has to be passed in as an arg to be part of a parent.

----------------------

Later...

How much of this could/should be implemented as Smart Pointers? Rust uses smart pointers to implement Nullable (`Option<>`), RefCount (`Rc<>`), Atomic refcount (`Arc<>`) and others. It even has `Box<>` to create on the heap instead of the stack. Why don't I just do that?

Partly because I don't have classes yet!!

I like a simple syntax to test if a pointer is over-loaded, `if pointer?`, and to get the overload value, `pointer!`. Rust would use pattern matching: `if let Some(p) = pointer { use p }` in place of `if pointer? : use pointer!`.

`Rc<>` uses `Rc::clone()` to take a reference. I would rather that were transparent.

`Rc` and `Arc` need to be different. I don't want the difference to be that obvious.

`Box<>` is interesting. It is the gateway to the heap. Maybe it is just syntax.

I want to be able to store an owned pointer in an ADT without telling the ADT the precise type. Either it returns an object to be freed by the caller, or a 'free' function is passed in. ... NAH this can be done with normal type parameterization.

So the ownership-type needs to be transparent to a degree. i.e. I can have:

- a borrowed pointer
- an owned pointer of unknown disposition
- an owned pointer of specific discipline.

Rust uses `&` to specify a borrowed reference. Maybe `^type` is a borrowed reference, `@type` is an owned reference with discipline defined by type, and `discipline@type` is an owned reference of a given discipline.
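For comparison, here is the Rust side of that pattern-matching idiom written out as a small compilable sketch: a finite linked list built from `Option<Box<...>>`, where the "might be nil" pointer can only be followed after a successful match. `Item`, `list_of` and `sum` are names invented for this example.

```rust
/// A list node whose `next` pointer may be absent (Rust's
/// `Option<Box<T>>` plays the role of a pointer that can be nil).
struct Item {
    value: i32,
    next: Option<Box<Item>>,
}

/// Build a list from a slice, front to back.
fn list_of(values: &[i32]) -> Option<Box<Item>> {
    let mut head = None;
    // Walk the slice backwards so each new node is pushed on the front.
    for &v in values.iter().rev() {
        head = Some(Box::new(Item { value: v, next: head }));
    }
    head
}

/// Sum the list using only borrowed references; the pointer is
/// dereferenced only inside the arm where the match succeeded.
fn sum(mut cursor: &Option<Box<Item>>) -> i32 {
    let mut total = 0;
    while let Some(item) = cursor {
        total += item.value;
        cursor = &item.next;
    }
    total
}
```

The test and the extraction are fused into the single `while let`, which is the part I would like to express with the lighter `?` and `!` notation.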
How do I want to handle mutability? Functions can usefully have 'const' markers. What are the risks when single-threaded? We could extract a value, accidentally change something, then depend on the value. We could probably do that anyway...

Pointers are normally (by default) overloaded with a small integer: +/-2047.

Pointers can have status attributes when stored in structures or passed to or from functions.

- "pure" means there is no small integer
- "loaded" means there is large-integer overloading - if lsb set.
- "init" means the target is being initialised
- "exit" means it can be torn down
- other words can reflect stages in the life cycle
- still other words might reflect locking status.

There can be several locking statuses, and several life-time statuses, as these can affect different fields differently.

In a struct each field can have a mapping from internal states to external state, e.g. (init=foo, locked=foobar, rlocked=baz). Any internal state not listed is inherited from the external state (a=a).

For a function, the returned value's attributes can be copied from the args, as can the type, I suspect: `fun foo(a:thing, b:thing) :` or better `fun foo(a:x, b:x) : x`

------------------------

Years later - March 2021

I need somewhere to start, so I need to be able to ignore lots of this detail.

So in the first instance all references are counted references. They must refer to a struct that contains `refcount`.

A reference is declared with `name : ref base-type` and the base object can be accessed with `name.ref`, though this can sometimes be inferred from `name`. In particular `name.foo` will find `foo` either as an attribute of `name`, or of what `name` refers to. If `name` refers to a reference, this recurses.

A ref can be checked with `name.valid`. A new object can be allocated with `name.new()`, which returns the ref. So `name.new().valid` is true if the allocation succeeded, which it always will on Linux.
Future ideas might include: `type name : ref(attr,list) basetype` where `attr,list` can include `borrow`, `counted`, `single`, etc.