2 Pointers is one of the particular areas where I wanted to innovate in
5 Traditionally (in lanuages like C) pointers are both powerful and
6 dangerous. My goal is to control that power without extinguishing it,
7 so they become powerful with more clarity, and without the danger.
9 Pointers have always been overloaded - they can store either a pointer
10 to an actual object, or a 'nil' (or 'NULL') pointer, which isn't an
11 object and cannot be dereferenced safely. This provides power (it
12 enables simple finite linked lists) and danger (it is too easy to try
13 to dereference the `nil` pointer). In Ocean, pointers that can be
14 `nil` are a different (though related) type to those which cannot. A
15 pointer which can be `nil` may only be dereferenced after a test
16 confirms that it isn't, in fact, `nil`. This requires more careful
17 notation of the types of all pointers, but is easily checked by the
18 compiler so notation errors are easily caught and corrected.
20 As well as allowing `nil`, it is sometimes useful to allow other
21 non-pointer values to be stored in a pointer object. The Linux kernel
22 does this a lot, and there are two specific cases that are used.
23 In the first case, small integers (normally negative) can be stored in
24 a pointer variable. These are used as error codes. As the first page
25 of virtual memory and also the last page are not mapped, any pointer
26 with an absolute numeric value less than the page size cannot be a
27 valid pointers and can be treated as something else. Zero is used for
28 the `nil` pointer, negative numbers are used as errors, positive
29 numbers are largely unused. Ocean will make this pattern explicit in
30 the types: a pure pointer will not have one of these values and can be
31 dereferenced. A loaded pointer might have a reserved value and can
32 only be dereferenced after a test. If the test fails, the numeric
33 value can be extracted.
35 The second case relies on the fact that on all supported
36 architectures, any pointer to a 16-byte (or larger) value must be (at
37 least) 2-byte aligned, so the numeric value of the pointer will be
38 even. This means that odd values cannot be valid pointers, so those
39 values can be used for something else. This provides a secord, or
40 third, level of "loading" that can be declared for a pointer. The
41 Linux kernel uses these in hashtables that support lockless lookups
42 and movement of entries between chains. The loaded value is treated
43 like a `nil` pointer, but records which hash chain has come to an
44 end. If a search finds the wrong hash value at the end of the chain,
45 it know that it might have been diverted between chains and needs to
48 The odd pointer values can also be used in a different way. Instead
49 of indicating a different interpretation of the whole pointer, the
50 least significant bit can be treated as a separate value that is
51 stored in the pointer - to optimize space usage. A particular use for
52 this is as a lock bit, again in a hash table though this time in the
53 head. Before a thread may change a hash chain, it must atomically
54 change the least significant bit from zero to one. After the change,
55 it can clear the bit, possibly as a sige effect of updating that first
58 The Ocean language shouldn't insist that the various non-pointer
59 values are error codes, hash values, or lock bits, but it will make
60 them available for the programmer to use in a controlled way, and will
61 ensure that a pointer that can contain non-pointer values is always
64 Apart from `nil` pointer dereferences, the main danger with pointers
65 involves the interaction with memory allocation and freeing. If
66 memory is released (and then reused) while a pointer to the memory is
67 held, future dereferences of that pointer are likely to cause
68 problems. In order to avoid these errors, the language must ensure
69 that no such references remain when memory is freed. Once it does
70 that, it can help the programmer by actively freeing the memory once
71 the number of valid references reaches zero.
73 This tracking of references is sometimes done using a process referred
74 to as "garbage collection". All pointer that are (or could be)
75 active, and all locations they point to are marked. Anything not-marked
76 can be freed. This has a degree of conceptual simplicity and a degree
77 of practical inelegance. The implementation can supposedly be made
78 quite efficient, but it isn't the approach that I want to take.
80 The alternate approach is to enhance the type system so that the
81 language "knows" when a pointer is active or not, and how many active
82 pointers there are to any given location. When there is only one
83 pointer and it become inactive, the memory can be free. The simplest
84 implementation of this pattern is to store a reference-counter in each
85 allocated object, and to use it to keep count of all references. This
86 is a useful pattern and in some circumstances it is the best possible
87 pattern, but it is not the only valid pattern. Ocean will support
88 this natively by allowing a field on a structure to be identified as
89 reference counter, but it will allow other options as well.
91 Pointers can be classified as either "owned" or "borrowed" references.
92 This will probably be determined dynamically from code analysis,
93 though in some cases such as function parameters and returns, it must
94 be declared. An owned reference is one that holds a reference count
95 or by some other mechanism is an independent reference which will
96 prevent the object from being freed, until some action happens on the
97 owned references to releasse it. A "borrowed" reference is a
98 temporary reference that exists under the umbrella of some
99 identifiable owned reference. The language definition will require that
100 the borrowed reference have limited scope, and that the related owned
101 reference will not be released while that scope is still active.
103 A simple example of a borrowed reference happens when an owned
104 reference is passed as a parameter to a function which declares the
105 formal parameter to be a borrowed reference. Inside the function, the
106 borrowed reference cannot be used to release the reference, and the
107 owned reference much remain intact for the duration of the function
110 It may be that the owned reference cannot be uniquely identified. As an
111 example, there might be two owned references, and an conditional
112 statement assigns one or the other to a borrowed pointer. In this case
113 it isn't possible to know at time of analysis which owned pointer is the
114 primary, so both (or all) possible primaries must be preserved.
116 The "Scope" of the borrowed reference that the owned reference must be
117 preserved for will often be a lexical scope, but may not always be. I
118 imagine that the language may at some stage be able describe other
119 temporal relationships between different objects. In that case, the
120 borrowed reference must have a life time contained within the
121 guaranteed lifetime of the matching borrowed pointer.
123 I should say that "Garbage collection" will be an option, just not the
124 only or the default. The language can potentially identify exactly the
125 references that need to be checked. Maybe collectable pointers
126 reserve the lsb for mark, prior to sweep.
127 compiler would extract a description of places that gc refs can live,
128 and encourage them in specific domains.
130 We might also make an allocation domain a well defined concept.
134 There are a variety of sorts of owned references. They include:
136 - counted references. These are created by incrementing a
137 reference-count field, which is then decremented when the reference
140 - single references. Some objects are declared as only ever having a
141 single owned reference. This can be moved from pointer to pointer
142 but never duplicated. When the reference is destroyed, so is the
145 - inherited references. An ....
147 ## Rules for borrowing references.
149 Borrowed references can live in automatic variables and in structures
150 (including arrays). They can only live here as long as the owned
152 For automatic variables, we can (hopefully) deduce the relevant owned
153 references - it is what was copied to get a borrowed ref.
154 For refs in structures, we need to be explicit for borrowed refs
155 That means we need a language and a precise understanding.
156 We would need a parent or sibling or child object to have the ref..
157 Maybe we just declare the name of the object, where it has to be
158 passed in as an arg to be part of a parent.
161 ----------------------
164 How much of this could/should be implemented as Smart Pointers?
165 Rust uses smart pointers to implement
166 Nullable (Option<>), RefCount (rc<>) Atomic refcount (arc<>) and
168 It even has Box<> to create on heap instead of stack.
170 Why don't I just do that? Partly because I don't have classes yet!!
172 I like a simple syntax to test if a pointer is over-loaded.
174 and to get the overload value
177 Rust would use pattern matching.
178 if let Some(p) = pointer {
186 Rc() uses Rc::clone() to take a reference. I would rather that were
189 Rc and ARc need to be different. I don't want the difference to be
192 Box<> is interesting. It is the gateway to the heap. Maybe it is
195 I want to be able to store an owned pointer in an ADT without telling
196 the ADT the precise type. Either it returns an object to be freed by
197 caller, or a 'free' function is passed in.
198 ... NAH this can be done with normal type parameterization.
200 So the ownership-type needs to be transparent to a degree.
203 - an owned pointer of unknown disposition
204 - an owned pointer of specific discipline.
206 Rust uses & to specify a borrowed reference.
209 ^type is a borrowed reference
210 @type is an owned reference with discipline defined by type.
211 discipline@type is an owned reference of a given discipline.
214 How do I want to handle mutability?
215 Function can usefully have 'const' markers.
216 What are the risks when single-threaded? We could extract a value,
217 accidentally change something, then depend on the value.
218 We could probably do that anyway...
220 Pointers are normally (default) overloaded with a small integer. +/2047
221 Pointers can have status attributes when stored in structures or
222 passed to or from functions.
223 - "pure" means there is no small integer
224 - "loaded" means there is large-integer overloading - if lsb set.
225 - "init" means the target is being initialised
226 - "exit" means it can be torn down
227 - other words can reflect stages in life cycle
228 - still other words might reflect locking status.
229 There can be several locking statuses, and several life-time statuses,
230 as these can affect different fields differently.
232 In a struct each field can have a mapping from internal states to
233 external state. (init=foo, locked=foobar, rlocked=baz)
234 And internal state not listed in inheritet from external states (a=a).
236 For a function, returned value attribute can be copied from args, as
238 fun foo(a:thing, b:thing) : <a|b>
242 fun foo<x:thing>(a:x, b:x) : x
244 ------------------------
245 Years later - March 2021
247 I need somewhere to start so I need to be able to ignore lots of this detail.
248 So in the first instance all references are counted references. They must refer
249 to a struct that contains 'refcount'.
251 A reference is declared with
253 and the base object can be accessed with
255 though this can sometimes be inferred from "name".
258 will find 'foo' either as an attribute of name, or of what name refers to.
259 If name refers to a reference, this recurses.
261 A ref can be checked with "name.valid"
263 A new object can be allocated with "name.new()", which returns the ref.
264 So "name.new().valid" is true if the allocation succeeded, which is always
267 Future ideas might include:
268 type name : ref(attr,list) basetype
269 where attr,list can include borrow,counted,single, etc