Saturday, November 16, 2019

Draft Feature List for Aligorith's Unnamed/Hypothetical Programming Language - 2019 Edition

Perhaps it is finally time to think about creating my own programming language? It would be that long-term passion project that I can have complete control over, with the potential for truly minimal dependencies, and in which I can finally build that proper "property system" + "node evaluation engine" + "UI toolkit" that will fully work the way that I want it to, and then use this as the basis for all the projects in my backlog which will require similar structural/framework support.  (I could also formalise that documentation system I've been using on all personal projects - the one I documented here once many years ago).

I'm not that crazy yet...   yet!

Then again, I have been thinking about these matters more and more over the past year, so it's highly possible that this may still happen, if I can muster up enough time/energy away from everything else I'm working on (OR if external constraints expedite this process for whatever reason).

So, if I were to write my own programming language/environment, what could you expect it to have? Here is an incomplete list of the important things I'd do.



Non-Negotiables

0) I do not give a damn if anyone thinks something should be done "because that's the proper way" or "because that's how the math should be done to be 'correct'" or "everyone else does it that way" or "that's a standard approach". Obtaining global/widespread, year-on-year exponential growth is not a target at all - if by some fluke it happens, fine - but growth at all costs is not something I have any interest in.

1) Significant Whitespace 
   1a) Tabs as the Mandatory/Only-Supported Means of Indenting Code/Indicating Scope.  (Default tabwidth for all editors editing such files will be set to 4 spaces wide. If that can't be done as the tabwidth is unadjustable, the editor is to be deemed defective, and bug reports should be lodged in the appropriate places until it is fixed)
   1b) "Blank" (i.e. whitespace-only lines) within a block of code must be maintained in the source file. Whitespace stripping / incorrect whitespace on such lines will be a syntax error, as they would indicate the end of the scope of the preceding code.
   1c) "Caterpillar Ifs" will be banned (i.e. constructs like the following will be a syntax error:   } else if {  <-- where the closing bracket for a block lives on the same line as the opening keywords/condition for the next block). Other constructions will be allowed (e.g. "braces always on separate lines" will always be allowed, and "opening brace on same line as control statement" will be allowed but only within a function / for type definitions).
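
As a rough illustration of what enforcing 1a and 1b could look like, here is a minimal sketch of a checker. It is written in Python purely as a stand-in (the language itself doesn't exist yet), and `check_indentation` and its messages are hypothetical names. The 1b check is deliberately simplified: it only flags a completely empty line that is immediately followed by tab-indented code.

```python
from typing import List, Tuple


def check_indentation(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for lines breaking rules 1a/1b."""
    problems = []
    lines = source.split("\n")
    for number, line in enumerate(lines, start=1):
        # Leading run of spaces/tabs on this line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        # Rule 1a: only tabs may be used to indent.
        if " " in indent:
            problems.append((number, "spaces used for indentation"))
        # Rule 1b (simplified): an empty line followed by indented code
        # means the blank line's indentation was stripped.
        if line == "" and number < len(lines) and lines[number].startswith("\t"):
            problems.append((number, "blank line inside a block lost its tabs"))
    return problems


sample = "if x:\n\tdo_a()\n\n\tdo_b()\n    do_c()\n"
report = check_indentation(sample)
```

Here line 3 of `sample` is a stripped blank line inside the block, and line 5 is indented with spaces, so both get reported.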

If these whitespace requirements turn off a significant proportion of the programming community from using this language, then I consider it a 100% well justified feature decision! It means we've already eliminated most of the sloppy people who don't care enough, and the uber-opinionated ones from the spaces-only camp (I've had enough battles with those folk to want to keep them out of my codebases).


2) BigInts + Base-10 Decimals will be the default numeric types

The default numeric types will be BigInts and high-precision Base-10 decimals. Performance considerations be damned - or rather, absolute performance doesn't matter.  This way, we ensure that computation results will turn out "as expected" by default, without weird stuff happening. (The implementation though may have some optimisations so that "small-enough" common values are handled using the common/faster types, with seamless conversion to the safety-padded types later).
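
To show the kind of "as expected" behaviour I mean, here is a small sketch using Python, whose built-in `int` is already arbitrary-precision and whose standard `decimal` module provides base-10 decimals. This is only an analogy for the intended defaults, not how the language would implement them.

```python
from decimal import Decimal, getcontext

# Binary floats: the classic surprise.
float_sum = 0.1 + 0.2                  # not exactly 0.3

# Base-10 decimals: the result a human would expect.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# Arbitrary-precision integers: no silent wrap-around past 64 bits.
big = 2 ** 64 + 1

# Precision is a setting rather than a hardware limit.
getcontext().prec = 50
third = Decimal(1) / Decimal(3)
```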

Other "traditional" numeric types will also be provided: things like the 8/16/32/64-bit integers, and IEEE 754 floats. Those will be strictly "opt-in", i.e. you need to know what to specify for those when you need certain types for optimisation reasons (e.g. you know that a value is bounded) or interoperability reasons (e.g. you're communicating using some binary format that the rest of the computing world uses).  For vectorised operations, the appropriate types for those will be used.
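
For comparison, this is roughly what the opt-in side looks like in Python today, where fixed-width types have to be requested explicitly via the standard `ctypes` and `struct` modules. Again this is just a sketch of the idea (explicit opt-in, with the usual wrap-around and precision caveats), not the syntax I'd actually want.

```python
import ctypes
import struct

# Fixed-width integers wrap around instead of growing.
wrapped_unsigned = ctypes.c_uint8(255 + 1).value   # 8-bit unsigned
wrapped_signed = ctypes.c_int8(127 + 1).value      # 8-bit signed

# Interoperability: a little-endian 32-bit integer as raw bytes.
packed = struct.pack("<i", 1)

# Round-tripping through a 32-bit IEEE 754 float loses precision.
as_f32 = struct.unpack("<f", struct.pack("<f", 0.1))[0]
```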

We could also explore the option of having a "global optimisation pass" that would check the usage of those types throughout the codebase, and infer which of the optimised types could be used instead for the best results (i.e. when it's very unlikely that over/under-flow can/will occur).

(See also the "Property System" and "Type Safety/Templating" sections for complementary features)


3) Type System

This will be a predominantly compiled language, but with the ability to run a real-time REPL for testing short snippets. Ideally, there'd also be a way to implement introspection/dynamic-code-injection/execution in other processes written in this language (when the appropriate debugging hooks are attached in the other process's binaries) - e.g. so that in a debugger, you can issue code snippets to run on the slave-process's state, and see what those code snippets would evaluate as under the current state.

The main ideas with this type system will be:
   3a) You can specify types for your code if you want/need - where doing so will make it clearer what's going on
   3b) Like with Rust, variables in a method will be const/immutable by default, and will have to be explicitly tagged as being modifiable if you want otherwise.
   3c) You can define code that doesn't specify any types at all (or that only specifies type constraints). This can be for all variables, or only some of the variables.
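
To make 3a and 3c a bit more concrete, here is a sketch in Python, where type annotations are optional and a constrained `TypeVar` plays the role of a "type constraint". Note the big caveat: Python only checks these annotations with an external tool, whereas the idea here is a compiled language with real inference. The names `largest`, `largest_number` and `Number` are made up for this example.

```python
from typing import Sequence, TypeVar


def largest(items):
    """No types at all: works for any elements that support '>'."""
    best = items[0]
    for item in items[1:]:
        if item > best:
            best = item
    return best


# A type constraint instead of a concrete type: ints or floats only.
Number = TypeVar("Number", int, float)


def largest_number(items: Sequence[Number]) -> Number:
    """Same logic, but with the accepted types spelled out."""
    return largest(items)
```

The untyped version is effectively "templated" code: `largest([3, 1, 2])` and `largest(["pear", "apple"])` both work without any declarations.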

Underpinning all this will need to be a type inference engine. The ability to not specify types (or with just type constraints) means that this is one of the mechanisms for defining templated-code, but also


4) Data Model + Namespacing/Modules/Encapsulation/Data Hiding

The single biggest concept here is that access to data will be open by default, with promiscuous default sharing of data between types/around type families/across modules/across namespaces/etc.

The goal is to protect/enforce the notion that as the programmer, you are "God" of the software domain you're building. As such, you should always be able to have full control/visibility of everything should the need arise, and be able to do whatever you need to without artificial constraints getting in your way (since some problems are hard enough as they are, without additional bureaucratic nonsense). Hiding then is an explicit decision you need to make - it is done only when you need to identify something as particularly fragile/dangerous/volatile (e.g. an implementation detail that the caller shouldn't be worried about when using that component, and/or which is highly liable to change in the future).

As far as access modifiers/controls go, they should be additive, with the following combinations able to be specified:
* Global (Default) - Everything in the program, everywhere can see the thing
* Module/Package/Namespace - Only things within the current unit can see this. (The additive version allows explicitly naming another unit that will have that same full-access. Combined with the "Language Extensibility" principles,
* Type Family - Every instance of the type can see it. The same applies for subclasses having full access to baseclass data, and baseclasses being able to reach into subclasses - the subclass versions of conflicting properties take priority in case of duplicates/overrides.  (The additive version allows specifying other unrelated types, similar to the "friend class" mechanism for having full access).
* Instance Local - Usage of this will require jumping through hoops to enforce, and will therefore be undesirable for general usage.


5) Enums + Switch/Case Statements

The enums in Rust (themselves derived a lot from those from Haskell, and apparently sometimes referred to as "Sum Types") are a nice starting point. We however want to go further (i.e. see note on the type B ones).

The "enums" we're describing here fall into 2 categories:
  * Type A - Name + Type - This is what Rust uses for things like Option<T> = { Some(T) | None }
  * Type B - Name + (Type) + Value - This is going beyond Type A and Rust's int-set (e.g. enum Kind { A=1, B=2, C=3 }). It allows specifying a bunch of Name = "some value/object-with-specific-values"  (e.g. allowing like a collection of preset values as a bunch of selectable constants)
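
The closest existing thing I know of to the Type B idea is Python's `enum.Enum`, whose members can carry arbitrary objects as values. A minimal sketch, where `PaperSpec`, `Paper` and `area_mm2` are hypothetical names invented for the example:

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class PaperSpec:
    width_mm: int
    height_mm: int


class Paper(Enum):
    # Name = "object with specific values", i.e. a set of selectable presets.
    A4 = PaperSpec(210, 297)
    A5 = PaperSpec(148, 210)
    LETTER = PaperSpec(216, 279)


def area_mm2(paper: Paper) -> int:
    """Use the preset object attached to the chosen enum member."""
    return paper.value.width_mm * paper.value.height_mm
```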
 
Switch-case / Match statements then have the same full power as Rust's ones, with the same de-structuring abilities.  This is one of the really powerful things from Haskell, which Python partially supports (with tuples).
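
For reference, the "partial support with tuples" in Python looks something like the sketch below: nested tuple unpacking gives you the de-structuring, but choosing between variants still needs a manual if-chain. The `describe` function and the shape tuples are made up for this example.

```python
# Nested de-structuring in a single assignment.
record = ("origin", (0, 0), ["a", "b", "c"])
name, (x, y), (first, *rest) = record


def describe(shape):
    """Dispatch on a tagged tuple by hand; no real match statement here."""
    kind, *params = shape
    if kind == "circle":
        (radius,) = params
        return f"circle r={radius}"
    if kind == "rect":
        width, height = params
        return f"rect {width}x{height}"
    raise ValueError(f"unknown shape: {kind}")
```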


6) Dynamic Object/Value Construction

- All memory will always be zeroed when initialised/requested (i.e. "calloc"-style zeroing).  It will be possible to explicitly request no zeroing (i.e. malloc-style) for cases where all values will be set to proper values immediately after, and this could also be an option for future automatic optimiser behaviour. However, that will be the exception, and will be a pain that requires extra effort to request (i.e. we default to "safe + easy to use").

- C++ style initialiser-lists "for calling relevant constructor overloads" will be supported as one style for how instances of a type can be initialised.

- C99-style ".member = " initialisers will be the recommended way of constructing new types for most simple-ish data

- Rust-style "all or nothing" will NOT happen. This is an anti-pattern, and a serious PITA that harms API usability (and leads to abominations like the "fluent" APIs).  It is also unnecessary when everything is automatically zeroed + constructors can exist to set these values.
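
A loose analogy for the combination of "zeroed by default" + "set only the members you care about" is a Python dataclass with zero-like defaults and keyword arguments. This is only a sketch of the usability I'm after (the real feature is about memory being zeroed at allocation time), and `Particle` is a made-up type.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    # Every field has a zero-like default, so nothing is left uninitialised.
    x: float = 0.0
    y: float = 0.0
    mass: float = 0.0
    label: str = ""


# Name only the members that matter, as with C99's ".member = value".
p = Particle(mass=2.5)
```

Everything not mentioned (`x`, `y`, `label`) simply stays at its zero value, with no builder or "fluent" chain needed.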


7) Language Extensibility - Templates/Generics, Inlining, Custom Control Structures, and Operator Overloading

As with the data model section, again the principle here is "full power to the programmer". If the programmer wants to do something, the computer should get out of their way and just let them get on with it. (If anything, the computer should bend over backwards to make such things work better than a mere hack would)


9) Standard Library

* "Batteries Included" - Python Style. Most important modules are bundled / available in default install.

* Try to solve the stale-old-cruft vs too-many-undiscoverable-options problem that every other system currently has:
   * Version 1) Have seamless bundled <-> package manager integration so that old modules can be pushed out of newer releases to die in the package manager repo, while promising new modules can be promoted to standard library.
   * Version 2) Or perhaps, all modules live in the package manager, but the core language release includes links to "blessed modules (with relevant version numbers)" that are the de facto standards for that release.  If you need any newer versions of any "builtin" libraries, you'll need to explicitly request those.

(Since I began writing this wishlist, I've come to view the second option as being more promising. An influential article that has shifted my viewpoint on this issue is - "Considering Python's Target Audience | Curious Efficiency". Reading this, I personally fall more in the "scripter" category he identified in terms of the ways that I prefer + believe that things should be done ;)

* It may also be worth exploring expanding on the "version number" idea in QML's imports (per source file). That said, the main gripe I have with those is always that it's hard to know what you've got installed / what you can use.


10) Overall Syntax Style

Somewhere between C and Python


To Be Decided


7) Const-ness, "Pointers", and Inlining
...






...
