Hum… The answer might surprise you, but there is a real reason why I implemented LispE in the first place, and it was not to create a new Lisp dialect; that came later, as a side effect.
I’m a researcher in Computational Linguistics. During my PhD I implemented a syntactic analyser, which was the main research tool in linguistics in my laboratory for 17 years (from 1999 to 2016). If you search for XIP (Xerox Incremental Parser), you’ll find European projects, articles and patents that all rely on it.
Let’s be honest: the advent of LLMs has made the whole endeavor quite moot.
Dealing with texts was quite complicated back then, and we had to cope with heterogeneous encodings all the time. Texts were often encoded in Latin-1 instead of UTF-8, and the Japanese team used a local encoding that was not UTF-8 either… Big, big mess… I tried to use Python, but it sucked (and still sucks) at handling strings in mixed encodings, something that was quite common back then, as almost no dataset had been curated for encodings… A real nightmare, especially for languages such as French or Spanish, which use accented letters.
So I implemented my own language, now called Tamgu, to handle these problems (see https://github.com/naver/tamgu). A lot of the work went into string manipulation: many of the core functions, such as find or replace, are implemented with SIMD instructions. Encoding conversions between Latin-1, UTF-8, UTF-16 and UTF-32 also rely on SIMD for speed.
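To give an idea of what such a conversion involves, here is a minimal sketch in C++ of the Latin-1 to UTF-8 direction. It is not Tamgu’s code: the function name latin1_to_utf8 is hypothetical, and instead of real SIMD instructions it uses a portable word-at-a-time trick (testing eight bytes at once for a set high bit) as a stand-in for the same idea, namely that runs of plain ASCII can be copied in bulk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Convert Latin-1 (ISO-8859-1) bytes to UTF-8.
// Each Latin-1 byte is the Unicode code point of the same value, so bytes
// below 0x80 are copied unchanged and the others become two UTF-8 bytes.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte expands to two
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        // Word-at-a-time fast path: if 8 bytes all have their high bit clear,
        // they are plain ASCII and can be copied as one block.
        if (i + 8 <= n) {
            std::uint64_t chunk;
            std::memcpy(&chunk, in.data() + i, sizeof chunk);
            if ((chunk & 0x8080808080808080ULL) == 0) {
                out.append(in, i, 8);
                i += 8;
                continue;
            }
        }
        const unsigned char c = static_cast<unsigned char>(in[i++]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte 110xxxxx
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

For example, "café" in Latin-1 ends with the single byte 0xE9, which becomes the UTF-8 pair 0xC3 0xA9. A real SIMD version would test 16 or 32 bytes per instruction, but the per-byte fallback stays the same.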
Tamgu is now used in production and has become a huge project. However, its implementation has grown so large that I decided to create a much simpler demonstrator in C++ to show how Tamgu actually works and to help other people contribute to it.
And Lisp is the easiest language for demonstrating how an AST interpreter works, since a Lisp program is already written as a tree.
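As an illustration, here is a deliberately tiny AST interpreter sketch in C++, assuming only integers and the operators + and *. The class names Element, Integer and Call are illustrative and are not meant to mirror LispE’s actual classes: each node derives from a common base class and evaluates itself, and a call node has exactly the shape of a Lisp list, an operator followed by its arguments.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Base class: every node of the tree knows how to evaluate itself.
struct Element {
    virtual ~Element() = default;
    virtual long eval() const = 0;
};

// Leaf node: an integer literal evaluates to itself.
struct Integer : Element {
    long value;
    explicit Integer(long v) : value(v) {}
    long eval() const override { return value; }
};

// Inner node: (op arg1 arg2 ...), the same shape as a Lisp list.
struct Call : Element {
    std::string op;
    std::vector<std::unique_ptr<Element>> args;
    explicit Call(std::string o) : op(std::move(o)) {}
    Call& add(std::unique_ptr<Element> e) {
        args.push_back(std::move(e));
        return *this;
    }
    long eval() const override {
        if (op != "+" && op != "*") throw std::runtime_error("unknown operator: " + op);
        long acc = (op == "+") ? 0 : 1;   // identity element of the operator
        for (const auto& a : args) {
            const long v = a->eval();     // evaluate each subtree recursively
            acc = (op == "+") ? acc + v : acc * v;
        }
        return acc;
    }
};
```

With these classes, (+ 1 (* 2 3)) becomes a Call("+") node holding an Integer and a nested Call("*"), and evaluating the outer node returns 7.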
However, I also became very interested in Haskell and array languages (such as APL). Since some of their concepts eluded me, I used LispE as a platform to experiment with them. For instance, if you have a look at https://github.com/naver/lispe/wiki/5.4-A-la-Haskell, you’ll see how I replicated function composition.
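For readers unfamiliar with the idea, composition builds a new function out of two others: in Haskell, (f . g) x is f (g x). A minimal C++14 sketch of the same operation follows; compose is a helper written for this illustration, not LispE’s implementation.

```cpp
#include <cassert>

// compose(f, g) returns a new function x -> f(g(x)), like Haskell's (f . g).
template <typename F, typename G>
auto compose(F f, G g) {
    return [f, g](auto x) { return f(g(x)); };
}
```

Note the order: with inc adding 1 and dbl doubling, compose(inc, dbl)(5) applies dbl first and gives 11, whereas compose(dbl, inc)(5) gives 12.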
I also implemented many APL operators, such as scan, reduce and rho, to better understand how they work internally.
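Here is a rough C++ sketch of what these three operators compute, using the standard numeric algorithms. The apl_ function names are hypothetical. One simplification to keep in mind: APL evaluates reduce and scan from the right, which only makes a difference for non-associative operators such as subtraction, whereas this sketch folds from the left.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Reduce (APL +/): fold a binary operator over the whole vector.
template <typename T, typename Op>
T apl_reduce(const std::vector<T>& v, T init, Op op) {
    return std::accumulate(v.begin(), v.end(), init, op);
}

// Scan (APL +\): keep every intermediate result of the fold.
template <typename T, typename Op>
std::vector<T> apl_scan(const std::vector<T>& v, Op op) {
    std::vector<T> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin(), op);
    return out;
}

// Reshape (APL rho): fill a rows x cols matrix by cycling through the source.
template <typename T>
std::vector<std::vector<T>> apl_reshape(std::size_t rows, std::size_t cols,
                                        const std::vector<T>& src) {
    assert(!src.empty());
    std::vector<std::vector<T>> m(rows, std::vector<T>(cols));
    for (std::size_t k = 0; k < rows * cols; ++k)
        m[k / cols][k % cols] = src[k % src.size()];
    return m;
}
```

For example, with v = {1, 2, 3, 4}, apl_reduce(v, 0, std::plus<int>()) gives 10, apl_scan gives {1, 3, 6, 10}, and apl_reshape(2, 3, v) gives the rows {1, 2, 3} and {4, 1, 2}, cycling through the source the way rho does.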
LispE is also available as a Python library, which was a way for me to better understand the Python API. LispE was also designed so that writing external libraries is really simple, much simpler than in Python: it basically consists of deriving a new class from a LispE base class.
LispE also proved simple enough to experiment with WebAssembly.
Finally, LispE is a very fast multithreaded language that is almost lock-free.
I love LispE because I can tinker with it as much as I want. Adding a new instruction to the language takes a couple of minutes. It has many features that are absent from Common Lisp, such as an array-based implementation of lists, which makes it a perfect fit for Advent of Code or LeetCode.
See https://github.com/naver/lispe/wiki/6.20-Conway-Game-of-Life-in-LispE for an example of what LispE can do.
You can even use it as a shell to run Unix commands (see https://github.com/naver/lispe/wiki/7.-Shell), especially since I developed my own terminal editor (see https://github.com/naver/lispe/wiki/1.2-Jag:-Terminal-Editor-With-Mouse-Support-and-Colour-Highlighting).
I have more than 30 years of professional experience implementing parsers and languages in C++, and I thought that, at the end of my career, it would be fun to share this experience with others.
You are right. I implemented other ways to access list elements that do not rely on car or cdr (see the nth operator). I also implemented linked lists alongside the array-based ones, and there these functions make more sense. You can choose whichever structure suits your needs.
(list 1 2 3) ; array list
(llist 1 2 3) ; linked list
An “llist” behaves exactly like a traditional Lisp list.
I don’t use std::vector under the hood… :-)
Not exactly. JavaScript strings are sequences of UTF-16 code units: each element in this array of integers is a 16-bit code unit, not a full code point. Most characters fit in a single 16-bit unit, but characters above U+FFFF, such as most emojis, take two units (a surrogate pair).
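A short C++ sketch of the encoding rule for code points above U+FFFF, with hypothetical helper names: subtracting 0x10000 leaves 20 bits; the top 10 go into a high surrogate (0xD800 to 0xDBFF) and the bottom 10 into a low surrogate (0xDC00 to 0xDFFF). This is also why the length of a JavaScript string counts 16-bit units rather than characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to a UTF-16 string.
// Code points up to U+FFFF take one 16-bit unit; larger ones are split
// into a high surrogate followed by a low surrogate.
void append_utf16(std::u16string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));  // valid scalar value
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;                                                // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
}

// Count code points: a high surrogate followed by a low surrogate counts once;
// unpaired surrogates are counted as one unit each.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;  // skip the low half of the pair
    }
    return n;
}
```

For instance, appending U+1F600 (😀) produces the pair 0xD83D 0xDE00, so a string containing "A" followed by that emoji holds three 16-bit units but only two code points.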
Very interesting remark. Basically, a gap buffer is a linked list in disguise, which I wanted to avoid since most implementations become much slower when your data are no longer contiguous in memory.
And what you propose is basically what I have implemented. My list is made of two different objects, ITEM and LIST: ITEM contains a buffer that can be extended at will, and LIST contains a pointer to an ITEM and an _offset_ value.
When I apply _cdr_, I build a new LIST object that shares the current list’s ITEM pointer, with the offset incremented by 1, so no element is copied.
The implementation is here: https://github.com/naver/lispe/blob/master/include/listes.h
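For readers who do not want to dig into listes.h, here is a simplified C++ sketch of the same idea. It is not the LispE code: the names Item and List are hypothetical, the elements are plain long values, and std::shared_ptr stands in for whatever memory management the real implementation uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Shared, growable storage (the role played by ITEM).
struct Item {
    std::vector<long> buffer;
};

// Lightweight view: a pointer to the shared Item plus an offset (the role of LIST).
class List {
    std::shared_ptr<Item> item;
    std::size_t offset = 0;
    List(std::shared_ptr<Item> it, std::size_t off) : item(std::move(it)), offset(off) {}

public:
    List() : item(std::make_shared<Item>()) {}
    void push(long v) { item->buffer.push_back(v); }
    std::size_t size() const { return item->buffer.size() - offset; }
    long car() const { assert(size() > 0); return item->buffer[offset]; }
    // cdr copies nothing: same buffer, offset + 1, so it runs in O(1).
    List cdr() const { assert(size() > 0); return List(item, offset + 1); }
    // Random access (like nth) is also O(1) because the buffer is contiguous.
    long nth(std::size_t i) const { assert(i < size()); return item->buffer[offset + i]; }
};
```

The payoff is that both cdr and nth are O(1) while the elements stay contiguous in memory. The sketch deliberately ignores the hard part, which is what should happen when a list and one of its tails share a buffer and one of them is modified: here, a push on a tail would also be visible from the parent.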
I’m really interested if you have any news on this topic. Seriously!!! Handling strings was a real hassle in WASM, and if there is now a better solution, I’m really excited at the prospect.
Do you have any pointers?
This is a very complex problem. Maybe this will interest you:
https://github.com/naver/lispe/wiki/6.1-Pattern-Functions
Here is how fizzbuzz is implemented: