Introducing Coyote

This is an early, pre-production release of Coyote, a new language for the data-centric developer. It’s a fast, lightweight language designed for data professionals with programming skills. It combines features from both functional and object-oriented languages with a full-featured embedded SQL engine. The philosophy of the Coyote language is to combine the power of a full-fledged programming language with a built-in SQL database and OLAP stores, so that the tight integration between them produces a seamless experience that adds power to data science and data analytics.

Why Another Language?

I have been developing data-oriented applications for decades, ranging from CRM and logistics applications to large-scale analytics platforms. I’ve had the good fortune to create applications using procedural programming, OOP, functional programming, web development, SQL, MDX, DAX, and regular expressions. No matter how great the language, paradigm, or platform I was using, I always found myself wishing I could do a bit more. I wanted to store native objects in a database and query them with SQL; I wanted functions as first-class objects while still having OOP. One day, about a year ago, I took a leap and started on a new language that would let me do all of these things without burdening the user with integration concerns, and that’s how Coyote began.

The question I asked myself was: if I were doing Kaggle-style analyses, what would I need my language to have in its toolkit? The answer to that question drives the design of Coyote.

Speed

The language needs to be fast, and the faster the better. Making it as fast as C or assembly by compiling to machine code would be ideal, but the cost in flexibility would be too steep. Analyzing data is an iterative, creative process that would quickly feel cumbersome if we had to compile to machine code every time we changed a line. On the other hand, interpreted languages like R are extremely flexible, but they’re too slow to use in a production environment. Coyote’s goal is to sit somewhere near the middle. It uses a single-pass bytecode compiler whose output runs in a VM written in Go: fast enough to compile and fast enough to use in production. It isn’t fully optimized yet, but it can already run a recursive Fibonacci benchmark nearly 10 times faster than R and nearly as fast as Python 3. In the future, it might be possible to go from the bytecode to machine code, much as the .NET and Java runtimes do with just-in-time compilation.
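To make the phrase “single-pass bytecode compiler that runs in a VM” more concrete, here is a toy stack-based virtual machine in Go, the language Coyote’s VM is written in. It is a minimal sketch, not Coyote’s implementation: the opcode names (OpConst, OpAdd, OpMul) and the Instruction layout are hypothetical. The idea is that the compiler walks the source once and emits a flat list of instructions, and the VM then loops over them, pushing and popping values on a stack.

```go
package main

import "fmt"

// Opcode identifies one VM instruction. This tiny instruction set is
// hypothetical and is not Coyote's actual bytecode.
type Opcode byte

const (
	OpConst Opcode = iota // push constants[Operand] onto the stack
	OpAdd                 // pop b, pop a, push a + b
	OpMul                 // pop b, pop a, push a * b
)

// Instruction pairs an opcode with an optional operand.
type Instruction struct {
	Op      Opcode
	Operand int
}

// Run executes the bytecode on a value stack and returns the final stack.
func Run(code []Instruction, constants []int) []int {
	stack := make([]int, 0, 16)
	pop := func() int {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for _, ins := range code {
		switch ins.Op {
		case OpConst:
			stack = append(stack, constants[ins.Operand])
		case OpAdd:
			b, a := pop(), pop() // top of stack is the right-hand operand
			stack = append(stack, a+b)
		case OpMul:
			b, a := pop(), pop()
			stack = append(stack, a*b)
		default:
			panic(fmt.Sprintf("unknown opcode %d", ins.Op))
		}
	}
	return stack
}

func main() {
	// Bytecode a single-pass compiler might emit for the expression (4 + 5) * 2.
	constants := []int{4, 5, 2}
	code := []Instruction{
		{OpConst, 0}, {OpConst, 1}, {OpAdd, 0},
		{OpConst, 2}, {OpMul, 0},
	}
	fmt.Println(Run(code, constants)) // [18]
}
```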

Declared Variables and Scoping

Languages like R and Python let you assign variables on the fly, which is convenient at first but becomes a source of hard-to-find bugs in larger applications. A small misspelling of a variable you think was assigned earlier silently creates a new variable. Or you might accidentally assign an integer to a variable that was supposed to hold strings. Requiring every variable to be declared before use, with a data type that cannot change afterward, makes it possible for the compiler to catch errors like these at compile time, before the program runs.
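The Go sketch below shows, under simplifying assumptions, how a compiler can use declarations to reject both kinds of bug before the program runs. The Scope type, its Declare and Assign methods, and the string-based Type are hypothetical illustrations, not Coyote’s internals: each scope records the type a variable was declared with, and an assignment fails if the name was never declared or if the new value’s type differs.

```go
package main

import "fmt"

// Type is a simplified static type tag (hypothetical; a real compiler's types are richer).
type Type string

// Scope maps declared variable names to the type fixed at declaration,
// with a link to the enclosing scope for nested blocks.
type Scope struct {
	vars   map[string]Type
	parent *Scope
}

func NewScope(parent *Scope) *Scope {
	return &Scope{vars: map[string]Type{}, parent: parent}
}

// Declare handles `var name = <expr of type t>`.
func (s *Scope) Declare(name string, t Type) error {
	if _, exists := s.vars[name]; exists {
		return fmt.Errorf("variable %q already declared in this scope", name)
	}
	s.vars[name] = t
	return nil
}

// lookup walks outward through the enclosing scopes.
func (s *Scope) lookup(name string) (Type, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if t, ok := cur.vars[name]; ok {
			return t, true
		}
	}
	return "", false
}

// Assign handles `name = <expr of type t>` and rejects misspelled
// (undeclared) names as well as attempts to change a variable's type.
func (s *Scope) Assign(name string, t Type) error {
	declared, ok := s.lookup(name)
	if !ok {
		return fmt.Errorf("undeclared variable %q", name)
	}
	if declared != t {
		return fmt.Errorf("cannot assign %s to %q of type %s", t, name, declared)
	}
	return nil
}

func main() {
	global := NewScope(nil)
	_ = global.Declare("total", "int")
	inner := NewScope(global)
	fmt.Println(inner.Assign("total", "int"))    // <nil>
	fmt.Println(inner.Assign("totl", "int"))     // undeclared variable "totl"
	fmt.Println(inner.Assign("total", "string")) // cannot assign string to "total" of type int
}
```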

Arrays

var x = new int[3]
x[0] = 100
x[1] = 101
x[2] = 1

println(x[1])
// 101

If the array is initialized at the same time as it’s declared, it takes its size from the initializer:

var y = @[10,20,30]
println(y[1])
// 20

A declared but uninitialized array can be initialized (and sized) later:

var z int[]
z = @[200,201,2]
println(z[1])
// 201

Multi-Dimensional Arrays

Dimensions in an array are delimited by commas. One comma indicates two dimensions, two commas indicate three dimensions, and so on. There is no practical limit to how many dimensions you can declare in an array.

var x = new int[3,3]
var y = 0
for i = 0 to 2 {
    for j = 0 to 2 {
        x[i,j] = y
        y = y + 1
    }
}
println(x[1,1])
// 4

As with one-dimensional arrays, a multi-dimensional array can be sized in advance with new, here with three dimensions:

var m = new int[2,3,4]
var y = 0
for i = 0 to 1 {
    for j = 0 to 2 {
        for v = 0 to 3 {
            m[i,j,v] = y
            y = y + 1
        }
    }
}
println(m[1,1,1])
// 17
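Because the nested loops visit the elements with the last index changing fastest and increment y each time, every element ends up holding its row-major position. The Go sketch below computes that position directly and reproduces the 4 and 17 printed above. Whether Coyote actually stores multi-dimensional arrays in a flat row-major buffer is an assumption made for illustration; RowMajorOffset is a hypothetical helper.

```go
package main

import "fmt"

// RowMajorOffset converts an n-dimensional index into the position of that
// element in a flat backing slice, assuming row-major (last index fastest) layout.
func RowMajorOffset(dims, idx []int) int {
	if len(dims) != len(idx) {
		panic("index rank does not match array rank")
	}
	offset := 0
	for k := range dims {
		if idx[k] < 0 || idx[k] >= dims[k] {
			panic(fmt.Sprintf("index %d out of range for dimension %d", idx[k], k))
		}
		// Horner-style accumulation: offset = ((i0*d1 + i1)*d2 + i2)...
		offset = offset*dims[k] + idx[k]
	}
	return offset
}

func main() {
	fmt.Println(RowMajorOffset([]int{3, 3}, []int{1, 1}))       // 4
	fmt.Println(RowMajorOffset([]int{2, 3, 4}, []int{1, 1, 1})) // 17
}
```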

To declare a multi-dimensional array and initialize it at the same time, prefix the element list with its dimensions in brackets, such as [3,3]:

var x = @[[3,3]0,1,2,3,4,5,6,7,8]
x[2,2] = 4
x[1,1] = 1
x[0,0] = 100
println(x[2,2])
// 4

Lists

Lists can contain elements of different types, such as numbers, strings, arrays, and even other lists. A list can also contain a matrix or a function as an element. A list literal is created as follows:

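var l = @{"One":1, "Two":2, "Three":3}

A list can also be declared with explicit types and then filled one element at a time using the $ operator:

var veggies = list[string,float]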
veggies$Tomatoes = 2.00
veggies$Celery = 3.50
veggies$Spinach = 2.75

println(veggies$Celery)
// 3.50000

List of arrays:

var x = @{
        "Q1":@["Jan","Feb","Mar"],
        "Q2":@["Apr","May","June"]
        }
println(x$Q2[1])
// May

Array of Lists:

var food = @[
    @{"Carrots":1.75,"Celery":3.50, "Onions":0.75},
    @{"Beef":4.55,"Pork":5.75,"Chicken":2.80}
]
println(food[0]$Celery)
println(food[1]$Pork)
// 3.5000
// 5.7500
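One plausible way to model these lists is an ordered key/value container: a map for lookups by name plus a slice that remembers insertion order. The Go sketch below is hypothetical (the List type and its Set, Get, and Keys methods are not part of Coyote), and it assumes, without confirmation from Coyote itself, that lists preserve insertion order. It mirrors the veggies example; a List[any] would allow mixed element types.

```go
package main

import "fmt"

// List is a hypothetical ordered key/value container: lookups go through
// the map, while keys remembers the order in which names were first added.
type List[V any] struct {
	keys   []string
	values map[string]V
}

func NewList[V any]() *List[V] {
	return &List[V]{values: make(map[string]V)}
}

// Set appends a new key, or overwrites an existing key without moving it.
func (l *List[V]) Set(key string, v V) {
	if _, exists := l.values[key]; !exists {
		l.keys = append(l.keys, key)
	}
	l.values[key] = v
}

// Get returns the value stored under key and whether it was present.
func (l *List[V]) Get(key string) (V, bool) {
	v, ok := l.values[key]
	return v, ok
}

// Keys returns a copy of the keys in insertion order.
func (l *List[V]) Keys() []string {
	return append([]string(nil), l.keys...)
}

func main() {
	veggies := NewList[float64]()
	veggies.Set("Tomatoes", 2.00)
	veggies.Set("Celery", 3.50)
	veggies.Set("Spinach", 2.75)

	celery, _ := veggies.Get("Celery")
	fmt.Printf("%.2f\n", celery)  // 3.50
	fmt.Println(veggies.Keys())   // [Tomatoes Celery Spinach]
}
```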

Functions

Functions are anonymous: a func expression produces a function value, which is stored in a variable. Each parameter is declared as a name:type expression, and the parameter list is followed by the return type, if there is one. If a return type is declared, the value must be returned explicitly with the return keyword.

var f = func(x:int, y:int) int {
    return x * y
}
println(f(4,5))
// 20

Function Closures

Functions are first-class objects in Coyote, so they can be passed as parameters, stored in data structures, and returned from other functions. A closure in Coyote is a function that captures the variables it uses from the environment in which it was created. Such functions keep access to the scope in which they were defined, which enables powerful design patterns found in pure functional programming.

Suppose you want a function that adds 2 to its argument. You would likely write something like this:

var add_2 = func(y:int) int {
    return 2 + y
}
println(add_2(5))
// 7

Now suppose you need another function that adds 7 to its argument instead. The natural thing to do would be to write another function, just like add_2, with the 2 replaced by a 7. But this would be hard to maintain: if you later discover that you made a mistake and actually need to multiply the values instead of adding them, you would be forced to change the code in two places. In this trivial example that may not be much trouble, but in more complicated projects, duplicating code is a recipe for disaster.

A better idea is to write a function that takes one argument, x, and returns another function that adds its own argument, y, to x:

var add_x = func(x:int) func {
    return func(y:int) int {
        return x+y
    }
}

var f = add_x(7)
println(f(5))
// 12

var g = add_x(10)
println(g(5))
// 15
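For comparison, the same closure pattern in Go looks like this. It is a direct translation for illustration, not Coyote code: addX returns an inner function that keeps a reference to x after addX has returned, so f and g each carry their own captured value.

```go
package main

import "fmt"

// addX returns a closure; the inner function captures x from addX's scope.
func addX(x int) func(int) int {
	return func(y int) int {
		return x + y // x is read from the captured environment
	}
}

func main() {
	f := addX(7)
	g := addX(10)
	fmt.Println(f(5)) // 12
	fmt.Println(g(5)) // 15
}
```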

OOP

Creating a new class creates a new type of object, allowing new instances of that type to be made. Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by their class) for modifying their state.

var myClass = class {
    int a
    int b
    sum(x:int, y:int) int {
        return x+y+this.a+this.b
    }
}

var x = new myClass
x.a = 6
x.b = 4

println(x.sum(3,4))
// 17

var y = new myClass
y.a = 10
y.b = 20

println(y.sum(3,4))
// 37
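The class above maps closely onto a Go struct with a method, shown here only as a comparison (MyClass and Sum are illustrative names, and Go has no classes): each instance holds its own field values, while the method is defined once on the type and reads the instance’s fields through its receiver, which plays the role of this.

```go
package main

import "fmt"

// MyClass mirrors the Coyote class: two int fields plus a method.
type MyClass struct {
	A, B int
}

// Sum plays the role of the sum method; the receiver m stands in for `this`.
func (m *MyClass) Sum(x, y int) int {
	return x + y + m.A + m.B
}

func main() {
	x := &MyClass{A: 6, B: 4}
	fmt.Println(x.Sum(3, 4)) // 17

	y := &MyClass{A: 10, B: 20}
	fmt.Println(y.Sum(3, 4)) // 37
}
```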

Integrated Database

In nearly every other language out there, interacting with a database has often felt fragile to me. Applications need to establish connections to outside data stores that can change name and location at any time, so we rely on config files or registries that we hope are up to date. Interacting with the data usually means either loading an ORM that abstracts the lowest common denominator of multiple database flavors, or sending commands as strings that don’t get validated until they reach the data server. Then the data coming back needs to be converted to native data types, with varying levels of success. Between the data and the programming language, there has always been a rickety bridge.

Few will remember the venerable FoxPro/Visual FoxPro language, which was probably the last of the 1990s-era languages with an integrated database. But those who do will remember how easy it was to write inline SQL right in the middle of procedural or OOP code. The data store was integral and seamless, which gave us an expressive power that was downright addictive. Coyote aims to bring back some of these capabilities. You don’t have to send strings through a database connection: you can create tables, load them, and query them with statements that are part of the language, like any other statement or expression. For example, a query like this one is possible, with $country referring to a Coyote variable:

var country = "France"

select
a.ProductID,
b.ProductName,
sum(a.Quantity) as TotalQty
FROM OrderDetail a
    join Product b on a.ProductId = b.Id
    join "Order" c on c.Id = a.OrderId
where c.ShipCountry = $country
group by a.ProductID,
         b.ProductName;
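The $country reference pulls a Coyote variable into the query. One common way for a host language to do this safely is to turn each $name into a bound parameter rather than splicing text into the SQL string, which avoids injection and quoting problems. The Go sketch below assumes that approach; BindVars, its regular expression, and the ? placeholder style are hypothetical and do not describe how Coyote actually binds variables.

```go
package main

import (
	"fmt"
	"regexp"
)

// varRef matches $name references such as $country (hypothetical naming rule).
var varRef = regexp.MustCompile(`\$([A-Za-z_][A-Za-z0-9_]*)`)

// BindVars rewrites each $name into a positional "?" placeholder and collects
// the variable's current value, so values travel as parameters instead of as
// SQL text. Limitation: this sketch does not skip $ inside quoted literals.
func BindVars(query string, env map[string]any) (string, []any, error) {
	var args []any
	var missing string
	rewritten := varRef.ReplaceAllStringFunc(query, func(m string) string {
		name := m[1:] // drop the leading '$'
		v, ok := env[name]
		if !ok && missing == "" {
			missing = name // remember the first undefined variable
		}
		args = append(args, v)
		return "?"
	})
	if missing != "" {
		return "", nil, fmt.Errorf("undefined variable $%s", missing)
	}
	return rewritten, args, nil
}

func main() {
	env := map[string]any{"country": "France"}
	q, args, err := BindVars(`select * from "Order" c where c.ShipCountry = $country`, env)
	fmt.Println(q)    // select * from "Order" c where c.ShipCountry = ?
	fmt.Println(args) // [France]
	fmt.Println(err)  // <nil>
}
```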

The entire range of SQL commands is available side by side and in-line with the rest of the code:

create table Person (
    first_name string,
    last_name string,
    age int not null
);

var ageNum = 27

insert into Person (first_name, last_name, age) values ('John','Smith',$ageNum);
insert into Person (first_name, last_name, age) values ('Mary','Jones',42);
insert into Person (first_name, last_name, age) values ('George','Constanza',66);

var df = select first_name, last_name, age from Person;
showdata(df)

create table Person2 as select * from Person where age < 50;
select * from Person2;

In the near future, OLAP storage will be included in the same way.

Coyote is not yet complete; it’s currently in alpha, but over the coming weeks and months the following features will be completed, at which point it will reach version 1.0:

  • Built-in Concurrency
  • Line-by-line debugger
  • OLAP Engine
  • Web Services
  • Web UI
  • Cloud Integration
  • Additional Statistical Functions
  • Bulk-Loading Data
  • Modules
  • User-friendly error messages

If you want to learn more and take Coyote for a spin, please visit Coyote’s GitHub page!

Special Thanks

To get to this point, I read (and re-read) several books on compiler design, followed blogs written by incredibly talented developers, and pored over the source code of Lua, SQLite, and the Java Virtual Machine. I also benefited a great deal from the kindness of fellow programmers who answered my questions on Stack Overflow and Reddit.

I’d love to say I’m standing on the shoulders of giants, but I can’t say that quite yet; I haven’t managed to claw my way up that high. Someday, I hope Coyote will get there, and when it does, it will be thanks mostly to people far smarter than I who paved the way and took the time to show us how it’s done. Here are some of the resources that have been the most useful:

  • Crafting Interpreters
  • The Dragon Book
  • Writing a Compiler in Go
  • Engineering a Compiler
  • Language Implementation Patterns
  • The Definitive ANTLR 4 Reference
  • The Algorithm Design Manual
  • Computer Systems: A Programmer’s Perspective




2 replies

  1. What are your thoughts on how to connect to an external database? My thinking goes toward something along the lines of PL/SQL or T-SQL, allowing SQL to be used in the language, but perhaps there are other ideas? Getting the data with some expressive SQL, selecting it into the internal engine, and using that for additional aggregation and cleansing seems like an interesting proposition.

    • I think it’s a great idea. Maybe there could be a setting that puts the SQL variant in T-SQL mode or PL/SQL mode. Soon I’m adding options to import CSV, Excel, and JSON files into the internal database, but importing from an outside database would be very important as well. Thanks for the input!
