Let’s translate programming languages like Google Translate - part 2
Building a Python to JavaScript transpiler
In the first part, we learned about the compiler and its internal workings using a url-compiler. Now let's continue with translating programming languages. In this post, we will build a compiler that converts Python code to JavaScript and use it in a Visual Studio Code extension.
To convert one programming language to another, we need to perform these four steps:
- Parse the code from the source programming language and convert it into meaningful tokens (Lexical analysis).
- Create an Abstract Syntax Tree (AST) from the generated tokens (Syntax analysis).
- Transform the AST in the source programming language into an AST in the target programming language (Transformation).
- Generate the code from the AST of the target programming language (Code Generation).
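To make the first step (lexical analysis) concrete, here is a minimal sketch of a tokenizer for a tiny Python-like subset. The token names and the `TOKEN_SPEC` table are assumptions made for illustration; the lexer that ANTLR generates later in this post is far more complete.

```javascript
// Hypothetical token specification for a tiny Python-like subset.
// Order matters: keywords must be tried before generic names.
const TOKEN_SPEC = [
  ["WHITESPACE", /^[ \t]+/],
  ["KEYWORD", /^(?:def|return|if|else)\b/],
  ["NAME", /^[A-Za-z_][A-Za-z0-9_]*/],
  ["NUMBER", /^\d+/],
  ["OP", /^[+\-*\/=():,]/],
];

// Repeatedly match the start of the remaining input against each pattern.
function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let matched = false;
    for (const [type, pattern] of TOKEN_SPEC) {
      const match = pattern.exec(rest);
      if (match) {
        // Whitespace is consumed but not emitted as a token.
        if (type !== "WHITESPACE") tokens.push({ type, value: match[0] });
        rest = rest.slice(match[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
  }
  return tokens;
}

console.log(tokenize("return a + b"));
```

For the input `return a + b`, this yields four tokens: a KEYWORD, a NAME, an OP, and another NAME.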
In our case, to convert from Python to JavaScript, we will follow these steps:
- Convert the Python code to a Python AST.
- Transform the Python AST to a JS AST.
- Convert the JS AST to JS code.
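The three steps above compose naturally as a pipeline of functions. The sketch below is only an illustration of that shape: `createTranspiler` and the stand-in stages are hypothetical names, and the stages understand nothing beyond Python's `True` and `False`.

```javascript
// Compose three stage functions into one source-to-source translator.
function createTranspiler({ parse, transform, generate }) {
  return (source) => generate(transform(parse(source)));
}

// Stand-in stages (hypothetical) that only handle Python's True/False.
const transpile = createTranspiler({
  // Python code -> Python-side node
  parse: (source) => ({ kind: "constant", text: source.trim() }),
  // Python-side node -> JS-side node
  transform: (pyNode) => ({ type: "BooleanLiteral", value: pyNode.text === "True" }),
  // JS-side node -> JS code
  generate: (jsNode) => String(jsNode.value),
});

console.log(transpile("True"));
```

The rest of this post fills in real implementations for each of the three stages.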
Let's revisit AST before getting started.
An AST is a hierarchical, tree-like structure that represents the syntax of source code. Because an AST can be easily traversed and manipulated, it is easier to convert the source code to the intended target format.
In our url-compiler, we created a parser that converts the URL into a tree-like structure. Similarly, a Python compiler or a JavaScript compiler will convert the source code into its corresponding AST format.
TIP: You can play with astexplorer.net to visualize, understand and manipulate ASTs.
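As a small illustration, the statement `return a + b;` can be written out by hand as a tree of plain objects in the ESTree/Babel node style, and then traversed. The `collectNodeTypes` helper is a hypothetical name used only for this sketch.

```javascript
// Hand-written ESTree/Babel-style node shapes for: return a + b;
const ast = {
  type: "ReturnStatement",
  argument: {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "Identifier", name: "a" },
    right: { type: "Identifier", name: "b" },
  },
};

// Depth-first walk that records the type of every node it meets.
function collectNodeTypes(node, found = []) {
  if (node === null || typeof node !== "object") return found;
  if (typeof node.type === "string") found.push(node.type);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) {
      value.forEach((child) => collectNodeTypes(child, found));
    } else {
      collectNodeTypes(value, found);
    }
  }
  return found;
}

console.log(collectNodeTypes(ast));
```

The walk visits the return statement first, then the binary expression, then its two identifiers.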
Convert the Python code to a Python AST
We can achieve this in three ways:
- Write our own lexer and parser from scratch.
- Use the Python standard library parser.
- Create a parser using a parser generator based on the Python grammar.
We will skip the first two approaches. The first is time-consuming, and the second ties us to a separate language-specific parser for every source language. For example, we can use Python's standard library to generate a Python AST, but to transform another language like Java or Rust we would need their corresponding parsers.
We shall use the third approach: we will create the parser with a parser generator, which builds a parser from a set of rules called a grammar. That parser then produces the AST.
What is grammar?
Grammars are the language of languages. Similar to English grammar, behind every programming language there is a grammar that determines its structure. The grammar is generally represented in a specific format like Backus-Naur Form (BNF). The grammar for popular languages is widely available, so we can use these grammar files with suitable parser generators and generate the corresponding parsers.
A sample grammar for the if-else statement in Python is shown below:
if_stmt:
    | 'if' named_expression ':' block elif_stmt
    | 'if' named_expression ':' block [else_block]
elif_stmt:
    | 'elif' named_expression ':' block elif_stmt
    | 'elif' named_expression ':' block [else_block]
else_block: 'else' ':' block
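To see how rules like these turn into a parser, here is a hand-written recursive-descent sketch of the grammar above. It is a simplification under stated assumptions: the input is already tokenized, and `named_expression` and `block` are each collapsed to a single placeholder token (`EXPR` and `BLOCK`). The function names and node shapes are hypothetical; a parser generator produces code in this spirit automatically.

```javascript
function parseIfStmt(tokens) {
  let pos = 0;

  // Consume one token or fail with a syntax error.
  const expect = (value) => {
    if (tokens[pos] !== value) {
      throw new SyntaxError(`Expected ${value} at token ${pos}, got ${tokens[pos]}`);
    }
    pos += 1;
  };

  // Shared shape of if_stmt and elif_stmt:
  //   keyword named_expression ':' block (elif_stmt | [else_block])
  function conditional(keyword) {
    expect(keyword);
    expect("EXPR");
    expect(":");
    expect("BLOCK");
    const node = { type: keyword, orelse: null };
    if (tokens[pos] === "elif") node.orelse = conditional("elif");
    else if (tokens[pos] === "else") node.orelse = elseBlock();
    return node;
  }

  // else_block: 'else' ':' block
  function elseBlock() {
    expect("else");
    expect(":");
    expect("BLOCK");
    return { type: "else" };
  }

  const tree = conditional("if");
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token ${tokens[pos]}`);
  return tree;
}

const tree = parseIfStmt([
  "if", "EXPR", ":", "BLOCK",
  "elif", "EXPR", ":", "BLOCK",
  "else", ":", "BLOCK",
]);
console.log(JSON.stringify(tree));
```

Each grammar rule becomes one function, and each alternative becomes a branch on the next token, which is why a grammar file is enough to generate a parser mechanically.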
Introducing ANTLR
The parser generator that we will use is ANTLR. ANTLR (ANother Tool for Language Recognition) is a powerful parser generator, and grammars for most common programming languages are already available for it. ANTLR takes a grammar file as input and generates a parser that can create and traverse syntax trees (strictly speaking, ANTLR produces parse trees, which this post treats as the Python AST).
Follow this quick start guide to set up ANTLR on your machine.
Once ANTLR is set up, we can download the Python grammar and generate the Python parser with JavaScript as the target language.
antlr4 -Dlanguage=JavaScript ./grammar/Python3.g4 -visitor
Executing the above command should generate the lexer, parser, and additional helper files for parsing Python in JavaScript. The generated parser receives Python code as input and generates a Python AST.
Transform Python AST to JS AST
Now that we have a Python AST, we can run a custom visitor (similar to the one in the url-compiler) over it and transform it into a JavaScript AST. JavaScript has several popular parsers like Acorn, Esprima, Babel, etc. Since Babel is widely used, we shall convert the Python AST to a JS AST in Babel format. Our custom visitor will traverse each node in the Python AST and transform it into the corresponding JS AST node.
const antlr4 = require("antlr4");
const { Python3Lexer } = require("./grammar/Python3Lexer");
const { Python3Parser } = require("./grammar/Python3Parser");
const { CustomVisitor } = require("./custom-visitor");

function getJsAst(input) {
  const chars = new antlr4.InputStream(input);
  const lexer = new Python3Lexer(chars);
  const tokens = new antlr4.CommonTokenStream(lexer);
  const parser = new Python3Parser(tokens);
  parser.buildParseTrees = true;
  const tree = parser.file_input();
  return tree.accept(new CustomVisitor());
}
The custom visitor should extend the Python3Visitor and implement all the necessary visit methods to return a Babel-supported AST format.
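// custom-visitor.js
const { Python3Visitor } = require("./grammar/Python3Visitor");
const { tokens } = require("./grammar/tokens");
const t = require("@babel/types");

class CustomVisitor extends Python3Visitor {
  constructor() {
    super();
  }

  /* Visitor for return statement */
  visitReturn_stmt(ctx) {
    // children[0] is the 'return' keyword; children[1] is the returned expression
    const children = this.visitChildren(ctx.children[1]);
    return t.returnStatement(children[0]);
  }

  /* Visitor for function definition */
  visitFuncdef(ctx) {
    const id = t.identifier(ctx.NAME().getText());
    // Take the first non-terminal child of `parameters`, i.e. the argument list
    // between the parentheses. isTerminalNode and getIdentifier are helper
    // methods assumed to be defined among the visitor methods omitted below.
    const args = ctx
      .parameters()
      .children.filter((item) => !this.isTerminalNode(item))[0];
    const params = args
      ? args
          .getText()
          .split(",")
          .map((arg) => this.getIdentifier(arg))
      : [];
    // Child 4 is the function body, assuming there is no return annotation
    const children = this.visitChildren(ctx.getChild(4));
    return t.functionDeclaration(id, params, t.blockStatement(children));
  }

  /* Visitor of other nodes */
}

// Exported so that getJsAst can require { CustomVisitor } from this file
module.exports = { CustomVisitor };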
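The visitor idea itself does not depend on ANTLR or Babel. The following dependency-free sketch shows the same transformation on a toy tree: the Python-side node shapes (`kind`, `name`, `params`, `body`) and the `ToyVisitor` class are hypothetical simplifications, not ANTLR's real context objects, while the output nodes follow the ESTree/Babel naming.

```javascript
// Hypothetical, simplified Python-side tree for:
//   def add(a, b):
//       return a
const pythonTree = {
  kind: "funcdef",
  name: "add",
  params: ["a", "b"],
  body: [{ kind: "return_stmt", value: { kind: "name", id: "a" } }],
};

class ToyVisitor {
  // Dispatch to a visit_<kind> method based on the node's kind.
  visit(node) {
    const method = this[`visit_${node.kind}`];
    if (!method) throw new Error(`No visitor for ${node.kind}`);
    return method.call(this, node);
  }

  visit_funcdef(node) {
    return {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: node.name },
      params: node.params.map((name) => ({ type: "Identifier", name })),
      body: {
        type: "BlockStatement",
        body: node.body.map((stmt) => this.visit(stmt)),
      },
    };
  }

  visit_return_stmt(node) {
    return { type: "ReturnStatement", argument: this.visit(node.value) };
  }

  visit_name(node) {
    return { type: "Identifier", name: node.id };
  }
}

const jsAst = new ToyVisitor().visit(pythonTree);
console.log(JSON.stringify(jsAst, null, 2));
```

Each visit method returns the JS node for one kind of Python node and recurses into its children, which is the same division of labour the real visitor methods follow.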
Convert the JS AST to JS code
The transformed JS AST is in a Babel-supported format, so we can use the generator from the @babel/generator library to convert the AST to JavaScript code.
// With require(), the generator function is exposed as the default export
const generate = require("@babel/generator").default;

function toJavaScript(ast) {
  return generate(ast).code;
}
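To give a rough idea of what a code generator does with such an AST, here is a dependency-free sketch that handles a handful of node types. It is not how @babel/generator is implemented; `generateCode` and the sample AST are assumptions for illustration, and the sketch ignores operator precedence, comments, and source maps.

```javascript
// Minimal generator for a small subset of ESTree/Babel node types.
function generateCode(node, indent = "") {
  switch (node.type) {
    case "FunctionDeclaration": {
      const params = node.params.map((p) => generateCode(p)).join(", ");
      const body = generateCode(node.body, indent);
      return `${indent}function ${node.id.name}(${params}) ${body}`;
    }
    case "BlockStatement": {
      // Statements inside a block are indented one level deeper.
      const inner = node.body
        .map((stmt) => generateCode(stmt, indent + "  "))
        .join("\n");
      return `{\n${inner}\n${indent}}`;
    }
    case "ReturnStatement":
      return `${indent}return ${generateCode(node.argument)};`;
    case "BinaryExpression":
      // No parentheses are emitted, so nested expressions may lose precedence.
      return `${generateCode(node.left)} ${node.operator} ${generateCode(node.right)}`;
    case "Identifier":
      return node.name;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Sample AST for a function that adds its two parameters.
const sampleAst = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "add" },
  params: [
    { type: "Identifier", name: "a" },
    { type: "Identifier", name: "b" },
  ],
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "ReturnStatement",
        argument: {
          type: "BinaryExpression",
          operator: "+",
          left: { type: "Identifier", name: "a" },
          right: { type: "Identifier", name: "b" },
        },
      },
    ],
  },
};

console.log(generateCode(sampleAst));
```

Running this prints a three-line `add` function whose body is `return a + b;`. A production generator covers every node type and many formatting concerns, which is the reason to rely on a library here.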
Now that we have a Parser, Transformer, and Code generator, our compiler is ready.
We can now export getJsAst and toJavaScript in a lib and create a Visual Studio Code extension to convert Python to JavaScript. The source code is available in the py-to-js and py-to-js-vs-code-extn repositories.
Hope you enjoyed reading this. Ciao for now :)