Literate Programming

Literate programming is a programming paradigm introduced by Donald Knuth. A program is written in a natural language with snippets of code interspersed. From this text usable source code is generated, along with a well-formatted, human-readable document.

The most important influence for this literate programming extension is the PBR Book.

This extension provides a set of tools that help the programmer write literate programs. Through automation the process of writing literate programs should be as painless as possible. The programmer writes their literate programs using Markdown. When a literate programmer needs a snippet they can add a code fence. In this extension snippets are called code fragments. Building on the framework provided by Visual Studio Code this extension introduces code completion, code actions, a definition provider, hover tooltips, and a fragment explorer.

The approach for this extension is based on Markdown documents. For this extension the Markdown specification is only slightly adapted to make supporting literate programming easy. The code fragments are expressed in code fences as per the Markdown specification, with either surrounding triple backticks or triple tildes. Along with the programming language identifier, the opening line has been extended to contain the fragment name and type, essentially as options to the code fence. The opening line will thus look like py : <<fragment name>>= to create a new fragment, or like py : <<fragment name>>=+ to amend an existing fragment.
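As a sketch of what this looks like in practice (the fragment name and code here are invented for illustration), a literate document could contain:

```markdown
Some prose introducing the fragment, then a code fence creating it:

~~~py : <<initialize counters>>=
count = 0
total = 0
~~~

Further down the document, the same fragment can be amended:

~~~py : <<initialize counters>>=+
errors = 0
~~~
```

The amending fence appends its lines to the fragment created earlier.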

The order of declaration of code fragments does not matter: a fragment may reference other fragments before they have been defined. The only requirement is that every code fragment referenced by another code fragment is eventually defined.

From fragment to source code

Fragments themselves don't directly create source files in most cases, but in the end source files are what is wanted from this extension.

To create actual source files, a fragment creation line needs to be used with a slightly extended form of the creation tag mentioned above. The name has to be suffixed with the string .* between the chevrons. Furthermore, a file name needs to be specified after the equals sign, followed by whitespace and a dollar sign $. The file name is essentially a relative path that is appended to each workspace folder as the root. A top-level fragment looks like py : <<top-level fragment.*>> = ./src/source.py $. The name of this fragment is top-level fragment, and once it has been fully extrapolated it will be written to the file source.py under src in the workspace folder.
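For illustration, a hypothetical top-level fragment referencing other (not yet defined) fragments could look like:

```markdown
~~~py : <<application entry point.*>> = ./src/main.py $
<<import section>>
<<main function>>
~~~
```

Here ./src/main.py is written under the workspace folder once <<import section>> and <<main function>> have been defined elsewhere and the whole fragment extrapolated.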

The Literate Programming extension allows the program author to write multiple projects in the same Visual Studio Code workspace. Each workspace folder is the root for its own literate project. Within each project there can be one or more literate files. These files carry the extension .literate. One literate file can contain zero or more code fragments. A literate file can also contain more than one top level fragment. In other words an author can create multiple source files within just one literate document.

This text describes the Literate Programming extension as a literate program.

Fragment Model

The tools provided by the Literate Programming extension are built around a single repository that provides all necessary information about the fragments of every project in the workspace.

The fragment repository handles parsing of literate documents, reacting to changes made by users. The repository provides all fragments found in the projects added to the current workspace. Additionally the repository will write out source files and rendered HTML files.

The fragment model is defined in the FragmentRepository class, which will be described in detail after introducing a couple of helper classes.

FragmentMap class

The FragmentMap class holds a map from strings, which are the fragment names, to their associated FragmentInformation instances. This map is available through the map property. The class also provides a clear method and a dispose method.

<<fragment map>>=
class FragmentMap {
  map : Map<string, FragmentInformation>;

  constructor()
  {
    this.map = new Map<string, FragmentInformation>();
  }

  clear()
  {
    this.map.clear();
  }

  dispose()
  {
    this.map.clear();
  }
};

List of GrabbedState

The class GrabbedStateList holds an array of GrabbedState accessible through the list property. The class provides clear and dispose methods.

GrabbedState is collected from the MarkdownIt parser. It contains tokens and related information generated by the parser.

<<list of grabbed states>>=
class GrabbedStateList {
  list : Array<GrabbedState>;

  constructor()
  {
    this.list = new Array<GrabbedState>();
  }

  clear()
  {
    this.list = new Array<GrabbedState>();
  }

  dispose()
  {
    while(this.list.length>0)
    {
      this.list.pop();
    }
  }
};

The FragmentRepository class

The FragmentRepository uses several helper classes: the FragmentMap and GrabbedStateList classes introduced just above, and the FragmentLocation class defined later in this document. Their fragments are included right before the repository class definition.

<<fragment repository>>=
<<fragment map>>
<<list of grabbed states>>
<<fragment tag location>>

export class FragmentRepository {
  <<fragment repository member variables>>
  <<fragment repository constructor>>
  <<fragment generation method>>

  <<method to get fragments from repository>>

  <<method to get fragment on line for position>>
  <<method to get token at position>>
  <<method to get state for workspace>>
  <<method to get state for document>>
  <<method to get all reference locations>>

  dispose() {
    for(let fragmentMap of this.fragmentsForWorkspaceFolders.values())
    {
      fragmentMap.dispose();
    }
    this.fragmentsForWorkspaceFolders.clear();

    for(let grabbedState of this.grabbedStateForWorkspaceFolders.values())
    {
      grabbedState.dispose();
    }
    this.grabbedStateForWorkspaceFolders.clear();
  }
}

Member variables

Our FragmentRepository needs a couple of member variables to function properly. We'll need an instance of a properly configured MarkdownIt parser.

<<fragment repository member variables>>=
private md : MarkdownIt;

The MarkdownIt parser will handle the actual tokenizing and parsing of the literate files.

Since we work with a multi-root workspace we'll create a map of maps. The keys of this top-level map are the workspace folder names; the values are the FragmentMaps for the corresponding folders.

<<fragment repository member variables>>=+
readonly fragmentsForWorkspaceFolders : Map<string, FragmentMap>;

For our parsing functionality we need an Array<GrabbedState>, which we have encapsulated in the class GrabbedStateList; the array is available through its list property. Each GrabbedStateList is stored in a map keyed by workspace folder name.

<<fragment repository member variables>>=+
readonly grabbedStateForWorkspaceFolders : Map<string, GrabbedStateList>;

Finally we need a DiagnosticCollection to be able to keep track of detected problems in literate projects. TBD: this probably needs to be changed into a map of DiagnosticCollection, again with the workspace folder names as keys.

<<fragment repository member variables>>=+
readonly diagnostics : vscode.DiagnosticCollection;

Constructor

The constructor takes an extension context to register any disposables there. We'll be registering to text document changes, and to workspace changes. In both cases we want to process literate files to regenerate fragments, source files and HTML files.

<<fragment repository constructor>>=
constructor(
  context : vscode.ExtensionContext
)
{
  <<initializing the fragment repository members>>

  <<subscribe to text document changes>>
  <<subscribe to workspace changes>>
}

Initializing members

First we make sure we have an instance of the MarkdownIt parser that is set up for our literate files processing.

<<initializing the fragment repository members>>=
this.md = createMarkdownItParserForLiterate();

Then we'll make sure the maps for tracking fragment maps and grabbed states are created, and finally we push our diagnostics collection to the extension's subscriptions.

<<initializing the fragment repository members>>=+
this.fragmentsForWorkspaceFolders = new Map<string, FragmentMap>();
this.grabbedStateForWorkspaceFolders = new Map<string, GrabbedStateList>();
this.diagnostics = vscode.languages.createDiagnosticCollection('literate');
context.subscriptions.push(this.diagnostics);

Subscribing to text document changes

The repository subscribes to the onDidChangeTextDocument event on the workspace. It could process literate files on each change, but the completion item provider needs to trigger the processing of literate files itself. Since the completion item provider gets called on typing an opening chevron (<), we skip triggering the processing here when such a character has been typed.

<<subscribe to text document changes>>=
context.subscriptions.push(
  vscode.workspace.onDidChangeTextDocument(
    async (e : vscode.TextDocumentChangeEvent) =>
    {
      if(!(e.contentChanges.length>0 && e.contentChanges[0].text.startsWith('<')))
      {
        await this.processLiterateFiles(e.document);
      }
    }
  )
);

Subscribing to workspace changes

Triggering the processing of literate documents is necessary when new workspace folders have been added. Additionally, we need to clean up the fragment maps and grabbed states of workspace folders that have been removed from the workspace.

<<subscribe to workspace changes>>=
context.subscriptions.push(
  vscode.workspace.onDidChangeWorkspaceFolders(
    async (e : vscode.WorkspaceFoldersChangeEvent) =>
    {
      for(const addedWorkspaceFolder of e.added) {
        await this.processLiterateFiles(addedWorkspaceFolder);
      }
      for(const removedWorkspaceFolder of e.removed)
      {
        this.fragmentsForWorkspaceFolders.delete(removedWorkspaceFolder.name);
        this.grabbedStateForWorkspaceFolders.delete(removedWorkspaceFolder.name);
      }
    }
  )
);

Processing literate files

The parsing and setting up of the fragments map is handled with the method processLiterateFiles. Additionally the method will write out all specified source files.

Processing of the literate files generally starts in one of three cases: 1) a change in the workspace due to the addition or removal of a workspace folder, 2) a change to a literate document, or 3) triggering of the literate.process command.

<<fragment generation method>>=
async processLiterateFiles(
  trigger :
    vscode.WorkspaceFolder
    | vscode.TextDocument
    | undefined) {
      <<set up workspace folder array>>
      <<iterate over workspace folders and parse>>
}

First we determine the workspace folder or folders to process. When trigger is a workspace folder or a text document, we use the given workspace folder or determine the one to which the text document belongs; in these cases we end up with an array containing just that one workspace folder. When trigger is undefined we use all workspace folders registered in this workspace.

<<set up workspace folder array>>=
const workspaceFolders : Array<vscode.WorkspaceFolder> | undefined = (() => {
  if(trigger)
  {
    <<get workspace if text document>>
    <<else just use passed in workspace>>
  }
  if(vscode.workspace.workspaceFolders && vscode.workspace.workspaceFolders.length>0) {
    let folders = new Array<vscode.WorkspaceFolder>();
    for(const ws of vscode.workspace.workspaceFolders)
    {
      folders.push(ws);
    }
    return folders;
  }
  return undefined;
}
)();

We can check whether our trigger is a TextDocument by checking whether eol is a property. If the eol property exists we are dealing with a TextDocument; if it doesn't, we are dealing with a WorkspaceFolder.

<<get workspace if text document>>=
if("eol" in trigger) {
  const ws = determineWorkspaceFolder(trigger);
  if(ws)
  {
    return [ws];
  }
}

Again, when the property eol is not found in the object we were passed, we can assume it is a workspace folder, so we return it as the one element of the array.

<<else just use passed in workspace>>=
else
{
  return [trigger];
}

With the list of workspace folders set up we can iterate over each folder and then handle literate files in that workspace folder.

<<iterate over workspace folders and parse>>=
if(workspaceFolders) {
  for(const folder of workspaceFolders)
  {
    <<set up fragments and grabbedStateList>>
    if(fragments && grabbedStateList) {
      <<clear FragmentMap and GrabbedStateList>>
      <<iterate over all files, write out html>>
      <<hanle fragments for map>>
      <<extrapolate fragments and save out>>
    }
  }
}

First we ensure entries for our workspace exist in the maps for FragmentMap and GrabbedStateList.

<<set up fragments and grabbedStateList>>=
    if(!this.fragmentsForWorkspaceFolders.has(folder.name))
    {
      this.fragmentsForWorkspaceFolders.set(folder.name, new FragmentMap());
    }
    if(!this.grabbedStateForWorkspaceFolders.has(folder.name))
    {
      this.grabbedStateForWorkspaceFolders.set(folder.name, new GrabbedStateList());
    }

Next we can get the FragmentMap and GrabbedStateList for our workspace folder. These we'll fill up with the data of our literate project.

<<set up fragments and grabbedStateList>>=+
    const fragments = this.fragmentsForWorkspaceFolders.get(folder.name);
    const grabbedStateList = this.grabbedStateForWorkspaceFolders.get(folder.name);

Each time we process a literate project we clear out the fragments and state so that we don't end up with stray elements.

<<clear FragmentMap and GrabbedStateList>>=
fragments.clear();
grabbedStateList.clear();

Our first pass iterates over all the literate files in our folder, parsing them as we go. Each parsed file is rendered as HTML and saved to disk. The parser state with all the tokens is stored in grabbedStateList.list. We need to await this async function, otherwise our state will be incomplete. The full state is needed for the next two steps.

<<iterate over all files, write out html>>=
await iterateLiterateFiles(folder,
                           writeOutHtml,
                           grabbedStateList.list,
                           this.md);

With the state complete, and our HTML files saved out, we are going to make two passes over the state. Let's do the first step here: we clear the diagnostics and then await handleFragments. We call the function so that fragments are not extrapolated and no source files are saved. We await its completion, otherwise our fragment map would be incomplete, or even missing entirely later on.

<<hanle fragments for map>>=
this.diagnostics.clear();
fragments.map = await handleFragments(folder,
                                      grabbedStateList.list,
                                      this.diagnostics,
                                      false,
                                      undefined);

In the second step we call the fragment handler again, but this time we do want the fragments to be completely extrapolated and the final source files written to disk. Before the call we again clear the DiagnosticCollection so that we get the correct diagnostics in case of errors in literate files.

Again we wait for the results, just to ensure it all completes before we go on.

<<extrapolate fragments and save out>>=
this.diagnostics.clear();
await handleFragments(folder,
                      grabbedStateList.list,
                      this.diagnostics,
                      true,
                      writeSourceFiles);

Fetching fragments for workspace folder

When we call getFragments we assume the literate projects have all been processed properly. In most cases processing is triggered automatically, but it may be necessary to trigger it manually before calling getFragments. When the projects have been properly processed, this function returns the FragmentMap for the given workspace folder.

<<method to get fragments from repository>>=
getFragments(workspaceFolder : vscode.WorkspaceFolder) : FragmentMap
{
  let fragmentMap : FragmentMap = new FragmentMap();
  this.fragmentsForWorkspaceFolders.forEach(
    (value, key, _) =>
    {
      if(key === workspaceFolder.name)
      {
        fragmentMap = value;
      }
    }
  );

  return fragmentMap;
}

Getting fragment on line for position

This method checks to see if for the given text line and position a fragment usage or mention can be found.

First we match the current line against FRAGMENT_USE_IN_CODE_RE. We then check which of the matches is at the given position, by searching for the index of the matched tag name, including the double chevron bracketing.

The range for the FragmentLocation will be created from the found index and run the length of the tag name including the double enclosing chevrons. An attempt to find the corresponding fragment is made, but if no such fragment exists the FragmentLocation will be created with the fragment set to undefined. The root and add parts are also given to the fragment location, even if they were not matched. This information can be used elsewhere to determine what kind of fragment was found at the given position.

<<method to get fragment on line for position>>=
getFragmentTagLocation(
  document : vscode.TextDocument,
  currentLine : vscode.TextLine,
  position : vscode.Position
) : FragmentLocation
{
  const workspaceFolder : vscode.WorkspaceFolder | undefined = determineWorkspaceFolder(document);
  const matchesOnLine = [...currentLine.text.matchAll(FRAGMENT_USE_IN_CODE_RE)];
  for(const match of matchesOnLine)
  {
    if(!match || !match.groups) {
      continue;
    }
    const tagName = `${OPENING}${match.groups.tagName}${CLOSING}`;
    const foundIndex = currentLine.text.indexOf(tagName);
    if(foundIndex>-1) {
      if(foundIndex <= position.character && position.character <= foundIndex + tagName.length)
      {
        const startPosition = new vscode.Position(currentLine.lineNumber, foundIndex);
        const endPosition = new vscode.Position(currentLine.lineNumber, foundIndex + tagName.length);
        let range : vscode.Range = new vscode.Range(startPosition, endPosition);
        let fragment : FragmentInformation | undefined;
        if(workspaceFolder) {
          const fragments = theOneRepository.getFragments(workspaceFolder).map;
          fragment = fragments.get(match.groups.tagName) || undefined;
        }
        return new FragmentLocation(match.groups.tagName, document.uri, range, fragment, match.groups.root, match.groups.add);
      }
    }
  }

  return unsetFragmentLocation;
}

The FragmentLocation

A fragment location encodes the occurrence of what could be a fragment that already exists or one that still needs to be defined. The class holds the name of the fragment, the range of this string in the resource specified by the uri, and whether a FragmentInformation was found or not.

The properties root and add can be used to determine what type of fragment is at the given range.

<<fragment tag location>>=
export class FragmentLocation
{
  readonly rangeExclusive : vscode.Range;
  readonly valid : boolean;

  constructor(
    public readonly name : string,
    public readonly uri: vscode.Uri,
    public readonly range : vscode.Range,
    public readonly fragment : FragmentInformation | undefined,
    public readonly root : string | undefined,
    public readonly add : string | undefined
  )
  {
    this.valid = uri.fsPath.indexOf('not_valid_for_literate')===-1;
    if(name.startsWith(OPENING)) {
      this.rangeExclusive = new vscode.Range(
        range.start.line, range.start.character + 2,
        range.end.line, range.end.character - 2
      );
    }
    else
    {
      this.rangeExclusive = range;
    }
  }
}
const unsetFragmentLocation =
    new FragmentLocation(
      '',
      vscode.Uri.file('not_valid_for_literate'),
      new vscode.Range(0,0,0,0),
      undefined,
      undefined,
      undefined
    );

Get fragment usage token

This method takes a text document and a range, based on which the token containing the range is returned. If no token is found, or the workspace folder is not available, the emptyToken constant is returned.

<<method to get token at position>>=
getTokenAtPosition(
  document : vscode.TextDocument,
  range : vscode.Range
) : TokenUsage
{

Determine the workspace folder for the given text document. As mentioned above, if no workspace folder is found the emptyToken is returned.

<<method to get token at position>>=+
  const workspaceFolder : vscode.WorkspaceFolder | undefined = determineWorkspaceFolder(document);
  if(!workspaceFolder)
  {
    return emptyToken;
  }

Next we can retrieve the state for the document.

<<method to get token at position>>=+
  const state = this.getDocumentState(document);

We can iterate over all the tokens in the grabbed state of the document. We're only interested in tokens that have a valid map property, since we need to check the range asked for.

<<method to get token at position>>=+
  for(const token of state.gstate.tokens)
  {
    if(token.map) {

If the range given is contained within the token map we create a new TokenUsage and return that. This concludes the search for the token containing the range we are interested in.

<<method to get token at position>>=+
      const tokenRange = new vscode.Range(token.map[0], 0, token.map[1], 1024);
      if(tokenRange.contains(range))
      {
        let tokenUsage : TokenUsage = {
          token : token,
        };
        return tokenUsage;
      }
    }
  }

If no hit was found return the emptyToken.

<<method to get token at position>>=+
  return emptyToken;
}

The TokenUsage interface helps determine whether we have a token or not. TBD: we can probably get rid of this interface and just use a Token directly.

<<token usage interface>>=
interface TokenUsage
{
  token : Token | undefined,
}

const emptyToken : TokenUsage =
{
  token : undefined,
};

Get list of grabbed states for a workspace

<<method to get state for workspace>>=
getWorkspaceState(workspaceFolder : vscode.WorkspaceFolder) : GrabbedStateList
{
  let grabbedState : GrabbedStateList = new GrabbedStateList();
  this.grabbedStateForWorkspaceFolders.forEach(
    (value, key, _) =>
    {
      if(key === workspaceFolder.name)
      {
        grabbedState = value;
      }
    }
  );

  return grabbedState;
}

Get the grabbed state of a document

<<method to get state for document>>=
getDocumentState(document: vscode.TextDocument) : GrabbedState
{
  let grabbedState : GrabbedState = emptyState;
  const ws = determineWorkspaceFolder(document);
  if(ws) {
    const workspaceState = this.getWorkspaceState(ws);
    for(const state of workspaceState.list)
    {
      if(document.uri.path === state.literateUri.path)
      {
        grabbedState = state;
      }
    }
  }

  return grabbedState;
}

Get all reference locations

Finding all references for a fragment, that is, fragment usage or fragment mention in a literate project, requires going over all tokens of a workspace. For each reference a vscode.Location is returned.

The getReferenceLocations method takes a workspace folder and a fragment name, and will return an array of vscode.Location.

<<method to get all reference locations>>=
getReferenceLocations(
  workspaceFolder : vscode.WorkspaceFolder,
  fragmentName : string
) : vscode.Location[]
{

We start with an empty list of locations, which we fill with each reference hit found in the given literate project. For the workspace folder we get the latest grabbed state.

We then proceed to iterate through all grabbed states. Remember that each grabbed state corresponds to one literate document. From each grabbed state we iterate over its tokens, and we are interested only in tokens that have a valid map property.

<<method to get all reference locations>>=+
  const fragmentTag = OPENING+fragmentName+CLOSING;
  let locations = new Array<vscode.Location>();
  let grabbedStateList = this.getWorkspaceState(workspaceFolder).list;

  for(const grabbedState of grabbedStateList)
  {
    for(const token of grabbedState.gstate.tokens)
    {
      if(token.map)
      {

When we have a token that could contain a reference we check whether there is any occurrence of the fragment tag in its content; if there is none, the token holds no reference.

<<method to get all reference locations>>=+
        if(token.content.indexOf(fragmentTag) > -1)
        {

With a hit in the entire content of the token we need to figure out each reference. We do this by splitting the token content into lines and then, for each line, looking at each hit.

If our token is a fence we initialize idx to 1, because the token's map starts at the opening fence line while the content begins one line below it; otherwise we initialize idx to 0.

<<method to get all reference locations>>=+
          const lines = token.content.split("\n");
          let idx = token.type === 'fence' ? 1 : 0;
          for(const line of lines) {
            let offset = line.indexOf(fragmentTag);
            while(offset>-1) {

When offset is larger than -1 we know we have a hit, so we can create a new range using the token.map[0] and the idx. The range will include the entire fragment tag, with the opening and closing double chevrons.

<<method to get all reference locations>>=+
              let range = new vscode.Range(
                token.map[0] + idx,
                offset,
                token.map[0] + idx,
                offset + fragmentTag.length
              );

The location is then created with the uri of the literate file that contains this token and the range we just set up.

<<method to get all reference locations>>=+
              let location = new vscode.Location(grabbedState.literateUri, range);
              locations.push(location);

We check for the next occurrence of the fragment tag by searching past the current offset. That way we ensure we get all the references even if there are multiple on one line.

<<method to get all reference locations>>=+
              offset = line.indexOf(fragmentTag, offset + 5);
            }

Update idx for each pass while going through the lines array.

<<method to get all reference locations>>=+
            idx++;
          }
        }
      }
    }
  }

Return the locations array. If there were hits the locations array will have entries, if there were no hits the array will be empty.

<<method to get all reference locations>>=+
  return locations;
}

Iterating all literate files

As mentioned in the introduction, the main idea of the extension is to collect all fragments created in all .literate files. Once all fragments have been collected they are extrapolated until the top fragments are the full source files. Fully extrapolated top fragments are written to the source files specified for them.
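To sketch what extrapolation means (all names and code here are hypothetical), consider a top-level fragment referencing a second fragment:

```markdown
~~~py : <<greeter.*>> = ./src/greet.py $
def greet():
    <<print the greeting>>
~~~

~~~py : <<print the greeting>>=
print("hello")
~~~
```

Extrapolation would produce ./src/greet.py with the reference replaced by the print statement, indented to match the reference line.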

The first step is to put each .literate file through the MarkdownIt renderer. Each render is given a special environment that is used to collect the state of that render. The state contains the document tokenized according to the Markdown specification. The state env is of type GrabbedState. Among the tokens are the code fences that are code fragments. For each .literate file the grabbed state env is saved in the list of GrabbedStates, envList.

<<render and collect state>>=
async function iterateLiterateFiles(workspaceFolder : vscode.WorkspaceFolder,
                                    writeHtml : WriteRenderCallback
                                                | undefined
                                                | null,
                                    envList : Array<GrabbedState>,
                                    md : MarkdownIt)
{
  <<find all literate files in workspace>>
  try {
    for (let fl of foundLiterateFiles) {
      <<get text from literate document>>
      <<parse literate file>>
      <<write out rendered file if requested>>
    }
  } catch (error) {
    console.log(error);
  }
}

We ensure that only literate files are parsed for their program fragments. We do that by using a vscode.RelativePattern rooted at the workspace folder passed into iterateLiterateFiles.

<<find all literate files in workspace>>=
const foundLiterateFiles = await getLiterateFileUris(workspaceFolder);

We get the content of our literate file using getFileContent. We do need to await for that so that we actually get the string and not a promise.

<<get text from literate document>>=
const text = await getFileContent(fl);

With the text of our literate document ready, we derive the file path of the document relative to the workspace folder. fname is then set as the literateFileName of our GrabbedState instance, which we push into envList so that we can access it later. Now we finally pass the text of our literate document to the MarkdownIt renderer. Once that is done we have both an HTML representation of our document and the entire parser state in env.

<<parse literate file>>=
const fname = path.relative(workspaceFolder.uri.path, fl.path);
const env: GrabbedState = { literateFileName: fname, literateUri: fl, gstate: new StateCore('', md, {}) };
envList.push(env);
const rendered = md.render(text, env);

If a callback implementing WriteRenderCallback is passed to iterateLiterateFiles we call it with the rendered file content so that it can be saved as an HTML file with the same name as the .literate file being rendered, but with the extension replaced by .html. Conversely, if no callback was passed in, the rendered results are not saved to disk.

<<write out rendered file if requested>>=
if(writeHtml)
{
  await writeHtml(fname, workspaceFolder.uri, rendered);
}

GrabbedState interface

The GrabbedState interface is used to create a type that helps us collect the tokens for each .literate file. Instances of objects with this interface are passed to a MarkdownIt renderer. The renderer has the GrabberPlugin registered, which provides a rule that helps us collect the state of each rendered file. The grabbed state is collected in gstate, which is an instance of StateCore, provided by MarkdownIt.

The interface defines literateFileName, which is the filename of the literate document to which the grabbed state belongs. literateUri is the full uri for this document. Finally gstate holds the StateCore of the parsing result.

<<grabbed state type>>=
interface GrabbedState {
  literateFileName: string;
  literateUri: vscode.Uri;
  gstate: StateCore;
}

We define a GrabbedState that is not valid, the emptyState. This allows us to always return an object instead of undefined in select cases.

<<grabbed state type>>=+
const emptyState : GrabbedState =
{
  literateFileName : '',
  literateUri : vscode.Uri.file('not_valid_for_literate'),
  gstate: new StateCore('', createMarkdownItParserForLiterate(), {})
};

Preparing MarkdownIt

In the iterateLiterateFiles we start by setting up the MarkdownIt parser.

<<set up MarkdownIt>>=
const md : MarkdownIt = createMarkdownItParserForLiterate();

The function createMarkdownItParserForLiterate does this setup so that it is easy to get a new parser to use for different purposes, like parsing documents to get the code fragment names for code completion.

We use the highlight function to ensure our code fragments get syntax highlighting; it simply relies on highlight.js to do the work.

We also tell MarkdownIt to use our grabberPlugin. This plug-in harvests the internal states for each document into instances of GrabbedState. These states we'll later use to get all the different code fragments and to weave them into the code files they describe.

Finally we replace the default fence rule with our own renderCodeFence rule. The intent of that rule will be explained in the section on renderCodeFence.

<<create markdownit parser>>=
function createMarkdownItParserForLiterate() : MarkdownIt
{
  const md : MarkdownIt = new MarkdownIt({
          highlight: function(str: string, lang: string, attrs: string) {
            if(lang && hljs.getLanguage(lang)) {
              return '<pre><code>' +
              hljs.highlight(str, {language : lang}).value +
              '</code></pre>';
            }
            return '<pre title="' + attrs + '">' + md.utils.escapeHtml(str) + '</pre>';
          }

        })
        .use(grabberPlugin);

  oldFence = md.renderer.rules.fence;
  md.renderer.rules.fence = renderCodeFence;
  return md;
}

Fragment structure and regular expressions

Before we dive deeper into the processing of .literate documents it is necessary to have a look at how fragments work.

Fragments in the literate extension have a specific format that requires a bit of explaining.

There are four types of fragment tags, three of which either create or modify a fragment, and one that expresses fragment usage.

For the detection of fragments a couple of regular expressions are used. These are explained in more detail below.

Fragment use in code

Let's start by looking at the form of a fragment tag use.

Fragments can be used in code blocks by writing their tag: double opening and closing chevrons around the fragment name, as in <<fragment name>>. To detect usage of fragments in code we use FRAGMENT_USE_IN_CODE_RE.

<<fragment regular expressions>>=

const FRAGMENT_USE_IN_CODE_RE =
  /(?<indent>[ \t]*)<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?/g;

The regular expression captures four groups. A match gives us five or more results: the whole matched string followed by the captured groups; any additional parts after that we discard. The whole matched string is called the tag. The first group is called indent and is used to indent the whole fragment code when it gets extrapolated into the final code. The second group is called tagName, which is the fragment name. The third group is called root and the final group is called add. For fragment use we essentially need only tagName, with indent still serving a function. The other groups are in the regular expression so we can identify incorrect use of fragments in code: creating or adding to fragments inside code blocks is not valid.
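
To make the groups concrete, here is a small standalone snippet (an illustration, not a fragment of the extension) that runs the expression over two example lines; the fragment names are made up:

```typescript
// FRAGMENT_USE_IN_CODE_RE copied from the fragment above.
const FRAGMENT_USE_IN_CODE_RE =
  /(?<indent>[ \t]*)<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?/g;

// A correct fragment use: only indent and tagName capture.
const use = [...'    <<open file>>'.matchAll(FRAGMENT_USE_IN_CODE_RE)][0];
console.log(use.groups?.indent);  // four spaces
console.log(use.groups?.tagName); // 'open file'

// An invalid use inside a code block: root and add capture too,
// which is what lets the extension emit a diagnostic for it.
const bad = [...'<<open file>>=+'.matchAll(FRAGMENT_USE_IN_CODE_RE)][0];
console.log(bad.groups?.root, bad.groups?.add); // '=' '+'
```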

The application of FRAGMENT_USE_IN_CODE_RE is explained in more detail in the section on code realization.

Creating and modifying fragments

There is the tag used to create a new fragment, which is always used in conjunction with the opening code fence tag: either a triple backtick or triple tilde followed by the programming language identifier for the following code block. The actual fragment tag is placed as the first option right after the colon following the language specifier.

<<fragment regular expressions>>=+
const FRAGMENT_RE =
  /(?<lang>[^:]*)(?<colon>:)?.*<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?\s*(?<fileName>.*\s+\$)?(?<extraSettings>\s+.*)?/;

Most of the groups correspond to the ones defined by FRAGMENT_USE_IN_CODE_RE with a few additions. Most notably there is the group catching the language specifier, the group to catch the filename and the group to catch extra settings, called lang, fileName and extraSettings respectively.

The filename group has to end in whitespace and a dollar sign.

Also the colon is separated out into a group. That will allow for checking if a tag declaration is properly formed. When the colon is missing it is possible to detect this and emit a diagnostic accordingly.

So to create a new tag the info line for the code fence could look like py : <<a fragment name>>=.

To add to a fragment a + is added, so it could look like py : <<a fragment name>>=+. Having a fragment without = or =+ on the code fence info line is an error.
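
To see FRAGMENT_RE in action, here is a standalone snippet (an illustration, not a fragment of the extension) matching the two info line shapes described above; the fragment and file names are made up:

```typescript
// FRAGMENT_RE copied from the fragment above.
const FRAGMENT_RE =
  /(?<lang>[^:]*)(?<colon>:)?.*<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?\s*(?<fileName>.*\s+\$)?(?<extraSettings>\s+.*)?/;

// Creating a new top-level fragment that targets a source file.
const create = 'ts : <<main.*>>= main.ts $'.match(FRAGMENT_RE);
console.log(create?.groups?.lang.trim()); // 'ts'
console.log(create?.groups?.tagName);     // 'main.*'
console.log(create?.groups?.root);        // '='
console.log(create?.groups?.fileName);    // 'main.ts $' (the ' $' suffix is stripped later)

// Adding to an existing fragment.
const amend = 'ts : <<main.*>>=+'.match(FRAGMENT_RE);
console.log(amend?.groups?.add);          // '+'
```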

Gathering all fragments

All code fragments are fetched from each environment state. This is done by looking for all fence tokens. If the token.info for a fence matches FRAGMENT_RE we can check whether the fragment currently at hand is a new fragment (root && !add) or whether it expands an existing one (root && add), as will be explained in more detail further down.

<<handle fragments>>=
async function handleFragments(
  workspaceFolder : vscode.WorkspaceFolder,
  envList : Array<GrabbedState>,
  diagnostics : vscode.DiagnosticCollection,
  extrapolateFragments : boolean,
  writeSource : WriteSourceCallback | undefined) : Promise<Map<string, FragmentInformation>>
{
  const folderUri = workspaceFolder.uri;
  <<build fragment map>>

  if(extrapolateFragments)
  {
    <<extrapolate fragments>>
  }

  if(writeSource) {
    writeSource(workspaceFolder, fragments);
  }

  return Promise.resolve(fragments);
}

Populating the fragment map

First we build a map of all available fragments. These will go into fragments, which is of type Map<string, FragmentInformation>. The name of a fragment will function as the key, and an instance of FragmentInformation will be the value.

<<build fragment map>>=
const fragments = new Map<string, FragmentInformation>();
const overwriteAttempts = new Array<string>();
const missingFilenames = new Array<string>();
const addingToNonExistant = new Array<string>();
for (let env of envList) {
  for (let token of env.gstate.tokens) {
    <<handle fence tokens>>
  }
}

Each fence token we find needs to be checked. There may of course be code fences in the document that do not create or modify a fragment. These we need to skip.

Since we are handling code fences we use FRAGMENT_RE to match token.info. A fragment is malformed if the colon is missing, so we need to <<emit diagnostic when colon is missing>>.

<<handle fence tokens>>=
if (token.type === 'fence') {
  const linenumber = locationOfFragment(token);
  const match = token.info.match(FRAGMENT_RE);
  if (match && match.groups) {
    let lang = match.groups.lang.trim();
    let colon = match.groups.colon;
    let name = match.groups.tagName;
    let root = match.groups.root;
    let add = match.groups.add;
    let fileName = match.groups.fileName;
    let extraSettings = match.groups.extraSettings;
    <<emit diagnostic when colon is missing>>
    <<add to existing fragment>>
    <<create a new fragment>>
  }
}

Error diagnostic when fragment malformed

The diagnostic emitted has a message telling the colon is missing, along with the line number and the literate file this happened in.

<<emit diagnostic when colon is missing>>=
if(lang && !colon) {
  let msg = `Missing colon for fragment: ${name}. ${env.literateFileName}:${linenumber}`;
  const diag = createErrorDiagnostic(token, msg);
  updateDiagnostics(env.literateUri, diagnostics, diag);
}

Creating a new fragment

If the root group has captured a result but not the add group we know we have a new fragment on our hand.

If we already have in our fragments map a key with the same name as the fragment we are currently handling we add an error diagnostic message. We don't stop handling fences, or the entire literate.process command for that matter. We keep on going, but leave it up to the programmer to see and handle the error messages.

If a fragment name containing .* is found we need to ensure there is a result in the fileName capture group. That is going to be needed to write out the source code file eventually. A file-defining fragment without a file name is an error.

When everything appears to be in order a new FragmentInformation instance is created with the information found. The code for this fragment is the token content in token.content. Finally the new FragmentInformation instance is added to the fragments map.

If a new fragment is going to be created, but it already exists in the fragment map we emit an error diagnostic. To ensure we emit the error diagnostic only once the fragment name is added to overwriteAttempts.

<<create a new fragment>>=
if (root && !add) {
  if (fragments.has(name)) {
    if(!overwriteAttempts.includes(name))
    {
      let msg = `Trying to overwrite existing fragment ${name}. ${env.literateFileName}:${linenumber}`;
      const diag = createErrorDiagnostic(token, msg);
      updateDiagnostics(env.literateUri, diagnostics, diag);
      overwriteAttempts.push(name);
    }
  }

If it does not yet exist in the fragment map we can proceed. We need to check though if we have a top-level fragment. In that case we require a file name, so emit an error diagnostic when that is missing.

<<create a new fragment>>=+
  else {
    if (!fileName && name.indexOf(".*") > -1 ) {
      if(!missingFilenames.includes(name)) {
        let msg = `Expected filename for star fragment ${name}`;
        const diag = createErrorDiagnostic(token, msg);
        updateDiagnostics(env.literateUri, diagnostics, diag);
        missingFilenames.push(name);
      }
    }

Conversely, if we have a non-starred fragment but do get a filename we also issue a diagnostic to notify the programmer of the mistake.

<<create a new fragment>>=+
    if(fileName && name.indexOf(".*")===-1) {
        let msg = `Unexpected filename for non-star fragment ${name}`;
        const diag = createErrorDiagnostic(token, msg);
        updateDiagnostics(env.literateUri, diagnostics, diag);
    }

We do need to make sure that the fileName gets cleaned up because the matching expression for this contains whitespace and a dollar sign.

<<create a new fragment>>=+
    if(fileName) {
      fileName = fileName.replace(/\s+\$/, "");
    }

Check the extraSettings group to see whether a template is specified. Find the vscode.Uri for the specified file. Use that if it exists, otherwise keep sourceTemplateUri at undefined.

<<create a new fragment>>=+
    let sourceTemplateUri : vscode.Uri | undefined = undefined;
    if(extraSettings) {
      let settings = extraSettings.split(";");
      for(let setting of settings)
      {
        setting = setting.trim();
        if(setting.startsWith("template"))
        {
          let settingParts = setting.split("=");
          const sourceTemplateFilePattern : vscode.RelativePattern = new vscode.RelativePattern(workspaceFolder, settingParts[1]);
          const _foundSourceTemplateFile = await vscode.workspace
            .findFiles(sourceTemplateFilePattern)
            .then(files => Promise.all(files.map(file => file)));
          if(_foundSourceTemplateFile.length===1)
          {
            sourceTemplateUri = _foundSourceTemplateFile[0];
          }
        }
      }
    }
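
The template lookup can be illustrated with a small, hypothetical helper (the function name is ours, not part of the extension, and the vscode.workspace.findFiles lookup is omitted) that mirrors the loop above:

```typescript
// Hypothetical helper mirroring the extraSettings loop, minus the
// workspace file search; it returns the glob pattern for the template.
function parseTemplateSetting(extraSettings: string): string | undefined {
  for (let setting of extraSettings.split(';')) {
    setting = setting.trim();
    if (setting.startsWith('template')) {
      // Everything after '=' would be handed to vscode.RelativePattern.
      return setting.split('=')[1];
    }
  }
  return undefined;
}

console.log(parseTemplateSetting(' template=templates/basic.tpl')); // 'templates/basic.tpl'
console.log(parseTemplateSetting('other=x'));                       // undefined
```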

We can now finally create the FragmentInformation instance and add it to our fragment map.

<<create a new fragment>>=+
    let code = token.content;
    let fragmentInfo: FragmentInformation = {
      lang: lang,
      literateFileName: env.literateFileName,
      sourceFileName: fileName,
      templateFileName: sourceTemplateUri,
      code: code,
      tokens: [token],
      env: env,
    };
    fragments.set(name, fragmentInfo);
  }
}

Modifying an existing fragment

If both the root and add groups have captured their results, an = and a + respectively, we need to add code to an existing fragment.

For this to work the fragment-creating code fence must always appear before the modifying one. It is an error to try to modify a fragment that hasn't been seen yet.

The fragment with specified name is fetched, and when it is not undefined the token.content is appended to the code of the FragmentInformation instance we got from the map. The current token is also appended to the tokens list.

The fragments map is updated with the modified FragmentInformation instance.

<<add to existing fragment>>=
if (root && add) {
  if (fragments.has(name)) {
    let fragmentInfo = fragments.get(name) || undefined;
    if(fragmentInfo && fragmentInfo.code) {
      let additionalCode = token.content;
      fragmentInfo.code = `${fragmentInfo.code}${additionalCode}`;
      fragmentInfo.tokens.push(token);
      fragments.set(name, fragmentInfo);
    }
  } else {
    if(!addingToNonExistant.includes(name)) {
      let msg = `Trying to add to non-existent fragment ${name}. ${env.literateFileName}:${linenumber}`;
      const diag = createErrorDiagnostic(token, msg);
      updateDiagnostics(env.literateUri, diagnostics, diag);
      addingToNonExistant.push(name);
    }
  }
}

The FragmentInformation type

We have now seen the FragmentInformation type used several times, so let us take a moment to describe it in more detail.

The interface allows us to gather information for each found code fragment. It allows us to store the programming language identifier, name of the .literate file and name of the targeted source file, if the code fragment happens to be a top fragment.

The actual code for the fragment is stored in code. Furthermore the tokens for the complete fragment are stored in the tokens list. This list holds objects that fulfill the Token interface, which is provided by the MarkdownIt module.

<<fragment information type>>=

interface FragmentInformation {
  lang: string;
  literateFileName: string;
  sourceFileName: string;
  templateFileName: vscode.Uri | undefined;
  code: string;
  tokens: Token[];
  env: GrabbedState;
}

Writing source files

Writing source files is a matter of looping through the keys of a fragments map. For each key that ends with the .* string we check if a fragment exists, and if for that fragment a source filename is recorded. If that is the case write out the file with the code content of the fragment.

If a vscode.Uri is defined for templateFileName read the file contents and use that instead of the default one that says just [CODE]. This means that for a template to work properly it needs to contain the string [CODE], since that will be replaced with the code generated for this file.

For newline handling we'll replace all single LF occurrences with CRLF when the underlying operating system is Windows. Otherwise we do the reverse: replace CRLF with a single LF.

<<method to write out source files>>=
async function writeSourceFiles(workspaceFolder : vscode.WorkspaceFolder,
                fragments : Map<string, FragmentInformation>)
{
  const folderUri = workspaceFolder.uri;
  
  for(const name of fragments.keys()) {
    if (name.endsWith(".*")) {
      let fragmentInfo = fragments.get(name) || undefined;
      if (fragmentInfo && fragmentInfo.sourceFileName) {
        let sourceTemplate = '[CODE]';
        if(fragmentInfo.templateFileName) {
          sourceTemplate = await getFileContent(fragmentInfo.templateFileName);
        }
        let code = sourceTemplate.replace("[CODE]", fragmentInfo.code);
        let fixed = '';
        if(os.platform()==='win32')
        {
          const lf2crlf = /([^\r])\n/g;
          fixed = code.replaceAll(lf2crlf, '$1\r\n');
        } else {
          const crlf2lf = /\r\n/g;
          fixed = code.replaceAll(crlf2lf, '\n');
        }
        const encoded = Buffer.from(fixed, 'utf-8');
        let fileName = fragmentInfo.sourceFileName.trim();
        const fileUri = vscode.Uri.joinPath(folderUri, fileName);
        try {
          await vscode.workspace.fs.writeFile(fileUri, encoded);
        } catch(writeError)
        {
          console.log(writeError);
        }
      }
    }
  }
}

Extrapolating fragments

Once all fragments have been collected from the .literate files of the project, they can be combined into source code.

<<extrapolate fragments>>=

let pass: number = 0;
const rootIncorrect = new Array<string>();
const addIncorrect = new Array<string>();
const fragmentNotFound = new Array<string>();
do {
  pass++;
  let fragmentReplaced = false;
  for (let fragmentName of fragments.keys()) {
    let fragmentInfo = fragments.get(fragmentName) || undefined;
    if (!fragmentInfo) {
      continue;
    }

    const casesToReplace = [...fragmentInfo.code.matchAll(FRAGMENT_USE_IN_CODE_RE)];
    for (let match of casesToReplace) {
      if(!match || !match.groups) {
        continue;
      }
      let tag = match[0];
      let indent = match.groups.indent;
      let tagName = match.groups.tagName;
      let root = match.groups.root;
      let add = match.groups.add;
      if (root && !rootIncorrect.includes(tag)) {
        let msg = `Found '=': incorrect fragment tag in fragment, ${tag}`;
        const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg);
        updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
        rootIncorrect.push(tag);
      }
      if (add && !addIncorrect.includes(tag)) {
        let msg = `Found '+': incorrect fragment tag in fragment: ${tag}`;
        const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg);
        updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
        addIncorrect.push(tag);
      }
      if (!fragments.has(match.groups.tagName) && tagName !== "(?<tagName>.+)" && !fragmentNotFound.includes(tagName)) {
        let msg = `Could not find fragment ${tag} (${tagName})`;
        let range = fragmentUsageRange(fragmentInfo.tokens[0], tagName);
        const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg, range);
        updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
        fragmentNotFound.push(tagName);
      }
      let fragmentToReplaceWith = fragments.get(tagName) || undefined;
      if (fragmentToReplaceWith) {
        let code = fragmentToReplaceWith.code;
        let lines = code.split("\n").slice(0, -1);
        let indentedLines = lines.map(e => indent + e);
        let newcode = indentedLines.join("\n");
        fragmentReplaced = true;
        fragmentInfo.code = fragmentInfo.code.replace(tag, newcode);
        fragments.set(fragmentName, fragmentInfo);
      }
    }
  }
  if(!fragmentReplaced) {
    break;
  }
}
while (pass < 25);
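
One pass of the replacement loop can be sketched standalone; this snippet uses a simplified variant of FRAGMENT_USE_IN_CODE_RE (without the root and add groups) and made-up fragments:

```typescript
// Fragment map with a top-level fragment referencing another one.
const fragments = new Map<string, string>([
  ['main.*', 'function run() {\n  <<body>>\n}\n'],
  ['body', 'let a = 1;\nlet b = 2;\n'],
]);

// Simplified use-site expression: indent and tagName only.
const USE_RE = /(?<indent>[ \t]*)<<(?<tagName>.+)>>/g;

let code = fragments.get('main.*')!;
for (const match of [...code.matchAll(USE_RE)]) {
  const indent = match.groups!.indent;
  const replacement = fragments.get(match.groups!.tagName)!;
  // Drop the trailing empty element from the split, indent every line.
  const lines = replacement.split('\n').slice(0, -1).map(l => indent + l);
  code = code.replace(match[0], lines.join('\n'));
}
console.log(code);
// function run() {
//   let a = 1;
//   let b = 2;
// }
```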

Custom code fence rendering

Our extension uses a custom code fence rendering rule to ensure the code fragment name is also rendered as part of the fence.

Essentially the old, default rendering rule for fences is first used to create the original fence.

Then the token.info is matched against the FRAGMENT_RE regular expression. If we have a match we prepare HTML code to wrap around the HTML as generated by the default rule. Before we can actually wrap it in our div tags with the necessary classes we adjust the rendered HTML code to protect fragment tags. Otherwise these would also be syntax colored, and that we don't want. The fragment tag protection is explained in <<fragment tag protector>>.

For further cleanup of the rendered result any spans with comments in them are removed. These code comments are useful for the generated source code, but a literate program otherwise already documents code thoroughly. Comment removal from the HTML rendition is done with <<remove comments from HTML>>.

The fence is skipped if its info contains the string SETTINGS, since that denotes a configuration block as can be specified in index.literate. The configuration block is not intended to be visible in either the resulting code or the resulting HTML output.

If the fence has its token.info end with the string mermaid (all lower-case), and is not a valid fragment fence then the token.content is wrapped in <pre class="mermaid"> and </pre>. This allows the HTML template module import of mermaid.js to render diagrams expressed in these tags.

<<renderCodeFence rule>>=

<<fragment tag protector>>
<<remove comments from HTML>>

function renderCodeFence(tokens : Token[],
             idx : number,
             options : MarkdownIt.Options,
             env : any,
             slf : Renderer) {
  let rendered = '';
  if (oldFence && tokens[idx].info.indexOf("SETTINGS")<0) {
    rendered = oldFence(tokens, idx, options, env, slf);

    let token = tokens[idx];
    if (token.info) {
      const match = token.info.match(FRAGMENT_RE);
      if (match && match.groups) {
        let lang = match.groups.lang.trim();
        let name = match.groups.tagName;
        let root = match.groups.root;
        let add = match.groups.add;
        let fileName = match.groups.fileName;
        if (name) {
          root = root || '';
          add = add || '';
          fileName = fileName || '';
          fileName = fileName.trim();
          rendered = protectFragmentTags(rendered);
          rendered = removeCodeComments(rendered);
          rendered =
`<div class="codefragment">
<div class="fragmentname">&lt;&lt;${name}&gt;&gt;${root}${add} ${fileName}</div>
<div class="code">
${rendered}
</div>
</div>`;
        }
      }
      else if(token.info.endsWith('mermaid')) {
        rendered =
`<pre class="mermaid">
${token.content}
</pre>`;
      }
    }
  }

  return rendered;
};

Protecting fragment tags

With protectFragmentTags we can adjust the rendered HTML as received from the oldFence. We'll search for HTML that is our fragment tag: &lt;&lt;, followed by the tag name, followed by &gt;&gt;. Such occurrences we wrap in a span tag that has the CSS class literate-tag-name. The class can be set up to essentially override any styling applied through hljs-keyword.

For matching we use two regular expressions, FRAGMENT_HTML_RE and FRAGMENT_HTML_CLEANUP_RE.

With FRAGMENT_HTML_CLEANUP_RE all span tags injected by hljs can be cleaned up, and with FRAGMENT_HTML_RE we can wrap our fragment tag names in a class of our own, literate-tag-name, to handle special styling for tags in code fences.

<<fragment regular expressions>>=+
const FRAGMENT_HTML_CLEANUP_RE= /(<span.class="hljs-.+?">)(.*?)(<\/span>)/g;
const FRAGMENT_HTML_RE= /(&lt;&lt;.+?&gt;&gt;)/g;

Since these regular expressions are used with replaceAll they need to be marked global with g.

To make sure the highlights are properly cleaned we introduce an inline function cleanHighlights that takes care of all the highlights by running replaceAll on the match passed to it. The result is wrapped inside the span with the literate-tag-name class.

<<fragment tag protector>>=
function protectFragmentTags(rendered : string) : string {
  function cleanHighlights(match : string, _: number, __: string)
  {
    let internal = match.replaceAll(FRAGMENT_HTML_CLEANUP_RE, "$2");
    return `<span class="literate-tag-name">${internal}</span>`;
  }
  return rendered
    .replaceAll(
      FRAGMENT_HTML_RE,
      cleanHighlights
    );
}

In our CSS file we can now style .literate-tag-name, for instance with an italic font, to make tags stand out in the code fences.

Remove code comments

In rendered HTML code comments are wrapped in span tags with the class hljs-comment. These can be on one line, or for comment blocks on multiple lines. Since the goal is to remove these completely from rendered HTML the regular expression for the match will be just to do that: match the span with the hljs-comment even if it is over several lines. To do that we use also the s modifier to the expression.

<<fragment regular expressions>>=+
const CODECOMMENT_HTML_RE= /<span class="hljs-comment">.*?<\/span>/gs;

The remove action becomes now a simple replaceAll on the rendered HTML using the regular expression CODECOMMENT_HTML_RE with the empty string as replacement.

<<remove comments from HTML>>=
function removeCodeComments(rendered : string) : string {
  rendered = rendered.replaceAll(CODECOMMENT_HTML_RE, "");
  return rendered;
}

Register the literate.process command

The command literate.process is registered with Visual Studio Code. The disposable that gets returned by registerCommand is held in literateProcessDisposable so that we can push it into context.subscriptions.

Here we find the main program of our literate.process command. Our MarkdownIt is set up, .literate files are searched and iterated. Each .literate file is rendered, and code fragments are harvested. Finally code fragments are extrapolated and saved to their respective source code files. The HTML files are also saved to files.

Diagnostic messages are also handled here. Errors and warnings are shown where necessary. On successful completion a simple status bar message is used. An information diagnostic message is not good here, because it would prevent the usage of literate.process in, for instance, tasks.json: the diagnostic message would block execution of a task if the command were used as a prelaunch task. That is obviously not good for the workflow.

<<register literate.process>>=
let literateProcessDisposable = vscode.commands.registerCommand(
  'literate.process',
  async function () {
    theOneRepository.processLiterateFiles(undefined);
    return vscode.window.setStatusBarMessage("Literate Process completed", 5000);
});

context.subscriptions.push(literateProcessDisposable);

Register the literate.create_fragment_for_tag command

<<register literate.create_fragment_for_tag>>=
let literateCreateFragmentForTagDisposable = vscode.commands.registerCommand(
  'literate.create_fragment_for_tag',
  async function (range? : vscode.Range) {
    createFragmentForTag(range);
  }
);

context.subscriptions.push(literateCreateFragmentForTagDisposable);

literate.create_fragment_for_tag implementation

The literate.create_fragment_for_tag will do as its name suggests. When the position in the document is on a tag then the command will add a code fence to the document.

If the tag at the position is a fragment use inside a fragment then the code fence will be created after the current fragment, using the same language as specified for the current fragment. If the tag is a fragment mention then the current fragment map is checked for the most used language instead, and that is pre-filled.

First we ensure we have an active editor.

<<create fragment for tag>>=
function createFragmentForTag(range? : vscode.Range)
{
  let activeEditor = vscode.window.activeTextEditor;
  if(activeEditor)
  {

From the active editor we find the document. The editor also knows where our cursor currently is, via the active property on its selection, but we use range.start if a range was passed in.

<<create fragment for tag>>=+
    const document = activeEditor.document;
    const position = range ? range.start : activeEditor.selection.active;

For the document and position we determined we can get the fragment at that location. We retrieve that from the repository so that we can get the range for the fragment use.

<<create fragment for tag>>=+
    const fragmentLocation = theOneRepository.getFragmentTagLocation(
      document,
      document.lineAt(position),
      position);

With the range of the fragment use in hand we can find the Markdown token where this range is contained for the document we're in.

<<create fragment for tag>>=+
    const tokenUsage = theOneRepository.getTokenAtPosition(
      document,
      fragmentLocation.range);

The token needs to be valid before we can use it to determine the insert position. The map property on Token tells us the begin and end lines. We want to add our new fragment definition after this token. We'll access the map when we are ready to do an insert on the WorkspaceEdit.

We initialize a temporary language id to LANGID.

<<create fragment for tag>>=+
    if(tokenUsage.token && tokenUsage.token.map)
    {
      let workspaceEdit = new vscode.WorkspaceEdit();
      let langId : string = 'LANGID';

If we have a fence Token we try matching the info property of the token with FRAGMENT_RE. This gives us the language id used for that fence. We'll be using the same language id for the new code fragment.

<<create fragment for tag>>=+
      if(tokenUsage.token.type === 'fence' && tokenUsage.token.map)
      {
        let match = tokenUsage.token.info.match(FRAGMENT_RE);
        if(match && match.groups) {
          langId = match.groups.lang;
        }
      }

We can now create the new fragment string with the language id and the fragment tag name we want to create the fragment for.

Newlines at the beginning and end of the string ensure the fragment won't be created without the necessary empty lines.

<<create fragment for tag>>=+
      let newFragment = `\n${FENCE} ${langId} : ${OPENING}${fragmentLocation.name}${CLOSING}=\n${FENCE}\n`;
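
The resulting string can be sketched standalone. FENCE, OPENING and CLOSING are constants defined elsewhere in the extension; the values below are stand-ins (tildes rather than backticks, so the example itself stays fenceable), as is the fragment name:

```typescript
// Stand-in values; the extension defines the real constants elsewhere.
const FENCE = '~~~';
const OPENING = '<<';
const CLOSING = '>>';

const langId = 'ts';
const fragmentName = 'open file';
const newFragment = `\n${FENCE} ${langId} : ${OPENING}${fragmentName}${CLOSING}=\n${FENCE}\n`;
console.log(newFragment);
// (leading blank line)
// ~~~ ts : <<open file>>=
// ~~~
```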

Now that we have the new fragment text ready we can call insert on our workspace edit. The position is created with the second element of the token.map as the line number, and 0 to have the insert happen at the beginning of the line.

Finally we apply the workspace edit to our workspace. This will give us the new fragment right after the paragraph or code fence with the fragment name we found at the position we ran the command at.

<<create fragment for tag>>=+
      workspaceEdit.insert(
        document.uri,
        new vscode.Position(tokenUsage.token.map[1], 0),
        newFragment
        );
      vscode.workspace.applyEdit(workspaceEdit);
    }
  }
}

Register the literate.split_fragment command

Registering the literate.split_fragment command, setting it up so that it could take a vscode.Position parameter, which helps in programmatically firing the command for a certain pre-computed location.

<<register literate.split_fragment>>=
let literateSplitFragmentDisposable = vscode.commands.registerCommand(
  'literate.split_fragment',
  async function (position? : vscode.Position) {
    splitFragment(position);
  }
);

context.subscriptions.push(literateSplitFragmentDisposable);

literate.split_fragment implementation

literate.split_fragment splits the current fragment below the line where the cursor is. If no active text editor is found nothing happens. With one in hand we can either use the position given to the method, or otherwise the cursor location in the document.

<<split fragment>>=
function splitFragment(position_? : vscode.Position)
{
  let activeEditor = vscode.window.activeTextEditor;
  if(activeEditor)
  {
    const document = activeEditor.document;
    const position = position_ ? position_ : activeEditor.selection.active;

With the document and position we can find the Token at that location. Continue only if it is a fence.

<<split fragment>>=+
    const tokenUsage = theOneRepository.getTokenAtPosition(
      document,
      new vscode.Range(position, position));
    if(tokenUsage.token && tokenUsage.token.type === 'fence')
    {

Next we match the info line to ensure we actually have a fragment here.

<<split fragment>>=+
      let match = tokenUsage.token.info.match(FRAGMENT_RE);
      if(match && match.groups)
      {

From the matched info line we take the language identifier and the fragment tag name. We can create the text that will split the current fragment.

<<split fragment>>=+
        let langId = match.groups.lang.trim();
        let tagName = match.groups.tagName.trim();
        let textToInsert = `${FENCE}\n\n${FENCE}${langId} : ${OPENING}${tagName}${CLOSING}=+\n`;

Finally we can create the workspace edit, make the insert on the next line from the cursor, and apply the edit.

<<split fragment>>=+
        let workspaceEdit = new vscode.WorkspaceEdit();
        workspaceEdit.insert(
          document.uri,
          new vscode.Position(position.line+1, 0),
          textToInsert
          );
        vscode.workspace.applyEdit(workspaceEdit);
      }
    }
  }
}

Diagnostics

This chapter contains a few methods that help create and update diagnostics. These diagnostics help the literate programmer determine whether there are problems with the text, and where.

Updating diagnostics

Diagnostic messages are instances of vscode.Diagnostic. To show them in the Problems panel in VSCode we add them to the diagnostics collection, which typically is passed into updateDiagnostics.

<<diagnostic updating>>=
function updateDiagnostics(
  uri: vscode.Uri,
  collection: vscode.DiagnosticCollection,
  diagnostic : vscode.Diagnostic | undefined): void {
  if (uri) {
    if (diagnostic) {
      const diags = Array.from(collection.get(uri) || []);
      diags.push(diagnostic);
      collection.set(uri, diags);
    }
  } else {
    collection.clear();
  }
}

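The append-then-set pattern of updateDiagnostics can be exercised without the vscode API by letting a plain Map stand in for the DiagnosticCollection and strings stand in for the Diagnostic instances. This is a simplified sketch of the behaviour, not the extension's code:

```typescript
// A Map plays the role of vscode.DiagnosticCollection here.
const collection = new Map<string, string[]>();

function updateDiagnosticsSketch(
  uri: string | undefined,
  diagnostic: string | undefined): void {
  if (uri) {
    if (diagnostic) {
      // Copy the existing diagnostics, append the new one, set back.
      const diags = Array.from(collection.get(uri) || []);
      diags.push(diagnostic);
      collection.set(uri, diags);
    }
  } else {
    collection.clear();
  }
}

updateDiagnosticsSketch('a.literate', 'unknown fragment <<foo>>');
updateDiagnosticsSketch('a.literate', 'duplicate fragment <<bar>>');
const countAfterTwo = collection.get('a.literate')?.length ?? 0; // 2
updateDiagnosticsSketch(undefined, undefined); // no uri: clear everything
console.log(countAfterTwo, collection.size); // 2 0
```

Copying with Array.from before pushing matters because collection.get may hand back a read-only view in the real API.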
Creating an error diagnostic

Instances of vscode.Diagnostic can be created with createErrorDiagnostic. It takes a markdown-it token, a message and an optional range. If the passed in range isn't a proper range, it is harvested from the token instead.

For now all messages are considered errors.

<<diagnostic updating>>=+
/**
 * @param token Token that carries the faulty code fragment
 * @param message Error message
 */
function createErrorDiagnostic(token: Token, message: string, range? : vscode.Range) : vscode.Diagnostic {
  range = range ? range : fragmentRange(token);
  let diagnostic: vscode.Diagnostic = {
    severity: vscode.DiagnosticSeverity.Error,
    message: message,
    range: range
  };

  return diagnostic;
}

Line number of fragment start

locationOfFragment gives the first line of the given token in the literate document, or -1 when the token carries no source mapping.

<<diagnostic updating>>=+
/**
 * @param token Token to extract code location from
 */
function locationOfFragment(token: Token): number {
  let linenumber = token.map ? (token.map[0]) : -1;
  return linenumber;
}

Line number of fragment end

locationOfFragmentEnd is used to get the last line of the given token in the literate document. This is typically used for code fences in this extension.

<<diagnostic updating>>=+
/**
 * @param token Token to extract code location from
 */
function locationOfFragmentEnd(token: Token): number {
  let linenumber = token.map ? token.map[1] : -1;
  return linenumber;
}

Range of whole fragment in text

fragmentRange is a method to construct a vscode.Range for a given token that is a code fragment.

<<diagnostic updating>>=+
/**
 * @param token Token to create range for
 */
function fragmentRange(token: Token): vscode.Range {
  let startTagName = token.info.indexOf("<<") + 2;
  let endTagName = token.info.indexOf(">>") - 1;
  let start = new vscode.Position(locationOfFragment(token), startTagName);
  let end = new vscode.Position(locationOfFragmentEnd(token), endTagName);
  let range: vscode.Range = new vscode.Range(start, end);
  return range;
}

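The column arithmetic can be checked on a sample info string outside of vscode; the start column lands on the first character of the tag name and the end column on its last. A standalone check using the same indexOf offsets as fragmentRange:

```typescript
const info = 'ts : <<my fragment>>=';
const startTagName = info.indexOf('<<') + 2; // 7, first char of 'my fragment'
const endTagName = info.indexOf('>>') - 1;   // 17, last char of 'my fragment'
const tag = info.substring(startTagName, endTagName + 1);
console.log(startTagName, endTagName, tag); // 7 17 my fragment
```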
Fragment usage range

This method gives a Range for the given tag name based on the passed in Token. The line number for the occurrence is computed, along with the begin and end positions within that line.

<<diagnostic updating>>=+
function fragmentUsageRange(token : Token, tagName : string) : vscode.Range
{
  let startLineNumber = locationOfFragment(token);
  const lines = token.content.split('\n');
  let index : number = 0;
  for(const line of lines)
  {
    startLineNumber++;
    index = line.indexOf(tagName);
    if(index > -1)
    {
      break;
    }
  }
  let start = new vscode.Position(startLineNumber, index - 2);
  let end = new vscode.Position(startLineNumber, index + tagName.length + 2);
  return new vscode.Range(start, end);
}

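The scan can be exercised with a mocked token. Because startLineNumber is incremented before each check, the first content line after the fence opening corresponds to map[0] + 1. A self-contained sketch with a fake token, mirroring the loop above:

```typescript
// Fake token: map[0] is the line of the fence opening in the document.
const token = {
  map: [10, 15],
  content: 'let x = 1;\n<<helper fragment>>\nlet y = 2;\n',
};
const tagName = 'helper fragment';

let startLineNumber = token.map[0]; // what locationOfFragment(token) returns
let index = 0;
for (const line of token.content.split('\n')) {
  startLineNumber++;
  index = line.indexOf(tagName);
  if (index > -1) { break; }
}
// The usage sits on document line 12; columns 0..19 span '<<helper fragment>>'.
console.log(startLineNumber, index - 2, index + tagName.length + 2); // 12 0 19
```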
Utility functions

Retrieving literate file uri list

Function to get all literate files in a given workspace. We need to ensure that we return the uris in the correct order. The order is defined by an index.literate file if one exists; otherwise we use the order in which findFiles finds them. If an index.literate exists we also harvest its SETTINGS fence, if there is one.

<<utility functions>>=
async function getLiterateFileUris(
  workspaceFolder : vscode.WorkspaceFolder
) : Promise<vscode.Uri[]>
{
  const literateFilesInWorkspace : vscode.RelativePattern =
        new vscode.RelativePattern(workspaceFolder, '**/*.literate');
  const _foundLiterateFiles = await vscode.workspace
        .findFiles(literateFilesInWorkspace);
  let foundLiterateFiles = new Array<vscode.Uri>();
  <<see if an index.literate exists>>
  <<search index for html links>>
  <<sort uris based on html link order>>
  <<get SETTINGS from index.literate>>
  return foundLiterateFiles;
}

TBD: create instead a markup that allows us to express the literate file order in whichever file we want.

If we don't find an index.literate file return the found literate files as is.

<<see if an index.literate exists>>=
const index = _foundLiterateFiles.find(uri => uri.path.endsWith('index.literate'));
if(!index)
{
  return _foundLiterateFiles;
}

We now parse the index file to get the state with markdown tokens. We don't need the rendered HTML of the index document, so we discard that.

<<search index for html links>>=
const md = createMarkdownItParserForLiterate();
const text = await getFileContent(index);
const env: GrabbedState = { literateFileName: 'index.literate', literateUri: index, gstate: new StateCore('', md, {}) };
const _ = md.render(text, env);

We create a new list of strings to which we will add the file uris in the order found in the parsed tokens. We do that by looking for bullet_list_open and bullet_list_close tokens. We assume that inside these lists there will be items that contain links. Once we've encountered a bullet_list_open we start looking for list_item_open. When that token is found we get the inline token that is two tokens after the list_item_open token. We make sure that it has child tokens, and that the first child token has a valid attrs property. There we find the uri to the HTML version of a literate file.

TBD add support for ordered list.

<<sort uris based on html link order>>=
let links = new Array<string>();
let bulletListOpen = false;
let idx = 0;
for(let token of env.gstate.tokens)
{
  if(token.type==='bullet_list_open')
  {
    bulletListOpen = true;
  }
  if(token.type==='bullet_list_close')
  {
    bulletListOpen = false;
  }
  if(bulletListOpen && token.type==='list_item_open')
  {
    let inline = env.gstate.tokens[idx+2];
    if(inline.children && inline.children[0].attrs)
    {
      try {
        const currentUri = inline.children[0].attrs[0][1];
        let path = currentUri.replace("html", "literate");
        const foundUri = _foundLiterateFiles.find(uri => uri.path.endsWith(path));
        if(foundUri)
        {
          foundLiterateFiles.push(foundUri);
        }
      } catch(_) {}
    }
  }
  idx++;
}

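Note that the html-to-literate mapping above uses String.replace with a plain string, which substitutes only the first occurrence of "html" anywhere in the path rather than specifically the extension; a directory name containing "html" would be rewritten instead. A quick check of that behaviour:

```typescript
const simple = 'chapter1.html'.replace('html', 'literate');
const tricky = 'html/chapter1.html'.replace('html', 'literate');
console.log(simple); // chapter1.literate
console.log(tricky); // literate/chapter1.html, the directory got rewritten
```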
Then we ensure that the index file itself is present in the list to return.

<<sort uris based on html link order>>=+
const finalCheck = foundLiterateFiles.find(uri => uri.path.endsWith('index.literate'));
if(!finalCheck)
{
  foundLiterateFiles.splice(0, 0, index);
}

Finally we harvest code fences that have SETTINGS in their info string. We split the content of the fence into lines and loop over them.

If a trimmed line starts with template we harvest the filename given as its value. As an additional check we ensure the file exists before assigning its URI to htmlTemplateFile.

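A minimal sketch of such a line-based settings parser follows; the key : value line format and the exact keys are assumptions made here for illustration, not the extension's verified syntax:

```typescript
// Hypothetical SETTINGS fence content from index.literate.
const settingsContent = 'template : mytemplate.html\nauthors : Jane Doe; John Doe\n';

let templateFileName: string | undefined;
let authorsSetting = '';
for (const rawLine of settingsContent.split('\n')) {
  const line = rawLine.trim();
  if (line.startsWith('template')) {
    // Everything after the first colon is the value.
    templateFileName = line.substring(line.indexOf(':') + 1).trim();
  } else if (line.startsWith('authors')) {
    authorsSetting = line.substring(line.indexOf(':') + 1).trim();
  }
}
console.log(templateFileName, authorsSetting);
```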
Write out HTML

Function to write out the given rendered content out to a file. The rendered string will be set into a HTML body. The HTML template will be read from the file at the URI specified by htmlTemplateFile if it is set, otherwise use a hard-coded piece of HTML template.

For the template to work it needs to have the string [CONTENT] where the rendered Markdown HTML is going to be substituted.

The default HTML template imports mermaid.js as a module, so that it can work on any pre tag that has the CSS class mermaid set.

If authors contains a non-empty string, it will replace the [AUTHORS] tag if that tag exists in the template. If authors is empty, [AUTHORS] is replaced with the empty string. The authors string is split on semicolons, and for each resulting author a meta author tag is added.

Regarding line endings we do the same as for source files: on Windows we replace bare LF instances with CRLF, and on non-Windows machines we do the reverse, replacing CRLF instances with LF.

<<utility functions>>=+
async function writeOutHtml
      (fname : string,
       folderUri : vscode.Uri,
       rendered : string) : Promise<void>
{
  let html = '';
  const getContent = async () => {
    let _html = '';
    if(htmlTemplateFile) {
      _html = await getFileContent(htmlTemplateFile);
    } else {
      _html =
`<!DOCTYPE html>
<html>
<head>
  <meta name="description" content="A Literate Program written with the Literate Programming vscode extension by Nathan 'jesterKing' Letwory and contributors" />
  <meta property="og:description" content="A Literate Program written with the Literate Programming vscode extension by Nathan 'jesterKing' Letwory and contributors" />
  <link rel="stylesheet" type="text/css" href="./style.css">
  [AUTHORS]
</head>
<body>
[CONTENT]
<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
</script>
</body>
</html>`;
    }
    return _html;
  };
  html = await getContent();

  let authorlist = authors.split(";");
  let metaAuthors = '';
  for(let author of authorlist) {
    metaAuthors += `<meta name="author" content="${author}">`;
  }

  html = html
    .replace("[CONTENT]", rendered)
    .replace("[AUTHORS]", metaAuthors);

  if(os.platform()==='win32'){
    const lf2crlf = /([^\r])\n/g;
    html = html.replaceAll(lf2crlf, '$1\r\n');
  } else {
    const crlf2lf = /\r\n/g;
    html = html.replaceAll(crlf2lf, '\n');
  }
  const encoded = Buffer.from(html, 'utf-8');
  fname = fname.replace(".literate", ".html");
  const fileUri = vscode.Uri.joinPath(folderUri, fname);
  return Promise.resolve(vscode.workspace.fs.writeFile(fileUri, encoded));
};

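The two line-ending regexes can be checked in isolation: lf2crlf only rewrites LF characters not already preceded by CR, so existing CRLF pairs are not doubled up. A standalone check, equivalent to the replaceAll calls above since the regexes carry the global flag:

```typescript
const lf2crlf = /([^\r])\n/g;
const crlf2lf = /\r\n/g;

// Mixed input: one bare LF, one CRLF pair.
const mixed = 'one\ntwo\r\nthree';
const windowsStyle = mixed.replace(lf2crlf, '$1\r\n');
const unixStyle = mixed.replace(crlf2lf, '\n');
console.log(JSON.stringify(windowsStyle)); // "one\r\ntwo\r\nthree"
console.log(JSON.stringify(unixStyle));    // "one\ntwo\nthree"
```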
Get file content for uri

For each literate file in the workspace we will eventually get the text content, but we first have to check whether any of the files are open in an editor. Especially for on-the-fly updating of the tree view, but also for fragment name completion and similar functionality, we need to get the text from the TextDocument instead of the file on disk. If there is a TextDocument that corresponds to the literate file we are currently handling we read its text into currentContent, otherwise we set it to an empty string.

<<utility functions>>=+
async function getFileContent(
  file : vscode.Uri
) : Promise<string>
{
  const currentContent = (() =>
      {
          for(const textDocument of vscode.workspace.textDocuments) {
              if(vscode.workspace.asRelativePath(file) === vscode.workspace.asRelativePath(textDocument.uri)) {
                  return textDocument.getText();
              }
          }
          return '';
      }
  )();

If currentContent is an empty string we read the content from the file on disk and decode it into text. If on the other hand we do have currentContent, we use that as our text instead. The currentContent will be more up-to-date than what we have on disk.

<<utility functions>>=+
  const content = currentContent ? null : await vscode.workspace.fs.readFile(file);
  const text = currentContent ? currentContent : new TextDecoder('utf-8').decode(content);
  return text;
}

The extension

Our Visual Studio Code entry file is extension.ts. While developing the extension, the JavaScript file generated from it, out/extension.js, is set as the entry point in package.json. But when the extension is prepared for release on the Visual Studio Code marketplace this needs to be changed to the minified and bundled version that is emitted as out/main.js. Together with a properly set up .vscodeignore this ensures that the published package stays small: without it the package is easily over 2MB in size, but properly configured it is under 400KB.

The extension main entry lies in the activation of the extension, as given by <<activate the extension>>, but before we get there we need to set up several bits and pieces that are required for the proper functioning of the tools.

First of all we import all the functionality and modules we are going to need.

<<literate.*>>= ./src/extension.ts $


<<import necessary modules for literate>>

After the imports we introduce oldFence, where we keep hold of the fence rule from the default MarkdownIt parser. I was not entirely sure how best to tackle this, so for now it lives here.

Here we also find htmlTemplateFile and authors, variables that can be set in a literate program through a code fence in index.literate with the token info containing the string SETTINGS.

<<literate.*>>=+
let oldFence : Renderer.RenderRule | undefined;
const FENCE = '```';
const OPENING = '<<';
const CLOSING = '>>';

let htmlTemplateFile : vscode.Uri | undefined = undefined;
let authors = '';

With that out of the way we introduce the interfaces we use in the Literate Programming extension.

<<literate.*>>=+
<<introduce interfaces>>

Next we set up the fragment regular expressions and define everything needed to implement the fragment explorer. This explorer will show up in the Explorer bar when a literate project is open. We need a representation for a node in the tree view, a data provider for the tree view and then the actual tree view explorer itself.

<<literate.*>>=+
<<fragment regular expressions>>

<<fragment node>>

<<fragment tree provider>>

<<fragment explorer>>

<<fragment hover provider>>

For our extension we need to override the code fence rule since we want to augment the rendering of the code fences. Specifically we want to add the fragment line prior to the code block. This is explained in the section on <<renderCodeFence rule>>.

Also we have a way to create a MarkdownIt parser the way we need it. It is explained in more detail in the section on <<create markdownit parser>>.

<<literate.*>>=+
<<renderCodeFence rule>>

<<create markdownit parser>>

The central mechanism of the Literate Programming extension, the tools it provides, is expressed in <<render and collect state>>, <<handle fragments>> and <<method to write out source files>>. Together these ensure that all literate files can be iterated, parsed and rendered, and that from the parsed state all code fragments can be collected and extrapolated into the source file or files described by the literate program.

<<literate.*>>=+
<<render and collect state>>
<<handle fragments>>
<<method to write out source files>>
<<fragment repository>>
<<rename provider class>>
<<code action provider class>>
<<definition provider class>>
<<reference provider class>>

Utility function to determine the workspace folder for a TextDocument

<<literate.*>>=+
function determineWorkspaceFolder(document : vscode.TextDocument) : vscode.WorkspaceFolder | undefined
{
  if(!vscode.workspace.workspaceFolders || vscode.workspace.workspaceFolders.length === 0)
  {
    return undefined;
  }
  for(const ws of vscode.workspace.workspaceFolders)
  {
    const relativePath = path.relative(ws.uri.toString(), document.uri.toString());
    if(!relativePath.startsWith('..'))
    {
      return ws;
    }
  }
  return undefined;
}

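The containment test above relies on path.relative yielding a path that starts with '..' exactly when the document lies outside the workspace folder. This can be illustrated with path.posix for platform-independence; the extension applies the same idea to uri.toString() values:

```typescript
import * as path from 'path';

const ws = '/home/user/project';
const inside = path.posix.relative(ws, '/home/user/project/docs/index.literate');
const outside = path.posix.relative(ws, '/home/user/other/doc.literate');

console.log(inside);  // docs/index.literate
console.log(outside); // ../other/doc.literate
console.log(inside.startsWith('..'), outside.startsWith('..')); // false true
```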
Although the fragments mentioned above are the soul of the extension, they are not of much use without proper activation. With this activate implementation all providers and commands are registered with Visual Studio Code.

<<literate.*>>=+
<<activate the extension>>
<<diagnostic updating>>
<<utility functions>>
<<create fragment for tag>>
<<split fragment>>

There is nothing currently needed for deactivation of the extension, so there is just an empty-bodied implementation for it.

<<literate.*>>=+
export function deactivate() {}

The imports

<<import necessary modules for literate>>=
import { TextDecoder } from 'util';
import * as vscode from 'vscode';
import * as path from 'path';
import * as os from 'os';

import StateCore = require('markdown-it/lib/rules_core/state_core');
import Token = require('markdown-it/lib/token');
import MarkdownIt = require("markdown-it");
import Renderer = require('markdown-it/lib/renderer');

const hljs = require('highlight.js');

import { grabberPlugin } from './grabber';

Interfaces used in Literate Programming

<<introduce interfaces>>=
interface WriteRenderCallback {
  (
    fname : string,
    folderUri : vscode.Uri,
    content : string
  ) : Promise<void>
};
interface WriteSourceCallback {
  (
    workspaceFolder : vscode.WorkspaceFolder,
    fragments : Map<string, FragmentInformation>
  ) : Thenable<void>
};

<<grabbed state type>>
<<fragment information type>>
<<token usage interface>>

Extension activation

The extension activation sets up all our tools and data structures. The activation happens through the activate function. It takes a context to which we push all our disposables for proper cleanup when our extension gets deactivated. Note that our activate implementation is marked async, because we want to await where necessary.

<<activate the extension>>=
let theOneRepository : FragmentRepository;
export async function activate(context: vscode.ExtensionContext) {

We start the activation by setting up the FragmentRepository. This is the heart of the processing of literate projects. We give it the context so that it can also push disposables to context.subscriptions for proper cleanup.

With the repository set up we will process the entire workspace for literate projects. We want to await here to ensure everything is ready, that is our repository can provide fragments when requested.

With that done we place our repository in the context.subscriptions.

<<activate the extension>>=+
  theOneRepository = new FragmentRepository(context);
  await theOneRepository.processLiterateFiles(undefined);
  context.subscriptions.push(theOneRepository);

Now that the repository is up and running we can register all our commands, views and providers. Note that we currently register against the markdown language ID. In the future we would probably want to make .literate its own language ID to register against.

<<activate the extension>>=+
  <<register literate.process>>
  <<register literate.create_fragment_for_tag>>
  <<register literate.split_fragment>>
  <<register fragment tree view>>
  <<register completion item provider>>
  <<register definiton provider>>
  <<register reference provider>>

  context.subscriptions.push(
    vscode.languages.registerHoverProvider('markdown', new FragmentHoverProvider(theOneRepository))
  );

  context.subscriptions.push(
    vscode.languages.registerRenameProvider('markdown', new LiterateRenameProvider(theOneRepository))
  );

  context.subscriptions.push(
    vscode.languages.registerCodeActionsProvider('markdown', new LiterateCodeActionProvider(theOneRepository))
  );

From our extension activation we return an extension of the built-in MarkdownIt parser. This way the Markdown preview uses a parser configured the same way as our extension, which should result in previews close to what our HTML rendering produces.

<<activate the extension>>=+
  console.log('Ready to do some Literate Programming');

  return {
    extendMarkdownIt(md: any) {
      md.use(grabberPlugin);
      oldFence = md.renderer.rules.fence;
      md.renderer.rules.fence = renderCodeFence;
      return md;
    }
  };
};

Afterword

So you have made it this far, or perhaps you just skipped over a lot of text. If you actually read everything up to this point, you have also read all of the code for the entire Literate Programming extension. I appreciate that you took the time to read this document. I hope it helped you get more interested in the literate programming paradigm.

I invite you to install the Literate Programming extension for Visual Studio Code and start using it in your daily work.