Literate Programming

Literate programming is a programming paradigm introduced by Donald Knuth. A program is written in a natural language with snippets of code interspersed. From this text usable source code is generated, along with a well-formatted, human-readable document.

The most important influence for this literate programming extension is the PBR Book.

This extension provides a set of tools that help the programmer write literate programs. Through automation the process of writing literate programs should be as painless as possible. The programmer writes their literate programs using Markdown. When a literate programmer needs a snippet they can add a code fence. In this extension snippets are called code fragments. Building on the framework provided by Visual Studio Code this extension introduces code completion, code actions, a definition provider, hover tooltips, and a fragment explorer.

The approach for this extension is based on Markdown documents. For this extension the Markdown specification is only slightly adapted to make supporting literate programming easy. The code fragments are expressed in code fences as per the Markdown specification, with either surrounding triple backticks or triple tildes. Along with the programming language identifier, the opening line has been extended to contain the fragment name and type, essentially as options to the code fence. The opening line will thus look like py : <<fragment name>>= to create a new fragment, or like py : <<fragment name>>=+ to amend an existing fragment.
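As a sketch of what this looks like in practice (the fragment name and code here are invented for illustration), a literate document could contain:

```markdown
Some prose introducing the fragment, then a code fence creating it:

~~~py : <<initialize counters>>=
count = 0
total = 0
~~~

Further down the document, the same fragment can be amended:

~~~py : <<initialize counters>>=+
errors = 0
~~~
```

The amending fence appends its lines to the fragment created earlier.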

The order of declaration of code fragments does not matter: a fragment may reference other fragments before they have been defined. The only requirement is that every code fragment referenced by another code fragment is eventually defined.

From fragment to source code

Fragments themselves don't directly create source files in most cases, but in the end source files are what is wanted from this extension.

To create actual source files, a fragment creation line needs to be used with a slightly extended form of the creation tag mentioned above. The name has to be suffixed with the string .* between the chevrons. Furthermore, a file name needs to be specified after the equals sign, followed by whitespace and a dollar sign $. The file name is essentially a relative path that is appended to each workspace folder as the root. A top-level fragment looks like py : <<top-level fragment.*>> = ./src/source.py $. The name of this fragment is top-level fragment, and once it has been fully extrapolated it will be written to the file source.py under src in the workspace folder.
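For illustration, a hypothetical top-level fragment referencing other (not yet defined) fragments could look like:

```markdown
~~~py : <<application entry point.*>> = ./src/main.py $
<<import section>>
<<main function>>
~~~
```

Here ./src/main.py is written under the workspace folder once <<import section>> and <<main function>> have been defined elsewhere and the whole fragment extrapolated.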

The Literate Programming extension allows the program author to write multiple projects in the same Visual Studio Code workspace. Each workspace folder is the root for its own literate project. Within each project there can be one or more literate files. These files carry the extension .literate. One literate file can contain zero or more code fragments. A literate file can also contain more than one top level fragment. In other words an author can create multiple source files within just one literate document.

This text describes the Literate Programming extension as a literate program.

Fragment Model

The tools provided by the Literate Programming extension are built around a single repository that provides all necessary information about the fragments of every project in the workspace.

The fragment repository handles parsing of literate documents, reacting to changes made by users. The repository provides all fragments found in the projects added to the current workspace. Additionally the repository will write out source files and rendered HTML files.

The fragment model is defined in the FragmentRepository class, which will be described in detail after introducing a couple of helper classes.

FragmentMap class

The FragmentMap class holds a map from strings, which are the fragment names, to their associated FragmentInformation instances. This map is available through the map property. The class also provides a clear method and a dispose method.

<<fragment map>>=
class FragmentMap {
  map : Map<string, FragmentInformation>;

  constructor()
  {
    this.map = new Map<string, FragmentInformation>();
  }

  clear()
  {
    this.map.clear();
  }

  dispose()
  {
    this.map.clear();
  }
};

List of GrabbedState

The class GrabbedStateList holds an array of GrabbedState accessible through the list property. The class provides clear and dispose methods.

GrabbedState is collected from the MarkdownIt parser. It contains tokens and related information generated by the parser.

<<list of grabbed states>>=
class GrabbedStateList {
  list : Array<GrabbedState>;

  constructor()
  {
    this.list = new Array<GrabbedState>();
  }

  clear()
  {
    this.list = new Array<GrabbedState>();
  }

  dispose()
  {
    while(this.list.length>0)
    {
      this.list.pop();
    }
  }
};

The FragmentRepository class

The FragmentRepository uses several helper classes: the FragmentMap and GrabbedStateList classes introduced just above, and the FragmentLocation class defined later in this document. Their fragments are included right before the repository class definition.

<<fragment repository>>=
<<fragment map>>
<<list of grabbed states>>
<<fragment tag location>>

export class FragmentRepository {
  <<fragment repository member variables>>
  <<fragment repository constructor>>
  <<fragment generation method>>

  <<method to get fragments from repository>>

  <<method to get fragment on line for position>>
  <<method to get token at position>>
  <<method to get state for workspace>>
  <<method to get state for document>>
  <<method to get all reference locations>>

  dispose() {
    for(let fragmentMap of this.fragmentsForWorkspaceFolders.values())
    {
      fragmentMap.dispose();
    }
    this.fragmentsForWorkspaceFolders.clear();

    for(let grabbedState of this.grabbedStateForWorkspaceFolders.values())
    {
      grabbedState.dispose();
    }
    this.grabbedStateForWorkspaceFolders.clear();
  }
}

Member variables

Our FragmentRepository needs a couple of member variables to function properly. We'll need an instance of a properly configured MarkdownIt parser.

<<fragment repository member variables>>=
private md : MarkdownIt;

The MarkdownIt parser will handle the actual tokenizing and parsing of the literate files.

Since we work with a multi-root workspace we'll create a map of maps. The keys of this top-level map are the workspace folder names; the values are the FragmentMaps for the corresponding folders.

<<fragment repository member variables>>=+
readonly fragmentsForWorkspaceFolders : Map<string, FragmentMap>;

For our parsing functionality we need an Array<GrabbedState>, which we have encapsulated in the class GrabbedStateList; the array is available through its list property. Each GrabbedStateList is stored in a map keyed by workspace folder name.

<<fragment repository member variables>>=+
readonly grabbedStateForWorkspaceFolders : Map<string, GrabbedStateList>;

Finally we need a DiagnosticCollection to be able to keep track of detected problems in literate projects. TBD: this probably needs to be changed into a map of DiagnosticCollection, again with the workspace folder names as keys.

<<fragment repository member variables>>=+
readonly diagnostics : vscode.DiagnosticCollection;

Constructor

The constructor takes an extension context to register any disposables there. We'll be registering to text document changes, and to workspace changes. In both cases we want to process literate files to regenerate fragments, source files and HTML files.

<<fragment repository constructor>>=
constructor(
  context : vscode.ExtensionContext
)
{
  <<initializing the fragment repository members>>

  <<subscribe to text document changes>>
  <<subscribe to workspace changes>>
}

Initializing members

First we make sure we have an instance of the MarkdownIt parser that is set up for our literate files processing.

<<initializing the fragment repository members>>=
this.md = createMarkdownItParserForLiterate();

Then we'll make sure the maps for tracking fragment maps and grabbed states are created, and finally we push our diagnostics collection to the extension's subscriptions.

<<initializing the fragment repository members>>=+
this.fragmentsForWorkspaceFolders = new Map<string, FragmentMap>();
this.grabbedStateForWorkspaceFolders = new Map<string, GrabbedStateList>();
this.diagnostics = vscode.languages.createDiagnosticCollection('literate');
context.subscriptions.push(this.diagnostics);

Subscribing to text document changes

The repository subscribes to the onDidChangeTextDocument event on the workspace. It could process literate files on each change, but the completion item provider needs to trigger the processing of literate files itself. Since the completion item provider gets called on typing an opening chevron (<), we skip triggering the processing here when such a character has been typed.

<<subscribe to text document changes>>=
context.subscriptions.push(
  vscode.workspace.onDidChangeTextDocument(
    async (e : vscode.TextDocumentChangeEvent) =>
    {
      if(!(e.contentChanges.length>0 && e.contentChanges[0].text.startsWith('<')))
      {
        await this.processLiterateFiles(e.document);
      }
    }
  )
);

Subscribing to workspace changes

Triggering the processing of literate documents is necessary when new workspace folders have been added. Additionally, we need to clean up the fragment maps and grabbed states of workspace folders that have been removed from the workspace.

<<subscribe to workspace changes>>=
context.subscriptions.push(
  vscode.workspace.onDidChangeWorkspaceFolders(
    async (e : vscode.WorkspaceFoldersChangeEvent) =>
    {
      for(const addedWorkspaceFolder of e.added) {
        await this.processLiterateFiles(addedWorkspaceFolder);
      }
      for(const removedWorkspaceFolder of e.removed)
      {
        this.fragmentsForWorkspaceFolders.delete(removedWorkspaceFolder.name);
        this.grabbedStateForWorkspaceFolders.delete(removedWorkspaceFolder.name);
      }
    }
  )
);

Processing literate files

The parsing and setting up of the fragments map is handled with the method processLiterateFiles. Additionally the method will write out all specified source files.

Processing of the literate files generally starts in one of three cases: 1) a change in the workspace due to the addition or removal of a workspace folder, 2) a change to a literate document, or 3) triggering of the literate.process command.

<<fragment generation method>>=
async processLiterateFiles(
  trigger :
    vscode.WorkspaceFolder
    | vscode.TextDocument
    | undefined) {
      <<set up workspace folder array>>
      <<iterate over workspace folders and parse>>
}

First we determine the workspace folder or folders to process. When trigger is a workspace folder or a text document, we use the given workspace folder or determine the one to which the text document belongs; in these cases we end up with an array containing just that one workspace folder. When trigger is undefined we use all workspace folders registered in this workspace.

<<set up workspace folder array>>=
const workspaceFolders : Array<vscode.WorkspaceFolder> | undefined = (() => {
  if(trigger)
  {
    <<get workspace if text document>>
    <<else just use passed in workspace>>
  }
  if(vscode.workspace.workspaceFolders && vscode.workspace.workspaceFolders.length>0) {
    let folders = new Array<vscode.WorkspaceFolder>();
    for(const ws of vscode.workspace.workspaceFolders)
    {
      folders.push(ws);
    }
    return folders;
  }
  return undefined;
}
)();

We can check whether our trigger is a TextDocument by checking whether eol is a property. If the eol property exists we are dealing with a TextDocument; if it doesn't, we are dealing with a WorkspaceFolder.

<<get workspace if text document>>=
if("eol" in trigger) {
  const ws = determineWorkspaceFolder(trigger);
  if(ws)
  {
    return [ws];
  }
}

Again, when the property eol is not found in the object we were passed, we can assume it is a workspace folder, so we return it as the one element of the array.

<<else just use passed in workspace>>=
else
{
  return [trigger];
}

With the list of workspace folders set up we can iterate over each folder and then handle literate files in that workspace folder.

<<iterate over workspace folders and parse>>=
if(workspaceFolders) {
  for(const folder of workspaceFolders)
  {
    <<set up fragments and grabbedStateList>>
    if(fragments && grabbedStateList) {
      <<clear FragmentMap and GrabbedStateList>>
      <<iterate over all files, write out html>>
      <<hanle fragments for map>>
      <<extrapolate fragments and save out>>
    }
  }
}

First we ensure entries for our workspace exist in the maps for FragmentMap and GrabbedStateList.

<<set up fragments and grabbedStateList>>=
    if(!this.fragmentsForWorkspaceFolders.has(folder.name))
    {
      this.fragmentsForWorkspaceFolders.set(folder.name, new FragmentMap());
    }
    if(!this.grabbedStateForWorkspaceFolders.has(folder.name))
    {
      this.grabbedStateForWorkspaceFolders.set(folder.name, new GrabbedStateList());
    }

Next we can get the FragmentMap and GrabbedStateList for our workspace folder. These we'll fill up with the data of our literate project.

<<set up fragments and grabbedStateList>>=+
    const fragments = this.fragmentsForWorkspaceFolders.get(folder.name);
    const grabbedStateList = this.grabbedStateForWorkspaceFolders.get(folder.name);

Each time we process a literate project we clear out the fragments and state so that we don't end up with stray elements.

<<clear FragmentMap and GrabbedStateList>>=
fragments.clear();
grabbedStateList.clear();

Our first pass iterates over all the literate files in our folder, parsing them as we go. Each parsed file is rendered as HTML and saved to disk. The parser state with all the tokens is stored in grabbedStateList.list. We need to await this async function, otherwise our state will be incomplete. The full state is needed for the next two steps.

<<iterate over all files, write out html>>=
await iterateLiterateFiles(folder,
                           writeOutHtml,
                           grabbedStateList.list,
                           this.md);

With the state complete, and our HTML files saved out, we are going to make two passes over the state. Let's do the first step here: we clear the diagnostics and then await handleFragments. We call the function so that fragments are not extrapolated and no source files are saved. We await its completion, otherwise our fragment map would be incomplete, or even missing entirely later on.

<<hanle fragments for map>>=
this.diagnostics.clear();
fragments.map = await handleFragments(folder,
                                      grabbedStateList.list,
                                      this.diagnostics,
                                      false,
                                      undefined);

In the second step we call the fragment handler again, but this time we do want the fragments to be completely extrapolated and the final source files written to disk. Before the call we again clear the DiagnosticCollection so that we get the correct diagnostics in case of errors in literate files.

Again we wait for the results, just to ensure it all completes before we go on.

<<extrapolate fragments and save out>>=
this.diagnostics.clear();
await handleFragments(folder,
                      grabbedStateList.list,
                      this.diagnostics,
                      true,
                      writeSourceFiles);

Fetching fragments for workspace folder

When we call getFragments we assume the literate projects have all been processed properly. In most cases processing is triggered automatically, but it may be necessary to trigger it manually before calling getFragments. When the projects have been properly processed, this function returns the FragmentMap for the given workspace folder.

<<method to get fragments from repository>>=
getFragments(workspaceFolder : vscode.WorkspaceFolder) : FragmentMap
{
  let fragmentMap : FragmentMap = new FragmentMap();
  this.fragmentsForWorkspaceFolders.forEach(
    (value, key, _) =>
    {
      if(key === workspaceFolder.name)
      {
        fragmentMap = value;
      }
    }
  );

  return fragmentMap;
}

Getting fragment on line for position

This method checks to see if for the given text line and position a fragment usage or mention can be found.

First we match the current line against FRAGMENT_USE_IN_CODE_RE. We then check which of the matches is at the given position, by searching for the index of the matched tag name, including the double chevron bracketing.

The range for the FragmentLocation will be created from the found index and run the length of the tag name including the double enclosing chevrons. An attempt to find the corresponding fragment is made, but if no such fragment exists the FragmentLocation will be created with the fragment set to undefined. The root and add parts are also given to the fragment location, even if they were not matched. This information can be used elsewhere to determine what kind of fragment was found at the given position.

<<method to get fragment on line for position>>=
getFragmentTagLocation(
  document : vscode.TextDocument,
  currentLine : vscode.TextLine,
  position : vscode.Position
) : FragmentLocation
{
  const workspaceFolder : vscode.WorkspaceFolder | undefined = determineWorkspaceFolder(document);
  const matchesOnLine = [...currentLine.text.matchAll(FRAGMENT_USE_IN_CODE_RE)];
  for(const match of matchesOnLine)
  {
    if(!match || !match.groups) {
      continue;
    }
    const tagName = `${OPENING}${match.groups.tagName}${CLOSING}`;
    const foundIndex = currentLine.text.indexOf(tagName);
    if(foundIndex>-1) {
      if(foundIndex <= position.character && position.character <= foundIndex + tagName.length)
      {
        const startPosition = new vscode.Position(currentLine.lineNumber, foundIndex);
        const endPosition = new vscode.Position(currentLine.lineNumber, foundIndex + tagName.length);
        let range : vscode.Range = new vscode.Range(startPosition, endPosition);
        let fragment : FragmentInformation | undefined;
        if(workspaceFolder) {
          const fragments = theOneRepository.getFragments(workspaceFolder).map;
          fragment = fragments.get(match.groups.tagName) || undefined;
        }
        return new FragmentLocation(match.groups.tagName, document.uri, range, fragment, match.groups.root, match.groups.add);
      }
    }
  }

  return unsetFragmentLocation;
}

The FragmentLocation

A fragment location encodes the occurrence of what could be a fragment that already exists or one that still needs to be defined. The class holds the name of the fragment, the range of this string in the resource specified by the uri, and whether a FragmentInformation was found or not.

The properties root and add can be used to determine what type of fragment is at the given range.

<<fragment tag location>>=
export class FragmentLocation
{
  readonly rangeExclusive : vscode.Range;
  readonly valid : boolean;

  constructor(
    public readonly name : string,
    public readonly uri: vscode.Uri,
    public readonly range : vscode.Range,
    public readonly fragment : FragmentInformation | undefined,
    public readonly root : string | undefined,
    public readonly add : string | undefined
  )
  {
    this.valid = uri.fsPath.indexOf('not_valid_for_literate')===-1;
    if(name.startsWith(OPENING)) {
      this.rangeExclusive = new vscode.Range(
        range.start.line, range.start.character + 2,
        range.end.line, range.end.character - 2
      );
    }
    else
    {
      this.rangeExclusive = range;
    }
  }
}
const unsetFragmentLocation =
    new FragmentLocation(
      '',
      vscode.Uri.file('not_valid_for_literate'),
      new vscode.Range(0,0,0,0),
      undefined,
      undefined,
      undefined
    );

Get fragment usage token

This method takes a text document and a range, based on which the token containing the range is returned. If no token is found, or the workspace folder is not available, the emptyToken constant is returned.

<<method to get token at position>>=
getTokenAtPosition(
  document : vscode.TextDocument,
  range : vscode.Range
) : TokenUsage
{

Determine the workspace folder for the given text document. As mentioned above, if no workspace folder is found the emptyToken is returned.

<<method to get token at position>>=+
  const workspaceFolder : vscode.WorkspaceFolder | undefined = determineWorkspaceFolder(document);
  if(!workspaceFolder)
  {
    return emptyToken;
  }

Next we can retrieve the state for the document.

<<method to get token at position>>=+
  const state = this.getDocumentState(document);

We can iterate over all the tokens in the grabbed state of the document. We're only interested in tokens that have a valid map property, since we need to check the range asked for.

<<method to get token at position>>=+
  for(const token of state.gstate.tokens)
  {
    if(token.map) {

If the range given is contained within the token map we create a new TokenUsage and return that. This concludes the search for the token containing the range we are interested in.

<<method to get token at position>>=+
      const tokenRange = new vscode.Range(token.map[0], 0, token.map[1], 1024);
      if(tokenRange.contains(range))
      {
        let tokenUsage : TokenUsage = {
          token : token,
        };
        return tokenUsage;
      }
    }
  }

If no hit was found return the emptyToken.

<<method to get token at position>>=+
  return emptyToken;
}

The TokenUsage interface helps determine whether we have a token or not. TBD: we can probably get rid of this interface and just use a Token directly.

<<token usage interface>>=
interface TokenUsage
{
  token : Token | undefined,
}

const emptyToken : TokenUsage =
{
  token : undefined,
};

Get list of grabbed states for a workspace

<<method to get state for workspace>>=
getWorkspaceState(workspaceFolder : vscode.WorkspaceFolder) : GrabbedStateList
{
  let grabbedState : GrabbedStateList = new GrabbedStateList();
  this.grabbedStateForWorkspaceFolders.forEach(
    (value, key, _) =>
    {
      if(key === workspaceFolder.name)
      {
        grabbedState = value;
      }
    }
  );

  return grabbedState;
}

Get the grabbed state of a document

<<method to get state for document>>=
getDocumentState(document: vscode.TextDocument) : GrabbedState
{
  let grabbedState : GrabbedState = emptyState;
  const ws = determineWorkspaceFolder(document);
  if(ws) {
    const workspaceState = this.getWorkspaceState(ws);
    for(const state of workspaceState.list)
    {
      if(document.uri.path === state.literateUri.path)
      {
        grabbedState = state;
      }
    }
  }

  return grabbedState;
}

Get all reference locations

Finding all references for a fragment, that is, fragment usage or fragment mention in a literate project, requires going over all tokens of a workspace. For each reference a vscode.Location is returned.

The getReferenceLocations method takes a workspace folder and a fragment name, and will return an array of vscode.Location.

<<method to get all reference locations>>=
getReferenceLocations(
  workspaceFolder : vscode.WorkspaceFolder,
  fragmentName : string
) : vscode.Location[]
{

We start with an empty list of locations, which we fill with each reference hit found in the given literate project. For the workspace folder we get the latest grabbed state.

We then proceed to iterate through all grabbed states. Remember that each grabbed state corresponds to one literate document. From each grabbed state we iterate over its tokens, and we are interested only in tokens that have a valid map property.

<<method to get all reference locations>>=+
  const fragmentTag = OPENING+fragmentName+CLOSING;
  let locations = new Array<vscode.Location>();
  let grabbedStateList = this.getWorkspaceState(workspaceFolder).list;

  for(const grabbedState of grabbedStateList)
  {
    for(const token of grabbedState.gstate.tokens)
    {
      if(token.map)
      {

When we have a token that could contain a reference we check whether there is any occurrence of the fragment tag in its content; if there is none, the token holds no reference.

<<method to get all reference locations>>=+
        if(token.content.indexOf(fragmentTag) > -1)
        {

With a hit in the entire content of the token we need to figure out each reference. We do this by splitting the token content into lines and then, for each line, looking at each hit.

If our token is a fence we initialize idx to 1, because the token's map starts at the opening fence line while the content begins one line below it; otherwise we initialize idx to 0.

<<method to get all reference locations>>=+
          const lines = token.content.split("\n");
          let idx = token.type === 'fence' ? 1 : 0;
          for(const line of lines) {
            let offset = line.indexOf(fragmentTag);
            while(offset>-1) {

When offset is larger than -1 we know we have a hit, so we can create a new range using the token.map[0] and the idx. The range will include the entire fragment tag, with the opening and closing double chevrons.

<<method to get all reference locations>>=+
              let range = new vscode.Range(
                token.map[0] + idx,
                offset,
                token.map[0] + idx,
                offset + fragmentTag.length
              );

The location is then created with the uri of the literate file that contains this token and the range we just set up.

<<method to get all reference locations>>=+
              let location = new vscode.Location(grabbedState.literateUri, range);
              locations.push(location);

We check for the next occurrence of the fragment tag by searching past the current offset. That way we ensure we get all the references even if there are multiple on one line.

<<method to get all reference locations>>=+
              offset = line.indexOf(fragmentTag, offset + 5);
            }

Update idx for each pass while going through the lines array.

<<method to get all reference locations>>=+
            idx++;
          }
        }
      }
    }
  }

Return the locations array. If there were hits the locations array will have entries, if there were no hits the array will be empty.

<<method to get all reference locations>>=+
  return locations;
}

Iterating all literate files

As mentioned in the introduction, the main idea of the extension is to collect all fragments created in all .literate files. Once all fragments have been collected they are extrapolated until the top fragments are the full source files. Fully extrapolated top fragments are written to the source files specified for them.
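To sketch what extrapolation means (all names and code here are hypothetical), consider a top-level fragment referencing a second fragment:

```markdown
~~~py : <<greeter.*>> = ./src/greet.py $
def greet():
    <<print the greeting>>
~~~

~~~py : <<print the greeting>>=
print("hello")
~~~
```

Extrapolation would produce ./src/greet.py with the reference replaced by the print statement, indented to match the reference line.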

The first step is to put each .literate file through the MarkdownIt renderer. Each render is given a special environment that is used to collect the state of that render. The state contains the document tokenized according to the Markdown specification. The state env is of type GrabbedState. Among the tokens are the code fences that are code fragments. For each .literate file the grabbed state env is saved in the list of GrabbedStates, envList.

<<render and collect state>>=
async function iterateLiterateFiles(workspaceFolder : vscode.WorkspaceFolder,
                                    writeHtml : WriteRenderCallback
                                                | undefined
                                                | null,
                                    envList : Array<GrabbedState>,
                                    md : MarkdownIt)
{
  <<find all literate files in workspace>>
  try {
    for (let fl of foundLiterateFiles) {
      <<get text from literate document>>
      <<parse literate file>>
      <<write out rendered file if requested>>
    }
  } catch (error) {
    console.log(error);
  }
}

We ensure that only literate files are parsed for their program fragments. We do that by using a vscode.RelativePattern rooted at the workspace folder passed into iterateLiterateFiles.

<<find all literate files in workspace>>=
const foundLiterateFiles = await getLiterateFileUris(workspaceFolder);

We get the content of our literate file using getFileContent. We do need to await for that so that we actually get the string and not a promise.

<<get text from literate document>>=
const text = await getFileContent(fl);

With the text of our literate document ready, we derive the file path of the document relative to the workspace folder. fname is then set as the literateFileName of our GrabbedState instance, which we push into envList so that we can access it later. Now we finally pass the text of our literate document to the MarkdownIt renderer. Once that is done we have both an HTML representation of our document and the entire parser state in env.

<<parse literate file>>=
const fname = path.relative(workspaceFolder.uri.path, fl.path);
const env: GrabbedState = { literateFileName: fname, literateUri: fl, gstate: new StateCore('', md, {}) };
envList.push(env);
const rendered = md.render(text, env);

If a callback implementing WriteRenderCallback is passed to iterateLiterateFiles we call it with the rendered file content so that it can be saved as an HTML file with the same name as the .literate file being rendered, but with the extension replaced by .html. Conversely, if no callback was passed in, the rendered results are not saved to disk.

<<write out rendered file if requested>>=
if(writeHtml)
{
  await writeHtml(fname, workspaceFolder.uri, rendered);
}

GrabbedState interface

The GrabbedState interface is used to create a type that helps us collect the tokens for each .literate file. Instances of objects with this interface are passed to a MarkdownIt renderer. The renderer has the GrabberPlugin registered, which provides a rule that helps us collect the state of each rendered file. The grabbed state is collected in gstate, which is an instance of StateCore, provided by MarkdownIt.

The interface defines literateFileName, which is the filename of the literate document to which the grabbed state belongs. literateUri is the full uri for this document. Finally gstate holds the StateCore of the parsing result.

<<grabbed state type>>=
interface GrabbedState {
  literateFileName: string;
  literateUri: vscode.Uri;
  gstate: StateCore;
}

We define a GrabbedState that is not valid, the emptyState. This allows us to always return an object instead of undefined in select cases.

<<grabbed state type>>=+
const emptyState : GrabbedState =
{
  literateFileName : '',
  literateUri : vscode.Uri.file('not_valid_for_literate'),
  gstate: new StateCore('', createMarkdownItParserForLiterate(), {})
};

Preparing MarkdownIt

In the iterateLiterateFiles we start by setting up the MarkdownIt parser.

<<set up MarkdownIt>>=
const md : MarkdownIt = createMarkdownItParserForLiterate();

The function createMarkdownItParserForLiterate does this setup so that it is easy to get a new parser to use for different purposes, like parsing documents to get the code fragment names for code completion.

We use the highlight function to ensure our code fragments get syntax highlighting; it simply relies on highlight.js to do the work.

We also tell MarkdownIt to use our grabberPlugin. This plug-in harvests the internal states for each document into instances of GrabbedState. These states we'll later use to get all the different code fragments and to weave them into the code files they describe.

Finally we replace the default fence rule with our own renderCodeFence rule. The intent of that rule will be explained in the section on renderCodeFence.

<<create markdownit parser>>=
function createMarkdownItParserForLiterate() : MarkdownIt
{
  const md : MarkdownIt = new MarkdownIt({
          highlight: function(str: string, lang: string, attrs: string) {
            if(lang && hljs.getLanguage(lang)) {
              return '<pre><code>' +
              hljs.highlight(str, {language : lang}).value +
              '</code></pre>';
            }
            return '<pre title="' + attrs + '">' + md.utils.escapeHtml(str) + '</pre>';
          }

        })
        .use(grabberPlugin);

  oldFence = md.renderer.rules.fence;
  md.renderer.rules.fence = renderCodeFence;
  return md;
}

Fragment structure and regular expressions

Before we dive deeper into the processing of .literate documents it is necessary to have a look at how fragments work.

Fragments in the literate extension have a specific format that requires a bit of explaining.

There are four types of fragment tags, three of which either create or modify a fragment, and one that expresses fragment usage.

For the detection of fragments a couple of regular expressions are used. These are explained in more detail below.

Fragment use in code

Let's start by looking at the form of a fragment tag use.

Fragments can be used in code blocks by writing their tag: double opening and closing chevrons around the fragment name, as in <<fragment name>>. To detect usage of fragments in code we use FRAGMENT_USE_IN_CODE_RE.

<<fragment regular expressions>>=

const FRAGMENT_USE_IN_CODE_RE =
  /(?<indent>[ \t]*)<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?/g;

The regular expression captures four groups. A match gives us five or more results: the whole matched string followed by the captured groups; any additional parts after that we discard. The whole matched string is called the tag. The first group is called indent and is used to indent the whole fragment code when it gets extrapolated into the final code. The second group is called tagName, which is the fragment name. The third group is called root and the final group is called add. For fragment use we essentially need only tagName, with indent still serving a function. The other groups are in the regular expression so we can identify incorrect use of fragments in code: creating or adding to fragments inside code blocks is not valid.
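
To make the groups concrete, here is a small standalone snippet (an illustration, not a fragment of the extension) that runs the expression over two example lines; the fragment names are made up:

```typescript
// FRAGMENT_USE_IN_CODE_RE copied from the fragment above.
const FRAGMENT_USE_IN_CODE_RE =
  /(?<indent>[ \t]*)<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?/g;

// A correct fragment use: only indent and tagName capture.
const use = [...'    <<open file>>'.matchAll(FRAGMENT_USE_IN_CODE_RE)][0];
console.log(use.groups?.indent);  // four spaces
console.log(use.groups?.tagName); // 'open file'

// An invalid use inside a code block: root and add capture too,
// which is what lets the extension emit a diagnostic for it.
const bad = [...'<<open file>>=+'.matchAll(FRAGMENT_USE_IN_CODE_RE)][0];
console.log(bad.groups?.root, bad.groups?.add); // '=' '+'
```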

The application of FRAGMENT_USE_IN_CODE_RE is explained in more detail in the section on code realization.

Creating and modifying fragments

There is the tag used to create a new fragment, which is always used in conjunction with the opening code fence tag: either a triple backtick or triple tilde followed by the programming language identifier for the following code block. The actual fragment tag is placed as the first option right after the colon following the language specifier.

<<fragment regular expressions>>=+
const FRAGMENT_RE =
  /(?<lang>[^:]*)(?<colon>:)?.*<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?\s*(?<fileName>.*\s+\$)?(?<extraSettings>\s+.*)?/;

Most of the groups correspond to the ones defined by FRAGMENT_USE_IN_CODE_RE with a few additions. Most notably there is the group catching the language specifier, the group to catch the filename and the group to catch extra settings, called lang, fileName and extraSettings respectively.

The filename group has to end in whitespace and a dollar sign.

Also the colon is separated out into a group. That will allow for checking if a tag declaration is properly formed. When the colon is missing it is possible to detect this and emit a diagnostic accordingly.

So to create a new tag the info line for the code fence could look like py : <<a fragment name>>=.

To add to a fragment a + is added, so it could look like py : <<a fragment name>>=+. Having a fragment without = or =+ on the code fence info line is an error.
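
To see FRAGMENT_RE in action, here is a standalone snippet (an illustration, not a fragment of the extension) matching the two info line shapes described above; the fragment and file names are made up:

```typescript
// FRAGMENT_RE copied from the fragment above.
const FRAGMENT_RE =
  /(?<lang>[^:]*)(?<colon>:)?.*<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?\s*(?<fileName>.*\s+\$)?(?<extraSettings>\s+.*)?/;

// Creating a new top-level fragment that targets a source file.
const create = 'ts : <<main.*>>= main.ts $'.match(FRAGMENT_RE);
console.log(create?.groups?.lang.trim()); // 'ts'
console.log(create?.groups?.tagName);     // 'main.*'
console.log(create?.groups?.root);        // '='
console.log(create?.groups?.fileName);    // 'main.ts $' (the ' $' suffix is stripped later)

// Adding to an existing fragment.
const amend = 'ts : <<main.*>>=+'.match(FRAGMENT_RE);
console.log(amend?.groups?.add);          // '+'
```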

Gathering all fragments

All code fragments are fetched from each environment state. This is done by looking for all fence tokens. If the token.info for a fence matches FRAGMENT_RE we can check whether the fragment currently at hand is a new fragment (root && !add) or whether it expands an existing one (root && add), as will be explained in more detail further down.

<<handle fragments>>=
async function handleFragments(
  workspaceFolder : vscode.WorkspaceFolder,
  envList : Array<GrabbedState>,
  diagnostics : vscode.DiagnosticCollection,
  extrapolateFragments : boolean,
  writeSource : WriteSourceCallback | undefined) : Promise<Map<string, FragmentInformation>>
{
  const folderUri = workspaceFolder.uri;
  <<build fragment map>>

  if(extrapolateFragments)
  {
    <<extrapolate fragments>>
  }

  if(writeSource) {
    writeSource(workspaceFolder, fragments);
  }

  return Promise.resolve(fragments);
}

Populating the fragment map

First we build a map of all available fragments. These will go into fragments, which is of type Map<string, FragmentInformation>. The name of a fragment will function as the key, and an instance of FragmentInformation will be the value.

<<build fragment map>>=
const fragments = new Map<string, FragmentInformation>();
const overwriteAttempts = new Array<string>();
const missingFilenames = new Array<string>();
const addingToNonExistant = new Array<string>();
for (let env of envList) {
  for (let token of env.gstate.tokens) {
    <<handle fence tokens>>
  }
}

Each fence token we find needs to be checked. There may of course be code fences in the document that do not create or modify a fragment. These we need to skip.

Since we are handling code fences we use FRAGMENT_RE to match token.info. A fragment is malformed if the colon is missing, so we need to <<emit diagnostic when colon is missing>>.

<<handle fence tokens>>=
if (token.type === 'fence') {
  const linenumber = locationOfFragment(token);
  const match = token.info.match(FRAGMENT_RE);
  if (match && match.groups) {
    let lang = match.groups.lang.trim();
    let colon = match.groups.colon;
    let name = match.groups.tagName;
    let root = match.groups.root;
    let add = match.groups.add;
    let fileName = match.groups.fileName;
    let extraSettings = match.groups.extraSettings;
    <<emit diagnostic when colon is missing>>
    <<add to existing fragment>>
    <<create a new fragment>>
  }
}

Error diagnostic when fragment malformed

The diagnostic emitted has a message telling the colon is missing, along with the line number and the literate file this happened in.

<<emit diagnostic when colon is missing>>=
if(lang && !colon) {
  let msg = `Missing colon for fragment: ${name}. ${env.literateFileName}:${linenumber}`;
  const diag = createErrorDiagnostic(token, msg);
  updateDiagnostics(env.literateUri, diagnostics, diag);
}

Creating a new fragment

If the root group has captured a result but not the add group we know we have a new fragment on our hand.

If we already have in our fragments map a key with the same name as the fragment we are currently handling we add an error diagnostic message. We don't stop handling fences, or the entire literate.process command for that matter. We keep on going, but leave it up to the programmer to see and handle the error messages.

If a fragment name containing .* is found we need to ensure there is a result in the fileName capture group. That is going to be needed to write out the source code file eventually. A file-defining fragment without a file name is an error.

When everything appears to be in order a new FragmentInformation instance is created with the information found. The code for this fragment is the token content in token.content. Finally the new FragmentInformation instance is added to the fragments map.

If a new fragment is going to be created, but it already exists in the fragment map we emit an error diagnostic. To ensure we emit the error diagnostic only once the fragment name is added to overwriteAttempts.

<<create a new fragment>>=
if (root && !add) {
  if (fragments.has(name)) {
    if(!overwriteAttempts.includes(name))
    {
      let msg = `Trying to overwrite existing fragment ${name}. ${env.literateFileName}:${linenumber}`;
      const diag = createErrorDiagnostic(token, msg);
      updateDiagnostics(env.literateUri, diagnostics, diag);
      overwriteAttempts.push(name);
    }
  }

If it does not yet exist in the fragment map we can proceed. We need to check though if we have a top-level fragment. In that case we require a file name, so emit an error diagnostic when that is missing.

<<create a new fragment>>=+
  else {
    if (!fileName && name.indexOf(".*") > -1 ) {
      if(!missingFilenames.includes(name)) {
        let msg = `Expected filename for star fragment ${name}`;
        const diag = createErrorDiagnostic(token, msg);
        updateDiagnostics(env.literateUri, diagnostics, diag);
        missingFilenames.push(name);
      }
    }

Conversely, if we have a non-starred fragment but do get a filename we also issue a diagnostic to notify the programmer of the mistake.

<<create a new fragment>>=+
    if(fileName && name.indexOf(".*")===-1) {
        let msg = `Unexpected filename for non-star fragment ${name}`;
        const diag = createErrorDiagnostic(token, msg);
        updateDiagnostics(env.literateUri, diagnostics, diag);
    }

We do need to make sure that the fileName gets cleaned up because the matching expression for this contains whitespace and a dollar sign.

<<create a new fragment>>=+
    if(fileName) {
      fileName = fileName.replace(/\s+\$/, "");
    }

Check the extraSettings group to see whether a template is specified. Find the vscode.Uri for the specified file. Use that if it exists, otherwise keep sourceTemplateUri at undefined.

<<create a new fragment>>=+
    let sourceTemplateUri : vscode.Uri | undefined = undefined;
    if(extraSettings) {
      let settings = extraSettings.split(";");
      for(let setting of settings)
      {
        setting = setting.trim();
        if(setting.startsWith("template"))
        {
          let settingParts = setting.split("=");
          const sourceTemplateFilePattern : vscode.RelativePattern = new vscode.RelativePattern(workspaceFolder, settingParts[1]);
          const _foundSourceTemplateFile = await vscode.workspace
            .findFiles(sourceTemplateFilePattern)
            .then(files => Promise.all(files.map(file => file)));
          if(_foundSourceTemplateFile.length===1)
          {
            sourceTemplateUri = _foundSourceTemplateFile[0];
          }
        }
      }
    }
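
The template lookup can be illustrated with a small, hypothetical helper (the function name is ours, not part of the extension, and the vscode.workspace.findFiles lookup is omitted) that mirrors the loop above:

```typescript
// Hypothetical helper mirroring the extraSettings loop, minus the
// workspace file search; it returns the glob pattern for the template.
function parseTemplateSetting(extraSettings: string): string | undefined {
  for (let setting of extraSettings.split(';')) {
    setting = setting.trim();
    if (setting.startsWith('template')) {
      // Everything after '=' would be handed to vscode.RelativePattern.
      return setting.split('=')[1];
    }
  }
  return undefined;
}

console.log(parseTemplateSetting(' template=templates/basic.tpl')); // 'templates/basic.tpl'
console.log(parseTemplateSetting('other=x'));                       // undefined
```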

We can now finally create the FragmentInformation instance and add it to our fragment map.

<<create a new fragment>>=+
    let code = token.content;
    let fragmentInfo: FragmentInformation = {
      lang: lang,
      literateFileName: env.literateFileName,
      sourceFileName: fileName,
      templateFileName: sourceTemplateUri,
      code: code,
      tokens: [token],
      env: env,
    };
    fragments.set(name, fragmentInfo);
  }
}

Modifying an existing fragment

If both the root and add groups have captured their results, an = and a + respectively, we need to add code to an existing fragment.

For this to work the fragment-creating code fence must always appear before the modifying one. It is an error to try to modify a fragment that hasn't been seen yet.

The fragment with specified name is fetched, and when it is not undefined the token.content is appended to the code of the FragmentInformation instance we got from the map. The current token is also appended to the tokens list.

The fragments map is updated with the modified FragmentInformation instance.

<<add to existing fragment>>=
if (root && add) {
  if (fragments.has(name)) {
    let fragmentInfo = fragments.get(name) || undefined;
    if(fragmentInfo && fragmentInfo.code) {
      let additionalCode = token.content;
      fragmentInfo.code = `${fragmentInfo.code}${additionalCode}`;
      fragmentInfo.tokens.push(token);
      fragments.set(name, fragmentInfo);
    }
  } else {
    if(!addingToNonExistant.includes(name)) {
      let msg = `Trying to add to non-existent fragment ${name}. ${env.literateFileName}:${linenumber}`;
      const diag = createErrorDiagnostic(token, msg);
      updateDiagnostics(env.literateUri, diagnostics, diag);
      addingToNonExistant.push(name);
    }
  }
}

The FragmentInformation type

We have now seen the FragmentInformation type used several times, so let us take a moment to describe it in more detail.

The interface allows us to gather information for each found code fragment. It allows us to store the programming language identifier, name of the .literate file and name of the targeted source file, if the code fragment happens to be a top fragment.

The actual code for the fragment is stored in code. Furthermore the tokens for the complete fragment are stored in the tokens list. This list holds objects that fulfill the Token interface, which is provided by the MarkdownIt module.

<<fragment information type>>=

interface FragmentInformation {
  lang: string;
  literateFileName: string;
  sourceFileName: string;
  templateFileName: vscode.Uri | undefined;
  code: string;
  tokens: Token[];
  env: GrabbedState;
}

Writing source files

Writing source files is a matter of looping through the keys of a fragments map. For each key that ends with the .* string we check if a fragment exists, and if for that fragment a source filename is recorded. If that is the case write out the file with the code content of the fragment.

If a vscode.Uri is defined for templateFileName read the file contents and use that instead of the default one that says just [CODE]. This means that for a template to work properly it needs to contain the string [CODE], since that will be replaced with the code generated for this file.

For newline handling we'll replace all single LF occurrences with CRLF when the underlying operating system is Windows. Otherwise we do the reverse: replace CRLF with a single LF.

<<method to write out source files>>=
async function writeSourceFiles(workspaceFolder : vscode.WorkspaceFolder,
                fragments : Map<string, FragmentInformation>)
{
  const folderUri = workspaceFolder.uri;
  
  for(const name of fragments.keys()) {
    if (name.endsWith(".*")) {
      let fragmentInfo = fragments.get(name) || undefined;
      if (fragmentInfo && fragmentInfo.sourceFileName) {
        let sourceTemplate = '[CODE]';
        if(fragmentInfo.templateFileName) {
          sourceTemplate = await getFileContent(fragmentInfo.templateFileName);
        }
        let code = sourceTemplate.replace("[CODE]", fragmentInfo.code);
        let fixed = '';
        if(os.platform()==='win32')
        {
          const lf2crlf = /([^\r])\n/g;
          fixed = code.replaceAll(lf2crlf, '$1\r\n');
        } else {
          const crlf2lf = /\r\n/g;
          fixed = code.replaceAll(crlf2lf, '\n');
        }
        const encoded = Buffer.from(fixed, 'utf-8');
        let fileName = fragmentInfo.sourceFileName.trim();
        const fileUri = vscode.Uri.joinPath(folderUri, fileName);
        try {
          await vscode.workspace.fs.writeFile(fileUri, encoded);
        } catch(writeError)
        {
          console.log(writeError);
        }
      }
    }
  }
}

Extrapolating fragments

Once all fragments have been collected from the .literate files of the project, they can be combined into source code.

<<extrapolate fragments>>=

let pass: number = 0;
const rootIncorrect = new Array<string>();
const addIncorrect = new Array<string>();
const fragmentNotFound = new Array<string>();
do {
  pass++;
  let fragmentReplaced = false;
  for (let fragmentName of fragments.keys()) {
    let fragmentInfo = fragments.get(fragmentName) || undefined;
    if (!fragmentInfo) {
      continue;
    }

    const casesToReplace = [...fragmentInfo.code.matchAll(FRAGMENT_USE_IN_CODE_RE)];
    for (let match of casesToReplace) {
      if(!match || !match.groups) {
        continue;
      }
      let tag = match[0];
      let indent = match.groups.indent;
      let tagName = match.groups.tagName;
      let root = match.groups.root;
      let add = match.groups.add;
      if (root && !rootIncorrect.includes(tag)) {
        let msg = `Found '=': incorrect fragment tag in fragment, ${tag}`;
        const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg);
        updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
        rootIncorrect.push(tag);
      }
      if (add && !addIncorrect.includes(tag)) {
        let msg = `Found '+': incorrect fragment tag in fragment: ${tag}`;
        const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg);
        updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
        addIncorrect.push(tag);
      }
      if (!fragments.has(match.groups.tagName) && tagName !== "(?<tagName>.+)" && !fragmentNotFound.includes(tagName)) {
        let msg = `Could not find fragment ${tag} (${tagName})`;
        let range = fragmentUsageRange(fragmentInfo.tokens[0], tagName);
        const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg, range);
        updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
        fragmentNotFound.push(tagName);
      }
      let fragmentToReplaceWith = fragments.get(tagName) || undefined;
      if (fragmentToReplaceWith) {
        let code = fragmentToReplaceWith.code;
        let lines = code.split("\n").slice(0, -1);
        let indentedLines = lines.map(e => indent + e);
        let newcode = indentedLines.join("\n");
        fragmentReplaced = true;
        fragmentInfo.code = fragmentInfo.code.replace(tag, newcode);
        fragments.set(fragmentName, fragmentInfo);
      }
    }
  }
  if(!fragmentReplaced) {
    break;
  }
}
while (pass < 25);
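
One pass of the replacement loop can be sketched standalone; this snippet uses a simplified variant of FRAGMENT_USE_IN_CODE_RE (without the root and add groups) and made-up fragments:

```typescript
// Fragment map with a top-level fragment referencing another one.
const fragments = new Map<string, string>([
  ['main.*', 'function run() {\n  <<body>>\n}\n'],
  ['body', 'let a = 1;\nlet b = 2;\n'],
]);

// Simplified use-site expression: indent and tagName only.
const USE_RE = /(?<indent>[ \t]*)<<(?<tagName>.+)>>/g;

let code = fragments.get('main.*')!;
for (const match of [...code.matchAll(USE_RE)]) {
  const indent = match.groups!.indent;
  const replacement = fragments.get(match.groups!.tagName)!;
  // Drop the trailing empty element from the split, indent every line.
  const lines = replacement.split('\n').slice(0, -1).map(l => indent + l);
  code = code.replace(match[0], lines.join('\n'));
}
console.log(code);
// function run() {
//   let a = 1;
//   let b = 2;
// }
```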

Custom code fence rendering

Our extension uses a custom code fence rendering rule to ensure the code fragment name is also rendered as part of the fence.

Essentially the old, default rendering rule for fences is first used to create the original fence.

Then the token.info is matched against the FRAGMENT_RE regular expression. If we have a match we prepare HTML code to wrap around the HTML as generated by the default rule. Before we can actually wrap it in our div tags with the necessary classes we adjust the rendered HTML code to protect fragment tags. Otherwise these would also be syntax colored, and that we don't want. The fragment tag protection is explained in <<fragment tag protector>>.

For further cleanup of the rendered result any spans with comments in them are removed. These code comments are useful for the generated source code, but a literate program otherwise already documents code thoroughly. Comment removal from the HTML rendition is done with <<remove comments from HTML>>.

The fence is skipped if its info contains the string SETTINGS, since that denotes a configuration block as can be specified in index.literate. The configuration block is not intended to be visible in either the resulting code or the resulting HTML output.

If the fence has its token.info end with the string mermaid (all lower-case), and is not a valid fragment fence then the token.content is wrapped in <pre class="mermaid"> and </pre>. This allows the HTML template module import of mermaid.js to render diagrams expressed in these tags.

<<renderCodeFence rule>>=

<<fragment tag protector>>
<<remove comments from HTML>>

function renderCodeFence(tokens : Token[],
             idx : number,
             options : MarkdownIt.Options,
             env : any,
             slf : Renderer) {
  let rendered = '';
  if (oldFence && tokens[idx].info.indexOf("SETTINGS")<0) {
    rendered = oldFence(tokens, idx, options, env, slf);

    let token = tokens[idx];
    if (token.info) {
      const match = token.info.match(FRAGMENT_RE);
      if (match && match.groups) {
        let lang = match.groups.lang.trim();
        let name = match.groups.tagName;
        let root = match.groups.root;
        let add = match.groups.add;
        let fileName = match.groups.fileName;
        if (name) {
          root = root || '';
          add = add || '';
          fileName = fileName || '';
          fileName = fileName.trim();
          rendered = protectFragmentTags(rendered);
          rendered = removeCodeComments(rendered);
          rendered =
`<div class="codefragment">
<div class="fragmentname">&lt;&lt;${name}&gt;&gt;${root}${add} ${fileName}</div>
<div class="code">
${rendered}
</div>
</div>`;
        }
      }
      else if(token.info.endsWith('mermaid')) {
        rendered =
`<pre class="mermaid">
${token.content}
</pre>`;
      }
    }
  }

  return rendered;
};

Protecting fragment tags

With protectFragmentTags we can adjust the rendered HTML as received from the oldFence. We'll search for HTML that is our fragment tag: &lt;&lt;, followed by the tag name, followed by &gt;&gt;. Such occurrences we wrap in a span tag that has the CSS class literate-tag-name. The class can be set up to essentially override any styling applied through hljs-keyword.

For matching we use two regular expressions, FRAGMENT_HTML_RE and FRAGMENT_HTML_CLEANUP_RE.

With FRAGMENT_HTML_CLEANUP_RE all span tags injected by hljs can be cleaned up, and with FRAGMENT_HTML_RE we can wrap our fragment tag names in a class of our own, literate-tag-name, to handle special styling for tags in code fences.

<<fragment regular expressions>>=+
const FRAGMENT_HTML_CLEANUP_RE= /(<span.class="hljs-.+?">)(.*?)(<\/span>)/g;
const FRAGMENT_HTML_RE= /(&lt;&lt;.+?&gt;&gt;)/g;

Since these regular expressions are used with replaceAll they need to be marked global with g.

To make sure the highlights are properly cleaned we introduce an inline function cleanHighlights that takes care of all the highlights by running replaceAll on the match passed to it. The result is wrapped inside the span with the literate-tag-name class.

<<fragment tag protector>>=
function protectFragmentTags(rendered : string) : string {
  function cleanHighlights(match : string, _: number, __: string)
  {
    let internal = match.replaceAll(FRAGMENT_HTML_CLEANUP_RE, "$2");
    return `<span class="literate-tag-name">${internal}</span>`;
  }
  return rendered
    .replaceAll(
      FRAGMENT_HTML_RE,
      cleanHighlights
    );
}

In our CSS file we can now style .literate-tag-name, for instance with an italic font, to make tags stand out in the code fences.

Remove code comments

In rendered HTML code comments are wrapped in span tags with the class hljs-comment. These can be on one line, or for comment blocks on multiple lines. Since the goal is to remove these completely from rendered HTML the regular expression for the match will be just to do that: match the span with the hljs-comment even if it is over several lines. To do that we use also the s modifier to the expression.

<<fragment regular expressions>>=+
const CODECOMMENT_HTML_RE= /<span class="hljs-comment">.*?<\/span>/gs;

The remove action becomes now a simple replaceAll on the rendered HTML using the regular expression CODECOMMENT_HTML_RE with the empty string as replacement.

<<remove comments from HTML>>=
function removeCodeComments(rendered : string) : string {
  rendered = rendered.replaceAll(CODECOMMENT_HTML_RE, "");
  return rendered;
}

Register the literate.process command

The command literate.process is registered with Visual Studio Code. The disposable that gets returned by registerCommand is held in literateProcessDisposable so that we can push it into context.subscriptions.

Here we find the main program of our literate.process command. Our MarkdownIt is set up, .literate files are searched and iterated. Each .literate file is rendered, and code fragments are harvested. Finally code fragments are extrapolated and saved to their respective source code files. The HTML files are also saved to files.

Diagnostic messages are also handled here. Errors and warnings are shown where necessary. On successful completion a simple status bar message is used. An information diagnostic message is not good here, because it would prevent the usage of literate.process in, for instance, tasks.json: the diagnostic message would block execution of a task if the command were used as a prelaunch task. That is obviously not good for the workflow.

<<register literate.process>>=
let literateProcessDisposable = vscode.commands.registerCommand(
  'literate.process',
  async function () {
    theOneRepository.processLiterateFiles(undefined);
    return vscode.window.setStatusBarMessage("Literate Process completed", 5000);
});

context.subscriptions.push(literateProcessDisposable);

Register the literate.create_fragment_for_tag command

<<register literate.create_fragment_for_tag>>=
let literateCreateFragmentForTagDisposable = vscode.commands.registerCommand(
  'literate.create_fragment_for_tag',
  async function (range? : vscode.Range) {
    createFragmentForTag(range);
  }
);

context.subscriptions.push(literateCreateFragmentForTagDisposable);

literate.create_fragment_for_tag implementation

The literate.create_fragment_for_tag will do as its name suggests. When the position in the document is on a tag then the command will add a code fence to the document.

If the tag at the position is a fragment use inside a fragment then the code fence will be created after the current fragment, using the same language as specified for the current fragment. If the tag is a fragment mention then the current fragment map is checked for the most used language instead, and that is pre-filled.

First we ensure we have an active editor.

<<create fragment for tag>>=
function createFragmentForTag(range? : vscode.Range)
{
  let activeEditor = vscode.window.activeTextEditor;
  if(activeEditor)
  {

From the active editor we find the document. The editor also knows where our cursor currently is, via the active property on its selection, but we use range.start if a range was passed in.

<<create fragment for tag>>=+
    const document = activeEditor.document;
    const position = range ? range.start : activeEditor.selection.active;

For the document and position we determined we can get the fragment at that location. We retrieve that from the repository so that we can get the range for the fragment use.

<<create fragment for tag>>=+
    const fragmentLocation = theOneRepository.getFragmentTagLocation(
      document,
      document.lineAt(position),
      position);

With the range of the fragment use in hand we can find the Markdown token where this range is contained for the document we're in.

<<create fragment for tag>>=+
    const tokenUsage = theOneRepository.getTokenAtPosition(
      document,
      fragmentLocation.range);

The token needs to be valid before we can use it to determine the insert position. The map property on Token tells us the begin and end lines. We want to add our new fragment definition after this token. We'll access the map when we are ready to do an insert on the WorkspaceEdit.

We initialize a temporary language id to LANGID.

<<create fragment for tag>>=+
    if(tokenUsage.token && tokenUsage.token.map)
    {
      let workspaceEdit = new vscode.WorkspaceEdit();
      let langId : string = 'LANGID';

If we have a fence Token we try matching the info property of the token with FRAGMENT_RE. This gives us the language id used for that fence. We'll be using the same language id for the new code fragment.

<<create fragment for tag>>=+
      if(tokenUsage.token.type === 'fence' && tokenUsage.token.map)
      {
        let match = tokenUsage.token.info.match(FRAGMENT_RE);
        if(match && match.groups) {
          langId = match.groups.lang;
        }
      }

We can now create the new fragment string with the language id and the fragment tag name we want to create the fragment for.

Newlines at the beginning and end of the string ensure the fragment won't be created without the necessary empty lines.

<<create fragment for tag>>=+
      let newFragment = `\n${FENCE} ${langId} : ${OPENING}${fragmentLocation.name}${CLOSING}=\n${FENCE}\n`;
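
The resulting string can be sketched standalone. FENCE, OPENING and CLOSING are constants defined elsewhere in the extension; the values below are stand-ins (tildes rather than backticks, so the example itself stays fenceable), as is the fragment name:

```typescript
// Stand-in values; the extension defines the real constants elsewhere.
const FENCE = '~~~';
const OPENING = '<<';
const CLOSING = '>>';

const langId = 'ts';
const fragmentName = 'open file';
const newFragment = `\n${FENCE} ${langId} : ${OPENING}${fragmentName}${CLOSING}=\n${FENCE}\n`;
console.log(newFragment);
// (leading blank line)
// ~~~ ts : <<open file>>=
// ~~~
```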

Now that we have the new fragment text ready we can call insert on our workspace edit. The position is created with the second element of the token.map as the line number, and 0 to have the insert happen at the beginning of the line.

Finally we apply the workspace edit to our workspace. This will give us the new fragment right after the paragraph or code fence with the fragment name we found at the position we ran the command at.

<<create fragment for tag>>=+
      workspaceEdit.insert(
        document.uri,
        new vscode.Position(tokenUsage.token.map[1], 0),
        newFragment
        );
      vscode.workspace.applyEdit(workspaceEdit);
    }
  }
}

Register the literate.split_fragment command

Registering the literate.split_fragment command, setting it up so that it could take a vscode.Position parameter, which helps in programmatically firing the command for a certain pre-computed location.

<<register literate.split_fragment>>=
let literateSplitFragmentDisposable = vscode.commands.registerCommand(
  'literate.split_fragment',
  async function (position? : vscode.Position) {
    splitFragment(position);
  }
);

context.subscriptions.push(literateSplitFragmentDisposable);

literate.split_fragment implementation

literate.split_fragment splits the current fragment below the line where the cursor is. If no active text editor is found nothing happens. With one in hand we can either use the position given to the method, or otherwise the cursor location in the document.

<<split fragment>>=
function splitFragment(position_? : vscode.Position)
{
  let activeEditor = vscode.window.activeTextEditor;
  if(activeEditor)
  {
    const document = activeEditor.document;
    const position = position_ ? position_ : activeEditor.selection.active;

With the document and position we can find the Token at that location. Continue only if it is a fence.

<<split fragment>>=+
    const tokenUsage = theOneRepository.getTokenAtPosition(
      document,
      new vscode.Range(position, position));
    if(tokenUsage.token && tokenUsage.token.type === 'fence')
    {

Next we match the info line to ensure we actually have a fragment here.

<<split fragment>>=+
      let match = tokenUsage.token.info.match(FRAGMENT_RE);
      if(match && match.groups)
      {

From the matched info line we take the language identifier and the fragment tag name. We can create the text that will split the current fragment.

<<split fragment>>=+
        let langId = match.groups.lang.trim();
        let tagName = match.groups.tagName.trim();
        let textToInsert = `${FENCE}\n\n${FENCE}${langId} : ${OPENING}${tagName}${CLOSING}=+\n`;

Finally we can create the workspace edit, make the insert on the next line from the cursor, and apply the edit.

<<split fragment>>=+
        let workspaceEdit = new vscode.WorkspaceEdit();
        workspaceEdit.insert(
          document.uri,
          new vscode.Position(position.line+1, 0),
          textToInsert
          );
        vscode.workspace.applyEdit(workspaceEdit);
      }
    }
  }
}

Diagnostics

This chapter contains a few methods that help create and update diagnostics. These diagnostics help the literate programmer determine whether there are problems with the text, and where.

Updating diagnostics

Diagnostic messages are instances of vscode.Diagnostic. To show them in the Problems panel in VSCode we add them to the diagnostics collection, which typically is passed into updateDiagnostics.

<<diagnostic updating>>=
function updateDiagnostics(
  uri: vscode.Uri,
  collection: vscode.DiagnosticCollection,
  diagnostic : vscode.Diagnostic | undefined): void {
  if (uri) {
    if (diagnostic) {
      const diags = Array.from(collection.get(uri) || []);
      diags.push(diagnostic);
      collection.set(uri, diags);
    }
  } else {
    collection.clear();
  }
}

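The append-then-set pattern of updateDiagnostics can be exercised without the vscode API by letting a plain Map stand in for the DiagnosticCollection and strings stand in for the Diagnostic instances. This is a simplified sketch of the behaviour, not the extension's code:

```typescript
// A Map plays the role of vscode.DiagnosticCollection here.
const collection = new Map<string, string[]>();

function updateDiagnosticsSketch(
  uri: string | undefined,
  diagnostic: string | undefined): void {
  if (uri) {
    if (diagnostic) {
      // Copy the existing diagnostics, append the new one, set back.
      const diags = Array.from(collection.get(uri) || []);
      diags.push(diagnostic);
      collection.set(uri, diags);
    }
  } else {
    collection.clear();
  }
}

updateDiagnosticsSketch('a.literate', 'unknown fragment <<foo>>');
updateDiagnosticsSketch('a.literate', 'duplicate fragment <<bar>>');
const countAfterTwo = collection.get('a.literate')?.length ?? 0; // 2
updateDiagnosticsSketch(undefined, undefined); // no uri: clear everything
console.log(countAfterTwo, collection.size); // 2 0
```

Copying with Array.from before pushing matters because collection.get may hand back a read-only view in the real API.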
Creating an error diagnostic

Instances of vscode.Diagnostic can be created with createErrorDiagnostic. It takes a markdown-it token, a message and an optional range. If the passed in range isn't a proper range, it is harvested from the token instead.

For now all messages are considered errors.

<<diagnostic updating>>=+
/**
 * @param token Token that carries the faulty code fragment
 * @param message Error message
 */
function createErrorDiagnostic(token: Token, message: string, range? : vscode.Range) : vscode.Diagnostic {
  range = range ? range : fragmentRange(token);
  let diagnostic: vscode.Diagnostic = {
    severity: vscode.DiagnosticSeverity.Error,
    message: message,
    range: range
  };

  return diagnostic;
}

Line number of fragment start

locationOfFragment gives the first line of the given token in the literate document, or -1 when the token carries no source mapping.

<<diagnostic updating>>=+
/**
 * @param token Token to extract code location from
 */
function locationOfFragment(token: Token): number {
  let linenumber = token.map ? (token.map[0]) : -1;
  return linenumber;
}

Line number of fragment end

locationOfFragmentEnd is used to get the last line of the given token in the literate document. This is typically used for code fences in this extension.

<<diagnostic updating>>=+
/**
 * @param token Token to extract code location from
 */
function locationOfFragmentEnd(token: Token): number {
  let linenumber = token.map ? token.map[1] : -1;
  return linenumber;
}

Range of whole fragment in text

fragmentRange is a method to construct a vscode.Range for a given token that is a code fragment.

<<diagnostic updating>>=+
/**
 * @param token Token to create range for
 */
function fragmentRange(token: Token): vscode.Range {
  let startTagName = token.info.indexOf("<<") + 2;
  let endTagName = token.info.indexOf(">>") - 1;
  let start = new vscode.Position(locationOfFragment(token), startTagName);
  let end = new vscode.Position(locationOfFragmentEnd(token), endTagName);
  let range: vscode.Range = new vscode.Range(start, end);
  return range;
}

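The column arithmetic can be checked on a sample info string outside of vscode; the start column lands on the first character of the tag name and the end column on its last. A standalone check using the same indexOf offsets as fragmentRange:

```typescript
const info = 'ts : <<my fragment>>=';
const startTagName = info.indexOf('<<') + 2; // 7, first char of 'my fragment'
const endTagName = info.indexOf('>>') - 1;   // 17, last char of 'my fragment'
const tag = info.substring(startTagName, endTagName + 1);
console.log(startTagName, endTagName, tag); // 7 17 my fragment
```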
Fragment usage range

This method gives a Range for the given tag name based on the passed in Token. The line number for the occurrence is computed, along with the begin and end positions within that line.

<<diagnostic updating>>=+
function fragmentUsageRange(token : Token, tagName : string) : vscode.Range
{
  let startLineNumber = locationOfFragment(token);
  const lines = token.content.split('\n');
  let index : number = 0;
  for(const line of lines)
  {
    startLineNumber++;
    index = line.indexOf(tagName);
    if(index > -1)
    {
      break;
    }
  }
  let start = new vscode.Position(startLineNumber, index - 2);
  let end = new vscode.Position(startLineNumber, index + tagName.length + 2);
  return new vscode.Range(start, end);
}

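The scan can be exercised with a mocked token. Because startLineNumber is incremented before each check, the first content line after the fence opening corresponds to map[0] + 1. A self-contained sketch with a fake token, mirroring the loop above:

```typescript
// Fake token: map[0] is the line of the fence opening in the document.
const token = {
  map: [10, 15],
  content: 'let x = 1;\n<<helper fragment>>\nlet y = 2;\n',
};
const tagName = 'helper fragment';

let startLineNumber = token.map[0]; // what locationOfFragment(token) returns
let index = 0;
for (const line of token.content.split('\n')) {
  startLineNumber++;
  index = line.indexOf(tagName);
  if (index > -1) { break; }
}
// The usage sits on document line 12; columns 0..19 span '<<helper fragment>>'.
console.log(startLineNumber, index - 2, index + tagName.length + 2); // 12 0 19
```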
Utility functions

Retrieving literate file uri list

Function to get all literate files in a given workspace. We need to ensure that we return the uris in the correct order. The order is defined by an index.literate file if one exists; otherwise we use the order in which findFiles finds them. If an index.literate exists we also harvest its SETTINGS fence, if there is one.

<<utility functions>>=
async function getLiterateFileUris(
  workspaceFolder : vscode.WorkspaceFolder
) : Promise<vscode.Uri[]>
{
  const literateFilesInWorkspace : vscode.RelativePattern =
        new vscode.RelativePattern(workspaceFolder, '**/*.literate');
  const _foundLiterateFiles = await vscode.workspace
        .findFiles(literateFilesInWorkspace);
  let foundLiterateFiles = new Array<vscode.Uri>();
  <<see if an index.literate exists>>
  <<search index for html links>>
  <<sort uris based on html link order>>
  <<get SETTINGS from index.literate>>
  return foundLiterateFiles;
}

TBD: create instead a markup that allows us to express the literate file order in whichever file we want.

If we don't find an index.literate file return the found literate files as is.

<<see if an index.literate exists>>=
const index = _foundLiterateFiles.find(uri => uri.path.endsWith('index.literate'));
if(!index)
{
  return _foundLiterateFiles;
}

We now parse the index file to get the state with markdown tokens. We don't need the rendered HTML of the index document, so we discard that.

<<search index for html links>>=
const md = createMarkdownItParserForLiterate();
const text = await getFileContent(index);
const env: GrabbedState = { literateFileName: 'index.literate', literateUri: index, gstate: new StateCore('', md, {}) };
const _ = md.render(text, env);

We create a new list of strings to which we will add the file uris in the order found in the parsed tokens. We do that by looking for bullet_list_open and bullet_list_close tokens. We assume that inside these lists there will be items that contain links. Once we've encountered a bullet_list_open we start looking for list_item_open. When that token is found we get the inline token that is two tokens after the list_item_open token. We make sure that it has child tokens, and that the first child token has a valid attrs property. There we find the uri to the HTML version of a literate file.

TBD add support for ordered list.

<<sort uris based on html link order>>=
let links = new Array<string>();
let bulletListOpen = false;
let idx = 0;
for(let token of env.gstate.tokens)
{
  if(token.type==='bullet_list_open')
  {
    bulletListOpen = true;
  }
  if(token.type==='bullet_list_close')
  {
    bulletListOpen = false;
  }
  if(bulletListOpen && token.type==='list_item_open')
  {
    let inline = env.gstate.tokens[idx+2];
    if(inline.children && inline.children[0].attrs)
    {
      try {
        const currentUri = inline.children[0].attrs[0][1];
        let path = currentUri.replace("html", "literate");
        const foundUri = _foundLiterateFiles.find(uri => uri.path.endsWith(path));
        if(foundUri)
        {
          foundLiterateFiles.push(foundUri);
        }
      } catch(_) {}
    }
  }
  idx++;
}

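Note that the html-to-literate mapping above uses String.replace with a plain string, which substitutes only the first occurrence of "html" anywhere in the path rather than specifically the extension; a directory name containing "html" would be rewritten instead. A quick check of that behaviour:

```typescript
const simple = 'chapter1.html'.replace('html', 'literate');
const tricky = 'html/chapter1.html'.replace('html', 'literate');
console.log(simple); // chapter1.literate
console.log(tricky); // literate/chapter1.html, the directory got rewritten
```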
Then we ensure that the index file itself is present in the list to return.

<<sort uris based on html link order>>=+
const finalCheck = foundLiterateFiles.find(uri => uri.path.endsWith('index.literate'));
if(!finalCheck)
{
  foundLiterateFiles.splice(0, 0, index);
}

Finally we harvest code fences that have SETTINGS in their info string. We split the content of the fence into lines and loop over them.

If a trimmed line starts with template we harvest the filename given as its value. As an additional check we ensure the file exists before assigning its URI to htmlTemplateFile.

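A minimal sketch of such a line-based settings parser follows; the key : value line format and the exact keys are assumptions made here for illustration, not the extension's verified syntax:

```typescript
// Hypothetical SETTINGS fence content from index.literate.
const settingsContent = 'template : mytemplate.html\nauthors : Jane Doe; John Doe\n';

let templateFileName: string | undefined;
let authorsSetting = '';
for (const rawLine of settingsContent.split('\n')) {
  const line = rawLine.trim();
  if (line.startsWith('template')) {
    // Everything after the first colon is the value.
    templateFileName = line.substring(line.indexOf(':') + 1).trim();
  } else if (line.startsWith('authors')) {
    authorsSetting = line.substring(line.indexOf(':') + 1).trim();
  }
}
console.log(templateFileName, authorsSetting);
```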
Write out HTML

Function to write out the given rendered content out to a file. The rendered string will be set into a HTML body. The HTML template will be read from the file at the URI specified by htmlTemplateFile if it is set, otherwise use a hard-coded piece of HTML template.

For the template to work it needs to have the string [CONTENT] where the rendered Markdown HTML is going to be substituted.

The default HTML template imports mermaid.js as a module, so that it can work on any pre tag that has the CSS class mermaid set.

If authors contains a non-empty string, it will replace the [AUTHORS] tag if that tag exists in the template. If authors is empty, [AUTHORS] is replaced with the empty string. The authors string is split on semicolons, and for each resulting author a meta author tag is added.

Regarding line endings we do the same as for source files: on Windows we replace bare LF instances with CRLF, and on non-Windows machines we do the reverse, replacing CRLF instances with LF.

<<utility functions>>=+
async function writeOutHtml
      (fname : string,
       folderUri : vscode.Uri,
       rendered : string) : Promise<void>
{
  let html = '';
  const getContent = async () => {
    let _html = '';
    if(htmlTemplateFile) {
      _html = await getFileContent(htmlTemplateFile);
    } else {
      _html =
`<!DOCTYPE html>
<html>
<head>
  <meta name="description" content="A Literate Program written with the Literate Programming vscode extension by Nathan 'jesterKing' Letwory and contributors" />
  <meta property="og:description" content="A Literate Program written with the Literate Programming vscode extension by Nathan 'jesterKing' Letwory and contributors" />
  <link rel="stylesheet" type="text/css" href="./style.css">
  [AUTHORS]
</head>
<body>
[CONTENT]
<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
</script>
</body>
</html>`;
    }
    return _html;
  };
  html = await getContent();

  let authorlist = authors.split(";");
  let metaAuthors = '';
  for(let author of authorlist) {
    metaAuthors += `<meta name="author" content="${author}">`;
  }

  html = html
    .replace("[CONTENT]", rendered)
    .replace("[AUTHORS]", metaAuthors);

  if(os.platform()==='win32'){
    const lf2crlf = /([^\r])\n/g;
    html = html.replaceAll(lf2crlf, '$1\r\n');
  } else {
    const crlf2lf = /\r\n/g;
    html = html.replaceAll(crlf2lf, '\n');
  }
  const encoded = Buffer.from(html, 'utf-8');
  fname = fname.replace(".literate", ".html");
  const fileUri = vscode.Uri.joinPath(folderUri, fname);
  return Promise.resolve(vscode.workspace.fs.writeFile(fileUri, encoded));
};

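The two line-ending regexes can be checked in isolation: lf2crlf only rewrites LF characters not already preceded by CR, so existing CRLF pairs are not doubled up. A standalone check, equivalent to the replaceAll calls above since the regexes carry the global flag:

```typescript
const lf2crlf = /([^\r])\n/g;
const crlf2lf = /\r\n/g;

// Mixed input: one bare LF, one CRLF pair.
const mixed = 'one\ntwo\r\nthree';
const windowsStyle = mixed.replace(lf2crlf, '$1\r\n');
const unixStyle = mixed.replace(crlf2lf, '\n');
console.log(JSON.stringify(windowsStyle)); // "one\r\ntwo\r\nthree"
console.log(JSON.stringify(unixStyle));    // "one\ntwo\nthree"
```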
Get file content for uri

For each literate file in the workspace we will eventually get the text content, but we first have to check whether any of the files are open in an editor. Especially for on-the-fly updating of the tree view, but also for fragment name completion and similar functionality, we need to get the text from the TextDocument instead of the file on disk. If there is a TextDocument that corresponds to the literate file we are currently handling we read its text into currentContent, otherwise we set it to an empty string.

<<utility functions>>=+
async function getFileContent(
  file : vscode.Uri
) : Promise<string>
{
  const currentContent = (() =>
      {
          for(const textDocument of vscode.workspace.textDocuments) {
              if(vscode.workspace.asRelativePath(file) === vscode.workspace.asRelativePath(textDocument.uri)) {
                  return textDocument.getText();
              }
          }
          return '';
      }
  )();

If currentContent is an empty string we read the content from the file on disk and decode it into text. If on the other hand we do have currentContent, we use that as our text instead. The currentContent will be more up-to-date than what we have on disk.

<<utility functions>>=+
  const content = currentContent ? null : await vscode.workspace.fs.readFile(file);
  const text = currentContent ? currentContent : new TextDecoder('utf-8').decode(content);
  return text;
}

The extension

Our Visual Studio Code entry file is extension.ts. While developing the extension, the JavaScript file generated from it, out/extension.js, is set as the entry point in package.json. But when the extension is prepared for release on the Visual Studio Code marketplace this needs to be changed to the minified and bundled version that is emitted as out/main.js. Together with a properly set up .vscodeignore this ensures that the published package stays small: without it the package is easily over 2MB in size, but properly configured it is under 400KB.

The extension main entry lies in the activation of the extension, as given by <<activate the extension>>, but before we get there we need to set up several bits and pieces that are required for the proper functioning of the tools.

First of all we import all the functionality and modules we are going to need.

<<literate.*>>= ./src/extension.ts $


<<import necessary modules for literate>>

After the imports we introduce oldFence, where we keep hold of the fence rule from the default MarkdownIt parser. I was not entirely sure how best to tackle this, so for now it lives here.

Here we also find htmlTemplateFile and authors, variables that can be set in a literate program through a code fence in index.literate with the token info containing the string SETTINGS.

<<literate.*>>=+
let oldFence : Renderer.RenderRule | undefined;
const FENCE = '```';
const OPENING = '<<';
const CLOSING = '>>';

let htmlTemplateFile : vscode.Uri | undefined = undefined;
let authors = '';

With that out of the way we introduce the interfaces we use in the Literate Programming extension.

<<literate.*>>=+
<<introduce interfaces>>

Next we set up the fragment regular expressions and define everything needed to implement the fragment explorer. This explorer will show up in the Explorer bar when a literate project is open. We need a representation for a node in the tree view, a data provider for the tree view and then the actual tree view explorer itself.

<<literate.*>>=+
<<fragment regular expressions>>

<<fragment node>>

<<fragment tree provider>>

<<fragment explorer>>

<<fragment hover provider>>

For our extension we need to override the code fence rule since we want to augment the rendering of the code fences. Specifically we want to add the fragment line prior to the code block. This is explained in the section on <<renderCodeFence rule>>.

Also we have a way to create a MarkdownIt parser the way we need it. It is explained in more detail in the section on <<create markdownit parser>>.

<<literate.*>>=+
<<renderCodeFence rule>>

<<create markdownit parser>>

The central mechanism of the Literate Programming extension, the tools it provides, is expressed in <<render and collect state>>, <<handle fragments>> and <<method to write out source files>>. Together these ensure that all literate files can be iterated, parsed and rendered, and that from the parsed state all code fragments can be collected and extrapolated into the source file or files described by the literate program.

<<literate.*>>=+
<<render and collect state>>
<<handle fragments>>
<<method to write out source files>>
<<fragment repository>>
<<rename provider class>>
<<code action provider class>>
<<definition provider class>>
<<reference provider class>>

Utility function to determine the workspace folder for a TextDocument

<<literate.*>>=+
function determineWorkspaceFolder(document : vscode.TextDocument) : vscode.WorkspaceFolder | undefined
{
  if(!vscode.workspace.workspaceFolders || vscode.workspace.workspaceFolders.length === 0)
  {
    return undefined;
  }
  for(const ws of vscode.workspace.workspaceFolders)
  {
    const relativePath = path.relative(ws.uri.toString(), document.uri.toString());
    if(!relativePath.startsWith('..'))
    {
      return ws;
    }
  }
  return undefined;
}

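The containment test above relies on path.relative yielding a path that starts with '..' exactly when the document lies outside the workspace folder. This can be illustrated with path.posix for platform-independence; the extension applies the same idea to uri.toString() values:

```typescript
import * as path from 'path';

const ws = '/home/user/project';
const inside = path.posix.relative(ws, '/home/user/project/docs/index.literate');
const outside = path.posix.relative(ws, '/home/user/other/doc.literate');

console.log(inside);  // docs/index.literate
console.log(outside); // ../other/doc.literate
console.log(inside.startsWith('..'), outside.startsWith('..')); // false true
```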
Although the fragments mentioned above are the soul of the extension, they are not of much use without proper activation. With this activate implementation all providers and commands are registered with Visual Studio Code.

<<literate.*>>=+
<<activate the extension>>
<<diagnostic updating>>
<<utility functions>>
<<create fragment for tag>>
<<split fragment>>

There is nothing currently needed for deactivation of the extension, so there is just an empty-bodied implementation for it.

<<literate.*>>=+
export function deactivate() {}

The imports

<<import necessary modules for literate>>=
import { TextDecoder } from 'util';
import * as vscode from 'vscode';
import * as path from 'path';
import * as os from 'os';

import StateCore = require('markdown-it/lib/rules_core/state_core');
import Token = require('markdown-it/lib/token');
import MarkdownIt = require("markdown-it");
import Renderer = require('markdown-it/lib/renderer');

const hljs = require('highlight.js');

import { grabberPlugin } from './grabber';

Interfaces used in Literate Programming

<<introduce interfaces>>=
interface WriteRenderCallback {
  (
    fname : string,
    folderUri : vscode.Uri,
    content : string
  ) : Promise<void>
};
interface WriteSourceCallback {
  (
    workspaceFolder : vscode.WorkspaceFolder,
    fragments : Map<string, FragmentInformation>
  ) : Thenable<void>
};

<<grabbed state type>>
<<fragment information type>>
<<token usage interface>>

Extension activation

The extension activation sets up all our tools and data structures. The activation happens through the activate function. It takes a context to which we push all our disposables for proper cleanup when our extension gets deactivated. Note that our activate implementation is marked async, because we want to await where necessary.

<<activate the extension>>=
let theOneRepository : FragmentRepository;
export async function activate(context: vscode.ExtensionContext) {

We start the activation by setting up the FragmentRepository. This is the heart of the processing of literate projects. We give it the context so that it can also push disposables to context.subscriptions for proper cleanup.

With the repository set up we will process the entire workspace for literate projects. We want to await here to ensure everything is ready, that is our repository can provide fragments when requested.

With that done we place our repository in the context.subscriptions.

<<activate the extension>>=+
  theOneRepository = new FragmentRepository(context);
  await theOneRepository.processLiterateFiles(undefined);
  context.subscriptions.push(theOneRepository);

Now that the repository is up and running we can register all our commands, views and providers. Note that we currently register against the markdown language ID. In the future we would probably want to make .literate its own language ID to register against.

<<activate the extension>>=+
  <<register literate.process>>
  <<register literate.create_fragment_for_tag>>
  <<register literate.split_fragment>>
  <<register fragment tree view>>
  <<register completion item provider>>
  <<register definiton provider>>
  <<register reference provider>>

  context.subscriptions.push(
    vscode.languages.registerHoverProvider('markdown', new FragmentHoverProvider(theOneRepository))
  );

  context.subscriptions.push(
    vscode.languages.registerRenameProvider('markdown', new LiterateRenameProvider(theOneRepository))
  );

  context.subscriptions.push(
    vscode.languages.registerCodeActionsProvider('markdown', new LiterateCodeActionProvider(theOneRepository))
  );

From our extension activation we return an extension of the built-in MarkdownIt parser. This way the Markdown preview uses a parser configured the same way as our extension, which should result in previews close to what our HTML rendering produces.

<<activate the extension>>=+
  console.log('Ready to do some Literate Programming');

  return {
    extendMarkdownIt(md: any) {
      md.use(grabberPlugin);
      oldFence = md.renderer.rules.fence;
      md.renderer.rules.fence = renderCodeFence;
      return md;
    }
  };
};

Afterword

So you have made it this far, or perhaps you just skipped over a lot of text. If you actually read everything up to this point, you have also read all of the code for the entire Literate Programming extension. I appreciate that you took the time to read this document. I hope it helped you get more interested in the literate programming paradigm.

I invite you to install the Literate Programming extension for Visual Studio Code and start using it in your daily work.