Literate programming is a programming paradigm introduced by Donald Knuth. A program is written in a natural language with snippets of code interspersed. From this text, usable source code is generated, along with a well-formatted, human-readable document.
The most important influence for this literate programming extension is the PBR Book.
This extension provides a set of tools that help the programmer write literate programs. Through automation the process of writing literate programs should be as painless as possible. The programmer writes their literate programs using Markdown. When a literate programmer needs a snippet they can add a code fence. In this extension snippets are called code fragments. Building on the framework provided by Visual Studio Code, this extension introduces code completion, code actions, a definition provider, hover tooltips, and a fragment explorer.
The approach for this extension is based on Markdown documents. For this
extension the Markdown specification is only slightly adapted to make supporting
literate programming easy. The code fragments are expressed in code fences
as per the Markdown specification, with either surrounding triple backticks or
triple tildes. Along with the programming language identifier, the opening line
has been extended to contain the fragment name and type, essentially as options
to the code fence. The opening line thus looks like py : <<fragment name>>=
to create a new fragment, or like py : <<fragment name>>=+ to amend an
existing fragment.
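The two forms can be told apart by their suffix. A minimal sketch, not the extension's real parser (which uses a regular expression, described later), just an illustration of the = and =+ convention:

```typescript
// Classify a code-fence info line by its suffix; illustration only.
function fragmentKind(info: string): 'create' | 'amend' | 'other' {
    if (info.endsWith('>>=+')) { return 'amend'; }  // extends an existing fragment
    if (info.endsWith('>>=')) { return 'create'; }  // declares a new fragment
    return 'other';
}
```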
The order of declaration of code fragments does not matter: a fragment may reference other code fragments before they have been created. The only requirement is that every code fragment referenced by another fragment is eventually created.
Fragments themselves don't directly create source files in most cases, but in the end source files are what is wanted from this extension.
To create actual source files, a fragment creation line needs to be used with a
slightly extended form of the creation tag mentioned above. The name has to be
suffixed with the string .* between the chevrons. Furthermore, a file name
needs to be specified after the equals sign, followed by whitespace and a
dollar sign $. This is essentially a relative path that is going to be
appended to each workspace folder as the root. A top-level fragment looks like
py : <<top-level fragment.*>> = ./src/source.py $. The name of this fragment is
top-level fragment, and once it has been fully extrapolated it will be written
to a file in the workspace folder under src as the file source.py.
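The path handling can be sketched with Node's path module. The workspace root and captured file name below are hypothetical values, not taken from the extension:

```typescript
import * as path from 'path';

// Hypothetical values: a workspace folder root and the relative path
// captured from a top-level fragment's opening line.
const workspaceRoot = '/home/user/project';
const capturedFileName = './src/source.py';

// The captured relative path is appended to the workspace folder root.
// posix semantics are used here so the sketch behaves the same everywhere.
const outputPath = path.posix.join(workspaceRoot, capturedFileName);
```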
The Literate Programming extension allows the program author to write
multiple projects in the same Visual Studio Code workspace. Each workspace
folder is the root for its own literate project. Within each project there can
be one or more literate files. These files carry the extension .literate.
One literate file can contain zero or more code fragments. A literate file can
also contain more than one top-level fragment. In other words, an author can
create multiple source files from just one literate document.
This text describes the Literate Programming extension as a literate program.
The tools provided by the Literate Programming extension are built around one repository that provides all necessary information about the fragments in a project.
The fragment repository handles parsing of literate documents, reacting to changes made by users. The repository provides all fragments found in the projects added to the current workspace. Additionally the repository will write out source files and rendered HTML files.
The fragment model is defined in the FragmentRepository
class, which will be
described in detail after introducing a couple of classes that help the
repository.
The FragmentMap class holds a map from strings, which are the fragment names,
to their associated FragmentInformation instances. This map is available
through the map property. The class also provides a clear method and a
dispose method.
class FragmentMap {
map : Map<string, FragmentInformation>;
constructor()
{
this.map = new Map<string, FragmentInformation>();
}
clear()
{
this.map.clear();
}
dispose()
{
this.map.clear();
}
};
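A minimal usage sketch of this map pattern. FragmentInformation is defined elsewhere in this program, so a small stand-in type is used here:

```typescript
// Stand-in for the real FragmentInformation, which carries more fields.
interface FragmentInformation { lang: string; code: string; }

// The same Map shape that FragmentMap wraps in its map property.
const fragments = new Map<string, FragmentInformation>();
fragments.set('read config', { lang: 'py', code: 'cfg = load()' });
const found = fragments.get('read config');
fragments.clear(); // what both clear() and dispose() delegate to
```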
The class GrabbedStateList holds an array of GrabbedState accessible through
the list property. The class provides clear and dispose methods.
GrabbedState is collected from the MarkdownIt parser. It contains tokens and
related information generated by the parser.
class GrabbedStateList {
list : Array<GrabbedState>;
constructor()
{
this.list = new Array<GrabbedState>();
}
clear()
{
this.list = new Array<GrabbedState>();
}
dispose()
{
while(this.list.length>0)
{
this.list.pop();
}
}
};
The FragmentRepository uses several helper classes, which we looked at just
above. These we introduce right before defining the repository class.
<<fragment map>>
<<list of grabbed states>>
<<fragment tag location>>
export class FragmentRepository {
<<fragment repository member variables>>
<<fragment repository constructor>>
<<fragment generation method>>
<<method to get fragments from repository>>
<<method to get fragment on line for position>>
<<method to get token at position>>
<<method to get state for workspace>>
<<method to get state for document>>
<<method to get all reference locations>>
dispose() {
for(let fragmentMap of this.fragmentsForWorkspaceFolders.values())
{
fragmentMap.dispose();
}
this.fragmentsForWorkspaceFolders.clear();
for(let grabbedState of this.grabbedStateForWorkspaceFolders.values())
{
grabbedState.dispose();
}
this.grabbedStateForWorkspaceFolders.clear();
}
}
Our FragmentRepository
needs a couple of member variables to function
properly. We'll need an instance of a properly configured MarkdownIt parser.
private md : MarkdownIt;
The MarkdownIt parser will handle the actual tokenizing and parsing of the literate files.
Since we work with a multi-root workspace we'll create a map of maps. The keys
for this top-level map will be the workspace folder names. The actual
FragmentMap instances will be the values for each workspace folder.
readonly fragmentsForWorkspaceFolders : Map<string, FragmentMap>;
For our parsing functionality we need an Array<GrabbedState>, which we have
encapsulated in the class GrabbedStateList and which is available through its
list property. Each GrabbedStateList is saved to the map as the value for its
workspace folder name key.
readonly grabbedStateForWorkspaceFolders : Map<string, GrabbedStateList>;
Finally we need a DiagnosticCollection
to be able to keep track of detected
problems in literate projects. TBD: this probably needs to be changed into a
map of DiagnosticCollection
, again with the workspace folder names as keys.
readonly diagnostics : vscode.DiagnosticCollection;
The constructor takes an extension context to register any disposables there. We'll be registering to text document changes, and to workspace changes. In both cases we want to process literate files to regenerate fragments, source files and HTML files.
constructor(
context : vscode.ExtensionContext
)
{
<<initializing the fragment repository members>>
<<subscribe to text document changes>>
<<subscribe to workspace changes>>
}
First we make sure we have an instance of the MarkdownIt parser that is set up for our literate files processing.
this.md = createMarkdownItParserForLiterate();
Then we'll make sure the maps for tracking fragment maps and grabbed states are created, and finally we push our diagnostics collection to our subscriptions.
this.fragmentsForWorkspaceFolders = new Map<string, FragmentMap>();
this.grabbedStateForWorkspaceFolders = new Map<string, GrabbedStateList>();
this.diagnostics = vscode.languages.createDiagnosticCollection('literate');
context.subscriptions.push(this.diagnostics);
The repository subscribes to the onDidChangeTextDocument event on the
workspace. It could process literate files on each change, but the
completion item provider needs to trigger processing of literate files itself.
Since the completion item provider gets called on typing an opening chevron (<),
we skip triggering the processing here when such a character has been typed.
context.subscriptions.push(
vscode.workspace.onDidChangeTextDocument(
async (e : vscode.TextDocumentChangeEvent) =>
{
if(!(e.contentChanges.length>0 && e.contentChanges[0].text.startsWith('<')))
{
await this.processLiterateFiles(e.document);
}
}
)
);
Triggering the processing of literate documents is necessary when new workspace folders have been added. Additionally we need to clean up fragment maps and grabbed states for those workspace folders that have been removed from the workspace.
context.subscriptions.push(
vscode.workspace.onDidChangeWorkspaceFolders(
async (e : vscode.WorkspaceFoldersChangeEvent) =>
{
for(const addedWorkspaceFolder of e.added) {
await this.processLiterateFiles(addedWorkspaceFolder);
}
for(const removedWorkspaceFolder of e.removed)
{
this.fragmentsForWorkspaceFolders.delete(removedWorkspaceFolder.name);
this.grabbedStateForWorkspaceFolders.delete(removedWorkspaceFolder.name);
}
}
)
);
The parsing and setting up of the fragments
map is handled with the method
processLiterateFiles
. Additionally the method will write out all specified
source files.
Processing the literate files generally starts in one of three cases: 1) a change
in the workspace due to addition or removal of a workspace folder, 2) a change
to a literate document, or 3) triggering of the literate.process command.
async processLiterateFiles(
trigger :
vscode.WorkspaceFolder
| vscode.TextDocument
| undefined) {
<<set up workspace folder array>>
<<iterate over workspace folders and parse>>
}
First we determine the workspace folder or workspace folders to process. In the
case where trigger
is a workspace folder or a text document we use the given
workspace folder or determine the one to which the text document belongs. In
these cases we'll have an array with just the one workspace folder as element.
When the trigger is undefined
we'll use all workspace folders registered to
this workspace.
const workspaceFolders : Array<vscode.WorkspaceFolder> | undefined = (() => {
if(trigger)
{
<<get workspace if text document>>
<<else just use passed in workspace>>
}
if(vscode.workspace.workspaceFolders && vscode.workspace.workspaceFolders.length>0) {
let folders = new Array<vscode.WorkspaceFolder>();
for(const ws of vscode.workspace.workspaceFolders)
{
folders.push(ws);
}
return folders;
}
return undefined;
}
)();
We can check whether our trigger is a TextDocument by checking if eol is a
property. If the eol property exists we are dealing with a TextDocument;
if it doesn't exist we are dealing with a WorkspaceFolder.
if("eol" in trigger) {
const ws = determineWorkspaceFolder(trigger);
if(ws)
{
return [ws];
}
}
Again, when the property eol is not found in the object we were passed, we can
assume it is just a workspace folder, so we return it as the one element in the
array.
else
{
return [trigger];
}
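The discrimination technique can be sketched in isolation. The in operator narrows a TypeScript union based on a property that only one member has; the two interfaces below are illustrative stand-ins, not the vscode API:

```typescript
// Stand-ins for vscode.TextDocument and vscode.WorkspaceFolder.
interface Doc { eol: number; getText(): string; }
interface Folder { name: string; }

function describe(trigger: Doc | Folder): string {
    // Only TextDocument-like objects carry an eol property.
    return 'eol' in trigger ? 'document' : 'folder';
}
```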
With the list of workspace folders set up we can iterate over each folder and then handle literate files in that workspace folder.
if(workspaceFolders) {
for(const folder of workspaceFolders)
{
<<set up fragments and grabbedStateList>>
if(fragments && grabbedStateList) {
<<clear FragmentMap and GrabbedStateList>>
<<iterate over all files, write out html>>
<<hanle fragments for map>>
<<extrapolate fragments and save out>>
}
}
}
First we ensure entries for our workspace exist in the maps for FragmentMap
and GrabbedStateList
.
if(!this.fragmentsForWorkspaceFolders.has(folder.name))
{
this.fragmentsForWorkspaceFolders.set(folder.name, new FragmentMap());
}
if(!this.grabbedStateForWorkspaceFolders.has(folder.name))
{
this.grabbedStateForWorkspaceFolders.set(folder.name, new GrabbedStateList());
}
Next we can get the FragmentMap
and GrabbedStateList
for our workspace
folder. These we'll fill up with the data of our literate project.
const fragments = this.fragmentsForWorkspaceFolders.get(folder.name);
const grabbedStateList = this.grabbedStateForWorkspaceFolders.get(folder.name);
Each time we process a literate project we clear out the fragments and state so that we don't end up with stray elements.
fragments.clear();
grabbedStateList.clear();
Our first pass iterates over all the literate files in our folder,
parsing them as we go. Each parsed file will be rendered as HTML and saved out
to disk. The parser state with all the tokens will be added to
grabbedStateList.list. We need to await this async function, otherwise
our state will be incomplete. The full state is needed for the next two steps.
await iterateLiterateFiles(folder,
writeOutHtml,
grabbedStateList.list,
this.md);
With the state complete, and our HTML files saved out, we are going to do two
passes over the state. Let's do the first step here: we clear out the
diagnostics, and then await handleFragments. We call this function such
that fragments are not extrapolated and no source files are saved. We await
the function's completion, otherwise our fragment map will be incomplete, or
even just missing later on.
this.diagnostics.clear();
fragments.map = await handleFragments(folder,
grabbedStateList.list,
this.diagnostics,
false,
undefined);
In the second step we'll call the fragment handler again, but this time we do
want the fragments to be completely extrapolated, and the final source files
written to disk. Before the call we again clear out the DiagnosticCollection
so that we get the correct diagnostics in case of errors in literate files.
Again we wait for the results, just to ensure it all completes before we go on.
this.diagnostics.clear();
await handleFragments(folder,
grabbedStateList.list,
this.diagnostics,
true,
writeSourceFiles);
When we call getFragments we assume the literate projects have all been
processed properly. In most cases that is triggered automatically, but it may be
necessary to trigger the processing manually before calling getFragments. When
the projects have been properly processed, though, this function returns the
FragmentMap for the given workspace folder.
getFragments(workspaceFolder : vscode.WorkspaceFolder) : FragmentMap
{
let fragmentMap : FragmentMap = new FragmentMap();
this.fragmentsForWorkspaceFolders.forEach(
(value, key, _) =>
{
if(key === workspaceFolder.name)
{
fragmentMap = value;
}
}
);
return fragmentMap;
}
This method checks to see if for the given text line and position a fragment usage or mention can be found.
First we find matches on the current line against FRAGMENT_USE_IN_CODE_RE
. In
all matches we check which of them is at the given position. We do that by
searching for the index of the match tag name, including the double chevron
bracketing.
The range for the FragmentLocation
will be created from the found index and
run the length of the tag name including the double enclosing chevrons. An
attempt to find the corresponding fragment is made, but if no such fragment
exists the FragmentLocation
will be created with the fragment set to
undefined
. The root
and add
parts are also given to the fragment location,
even if they were not matched. This information can be used elsewhere to
determine what kind of fragment was found at the given position.
getFragmentTagLocation(
document : vscode.TextDocument,
currentLine : vscode.TextLine,
position : vscode.Position
) : FragmentLocation
{
const workspaceFolder : vscode.WorkspaceFolder | undefined = determineWorkspaceFolder(document);
const matchesOnLine = [...currentLine.text.matchAll(FRAGMENT_USE_IN_CODE_RE)];
for(const match of matchesOnLine)
{
if(!match || !match.groups) {
continue;
}
const tagName = `${OPENING}${match.groups.tagName}${CLOSING}`;
const foundIndex = currentLine.text.indexOf(tagName);
if(foundIndex>-1) {
if(foundIndex <= position.character && position.character <= foundIndex + tagName.length)
{
const startPosition = new vscode.Position(currentLine.lineNumber, foundIndex);
const endPosition = new vscode.Position(currentLine.lineNumber, foundIndex + tagName.length);
let range : vscode.Range = new vscode.Range(startPosition, endPosition);
let fragment : FragmentInformation | undefined;
if(workspaceFolder) {
const fragments = theOneRepository.getFragments(workspaceFolder).map;
fragment = fragments.get(match.groups.tagName) || undefined;
}
return new FragmentLocation(match.groups.tagName, document.uri, range, fragment, match.groups.root, match.groups.add);
}
}
}
return unsetFragmentLocation;
}
A fragment location encodes the occurrence of what could be a fragment that
already exists or one that still needs to be defined. The class holds the name
of the fragment, the range of this string in the resource specified by the uri,
and whether a FragmentInformation
was found or not.
The properties root
and add
can be used to determine what type of fragment
is at the given range.
export class FragmentLocation
{
readonly rangeExclusive : vscode.Range;
readonly valid : boolean;
constructor(
public readonly name : string,
public readonly uri: vscode.Uri,
public readonly range : vscode.Range,
public readonly fragment : FragmentInformation | undefined,
public readonly root : string | undefined,
public readonly add : string | undefined
)
{
this.valid = uri.fsPath.indexOf('not_valid_for_literate')===-1;
if(name.startsWith(OPENING)) {
this.rangeExclusive = new vscode.Range(
range.start.line, range.start.character + 2,
range.end.line, range.end.character - 2
);
}
else
{
this.rangeExclusive = range;
}
}
}
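The chevron-stripping arithmetic behind rangeExclusive can be sketched with plain character offsets; vscode.Range positions work the same way within a single line. The tag and start column below are illustrative:

```typescript
// A tag as it appears in a line, including the double chevrons.
const tag = '<<fragment name>>';
const start = 10;                  // hypothetical start column of the tag
const end = start + tag.length;    // end column, one past the closing chevrons

// The exclusive range drops two characters on each side.
const exclusiveStart = start + 2;
const exclusiveEnd = end - 2;
const bareName = tag.slice(2, tag.length - 2);
```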
const unsetFragmentLocation =
new FragmentLocation(
'',
vscode.Uri.file('not_valid_for_literate'),
new vscode.Range(0,0,0,0),
undefined,
undefined,
undefined
);
This method takes a text document and a range, based on which the token
containing the range is returned. If no token is found, or the workspace folder
is not available the emptyToken
constant is returned.
getTokenAtPosition(
document : vscode.TextDocument,
range : vscode.Range
) : TokenUsage
{
Determine the workspace folder for the given text document. As mentioned above,
if no workspace folder is found the emptyToken is returned.
const workspaceFolder : vscode.WorkspaceFolder | undefined = determineWorkspaceFolder(document);
if(!workspaceFolder)
{
return emptyToken;
}
Next we can retrieve the state for the document.
const state = this.getDocumentState(document);
We can iterate over all the tokens in the grabbed state of the document. We're
only interested in tokens that have a valid map
property, since we need to
check the range asked for.
for(const token of state.gstate.tokens)
{
if(token.map) {
If the range given is contained within the token map we create a new TokenUsage and return that. This concludes the search for the token containing the range we are interested in.
const tokenRange = new vscode.Range(token.map[0], 0, token.map[1], 1024);
if(tokenRange.contains(range))
{
let tokenUsage : TokenUsage = {
token : token,
};
return tokenUsage;
}
}
}
If no hit was found return the emptyToken
.
return emptyToken;
}
The TokenUsage
interface helps determining whether we have a token or not.
TBD: we can probably get rid of this interface and just use a Token
directly.
interface TokenUsage
{
token : Token | undefined,
}
const emptyToken : TokenUsage =
{
token : undefined,
};
getWorkspaceState(workspaceFolder : vscode.WorkspaceFolder) : GrabbedStateList
{
let grabbedState : GrabbedStateList = new GrabbedStateList();
this.grabbedStateForWorkspaceFolders.forEach(
(value, key, _) =>
{
if(key === workspaceFolder.name)
{
grabbedState = value;
}
}
);
return grabbedState;
}
getDocumentState(document: vscode.TextDocument) : GrabbedState
{
let grabbedState : GrabbedState = emptyState;
const ws = determineWorkspaceFolder(document);
if(ws) {
const workspaceState = this.getWorkspaceState(ws);
for(const state of workspaceState.list)
{
if(document.uri.path === state.literateUri.path)
{
grabbedState = state;
}
}
}
return grabbedState;
}
Finding all references for a fragment, that is, fragment usage or fragment
mention in a literate project, means going over all tokens of a workspace. For
each reference a vscode.Location is returned.
The getReferenceLocations method takes a workspace folder and a fragment name,
and will return an array of vscode.Location.
getReferenceLocations(
workspaceFolder : vscode.WorkspaceFolder,
fragmentName : string
) : vscode.Location[]
{
We start with an empty list of locations, which we will fill for each reference hit we determine in the given literate project. For the workspace folder we get the latest grabbed state.
We then will proceed to iterate through all grabbed states. Remember that each
grabbed state corresponds to a literate document. From that grabbed state we
will iterate over each token, and we'll be interested only in the tokens that
have a valid map
property.
const fragmentTag = OPENING+fragmentName+CLOSING;
let locations = new Array<vscode.Location>();
let grabbedStateList = this.getWorkspaceState(workspaceFolder).list;
for(const grabbedState of grabbedStateList)
{
for(const token of grabbedState.gstate.tokens)
{
if(token.map)
{
When we have a token that could contain a reference we'll see if there is any occurrence of the fragment tag, otherwise the content has no reference.
if(token.content.indexOf(fragmentTag) > -1)
{
With a hit in the entire content of the token we need to figure out each
reference, which we do by splitting the token content into lines and then
looking at each hit on each line.
If our token is a fence we initialize idx to 1, since the token's map starts
at the opening fence line and the content begins one line later; otherwise we
initialize it to 0.
const lines = token.content.split("\n");
let idx = token.type === 'fence' ? 1 : 0;
for(const line of lines) {
let offset = line.indexOf(fragmentTag);
while(offset>-1) {
When offset is larger than -1 we know we have a hit, so we can create a new
range using the token.map[0]
and the idx
. The range will include the entire
fragment tag, with the opening and closing double chevrons.
let range = new vscode.Range(
token.map[0] + idx,
offset,
token.map[0] + idx,
offset + fragmentTag.length
);
The location then is created with the uri of the literate file that contains this token, and the range we just set up.
let location = new vscode.Location(grabbedState.literateUri, range);
locations.push(location);
We check for the next occurrence of the fragment tag by searching far enough past the current offset. That way we'll ensure we get to all the references if there are multiple on one line.
offset = line.indexOf(fragmentTag, offset + 5);
}
Update idx
for each pass while going through the lines array.
idx++;
}
}
}
}
}
Return the locations
array. If there were hits the locations
array will have
entries, if there were no hits the array will be empty.
return locations;
}
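The multi-hit scanning inside the loop can be sketched in isolation. This variant advances by the full tag length rather than the fixed 5 characters used above; both step past the current hit. The line content is illustrative:

```typescript
// Find every occurrence of a fragment tag on one line by restarting
// the indexOf search just past each hit.
const fragmentTag = '<<init>>';
const line = 'x; <<init>> y; <<init>>';
const hits: number[] = [];
let offset = line.indexOf(fragmentTag);
while (offset > -1) {
    hits.push(offset);
    offset = line.indexOf(fragmentTag, offset + fragmentTag.length);
}
```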
As mentioned in the introduction, the main idea of the extension is to collect
all fragments that are created in all .literate files. Once all fragments have
been collected they are extrapolated until the top fragments are the full source
files. Fully extrapolated top fragments are written to the source files as
specified for them.
The first step is to put each .literate file through the MarkdownIt
renderer. Each rendering will be given a special environment that will be used
to collect the state for the render. The state will contain the document
tokenized according to the Markdown specification. The state env is of type
GrabbedState. Among the tokens will be the code fences that are code
fragments. For each .literate file the grabbed state env is saved in
envList, the list of GrabbedStates.
async function iterateLiterateFiles(workspaceFolder : vscode.WorkspaceFolder,
writeHtml : WriteRenderCallback
| undefined
| null,
envList : Array<GrabbedState>,
md : MarkdownIt)
{
<<find all literate files in workspace>>
try {
for (let fl of foundLiterateFiles) {
<<get text from literate document>>
<<parse literate file>>
<<write out rendered file if requested>>
}
} catch (error) {
console.log(error);
}
}
We ensure that only literate files are going to be parsed for their program
fragments. We do that by using a vscode.RelativePattern
using the workspace
folder passed into iterateLiterateFiles
.
const foundLiterateFiles = await getLiterateFileUris(workspaceFolder);
We get the content of our literate file using getFileContent. We do need
to await that so we actually get the string and not a promise.
const text = await getFileContent(fl);
With the text
for our literate document ready we harvest the relative file
path to our document from the workspace folder. fname
is then set as the
literateFileName
of our GrabbedState
instance that we push into the
envList
so that we can access it later. Now we finally get to pass the text
of our literate document to the MarkdownIt renderer. Once that is done we
have both an HTML representation of our document as well as the entire parser
state in env
.
const fname = path.relative(workspaceFolder.uri.path, fl.path);
const env: GrabbedState = { literateFileName: fname, literateUri: fl, gstate: new StateCore('', md, {}) };
envList.push(env);
const rendered = md.render(text, env);
If a callback implementing WriteRenderCallback is passed to
iterateLiterateFiles we call it with the rendered file content so that the
content can be saved as an HTML file with the same name as the .literate file
that was being rendered, but with the extension replaced with .html.
Conversely, if no callback was passed in, the rendered results are not saved to
disk.
if(writeHtml)
{
await writeHtml(fname, workspaceFolder.uri, rendered);
}
The GrabbedState interface is used to create a type that helps us collect
the tokens for each .literate file. Instances of objects with this interface
are passed to a MarkdownIt renderer. The renderer will have the
GrabberPlugin registered, which provides a rule that helps us collect the
states of each rendered file. The grabbed state is collected in gstate, which
is an instance of StateCore, provided by MarkdownIt.
The interface defines literateFileName
, which is the filename of the
literate document to which the grabbed state belongs. literateUri
is the
full uri for this document. Finally gstate
holds the StateCore
of the
parsing result.
interface GrabbedState {
literateFileName: string;
literateUri: vscode.Uri;
gstate: StateCore;
}
We define a GrabbedState
that is not valid, the emptyState
. This allows us
to always return an object instead of undefined
in select cases.
const emptyState : GrabbedState =
{
literateFileName : '',
literateUri : vscode.Uri.file('not_valid_for_literate'),
gstate: new StateCore('', createMarkdownItParserForLiterate(), '')
};
In the iterateLiterateFiles
we start by setting up the MarkdownIt parser.
const md : MarkdownIt = createMarkdownItParserForLiterate();
The function createMarkdownItParserForLiterate
does this setup so that it is
easy to get a new parser to use for different purposes, like parsing documents
to get the code fragment names for code completion.
We use the highlight function to ensure our code fragments get syntax highlighting. It simply relies on highlight.js to do the work.
We also tell MarkdownIt to use our grabberPlugin
. This plug-in harvests the
internal states for each document into instances of GrabbedState
. These states
we'll later use to get all the different code fragments and to weave them into
the code files they describe.
Finally we replace the default fence
rule with our own renderCodeFence
rule.
The intent of that rule will be explained in the section on renderCodeFence
.
function createMarkdownItParserForLiterate() : MarkdownIt
{
const md : MarkdownIt = new MarkdownIt({
highlight: function(str: string, lang: string, attrs: string) {
if(lang && hljs.getLanguage(lang)) {
return '<pre><code>' +
hljs.highlight(str, {language : lang}).value +
'</code></pre>';
}
return '<pre title="' + attrs + '">' + md.utils.escapeHtml(str) + '</pre>';
}
})
.use(grabberPlugin);
oldFence = md.renderer.rules.fence;
md.renderer.rules.fence = renderCodeFence;
return md;
}
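The rule-replacement pattern, keeping a reference to the old fence rule so the new rule can delegate to it, can be sketched without MarkdownIt. The rule table and rules below are simplified stand-ins:

```typescript
// Simplified stand-in for a renderer's rule table.
type FenceRule = (content: string) => string;
const rules: { fence: FenceRule } = {
    fence: (content) => '<pre>' + content + '</pre>', // stand-in default rule
};

// Save the old rule, then install a replacement that still delegates
// to it, mirroring how renderCodeFence can fall back on oldFence.
const oldFence = rules.fence;
rules.fence = (content) => oldFence(content.trim());
```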
Before we dive deeper into the processing of .literate
documents it is
necessary to have a look at how fragments work.
Fragments in the literate
extension have a specific format that requires a bit
of explaining.
There are four types of fragment tags, three of which either create or modify a fragment, and one that expresses fragment usage.
For the detection of fragments a couple of regular expressions are used. These are explained in more detail below.
Let's start by looking at the form of fragment tag use.
Fragments can be used in code blocks by writing their tag: double opening and
closing chevrons around the fragment name, <<fragment name>>. To detect usage
of fragments in code we use FRAGMENT_USE_IN_CODE_RE.
const FRAGMENT_USE_IN_CODE_RE =
/(?<indent>[ \t]*)<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?/g;
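A quick check of the capture groups on a sample line; the line content is illustrative:

```typescript
const FRAGMENT_USE_IN_CODE_RE =
    /(?<indent>[ \t]*)<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?/g;

// A fragment use, indented by four spaces.
const sample = '    <<read configuration>>';
const match = [...sample.matchAll(FRAGMENT_USE_IN_CODE_RE)][0];
const indent = match?.groups?.indent;   // the captured leading whitespace
const tagName = match?.groups?.tagName; // the captured fragment name
```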
The regular expression captures four groups. A match will give us 5 or more
results, the whole string matched and the captured groups. There may be some
additional parts after that, but those we will discard. The whole string matched
is called the tag
. The first group is called indent
, which will be used to
indent the whole fragment code when it gets extrapolated into the final code.
The second group is called tagName
, which is the fragment name. The third
group is called root
and the final group is called add
. For fragment use we
essentially need only the second group tagName
, with the indent
still
serving a function. The other groups are in the regular expression so we can
identify incorrect use of fragments in code: creating or adding to fragments
inside code blocks is not valid.
The application of FRAGMENT_USE_IN_CODE_RE
is explained in more detail in the
section on code realization.
There is the tag used to create a new fragment, which is always in conjunction with the opening code fence tag. This means either a triple backtick or triple tilde followed by the programming language identifier for the following code block. The actual fragment tag is placed as first option right after the colon following the language specifier.
const FRAGMENT_RE =
/(?<lang>[^:]*)(?<colon>:)?.*<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?\s*(?<fileName>.*\s+\$)?(?<extraSettings>\s+.*)?/;
Most of the groups correspond to the ones defined by FRAGMENT_USE_IN_CODE_RE
with a few additions. Most notably there is the group catching the language
specifier, the group to catch the filename and the group to catch extra
settings, called lang
, fileName
and extraSettings
respectively.
The filename group has to end in whitespace and a dollar sign.
Also the colon is separated out into a group. That will allow for checking if a tag declaration is properly formed. When the colon is missing it is possible to detect this and emit a diagnostic accordingly.
So to create a new tag, the info line for the code fence could look like
py : <<a fragment name>>=. To add to a fragment a + is added, so it could
look like py : <<a fragment name>>=+. Having a fragment without = or =+ on
the code fence info line is an error.
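The capture groups can be checked against an info line that extends an existing fragment; the line is illustrative:

```typescript
const FRAGMENT_RE =
    /(?<lang>[^:]*)(?<colon>:)?.*<<(?<tagName>.+)>>(?<root>=)?(?<add>\+)?\s*(?<fileName>.*\s+\$)?(?<extraSettings>\s+.*)?/;

// Info line that adds to an existing fragment.
const info = 'py : <<a fragment name>>=+';
const m = info.match(FRAGMENT_RE);
const lang = m?.groups?.lang.trim(); // 'py'
const root = m?.groups?.root;        // '=', a fragment declaration
const add = m?.groups?.add;          // '+', extends an existing fragment
```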
All code fragments are fetched from each environment state. This is done through
looking for all fence
tokens. If the token.info
for a fence
matches the
FRAGMENT_RE
we can check to see whether the fragment we have currently in our
hands is a new fragment (root && !add
) or whether this one expands an existing
one (root && add
), as will be explained in more detail further down.
async function handleFragments(
workspaceFolder : vscode.WorkspaceFolder,
envList : Array<GrabbedState>,
diagnostics : vscode.DiagnosticCollection,
extrapolateFragments : boolean,
writeSource : WriteSourceCallback | undefined) : Promise<Map<string, FragmentInformation>>
{
const folderUri = workspaceFolder.uri;
<<build fragment map>>
if(extrapolateFragments)
{
<<extrapolate fragments>>
}
if(writeSource) {
writeSource(workspaceFolder, fragments);
}
return Promise.resolve(fragments);
}
First we build a map of all available fragments. These go into fragments, which is of type Map<string, FragmentInformation>. The name of a fragment functions as the key, and an instance of FragmentInformation is the value.
const fragments = new Map<string, FragmentInformation>();
const overwriteAttempts = new Array<string>();
const missingFilenames = new Array<string>();
const addingToNonExistant = new Array<string>();
for (let env of envList) {
for (let token of env.gstate.tokens) {
<<handle fence tokens>>
}
}
Each fence token we find needs to be checked. There may of course be code fences in the document that do not create or modify a fragment; these we skip. Since we are handling code fences we use FRAGMENT_RE to match token.info. A fragment is malformed if the colon is missing, so we need to <<emit diagnostic when colon is missing>>.
if (token.type === 'fence') {
const linenumber = locationOfFragment(token);
const match = token.info.match(FRAGMENT_RE);
if (match && match.groups) {
let lang = match.groups.lang.trim();
let colon = match.groups.colon;
let name = match.groups.tagName;
let root = match.groups.root;
let add = match.groups.add;
let fileName = match.groups.fileName;
let extraSettings = match.groups.extraSettings;
<<emit diagnostic when colon is missing>>
<<add to existing fragment>>
<<create a new fragment>>
}
}
The diagnostic emitted has a message stating that the colon is missing, along with the line number and the literate file it occurred in.
if(lang && !match.groups.colon) {
let msg = `Missing colon for fragment: ${name}. ${env.literateFileName}:${linenumber}`;
const diag = createErrorDiagnostic(token, msg);
updateDiagnostics(env.literateUri, diagnostics, diag);
}
If the root group has captured a result but not the add group we know we have a new fragment on our hands.
If our fragments map already has a key with the same name as the fragment we are currently handling, we add an error diagnostic message. We don't stop handling fences, or the entire literate.process command for that matter. We keep going, but leave it up to the programmer to see and handle the error messages.
If a fragment name containing .* is found we need to ensure there is a result in the fileName capture group, since that is needed to eventually write out the source code file. A file-defining fragment without a file name is an error.
When everything appears to be in order a new FragmentInformation instance is created with the information found. The code for this fragment is the token content in token.content. Finally the new FragmentInformation instance is added to the fragments map.
If a new fragment is going to be created but it already exists in the fragment map we emit an error diagnostic. To ensure we emit the error diagnostic only once, the fragment name is added to overwriteAttempts.
if (root && !add) {
if (fragments.has(name)) {
if(!overwriteAttempts.includes(name))
{
let msg = `Trying to overwrite existing fragment ${name}. ${env.literateFileName}:${linenumber}`;
const diag = createErrorDiagnostic(token, msg);
updateDiagnostics(env.literateUri, diagnostics, diag);
overwriteAttempts.push(name);
}
}
If it does not yet exist in the fragment map we can proceed. We do need to check whether we have a top-level fragment; in that case we require a file name, so we emit an error diagnostic when it is missing.
else {
if (!fileName && name.indexOf(".*") > -1 ) {
if(!missingFilenames.includes(name)) {
let msg = `Expected filename for star fragment ${name}`;
const diag = createErrorDiagnostic(token, msg);
updateDiagnostics(env.literateUri, diagnostics, diag);
missingFilenames.push(name);
}
}
Conversely, if we have a non-starred fragment but we do get a filename we also issue a diagnostic to notify the programmer of the mistake.
if(fileName && name.indexOf(".*")===-1) {
let msg = `Unexpected filename for non-star fragment ${name}`;
const diag = createErrorDiagnostic(token, msg);
updateDiagnostics(env.literateUri, diagnostics, diag);
}
We need to clean up the fileName, because its matching expression includes the trailing whitespace and dollar sign.
if(fileName) {
fileName = fileName.replace(/\s+\$/, "");
}
Check the extraSettings group to see if a template is specified. Find the vscode.Uri for the specified file. Use that if it exists, otherwise keep sourceTemplateUri at undefined.
let sourceTemplateUri : vscode.Uri | undefined = undefined;
if(extraSettings) {
let settings = extraSettings.split(";");
for(let setting of settings)
{
setting = setting.trim();
if(setting.startsWith("template"))
{
let settingParts = setting.split("=");
const sourceTemplateFilePattern : vscode.RelativePattern = new vscode.RelativePattern(workspaceFolder, settingParts[1]);
const _foundSourceTemplateFile = await vscode.workspace
.findFiles(sourceTemplateFilePattern)
.then(files => Promise.all(files.map(file => file)));
if(_foundSourceTemplateFile.length===1)
{
sourceTemplateUri = _foundSourceTemplateFile[0];
}
}
}
}
We can now finally create the FragmentInformation instance and add it to our fragment map.
let code = token.content;
let fragmentInfo: FragmentInformation = {
lang: lang,
literateFileName: env.literateFileName,
sourceFileName: fileName,
templateFileName: sourceTemplateUri,
code: code,
tokens: [token],
env: env,
};
fragments.set(name, fragmentInfo);
}
}
If both the root and add groups have captured their results, an = and a + respectively, we need to add code to an existing fragment.
For this to work the defining fragment always needs to be present before the modifying fragment. It is an error to try to modify a fragment that hasn't been seen yet.
The fragment with the specified name is fetched, and when it is not undefined the token.content is appended to the code of the FragmentInformation instance we got from the map. The current token is also appended to the tokens list. The fragments map is then updated with the modified FragmentInformation instance.
if (root && add) {
if (fragments.has(name)) {
let fragmentInfo = fragments.get(name) || undefined;
if(fragmentInfo && fragmentInfo.code) {
let additionalCode = token.content;
fragmentInfo.code = `${fragmentInfo.code}${additionalCode}`;
fragmentInfo.tokens.push(token);
fragments.set(name, fragmentInfo);
}
} else {
if(!addingToNonExistant.includes(name)) {
let msg = `Trying to add to non-existant fragment ${name}. ${env.literateFileName}:${linenumber}`;
const diag = createErrorDiagnostic(token, msg);
updateDiagnostics(env.literateUri, diagnostics, diag);
addingToNonExistant.push(name);
}
}
}
We have now seen the FragmentInformation type being used several times, so it is worth taking a moment to clarify it in more detail.
The interface allows us to gather information for each code fragment found. It stores the programming language identifier, the name of the .literate file, and the name of the targeted source file if the code fragment happens to be a top fragment.
The actual code for the fragment is stored in code. Furthermore, the tokens for the complete fragment are stored in the tokens list. This list holds objects that fulfill the Token interface, which is provided by the MarkdownIt module.
interface FragmentInformation {
lang: string;
literateFileName: string;
sourceFileName: string;
templateFileName: vscode.Uri | undefined;
code: string;
tokens: Token[];
env: GrabbedState;
}
Writing source files is a matter of looping through the keys of the fragments map. For each key that ends with the .* string we check if a fragment exists, and if for that fragment a source filename is recorded. If so, we write out the file with the code content of the fragment.
If a vscode.Uri is defined for templateFileName we read the file contents and use that instead of the default template, which is just [CODE]. This means that for a template to work properly it needs to contain the string [CODE], since that will be replaced with the code generated for this file.
For newline handling we replace all single LF occurrences with CRLF when the underlying operating system is Windows. Otherwise we do the reverse: replace CRLF with a single LF.
async function writeSourceFiles(workspaceFolder : vscode.WorkspaceFolder,
fragments : Map<string, FragmentInformation>)
{
const folderUri = workspaceFolder.uri;
for(const name of fragments.keys()) {
if (name.endsWith(".*")) {
let fragmentInfo = fragments.get(name) || undefined;
if (fragmentInfo && fragmentInfo.sourceFileName) {
let sourceTemplate = '[CODE]';
if(fragmentInfo.templateFileName) {
sourceTemplate = await getFileContent(fragmentInfo.templateFileName);
}
let code = sourceTemplate.replace("[CODE]", fragmentInfo.code);
let fixed = '';
if(os.platform()==='win32')
{
const lf2crlf = /([^\r])\n/g;
fixed = code.replaceAll(lf2crlf, '$1\r\n');
} else {
const crlf2lf = /\r\n/g;
fixed = code.replaceAll(crlf2lf, '\n');
}
const encoded = Buffer.from(fixed, 'utf-8');
let fileName = fragmentInfo.sourceFileName.trim();
const fileUri = vscode.Uri.joinPath(folderUri, fileName);
try {
await vscode.workspace.fs.writeFile(fileUri, encoded);
} catch(writeError)
{
console.log(writeError);
}
}
}
}
}
Once all fragments have been collected from the .literate files of the project, the fragments can be combined into source code.
let pass: number = 0;
const rootIncorrect = new Array<string>();
const addIncorrect = new Array<string>();
const fragmentNotFound = new Array<string>();
do {
pass++;
let fragmentReplaced = false;
for (let fragmentName of fragments.keys()) {
let fragmentInfo = fragments.get(fragmentName) || undefined;
if (!fragmentInfo) {
continue;
}
const casesToReplace = [...fragmentInfo.code.matchAll(FRAGMENT_USE_IN_CODE_RE)];
for (let match of casesToReplace) {
if(!match || !match.groups) {
continue;
}
let tag = match[0];
let indent = match.groups.indent;
let tagName = match.groups.tagName;
let root = match.groups.root;
let add = match.groups.add;
if (root && !rootIncorrect.includes(tag)) {
let msg = `Found '=': incorrect fragment tag in fragment, ${tag}`;
const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg);
updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
rootIncorrect.push(tag);
}
if (add && !addIncorrect.includes(tag)) {
let msg = `Found '+': incorrect fragment tag in fragment: ${tag}`;
const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg);
updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
addIncorrect.push(tag);
}
if (!fragments.has(match.groups.tagName) && tagName !== "(?<tagName>.+)" && !fragmentNotFound.includes(tagName)) {
let msg = `Could not find fragment ${tag} (${tagName})`;
let range = fragmentUsageRange(fragmentInfo.tokens[0], tagName);
const diag = createErrorDiagnostic(fragmentInfo.tokens[0], msg, range);
updateDiagnostics(fragmentInfo.env.literateUri, diagnostics, diag);
fragmentNotFound.push(tagName);
}
let fragmentToReplaceWith = fragments.get(tagName) || undefined;
if (fragmentToReplaceWith) {
let code = fragmentToReplaceWith.code;
let lines = code.split("\n").slice(0, -1);
let indentedLines = lines.flatMap(function (e, _) {
return indent + e;
});
let newcode = indentedLines.join("\n");
fragmentReplaced = true;
fragmentInfo.code = fragmentInfo.code.replace(tag, newcode);
fragments.set(fragmentName, fragmentInfo);
}
}
}
if(!fragmentReplaced) {
break;
}
}
while (pass < 25);
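The indentation-preserving substitution inside the loop can be illustrated in isolation. The fragment body is split into lines, dropping the trailing empty entry left after the final newline, each line is prefixed with the indent captured in front of the tag, and the result replaces the tag. A sketch with a hypothetical fragment body:

```typescript
// A fragment body as stored in FragmentInformation.code: it ends in '\n',
// so split('\n') leaves a trailing empty entry that slice(0, -1) drops.
const fragmentCode = 'first line\nsecond line\n';

// The tag as matched in the containing fragment, including its indent.
const indent = '    ';
const tag = '    <<body>>';

const lines = fragmentCode.split('\n').slice(0, -1);
const indented = lines.map(line => indent + line).join('\n');

const containing = 'def f():\n    <<body>>\n';
const extrapolated = containing.replace(tag, indented);
console.log(extrapolated);
// def f():
//     first line
//     second line
```

Because every line of the inserted body receives the indent of the tag it replaces, fragments compose correctly even in indentation-sensitive languages such as Python.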
Our extension uses a custom code fence rendering rule to ensure the code fragment name is also rendered as part of the fence.
Essentially the old, default rendering rule for fences is used first to create the original fence. Then the token.info is matched against the FRAGMENT_RE regular expression. If we have a match we prepare the HTML code to wrap around the HTML as generated by the default rule. Before we can actually wrap that in our div tags with the necessary classes we adjust the rendered HTML code to protect fragment tags. Otherwise these would also be syntax colored, and that we don't want. The fragment tag protection is explained in <<fragment tag protector>>.
For further cleanup of the rendered result any spans containing comments are removed. These code comments are useful in the generated source code, but a literate program otherwise already documents the code thoroughly. Comment removal from the HTML rendition is done with <<remove comments from HTML>>.
The fence is skipped if its info contains the string SETTINGS, since that denotes a configuration block as can be specified in index.literate, and the configuration block is not intended to be visible in either the resulting code or the resulting HTML output.
If the fence has its token.info end with the string mermaid (all lower-case), and it is not a valid fragment fence, then the token.content is wrapped in <pre class="mermaid"> and </pre>. This allows the HTML template's module import of mermaid.js to render diagrams expressed in these tags.
<<fragment tag protector>>
<<remove comments from HTML>>
function renderCodeFence(tokens : Token[],
idx : number,
options : MarkdownIt.Options,
env : any,
slf : Renderer) {
let rendered = '';
if (oldFence && tokens[idx].info.indexOf("SETTINGS")<0) {
rendered = oldFence(tokens, idx, options, env, slf);
let token = tokens[idx];
if (token.info) {
const match = token.info.match(FRAGMENT_RE);
if (match && match.groups) {
let lang = match.groups.lang.trim();
let name = match.groups.tagName;
let root = match.groups.root;
let add = match.groups.add;
let fileName = match.groups.fileName;
if (name) {
root = root || '';
add = add || '';
fileName = fileName || '';
fileName = fileName.trim();
rendered = protectFragmentTags(rendered);
rendered = removeCodeComments(rendered);
rendered =
`<div class="codefragment">
<div class="fragmentname"><<${name}>>${root}${add} ${fileName}</div>
<div class="code">
${rendered}
</div>
</div>`;
}
}
else if(token.info.endsWith('mermaid')) {
rendered =
`<pre class="mermaid">
${token.content}
</pre>`;
}
}
}
return rendered;
};
With protectFragmentTags we can adjust the rendered HTML as received from the oldFence. We search for HTML that is our fragment tag: <<, followed by the tag name, followed by >>. Such occurrences we wrap in a span tag that has the CSS class literate-tag-name. The class can be set up to essentially override any styling applied through hljs-keyword.
For matching we use two regular expressions, FRAGMENT_HTML_RE and FRAGMENT_HTML_CLEANUP_RE. With FRAGMENT_HTML_CLEANUP_RE all span tags injected by hljs can be cleaned up, and with FRAGMENT_HTML_RE we can wrap our fragment tag names in a class of our own, literate-tag-name, to handle special styling for tags in code fences.
const FRAGMENT_HTML_CLEANUP_RE= /(<span.class="hljs-.+?">)(.*?)(<\/span>)/g;
const FRAGMENT_HTML_RE= /(<<.+?>>)/g;
Since these regular expressions are used with replaceAll they need to be marked global with the g flag.
To make sure the highlights are properly cleaned we introduce an inline function cleanHighlights that takes care of all the highlights by using a replaceAll on the match passed into it. The result is wrapped inside a span with the literate-tag-name class.
function protectFragmentTags(rendered : string) : string {
function cleanHighlights(match : string, _: number, __: string)
{
let internal = match.replaceAll(FRAGMENT_HTML_CLEANUP_RE, "$2");
return `<span class="literate-tag-name">${internal}</span>`;
}
return rendered
.replaceAll(
FRAGMENT_HTML_RE,
cleanHighlights
);
}
In our CSS file we can now specify .literate-tag-name to, say, use an italic font so tags stand out in the code fences.
In rendered HTML code comments are wrapped in span tags with the class hljs-comment. These can be on one line, or, for comment blocks, span multiple lines. Since the goal is to remove these completely from the rendered HTML, the regular expression does just that: it matches the span with the hljs-comment class even if it runs over several lines. To do that we also use the s modifier on the expression.
const CODECOMMENT_HTML_RE= /<span class="hljs-comment">.*?<\/span>/gs;
The remove action now becomes a simple replaceAll on the rendered HTML, using the regular expression CODECOMMENT_HTML_RE with the empty string as replacement.
function removeCodeComments(rendered : string) : string {
rendered = rendered.replaceAll(CODECOMMENT_HTML_RE, "");
return rendered;
}
The command literate.process is registered with Visual Studio Code. The disposable returned by registerCommand is held in literateProcessDisposable so that we can push it into context.subscriptions.
Here we find the main program of our literate.process command. Our MarkdownIt is set up, and .literate files are searched and iterated. Each .literate file is rendered, and code fragments are harvested. Finally code fragments are extrapolated and saved to their respective source code files. The HTML renditions are also saved to files.
Diagnostic messages are also handled here. Errors and warnings are shown where necessary. On successful completion a simple status bar message is used. An information diagnostic message is not good here, because that would prevent the usage of literate.process in, for instance, tasks.json: the diagnostic message would block execution of a task if it were used as a prelaunch task. That is obviously not good for the workflow.
let literateProcessDisposable = vscode.commands.registerCommand(
'literate.process',
async function () {
theOneRepository.processLiterateFiles(undefined);
return vscode.window.setStatusBarMessage("Literate Process completed", 5000);
});
context.subscriptions.push(literateProcessDisposable);
let literateCreateFragmentForTagDisposable = vscode.commands.registerCommand(
'literate.create_fragment_for_tag',
async function (range? : vscode.Range) {
createFragmentForTag(range);
}
);
context.subscriptions.push(literateCreateFragmentForTagDisposable);
The literate.create_fragment_for_tag command does as its name suggests. When the position in the document is on a tag, the command adds a code fence to the document.
If the tag at the position is a fragment use inside a fragment, the code fence is created after the current fragment, using the same language as specified in the current fragment. If the tag is a fragment mention, the current fragment map is instead checked for the most used language and that is pre-filled.
First we ensure we have an active editor.
function createFragmentForTag(range? : vscode.Range)
{
let activeEditor = vscode.window.activeTextEditor;
if(activeEditor)
{
From the active editor we find the document. The editor also has the information about where our cursor currently is, in the active property on selection, but we use range.start if range was passed in.
const document = activeEditor.document;
const position = range ? range.start : activeEditor.selection.active;
For the document and position we determined we can get the fragment at that location. We retrieve that from the repository so that we can get the range for the fragment use.
const fragmentLocation = theOneRepository.getFragmentTagLocation(
document,
document.lineAt(position),
position);
With the range of the fragment use in hand we can find the Markdown token where this range is contained for the document we're in.
const tokenUsage = theOneRepository.getTokenAtPosition(
document,
fragmentLocation.range);
The token needs to be valid to be able to use it for determining the insert. The map property on Token tells us the begin and end lines. We want to add our new fragment definition after this token. We'll access the map when we are ready to do an insert on the WorkspaceEdit.
We initialize a temporary language id to LANGID.
if(tokenUsage.token && tokenUsage.token.map)
{
let workspaceEdit = new vscode.WorkspaceEdit();
let langId : string = 'LANGID';
If we have a fence Token we try matching the info property of the token with FRAGMENT_RE. This gives us the language id used for that fence. We'll be using the same language id for the new code fragment.
if(tokenUsage.token.type === 'fence' && tokenUsage.token.map)
{
let match = tokenUsage.token.info.match(FRAGMENT_RE);
if(match && match.groups) {
langId = match.groups.lang;
}
}
We can now create the new fragment string with the language id and the fragment tag name we want to create the fragment for.
Newlines at the beginning and end of the string ensure the fragment won't be created without the necessary empty lines.
let newFragment = `\n${FENCE} ${langId} : ${OPENING}${fragmentLocation.name}${CLOSING}=\n${FENCE}\n`;
Now that we have the new fragment text ready we can call insert on our workspace edit. The position is created with the second element of token.map as the line number, and 0 to have the insert happen at the beginning of the line.
Finally we apply the workspace edit to our workspace. This gives us the new fragment right after the paragraph or code fence with the fragment name we found at the position where we ran the command.
workspaceEdit.insert(
document.uri,
new vscode.Position(tokenUsage.token.map[1], 0),
newFragment
);
vscode.workspace.applyEdit(workspaceEdit);
}
}
}
We register the literate.split_fragment command, setting it up so that it can take a vscode.Position parameter, which helps in programmatically firing the command for a certain pre-computed location.
let literateSplitFragmentDisposable = vscode.commands.registerCommand(
'literate.split_fragment',
async function (position? : vscode.Position) {
splitFragment(position);
}
);
context.subscriptions.push(literateSplitFragmentDisposable);
The literate.split_fragment command splits the current fragment below the line where the cursor is. If no active text editor is found, nothing happens. With one in hand, though, we can either use the position given to the method, or otherwise use the cursor location in the document.
function splitFragment(position_? : vscode.Position)
{
let activeEditor = vscode.window.activeTextEditor;
if(activeEditor)
{
const document = activeEditor.document;
const position = position_ ? position_ : activeEditor.selection.active;
With the document and position we can find the Token at that location. We continue only if it is a fence.
const tokenUsage = theOneRepository.getTokenAtPosition(
document,
new vscode.Range(position, position));
if(tokenUsage.token && tokenUsage.token.type === 'fence')
{
Next we match the info line; we want to ensure we actually have a fragment here.
let match = tokenUsage.token.info.match(FRAGMENT_RE);
if(match && match.groups)
{
From the matched info line we take the language identifier and the fragment tag name. We can create the text that will split the current fragment.
let langId = match.groups.lang.trim();
let tagName = match.groups.tagName.trim();
let textToInsert = `${FENCE}\n\n${FENCE}${langId} : ${OPENING}${tagName}${CLOSING}=+\n`;
Finally we can create the workspace edit, make the insert on the next line from the cursor, and apply the edit.
let workspaceEdit = new vscode.WorkspaceEdit();
workspaceEdit.insert(
document.uri,
new vscode.Position(position.line+1, 0),
textToInsert
);
vscode.workspace.applyEdit(workspaceEdit);
}
}
}
}
In this chapter are a few methods that help create and update diagnostics. These diagnostics help the literate programmer determine whether there are problems with the text, and where.
Diagnostic messages are instances of vscode.Diagnostic. To show them in the Problems panel in VSCode we add them to the diagnostics collection, which typically is passed into updateDiagnostics.
function updateDiagnostics(
uri: vscode.Uri,
collection: vscode.DiagnosticCollection,
diagnostic : vscode.Diagnostic | undefined): void {
if (uri) {
if (diagnostic) {
const diags = Array.from(collection.get(uri) || []);
diags.push(diagnostic);
collection.set(uri, diags);
}
} else {
collection.clear();
}
}
Instances of vscode.Diagnostic can be created with createErrorDiagnostic. This takes a markdown-it Token, a message and a range. If the passed-in range isn't a proper range, the range is harvested from the passed-in token.
For now all messages are considered errors.
/**
* @param token Token that carries the faulty code fragment
* @param message Error message
*/
function createErrorDiagnostic(token: Token, message: string, range? : vscode.Range) : vscode.Diagnostic {
range = range ? range : fragmentRange(token);
let diagnostic: vscode.Diagnostic = {
severity: vscode.DiagnosticSeverity.Error,
message: message,
range: range
};
return diagnostic;
}
/**
* @param token Token to extract code location from
*/
function locationOfFragment(token: Token): number {
let linenumber = token.map ? (token.map[0]) : -1;
return linenumber;
}
locationOfFragmentEnd is used to get the last line of the given token in the literate document. This is typically used for code fences in this extension.
/**
* @param token Token to extract code location from
*/
function locationOfFragmentEnd(token: Token): number {
let linenumber = token.map ? (token.map[1] ) : -1;
return linenumber;
}
fragmentRange is a method to construct a vscode.Range for a given token that is a code fragment.
/**
* @param token Token to create range for
*/
function fragmentRange(token: Token): vscode.Range {
let startTagName = token.info.indexOf("<<") + 2;
let endTagName = token.info.indexOf(">>") - 1;
let start = new vscode.Position(locationOfFragment(token), startTagName);
let end = new vscode.Position(locationOfFragmentEnd(token), endTagName);
let range: vscode.Range = new vscode.Range(start, end);
return range;
}
This method gives a Range for the given tag name based on the passed-in Token. The line number for the occurrence is computed, along with the begin and end positions within the line.
function fragmentUsageRange(token : Token, tagName : string) : vscode.Range
{
let startLineNumber = locationOfFragment(token);
const lines = token.content.split('\n');
let index : number = 0;
for(const line of lines)
{
startLineNumber++;
index = line.indexOf(tagName);
if(index > -1)
{
break;
}
}
let start = new vscode.Position(startLineNumber, index - 2);
let end = new vscode.Position(startLineNumber, index + tagName.length + 2);
return new vscode.Range(start, end);
}
Function to get all literate files in a given workspace. We need to ensure that we return the URIs in the correct order. The order is defined by an index.literate if it exists; otherwise we use the order in which the files are found by findFiles. If an index.literate exists we also harvest the SETTINGS fence if there is one.
async function getLiterateFileUris(
workspaceFolder : vscode.WorkspaceFolder
) : Promise<vscode.Uri[]>
{
const literateFilesInWorkspace : vscode.RelativePattern =
new vscode.RelativePattern(workspaceFolder, '**/*.literate');
const _foundLiterateFiles = await vscode.workspace
.findFiles(literateFilesInWorkspace)
.then(files => Promise.all(files.map(file => file)));
let foundLiterateFiles = new Array<vscode.Uri>();
<<see if an index.literate exists>>
<<search index for html links>>
<<sort uris based on html link order>>
<<get SETTINGS from index.literate>>
return foundLiterateFiles;
}
TBD: create instead a markup that allows us to express the literate file order in whichever file we want.
If we don't find an index.literate file we return the found literate files as is.
const index = _foundLiterateFiles.find(uri => uri.path.endsWith('index.literate'));
if(!index)
{
return _foundLiterateFiles;
}
We now parse the index file to get the state with markdown tokens. We don't need the rendered HTML of the index document, so we discard that.
const md = createMarkdownItParserForLiterate();
const text = await getFileContent(index);
const env: GrabbedState = { literateFileName: 'index.literate', literateUri: index, gstate: new StateCore('', md, {}) };
const _ = md.render(text, env);
We create a new list of strings to which we will add the file URIs in the order found in the parsed tokens. We do that by looking for bullet_list_open and bullet_list_close tokens. We assume that inside these lists there will be items that contain links. Once we've encountered a bullet_list_open we start looking for list_item_open. When that token is found we get the inline token that is two tokens away from the list_item_open token. We make sure that it has child tokens, and that the first child token has a valid attrs property. Here we find the URI to the HTML version of a literate file.
TBD: add support for ordered lists.
let links = new Array<string>();
let bulletListOpen = false;
let idx = 0;
for(let token of env.gstate.tokens)
{
if(token.type==='bullet_list_open')
{
bulletListOpen = true;
}
if(token.type==='bullet_list_close')
{
bulletListOpen = false;
}
if(bulletListOpen && token.type==='list_item_open')
{
let inline = env.gstate.tokens[idx+2];
if(inline.children && inline.children[0].attrs)
{
try {
const currentUri = inline.children[0].attrs[0][1];
let path = currentUri.replace("html", "literate");
const foundUri = _foundLiterateFiles.find(uri => uri.path.endsWith(path));
if(foundUri)
{
foundLiterateFiles.push(foundUri);
}
} catch(_) {}
}
}
idx++;
}
Then ensure that the index file also exists in the list to return.
const finalCheck = foundLiterateFiles.find(uri => uri.path.endsWith('index.literate'));
if(!finalCheck)
{
foundLiterateFiles.splice(0, 0, index);
}
Finally we harvest code fences that have SETTINGS in the info. We split the content of the fence into lines and loop over them.
If a trimmed line starts with template we harvest the filename given as its value. As an additional check we ensure the file exists before assigning its URI to htmlTemplateFile.
Function to write the given rendered content out to a file. The rendered string is set into an HTML body. The HTML template is read from the file at the URI specified by htmlTemplateFile if it is set; otherwise a hard-coded piece of HTML template is used.
For the template to work it needs to contain the string [CONTENT] where the rendered Markdown HTML is going to be substituted.
The default HTML template imports mermaid.js as a module, so that it can work on any pre tag that has the CSS class mermaid set.
If authors contains a string that is not empty, one or more Open Graph protocol tags will be used to replace the [AUTHORS] tag if it exists. If authors is empty, [AUTHORS] will be replaced with the empty string. If the authors string contains semi-colons it will be split on those, and for each author an article:author tag will be added.
Regarding the line endings we do the same as for source files: on Windows we replace single LF instances with CRLF, and on non-Windows machines we replace CRLF instances with LF.
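The [AUTHORS] substitution can be sketched separately. This mirrors the loop in the function below, which in its current form emits a meta author tag per semicolon-separated name (the sample names and template string are hypothetical):

```typescript
// Build one <meta> tag per semicolon-separated author name,
// then substitute them for the [AUTHORS] placeholder.
const authors = 'Ada Lovelace;Charles Babbage';
let metaAuthors = '';
for (const author of authors.split(';')) {
    metaAuthors += `<meta name="author" content="${author}">`;
}
const template = '<head>[AUTHORS]</head>';
console.log(template.replace('[AUTHORS]', metaAuthors));
// <head><meta name="author" content="Ada Lovelace"><meta name="author" content="Charles Babbage"></head>
```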
async function writeOutHtml
(fname : string,
folderUri : vscode.Uri,
rendered : string) : Promise<void>
{
let html = '';
const getContent = async () => {
let _html = '';
if(htmlTemplateFile) {
_html = await getFileContent(htmlTemplateFile);
} else {
_html =
`<!DOCTYPE html>
<html>
<head>
<meta name="description" content="A Literate Program written with the Literate Programming vscode extension by Nathan 'jesterKing' Letwory and contributors" />
<meta property="og:description" content="A Literate Program written with the Literate Programming vscode extension by Nathan 'jesterKing' Letwory and contributors" />
<link rel="stylesheet" type="text/css" href="./style.css">
[AUTHORS]
</head>
<body>
[CONTENT]
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
</script>
</body>
</html>`;
}
return _html;
};
html = await getContent();
let authorlist = authors.split(";");
let metaAuthors = '';
for(let author of authorlist) {
metaAuthors += `<meta name="author" content="${author}">`;
}
html = html
.replace("[CONTENT]", rendered)
.replace("[AUTHORS]", metaAuthors);
if(os.platform()==='win32'){
const lf2crlf = /([^\r])\n/g;
html = html.replaceAll(lf2crlf, '$1\r\n');
} else {
const crlf2lf = /\r\n/g;
html = html.replaceAll(crlf2lf, '\n');
}
const encoded = Buffer.from(html, 'utf-8');
fname = fname.replace(".literate", ".html");
const fileUri = vscode.Uri.joinPath(folderUri, fname);
return Promise.resolve(vscode.workspace.fs.writeFile(fileUri, encoded));
};
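The tag substitution step above can be sketched in isolation. Note that String.replace with a string pattern substitutes only the first occurrence, which is why the template should contain each tag once. The helper name applyTemplate is hypothetical, introduced only for this illustration:

```typescript
// Hypothetical helper illustrating the tag substitution writeOutHtml performs.
// String patterns with String.replace substitute only the first occurrence,
// so only the first [CONTENT] and [AUTHORS] in a template are filled in.
function applyTemplate(template: string, rendered: string, metaAuthors: string): string {
  return template
    .replace("[CONTENT]", rendered)
    .replace("[AUTHORS]", metaAuthors);
}
```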
For each literate file in the workspace we'll eventually get the text content, but we do have to check if any of the files are opened in an editor. Especially for on-the-fly updating of the tree view, but also for fragment name completion and similar functionality, we need to get the text from the TextDocument instead of the file on disk. If there is a TextDocument that corresponds to the literate file we are currently handling we read the text into currentContent, otherwise we set it to an empty string.
async function getFileContent(
file : vscode.Uri
) : Promise<string>
{
const currentContent = (() =>
{
for(const textDocument of vscode.workspace.textDocuments) {
if(vscode.workspace.asRelativePath(file) === vscode.workspace.asRelativePath(textDocument.uri)) {
return textDocument.getText();
}
}
return '';
}
)();
If currentContent is an empty string we read the content from the file on disk, and decode it into text. If on the other hand we do have currentContent, we use that for our text instead. The currentContent will be more up-to-date than what we have on disk.
const content = currentContent ? null : await vscode.workspace.fs.readFile(file);
const text = currentContent ? currentContent : new TextDecoder('utf-8').decode(content);
return text;
}
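The fallback described above reduces to a small pure function, sketched here without any vscode dependency; the name pickContent is made up for illustration:

```typescript
import { TextDecoder } from 'util';

// Prefer the (possibly unsaved) text of an open editor; otherwise decode
// the raw bytes that were read from disk.
function pickContent(editorText: string, diskBytes: Uint8Array): string {
  return editorText ? editorText : new TextDecoder('utf-8').decode(diskBytes);
}
```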
Our Visual Studio Code entry file is the extension.ts file. While developing the plug-in, the JavaScript version created from it, in out/extension.js, is set as the entry point for the extension in package.json. But when the extension is prepared for release on the Visual Studio Code marketplace this needs to be changed to the minified and bundled version that gets written to out/main.js. This, together with a properly set up .vscodeignore, ensures that the published package stays small in size. Without that the package is easily over 2MB in size, but properly configured it is under 400KB.
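For illustration, the development setup corresponds to a package.json entry along these lines (a sketch; the actual manifest contains many more fields):

```json
{
  "main": "./out/extension.js"
}
```

For a marketplace release the entry would instead point at the bundled file, "main": "./out/main.js".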
The extension's main entry point lies in the activation of the extension, as given by <<activate the extension>>, but before we get there we need to set up several bits and pieces that are required for the proper functioning of the tools.
First of all we import all the functionality and modules we are going to need.
<<import necessary modules for literate>>
After the imports we introduce oldFence, where we will keep hold of the fence rule from the default MarkdownIt parser. I was not entirely sure how to best tackle this, so for now it is here.
Here we also find htmlTemplateFile
and authors
, variables that can be set in
a literate program through a code fence in index.literate
with the token
info containing the string SETTINGS
.
let oldFence : Renderer.RenderRule | undefined;
const FENCE = '```';
const OPENING = '<<';
const CLOSING = '>>';
let htmlTemplateFile : vscode.Uri | undefined = undefined;
let authors = '';
With that out of the way we introduce the interfaces we use in the Literate Programming extension.
<<introduce interfaces>>
Next we set up the fragment regular expressions and define everything needed to implement the fragment explorer. This explorer will show up in the Explorer bar when a literate project is open. We need a representation for a node in the tree view, a data provider for the tree view and then the actual tree view explorer itself.
<<fragment regular expressions>>
<<fragment node>>
<<fragment tree provider>>
<<fragment explorer>>
<<fragment hover provider>>
For our extension we need to override the code fence rule since we want to
augment the rendering of the code fences. Specifically we want to add the
fragment line prior to the code block. This is explained in the section on
<<renderCodeFence rule>>
.
We also have a way to create a MarkdownIt parser configured the way we need it. It is explained in more detail in the section on <<create markdownit parser>>.
<<renderCodeFence rule>>
<<create markdownit parser>>
The central mechanism of the Literate Programming extension, the tools it provides, is expressed in <<render and collect state>>, <<handle fragments>> and <<method to write out source files>>. Together these ensure that all literate files can be iterated, parsed and rendered, and that from the parsed state all the code fragments can be collected and extrapolated into the source file or files as written in the literate program.
<<render and collect state>>
<<handle fragments>>
<<method to write out source files>>
<<fragment repository>>
<<rename provider class>>
<<code action provider class>>
<<definition provider class>>
<<reference provider class>>
Next up is a utility function to determine the workspace folder for a TextDocument.
function determineWorkspaceFolder(document : vscode.TextDocument) : vscode.WorkspaceFolder | undefined
{
if(!vscode.workspace.workspaceFolders || vscode.workspace.workspaceFolders.length === 0)
{
return undefined;
}
for(const ws of vscode.workspace.workspaceFolders)
{
const relativePath = path.relative(ws.uri.toString(), document.uri.toString());
if(!relativePath.startsWith('..'))
{
return ws;
}
}
return undefined;
}
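The containment test at the core of determineWorkspaceFolder can be sketched without any vscode types: a document belongs to a folder exactly when the relative path from the folder to the document does not climb out of it. The helper name isInsideFolder is hypothetical:

```typescript
import * as path from 'path';

// A document belongs to a workspace folder exactly when the relative path
// from the folder to the document does not start with '..'.
// path.posix is used here so this illustration behaves the same on every
// platform; the function above uses plain path.relative on vscode URIs.
function isInsideFolder(folderPath: string, filePath: string): boolean {
  const relativePath = path.posix.relative(folderPath, filePath);
  return !relativePath.startsWith('..');
}
```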
Although the fragments mentioned above are the soul of the extension, they are not of much use without proper activation. With this activate implementation all providers and commands are registered with Visual Studio Code.
<<activate the extension>>
<<diagnostic updating>>
<<utility functions>>
<<create fragment for tag>>
<<split fragment>>
There is nothing currently needed for deactivation of the extension, so there is just an empty-bodied implementation for it.
export function deactivate() {}
import { TextDecoder } from 'util';
import * as vscode from 'vscode';
import * as path from 'path';
import * as os from 'os';
import StateCore = require('markdown-it/lib/rules_core/state_core');
import Token = require('markdown-it/lib/token');
import MarkdownIt = require("markdown-it");
import Renderer = require('markdown-it/lib/renderer');
const hljs = require('highlight.js');
import { grabberPlugin } from './grabber';
interface WriteRenderCallback {
(
fname : string,
folderUri : vscode.Uri,
content : string
) : Promise<void>
};
interface WriteSourceCallback {
(
workspaceFolder : vscode.WorkspaceFolder,
fragments : Map<string, FragmentInformation>
) : Thenable<void>
};
<<grabbed state type>>
<<fragment information type>>
<<token usage interface>>
The extension activation sets up all our tools and data structures. The activation happens through the activate function. This takes a context to which we push all our disposables for proper cleanup when our extension gets deactivated. Note that our activate implementation is also marked async, because we want to await where necessary.
let theOneRepository : FragmentRepository;
export async function activate(context: vscode.ExtensionContext) {
We start the activation by setting up the FragmentRepository. This is the heart of the processing of literate projects. We give it the context so that it can also push disposables to the context.subscriptions for proper cleanup. With the repository set up we will process the entire workspace for literate projects. We want to await here to ensure everything is ready, that is, that our repository can provide fragments when requested. With that done we place our repository in the context.subscriptions.
theOneRepository = new FragmentRepository(context);
await theOneRepository.processLiterateFiles(undefined);
context.subscriptions.push(theOneRepository);
Now that the repository is up and running we can register all our commands, views and providers. Note that we currently register against the markdown language ID. In the future we would probably want to ensure that .literate becomes its own language ID to register against.
<<register literate.process>>
<<register literate.create_fragment_for_tag>>
<<register literate.split_fragment>>
<<register fragment tree view>>
<<register completion item provider>>
<<register definiton provider>>
<<register reference provider>>
context.subscriptions.push(
vscode.languages.registerHoverProvider('markdown', new FragmentHoverProvider(theOneRepository))
);
context.subscriptions.push(
vscode.languages.registerRenameProvider('markdown', new LiterateRenameProvider(theOneRepository))
);
context.subscriptions.push(
vscode.languages.registerCodeActionsProvider('markdown', new LiterateCodeActionProvider(theOneRepository))
);
From our extension activation we return an object that extends the built-in MarkdownIt parser used for the preview. This way the preview parser is configured the same way as our extension, which should result in previews that are close to what our HTML rendering produces.
console.log('Ready to do some Literate Programming');
return {
extendMarkdownIt(md: any) {
md.use(grabberPlugin);
oldFence = md.renderer.rules.fence;
md.renderer.rules.fence = renderCodeFence;
return md;
}
};
};
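The fence override in extendMarkdownIt uses a common markdown-it idiom: keep the previous render rule in oldFence and delegate to it from the replacement. A dependency-free sketch of that idiom, with simplified stand-in types rather than the real markdown-it signatures:

```typescript
// Simplified stand-in for a markdown-it renderer rule table.
type RenderRule = (content: string) => string;
const rules: { fence: RenderRule } = {
  fence: (content) => `<pre>${content}</pre>`,
};

// Save the old rule, then install a wrapper that prepends extra output
// (the extension prepends the fragment line) and delegates to the original.
const oldFence = rules.fence;
rules.fence = (content) => `<div class="fragment-name">name</div>` + oldFence(content);
```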
So you have made it this far, or perhaps you just skipped over a lot of text. If you actually read all of the text up to this point, you have also read all of the code for the entire Literate Programming extension. I appreciate that you took the time to read this document. I hope it helped you get more interested in the literate programming paradigm.
I invite you to install the Literate Programming extension for Visual Studio Code and start using it in your daily work.