GTK SourceView Highlighting
I wasn’t going to write about this. But the scattered information on the web convinces me to.
Note that,
There is documentation, if not really enough
The results are good
The configuration system is for a limited set of GTK editors
The job is easy but tricks get in the way. If you are not reading these notes, avoid.
What system this works on
You have a GTK desktop. This usually means a Debian derivative, Ubuntu or the like. You would like, when certain files are opened, for them to be automatically coloured, like happens when you open a file of computer code.
What gadgets it works on
Gadgets that use GTK SourceView for displaying text. That usually means Gedit and its forks, e.g. Pluma and Xed.
Howto
Pick a mime type
Maybe it exists. Maybe you want to create a new mime type.
Find and choose an XDG path
To find the configured Freedesktop paths, print the XDG_DATA_DIRS variable in a terminal, e.g. with ‘echo $XDG_DATA_DIRS’.
You will see several paths. If you configure at a /home relative path, the changes will only work for the given user. If you configure at a /usr relative path, then the changes will appear systemwide.
If you want a home relative path, but there are none in XDG_DATA_DIRS, follow the instructions in extending Freedesktop paths.
Set up a new highlighting configuration
The examples create a highlighter for files of type ‘garbage’. The definitions are placed at a path in userspace.
The highlighting definitions built into the system are in,
There are a lot of them, on my system maybe 120. Most are computer languages, but some are for configuration files (‘changelog.lang’, ‘mallard.lang’) and some are for markup (‘markdown.lang’, ‘latex.lang’ etc.).
Pick one with some features like, or close to, the results you want to create.
Check what version of GtkSourceView you have on your system (use Synaptic or apt‐get).
Make this directory, with the appropriate sourceview version name,
Copy your starter file into the directory, then rename,
Edit the new file,
and alter these XML elements. A note about the _section attribute. The documentation glosses this as ‘just put in Script’. This is where the lang will be classed in the GUIs that use the lang file. Me, I have found I have uses for other classifications, especially ‘Source’ and ‘Markup’,
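As a rough sketch of what the edited header might look like for the ‘garbage’ example. The id, name, mime type and glob values here are my own illustrative assumptions, not copied from any shipped file,

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a user-space copy saved as garbage.lang.
     The id, name, mime type and glob below are illustrative assumptions. -->
<language id="garbage" _name="Garbage" version="2.0" _section="Markup">
  <metadata>
    <!-- The mime type picked earlier; text/x-garbage is a made-up name -->
    <property name="mimetypes">text/x-garbage</property>
    <!-- Filename patterns that select this definition -->
    <property name="globs">*.garbage</property>
  </metadata>
  <!-- The <styles> and <definitions> copied from the starter file stay here.
       The main context id must be renamed to match the language id. -->
</language>
```

The underscore‐prefixed attributes (_name, _section) are the translatable forms used by GtkSourceView 3 files. Check the files shipped with your version and copy whichever form they use.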
Test
Somewhere, make a text file to use as a test,
Now you need to shut down every running instance of your text editor. All instances.
Now open a terminal, and launch the text editor. The reason for using a terminal to launch is that it will display any errors,
If all is good, the text editor will pop up the test file, recognise it has the extension ‘.garbage’, then find the XML config file. The text editor should show the language as ‘Garbage’ in the display somewhere. It should apply the contents of the configuration in the file. Currently that is a copy of some other language description. But hey, if all that happened, that is a remarkably easy start.
Edit the new file for highlighting
Ok, now you need to edit the file to highlight your way. Use the two weblinks below, and start flipping between them,
GTK language file reference,
https://developer.gnome.org/gtksourceview/stable/style-reference.html
GTK tutorial for the XML file,
https://developer.gnome.org/gtksourceview/stable/lang-tutorial.html
You may need these notes on how to create a new style,
GTK language style scheme reference,
https://developer.gnome.org/gtksourceview/stable/style-reference.html
You may want to read the notes below, too.
Notes on the style files
The style files set the colour, font variations and so forth on the text. These configurations are not in the same file as the highlighting definitions. They are in,
More info follows,
Palette
The style files often start with a palette. Here is from the ‘classic.xml’
These palette styles are rarely fully‐referenced in the same style file. So they are intended as a general resource, available both for GUI preference setting and further lang file definitions.
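To give the shape of a palette, here is a minimal sketch of a style scheme file. The scheme id, the colour names and the colour values are assumptions of mine, and this is not a copy of ‘classic.xml’,

```xml
<!-- Illustrative palette entries for a style scheme file; the scheme id,
     colour names and values are assumptions, not a copy of classic.xml -->
<style-scheme id="garbage-scheme" _name="Garbage Scheme" version="1.0">
  <!-- The palette: named colours, usable anywhere below -->
  <color name="blue" value="#0000FF"/>
  <color name="dark-grey" value="#555555"/>

  <!-- A style refers to a palette colour by its name -->
  <style name="def:comment" foreground="blue"/>
  <style name="def:note" foreground="dark-grey" italic="true"/>
</style-scheme>
```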
Default styles
‘def’ means ‘use the style defined in default’,
In this case, the default ‘comment’ style.
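In a lang file, the borrowing is usually done with a ‘map-to’ attribute. A minimal sketch, where the local id and name are assumed for illustration,

```xml
<styles>
  <!-- The local style 'comment' borrows its look from the default
       'comment' style, so any theme that styles def:comment covers it -->
  <style id="comment" _name="Comment" map-to="def:comment"/>
</styles>
```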
There is no mention of default styles in documentation. But they are clearly present in all the builtin themes. This appears to be by design—they are a small set, and the form is copied from one style file to another. If you use these styles, they will be available in any theme. Here they are, with notes,
def:comment
def:shebang
def:doc‐comment‐element
Usually italic. Rarely used, usually for versioning like ‘deprecated’. In mediawiki.lang, ‘light‐emphasis’
def:constant
def:special‐char
Used for escaped chars/codepoints in strings
def:identifier
def:statement
An oddity, in naming anyway. Used for list and emphasis marks in markdown.lang, ID selectors in css.lang, builtin targets in automake.lang, properties in javascript.lang. Maybe described as “for important modifying marks?”
def:type
def:preprocessor
def:error
You wouldn’t think a simple regex could detect errors like a parser, but this style is used to indicate some found errors, such as incorrect closures
def:warning
You wouldn’t think a simple regex could detect errors like a parser, but this style is used to indicate warnings, such as legacy code
def:note
Barely used. markdown.lang uses it for linebreaks, forth.lang for alerts
def:underlined
Markup languages use it for web addresses/URLs
The default theme
The default styles are built into a default theme. This offers no more styling options than the list above. However several styles are given alias names. For example, there is a ‘def:function’.
Non‐default styles
When highlighting, it is common to target some lexical categories.
However, in the theme files these styles are often missing. You cannot rely on these, even if you find them in an xxx.lang file,
def:boolean
Used for ‘true’/‘false’
def:number
def:floating‐point
def:keyword
def:builtin
Heading styles
These heading styles appear to be cut and pasted into every theme file, but they have been commented out,
Other styles in theme files
Another sample from /classic.xml. This shows several features.
It shows a few basic style commands. It shows how most styles still borrow the small basic set of styles. It shows how time has added language‐specialist styles to the theme, though that is not going to help an ad‐hoc custom writer like me and my readers.
Notes on creating a lang file
I found it best to start with the default styles, and get my matching working first.
XML editing
XML editing is notoriously a pain in the neck. The error reports from the files are not bad, for XML. They provide a line number and misleading advice.
Contexts
Instead of globally searching for regexes, the code searches for a stretch of code. This stretch is called a ‘context’. Then the config can do other searches within that stretch. And so on, recursively.
This is like a parse rule. In no way does it turn an xxx.lang config into a parser, it doesn’t make regexes more or less expressive. But, for highlighting purposes, it makes regexes easier to handle.
Take a highlighting problem that many of the xxx.lang files address. You would like to identify strings. Fine, look for opening and closing quotes.
But now you want to highlight escaped chars. Escaped chars are important to your readers, and are hard to spot in the middle of a string. So you make a regex to spot escaped chars.
Very crude, ok, but you get the idea.
The problem with this is that a general (‘global’) regex will highlight escaped chars everywhere in the source data. It would be unusual to find escaped chars free in data but, if they are there, this regex will highlight them. Worse, it will highlight escaped chars in comments and, if you are highlighting computer code, docstrings. But contexts solve that. Define like this,
Now escaped chars will only be highlighted within a string type.
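A minimal sketch of that nesting. The context ids and style ids are assumptions, and both styles are assumed to be declared in the file’s styles section,

```xml
<!-- Outer context: a double-quoted string, closed at end of line at the latest -->
<context id="string" style-ref="string" end-at-line-end="true">
  <start>"</start>
  <end>"</end>
  <include>
    <!-- Inner context: only searched for inside a string.
         Crude regex: a backslash followed by any one character -->
    <context id="escaped-char" style-ref="escaped-char">
      <match>\\.</match>
    </context>
  </include>
</context>
```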
To do this job without contexts would be an annoying recursive call. Contexts are a small feature, but powerful, and useful for this job.
Types of context
The XML code allows several types of context. I’ve reordered the presentation here to give examples/templates for the most common constructions. These examples do not cover many attributes—look at the reference page for full details.
This is a reduced form of the ‘simple’ context. It’s the most basic match/highlight,
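For instance, a sketch that highlights hexadecimal numbers. The id and style name are assumptions, with the style assumed to be declared in the styles section,

```xml
<!-- Reduced 'simple' context: one regex, one style for the whole match -->
<context id="hex-number" style-ref="number">
  <match>0x[0-9a-fA-F]+</match>
</context>
```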
The larger form of the ‘simple’ context can highlight, but also allows highlighting within captured groups.
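A sketch of the larger form, picking out the two halves of a ‘name =’ assignment. The ids, the regex and the style names are my assumptions,

```xml
<!-- Larger 'simple' context: no style on the whole match, but each
     captured group gets a style of its own -->
<context id="assignment">
  <match>^(\w+)\s*(=)</match>
  <include>
    <!-- Group 1: the name on the left -->
    <context sub-pattern="1" style-ref="identifier"/>
    <!-- Group 2: the equals sign -->
    <context sub-pattern="2" style-ref="operator"/>
  </include>
</context>
```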
Note that this is somewhat limited. The Include element can only contain subpattern matches, which means matching can not be further nested. When I grepped the builtin xxx.lang files, I found that most of the definitions that used subpattern matches were for markup languages. The definitions were highlighting string matches. The only computer language definition that used subpattern matches extensively was Ruby. It used them to highlight Ruby’s sophisticated string manipulation facilities.
The container context is a slightly different way of capturing a stretch. It captures the begin and maybe end of the stretch (the underlying code matches the contents),
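A sketch of a container context for a C‐style block comment. The ids and style names are assumptions; the ‘where’ attribute on a subpattern context is how the delimiters themselves get styled,

```xml
<!-- Container context: everything from the start match to the end match -->
<context id="block-comment" style-ref="comment">
  <start>/\*</start>
  <end>\*/</end>
  <include>
    <!-- Sub-pattern 0 is the whole start (or end) match -->
    <context sub-pattern="0" where="start" style-ref="delimiter"/>
    <context sub-pattern="0" where="end" style-ref="delimiter"/>
    <!-- Ordinary contexts can be nested inside the stretch too -->
    <context id="todo-mark" style-ref="note">
      <match>TODO</match>
    </context>
  </include>
</context>
```

For a stretch that simply runs to the end of the line, the end element can be dropped and end-at-line-end="true" set on the context instead.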
You’ve seen them above, subpattern matches match group captures in a regex,
Keyword contexts,
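A sketch, with made‐up keywords and an assumed style name,

```xml
<!-- Keyword context: a list of literal words sharing one style.
     By default each keyword is matched as a whole word only -->
<context id="keywords" style-ref="keyword">
  <keyword>begin</keyword>
  <keyword>end</keyword>
  <keyword>if</keyword>
</context>
```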
Reuse a context
Optionally, contexts have ids. These contexts can then be used in different places. This uses a ‘reference’ context,
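A sketch of reference contexts inside the main context of the ‘garbage’ example. The local ids are assumed to be defined elsewhere in the same file; ‘def:shell-like-comment’ is, as far as I know, one of the reusable contexts in def.lang, but check your copy,

```xml
<!-- The main context has the same id as the language -->
<context id="garbage">
  <include>
    <!-- Reuse contexts defined elsewhere in this file, by id -->
    <context ref="string"/>
    <context ref="keywords"/>
    <!-- Reuse a context from another lang file, using its language id as prefix -->
    <context ref="def:shell-like-comment"/>
  </include>
</context>
```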
Namespacing of style references
Styles are namespaced by preceding with the name of the xxx.lang file. This appears to be convention, not forced by code assertion.
The separator is a colon,
‘def’ stands for default. You can see this in the style file discussion up a few headings.
Setting of comments
The metaclass attribute seems to set with class def:comment, and does nothing special???
Matching is in order
In any context, first match wins.
This can be important. As the documentation points out, you need your most particular matches first, or you will never reach later contexts. For the following two rules, one or more pound signs will always match the first context, or ‘rule’, so never reach the second,
These two contexts need to be the other way up,
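A sketch of the working order, with assumed ids, regexes and style names. Swap the two contexts and the ‘one or more’ rule would swallow every line that starts with pound signs,

```xml
<include>
  <!-- Particular rule first: lines starting with two pound signs -->
  <context id="subheading" style-ref="subheading">
    <match>^##.*</match>
  </context>
  <!-- General rule last: lines starting with one or more pound signs -->
  <context id="heading" style-ref="heading">
    <match>^#+.*</match>
  </context>
</include>
```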
What is the ‘class’ attribute?
Good question. It lets you switch spell‐checking on and off for any given context. Most xx.lang files don’t use it for anything else.
On the whole
Don’t try to grab stretches of regexed code then break them down. The configuration is too limited. Try to find ways to isolate chunks, and highlight them. Explanation follows…
OR/AND logic must use fallthrough
This rather grates with me. To say ‘match this keyword, or show error’ I’d like to write something like,
But you can’t do that: a full match means a simple context, which means only subpattern highlighting. You can get close to that effect using fallthrough. Match the keyword first, but if that fails, highlight as an error,
I can’t imagine configuration like this generating efficient regexes, but I’ve not checked.
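A sketch of the fallthrough, for an imagined syntax where a line starting with ‘@’ must carry a known directive. The ids, keywords and style names are all assumptions,

```xml
<!-- A stretch running from '@' at the start of a line to the line end -->
<context id="directive-line" end-at-line-end="true">
  <start>^@</start>
  <include>
    <!-- Tried first: the known words -->
    <context id="known-directive" style-ref="keyword">
      <keyword>include</keyword>
      <keyword>define</keyword>
    </context>
    <!-- Fallthrough: any other word in the stretch is flagged as an error -->
    <context id="unknown-directive" style-ref="error">
      <match>\w+</match>
    </context>
  </include>
</context>
```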
You can’t gather contexts
I’d like to break keywords into batches, then join them. Can’t be done, each context must be referenced individually.
Can’t match keywords on a regex subpattern
That would be nice, but is not going to work. Use subpatterns to trim text and you can’t invoke the keyword context. Don’t trim the text and the keywords will not match the untrimmed text.
New styles
You can abandon the defaults entirely if you want. Invent new styles for parts of the document. Give them new names. Some of the builtin language definitions do this. Here is python.lang for GTKSourceView3
Yep. It’s got a specialised ‘python’, and this is a new style ‘builtin‐application’.
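The same move for the ‘garbage’ example might look like the sketch below. The style ids and names are invented for illustration,

```xml
<styles>
  <!-- No map-to: this only gets a look if a theme defines 'garbage:shout' -->
  <style id="shout" _name="Shouted text"/>
  <!-- With map-to: themes that lack 'garbage:whisper' fall back to def:comment -->
  <style id="whisper" _name="Whispered text" map-to="def:comment"/>
</styles>
```

A theme file would then need a style entry named ‘garbage:shout’ for the first one to show up at all, which is the price of abandoning the defaults.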
Thoughts
First, I’d point out that auto‐colouring has more use than for computer code. Would you like some configuration files to be coloured for easy reading? Do you have some rough text templates? Custom HTML files? I do, several.
I can grumble. For me, XML configuration is a black mark. The limitation on subpattern contexts not nesting seems awkward; with that, the system could do some limited parsing. And, though I understand the need for themes, inline styles would help. And in no way can this system match the power of an Emacs mode: it can’t change bindings, represent text content, or do partial‐parsing. And the system only covers a few editors for GTK desktops.
But I’m not going to complain. Far from it. The system uses one file, placed in an easily‐locatable and edited position. Albeit in GTK’s odd we‐are‐getting‐there way, it is documented. The configuration uses readily understood components and, considering these, makes good results. Maybe only one in 2000 users will have any interest, but I’d say this chunk of thinking is under‐loved, and a little gem.