bibtex
Section: (pj)
Updated: 2021-10-19
Index
Return to Main Contents
INTRO
This document is a bit of documentation meant to supplement "Tame
the BeaST" by Nicholas Markey, an excellent source of documentation
on all things BibTeX, and as far as I know the only public documentation
on .bst files.
I rely heavily on his work and occasionally point out errors.
Probably you should read his paper before you read this.
For clarity's sake, I generally steal his examples.
My references are all to version 1.3 published 16 October 2005.
Brace Depth
Brace depth refers to strings from the .bib file, especially the title
and author fields.
A character's brace depth is equal to the number of surrounding
braces (but see "Special Characters" below).
For example, you might have a book titled:
-
"The {Latex} {C}ompanion"
Here, most of the string has brace depth 0, but "Latex" and
"C" have brace depth 1.
Special Characters
A special character is an escape sequence immediately surrounded by
braces, like {'E} or {}.
In spite of the braces, a special character is considered brace depth 0.
However, special characters must themselves appear at brace depth 0; if
they are surrounded by other braces, they are not considered special
characters.
For example, suppose we wanted to correct the example above by using the
macro.
Then we write:
-
"The {\LaTeX} {C}ompanion"
Here, "C" is at brace depth 1, and everything else, even the
special character , is at brace depth 0.
Now suppose we wrote the title like this:
-
"The {{\LaTeX}} {C}ompanion"
In this case, "{}" is not a special character, because it starts
at brace depth 1, not 0.
The string "" is considered to be at brace depth 2.
If an escape sequence lacks braces entirely, as in
The \LaTeX Companion, it is not a special character, just
plain text.
Note that an apparent contradiction appears on page 21 of "Tame the
BeaST." It correctly remarks that special characters must appear at
brace depth 0, but then it states, "Anything in a special character
is considered as being at brace depth 0, even if it is placed between
another pair of braces." The "even if" is a null hypothesis,
because special characters are only special characters at brace depth 0.
purify$
The purify$ command reduces a string to only spaces and
alphanumerics.
Hyphens and tildes become spaces, and all other weird characters are
removed.
Inside of special characters, even spaces are removed.
Thus t\haete, t{\hae}te,
t{\ha{e }}te, and t{\hae }te all
become tete.
text.prefix$
You can grab a string's first n characters with
text.prefix$.
Braces are not included in the count of characters, but they are still
output.
Their closing braces are also output, appended, if necessary, to the end
of the result.
Special characters count as one character, but of course they must be at
brace depth 0 to count as special characters.
Thus, the first character of {\'E}cole is
{\'E}, but the first character of
{{\'E}}cole is {{\}} (the backslash is
the character).
text.length$
This command returns the number of characters in a given string.
Special characters count as one character, and braces don't count at
all.
Thus, a b c is 5, {a} is 1, and
{\'a} is 1, but {{\'a}} is 3.
Since the last example isn't a special character, the
\'a are counted as three separate characters.
substring$
Given a string, a start position, and a number of characters, this
command returns a substring.
The position of the first character is 1.
Unlike the commands above, substring$ knows no secret meaning
for "character." Every character is alike.
Thus, the substring of {\LaTeX} from position 2 of length
3 is \La.
(Note that "Tame the BeaST" on page 32 erroneously says the
result is \LaT: 4 characters.)
sort.key$
Reading "Tame the BeaST" page 32, I thought this variable was
set automatically by some internal algorithm.
It isn't.
You must set it yourself, with something like this:
-
title 'sort.key$ :=
change.case$
Depending on the option, change.case\$ will set all
letters to uppercase or lowercase, or (with the "t" option) set
all letters to lowercase while leaving the first letter unchanged.
It only affects letters at brace depth 0.
Therefore special characters are changed, but characters inside regular
braces are not.
For example, with the "t" option, t\haEte
becomes t\haete and t{\haE}te
becomes t{\hae}te, but t{{\haE}T}e
remains t{{\haE}T}e.
When you use the "t" option, if the first character is a special
character, the whole special character retains its original
capitalization.
Thus, {\'E}GAD becomes {\'E}gad
and {\LaTeX} Companion becomes
{\LaTeX} companion.
Surviving Sorting
When you compose your .bib file, you should expect that the bib style
will munge your entries to set the sort.key$.
Probably it will run purify$ on them.
To ensure proper sorting, you should write your .bib entries so all the
necessary data remains intact.
Surviving Capitalization
Your bib style will probably also run another transformation, at least
on the title.
It will use change.case$ with the "t" option to set
all letters after the first to lowercase.
Since change.case$ only affects letters at brace depth 0, use
can use braces to preserve desired capitals.
If you embed commands like , this is essential.
Surviving Sorting and Capitalization Together
Since most people want to use one .bib file with multiple .bst files,
any style should basically honor the brace-tricks used to survive
sorting (aka purify$) and capitalization (aka
change.case$).
You should write your bibliography entries with these two functions in
mind.
Suppose your bibliography includes this:
-
title = "The \LaTeX Companion"
That will fail, because change.case$ will produce
\latex, which is a command LaTeX doesn't know.
It will also lowercase "Companion." You could instead try:
-
title = "The {\LaTeX} {C}ompanion"
This is even worse!
It does not solve the change.case$ problem (except for the
"C"), because change.case$ still alters special
characters.
In addition, since purify$ removes special characters, it
will sort as "The Companion," which will probably deposit it in
the wrong place.
To solve both problems at once, do this:
-
title = "The {{\LaTeX}} {C}ompanion"
Now change.case$ will leave \LaTeX alone,
since it is at brace depth 2, and purify$ will pass it
through as LaTeX (without the backslash), so it will sort
properly.
Surviving Author Parsing
The author field endures similar munging, at least for
sort.key\$, but it is also parsed to find the first name,
last name, and any intermediate "von"-like bits.
This happens inside format.name$.
If you enter the name as von Last, First, you needn't
worry as much, but if you give First von Last, you must be
aware of how BibTeX parses the field.
There must be a Last, so it gets the last word in the field.
Then BibTeX examines each word, from left to right.
As long as the words are capitalized, they go to the First.
The first non-capitalized word begins the von.
To determine if a word is "capitalized," BibTeX finds its first
0-depth character, including special characters, and checks whether it
is uppercase or lowercase.
If it is uppercase, the word is capitalized; otherwise, it is not.
If there are no 0-depth characters (because the whole word is in
braces), the word counts as capitalized and goes to the First.
Once the von begins, words can fall into either the von or the Last.
These remaining words are examined in reverse order, from right to left.
As long as the words are capitalized or in braces, they go to the Last,
but as soon as a lowercase word is found, it and all remaining words go
to the von.
A simpler way of putting this is that all contiguous capitals at the
beginning are First and all contiguous capitals at the end are Last
(where braced words count as capitals), and everything left is von.
If all the words are capitals, then everything but the last word goes
into the First.
The last word always counts as a capital so that Last has at least one
word, but the First may be empty if the first word is lowercase.
A lowercase last word can still accumulate more words in the Last, as
long as they are capitalized.
One more wrinkle: you can group words by enclosing the intervening space
in braces, like this: Jean de La{ }Fontaine or this:
Jean de {La Fontaine}.
BibTeX will treat these joined words as a single word.
So in this first example, the algorithm will start by sending all of
La{ }Fontaine to the Last.
For some names, you want to avoid all this parsing entirely, so that the
whole name goes into Last.
In that case, you can enter names entirely surrounded with braces, like
this:
-
author = "{Gregory the Theologian}"
- •
-
See ~/src/test/bibtex/ for test runs of some of the above.
- •
-
See ~/doc/latex/bibtex.pdf for good documentation, with some errors.
For example, on p. 32 he says that "{} 2 3 substring$" returns
"" when actually it returns "".
AUTHORS
Paul A. Jungwirth.
Index
- INTRO
-
- Brace Depth
-
- Special Characters
-
- purify$
-
- text.prefix$
-
- text.length$
-
- substring$
-
- sort.key$
-
- change.case$
-
- Surviving Sorting
-
- Surviving Capitalization
-
- Surviving Sorting and Capitalization Together
-
- Surviving Author Parsing
-
- AUTHORS
-
This document was created by
man2html,
using the manual pages.
Time: 21:16:02 GMT, January 04, 2026