About the Data
Document Database
The code does not use an SQL database (it could, though I am not sure I want it to), so it leans fairly heavily on the file system (as does an SQL database). Every post and comment is stored as a separate file, and some meta data is stored in files too, but every operating system is optimized for reading and writing files.
Changes will occur in the data format, so I am hesitant to document it. Taken as a whole, the data is a combination of hierarchical, relational, and document-oriented database. But the API is designed for change. The goofy "Object Oriented like" library interface I described? When making changes I will just replace one library with a new one that adheres to it, or simply add another library to extend operation (although there is a limited number of letters).
Currently, posts are stored in meta/psts/ as numbered files; other directories are for pages, post comments, and post meta data for categories and tags. FTP or cPanel can be used for basic site administration.
Record Format
Both posts and comments are stored in a standard format: newline-delimited headers, followed by a blank line, followed by the data. Such a format has two very good features: it is easy to parse and amenable to additions. (This is as near to non-encoded as data can be.) Example:
Title
Date

<p>Post text.</p>
It takes only two lines of code to parse that data, no matter how many headers there are. Being amenable to additions means that data can be added at will: any meta data, such as categories and tags, can be "just added". Whether the data is stored in a file or in a database does not matter.
There is the drawback that the headers are order dependent. But the sheer simplicity of the format makes that drawback easy to code around.
The code:
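I=$IFS                          # save the field separator
IFS=$'\n'                       # split on newlines only
hdrs=(${data%%$'\n'$'\n'*})     # headers: text before the first blank line
IFS=$I                          # restore the field separator
body=${data#*$'\n'$'\n'}        # body: text after the first blank line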
Not exactly two lines, but that is really small, fast, efficient, understandable, etc. All the attributes one wants in code—it is the format of the data that allows for this.
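As a minimal sketch of the same split wrapped in a function: parse_record is a hypothetical name, not part of WordBash, and the use of mapfile assumes Bash 4 or later. It avoids the unquoted expansion, so a header such as * is not expanded as a file glob.

```shell
#!/usr/bin/env bash
# Sketch only: parse_record is a hypothetical helper, not part of WordBash.
# It fills the global array hdrs and the global string body.
parse_record() {
    local data=$1
    body=${data#*$'\n\n'}                       # text after the first blank line
    mapfile -t hdrs <<< "${data%%$'\n\n'*}"     # one array element per header line
}

record=$'Title\nJan 1, 2014\n\n<p>Post text.</p>'
parse_record "$record"
printf 'title: %s\ndate: %s\nbody: %s\n' "${hdrs[0]}" "${hdrs[1]}" "$body"
```

The record still has to contain a blank line; without one, both expansions return the whole string.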
No Markup Language
The record body needs to be HTML, as it is just printed. However, there is simple "word for word substitution" support.
No Data Definition Language
Although I do use the term database for the data storage (as it is appropriate) there is no "language" behind it (or in front of it). The database name is a subdirectory name, the primary key is a file name, and the table is newline delimited text. A (near) map:
SQL | File system |
---|---|
CREATE DATABASE ... | mkdir $path |
INSERT INTO ... | echo "$title"$'\n'"$date"$'\n'$'\n'"$data" > $path/$n |
DELETE FROM ... | rm $path/$n |
Others can be extrapolated. The file system calls are all in a library as an API and not used directly (that is just an example). To actually use an SQL database (or anything else), the API stays the same and a different library is used. Supplemental data, such as "recent comments" and "post tags", is also mapped to the file system via an API.
I would call this NoSQL except "row" and "column" have no use here (nor Perl or AWK). Perhaps non-SQL, but that label is too vague (do a search). Without SQL is also... well, really weird (do a search). And this is not anti-SQL (you will have to add "-injection"). This is just not SQL.
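To make the mapping concrete, here is a rough sketch of how such calls might sit behind a small API. The db_* function names are hypothetical and are not the actual WordBash library; only the mkdir, redirect, and rm operations come from the map above.

```shell
#!/usr/bin/env bash
# Sketch only: the db_* names are hypothetical, not the WordBash API.
db_create() { mkdir -p -- "$1"; }                 # CREATE DATABASE
db_insert() {                                     # INSERT INTO
    local path=$1 n=$2 title=$3 date=$4 data=$5
    printf '%s\n%s\n\n%s\n' "$title" "$date" "$data" > "$path/$n"
}
db_select() { printf '%s' "$(< "$1/$2")"; }       # SELECT one record by key
db_delete() { rm -f -- "$1/$2"; }                 # DELETE FROM

db=$(mktemp -d)/psts                              # throwaway directory for the demo
db_create "$db"
db_insert "$db" 1 "Title" "Jan 1, 2014" "<p>Post text.</p>"
db_select "$db" 1
db_delete "$db" 1
```

Callers only see the four functions, so a library backed by something other than files could replace them without changing the calling code.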
Global Data
In About the Code, some of the API and variable names were detailed. Here are more. Global data is used to store record data. Example:
API | Action | Type | Name |
---|---|---|---|
P l | list | array | Pl |
P n | count | integer | Pn |
P r $n | read | strings | title, date, body |
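The table can be read as the following sketch. This P is a hypothetical stand-in written for illustration, not the real library, and PSTS is an assumed variable naming the post directory; only the call forms and the global names Pl, Pn, title, date, and body come from the table.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical stand-in for the P library.
# PSTS is an assumed variable holding the post directory.
P() {
    local f data
    case $1 in
        l)  # list: fill the global array Pl (glob order is lexical, not numeric)
            Pl=()
            for f in "$PSTS"/*; do
                [[ -f $f ]] && Pl+=("${f##*/}")
            done ;;
        n)  # count: set the global integer Pn
            P l; Pn=${#Pl[@]} ;;
        r)  # read: set the globals title, date, body from post $2
            data=$(< "$PSTS/$2") || return 1
            title=${data%%$'\n'*}
            data=${data#*$'\n'}
            date=${data%%$'\n'*}
            body=${data#*$'\n\n'} ;;
    esac
}

PSTS=$(mktemp -d)                                 # throwaway directory for the demo
printf 'First\nJan 1, 2014\n\n<p>Hello.</p>\n' > "$PSTS/1"
printf 'Second\nJan 2, 2014\n\n<p>Again.</p>\n' > "$PSTS/2"
P n; echo "$Pn posts: ${Pl[*]}"
P r 2; echo "$title ($date): $body"
```

Returning results in globals rather than on standard output avoids a subshell per call, which is one plausible reason for the design.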
Comment Parsing
On a data-related note, the most complicated part of WordBash is the Comments library, at over 300 lines (the average library is about 100). It needs to do a few very important things:
- Strip HTML
- Reject links
- Break long words
- Convert lines to paragraphs
All of it is done by Bash-only code (no SED, AWK, etc.), which is the most important design criterion I imposed on the code.
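The four steps listed above can be sketched with nothing but Bash builtins. This is a much smaller illustration than the real Comments library, whose code is not shown here; every function name, the link test, and the default word length of 40 are assumptions, and ${1,,} needs Bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the WordBash Comments library.
strip_html() {                 # drop every <...> tag, keep the text between
    local s=$1 out=
    while [[ $s == *'<'*'>'* ]]; do
        out+=${s%%'<'*}        # keep text before the next tag
        s=${s#*'<'}            # drop through the '<'
        s=${s#*'>'}            # drop through the matching '>'
    done
    printf '%s' "$out$s"
}

has_link() {                   # true if the text looks like it contains a URL
    local s=${1,,}             # lowercase copy (Bash 4+)
    [[ $s == *http://* || $s == *https://* || $s == *www.* ]]
}

break_words() {                # $1 = one line, $2 = maximum word length
    local max=$2 w out=
    local -a words
    read -ra words <<< "$1"
    for w in "${words[@]}"; do
        while (( ${#w} > max )); do
            out+="${w:0:max} "
            w=${w:max}
        done
        out+="$w "
    done
    printf '%s' "${out% }"
}

clean_comment() {              # $1 = raw comment, $2 = optional word length
    local max=${2:-40} text line out=
    has_link "$1" && return 1  # reject the whole comment
    text=$(strip_html "$1")
    while IFS= read -r line; do
        line=$(break_words "$line" "$max")
        [[ -n $line ]] && out+="<p>$line</p>"$'\n'   # one paragraph per line
    done <<< "$text"
    printf '%s' "$out"
}

clean_comment $'Nice <b>post</b>!\n\nSecond line.'
```

The command substitutions fork subshells but start no external programs, which keeps the sketch within the Bash-only rule.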
On no External Programs
For parsing/manipulating large amounts of data (hundreds of kilobytes and up) Bash alone might be slow, but blog posts are never that large (and of course size could be checked and appropriate steps taken; for an example of that see libs/Q).
I discuss performance in a later post.
Notes:
1. There is one library that is "out of place": libs/D.sh. It is the database library and is called not by the CGI code but by libs/P and libs/G.
2. Header data can have a name, such as date:Jan 1, 2014. That is not difficult to do, and the format then becomes scalable.
3. The Admin code to edit posts is not included at this time.
4. I really just wanted to see if it could be done.
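Note 2 can be sketched as follows. parse_named is a hypothetical helper, not part of WordBash, and the associative array requires Bash 4 or later; the name:value form is taken from the example in the note.

```shell
#!/usr/bin/env bash
# Sketch only: parse_named is a hypothetical helper (Bash 4+).
declare -A hdr                                    # header name -> value

parse_named() {
    local data=$1 line
    hdr=()                                        # clear any previous record
    body=${data#*$'\n\n'}                         # text after the first blank line
    while IFS= read -r line; do
        [[ $line == ?*:* ]] || continue           # skip lines without a name
        hdr["${line%%:*}"]=${line#*:}             # split at the first colon only
    done <<< "${data%%$'\n\n'*}"
}

parse_named $'date:Jan 1, 2014\ntitle:Title\n\n<p>Post text.</p>'
echo "${hdr[title]} / ${hdr[date]}"
```

With names attached, the headers can appear in any order, and a value may itself contain colons because only the first one is treated as the separator.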