WordBash

The WordPress clone written in GNU Bash

Posted on Dec 16, 2013 7:12PM

About the Data

Document Database

The code does not use an SQL database (it could, though I am not sure I want it to), so it leans fairly heavily on the file system (as does an SQL database). Every post and comment is stored as a separate file, as is some meta data, but every operating system is optimized for reading and writing files.

Changes will occur in the data format, so I am hesitant to document it. Taken as a whole, the data is a combination of hierarchical, relational, and document-oriented database. But the API is designed for change. The goofy "Object Oriented like" library interface I described? When making changes I will just replace one library with a new one that adheres to it, or simply add another library to extend operation (although there is a limited number of letters).

Currently, posts are stored in meta/psts/ as numbered files; other directories are for pages, post comments, and post meta data for categories and tags. FTP or cPanel can be used for basic site administration.

Record Format

Both posts and comments are stored in a standard format: newline delimited headers, followed by a blank line, followed by the data. Such a format has two very good features: it is easy to parse and amenable to additions. (This is as near to non-encoded as data can be.) Example:

Title
Date

<p>Post text.</p>

It takes only two lines of code to parse that data, no matter how many headers there are. Being amenable to additions means that data can be added at will: any meta data, such as categories and tags, can be "just added". Whether the data is stored in a file or in a database does not matter.

There is the drawback that the headers are order dependent. But the sheer simplicity of the format makes that drawback easy to code around.

The code:

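# Split on newlines only, saving and restoring IFS.
I=$IFS
IFS=$'\n'
# Headers: everything before the first blank line, one array element per line.
hdrs=(${data%%$'\n'$'\n'*})
IFS=$I
# Body: everything after the first blank line.
body=${data#*$'\n'$'\n'}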

Not exactly two lines, but it is small, fast, efficient and understandable: all the attributes one wants in code. It is the format of the data that allows for this.
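To try the parsing on a sample record, the same technique can be wrapped in a function. This is only a sketch: the name parse_record and the sample record are made up for illustration, and set -f is an addition that keeps the unquoted expansion from globbing on header text such as "*".

```bash
#!/usr/bin/env bash
# parse_record is a hypothetical name, not part of WordBash.
# It sets two globals: hdrs (array of header lines) and body (string).
parse_record() {
    local data=$1
    local IFS=$'\n'                 # split on newlines only, inside this function
    set -f                          # no filename globbing while word splitting
    hdrs=( ${data%%$'\n\n'*} )      # everything before the first blank line
    set +f
    body=${data#*$'\n\n'}           # everything after the first blank line
}

# Sample record in the format shown above (made-up content).
record=$'My First Post\nDec 16, 2013\n\n<p>Post text.</p>'
parse_record "$record"

printf 'title: %s\n' "${hdrs[0]}"
printf 'date:  %s\n' "${hdrs[1]}"
printf 'body:  %s\n' "$body"
```

Note that a record without a blank line leaves both hdrs and body holding the whole record, so a caller may want to check for that case.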

No Markup Language

The record body needs to be HTML, as it is just printed. However, there is simple "word for word substitution" support.
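How the substitution is actually done is not described here, so the following is only a guess at what "word for word" could mean in Bash-only code. The subs table, its :marker: keys and the function name substitute_words are all assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical substitution table: marker word -> replacement text.
declare -A subs=(
    [':sitename:']='WordBash'
    [':lang:']='GNU Bash'
)

# Replace every occurrence of each marker in the global variable body.
substitute_words() {
    local word
    for word in "${!subs[@]}"; do
        # The quoted pattern and replacement are both taken literally.
        body=${body//"$word"/"${subs[$word]}"}
    done
}

body='<p>:sitename: is written in :lang:.</p>'
substitute_words
printf '%s\n' "$body"
```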

No Data Definition Language

Although I do use the term database for the data storage (as it is appropriate) there is no "language" behind it (or in front of it). The database name is a subdirectory name, the primary key is a file name, and the table is newline delimited text. A (near) map:

CREATE DATABASE ...   mkdir $path
INSERT INTO ...       echo "$title"$'\n'"$date"$'\n'$'\n'"$data" > $path/$n
DELETE FROM ...       rm $path/$n

Others can be extrapolated. The file system calls are all in a library as an API and are not used directly (the map above is just an example). To actually use an SQL database, or anything else, the API stays the same and a different library is used. Supplemental data, such as "recent comments" and "post tags", is also mapped to the file system via an API.
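As a rough illustration of such a library, the map can be wrapped in functions so that callers never touch the file system themselves. The function names (db_create, db_insert, db_select, db_delete) and their argument order are invented for this sketch; they are not the names used in WordBash.

```bash
#!/usr/bin/env bash
# Hypothetical file-system backend; another library with the same
# function names could talk to an SQL database instead.

db_create() {                       # db_create PATH
    mkdir -p "$1"
}

db_insert() {                       # db_insert PATH N TITLE DATE BODY
    printf '%s\n%s\n\n%s\n' "$3" "$4" "$5" > "$1/$2"
}

db_select() {                       # db_select PATH N  (sets global: data)
    data=$(<"$1/$2")                # Bash-only file read; trailing newlines are dropped
}

db_delete() {                       # db_delete PATH N
    rm -f "$1/$2"
}

# Demo in a throwaway directory.
path=$(mktemp -d)/psts
db_create "$path"
db_insert "$path" 1 'Hello' 'Dec 16, 2013' '<p>Post text.</p>'
db_select "$path" 1
printf '%s\n' "$data"
db_delete "$path" 1
```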

I would call this NoSQL except "row" and "column" have no use here (nor Perl or AWK). Perhaps non-SQL, but that label is too vague (do a search). Without SQL is also... well, really weird (do a search). And this is not anti-SQL (you will have to add "-injection"). This is just not SQL.

Global Data

In About the Code some of the API and variable names were detailed. Here is more. Global data is used to store record data. Example:

API      Action  Type     Name
P l      list    array    Pl
P n      count   integer  Pn
P r $n   read    strings  title, date, body
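One way such an interface could be written is sketched below. Only the call forms and the global names (Pl, Pn, title, date, body) come from the table; the function body, the P_DIR variable and the demo data are assumptions, not the actual libs/P code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a P library: each action fills global variables.
P_DIR=${P_DIR:-meta/psts}           # post directory named earlier in the text

P() {
    local f data
    case $1 in
        l)  # list: post file names into the array Pl
            Pl=()
            for f in "$P_DIR"/*; do
                if [[ -f $f ]]; then Pl+=("${f##*/}"); fi
            done
            ;;
        n)  # count: number of posts into the integer Pn
            P l
            Pn=${#Pl[@]}
            ;;
        r)  # read: post number $2 into the strings title, date and body
            data=$(<"$P_DIR/$2") || return 1
            { IFS= read -r title; IFS= read -r date; } <<< "$data"
            body=${data#*$'\n\n'}
            ;;
        *)  return 1 ;;
    esac
}

# Demo against a throwaway directory with two made-up posts.
P_DIR=$(mktemp -d)
printf '%s\n%s\n\n%s\n' 'First'  'Dec 16, 2013' '<p>One.</p>' > "$P_DIR/1"
printf '%s\n%s\n\n%s\n' 'Second' 'Dec 17, 2013' '<p>Two.</p>' > "$P_DIR/2"

P n;   printf 'posts: %d\n' "$Pn"
P r 2; printf '%s (%s): %s\n' "$title" "$date" "$body"
```

The glob in the list action sorts names as text, so 10 comes before 2; a real library would need its own numeric ordering.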

Comment Parsing

On a data-related note, the most complicated part of WordBash, at over 300 lines (the average library is about 100), is the Comments library. It needs to do a few very important things:

All with Bash-only code (no SED, AWK, etc.), which is the most important design criterion I imposed on the code.

On No External Programs

For parsing/manipulating large amounts of data (hundreds of kilobytes and up) Bash alone might be slow, but blog posts are never that large (and of course size could be checked and appropriate steps taken; for an example of that see libs/Q).
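A size check of that kind needs nothing beyond Bash itself. The sketch below is not the code in libs/Q; the limit MAX_LEN, the function name check_size and the two outcomes are placeholders.

```bash
#!/usr/bin/env bash
MAX_LEN=100000                      # hypothetical limit, in characters

# Succeeds when the string is small enough to handle with Bash alone.
check_size() {
    (( ${#1} <= MAX_LEN ))
}

data='<p>A short post.</p>'
if check_size "$data"; then
    result='parse with Bash'
else
    result='take other steps'       # e.g. refuse, truncate or hand off
fi
printf '%s\n' "$result"
```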

I discuss performance in a later post.


Notes:
1. There is one library that is "out of place": libs/D.sh. It is the database library and is called not by the CGI code but by libs/P and libs/G.
2. Header data can have a name, such as date:Jan 1, 2014. This is not difficult to do, and the format then becomes scalable.
3. The Admin code to edit posts is not included at this time.
4. I really just wanted to see if it could be done.
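For note 2, named headers could be read into an associative array, which removes the order dependence. This is a sketch only: the hdr array, the function name parse_named_headers and the sample record are assumptions, not existing WordBash code.

```bash
#!/usr/bin/env bash
declare -A hdr                      # header name -> value (hypothetical global)

# Parse "name:value" header lines up to the first blank line, then the body.
parse_named_headers() {
    local data=$1 line
    hdr=()
    while IFS= read -r line; do
        [[ -z $line ]] && break     # a blank line ends the headers
        # Split at the first colon only, so values may contain colons.
        hdr[${line%%:*}]=${line#*:}
    done <<< "$data"
    body=${data#*$'\n\n'}
}

record=$'date:Dec 16, 2013 7:12PM\ntitle:My First Post\n\n<p>Post text.</p>'
parse_named_headers "$record"

printf 'title: %s\n' "${hdr[title]}"
printf 'date:  %s\n' "${hdr[date]}"
printf 'body:  %s\n' "$body"
```

Because each value is looked up by name, the headers can appear in any order and new ones can be added without touching existing code.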

