WordBash

The WordPress clone written in GNU Bash

Posted on Feb 1, 2014 4:35PM

Performance

This post addresses Bash performance by demonstrating code to accomplish three things: stripping HTML tags, converting POST data, and applying post translations.

I think you will find it enlightening (well, I did).

Stripping HTML Tags

Stripping tags is straightforward and uses a single state flag:

bash code

# strip tags

st() {
    n=${#data}
    t=0
    r=
    for (( i=0; i<$n; i++ )); do
        if [[ ${data:$i:1} == ">" ]]; then t=0; continue; fi
        if [[ $t == 1 ]]; then continue; fi
        if [[ ${data:$i:1} == "<" ]]; then t=1; continue; fi
        r+=${data:$i:1}
    done
    data=$r
}

That, though, only demonstrates that expanding a string one character at a time is slow. Sometimes case is used for parsing data, so let's give it a try:

bash code

# strip tags version 2

st() {
    n=${#data}
    t=0; i=0
    r=
    while [ $i -lt $n ]; do
        case ${data:$i:1} in
        \>) t=0;;
        \<) t=1;;
        *) if [[ $t -eq 0 ]]; then r+=${data:$i:1}; fi;;
        esac
        (( i++ ))
    done
    data=$r
}

Which is faster, but it still is not fast enough (and I dislike mis-applying a construct like that). There is something that is fast enough:

bash code

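# strip tags version 3

st() {
    t=0
    r=
    # IFS= keeps read from stripping spaces and tabs; without it a space
    # comes back empty and would be turned into a newline below
    while IFS= read -r -n 1 c; do
        if [[ $c == ">" ]]; then t=0; continue; fi
        if [[ $t == 1 ]]; then continue; fi
        if [[ $c == "<" ]]; then t=1; continue; fi
        # read returns an empty string when it hits the newline delimiter
        if [[ -z $c ]]; then c=$'\n'; fi
        r+=$c
    done <<< "$data"
    data=$r
}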

Note that these and the other examples here use a global for the data, which is very easy to manage with code as small and as structured as WordBash.
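To see the difference between indexing into the string and letting read hand over the characters, a small timing harness helps. The following is only a sketch: the function names (make_html, strip_by_index, strip_by_read) and the test input are made up for this illustration, and the functions take the data as an argument rather than using the global. The numbers will vary with the Bash version, the locale, and the machine.

```bash
# Hypothetical test input: $1 copies of a small tagged fragment.
make_html() {
    local i out=
    for (( i=0; i<$1; i++ )); do out+='<p>some <b>bold</b> text</p>'; done
    printf '%s' "$out"
}

# First approach: index into the string with ${s:i:1}.
strip_by_index() {
    local s=$1 n=${#1} i c t=0 r=
    for (( i=0; i<n; i++ )); do
        c=${s:i:1}
        if [[ $c == '>' ]]; then t=0; continue; fi
        if (( t )); then continue; fi
        if [[ $c == '<' ]]; then t=1; continue; fi
        r+=$c
    done
    printf '%s' "$r"
}

# Third approach: let read deliver one character at a time.
strip_by_read() {
    local c t=0 r=
    while IFS= read -r -n 1 c; do
        if [[ $c == '>' ]]; then t=0; continue; fi
        if (( t )); then continue; fi
        if [[ $c == '<' ]]; then t=1; continue; fi
        if [[ -z $c ]]; then c=$'\n'; fi   # empty means a newline was read
        r+=$c
    done <<< "$1"
    printf '%s' "$r"
}

html=$(make_html 200)
time a=$(strip_by_index "$html")
time b=$(strip_by_read "$html")
[[ $a == "$b" ]] && echo "outputs match (${#a} characters)"
```

Raising the count passed to make_html shows how each approach scales as the data grows.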

Converting POST Data

POST data is encoded in a simple manner, and decoding using Bash could be simple string replacements:

bash code

POST_STRING=${POST_STRING//+/ }      # '+' encodes a space
POST_STRING=${POST_STRING//%0D/}     # drop carriage returns
POST_STRING=${POST_STRING//%/\\x}    # %XX becomes \xXX for echo -e

Culminating with something like this (which would actually be done on the data split by &):

bash code

POST_STRING=$(echo -e "$POST_STRING")
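The remark about splitting the data on & first can be sketched as follows. This is a minimal illustration, not WordBash's own code: the names urldecode, parse_post, and POST are made up here, the associative array assumes Bash 4 or later, and printf '%b' stands in for echo -e.

```bash
# Hypothetical helper: decode one URL-encoded key or value.
urldecode() {
    local s
    s=${1//+/ }                  # '+' encodes a space
    s=${s//%0D/}                 # drop carriage returns, as the post does
    printf '%b' "${s//%/\\x}"    # %41 becomes \x41, which %b prints as A
}

declare -A POST                  # decoded fields (needs Bash 4+)

# Hypothetical helper: split a POST body on '&', then decode each pair.
parse_post() {
    local pair pairs key value
    IFS='&' read -r -a pairs <<< "$1"
    for pair in "${pairs[@]}"; do
        key=$(urldecode "${pair%%=*}")      # text before the first '='
        value=$(urldecode "${pair#*=}")     # text after the first '='
        if [[ -n $key ]]; then POST[$key]=$value; fi
    done
}

parse_post 'title=Hello+World&body=a%26b%3Dc'
printf '%s => %s\n' title "${POST[title]}" body "${POST[body]}"
```

Splitting before decoding matters because an encoded & or = inside a value would otherwise be mistaken for a separator. Note that command substitution drops trailing newlines, so a value ending in %0A loses them in this sketch.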

But even with a fairly small amount of data those string replacements can take a long time, and the time climbs steeply as the data grows. A much faster Bash approach, whose time grows roughly linearly with the data, is:

bash code

p=
while read -r -n 1 t; do
    if [[ $t == '%' ]]; then
        read -r -n 2 t
        if [[ $t == '0D' ]]; then
            continue
        fi
        t="\\x$t"
    elif [[ $t == '+' ]]; then
        t=" "
    fi
    p+=$t
done <<< "$POST_STRING"
POST_STRING=$p

But that is still unacceptably slow with large data (because there are no newlines). So, Bash needs external help.

This is more than acceptably fast even for very large data:

bash code

POST_STRING=`echo "$POST_STRING" | sed -e '
s/+/ /g
s/%0D//g
s/%/\\\\x/g
'`

The external program (which could be Perl or any of many others available) need not be called directly; it could be a shell script. The data need not be in-line; it could be in the shell script or in a file.

Indeed, other shell scripts (as standalone programs of their own) could do much of what WordBash libraries do. Making many small programs work together is what scripting languages are generally used for. (I just chose to do something completely different.)
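The point about keeping the sed commands in a file rather than in-line can be sketched with sed -f. The function name decode_with_sed_file is made up for this illustration, and the temporary file stands in for whatever script file a real setup would ship.

```bash
# Hypothetical helper: run the same three substitutions from a script file.
decode_with_sed_file() {
    local script
    script=$(mktemp) || return 1
    # Quoted heredoc: the sed commands are written to the file verbatim.
    cat > "$script" <<'EOF'
s/+/ /g
s/%0D//g
s/%/\\x/g
EOF
    printf '%s\n' "$1" | sed -f "$script"
    rm -f "$script"
}

# sed leaves \xXX escapes behind; printf '%b' expands them.
printf '%b\n' "$(decode_with_sed_file 'Hello+World%21')"
```

Because the file is read by sed itself, the backslash needs only one level of escaping here, which is easier to get right than the in-line backtick form.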

Post Translation Data

Another string replacement is what I call post translations, allowing for some Admin laziness. There is an array of data:

bash code

# post translate data; FROM = TO

dp=(
'--' '&mdash;'
'BASH' '<strong>Bash</strong>'
)

What is odd about it is that it is a simple array that must hold an even number of elements (FROM and TO pairs), so it is applied two elements at a time:

bash code

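# post translate

pt() {
    for (( i=0; i<${#dp[@]}; i+=2 )); do
        # the replacement is quoted so that '&' stays literal
        # (Bash 5.2+ expands an unquoted '&' to the matched text)
        body=${body//${dp[i]}/"${dp[i+1]}"}
    done
}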
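Since the array only works when it holds complete pairs, a guard is cheap insurance. The sketch below is a variation for illustration, not the WordBash function: pt_checked is a made-up name, and it quotes both the pattern and the replacement so they are taken literally.

```bash
# FROM/TO pairs, in the order they are applied.
dp=(
    '--'   '&mdash;'
    'BASH' '<strong>Bash</strong>'
)

# Hypothetical variant of pt that refuses an odd-length array.
pt_checked() {
    local i
    if (( ${#dp[@]} % 2 )); then
        echo "dp must hold FROM/TO pairs" >&2
        return 1
    fi
    for (( i=0; i<${#dp[@]}; i+=2 )); do
        # Quoting keeps glob characters in FROM and '&' in TO literal.
        body=${body//"${dp[i]}"/"${dp[i+1]}"}
    done
}

body='Written in BASH -- nothing else'
pt_checked && echo "$body"
```

A flat array keeps the pairs in a defined order, which an associative array would not guarantee; that matters when one translation could affect the text produced by another.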

This works quite well, except that it slows down quickly as the data grows, to the point of being unacceptably slow. It could instead be:

bash code

# post translate SED version

dp='
s/--/\&mdash;/g
s/BASH/<strong>Bash<\/strong>/g
'

body=`echo "$body" | sed -e "$dp"`

Which would be noticeably faster.

But Not So Fast

sed is not used here. The Bash code is used as a demonstration.

This is the largest post here (about 17,700 characters), and on my 1.5GHz Pentium M, Windows XP/SP2, Cygwin test computer it displays in about two seconds (user time; about 300 milliseconds of system time). Not particularly fast, but it could be that computer capacity is growing...

But, there is one important thing about Bash string replacement: Bash string replacement is not really slow.

Wait Just a Second

The times output at the bottom of the display is there for this reason: Refresh this page a few times, noting the time it takes. Then edit library D (libs/D.sh) to comment out line 137 and uncomment line 138. Refresh this page.

Interestingly, if you look at the function (br in B.sh) you will see that it is... Bash string replacement:

bash code

# string replace

br() {
    local l r
    r=
    while read -r l; do
        r+=${l//$1/$2}$'\n'
    done <<< "$3"
    echo "$r"
}

Bash string replacement can be said to be flawed: its performance plummets past a certain data size and becomes unacceptably slow, yet line-by-line string replacement of the same data is fast.
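The claim is easy to check on any machine. The sketch below times one replacement over the whole string against the same replacement done line by line and confirms the results match. The function names and the test data are made up for this illustration, and how large the gap is (or whether there is one at all) will depend on the Bash version and the data size.

```bash
# Hypothetical test data: $1 short lines, each containing the word BASH.
make_data() {
    local i out=
    for (( i=0; i<$1; i++ )); do out+="line $i of BASH text"$'\n'; done
    printf '%s' "$out"
}

# One substitution over the whole string.
replace_whole() {
    printf '%s' "${3//$1/$2}"
}

# The same substitution applied to one line at a time.
replace_by_line() {
    local l r=
    while IFS= read -r l; do
        r+=${l//$1/$2}$'\n'
    done <<< "$3"
    printf '%s' "$r"
}

data=$(make_data 2000)
time a=$(replace_whole BASH Bash "$data")
time b=$(replace_by_line BASH Bash "$data")
[[ $a == "$b" ]] && echo "same result (${#a} characters)"
```

Increasing the line count passed to make_data shows how each version scales.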

The Final Word

WordBash demonstrates that Bash-only code works well and that a web application like WordPress can be implemented in only a few thousand lines of shell script.


Notes:
1. A drawback to this type of substitution is that it occurs everywhere, even where it is not wanted, as in these examples, so steps must be taken. (I simply used character substitution in the posts, but the data and/or code can be made "smarter" by checking word boundaries, etc.)

0m0.028s 0m0.028s 0m0.008s 0m0.011s