80% of the problems, caused by 20% of the bugs

If you’ve been around the internet for long enough, you’ve probably heard of the 80-20 principle, or Pareto Principle. For some reason, it seems like everything is linked to the proportion 80 to 20. For example, 80% of the world’s wealth is held by 20% of the population, or in my case, 80% of the problems were caused by 20% of the bugs, even if it wasn’t obvious that those bugs were the cause. While I was skeptical about the claim before starting on my most recent project, I am now a full-on believer.

Recently, I’ve been writing my own dialect of Lisp. I am mostly doing this as an exercise in creating high(er) level languages for embedded systems, cutting a lot of the fat that current high level languages like Python have. And I chose Lisp because Lisp is cool. While all of this is a blog post for another day, the takeaway here is that I wanted it to run on every system ever made, meaning I had to choose a very common programming language to build it in (all the more important since my Lisp is interpreted, not compiled, so the interpreter itself has to run everywhere). So I chose ANSI C (C89).

When I started writing my Lisp interpreter, I was writing it very fast. In C, you are the memory management, meaning if you don’t do everything correctly, your program could start executing garbage or returning garbage values. Early on, most of my bugs were uninitialized variables. For example, in the following code, we have no idea what the value of x would be. It could be 0, or it might not be. The C standard leaves the value indeterminate, so in practice it depends on what the compiler does and on whatever happened to be left in that spot on the stack.

int x;

printf("x is %d\n", x);

Now, these are easy enough to find. Simply pass the warning flag -Wuninitialized to GCC (Clang accepts it as well), and boom, every uninitialized variable should hopefully be highlighted. This worked for me, so overall it was an easy fix. However, as I continued on with the project, I ran into stranger memory management errors, which seemed to be coming from a billion different directions.
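Going back to the uninitialized variable for a moment, here is a minimal sketch of the fix, written in C89 style. The function name sum_to is made up for illustration and is not from the interpreter. The point is only that every local gets a value before it is read.

```c
/* Hypothetical example, not code from the interpreter.
 * On GCC, -Wall already turns on -Wuninitialized, so building with
 *   gcc -std=c89 -Wall -Wextra example.c
 * would flag the accumulator below if the "= 0" were removed. */
int sum_to(int n)
{
    int total = 0; /* without "= 0" the first "+=" reads an indeterminate value */
    int i;         /* fine to leave unset here: the for loop assigns it before any read */

    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Calling sum_to(4) adds 1 + 2 + 3 + 4. With the initializer removed, the result would depend on whatever was left on the stack.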

Now, I have been programming in C/C++ for a little under 10 years at this point, having started sometime around 2014, so I know my way around. But I also have a tendency to be very lazy when programming, especially for proof of concept things, which usually leads to an “I’ll finish this up later, it’s not like this is production code or anything” type of mentality. In practice this usually means neglecting memory management, since “you can just allocate a buffer of like 1KB and it should never overflow… right?”

Working on my Lisp interpreter, the first problem I ran into was that programs that were too large had their variables overwritten randomly. In the interpreter, a variable just holds a direct value: usually a pointer to memory, or an integer or decimal value. If it is a pointer to memory, the pointer is also added to another list, which holds all active memory allocations. This list of allocations lets the garbage collector determine whether or not memory is in use, and free it if it no longer is. This is a pretty nice system, if only C had it. I quickly started having problems with allocations being overwritten with seemingly garbage data, which caused memory that was still in use to be freed, which in turn caused undefined behavior when that memory was used later. This was a particularly big problem on Microsoft Windows, which in my experience is far less forgiving about touching memory that is out of bounds or already freed. Basically, on Windows, if you free it, it’s gone forever. This would usually cause a crash on Windows, but not on Linux, so for a while I just assumed the Windows version was completely broken because “Windows”.

The next memory management problem was the freeing of garbage data. For example, if the variable list overflowed into the allocation list, garbage data could be written over a stored allocation, and when that allocation was no longer needed, the interpreter would try to free memory it had no right to free. Likewise, if the allocation list overflowed, it would spill into the openFiles list, which contained references to the open files being read by the interpreter. Again, the interpreter could end up closing a file based on garbage data, which results in undefined behavior, especially if the file descriptor wasn’t open in the first place. This happened while porting the interpreter to classic Mac OS (System 6). And again, I chalked it up to “man, Apple really sucks”.

These bugs really took a toll on me, especially because I was watching them in debuggers and seeing values magically overwrite other values for seemingly no reason. It was especially strange since, when the program eventually crashed, the crash point was usually somewhere very far from where the problem actually took place. By this point, the memory management part of the program had long since left my focus, as I just assumed “It’s worked this far, it should keep working”. So I wasn’t even really considering that bad memory management could cause all of these errors, which led to multiple long nights of adding unneeded sanity checks and attempted error correction. Remember kids, fix a problem, don’t try to patch around it.

Eventually I was running out of things to try. At this point it seemed like every internal function had some crashing bug. So I decided to run the program through Valgrind, a memory debugger among other things, and man, I should have done that much earlier. To my absolute horror, I was making overflow write after overflow write. I wish the GNU Debugger had been more up front with me about that. Once I fixed all the overflow writes by keeping track of each buffer’s size and reallocating when needed, all of the bugs magically went away. And so too could all the shoddy patchwork. Overall, just fixing about 3 buffer overflows made the program about 1000% more stable.
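As a rough illustration of what “keeping track of buffer size and reallocating when needed” can look like in C89, here is a minimal sketch of a growable pointer list. The PtrList type and the ptrlist_ function names are hypothetical and are not the interpreter’s actual code. Starting at 8 entries and doubling each time is just one common choice of growth policy.

```c
#include <stdlib.h>

/* Hypothetical growable list of pointers (illustrative names only). */
typedef struct {
    void **items;    /* heap buffer holding the entries */
    size_t count;    /* entries currently in use */
    size_t capacity; /* entries the buffer has room for */
} PtrList;

/* Start empty; the first push performs the first allocation. */
void ptrlist_init(PtrList *list)
{
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}

/* Append item, growing the buffer first if it is full.
 * Returns 0 on success, -1 if more memory could not be obtained. */
int ptrlist_push(PtrList *list, void *item)
{
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 8;
        void **grown;

        /* Refuse sizes whose byte count would overflow size_t. */
        if (new_capacity > (size_t)-1 / sizeof(void *)) {
            return -1;
        }
        grown = realloc(list->items, new_capacity * sizeof(void *));
        if (grown == NULL) {
            return -1; /* the old buffer is untouched and still valid */
        }
        list->items = grown;
        list->capacity = new_capacity;
    }
    /* Safe: count < capacity is guaranteed at this point. */
    list->items[list->count++] = item;
    return 0;
}

/* Release the buffer and reset the list to its empty state. */
void ptrlist_free(PtrList *list)
{
    free(list->items);
    ptrlist_init(list);
}
```

Because every write goes through ptrlist_push, the list can never spill into whatever happens to sit next to it in memory, which is exactly the kind of overflow described above.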

Now, I’m not going to act like my Lisp interpreter is anywhere close to being done (although stay tuned if you’re curious, I’ll probably make a few more posts here about it in the lead-up to version 1.0), but it sure as hell is a lot more stable. So in closing, somehow, I guess the 80-20 principle is true. If you are having a problem with your program, or a program you are using, look for the most obvious sources of error first. In all likelihood, the obscure bugs that you have to work hard to even trigger aren’t the ones crashing your program. It’s probably the basic ones like the ones I was facing, even if it doesn’t seem like it.

Posted in Programming

The sad state of input methods on modern Linux

こんにちは、私はコリンです。よろしくおねがいします。 (Hello, I’m Colin. Nice to meet you.)

So… I decided this blog will also be a place for me to post rants. So let’s start. I speak 4 languages: English, Spanish, Mandarin Chinese, and Japanese. 2 of those 4 use a non-Latin writing system, so I have to use input methods to type characters in those languages. This is not just limited to East Asian languages, but applies to ones like Russian and Arabic too. Input methods have long been the way to type in these languages, mostly because they solve the problem of “how do we translate this language into a set of keys to easily type on a keyboard.”

A language like Chinese is written with logographic characters, which means it was never meant to be typed, but drawn. While a few Chinese keyboards do exist for typing by “creating” characters from strokes printed on the keys, it’s much easier to learn Pinyin, the Romanization of the pronunciation, and learn to type that way. However, if all Chinese was typed that way, we would get sentences like “wo shi xue sheng” rather than the actual Chinese characters “我是学生.” Not to mention, Chinese has many near-homophones, like “是 – shì – to be” and “十 – shí – the number 10”, which are spelled with the same letters in Pinyin and differ only in their tones. (In Pinyin, there are marks to denote the tones, but to type those you ALSO need an input method.) Thankfully, input methods exist to take the Pinyin you type and turn it into the characters that fit the context.

Now all of this sounds like a net positive, and for the most part it is. Both Windows and macOS have built-in support for input methods, easily accessible from the settings, and Google implements a really damn good one in all Google web apps. But Linux… well… they kinda work on Linux.

In all fairness, I have used input methods on Linux for almost 5 years now. Originally it was just to convert my QWERTY keyboard into Dvorak (another good use for input methods), but now it is mostly to convert typed Chinese and Japanese into their correct written form. For the most part, both major input method frameworks, Fcitx and IBus, have worked alright. However, we are now in the transition between GTK 3 and GTK 4, alongside the transition from X11 to Wayland. This transition has truly thrown a spanner into the works for input methods, and just goes to show that what Linux needs is more of a stable backbone instead of the modular, fluid system that exists now.

On Linux, both IBus and Fcitx seem to hook into the application’s text input, rather than the core system’s text input layer. This is good, as it means the core daemon doesn’t need to run as root, but it also means it’s up to each application to honor input methods. On X11 this is much easier, because all you need is for X11 to honor the input method. While this doesn’t guarantee every application will work with it, there are fewer moving parts. On Wayland, however, the problem gets worse. Wayland expects every application to be in charge of everything itself, from drawing its titlebar and decorations to honoring input methods. Personally, I have been using Wayland on GNOME for the better part of a year now, and I haven’t had as many Wayland-specific input method issues as I thought I’d have, but some applications, particularly ones based on alternate GUI toolkits (not GTK or Qt), seem to have the worst problems. With those it’s just more effective to run them in XWayland and hope that works with the input method.

This isn’t to say X11 doesn’t have its own issues with input methods, though. For some reason, Wayland or not, it seems like GTK and Qt universally disrespect the input method. I’m not entirely sure how input methods work at a low level, so I’m not exactly sure how they send their data, but what I do know is that multiple environment variables are needed to get the input method working in applications. Usually this means adding a few lines to your `.bashrc` or whatnot. Here is my config:

# Can't believe I have to do this in 2019 + ?
export GTK_IM_MODULE=ibus
export XMODIFIERS=@im=ibus
export QT_IM_MODULE=ibus

So all of this might be a little alarming, but for the most part, somehow this horrible beast of input methods just seems to work fine. The problem is when technology changes. With the switch from GTK 3 to GTK 4, I noticed a lot of applications that traditionally worked with input methods no longer do. I don’t know whether this is a problem with IBus not supporting GTK 4 (despite being integrated directly into GNOME), or GTK 4 speaking a different language to the input method, or hell, maybe I need some new undocumented environment variable. This mostly came up when using Chromium and other Chromium-based browsers, for which I thankfully discovered an override.

GTK_IM_MODULE=ibus vivaldi --enable-features=UseOzonePlatform --ozone-platform=wayland --gtk-version=4

Now if that’s all I needed, I would be happy. However, of course something has to go wrong. Thanks to Flatpak being a sandbox (and Flatpak is where I have Chromium installed from), it doesn’t like to play nice with IBus. I’m not too well versed in Flatpak, so for all I know, maybe I need to install some module to make it work, but right now EVERY Flatpak app on my system fails to work with an input method, like at all.

To be honest, none of this should be necessary if Linux ever wants to be a household name, especially for people in countries that don’t use a standard Latin alphabet, or people who speak multiple languages but want one keyboard. The amount of time I have invested into troubleshooting input methods is insane and, in all honesty, pushes me more towards Windows or macOS for work in foreign languages, because at least those platforms “Just work™”. This was made worse when, during my time at a Japanese university, I used only Windows 10, and the experience was so good that all my work in Japanese is now done on Windows.

Hopefully this rant post isn’t too annoying or meandering to find some meaning in. If you want more information on how to set up an input method, either to torture yourself or otherwise, this Arch Wiki article provides pretty good information on how to do it. In all honesty, input methods on Linux work. Like period, they work. It’s just that the times they don’t really stand out more than the many times they do, especially when on systems like Windows or macOS they work constantly. I’m not really sure how to go about fixing this in a way that isn’t simply “run the daemon at a system level” or “make every Linux application use the same GUI libraries”, both of which the Linux community would hate. So basically, we just have to live with it. いじょうです。 (That’s all.)

Posted in Linux

So I finally got around to setting up a blog…

I’m going to be honest, I used to really make fun of WordPress, but it’s really easy to set up. Hopefully this is easier than my horribly janky old system, which used Blæst as a markdown parser that was terrible and barely worked. It was particularly awkward that the æ in Blæst would cause the ENTIRE program to break, so I had to manually go back and edit it in. Hopefully WordPress is easier. Also, my friend Brandon already set up his blog on WordPress and it looks pretty nice so…


Here are some things I really like about WordPress so far

  • The themes are a nice hit of nostalgia. I remember the web before sites like Wix and Squarespace took over, and especially before frameworks like React took over. What’s nice about WordPress is that it just seemed to slot in perfectly with my already existing (and rather bare bones) web server. That’s pretty cool in my opinion.
  • WordPress seems to organize everything really nicely. Being able to write posts and categorize them from a nice panel is pretty convenient, especially since before I had to do it all manually using markdown.

Now the obligatory “How I set up WordPress” section

Setting up WordPress was honestly almost too easy. To start, I use Debian stable with the Apache web server. I do NOT like to add external repositories, so I do everything manually (even though I assume someone has already packaged WordPress for Debian). I also can’t run Docker or other container programs because I lack the disk space. So I am basically used to having to compile and backport my own programs, and sometimes going through hell doing it. But WordPress was different. It just seemed to work out of the box on Debian stable, so here’s how I did it.

Assuming you use apache2, the installation should look something like this.

First install the needed packages. The commands below need root privileges, so run them as root or put sudo in front of them.

$ apt install php7.4 php7.4-mysql libapache2-mod-php7.4

Next, we need to create the folder in your web directory where WordPress will live. For this example I will use https://website.com/blog, which would be located at /var/www/html/blog.

$ mkdir /var/www/html/blog
$ cd /var/www/html/blog

Now we need to add the wordpress database and user to our database server… Oh… and install a database server

For the database server I decided to use MariaDB, since it seems like everyone uses it.

$ apt install mariadb-server

Now once you have MariaDB installed, let’s add the WordPress database and user.

$ mysql -u root

From here you are in the mariadb interactive shell, so we need to type SQL commands.

Let’s create the wordpress database:

CREATE DATABASE wordpress;

And the wordpress user:

CREATE USER 'wpuser'@'localhost' IDENTIFIED BY 'securePasswordHere';
GRANT ALL PRIVILEGES ON wordpress.* TO 'wpuser'@'localhost';
FLUSH PRIVILEGES;

Make sure to change the password shown above, and the username too if you feel like it. Also make sure to remember the database name, username, and password for when we set up WordPress itself.

Now leave mariadb by just typing:

QUIT;

Okay… Now we are prepared to get wordpress setup.

First restart apache2 so it picks up the newly installed PHP module. To do this just run:

$ service apache2 restart

To download wordpress, enter your directory where you wish to install it (we are using /var/www/html/blog) and type:

$ wget https://wordpress.org/latest.tar.gz
$ tar xf latest.tar.gz

This will extract to a folder named wordpress/. If you don’t want that, simply run:

$ mv wordpress/* .

Now make sure to allow WordPress to write to the folder. For some reason, if you don’t do this, WordPress will ask for FTP credentials to log into your server (why…?).

$ chown -R www-data:www-data /var/www/html/blog

Now navigate to the site and directory you installed to in your browser, follow the few instructions, and boom! Now you have WordPress!

Posted in Server