paritybit.ca

Files for paritybit.ca

commit 6515ca2dc35fbbde573315ec3e4f10fd533ea9c7
parent 12ab7abdd5bf72edc0f1cbbdfaeacd7821b18e31
Author: Jake Bauer <jbauer@paritybit.ca>
Date:   Mon, 27 Mar 2023 17:12:25 -0400

*

Diffstat:
M content/garden/index.md             |   1 +
M content/garden/programming-style.md | 436 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
2 files changed, 308 insertions(+), 129 deletions(-)

diff --git a/content/garden/index.md b/content/garden/index.md
@@ -50,6 +50,7 @@ Here are links, documents, and other things I found interesting that I want to g
 * [How Browsers Work](https://www.freecodecamp.org/news/web-application-security-understanding-the-browser-5305ed2f1dac/)
 * [An Introduction to Language-Oriented Programming Using Racket](https://beautifulracket.com/)
 * [Repeat yourself, do more than one thing, and rewrite everything](https://programmingisterrible.com/post/176657481103/repeat-yourself-do-more-than-one-thing-and)
+* [The Little Book About OS Development](http://littleosbook.github.io/)

 ## 🌾 The Plots

diff --git a/content/garden/programming-style.md b/content/garden/programming-style.md
@@ -1,5 +1,5 @@
-Title: Some Notes on Programming Style and Composition
-Summary: Some Notes on Programming Style and Composition
+Title: Some Notes on Program Style and Composition
+Summary: Some Notes on Program Style and Composition

 # [%title]

@@ -16,9 +16,9 @@ of indentation to use. There are demonstrable pros and cons to different styles
 and people will choose what they like best. As long as it doesn't produce
 objectively ugly, unreadable code, it is acceptable.

-My overarching philosophy is that one should write programs for humans to
-read, not computers to execute. Code that is cleverly written is code that is
-poorly written (unless you're doing it for a competition or for fun).
+My overarching philosophy is that one should write programs for humans to read,
+not computers to execute. Code that is "cleverly" written is code that is poorly
+written (unless you're doing it for a competition or for fun).

 Where possible, use common sense to make your program as readable as it can be.
 When working on a codebase that is not your own, follow the existing style (if
@@ -27,8 +27,9 @@ gofmt or Python's PEP 8), abide by that style.

 <ul>
     <li><a href="#dependencies">Dependencies</a></li>
-    <li><a href="#comments">Comments</a></li>
     <li><a href="#complexity-and-optimization">Complexity and Optimization</a></li>
+    <li><a href="#comments">Comments</a></li>
+    <li><a href="#conditions">Conditions</a></li>
     <li><a href="#naming">Naming</a></li>
     <li><a href="#braces-and-parentheses">Braces and Parentheses</a></li>
     <li><a href="#line-length">Line Length</a></li>
@@ -68,38 +69,6 @@ dependencies on Google and GitHub) is very concerning for the future of programs
 written in these languages. In fact, I have run into issues getting older
 Python programs to run due to packages no longer being available.

-## Comments
-
-Comments should exist in code to express why something is being done a certain
-way or to clarify a particularly tricky bit of code that is hard to express in
-any other way. Comments like the following are completely useless and should
-not exist:
-
-```
-struct Node *np = &node /* Create a pointer to the node */
-```
-
-Comments should also not have annoyingly large banners, and big blocks of
-comments should be avoided. If you use the
-"[doxygen-style](https://en.wikipedia.org/wiki/Doxygen)" of commenting before
-a function and at the top of a file, you _must_ treat this as writing
-high-quality documentation that you would serve to an end user. Otherwise it
-ends up being as useless as the above example in practice. I have often seen
-these kinds of comments used as an excuse to not write good documentation
-("Look, we have doxygen-generated documentations that list all the parameters;
-we documented this code!") which makes these kinds of comments just clutter.
-
-Great uses of comments include:
-
-* Pointing to external documentation or another file as a guide or explanation
-* TODO or FIXME markers
-* Explaining a particularly complex data structure
-* Explaining the purpose of a function that isn't immediately clear by reading its name
-* Explaining the quirks of a particular algorithm (e.g. working around a hardware limitation)
-
-Also see: [The Misunderstood Concepts of Code
-Comments](https://text.martinmch.com/2020-08-11-comments-should-be-red.html).
-
 ## Complexity and Optimization

 Try to minimize complexity as much as possible. Complexity **does not
@@ -109,10 +78,10 @@ reliance on external dependencies, where lines of code is just a symptom of
 these diseases.

 The more code a program has, the more bugs your program can have. The more
-features your code has, the more ways those features can [interact in
-unexpected ways](https://flak.tedunangst.com/post/features-are-faults-redux)
-and result in bugs. The more clever a program is written, the harder it is to
-spot bugs or debug.
+features your code has, the more ways those features can [interact in unexpected
+ways](https://flak.tedunangst.com/post/features-are-faults-redux) and result in
+bugs. The more cleverly a program is written, the harder it is to spot bugs or
+debug.

 Also, simple algorithms and data structures are often preferable over fancy
 ones—even though fancy ones may be theoretically faster for large
@@ -140,12 +109,71 @@ bespoke optimizations and only then will you know what is worth optimizing and
 what is not. There is no point making a program harder to read by applying
 optimizations that lead to extremely marginal improvements.

+## Comments
+
+Comments should exist in code to express why something is being done a certain
+way or to clarify a particularly tricky bit of code that is hard to express in
+any other way. Comments like the following are completely useless and should
+not exist:
+
+```
+struct Node *np = &node /* Create a pointer to the node */
+```
+
+Comments should also not have or be annoyingly large banners, and big blocks of
+comments should be avoided. Definitely don't put a copy of your code's license
+at the top of each file; a simple, single-line statement referring to the
+LICENSE file is sufficient.
+
+If you use the "[doxygen-style](https://en.wikipedia.org/wiki/Doxygen)" of
+commenting before a function and at the top of a file, you _must_ treat this as
+writing high-quality documentation that you would serve to an end user.
+Otherwise it ends up being as useless as the above example in practice. I have
+often seen these kinds of comments used as an excuse to not write good
+documentation ("Look, we have doxygen-generated documentations that list all
+the parameters; we documented this code!") which makes these kinds of comments
+just clutter.
+
+Great uses of comments include:
+
+* Pointing to external documentation, a specification, or a guide that explains what is going on in the code
+* TODO or FIXME markers
+* Explaining a particularly complex data structure or algorithm
+* Explaining the purpose of a function or code block that isn't immediately clear
+* Explaining the quirks of a particular algorithm (e.g. working around a hardware limitation)
+* Explaining critical decisions (why one algorithm was chosen over another, system requirements, etc.)
+
+Also see: [The Misunderstood Concepts of Code
+Comments](https://text.martinmch.com/2020-08-11-comments-should-be-red.html).
+
+## Conditions
+
+It is always preferable to be explicit when checking conditions. For example:
+
+```
+if (x == 1) { ... }
+
+if (ptr == NULL) { ... }
+
+while (*s != '\0') { ... }
+```
+
+is better than:
+
+```
+if (x) { ... }
+
+if (!ptr) { ... }
+
+while (*s) { ... }
+```
+
 ## Naming

 Names should be descriptive and clear in context, but not redundant or
 excessive. In general, procedures should be named based on what they do and
 functions should be named based on what they return. Variables and types should
-be nouns (e.g. `num_cakes` or `Parser`).
+be nouns (e.g. `num_cakes` or `Parser`). For example:

 ```
 draw();
@@ -206,6 +234,8 @@ naming. For example, if you have a variable representing the maximum and minimum
 physical address in memory, choose names like `max_phys_addr` and
 `min_phys_addr`, not `max_phys_addr` and `lowest_address`.

+In summary...
+
 ### Worse

 ```
@@ -248,15 +278,25 @@ if (condition)
 ```

 should have surrounding braces. Although it can seem excessive, especially
-paired with my other preferences, it eliminates a class of bugs that arises
-when one adds code to a control block but forgets to also add the braces:
+paired with my other preferences, it eliminates a class of errors that arises
+when one adds code to a control block or comments out a line but forgets to
+also add the braces:

 ```
 if (condition)
     do_stuff();
     do_more_stuff(); <- this is not in the if block!
+
+if (condition)
+    /* do_stuff(); */
+
+do_other_stuff(); <- now this is in the if block!
 ```

+Plus, I find it more ergonomic to be able to quickly comment out or add
+statements without also then having to add or remove braces to keep the style
+consistent or the code correct.
+
 (Although whitespace-indented languages such as Python do not have to deal with
 this issue, that property brings along other issues with code readability due
 to blocks not being as clearly delineated. Like I said, there is no One True
@@ -291,10 +331,10 @@ if (condition) {

 because it is much easier to select the else statement (for deletion, copying,
 etc.) if it is not on the same line as the closing brace of the if statement.
-This is not only helpful in vim's visual mode, but also when selecting code
-with a mouse as one can be less precise when selecting the block of code. It
-is also easier to read this way, regardless of how you feel about what I am
-about to say:
+This is not only helpful in vim's visual mode, but also when selecting code with
+a mouse as one can be less precise when selecting the block of code. It is also
+easier to read this way, regardless of how you feel about what I am about to
+say:

 Although many have unfortunately settled on the following brace style for
 functions, control statements, and the like:
@@ -323,21 +363,21 @@ more complex. For example:

 ```
 if (is_logged_in(client)
-    && client->assignedAddress
-    && strncmp(client->username, "admin", sizeof("admin")) == 0
-    && authenticate(client->password, password))
-{
+        && client->assignedAddress
+        && strncmp(client->username, "admin", sizeof("admin")) == 0
+        && authenticate(client->password, password)) {
     render_admin_panel();
 }
 ```

-is far nicer than:
+is not as nice as:

 ```
 if (is_logged_in(client)
-        && client->assignedAddress
-        && strncmp(client->username, "admin", sizeof("admin")) == 0
-        && authenticate(client->password, password)) {
+    && client->assignedAddress
+    && strncmp(client->username, "admin", sizeof("admin")) == 0
+    && authenticate(client->password, password))
+{
     render_admin_panel();
 }
 ```
@@ -345,75 +385,63 @@ if (is_logged_in(client)
 Similarly:

 ```
-if (abs(hpos[0] - tpos[0]) > 1 || abs(hpos[1] - tpos[1]) > 1)
-{
-    if (hpos[0] > tpos[0] && hpos[1] < tpos[1])
-    {
-        tpos[1]--;
-        tpos[0]++;
-    }
-    else if (hpos[0] > tpos[0] && hpos[1] > tpos[1])
-    {
-        tpos[1]++;
-        tpos[0]++;
-    }
-    else if (hpos[0] < tpos[0] && hpos[1] < tpos[1])
-    {
-        tpos[1]--;
-        tpos[0]--;
-    }
-    else if (hpos[0] < tpos[0] && hpos[1] > tpos[1])
-    {
-        tpos[1]++;
-        tpos[0]--;
-    }
-    else if (hpos[0] > tpos[0])
-    {
-        tpos[0]++;
-    }
-    else if (hpos[0] < tpos[0])
-    {
-        tpos[0]--;
+for (s = opts; (p = strsep(&s, ",")) != NULL;) {
+    /* always leave space for one more argument and the NULL */
+    if (argc >= maxargc - 3) {
+        int newmaxargc = maxargc + 50;
+
+        argv = ereallocarray(argv, newmaxargc, sizeof(char *));
+        maxargc = newmaxargc;
     }
-    else if (hpos[1] > tpos[1])
-    {
-        tpos[1]++;
-    }
-    else if (hpos[1] < tpos[1])
-    {
-        tpos[1]--;
+    if (*p != '\0') {
+        if (*p == '-') {
+            argv[argc++] = p;
+            p = strchr(p, '=');
+            if (p) {
+                *p = '\0';
+                argv[argc++] = p+1;
+            }
+        }
+        else {
+            argv[argc++] = "-o";
+            argv[argc++] = p;
+        }
     }
 }
 ```
+<small>The above code is from <code>fsck.c</code> in the OpenBSD codebase and
+is ISC-licensed.</small>

-is generally more readable than:
+is not as nice as:

 ```
-if (abs(hpos[0] - tpos[0]) > 1 || abs(hpos[1] - tpos[1]) > 1) {
-    if (hpos[0] > tpos[0] && hpos[1] < tpos[1]) {
-        tpos[1]--;
-        tpos[0]++;
-    } else if (hpos[0] > tpos[0] && hpos[1] > tpos[1]) {
-        tpos[1]++;
-        tpos[0]++;
-    } else if (hpos[0] < tpos[0] && hpos[1] < tpos[1]) {
-        tpos[1]--;
-        tpos[0]--;
-    } else if (hpos[0] < tpos[0] && hpos[1] > tpos[1]) {
-        tpos[1]++;
-        tpos[0]--;
-    }
-    else if (hpos[0] > tpos[0]) {
-        tpos[0]++;
-    }
-    else if (hpos[0] < tpos[0]) {
-        tpos[0]--;
-    }
-    else if (hpos[1] > tpos[1]) {
-        tpos[1]++;
+for (s = opts; (p = strsep(&s, ",")) != NULL;)
+{
+    /* always leave space for one more argument and the NULL */
+    if (argc >= maxargc - 3)
+    {
+        int newmaxargc = maxargc + 50;
+
+        argv = ereallocarray(argv, newmaxargc, sizeof(char *));
+        maxargc = newmaxargc;
     }
-    else if (hpos[1] < tpos[1]) {
-        tpos[1]--;
+    if (*p != '\0')
+    {
+        if (*p == '-')
+        {
+            argv[argc++] = p;
+            p = strchr(p, '=');
+            if (p)
+            {
+                *p = '\0';
+                argv[argc++] = p+1;
+            }
+        }
+        else
+        {
+            argv[argc++] = "-o";
+            argv[argc++] = p;
+        }
     }
 }
 ```
@@ -426,13 +454,16 @@ visually line up.
 With this style, it is much easier to keep track of block boundaries when the
 braces are distinctly on their own lines. It also means that, should the
 conditions in an `if` statement or the arguments to a function grow too long to
-fit comfortably on one line, there is still noticeable separation between the
-function's declaration and its body without an additional level of indentation.
+fit comfortably on one line, there is still clear separation between the
+statement and its body without needing to awkwardly double-indent the set of
+conditions or arguments.
+
+While some denounce this style for being not as compact, note that, just like
+the line length argument, this has been largely irrelevant for a long time.
+Most displays from the last couple decades can show a lot more than 25 or even
+40 rows of text at a time at reasonable resolutions and font sizes.

-While some might denounce this style because it's not as compact and one can't
-fit as many lines of meaningful code onto the screen at once, note that, just
-like the line length argument, this has been largely irrelevant since the 90's,
-if not earlier. Speaking of which...
+Speaking of line length...

 ## Line Length

@@ -459,7 +490,8 @@ statement look awkward, then don't.

 ```
 int
-this_is_a_function_with_a_long_name(int param1, int param2, int param3, int param4, int param5, int param6, int param7, int param8)
+this_function_has_a_long_name_and_lots_of_args(int param1, int param2, int param3,
+        int param4, int param5, int param6, int param7, int param8)
 {
     return CONSTANT
         + max(param1, param2, param3, param4, param5, param6, param7, param8);
@@ -470,7 +502,7 @@ this_is_a_function_with_a_long_name(int param1, int param2, int param3, int para

 ```
 int
-this_is_a_function_with_a_long_name(
+this_function_has_a_long_name_and_lots_of_args(
     int param1, int param2, int param3, int param4,
     int param5, int param6, int param7, int param8)
 {
@@ -584,7 +616,7 @@ Different programming languages have different conventions or best-practices
 that are dependent on the syntax and usage of that language. Here are some
 notes on various languages:

-[This section needs work]
+<p class="note">I am still expanding this section</p>

 ### C

@@ -592,11 +624,157 @@ C functions should be named in `snake_case` because of the convention that
 internal functions are preceded by two underscores and "namespacing" your
 functions by prepending a category, or type is more readable with underscores.

-There is no performance difference between `++i` and `i++` for in modern
-compilers.
+There is no performance difference between `++i` and `i++` in most compilers
+(unless you tell your compiler to not attempt any optimizations whatsoever).
+
+Prefer enums over `#define` statements, they are easier to debug.
+
+Avoid macros, and if you do have to use them for performance reasons, they must
+be as simple as possible so they are not a pain to debug.
+
+Sometimes a "magic number" is nicer to have in the code directly, with an
+accompanying comment, than in a completely different section of the code as
+a `#define`. Choose to do what is more readable and easily understood.
+
+Only `typedef` structs when they are supposed to be opaque to the user (i.e.
+they should only interact with the struct through functions, and never access
+fields directly). Also, separate `typedef`s from your structs/enums because it's
+easier to read and grep for.
+
+The return type for functions should be on a separate line so it's easy to
+search a codebase for the function implementation. Also, if a function takes no
+arguments, be explicit about it with the `void` keyword. Do this:
+
+```
+int
+main(void)
+{
+    return 0;
+}
+```
+
+not this:
+
+```
+int main()
+{
+    return 0;
+}
+```
+
+Declare variables on their own line and definitely **do not** mix declarations
+and assignments on the same line. Do:
+
+```
+int x;
+int y;
+```
+
+not:
+
+```
+int x = 5, y;
+```
+
+Also, initialize variables near where they will be used, rather than at their
+declaration.
+
+Avoid manual inlining or other statements that only serve to "hint" at the
+compiler to do something. Let the compiler handle it.
+
+**Never** use unsafe versions of functions (e.g. `strcpy`, `sprintf`) even when
+you "know" your data will fit. You never know when someone else (or a future
+version of you) will come along and make a change to the code that causes the
+data to no longer fit, and now you have a bug at best, or a vulnerability at
+worst.
+
+Don't use `alloca` or allocate large arrays/structs on the stack. Prefer
+`malloc`/`free`.
+
+Return values should generally be `-1` or `NULL` for errors, `0` for success,
+`>0` for any other non-error state/value your function/program wishes to
+communicate. Make use of standard `errno`, `perror` and other such tools to set
+or find out exactly what went wrong.
+
+Prefer the `/* ... */` style of comments. These are trivially easy to extend
+into multi-line comments and I personally think they look nicer than C++ style
+`// ...` comments. Also, when writing a multi-line comment, write them like so:
+
+```
+/* This is a multiline comment.
+ * A comment with multiple lines.
+ * Many lines, such wow. */
+```
+
+or:
+
+```
+/*
+ * This is a multiline comment.
+ * A comment with multiple lines.
+ * Many lines, such wow.
+ */
+```
+
+Just don't mix the two.
+
+For `switch`/`case` statements, use "fallthrough" comments unless several case
+statements follow the same branch. For example:
+
+```
+switch (...)
+{
+    case 1:
+    case 2:
+    case 3:
+        /* ... */
+        /* fallthrough */
+    case 4:
+        /* ... */
+        break;
+    default:
+        /* ... */
+        break;
+}
+```
+
+Don't `#include` `.c` files and don't use `#include` in header files.
+
+Make liberal use of `valgrind` and other profiling/static checking tools to
+catch obvious mistakes.

 * [C Programming in Plan 9 from Bell Labs](http://doc.cat-v.org/plan_9/programming/c_programming_in_plan_9)
 * [Notes on Programming in C by Rob Pike](http://doc.cat-v.org/bell_labs/pikestyle)
+* [BadDiode C Style Guide](https://badd10de.dev/notes/c-programming.html)
+* [Sigrid's C Style](https://ftrv.se/3)
+
+### Shell Scripting
+
+Prefer `do`, `then`, etc. on the same line as the control statement. Since there
+are no braces to demarcate code blocks in shell scripts the same way there are
+in many programming languages, the pair of `if`/`fi`, `for`/`done`, and so on
+make up the visual marker of a block of code. For example:
+
+```
+for i in $(seq 10); do
+    echo "$i"
+done
+
+if [ "$var" = "value" ]; then
+    echo "var is value"
+fi
+```
+
+Use quotes around variables wherever possible. This prevents accidental
+incorrect behaviour where a variable expands to a value that contains spaces or
+special characters that then get interpreted by the shell as being additional
+arguments to a program or other shell syntax.
+
+In general, write POSIX-compliant shell programs that use portable program flags
+over ones that are OS or shell-specific. If non-portable flags or programs must
+be used on one system but not another, check for OS versions using `uname -s`
+and set variables like `$cmd` and `$cmd_flags` which can be used later on in the
+script to call the right program with the right flags.

 ### Python