Linux Fu: Shell Scripts in C, C++, and Others

by admin

on September 17, 2019

At first glance, it might not seem to make sense to write shell scripts in C/C++. After all, the whole point of a shell script is to knock out something quick and dirty. However, there are cases where you might want to write a quick C program to do something that would be hard to do in a traditional scripting language. Perhaps you have a library that makes the job easier, or maybe you just know C and can knock it out faster.

While it is true that C generates executables, so there’s usually no need for a script, setting up a build is not what you want to spend your time on when you are just trying to get something done. In addition, scripts are largely portable, while sending an executable to someone else is fairly risky. But you’re in luck, because shell scripts written in C can be shared as… well, as scripts. One option is to use a C interpreter like Cling. This is especially common when you are using something like Jupyter notebook. However, it is another piece of software you need on the user’s system. It would be nice to depend on nothing other than the system C compiler, which is most likely gcc.

Luckily, there are a few ways to do this and none of them are especially hard. Even if you don’t want to actually script in C, understanding how to get there can be illustrative.

The Whole Shebang

I’m going to assume your shell is Bash. There may be subtle differences between shells, but scripts are typically launched using the shebang: the hash and exclamation characters (#!) you’ve probably seen at the top of scripts.

When you try to execute a file, the kernel looks at its first few bytes to figure out what kind of file it is, much like the file command does with its magic number lookup. The file command uses a library called libmagic to do this, and man magic describes the format of the database at work. In theory, there’s a text representation and a compiled version, but many common distributions don’t install the source by default. Regardless, the database looks for certain magic numbers in files to determine their type, so programs don’t need to rely on file extensions.

The exact format isn’t important, but a typical entry has an offset to look inside the file and a number or pattern to match. In the case of a shell script, the magic number is 0x23 0x21 which is, of course, #!. This is how the system calls that execute programs can tell the difference between a shell script and just a random text file.
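
To see the magic number for yourself, here is a small sketch, not part of the article’s scripts, that dumps the first two bytes of a file with od and checks them with head. The helper name has_shebang and the throwaway sample file are illustrative assumptions.

```bash
# has_shebang FILE (illustrative helper): succeeds if FILE begins with
# the two bytes 0x23 0x21, that is, "#!".
has_shebang() {
  [ "$(head -c 2 "$1")" = '#!' ]
}

sample=$(mktemp)
printf '#!/usr/bin/env bash\necho hi\n' > "$sample"

od -An -tx1 -N2 "$sample"      # dumps the first two bytes in hex (23 21)
has_shebang "$sample" && echo "starts with #!"
file "$sample"                 # libmagic's view of the same file (needs the file command)
rm -f "$sample"
```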

Normally, you’ll see something like #!/usr/bin/bash, which causes the file to run as a Bash script. Of course, this hardcodes the location of the system copy of Bash. Some argue this is good because you are more likely to get a known copy of Bash. Others argue that if you have an upgraded copy of Bash in your personal directories, it won’t get used. If you agree with the latter group, you can use #!/usr/bin/env bash instead. That still hardcodes a path, but env only searches your PATH for bash and runs the first copy it finds.
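
You can watch env’s PATH lookup at work with a stand-in for a personal copy of Bash. In this sketch the fake bash is a hypothetical two-line sh script that only reports what it was asked to run; the same #!/usr/bin/env bash script picks it up when its directory comes first on PATH, and gets the usual bash otherwise. (A script that hardcodes #!/usr/bin/bash would ignore the stand-in either way.)

```bash
demo=$(mktemp -d)
mkdir "$demo/bin"

# Stand-in "bash" (hypothetical): it just reports the script path it was handed.
printf '#!/bin/sh\necho "personal bash got: $1"\n' > "$demo/bin/bash"
# A script that asks env to find bash on PATH.
printf '#!/usr/bin/env bash\necho "system bash"\n' > "$demo/script"
chmod +x "$demo/bin/bash" "$demo/script"

PATH="$demo/bin:$PATH" "$demo/script"   # env finds the stand-in first
"$demo/script"                          # env finds the usual bash
```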

The interpreter, though, doesn’t have to be Bash or even a proper shell. For example, an Awk program might have #!/usr/bin/awk -f as a first line. So one strategy would be to build a script that can “launch” the underlying C “script.”
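
To see exactly what the kernel hands an interpreter, you can name /bin/echo on the #! line. This is an illustration, not part of the article’s scripts: on Linux, the rest of the #! line after the interpreter path arrives as a single argument, followed by the script’s path and then the script’s own arguments. That is how #!/usr/bin/awk -f turns into awk -f followed by the script’s file name.

```bash
script=$(mktemp)
# /bin/echo stands in for a real interpreter such as awk.
printf '#!/bin/echo interpreter got:\n' > "$script"
chmod +x "$script"

# Prints "interpreter got:", then the script's path, then "one two".
"$script" one two
```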

That’s one approach, but I took a different one. My original thought was that since #! looks like a preprocessor statement, a script file might be directly usable by the C compiler. That might have been true in the past, but a modern preprocessor such as gcc’s rejects #! as an invalid preprocessing directive.
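
A quick experiment shows the problem. This sketch assumes gcc is installed, and gcc_verdict is a made-up helper: it pipes the same tiny program to gcc with and without its #! line, using -fsyntax-only so nothing is written to disk.

```bash
# A two-line "script": a shebang followed by valid C.
src='#!/usr/bin/env bash
int main(void) { return 0; }'

# gcc_verdict (hypothetical helper): syntax-check C from stdin, report the result.
gcc_verdict() {
  if gcc -x c -fsyntax-only - 2>/dev/null; then echo accepted; else echo rejected; fi
}

printf '%s\n' "$src" | gcc_verdict                # the #! line makes gcc fail
printf '%s\n' "$src" | tail -n +2 | gcc_verdict   # without it, gcc is happy
```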

Marking C Files as Bash Scripts

I wanted to keep things simple. The following lines at the very front of a stand-alone C file are enough to make things work:

#!/usr/bin/env bash
#if 0
source cscript_simplec
#endif

The first line tells the system that this is a Bash script. You might be wondering why I would mark it as a Bash script when I’m trying to get to C. Well, the next few lines are a Bash script. The #if and #endif lines are just comments to Bash, while the C preprocessor skips everything between them. The source command tells the shell to read cscript_simplec from a directory on your PATH.
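
To see both views of the header at once, the following sketch (assuming gcc is installed; header-demo.c is just a scratch file) writes the four header lines plus one line of C, then shows what survives the C preprocessor and what Bash is left to execute.

```bash
printf '%s\n' '#!/usr/bin/env bash' '#if 0' 'source cscript_simplec' '#endif' \
  'int answer = 42;' > header-demo.c

# The C view: with the first line stripped, the preprocessor drops the
# #if 0 block, so only the C declaration remains (-P omits line markers).
tail -n +2 header-demo.c | gcc -E -P -x c -

# The Bash view: every line starting with # is a comment, leaving the
# source command (and the C code, which Bash never reaches).
grep -v '^#' header-demo.c
```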

Because cscript_simplec either runs the compiled program with exec or exits, the source command never returns, so whatever follows it doesn’t matter to Bash. The C compiler, on the other hand, gets the file whenever the executable is out of date. Suppose this file is example.c. The executable will be example.c.bin in the same directory. (This implies that the first person to run the script needs write permission for the directory.)

If the binary exists and is newer than the source file, we simply run it using exec. This replaces the running copy of Bash with the program instead of starting it as a child process, which saves a little memory. However, if the source is newer, or there is no binary yet, the script builds the binary first.
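
Two shell details make this work. The test [ A -nt B ] is also true when A exists and B does not, which is why the very first run compiles the program, and exec replaces the shell with the new program inside the same process. The sketch below shows both; needs_rebuild is a hypothetical helper name.

```bash
# needs_rebuild SRC BIN (hypothetical helper): succeeds when BIN is missing
# or older than SRC, the same test cscript_simplec applies to "$0" and "$0.bin".
needs_rebuild() {
  [ "$1" -nt "$2" ]
}

# exec swaps the running shell for a new program in the same process,
# so both lines report the same process ID.
bash -c 'echo "before exec: $$"; exec bash -c "echo \"after exec: \$\$\""'
```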

There’s a slight problem. Although most of the file will be legal C, the first line isn’t. Yet that line is crucial for the startup. The answer is to cut that line off. Here’s what cscript_simplec looks like:

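# cscript_simplec: sourced from a C script's header, so $0 is the script's path.
if [ "$0" -nt "$0.bin" ]
then
  # Default to -O3; $CCOPTS is left unquoted so it can hold several options.
  CCOPTS="${CCOPTS:--O3}"
  # Strip the #! line (not valid C) and compile the rest from standard input.
  if ! tail -n +2 "$0" | gcc -x c $CCOPTS -o "$0.bin" -
  then
    echo "Compile error on $0" >&2
    exit 1
  fi
fi
# Replace Bash with the program, passing along the script's arguments.
exec "$0.bin" "$@"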

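The tail command cuts off the first line and pipes the rest of the file to gcc. Because gcc reads the program from standard input (the lone - at the end), there’s no file name to infer the language from, so we have to tell gcc that it is reading C with the -x option. You can set CCOPTS to pass your own compiler options; otherwise it defaults to -O3.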

Of course, if you were going to send this out into the wild, you might want to include this whole chunk, or something similar, directly in the script and forgo the source command. That would work, too.
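
A self-contained version might look like the following sketch, which assumes Bash and gcc are installed; hello.c and its contents are made up for illustration. The build logic from cscript_simplec moves between #if 0 and #endif, where Bash runs it and the C preprocessor ignores it, and "$@" passes the script’s arguments through to the program.

```bash
# Quoting 'EOF' stops the current shell from expanding $0 and friends.
cat > hello.c <<'EOF'
#!/usr/bin/env bash
#if 0
if [ "$0" -nt "$0.bin" ]; then
  tail -n +2 "$0" | gcc -x c ${CCOPTS:--O3} -o "$0.bin" - || exit 1
fi
exec "$0.bin" "$@"
#endif
#include <stdio.h>

int main(int argc, char **argv) {
  printf("hello from C with %d argument(s)\n", argc - 1);
  return 0;
}
EOF
chmod +x hello.c

./hello.c one two   # first run builds hello.c.bin, then runs it
./hello.c one two   # later runs skip gcc until hello.c changes
```

Because CCOPTS is only consulted when the script rebuilds, running touch hello.c and then CCOPTS="-O0 -g" ./hello.c is one way to force a debug build.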

Complexity

It’s easy to adapt the code to another language such as C++: use g++ -x c++ in place of gcc -x c. Since this is scripting, it is pretty safe to assume there is one file and that the executable depends only on that source file. However, if you want a bit more complexity (some would argue too much for a simple script), you can turn to make.
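
As for C++, one way to do it is a twin of cscript_simplec that swaps gcc -x c for g++ -x c++. This sketch assumes g++ is installed; cscript_simplecpp, CXXOPTS, and hello.cpp are illustrative names, and the launcher goes in a scratch directory that is put on PATH so that source can find it.

```bash
cppdemo=$(mktemp -d)

# Hypothetical C++ launcher: cscript_simplec with g++ in place of gcc.
cat > "$cppdemo/cscript_simplecpp" <<'EOF'
if [ "$0" -nt "$0.bin" ]
then
  CXXOPTS="${CXXOPTS:--O3}"
  if ! tail -n +2 "$0" | g++ -x c++ $CXXOPTS -o "$0.bin" -
  then
    echo "Compile error on $0" >&2
    exit 1
  fi
fi
exec "$0.bin" "$@"
EOF

# A C++ "script" that sources the launcher with the same header trick.
cat > "$cppdemo/hello.cpp" <<'EOF'
#!/usr/bin/env bash
#if 0
source cscript_simplecpp
#endif
#include <iostream>

int main() {
  std::cout << "hello from C++\n";
}
EOF
chmod +x "$cppdemo/hello.cpp"

# source searches PATH, so put the scratch directory first.
PATH="$cppdemo:$PATH" "$cppdemo/hello.cpp"
```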

If you want to try the make approach, replace cscript_simplec with cscript_make. You’ll have to provide a makefile, too (example.c.make in this case). A suitable one is:

$(SCRIPT_OUT_NAME): $(SCRIPT_NAME)
	gcc -x c $(SCRIPT_NAME) -o $(SCRIPT_OUT_NAME)

Note that the command line in a makefile rule must begin with a tab character, and that you have to use $(SCRIPT_NAME) for the source file and $(SCRIPT_OUT_NAME) for the executable. This is a silly example, of course, but you could create a complex set of dependencies and compile options using a makefile. On the other hand, this seems to violate the keep-it-simple principle, so at that point you are probably better off just writing a normal C program.

If you really need a high-level scripting language, you might consider Python or one of the many other interpreted languages available. However, understanding the mechanism and how to subvert the C compiler might still come in handy someday. After all, you can pull some ugly/beautiful hacks with the preprocessor and compiler.
