Code Editor: My Final Year Project

Since the academic year is over and I’ve finally graduated, I think I can now write about my final year project without jeopardizing any marking.

My project was to create a code editor with syntax highlighting and autocomplete. In addition to these features required by the brief, I added others to extend the editor: error-feedback, project spaces, compiler integration, and simple source-control mechanisms (commit, revert, and update). The editor targeted the C language, specifically the ANSI C standard.

Standard editor script

Figure 1. The main layout of the editor

The above figure shows the main layout of the editor that the user interacts with. It’s a fairly simple layout, devoting most of the screen space to the coding section and making the editor functionality available in the toolbar to the left.

The choice of toolbar was largely based on another editor’s interface, one made by Robert Crocombe a couple of years earlier. The difference is that he made a very elegant toolbar which expands to give more information about each tool, whereas my toolbar relies on familiarity with common icons to convey what each button does (something I’ll have to change according to the user feedback I got).

Overall I managed to achieve the majority of the project’s goals, but couldn’t get all of the source-control features done in time.

Syntax highlighting

The syntax highlighting feature was made to highlight each kind of lexeme in the C language. This included all 32 standard keywords, operators, other tokens (numbers, pointers, etc.), and any user-defined types.

Thankfully, a fair few people have undertaken the code editor project before me, so a pretty clear method for drawing the text on screen had already been established: inherit from the standard text box and override the rendering method to highlight the text.

Like the people who did the project before me, I managed to make a few optimisations to the rendering code. Before generating any of the formatted text objects to be displayed, the editor first checks whether the line to be rendered would be on-screen. In this way roughly the same number of lines is rendered regardless of the length of the code, and the rendering time barely changes, because the visibility check is a relatively inexpensive arithmetic comparison.
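As a rough illustration of that visibility check, here is a minimal sketch of how the range of on-screen lines could be worked out from the scroll position. The class, method, and parameter names are all hypothetical, and it assumes every line has the same pixel height; it is not the editor’s actual rendering code.

```csharp
using System;

public static class VisibleLines
{
    // Returns the inclusive range of line indices that intersect the viewport.
    // All names are hypothetical; offsets and heights are measured in pixels.
    // An empty document yields (0, -1), i.e. nothing to draw.
    public static (int First, int Last) GetVisibleRange(
        double scrollOffset, double viewportHeight, double lineHeight, int lineCount)
    {
        if (lineCount <= 0 || lineHeight <= 0)
        {
            return (0, -1);
        }

        // Two divisions and two clamps, regardless of how long the file is.
        int first = (int)Math.Floor(scrollOffset / lineHeight);
        int last = (int)Math.Floor((scrollOffset + viewportHeight) / lineHeight);

        first = Math.Max(0, first);
        last = Math.Min(lineCount - 1, last);
        return (first, last);
    }
}
```

Only the lines between First and Last would then need formatted text objects generated for them.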

Autocomplete

Code editor autocomplete example 1

Figure 2. The autocomplete feature suggesting two valid variables

Code editor autocomplete example 2

Figure 3. The autocomplete feature suggesting method signatures alongside suggestions

Code editor autocomplete example 3

Figure 4. The autocomplete feature ensuring the method signature remains displayed until the closing bracket of the method call has been completed.

An autocomplete feature is one which suggests any of the different possible tokens to be entered based on the partial string the user has entered. For example, if the user had entered the text ‘d’ the autocomplete feature would suggest the words ‘do’, ‘double’, and any variables starting with the string ‘d’.

One of the aspects of my editor that I’m most proud of is its ability to account for variable scope. When I was looking at other editors I decided I didn’t want one that suggested variable names regardless of scope; Notepad++ had the type of autocomplete I wanted to avoid (not that Notepad++ is a bad editor). To achieve this scope awareness I built some classes that form a kind of symbol table. A MethodScope object is created for each method, containing the name of the method and its start and end points in the text. Each MethodScope also holds a list of variable objects, each consisting of the variable’s name, its type, and the point in the text beyond which it is applicable. Using these objects it is possible to work out which variables are applicable by comparing the caret position to each MethodScope’s start and end positions, and then to the applicable point of each variable within the matching scope.

The autocomplete feature also tries to allow the user a degree of error before completely removing suggestions. To do this I’ve used the edit distance algorithm to make sure that the entered text is within a set number of changes before deciding that a variable cannot be suggested. For example, if the code contains a variable ‘LocalVariable’ and the user enters ‘localVariabl’, the variable would still be suggested because only two edits are needed to reach the actual name: changing ‘l’ to ‘L’ and adding the final ‘e’ (2 was the acceptable number I decided on, but this could easily be changed). I should mention that this was an idea suggested by people who had previously undertaken the code editor project.
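To make the symbol-table idea concrete, here is a minimal sketch of the lookup. Only the MethodScope name comes from the editor itself; the Variable class, the property names, and the ScopeLookup helper are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// One declared variable. VisibleFrom is the text offset beyond which it may be suggested.
public sealed class Variable
{
    public string Name { get; }
    public string Type { get; }
    public int VisibleFrom { get; }

    public Variable(string name, string type, int visibleFrom)
    {
        Name = name;
        Type = type;
        VisibleFrom = visibleFrom;
    }
}

// One method in the source text, with the offsets where it starts and ends.
public sealed class MethodScope
{
    public string Name { get; }
    public int Start { get; }
    public int End { get; }
    public List<Variable> Variables { get; } = new List<Variable>();

    public MethodScope(string name, int start, int end)
    {
        Name = name;
        Start = start;
        End = end;
    }
}

public static class ScopeLookup
{
    // Names of the variables that are applicable at the given caret offset.
    public static List<string> VariablesInScope(IEnumerable<MethodScope> scopes, int caret)
    {
        return scopes
            .Where(s => caret >= s.Start && caret <= s.End)   // the scope containing the caret
            .SelectMany(s => s.Variables)
            .Where(v => v.VisibleFrom <= caret)               // declared before the caret
            .Select(v => v.Name)
            .ToList();
    }
}
```

A variable declared later in the same method, or anywhere in a different method, is filtered out by one of the two comparisons.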
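For reference, the tolerance check can be sketched with the standard Levenshtein edit distance. The class and method names are hypothetical, and the default threshold of 2 simply mirrors the value mentioned in this post; the editor’s own implementation may differ in detail.

```csharp
using System;

public static class FuzzyMatch
{
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b (case-sensitive).
    public static int EditDistance(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Turning an empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }

            // The row just computed becomes the previous row for the next character.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    // True when the typed text is within maxEdits changes of the candidate name.
    public static bool IsCloseEnough(string typed, string candidate, int maxEdits = 2)
    {
        return EditDistance(typed, candidate) <= maxEdits;
    }
}
```

Keeping only two rows of the table means memory use grows with the length of the candidate name rather than with the product of both lengths.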

Error-feedback

My code editor provides syntactic error-feedback. This is accomplished via the Irony.NET parsing technology, which I chose because it allows the BNF grammar to be coded directly into the program and doesn’t require any additional runtime environment. Incidentally, the autocomplete and syntax highlighting features could also have been built on this parsing technology, by checking the leaf nodes of the generated parse tree for their token types and then colouring and storing them. This approach was not used because the first syntactic error causes the parser to stop, and I didn’t have the time to implement any error-recovery mechanism.

The parser stops on the first syntactic error, but an ideal error-feedback feature would highlight as many errors as possible. Using the MethodScope objects it is possible to extract the text of each method and parse the methods individually; thus my editor can find the first syntactic error in each method and highlight multiple errors throughout the code.
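The per-method trick can be sketched without tying it to any particular parser. In the sketch below the parser is represented by a hypothetical delegate, findFirstError, which returns the offset of the first syntax error in the text it is given (or null if there is none); the class name and the tuple-based method ranges are likewise assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class PerMethodErrors
{
    // methodRanges holds the start (inclusive) and end (exclusive) offset of each method.
    // findFirstError is a stand-in for the real parser and is purely hypothetical.
    public static List<int> CollectErrors(
        string source,
        IEnumerable<(int Start, int End)> methodRanges,
        Func<string, int?> findFirstError)
    {
        var errors = new List<int>();

        foreach (var (start, end) in methodRanges)
        {
            // Parse each method on its own, so one error cannot hide errors in later methods.
            string methodText = source.Substring(start, end - start);
            int? localOffset = findFirstError(methodText);

            if (localOffset.HasValue)
            {
                // Translate the method-relative offset back into a document offset.
                errors.Add(start + localOffset.Value);
            }
        }

        return errors;
    }
}
```

The result is at most one error per method, which is still far more useful than one error per file.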

Project spaces

This is the most basic of the features I’ve implemented. Each file in the project should be located in the same location in secondary storage. Once a program is saved through the editor, both the source file on disk and the XML project file are updated. The XML project file contains a list of the files in the project, the location of any SVN repository, and any project-specific compiler arguments that might have been entered.

The XML format was chosen because the project file has to be human-readable in the event that edits have to be made outside of the editor. JSON was another viable format that’s quicker to parse, but XML was chosen because speed of execution isn’t vital to this feature and because the .NET library comes with built-in XML tools, whereas a JSON approach would have required third-party tools, which I wanted to limit for clarity’s sake.

Below is an example of a typical project XML file which is loaded and saved in order to store the necessary information about the project:

Main.c
Person.h
Person.c

To conform with good standards the Project tag really should encompass all of the file tags, but this worked well enough.
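For comparison, here is a minimal sketch of how a project file with a single enclosing root element could be written and read using the built-in System.Xml.Linq classes. Every element and attribute name (Project, Name, Repository, File) is an assumption made for illustration and is not the schema the editor actually uses.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ProjectFile
{
    // Builds a document in which one root element wraps every file entry.
    // All element and attribute names here are hypothetical.
    public static XDocument Build(string name, IEnumerable<string> files, string repositoryLocation)
    {
        return new XDocument(
            new XElement("Project",
                new XAttribute("Name", name),
                new XElement("Repository", repositoryLocation),
                files.Select(f => new XElement("File", f))));
    }

    // Reads the file names back out of a document produced by Build.
    public static List<string> ReadFiles(XDocument document)
    {
        return document.Root.Elements("File").Select(e => e.Value).ToList();
    }
}
```

Calling Build(...).Save(path) and later XDocument.Load(path) would give the save and load round trip, and the file stays easy to edit by hand.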

Compiler integration

Compilation and run program settings

Figure 5. The compiler and run-program sections of the settings page.

Compilation is a necessary part of C development. Since I wanted my editor to provide all of the necessary tools for writing C programs, the user needs to be able to compile the code and run the output program through the editor. Above is the compiler and run-program section of the settings page, which the user feedback indicates could use some work.

The editor allows the user to specify any compiler on the system and any number of compiler directives. Once specified, a compiler setup is saved to secondary storage and can be reused across multiple programs. Supporting multiple compilers was a feature I adapted from Code::Blocks; it seemed useful, but I didn’t find it readily accessible in most of the larger editors I’d looked at.

Making a compiler setup applicable to multiple projects meant I had to set up a mechanism that evaluates the setup arguments and produces the string to be passed to the compiler. For example, the GCC compiler can output an executable with a specified name using the -o argument. If the user specifies this in my editor (as shown in the figure below), the EvaluateArgument method produces the string -o followed by the project name.

Standard GCC setup

Figure 6. The standard GCC argument for outputting the executable stored in the C editor.

        // Builds one compiler argument: the specifier (e.g. -o) followed by a
        // value taken from the project, depending on ChosenArgumentType.
        public string EvaluateArgument(Project PassedProject)
        {
            string ReturningString = CompilerArgumentSpecifier;

            switch (ChosenArgumentType)
            {
                case ArgumentType.UseAllFiles:
                    // Append the quoted full path of every C source file in the project.
                    foreach (string FileName in PassedProject.ReturnProjectFiles())
                    {
                        if (FileName.EndsWith(".c"))
                        {
                            ReturningString += "\"" + PassedProject.ProjectLocation + @"\" + FileName + "\" ";
                        }
                    }
                    break;
                case ArgumentType.ProjectName:
                    ReturningString += "\"" + PassedProject.Name + "\"";
                    break;
                case ArgumentType.ProjectLocation:
                    ReturningString += "\"" + PassedProject.ProjectLocation + "\"";
                    break;
                case ArgumentType.EntireProjectAddress:
                    // Project folder and project name joined into a single quoted path.
                    ReturningString += "\"" + PassedProject.ProjectLocation + @"\" + PassedProject.Name + "\"";
                    break;
                case ArgumentType.ManualEntry:
                    // A value typed in by the user rather than derived from the project.
                    ReturningString += "\"" + OptionalArgumentType + "\"";
                    break;
                case ArgumentType.Ended:
                    // Should be used when the argument is a single variable
                    break;
            }
            return ReturningString;
        }

One of the things I should have done, and might in the future, is have each compiler include all of the standard directives by default; when I implemented the feature I wanted the user to be able to input the necessary directives in the (admittedly unlikely) event that someone was using an unconventional compiler or variation of standard, but including all standard directives by default would save significant time.

Running the compiler and running the output program use very similar code. The .NET System.Diagnostics namespace contains the classes needed to launch an external process and specify its input arguments. This works quite simply for individual program arguments, but the compiler arguments required me to build a mechanism that produces the argument string based on each argument’s type.
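Here is a minimal sketch of launching an external tool with System.Diagnostics and capturing what it prints. The ExternalTool class and its method names are hypothetical; ProcessStartInfo and Process are the standard .NET types, and the same helper would serve for both the compiler and the compiled program.

```csharp
using System.Diagnostics;

public static class ExternalTool
{
    // Describes how to launch the executable with an already-built argument string.
    public static ProcessStartInfo BuildStartInfo(string executablePath, string arguments)
    {
        return new ProcessStartInfo
        {
            FileName = executablePath,
            Arguments = arguments,
            UseShellExecute = false,        // required in order to redirect the streams
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
    }

    // Runs the tool and returns its exit code plus anything it wrote to stdout and stderr.
    public static (int ExitCode, string Output, string Errors) Run(ProcessStartInfo startInfo)
    {
        using (Process process = Process.Start(startInfo))
        {
            // Read stderr asynchronously so a full pipe on either stream cannot block the child.
            var errorsTask = process.StandardError.ReadToEndAsync();
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return (process.ExitCode, output, errorsTask.Result);
        }
    }
}
```

The arguments parameter is where the concatenated results of EvaluateArgument would be passed when the tool being launched is the compiler.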

Source-control mechanisms

I wanted the editor to offer commit, update, and revert options to the user. Unfortunately, due to time constraints and a lack of familiarity with the API (SharpSVN, which was the best tool I could find for the job), it wasn’t possible to implement the revert feature within the time given for the project.

The editor allows the user to both commit and update their work via the Tortoise source-control software, but there are some caveats. Unfortunately, I wasn’t able to implement the functionality to commit new folders to the repository; thus, if the user wants to commit to and update from a repository, it is first necessary to check out the folder from the repository and then save the project in the checked-out folder.

I should mention that someone quite generously put up working code for the commit and update functions of the API, which meant I could adapt it to implement these features more quickly. The code can be found here (accessed 28 October 2016).

Overall

My editor works very well considering the functionality I tried to implement, the timespan I had, and the new tools I had to work with. If I’m going to return to this in the future then there are a fair few simple updates to be implemented:

  • Update the source-control mechanism to allow the user to create folders on the repository side
  • Include the revert-to functionality in the source-control functionality
  • Change the compiler directives settings to include all standard directives by default

There’s quite a bit more that could be done; however, these are relatively simple changes (hopefully simple in the case of the SVN work) that could be made in the immediate future.

This blog post is simply a quick look at the project; it would take a great deal longer to talk about all of the project. While I’ve just done a quick run-through of the features here it’s fair to say that I made some unorthodox design choices and had to come up with strange solutions for obscure problems, which might make pretty good posts later on.

Creating a better management system

ManagementSystemServerSide ManagementSystemClientSide

Figure 1 (left). The server application which services each client

Figure 2 (right). A simple client to communicate via the implemented protocol

This week I’ve been working on a better management system. This was an interesting project because it incorporates many different approaches to make it accessible and secure.

The basis for the project is very simple. A user can connect to and log into the server via the client application (this could be changed to a web-server or mobile application later with a little careful tweaking). Once logged in the client can place a new order or look at the list of their previous orders.

Once connected all interactions take place over a persistent connection. In this way, a user’s individual Id can be associated with the connection (incidentally it would be possible to extend this further later on in order to implement some kind of privilege mechanism to distinguish customers and staff in this hypothetical company).

The actual protocol implemented is very basic with the first line sent denoting the command necessary and subsequent lines supplying the information for each command.
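To show the shape of such a line-based protocol, here is a minimal sketch of reading one message. How a message ends isn’t covered in this post, so the sketch assumes a blank line terminates it; that rule, the Message and MessageReader classes, and any command names used with them are all hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

// One protocol message: a command line followed by zero or more information lines.
public sealed class Message
{
    public string Command { get; }
    public List<string> Arguments { get; }

    public Message(string command, List<string> arguments)
    {
        Command = command;
        Arguments = arguments;
    }
}

public static class MessageReader
{
    // Reads one message, or returns null when the stream has ended.
    // Assumption: a blank line (or end of stream) marks the end of a message.
    public static Message Read(TextReader reader)
    {
        string command = reader.ReadLine();
        if (command == null)
        {
            return null;
        }

        var arguments = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null && line.Length > 0)
        {
            arguments.Add(line);
        }

        return new Message(command.Trim(), arguments);
    }
}
```

On the server, the TextReader would be a StreamReader wrapped around the connection’s network stream, and the Command value would be used to pick the handler for the information lines.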

Multi-threading

A server has to handle many connections concurrently (or at least it should if it’s going to be considered any good), so my application hands each connection off to a separate thread. The TcpListener object listens on a background thread (which is necessary because the AcceptTcpClient method is a blocking call) and launches a new thread for each accepted TcpClient, passing the client to it.

            while (true)
            {
                TcpClient Connection = ConnectionListener.AcceptTcpClient();
                Thread t = new Thread(() => { ServiceConnection(Connection); });
                t.Start();
                //Notion of using delegate to pass object to thread provided at the following source
                //https://stackoverflow.com/questions/3988514/how-to-pass-refrence-and-get-return-in-thread
                //Accessed 26/06/2017
            }

The idea of using a delegate method to pass the TcpClient to the thread was provided here by user sumit_programmer (accessed 26/07/2017).

Salting hashes

While creating this application I’ve attempted to follow best practice when storing passwords, meaning that users with the same password should not have the same hash stored against them. This is generally accomplished by appending random data (a salt) to the password so that the resulting hash is vastly different.

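private string GenerateSalt()
{
     // Build the salt from printable ASCII characters only. Byte values above 127
     // are not valid ASCII and would all decode to '?', and control characters
     // such as line breaks would be awkward to store and transmit.
     var GeneratedSalt = new System.Text.StringBuilder(SaltLength);

     for(int i = 0; i < SaltLength; i++)
     {
         // The upper bound of Random.Next is exclusive, so this yields '!' (33) to '~' (126).
         GeneratedSalt.Append((char)rnd.Next(33, 127));
     }

     return GeneratedSalt.ToString();
}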

In my system this is done by generating a random string of fixed length and appending it to the entered password. I’ve accomplished this by simply using the .NET Random object to pick each character of the salt. This has worked pretty well so far, but because the Random object draws its values from a pseudo-random sequence it is possible for values to repeat, particularly if separate Random instances are created in quick succession and end up with the same seed. Fortunately, this is unlikely to happen throughout the salt and very unlikely to produce the same salts for the same passwords. As an example, users 1 and 2 both have the same password (‘password’) but vastly different hashes to be matched.
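One way to sidestep the pseudo-random concern is to take the salt from the operating system’s cryptographic random number generator instead. The sketch below swaps in a different technique from the Random-based one used in this project: it uses .NET’s RandomNumberGenerator, and it assumes SHA-256 as the hash function, since this post doesn’t name the one used. The class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PasswordHashing
{
    // Generates a salt from the cryptographic RNG and returns it as Base64 text.
    public static string GenerateSalt(int byteCount = 16)
    {
        byte[] salt = new byte[byteCount];
        using (RandomNumberGenerator generator = RandomNumberGenerator.Create())
        {
            generator.GetBytes(salt);
        }
        return Convert.ToBase64String(salt); // printable, so safe to store as text
    }

    // Appends the salt to the password and hashes the result.
    // SHA-256 is an assumption; any hash function could be substituted here.
    public static string HashPassword(string password, string salt)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(password + salt));
            return Convert.ToBase64String(digest);
        }
    }
}
```

The salt is stored alongside the hash, so that at login the same salt can be appended to the entered password and the two hashes compared.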

ChangedHashes

Figure 3. The difference in hashes for the same password

Hash-based check-sums

Another of the security measures I’ve implemented is to ensure that the message received hasn’t been tampered with en route. To do this a hash of the concatenated message contents is sent along with the message. A new hash is then created upon receipt; if the newly created hash and the received hash are equal then there has been no tampering, but if they differ the message has been altered.

This leaves me with something of a weird question: if the message has been intercepted and changed, do I want to inform the client of it? If I do, wouldn’t this alert the attacker to the fact that they’ve been discovered? Because this is unlikely to occur in my system I’ve simply resolved to make sure that nothing happens in the event that the two hashes don’t match, but this is something I’ll have to look into later.
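The compare-on-receipt step can be sketched as follows. SHA-256 and the hexadecimal encoding are assumptions (this post doesn’t say which hash or encoding is used), and the MessageChecksum class is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class MessageChecksum
{
    // Hashes the concatenation of the message parts and returns it as a hex string.
    public static string Compute(params string[] parts)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(string.Concat(parts)));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Recomputes the hash on the receiving side and compares it with the one that was sent.
    public static bool Verify(string receivedChecksum, params string[] parts)
    {
        return string.Equals(Compute(parts), receivedChecksum, StringComparison.OrdinalIgnoreCase);
    }
}
```

The sender calls Compute over the lines it is about to send, and the receiver calls Verify with the checksum and the lines it actually read.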

      int ChecksumLength = int.Parse(ConnectionInputStream.ReadLine());

      string CheckSum = "";

      for(int i = 0; i < ChecksumLength; i++)
      {
           CheckSum += (char)ConnectionInputStream.Read();
      }
      ConnectionInputStream.ReadLine();

An interesting problem that I didn’t think to account for before starting the project was that the hashes would inevitably include special characters (see above). This meant that the length of the hash had to be sent first, so the receiver knows how many individual characters to read in; this is the downside of using something low-level like raw sockets.
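An alternative to sending the length first is to encode the raw hash bytes as Base64 before sending them. Base64 text never contains line breaks or other special characters, so the checksum can be read with a single ReadLine. The sketch below is one possible approach rather than what this project does, and the ChecksumTransport class is a hypothetical name.

```csharp
using System;
using System.IO;

public static class ChecksumTransport
{
    // Writes the raw hash bytes as a single line of Base64 text.
    public static void Write(TextWriter writer, byte[] digest)
    {
        writer.WriteLine(Convert.ToBase64String(digest));
    }

    // Reads the line back and decodes it into the original bytes.
    public static byte[] Read(TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null)
        {
            throw new EndOfStreamException("Checksum line missing.");
        }
        return Convert.FromBase64String(line);
    }
}
```

The cost is that the encoded checksum is roughly a third longer than the raw bytes.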

Conclusion

This hasn’t been a bad project. I’ve made a reasonably useful protocol that could be extended further, and I’ve implemented some fairly reasonable security measures throughout it. It might have been a better idea to build it on top of some higher-level communication technology, but overall I’m reasonably happy with it.