Troubleshooting lab errors

How to save time by troubleshooting basic lab issues

Written by Nir Cohen
Updated over a week ago

The goal of this article is to help you troubleshoot any lab issues you or your students may experience while testing out Strigo class configurations or while training takes place.

While some lab-related issues may be caused by problems with Strigo's backend, most of the issues our users experience stem from either a problematic lab configuration or from environmental constraints and user error. Understanding where these issues come from can save you time and, in some cases, prevent frustration for you and your students.

Here, we'll try to create an exhaustive list of problems and their potential solutions. A reasonable understanding of class resource definition is required here.

Possible Issues:

Lab Configuration Issues

Lab configuration issues can almost always be resolved by the instructor or by the training operations team in your organization.

Problem: Wrong Permissions

When defining resources in the class template, you must also provide a username. That username is the user Strigo's backend will use to provision and connect to the lab. For Windows machines, this is usually "Administrator", while for Linux it's usually the default user of the specific distribution you're using (e.g. ubuntu for Ubuntu, ec2-user for Amazon Linux, etc.).

  • In Windows, we will use that user to generate a random password and use that to connect to the machine.

  • In Linux, we will use that user to deploy our terminal multiplexer (tmux) under that user's home directory (~/.strigo/).

If the wrong user is provided, we will not be able to connect to the machine at all, resulting in an error when starting a workspace.

Note: If the wrong user is provided for a Windows machine, we will display an error stating that the wrong credentials were provided. For Linux, this is trickier, and we'll only display the credentials error after a few minutes of trying to connect.

Resolution:

  • Make sure the class template resource has the correct username configured.

  • For Windows, make sure you've followed our guide on how to create custom Windows images, as one part of it is required for us to be able to generate the relevant password.

Note: When using a Linux Desktop-based lab, the provided user must not have a password and must be able to SSH into the instance. Otherwise, we will not be able to provision it correctly.
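
As a quick reference, the username rule above can be sketched as a small lookup. This is a minimal Python sketch and not part of Strigo: the DEFAULT_USERS table and the default_user helper are hypothetical names, and the values are the defaults commonly shipped with stock AWS images. A custom image may use a different user, so always confirm with whoever built it.

```python
# Hypothetical helper: common default login users for stock AWS images.
# Custom images may define a different user, so treat this as a starting point.
DEFAULT_USERS = {
    "windows": "Administrator",
    "ubuntu": "ubuntu",
    "amazon linux": "ec2-user",
    "debian": "admin",
}


def default_user(os_name: str) -> str:
    """Return the usual default user for an OS name (case-insensitive)."""
    key = os_name.strip().lower()
    if key not in DEFAULT_USERS:
        raise KeyError(f"No known default user for {os_name!r}; check your image")
    return DEFAULT_USERS[key]
```

For example, default_user("Amazon Linux") returns "ec2-user", which is the value you would enter as the resource's username in the class template.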

Problem: Non-existing/Unshared Image

Strigo will not allow you to save a class resource if its image (AMI) can't be found, or if it's not shared with Strigo.

However, there is one scenario in which a resource cannot be found: you save a class template and then completely delete one of the images defined in that template. When you then try to create a workspace, Strigo will display a warning.

Resolution:

In this case, you will have to edit the class template and provide an AMI Strigo can access.

Problem: Image in the Wrong Region

While you cannot technically provide Strigo with an image that's in the wrong region (as it won't let you save the lab resource in the class template), you do have to make sure to choose the relevant region.

Resolution: When pasting in the AMI ID, you also have to choose the correct region where the AMI resides. If you don't have that information, you should contact whoever is in charge of creating your images.

Note: If you'd like to use the same image in a different region, see AWS's official documentation.
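
If you do need the image in another region, the copy is usually done with the AWS CLI's copy-image command. The Python sketch below only builds that command as a string so you can review it before running it. The copy_image_command helper and the AMI ID pattern are our own assumptions for illustration, not something Strigo provides.

```python
import re
import shlex

# Assumption: AMI IDs look like "ami-" plus 8 or 17 lowercase hex characters.
AMI_ID = re.compile(r"^ami-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def copy_image_command(ami_id, source_region, target_region, name):
    """Build an `aws ec2 copy-image` command that copies an AMI across regions."""
    if not AMI_ID.match(ami_id):
        raise ValueError(f"{ami_id!r} does not look like an AMI ID")
    if source_region == target_region:
        raise ValueError("source and target regions are the same")
    args = [
        "aws", "ec2", "copy-image",
        "--source-image-id", ami_id,
        "--source-region", source_region,
        "--region", target_region,  # region that receives the copy
        "--name", name,
    ]
    return shlex.join(args)
```

Keep in mind that the copied image receives a new AMI ID, so the class template resource has to be updated with both the new ID and the new region.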

Problem: Unsuitable Instance Type

Choosing the instance type for your lab affects four main attributes:

  • CPU

  • RAM

  • Architecture

  • Network capacity

Choosing an unsuitable instance type may result in performance issues, lab crashes, Strigo not being able to connect to the lab, and other unexpected behavior. None of these issues are caused by Strigo. The instance type should accommodate your training needs, which depend on the type of workload you're generating and the applications you're running within the lab.

For Windows, for example, we suggest using at least 2 CPUs and 8 GB of RAM. Otherwise, the lab may take a long time to load, and the interface may be slow and unresponsive.
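
The Windows guideline above can be expressed as a simple check. This is a minimal sketch: INSTANCE_SPECS is an illustrative, hand-maintained table covering only a few AWS instance types, and meets_windows_minimum is a hypothetical helper name.

```python
# Illustrative specs (vCPUs, memory in GiB) for a few AWS instance types.
# This table is an assumption for the example; extend it with the types you use.
INSTANCE_SPECS = {
    "t3.micro": (2, 1),
    "t3.medium": (2, 4),
    "t3.large": (2, 8),
    "m5.xlarge": (4, 16),
}


def meets_windows_minimum(instance_type, min_cpus=2, min_ram_gib=8):
    """Return True if the instance type has at least the suggested CPU and RAM."""
    cpus, ram_gib = INSTANCE_SPECS[instance_type]
    return cpus >= min_cpus and ram_gib >= min_ram_gib
```

With this table, t3.large passes the check while t3.medium does not, because it only has 4 GiB of memory.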

Problem: Storage Issues

Much like choosing the wrong instance type, running out of storage or using the wrong storage type for your workload can make labs crash or slow them down significantly.

Resolution:

Make sure you provide enough storage of the relevant type when you create your AMI. You can follow this guide (and more specifically, step 2) to make sure you set up the relevant properties for your image.

Note: There is no way for Strigo to extend the storage for labs (currently running ones, or otherwise).
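
Because storage cannot be extended later, it can help to check how full the disk is while you are testing the image. The sketch below uses only Python's standard library. The storage_report helper and the 90% warning threshold are assumptions chosen for illustration.

```python
import shutil


def storage_report(path="/", warn_at=0.9):
    """Return (used_fraction, is_low) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    # is_low is True once usage reaches the warning threshold.
    return used_fraction, used_fraction >= warn_at
```

Running storage_report("/") inside a test lab after a typical exercise gives a rough idea of whether the image needs a larger volume before the real class.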

Problem: Long loading time for Windows labs

Sometimes, users may think that their lab isn't loading when it is simply taking a while for the lab to be provisioned in AWS. By default, Windows machines tend to take a while (5 to 8 minutes) to load on AWS.

Resolution:

Fortunately, we provide a thorough guide on how to reduce Windows lab loading time here.

Problem: Bad Init Script

If you're using Init Scripts (in AWS, these are known as "user data") to run scripts in your labs when they load, you may experience issues if they're misconfigured.

Unfortunately, there is no way for Strigo to analyze what the lab is running (as users can write pretty much whichever scripts they want). This also means that if the scripts are not doing what they're intended to do, or if they're completely wrecking the operating system, Strigo will not be aware of it and will simply be unable to show the lab to users, stating that "Something Went Wrong".

Resolution:

Please make sure you check your scripts thoroughly by running test events.
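
Before running a test event, a quick format check can catch the most common mistakes. The following is a minimal sketch that covers only the most common AWS user data conventions (a shebang or #cloud-config header on Linux, script or powershell tags on Windows, and the 16 KB size limit). check_user_data is a hypothetical helper, and passing this check does not mean the script does what you intend.

```python
MAX_USER_DATA_BYTES = 16 * 1024  # AWS limit on raw (pre-base64) user data

LINUX_PREFIXES = ("#!", "#cloud-config")
WINDOWS_TAGS = ("<script>", "<powershell>")


def check_user_data(script: str, platform: str) -> list:
    """Return a list of formatting problems found in an init script."""
    problems = []
    if len(script.encode("utf-8")) > MAX_USER_DATA_BYTES:
        problems.append("script is larger than 16 KB")
    if platform == "linux":
        if not script.startswith(LINUX_PREFIXES):
            problems.append("Linux script should start with '#!' or '#cloud-config'")
    elif platform == "windows":
        lowered = script.lower()
        if not any(tag in lowered for tag in WINDOWS_TAGS):
            problems.append("Windows script should be wrapped in <script> or <powershell> tags")
    else:
        raise ValueError(f"unknown platform: {platform!r}")
    return problems
```

An empty list means none of these basic problems were found. The script still needs to be verified in a test event.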

To make sure the scripts are formatted correctly, follow these guides:

Problem: Init script changes user password

For both Linux Desktops and Windows Desktops, we generate a random password that allows Strigo to connect to the instance.

We've come across cases where the user data script changes the password of the user configured in the class template resource. This prevents Strigo from connecting to the lab entirely.

Resolution:

Please do not dynamically change the password of the user you configured in Strigo.
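
One way to catch this early is to scan your init script for commands that commonly change passwords. The sketch below is a rough, non-exhaustive filter meant to flag lines for manual review. The pattern list and the find_password_changes helper are assumptions for illustration, and it can report false positives (for example, a "net user" call that only lists users).

```python
import re

# Commands that commonly change a local user's password (not an exhaustive list).
PASSWORD_PATTERNS = [
    re.compile(r"(?<![\w/])passwd\b"),          # passwd, but not /etc/passwd
    re.compile(r"\bchpasswd\b"),
    re.compile(r"\busermod\b.*\s(-p|--password)\b"),
    re.compile(r"\bnet\s+user\b", re.IGNORECASE),
    re.compile(r"\bSet-LocalUser\b.*-Password\b", re.IGNORECASE),
]


def find_password_changes(script: str) -> list:
    """Return (line_number, line) pairs that appear to change a password."""
    hits = []
    for number, line in enumerate(script.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment lines (including the shebang)
        if any(pattern.search(stripped) for pattern in PASSWORD_PATTERNS):
            hits.append((number, stripped))
    return hits
```

If a flagged line targets the user configured in the class template resource, remove it or point it at a different user.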

Environmental issues or User Error

Environmental or User-error issues are issues that cannot be resolved by configuration in any way. Neither the instructor nor Strigo can anticipate these problems, resulting in the need to debug issues on the fly with students.

Problem: Mishandled Lab

One of the things that makes Strigo so awesome (we're not biased at all) is the flexibility that allows users to do whatever they need in the labs.

The flip side of that is that students can also do the following:

  • Shut down the lab completely.

  • Delete necessary files or just destroy the operating system completely.

  • Utilize lab resources (CPU, memory, storage, network) in a way that will either prevent Strigo from connecting to it, or just prevent them from working on their lab.

Resolution:

  • If the student shuts the lab down, they can always restart it (the instructor can do that for the student as well). It may seem like there's a problem with Strigo, as it won't show the lab's interface, but all that's required is a restart.

  • Unfortunately, a restart might not suffice. If the student just deleted necessary files from the machine (for example), restarting the lab may not help at all, in which case the lab should be replaced completely.

  • Over-utilizing the lab may also require a restart.

Problem: Lab is slow or completely frozen

The hardest problem to solve is related to environmental issues. Some of these issues are solvable by making specific changes to how or where labs are provisioned, while others may be very hard to track and resolve. Most of these issues result in a slow response or complete freezes of the lab interfaces.

These issues mostly revolve around:

  • The lab's chosen region is far away from where the attendees reside.

  • The lab's instance type is too small for the required workload.

  • The lab is over-utilized.

  • The attendee's network is just too slow to handle the connection to the lab.

  • The attendee is behind a corporate firewall/VPN, preventing required network access to Strigo or slowing things down.

We've covered some of these issues before, so we won't address them again here.

To address the rest of them:

  • If the chosen region is far away from the attendees, you may want to copy your AMI to a region closer to them. We understand that this isn't ideal, and we'd like to provide geo-location-based lab provisioning, but, for now, this is a step you can take.

  • If the attendee's network is slow, you can ask them to move to a different network.

  • If the attendee is behind a firewall/VPN, you can ask them to move to another network.
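
To get a rough idea of which region is closest to an attendee, you can time a TCP connection to each region's public EC2 endpoint from the attendee's machine. This is a minimal sketch under stated assumptions: tcp_connect_ms and closest_region are hypothetical helpers, connection time is only a rough proxy for how responsive a lab will feel, and a firewall or VPN may block or distort the measurement.

```python
import socket
import time


def tcp_connect_ms(host, port=443, timeout=3.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None  # unreachable, blocked, or timed out


def closest_region(regions, measure=tcp_connect_ms):
    """Return (region, ms) with the lowest connect time, or None if none responded."""
    results = []
    for region in regions:
        ms = measure(f"ec2.{region}.amazonaws.com")
        if ms is not None:
            results.append((region, ms))
    return min(results, key=lambda item: item[1]) if results else None
```

For example, calling closest_region(["us-east-1", "eu-west-1", "ap-southeast-2"]) on an attendee's network returns the region that responded fastest, which can help you decide where to copy your AMI.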

Problem: Lab does not reboot

Instructors and students in Strigo can reboot their own labs. Sometimes, rebooting doesn't work. This usually has nothing to do with Strigo. Rather, it happens because the student did something to the lab that prevents us from rebooting it. For example, the student may have over-utilized the CPU or stopped services on the machine that we need in order to interact with it.

Unfortunately, most of the time, this means that the lab will have to be replaced. If a lab itself is mishandled that way, there isn't much Strigo can do.

Everything Else

If all else fails, we may have an issue provisioning the lab for you. For example, sometimes AWS runs out of quota for the specific instance type you're trying to use. In any of these cases, you should contact our support.
