Testing in Production: Recommended Tools

Testing in production has a bad reputation: the same kind that “git push --force origin master” has. Burning houses and Chuck Norris represent testing in production in memes, and that says it all. When done poorly, testing in production very much deserves the sarcasm and negativity. But that’s true for any methodology or technique.

This blog post aims to shed some light on the testing-in-production paradigm. I will explain why giants like Google, Facebook and Netflix see it as a legitimate and very beneficial instrument in their CI/CD pipelines. So much so, in fact, that you might consider adopting it as well. I will also provide recommendations for testing-in-production tools, based on my team’s experience.

Testing In Production – Why?

Before we proceed, let’s make it clear: testing in production is not applicable to every kind of software. Embedded software, on-prem high-touch installation solutions, and critical systems of any type should not be tested this way. The risks (and as we’ll see, it’s all about risk management) are too high. But do you have a SaaS solution with a backend that leverages a microservices architecture, or even just a monolith that can easily be scaled out? Or any other solution whose deployment and configuration your engineers fully control? Ding ding ding – those are the ideal candidates.

So let’s say you are building your SaaS product and have already invested a lot of time and resources in both unit and integration tests. You have also built your staging environment and run a bunch of pre-release tests on it. Why on earth would you bother your R&D team with tests in production? There are multiple reasons; let’s take a deep dive into each of them.

Staging environments are bad copies of production environments

Yes, they are. Your staging environment is never as big as your production environment – in terms of server instances, load balancers, DB shards, message queues and so on. It never handles the load and network traffic that production does. So it will never have the same number of open TCP/IP connections, HTTP sessions, open file descriptors and parallel DB write queries. There are stress testing tools that can emulate that load, but when you scale, this stops being sufficient very quickly.

Besides the size, the staging environment never matches production in terms of configuration and state. It is often configured to start a fresh copy of the app upon every release, security configurations are relaxed, ACLs and service discovery will never handle real-life production scenarios, and the databases are emulated by recreating them from scratch with automation scripts (copying production data is often impossible, even legally, due to privacy regulations such as GDPR). Well, after all, we all try our best.

At best, we can create a bad copy of our production environment. This means our testing will be unreliable and our service will remain susceptible to errors in the real-life production environment.

Chasing after maximum reliability before the release costs. A lot.

Let’s just cite Google’s engineers:

“It turns out that past a certain point, however, increasing reliability is worse for a service (and its users) rather than better! Extreme reliability comes at a cost: maximizing stability limits how fast new features can be developed and how quickly products can be delivered to users, and dramatically increases their cost, which in turn reduces the number of features a team can afford to offer.

Our goal is to explicitly align the risk taken by a given service with the risk the business is willing to bear. We strive to make a service reliable enough, but no more reliable than it needs to be.”

Let’s emphasize the point: “Our goal is to explicitly align the risk taken by a given service with the risk the business is willing to bear”. No unit/integration/staging-env tests will ever make your release 100% error-free. In fact, they shouldn’t (well, unless you are a Boeing engineer). After a certain point, investing more and more in tests and attempting to build a better staging environment will just cost you more compute/storage/traffic resources and will significantly slow you down.

Doing more of the same is not the solution. You shouldn’t spend your engineers’ valuable work hours chasing the dragon trying to diminish the risks. So what should you be doing instead?

Embracing the Risk

Again, citing the great Google SRE Book:

“…we manage service reliability largely by managing risk. We conceptualize risk as a continuum. We give equal importance to figuring out how to engineer greater reliability into Google systems and identifying the appropriate level of tolerance for the services we run. Doing so allows us to perform a cost/benefit analysis to determine, for example, where on the (nonlinear) risk continuum we should place Search, Ads, Gmail, or Photos…. That is, when we set an availability target of 99.99%, we want to exceed it, but not by much: that would waste opportunities to add features to the system, clean up technical debt, or reduce its operational costs.”

So it is not just about when and how you run your tests. It’s about how you manage the risks and costs of your application’s failures. No company can afford product downtime because of some failed test (which is totally OK in staging). Therefore, it is crucial to ensure that your application handles failures right. “Right”, quoting the great post by Cindy Sridharan, means:

“Opting in to the model of embracing failure entails designing our services to behave gracefully in the face of failure.”

The design of fault-tolerant and resilient apps is out of the scope of this post (Netflix Hystrix is still worth a look, though). So let’s assume that’s how your architecture is built. In such a case, you can fearlessly roll out a new version that has been tested just enough internally.

And then, the way to bridge the gap and get as close as possible to 100% error-free is by testing in production: testing how our product really behaves and fixing the problems that arise. To do that, you can use a long list of dedicated tools and also expose the product to real-life production use cases.

So the next question is – how to do it right?

Testing In Production – How?

Cindy Sridharan wrote a great series of blog posts that discusses the subject in great depth. Her recent Testing in Production, the safe way blog post includes a table of the test types you can run in pre-production and in production.

One should definitely read carefully through that post. Here, we’ll just take a brief look at some of the techniques she describes and recommend various tools from each category. I hope you find our recommendations useful.

Load Testing in Production

As simple as it sounds. Depending on the application, it makes sense to stress its ability to handle huge amounts of network traffic, I/O operations (often distributed), database queries, message queue storms and so on. Some severe bugs only appear clearly under load (hi, memory overwrite). Even when they don’t, your system can only ever handle a limited load, so failure tolerance and graceful handling of dropped connections become really crucial.

Obviously, performing a load test in the production environment stresses your app as configured for real-life use cases, so it provides far more useful insights than load testing in staging.
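To make the idea concrete, here is a toy Python sketch of what a load generator does under the hood. The endpoint URL is hypothetical, the requests library is assumed to be installed, and this is no substitute for the dedicated tools below:

    import concurrent.futures
    import time

    import requests  # pip install requests

    URL = "https://staging.example.com/api/health"  # hypothetical endpoint

    def hit(_):
        started = time.monotonic()
        try:
            status = requests.get(URL, timeout=5).status_code
        except requests.RequestException as exc:
            status = type(exc).__name__
        return status, time.monotonic() - started

    # Fire 200 requests through 20 concurrent workers and report latencies.
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
        results = list(pool.map(hit, range(200)))

    latencies = sorted(latency for _, latency in results)
    print("p50: %.3fs p95: %.3fs" % (latencies[len(latencies) // 2],
                                     latencies[int(len(latencies) * 0.95)]))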

There are a bunch of software tools for load testing that we recommend, many of them open source. To name a few:

mzbench

mzbench supports MySQL, PostgreSQL, MongoDB and Cassandra out of the box, and more protocols can easily be added. It was a very popular tool in the past, but its developer abandoned it about two years ago.

HammerDB

HammerDB supports Oracle Database, SQL Server, IBM Db2, MySQL, MariaDB, PostgreSQL and Redis. Unlike mzbench, it is under active development as of May 2020.

Apache JMeter

Apache JMeter focuses more on web services (DB protocols are supported via JDBC). This is the old-fashioned (though somewhat cumbersome) Java tool I was using ten years ago for fun and profit.

BlazeMeter

BlazeMeter is a proprietary tool. It runs JMeter, Gatling, Locust, Selenium (and more) open source scripts in the cloud to enable simulation of more users from more locations. 

Spirent Avalanche Hardware

If you are into heavy guns – meaning you are developing solutions like WAFs, SDNs, routers, and so on – then this testing tool is for you. Spirent Avalanche is capable of generating up to 100 Gbps of traffic, performing vulnerability assessments, QoS and QoE tests and much more. I have to admit: it was my first load testing tool as a fresh graduate working at Check Point, and I still remember how amazed I was to see its power.

Shadowing/Mirroring in Production

Send a portion of your production traffic to your newly deployed service and see how it’s handled in terms of performance and possible regressions. Did something go wrong? Just stop the shadowing and take your new service down – with zero impact on production. This technique is also known as a “dark launch”, and is described in detail in Google’s CRE life lessons: What is a dark launch, and what does it do for me? blog post.
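For intuition, here is a rough Python sketch of the mirroring idea, assuming hypothetical primary and shadow service URLs and the requests library. In practice this logic lives inside a proxy like the ones below, not in your application code:

    import threading

    import requests

    PRIMARY = "https://primary.internal/api"  # hypothetical current version
    SHADOW = "https://shadow.internal/api"    # hypothetical new version

    def handle(path, params):
        # Serve the user from the primary service, as usual.
        response = requests.get(PRIMARY + path, params=params, timeout=5)

        def mirror():
            # Fire-and-forget copy of the request to the shadow deployment;
            # its response is discarded, so users never see shadow failures.
            try:
                requests.get(SHADOW + path, params=params, timeout=5)
            except requests.RequestException:
                pass  # a failing shadow must never affect production traffic

        threading.Thread(target=mirror, daemon=True).start()
        return response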

A proper configuration of load balancers/proxies/message queues will do the trick. If you are developing a cloud-native application (Kubernetes/microservices), you can use solutions like:

HAProxy

HAProxy is an open source, easy-to-configure proxy server.

Envoy proxy 

Envoy proxy is open source and a bit more advanced than HAProxy. Built for the microservices world, it offers service discovery, shadowing, circuit breaking and dynamic configuration via API.

Istio

Istio is a full open-source service mesh solution. Under the hood, it uses the Envoy proxy as a sidecar container in every pod. This sidecar is responsible for incoming and outgoing communication, and Istio controls service access, security, routing and more.

Canarying in Production

The Google SRE Book defines “canarying” as follows:

To conduct a canary test, a subset of servers is upgraded to a new version or configuration and then left in an incubation period. Should no unexpected variances occur, the release continues and the rest of the servers are upgraded in a progressive fashion. Should anything go awry, the modified servers can be quickly reverted to a known good state.
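At its core, canarying is weighted routing plus a fast way back. A minimal Python sketch, with hypothetical service URLs (real rollouts should rely on the tools below):

    import random

    import requests

    STABLE = "https://api-v1.internal"  # hypothetical known-good version
    CANARY = "https://api-v2.internal"  # hypothetical new release
    CANARY_WEIGHT = 0.05                # start by sending 5% of traffic

    def route(path):
        # Route a small, adjustable fraction of requests to the canary.
        # If error rates climb, drop CANARY_WEIGHT to 0 to revert instantly.
        backend = CANARY if random.random() < CANARY_WEIGHT else STABLE
        return requests.get(backend + path, timeout=5)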

This technique, as well as the similar (but not identical!) Blue-Green deployment and A/B testing techniques, is discussed in this Christian Posta blog post, while the caveats and cons of canarying are reviewed here. As for recommended tools:

Spinnaker

Spinnaker is a CD platform open-sourced by Netflix. It leverages the aforementioned and many other deployment best practices (and, as with everything Netflix builds, it was designed with microservices in mind).

ElasticBeanstalk

AWS supports Blue/Green deployment with its ElasticBeanstalk PaaS solution.

Azure App Services

Azure App Services has its own staging slots capability that allows you to apply the prior techniques with zero downtime.

LaunchDarkly

LaunchDarkly is a feature flagging solution for canary releases, enabling gradual capacity testing of new features and safe rollback if issues are found.

Chaos Engineering in Production

First introduced by Netflix’s ChaosMonkey, Chaos Engineering has emerged as a separate and very popular discipline. It is not about “simple” load testing; it is about bringing down service nodes, reducing DB shards, misconfiguring load balancers, causing timeouts – in other words, messing up your production environment as badly as possible.
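The principle fits in a few lines. This toy Python sketch randomly terminates worker processes – a miniature version of what the tools below do to real service nodes (run it only on a throwaway workload, of course):

    import multiprocessing
    import random
    import time

    def worker():
        while True:
            time.sleep(1)  # stand-in for real work

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=worker) for _ in range(5)]
        for proc in procs:
            proc.start()

        # Every few seconds, kill a random worker. A resilient system would
        # notice and restart it; that recovery path is what chaos tests probe.
        for _ in range(3):
            time.sleep(5)
            victim = random.choice([p for p in procs if p.is_alive()])
            print("chaos: terminating worker pid=%s" % victim.pid)
            victim.terminate()

        for proc in procs:
            proc.terminate()
            proc.join()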

The winning tools in this area are what I like to call “chaos as a service”:

ChaosMonkey

ChaosMonkey is an open source tool by Netflix. It randomly terminates services in your production system, making sure your application is resilient to these kinds of failures.

Gremlin

Gremlin is another great tool for chaos engineering. It allows DevOps engineers (or a chaos engineer) to define simulations and see how the application reacts in different scenarios: unavailable resources (CPU/memory), state changes (changing the system time or killing some of the processes), and network failures (packet drops, DNS failures).

There are plenty of other chaos engineering tools out there as well.

Debugging and Monitoring in Production

Last but not least, let’s briefly review monitoring and debugging tools. Debugging and monitoring are the natural next steps after testing: testing in production provides us with real product data, which we can then use for debugging. Therefore, we need the right tools to monitor and debug the test results in production.

There are some acknowledged leaders, each addressing the three pillars of observability (logs, metrics, and traces) in its own way:

DataDog

DataDog is a comprehensive monitoring tool with amazing tracing capabilities, which helps a lot in debugging with very low overhead.

Logz.io

Logz.io is all about centralized log management; combining it with DataDog can create a powerful toolset.

New Relic

New Relic is a very strong APM tool, offering log management, AIOps, monitoring and more.

Prometheus

Prometheus is an open source monitoring solution that includes metrics scraping, querying, visualization and alerting.
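For example, exposing a custom application metric for Prometheus to scrape takes only a few lines with the official Python client library (the metric name here is made up):

    import time

    from prometheus_client import Counter, start_http_server

    REQUESTS = Counter("app_requests_total", "Total requests handled")

    # Serve metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)

    while True:
        REQUESTS.inc()  # stand-in for real request handling
        time.sleep(1)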

Lightrun

Lightrun is a powerful production debugger. It enables adding logs, performance metrics and traces to production and staging in real time, on demand. Lightrun enables developers to securely add instrumentation without having to redeploy or restart. Request a demo to see how it works.

To sum up, testing in production is a technique you should pursue and experiment with if you are ready for a paradigm shift from diminishing risks in pre-production to managing risks in production.

Testing in production complements the testing you are used to doing, and adds important benefits such as speeding up the release cycles and saving resources. I covered some different types of production testing techniques and recommended some tools to use. If you want to read more, check out the resources I cited throughout the blog post. Let us know how it goes!

Learn more about Lightrun and let’s chat.

7 Must-Have Steps for Production Debugging in Any Language

Debugging is an unavoidable part of software development, especially in production. You can often find yourself in “debugging hell,” where an enormous amount of debugging consumes all your time and keeps the project from progressing.

According to a report by the University of Cambridge, programmers spend almost 50% of their time debugging. So how can we make production debugging more effective and less time-consuming? This article will guide you through seven essential steps to optimize your production debugging process. 

What is Production Debugging?

Production debugging is the process of identifying the underlying cause of issues in an application running in a production environment. It can also be done remotely, since it might not be practical to debug the program locally during the production phase. Production bugs are harder to fix because developers might not have access to the environment in which the issues arise.

Production debugging starts with diagnosing the type of production bug and instrumenting the application with logging. The application’s logging mechanism is configured to send information to a secure server for further inspection.

Classical Debugging vs. Remote Debugging

In classical debugging, the function you wish to debug runs within the same system as the debugger server. This system can be your workstation or a network-accessible machine. Remote debugging is the process of troubleshooting an active function on a system that is reachable via a network connection.

The idea behind remote debugging is to simplify the debugging of distributed system components. Essentially, it is the same as connecting directly to the server and starting a debugging session there. If you are a VS Code user, you will know how much of a life-saver its extensions are, and the VS Code remote debugging extensions are no exception. In IntelliJ IDEA, remote debugging is built in.
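For Python services, to give one concrete example, the open source debugpy library (the engine behind VS Code’s Python debugger) can open such a remote session. A minimal sketch, assuming port 5678 is reachable from your workstation:

    import debugpy  # pip install debugpy

    # Listen for an IDE to attach over the network instead of requiring the
    # debugger to run on the same machine as the process.
    debugpy.listen(("0.0.0.0", 5678))
    print("Waiting for a debugger to attach on port 5678...")
    debugpy.wait_for_client()  # optional: block until the IDE connects

    answer = 42  # set a breakpoint on this line from the remote IDE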


Modern infrastructure challenges for Production Debugging

Modern infrastructure is more dispersed and consists of many moving parts, making it more challenging to identify issues and track a bug’s origin. The more complex the program, the more challenging it becomes to find a bug.

For example, consider serverless computing. The application is decomposed into specialized programmatic functions hosted on managed infrastructure. Thus, it is nearly impossible for a developer to perform a typical debugging procedure, since the program does not execute in a local environment.

Why debug in production?

If developers followed the best programming practices precisely, an application would have no flaws by the time it was released, and there would be no need for debugging at the production level. However, this is rarely the case: there are always minor problems and bugs that need fixing, making production debugging a continual and time-consuming process.

There are many reasons why we can’t handle these issues locally. Some of them won’t even occur in a local setup, and even when you can reproduce an issue locally, doing so is time-consuming and challenging. You also have to solve production issues quickly, as customers are constantly engaging with the system. Therefore, the recommended solution is production debugging.

Production debugging poses its own challenges, such as troubleshooting the app without disturbing its performance. Moreover, making changes to the program while it is running might lead to unanticipated outcomes for users and interfere with their overall experience. A debugging tool like Lightrun helps you overcome these troubleshooting issues.

Don’t give up on production debugging just yet! You can take some approaches to make this process a lot easier.

7 essential tips for production debugging in any language

1. Stress test your code

Stress testing involves putting an application under extreme conditions to understand what will happen in the worst-case scenario. Testers create a variety of stress test scenarios to assess the resilience of the software and fix any bugs. For example, stress testing shows how the system behaves when many users access the application simultaneously, how the app manages simultaneous logins, and how the system performs under heavy user traffic.

Stress testing can surface these problems before you make the program available to consumers, ensuring a positive user experience even during periods of high traffic.


2. Document external dependencies

The “README” file in the source repository should include a detailed description of each external dependency. This document will help anyone who works with the program in the future understand the resources required to operate it efficiently.

3. Define your debugging scope

This step attempts to pinpoint precisely where in the app the error occurred. By determining your scope beforehand, you avoid wasting time reviewing every single line of code of an application or troubleshooting irrelevant services.

Instead, you focus on the specific part of the app where the bug may be located. Finding a minor bug in 10,000 lines of code isn’t feasible, so you should aim to find bugs in the smallest possible scope.

4. Ensure all software components are running

Ensure that software components, including web servers, message servers, plugins, and database drivers, are functioning well and running the most recent updates before starting the debugging process. This ensures that no software elements are to blame for the errors. If all software components are functioning correctly, you may begin to investigate the problem by using logs.

5. Add a balanced number of logs

Logs can be inserted into lines of code, making it simpler for developers to retrieve the information they need. They capture the relevant context and statistical data that help developers anticipate and resolve issues quickly, and they are especially beneficial when there is a large amount of code to read.


A suitable number of logs should be added at all levels of the code. The more logs there are, the more data developers get and the easier it is to detect errors. However, there should be a balance, since excessive logging can overwhelm engineers with irrelevant data. Try to log only the smallest portion of production needed to diagnose the issue.
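One practical pattern is to log business events at INFO, keep per-item noise at DEBUG so it can be switched on only when needed, and reserve ERROR for genuine failures. A small Python sketch (the order-processing function and its payment call are hypothetical):

    import logging

    logger = logging.getLogger("orders")

    def charge(order_id):
        """Stand-in for a real payment call (hypothetical)."""

    def process_order(order_id, items):
        # One INFO line per business event: context without flooding.
        logger.info("processing order %s with %d items", order_id, len(items))
        for item in items:
            # Per-item detail only at DEBUG, enabled on demand.
            logger.debug("order %s: handling item %r", order_id, item)
        try:
            charge(order_id)
        except Exception:
            # Full stack trace, but only for genuinely unexpected failures.
            logger.exception("order %s: payment failed", order_id)
            raise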

6. Invest in a robust debugging tool

Instead of running a program directly on the processor, debuggers can gain a greater degree of control over how it executes by using instruction-set simulators. This enables them to pause or terminate the program in response to particular conditions, and to display the location of the error when the target application crashes.

Tools like Lightrun eliminate the need to redeploy or redevelop the app, resulting in faster debugging. No time is wasted, as developers can add logs, metrics, and traces to the app in real time while it is running. Most importantly, there is no downtime.

7. Avoid adding extra code

The ideal amount of code to add to a live application is none at all. Adding extra code can have even bigger repercussions than the bug you were trying to resolve in the first place, since you are modifying the app while customers are using it. It should therefore be treated as a last resort, and any added code should be carefully considered and strategically written for debugging purposes.

Production debugging doesn’t have to be a pain

It is neither possible nor feasible to deliver a bug-free program to users. However, developers should try to prevent these issues while being ready to handle them when necessary. Lightrun makes production debugging as easy as it gets by enabling you to add logs, metrics, and traces to your app in real time, with no impact on performance. You can reduce MTTR by 60% and spend less time debugging to focus on what really matters: your code. Excited to try it out? Request a free demo!

What is Kubernetes Lens?

As a DevOps engineer, some days you’re performing magic in the terminal, taming clusters, and feeling like a god. On other days, you feel like a total fraud. Errors and bugs appear from everywhere, and you don’t know where to start or where to look. Sadly, days like this come far too often, and what often causes them is none other than Kubernetes itself. While Kubernetes is the force and magic that manages your clusters, it can also be your bane.

Kubernetes is a portable, extensible, open-source system for automating the deployment, scaling, and management of containerized applications and services. It is a cluster management tool that helps abstract machines, storage, and networks away from their physical implementation. Almost everyone in the DevOps community uses Kubernetes.

However, one major problem with Kubernetes is that it comes with a vast number of moving parts and certain complexities, such as handling clusters, scaling, storage orchestration, and batch execution. This all hinders mainstream developer adoption.

Another problem with Kubernetes is its reliance on command-line tools that consume and retrieve multiple files, such as kubectl, which might be fine for some but overwhelming for others who prefer GUIs.

In this article, you will learn what Kubernetes Lens is, what it does, and why it is useful.

About Kubernetes Lens – The Kubernetes IDE

Kubernetes Lens is an effective, open-source IDE for Kubernetes. Lens simplifies working with Kubernetes by helping you manage and monitor clusters in real time. It was developed by Kontena, Inc. and acquired by Mirantis in 2020, which open-sourced it and made it available to download for free.

Lens is a standalone application and can be installed on macOS, Windows, and some Linux flavors. With Kubernetes Lens, you can talk to any Kubernetes cluster, anywhere.

Kubernetes Lens is aimed at developers, SREs, and software engineers in general. It is most likely the only platform you will need to manage your Kubernetes clusters. It is backed by a number of Kubernetes and cloud-native ecosystem pioneers such as Apple, Rakuten, Zendesk, Adobe, and Google.

Why Kubernetes Lens?

There are a variety of features that make Kubernetes Lens a highly attractive tool. Here is an overview of a few of them.

Cluster Management

Managing clusters in Kubernetes can be difficult, but with Kubernetes Lens you can work on multiple clusters while maintaining context for each of them. Lens makes it possible to configure, change, and switch clusters with one click, organizing and revealing the entire working system in the cluster while providing metrics. With this information, you can easily and quickly make changes and apply them confidently.

Adding a Kubernetes cluster to Lens is easy: point Lens at the local or remote kubeconfig file, and it automatically discovers and connects to the cluster.

With Lens, you can inspect all the resources running inside your cluster, ranging from simple Pods and Deployments to the custom types added by your applications.

Built-In Visualization and Metrics

Kubernetes Lens comes with a built-in Prometheus setup with multi-user support that provides role-based access control (RBAC) per user. That means that, within a cluster, users can only access visualizations they have permission to see.

Once a Prometheus instance is configured, Lens can display metrics and visualizations for the cluster. To add Prometheus to Lens if it is not already installed, follow these steps:

  1. Right-click on the cluster icon in the upper left corner of the UI.
  2. Click Settings.
  3. Under Features, find and select the Metrics stack.
  4. Then click Install to install Prometheus stack (This may take a couple of seconds or minutes.)


After the installation, Lens autodetects Prometheus for that cluster and then begins to display cluster metrics and visualizations. You can also preview the Kubernetes manifests for Prometheus before you apply them.

With Prometheus, you get access to real-time graphs, resource utilization charts, and usage metrics such as CPU, memory, network, requests, etc., which are integrated into the Lens dashboard. These graphs and metrics are shown in the context of the particular cluster that is viewed at that moment, in real time.


Kubernetes Lens also integrates with Helm, making it easy to install and manage Helm charts and releases in Kubernetes.


Kubernetes Lens lets you use Helm repositories available from Artifact Hub, and automatically adds the bitnami repository by default if no other repositories are configured. Other repositories can be added manually via the command line. Note that configured Helm repositories are added globally on the user’s computer, so other processes can see them as well. All charts from configured Helm repositories are listed in the Apps section.

Lens Extensions

Kubernetes Lens extensions allow you to add new and custom features and visualizations to accelerate development workflows for all the technologies and services that integrate with Kubernetes. To use Lens extensions, go to File (or Lens on macOS) and then click Extensions in the application menu. You can install extensions in three ways:

  1. Installing the extension as a .tgz file, then dragging and dropping it in the extension management page will install it for you.
  2. If the extension is hosted on the web, you can paste the URL and click Install, and Lens will download and install it.
  3. You can also move the extension into your ~/.k8slens/extensions (or C:\Users\.k8slens\extensions) folder and Lens will automatically detect it and install the extension.

Kubernetes Lens also allows you to script your own extensions with the Lens APIs. They support adding new object details, creating custom pages, adding status bar items, and other UI modifications. Extensions can be published to npm to generate a tarball link that the Kubernetes Lens install screen can reference.


GUI over CLI

Lens provides a way to manage Kubernetes through a GUI, because managing multiple clusters across various platforms and substrates means deciphering the complexities of multiple access contexts, modes, and methods for organizing clusters, components, nodes, and infrastructure. Solving all of this from the command line is difficult, slow, and error-prone, especially with the constant increase in the number of clusters and applications, not to mention their configurations and requirements.

With the Kubernetes Lens GUI, you can do several things:

  1. You can add clusters manually, by browsing through their kubeconfigs and can immediately identify kubeconfig files on your local machine.
  2. With Lens, you can put these clusters into workgroups in whatever way you interact with them.
  3. Lens provides visuals on the state of objects such as Pods, Deployments, namespaces, network, storage, and even custom resources in your cluster. This makes it easy to identify and debug any issue with the cluster.

For CLI lovers, Lens doesn’t leave you high and dry: you can also invoke its built-in terminal and execute your favorite kubectl command lines.

Lens Terminal

The built-in terminal uses a version of kubectl that is API-compatible with your cluster. The terminal can:

  1. Automatically detect your cluster version and then assign or download the correct version in the background.
  2. Maintain the correct kubectl version and context as you switch from one cluster to another.


Integrations

Lens gives you access to, and lets you work with, a wide variety of Kubernetes clusters on any cloud, all from a single, unified IDE. The clusters may be local (e.g., minikube or Docker Desktop) or external (e.g., Docker Enterprise, EKS, AKS, GKE, Rancher, or OpenShift). Clusters may be added simply by importing the kubeconfig with cluster details.

Lens Spaces

Kubernetes Lens promotes teamwork and collaboration via a feature called Spaces: a collaborative space for cloud-native development teams and projects. With a Lens Space you can:

  1. Easily organize & access your team clusters from anywhere: GKE, EKS, AKS, on premises, or a local dev cluster.
  2. Easily access and share all clusters in a space securely.

Cluster Connect

In Kubernetes, sharing access to different clusters is difficult. As an administrator, you may work with different providers that each require their own tools, obtain kubeconfig files, make those files work with your kubectl, and then connect to the target cluster’s API from the same network. Often you will need a VPN to be on the same network as the provider, and in some cases you will also need to use different IAM providers. These are security risks, because users might bypass security best practices.

Lens uses Cluster Connect to share access to the cluster without compromising the security of the cluster.

With Kubernetes Lens Spaces, you can send and receive invitations to access other clusters. All invitations are aggregated and exposed to you using the Lens Kubernetes proxy. To access a cluster, you install the Cluster Connect agent in the desired cluster. The agent then allows you to connect from Lens Spaces using end-to-end encryption to secure the connection between you and the cluster, eliminating the need for a VPN or for an inbound port on the firewall. This also means you can access and work with your Kubernetes clusters easily from anywhere.

Cluster Connect is based on the BoreD OSS software. Check out the documentation to learn more about Cluster Connect.


Multiple Workspaces Management

Lens organizes clusters into logical groups called workspaces. This helps DevOps engineers and SREs who have to manage multiple (even hundreds of) clusters. Usually, a single workspace contains a list of clusters and their full configuration.

Kubernetes Lens is one of the most effective Kubernetes UIs you’ll ever use. It supports CRDs and Helm 3, and it has a friendly GUI. Lens will, of course, also handle the cluster settings for you.

Recap of Key Features for Beginners

Kubernetes Lens provides situational awareness for everything that runs in Kubernetes, lowering the barrier to entry for developers just getting started. It is an ideal solution for many reasons, including:

  1. It provides the confidence that your clusters are properly set up and configured.
  2. It offers increased visibility, real-time statistics, log streams, and direct troubleshooting facilities.
  3. The ability to organize clusters quickly and easily improves productivity and business speed.
  4. EKS, AKS, GKE, Minikube, Rancher, K0s, etc. – any Kubernetes you might be using works with Lens. You only need to import the kubeconfigs for the appropriate clusters.
  5. Kubernetes Lens is built on open source with an active community, supported by Kubernetes and cloud-native ecosystem pioneers.

Debugging Kubernetes in Production

Kubernetes Lens has numerous great features, as this article has shown. It is an independent app, unlike the built-in Kubernetes dashboard. If you use Kubernetes and appreciate a good GUI, you should definitely check out Kubernetes Lens.

For developers looking to get visibility into their code regardless of environment or deployment type, from monolith to microservices, consider Lightrun.

For advanced users of the Kubernetes stack, Lens can’t provide the type of observability and debugging capabilities that Lightrun can for production applications in real-time. With Lightrun, developers can:

  • Troubleshoot Kubernetes easily by dynamically adding log lines
  • Add as many logs as you need until you identify the problem
  • Get multi-instance support (microservices, big data workers) using a tagging mechanism
  • Explore the call stack and local variables at any location in the code, in the exact version where they occurred
  • Traverse the stack just like with a regular breakpoint
  • Add snapshots easily from the IDE you’re already using
  • Need more snapshots? Add as many as you need – you’re not breaking the system

Naturally, Lightrun offers a very robust yet easy to use way of monitoring the K8S stack, which you can try out yourself.

Lightrun is a secure, developer-native observability platform that enables you to add logs, snapshots, and metrics directly into your source code or application, in any environment. It takes you a level beyond Kubernetes Lens, allowing you to troubleshoot Kubernetes directly from any IDE.

With Lightrun, you can debug monoliths, microservices, Kubernetes, Docker Swarm, Big Data, and serverless in real time. Be sure to check out Lightrun for all of your cluster management needs.

5 Most Common API Errors and How to Fix Them

As software has grown more complex, more and more software projects rely on API integrations to run. Some of the most common API use cases involve pulling in external data that’s crucial to the function of your application: weather data, financial data, or even syncing with another service your customer wants to share data with.

However, the risk with API development lies in interacting with code you didn’t write, and usually cannot see, when debugging. This makes error identification critical, so you don’t waste development time trying to fix the wrong problems.

Luckily, some errors are more common than others, and these are the best place to start looking when your API calls aren’t working as expected.

In this article, we’ll explain how you can spot these errors in your own code, fix them, and get back on track.

Using HTTP Instead of HTTPS

Security on the web is crucial. And as more and more websites adopt HTTPS over HTTP, API endpoints should do the same. If the API is developed with this potential error in mind, you should get an informative error telling you to access the endpoint via HTTPS rather than HTTP.

For example, a response might look like this:

[Screenshot: an example error response indicating that HTTPS is required]

This is the best-case scenario, because the error message tells you exactly how to fix the problem: by making an HTTPS call instead of HTTP.

However, when an API is built without this potential error case in mind, the problem can masquerade as other errors we’ll discuss later. In a very similar case, a less resilient API might produce the following:

  • 500 Internal Server Error: One of the least helpful errors, a 500 Internal Server Error means the server can’t handle the request. However, it can also happen when you pass incorrect or incomplete information to the API (or when it’s simply broken).
  • 403 Forbidden: Depending on how the API infrastructure is set up, you might get a 403 Forbidden error. While you may have incorrect credentials, this could also be the result of an undetected HTTP vs. HTTPS error as discussed earlier.
  • 404 Not Found: Some servers don’t serve HTTP endpoints at all and return 404 errors, leading you to believe you’ve mistyped the endpoint URL or something similar.

Nowadays, most API endpoints use HTTPS, so it’s usually safe to assume you should be calling the HTTPS endpoint. If you’re not, and you get one of the errors listed above, this should definitely be one of the first things you check.

Using the Wrong HTTP Method

Even if you’ve never accessed an API method before, you use the GET method every time you access a website in your browser. But when it comes to APIs, different endpoints require a different HTTP method (GET, POST, PUT, PATCH, or DELETE) depending on what action you’re trying to complete. For example, if you’re trying to access the Twitter API to get a list of a user’s tweets, you’d likely be calling a GET endpoint. If you’re trying to tweet as that user through the API, you’d likely use a POST method.

Those are by far the most common methods, although PUT and PATCH are sometimes used, for example, to update existing records in a database behind an API.
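In Python’s requests library, for instance, the HTTP method is simply the function you call. A short sketch against a hypothetical tweets API:

    import requests

    BASE = "https://api.example.com"  # hypothetical API

    # Reading data: GET.
    tweets = requests.get(BASE + "/users/42/tweets", timeout=10)

    # Creating data: POST, with the payload in the request body.
    created = requests.post(BASE + "/tweets", json={"text": "hello"}, timeout=10)

    # A 405 here (or a misleading 403/404/500, as described below) often just
    # means the endpoint expects the other method; check the docs first.
    print(tweets.status_code, created.status_code)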

As with the previous example, this can be a straightforward error to detect. If the API recognizes the route but can tell you’re using the wrong method, sometimes it will just tell you:

[Screenshot: an example error response indicating the wrong HTTP method was used]

However, in some cases, this error can present as one we’ve already discussed:

  • 500 Internal Server Error: If the server doesn’t know how to handle receiving the incorrect method gracefully, it may just fail completely and give you a 500 Internal Server Error. In this case you may have to look deeper into the error logs to debug.
  • 403 Forbidden: Depending on how the server is configured, it may not allow you to access any of the endpoints with the incorrect method and will return a 403 error. You may be tempted to check whether your authentication is working correctly, when the problem may really be an incorrect method being called on the endpoint.
  • 404 Not Found: Some API frameworks simply return a 404 error when the incorrect HTTP method is used because your request doesn’t match a known route.

When your API call returns an error, you should double check the documentation to make sure you’re using the correct HTTP method. You should do this even when the error you’re seeing is not the 405 error that explicitly indicates you’re using the wrong method.

Using Invalid Authorization

APIs beyond the most basic usually require some sort of authorization: an API key, a username and password, an OAuth token, a JSON Web Token, or a custom authentication method.

What’s important is that this authorization is provided with each and every API request. This ensures that the API knows the requester has adequate permissions for the operation being requested. When these credentials are incomplete or incorrectly formatted, the API in question can produce a variety of errors. Typically this will be the 403 Forbidden error, which tells the user they’re not allowed to access that particular resource.

[Screenshot: an example 403 Forbidden response for invalid authorization]

In that case you should check your credentials, as well as the API documentation, to make sure they’re formatted correctly. For example, some APIs require a username and password separated by a : character, some require credentials to be base64 encoded, and some have different requirements entirely.

It’s important to make sure you’re following the parameters laid out in the documentation so that your credentials are accepted.
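As a concrete illustration, HTTP Basic auth is exactly the colon-separated, base64-encoded scheme mentioned above. In Python’s requests library (the endpoint and credentials below are hypothetical):

    import base64

    import requests

    URL = "https://api.example.com/me"  # hypothetical endpoint

    # Option 1: let requests build the Authorization header for you.
    resp = requests.get(URL, auth=("alice", "s3cret"), timeout=10)

    # Option 2: construct it by hand - note the "username:password" joining
    # and the base64 encoding that Basic auth requires.
    token = base64.b64encode(b"alice:s3cret").decode("ascii")
    resp = requests.get(URL, headers={"Authorization": "Basic " + token},
                        timeout=10)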

Caching Errors

Especially in cases where APIs are heavily used, results may be cached to improve performance for everyone with API access. This is usually very beneficial, as it lets everyone get data quickly when they need it. However, there are two potential cases where this approach is problematic.

In the first case, the information from the API may be cached and therefore outdated. If so, discuss with your team whether the caching time can be reduced without affecting API performance.

The second case, which is more difficult to debug, is when an error state is cached. This can lead to an API returning an error even after the underlying problem has been resolved. To fix this, check with your API provider to see if there is a testing environment that doesn’t use caching.

Alternatively, double check your API call on a different machine or with a different set of credentials. You can also check your API documentation to see if there’s some cache invalidation method available. In some cases, the API cache can be invalidated manually, but this shouldn’t be done regularly as it removes the benefits of caching in the first place.

If you’re encountering an error we’ve discussed above yet believe you’ve fixed all the possible root causes, see whether the response is cached on either the API side or in your API client.
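One quick experiment is to ask for a fresh response explicitly. Many HTTP caches honor the standard Cache-Control request header, though whether a particular API respects it is up to the provider (the endpoint below is hypothetical):

    import requests

    URL = "https://api.example.com/data"  # hypothetical endpoint

    # Ask intermediaries to revalidate rather than serve a stored copy, then
    # compare with a plain GET to see whether a cache is returning stale data.
    fresh = requests.get(URL, headers={"Cache-Control": "no-cache"}, timeout=10)

    # Headers like Age (standard) or X-Cache (common but non-standard) can
    # hint at whether the response came from a cache.
    print(fresh.headers.get("Age"), fresh.headers.get("X-Cache"))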

Invalid Fields

If you’re passing data to an API instead of just receiving it, it’s important to provide all the data the API expects, and (in most cases) ignore data it doesn’t support. Always read through the endpoint documentation for any API endpoint you’re trying to access, and make sure you’re passing the correct data. If you aren’t doing this, you’ll hopefully get a specific error message telling you about missing or extraneous data.

[Screenshot: an example error response listing missing or invalid fields]

It’s also possible the API will return a 500 Internal Server Error if it cannot handle an unexpected response properly. If you encounter this error and have already run through the debugging steps for each of the previous errors, double check your data to make sure it matches the specification indicated in the API documentation.

Wrapping Up

Working with APIs can be intimidating because it means interacting with code you don’t directly control, so a developer’s more traditional debugging methods are often not possible. However, with experience you’ll start to know what to look for when you encounter various API errors, allowing you to fix them faster. With these errors resolved, you’ll be powering your application with APIs and external data, and improving the user experience as a result.

Shift Left Testing: 6 Essentials for Successful Implementation

Testing can evoke polarized reactions from developers. Some love it. Some prefer never to hear of such a suggestion.

But testing is necessary – especially shift left testing.

Testing is often resisted: teams pressured by shorter release cycles tend to forgo testing altogether in order to meet deadlines. This results in lower-quality software, which can lead to security vulnerabilities and a degraded user experience due to defects.

The concept of shift left testing is popular in DevOps circles. According to an Applause survey, 86% of teams are already using a shift left approach early in the software development life cycle. The survey also reported that shift left helped reduce the number of accidental bugs released, and that the ROI of a shift left approach is better than that of fixing broken software after it’s deployed.

But what is shift left testing? And why is it a vital component of your team’s and project’s success?

What is Shift Left Testing?

The goal of shift left testing is to find bugs as early as possible. By doing this, developers can fix these bugs before they cause major problems. By finding bugs early on, companies can avoid the costly process of fixing these bugs after they have been deployed.


In addition to this, shift left testing can improve the quality of the software. By finding bugs early, developers can make sure that they are fixed before the software is released. Shift left testing is not a new concept. However, it has gained popularity in recent years as companies have started to realize the benefits of this methodology.

Why implement a Shift Left Testing approach?

When it comes to software development, the earlier a problem is found, the cheaper it is to fix. This is the philosophy behind shift left testing, also known as shift left development or continuous testing.

The goal of shift left testing is to move testing earlier in the software development process. By doing so, problems can be found and fixed before they have a chance to cause major issues further down the line.

There are many benefits to shift left testing, including:

  • Reduced costs: By finding and fixing problems early, the overall cost of development is reduced.
  • Improved quality: By catching problems early, the quality of the final product is improved.
  • Faster development: With continuous testing, development can move faster, as there is no need to wait for a separate testing phase.
  • Increased collaboration: By involving testers early on, they can provide valuable input to the development process.
  • Security remediation: Earlier detection of security issues prevents costly security breaches.

6 Essentials for successful Shift Left Testing implementation

Technology Executive: "We were so aggressive in our quest to shifting left, we now test and find defects before we gather requirements."

1. Plan the testing framework

There are several ways to plan the testing framework for a shift left implementation. One approach is to adopt a test-design methodology such as Test-Driven Development (TDD) or Behavior-Driven Development (BDD). These approaches let developers write tests based on the intended behavior of the software rather than its implementation, making it easier to catch errors early, before they make it into the code.
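To illustrate the test-first rhythm, here is a tiny TDD example in Python with pytest; the file names and the pricing function are made up. The test is written first and fails, then the minimal implementation makes it pass:

    # test_pricing.py - written first, before pricing.py exists
    from pricing import apply_discount

    def test_discount_is_capped_at_100_percent():
        assert apply_discount(price=100.0, percent=150) == 0.0

    # pricing.py - the minimal implementation that makes the test pass
    def apply_discount(price: float, percent: float) -> float:
        return max(0.0, price * (1 - min(percent, 100) / 100))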

2. Define coding standards

As organizations begin to adopt a shift left mentality in their development processes, it becomes increasingly important to establish and enforce coding standards. These standards help to ensure that code is consistent, readable, and maintainable.

It is also important to establish a mechanism for enforcing the coding standards, through a combination of tools and processes. Tools such as static analysis can automatically check code for compliance with the defined standards, and processes such as code review can ensure that code meets them.

3. Develop a feedback process

To ensure that a shift left implementation is successful, it is important to develop a feedback process. This process should include input from all stakeholders, including those who will use the new system as well as those responsible for maintaining it. It is also important to solicit feedback at different stages of the implementation, in order to get a comprehensive picture of how the system is working.

Once the feedback process is in place, it is important to act on the feedback received. This may mean making changes to the system or providing additional training to users. It is also important to keep track of the feedback received, in order to identify any trends.

4. Automate testing where possible

There are many benefits to automating testing, particularly for a shift left implementation. Automated testing can help to speed up the testing process, as well as improve the accuracy of the results. It can also help to reduce the number of manual tests that need to be carried out, which can free up time for other activities.

[Image: xkcd #1319, “Automation”]

When deciding which tests to automate, it is important to consider the ROI (return on investment). Automating a test that is run frequently and takes a long time to complete can save a lot of time and money in the long run. Conversely, automating a test that is only run occasionally may not be worth the effort.

There are a number of tools available to help with automated testing. These include open source tools such as Selenium and Watir, and commercial tools such as HP UFT (Unified Functional Testing). The choice of tool will depend on the technology being used and the preferences of the team.
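As a small illustration using Selenium’s Python bindings (this assumes Chrome with a matching driver is installed; the login page and element IDs are hypothetical):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://app.example.com/login")  # hypothetical page
        driver.find_element(By.ID, "username").send_keys("qa-user")
        driver.find_element(By.ID, "password").send_keys("not-a-real-password")
        driver.find_element(By.ID, "submit").click()
        # A cheap smoke assertion; a real suite would use explicit waits.
        assert "Dashboard" in driver.title
    finally:
        driver.quit()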

5. Bring testers into the initial SD stages

Bringing testers into the initial stages of development can be a challenge, but it is essential for a successful shift left implementation. Here are a few tips for getting started:

Define what you need from testers.

Before you can bring testers into the fold, you need to know what you need from them. What types of testing will they be doing? What are your expectations for turnaround time? By being clear about your needs upfront, you can avoid any confusion or frustration later on.

Communicate the benefits of shift left.

Once you know what you need from testers, make sure to communicate the benefits of shift left to them. Explain how moving testing earlier in the process can help identify and resolve issues more quickly, and emphasize how this will ultimately lead to a better experience for users.

Train testers on the new process.

If you want testers to be successful in the new process, it is important to provide them with training. Teach them about the different types of testing that will be conducted and how to best go about it. Also, be sure to give them plenty of opportunities to practice so that they are comfortable with the new process before go-live.

Set up a feedback loop.

As with any new process, it is important to set up a feedback loop so that you can continuously improve. Make sure to solicit feedback from testers after each round of testing. What went well? What could be improved? By constantly tweaking and improving the process, you can ensure that it is as effective as possible.

Bringing testers into the initial stages of development can be a challenge, but it is essential for a successful shift left implementation. By following these tips, you can set your team up for success.

6. Make use of observability platforms

obi wan kenobi: "I sense a disturbance in the code"

Observability platforms like Lightrun are tools that help developers monitor their applications in real-time, identify issues, and quickly fix them.

There are many different observability platforms available, each with its own set of features. However, all observability platforms share some common features that can be used to implement a shift left strategy.

Some of the most common features of observability platforms include:

  • Logging: Collect and store application logs for analysis.
  • Monitoring: Monitor application performance and identify issues in real-time.
  • Tracing: Trace the path of a request through the application to identify bottlenecks.
  • Alerting: Set up alerts to notify developers of issues as they happen.

Summary

It’s easy to get lost in the benefits of shift left testing: shorter timelines, fewer bugs, increased software integrity, and lower long-term costs. Shift left testing is a vital component of a DevOps team’s toolkit.

Many platforms implement different processes inside a shift left methodology, with Lightrun being the platform for implementing observability.

Why leave the surfacing of bugs up to chance? It’s easy to be proactive and give your DevOps team internal visibility with the help of Lightrun.
