Best Practices archives - Lightrun
https://lightrun.com/category/best-practices/

Observability vs. Monitoring
https://lightrun.com/observability-vs-monitoring/ — Sat, 21 May 2022

Although all code is bound to have at least some bugs, they are more than just a minor issue. Bugs in your application can severely impact its efficiency and frustrate users. To ensure that software is free of bugs and vulnerabilities before applications are released, DevOps teams need to work collaboratively and effectively bridge the gap between the operations, development, and quality assurance teams.

But there is more to ensuring a bug-free product than a strong team. DevOps teams also need the right methods and tools in place to manage bugs in the system.

Two of the most effective methods are monitoring and observability. Although they may seem like the same process at a glance, they have significant differences beneath the surface. In this article, we look at the meaning of monitoring and observability, explore their differences, and examine how they complement each other.

What is monitoring in DevOps?

In DevOps, monitoring refers to the supervision of specific metrics throughout the whole development process, from planning all the way to deployment and quality assurance. By being able to detect problems in the process, DevOps personnel can mitigate potential issues and avoid disrupting the software’s functionality.   

DevOps monitoring aims to give teams the information to respond to bugs or vulnerabilities as quickly as possible. 

DevOps Monitoring Metrics

To correctly implement the monitoring method, developers need to supervise a variety of metrics, including:

  • Lead time or change lead time
  • Mean time to detection
  • Change failure rate
  • Mean time to recovery
  • Deployment frequency
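Two of these metrics, change failure rate and deployment frequency, can be computed directly from a deployment log. A minimal sketch (the deployment records below are invented for illustration):

```python
from datetime import date

# Hypothetical deployment history: (day, succeeded?)
deployments = [
    (date(2022, 5, 2), True),
    (date(2022, 5, 9), False),   # this one caused an incident
    (date(2022, 5, 16), True),
    (date(2022, 5, 23), True),
]

# Change failure rate: share of deployments that led to a failure.
failures = sum(1 for _, ok in deployments if not ok)
change_failure_rate = failures / len(deployments)

# Deployment frequency: deployments per week over the covered period.
days_covered = (deployments[-1][0] - deployments[0][0]).days or 1
deploys_per_week = len(deployments) / (days_covered / 7)

print(f"change failure rate: {change_failure_rate:.0%}")    # prints 25%
print(f"deployment frequency: {deploys_per_week:.1f}/week")  # prints 1.3/week
```

Lead time, mean time to detection, and mean time to recovery follow the same pattern: each is an aggregate over timestamped events recorded by the pipeline.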

What is Observability in DevOps?

Observability is the ability to determine a system’s internal state from the information in its external outputs. It allows teams to understand a system’s problems by revealing where, how, and why the application is not functioning as it should, so they can address issues at their source rather than relying on band-aid solutions. Moreover, developers can assess the condition of a system without interacting with its complex inner workings or affecting the user experience. There are a number of observability tools available to assist you throughout the software development lifecycle.

The Three Pillars of Observability

Observability requires gathering and analyzing the data an application emits. While this flood of data can become overwhelming, it can be broken down into three fundamental pillars developers need to focus on:

1. Logs

Logs are the structured and unstructured lines of text an application produces when it runs certain lines of code. Logs record events within the application and can be used to uncover bugs or system anomalies, providing a wide variety of details from almost every system component. Logs make observability possible by creating the output that lets developers troubleshoot code: by analyzing the logs, they can identify the source of an error or security alert.
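As a minimal illustration of the structured variant (not tied to any particular logging stack; the service name is hypothetical), logs can be emitted as JSON lines so downstream observability tools can parse them:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (a common structured-log shape)."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout-service")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits e.g. {"timestamp": "...", "level": "INFO", "logger": "checkout-service", ...}
logger.info("payment accepted")
```

Because every record carries the same machine-readable fields, an error or security alert can be located by filtering on `level` and `logger` rather than by reading raw text.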

2. Metrics

Metrics are numerical representations of an application’s behavior over time. Each metric consists of a set of attributes, such as a name, labels, a value, and a timestamp, which together reveal information about the system’s overall performance and any incidents that may have occurred. Unlike logs, metrics don’t record specific incidents but return values representing the application’s overall performance. In DevOps, metrics can be used to assess the performance of a product throughout the development process and identify potential problems. In addition, metrics are ideal for observability, since patterns gathered from various data points are easy to identify and combine into a complete picture of the application’s performance.
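The attribute structure described above (name, labels, value, timestamp) can be sketched as a simple data type; the field and metric names here are illustrative, not any specific vendor’s format:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MetricPoint:
    """One sample of a time series: a named value with labels and a timestamp."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

# A few samples of a hypothetical request-latency metric:
samples = [
    MetricPoint("http_request_duration_ms", 120.0, {"endpoint": "/checkout"}),
    MetricPoint("http_request_duration_ms", 95.0, {"endpoint": "/checkout"}),
    MetricPoint("http_request_duration_ms", 310.0, {"endpoint": "/checkout"}),
]

# Unlike logs, the series is summarized rather than read event by event:
average = sum(s.value for s in samples) / len(samples)
print(f"avg latency: {average:.1f} ms")  # prints: avg latency: 175.0 ms
```

The labels are what make patterns easy to spot: aggregating the same metric name by `endpoint` yields a per-endpoint performance picture.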


3. Traces

While logs and metrics provide enough information to understand a single system’s behavior, they rarely provide enough to follow the lifetime of a request through a distributed system. That’s where tracing comes in. A trace represents the path of a request as it travels through all of the distributed system’s nodes.

Implementing traces makes it easier to profile and observe systems. By analyzing the data a trace provides, your team can assess the general health of the entire system, locate and resolve issues, discover bottlenecks, and prioritize high-value areas for optimization.
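A trace is typically a tree of timed spans that share one trace ID. The sketch below is generic (not a specific tracing SDK; the service names are invented) and shows spans recorded as a request crosses two hypothetical services:

```python
import time
import uuid

class Span:
    """One timed operation within a trace; all spans share the trace_id."""
    def __init__(self, trace_id, name, parent=None):
        self.trace_id = trace_id
        self.name = name
        self.parent = parent.name if parent else None
        self.start = time.perf_counter()
        self.duration = None

    def finish(self):
        self.duration = time.perf_counter() - self.start

def handle_request():
    trace_id = uuid.uuid4().hex              # generated at the system's edge
    root = Span(trace_id, "api-gateway")
    child = Span(trace_id, "inventory-service", parent=root)  # downstream call
    child.finish()
    root.finish()
    return [root, child]

spans = handle_request()
# The shared trace ID is what lets a tracing backend reassemble the
# request's full path and timings across nodes:
assert all(s.trace_id == spans[0].trace_id for s in spans)
```

Bottleneck hunting then becomes a matter of sorting spans by `duration` within a single trace.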

     

Monitoring vs. Observability: What’s the Difference?

We’ve compiled the table below to better distinguish between these two essential DevOps methods:

Monitoring | Observability
Practically any system can be monitored | The system has to be designed for observation
Asks if your system is working | Asks what your system is doing
Includes metrics, events, and logs | Includes traces
Active (pulls and collects data) | Passive (pushes and publishes data)
Capable of providing raw data | Heavily relies on sampling
Enables rapid response to outages | Reduces outage duration
Collects metrics | Generates metrics
Monitors predefined data | Observes general metrics and performance
Provides system information | Provides actionable insights
Identifies the state of the system | Identifies why the system failed

Observability vs. Monitoring: What do they have in common?

While we’ve established that observability and monitoring are distinct methods, that doesn’t make them incompatible. On the contrary, monitoring and observability are generally used together, as both are essential to DevOps. Despite their differences, their commonalities allow the two methods to co-exist and even complement each other.

Monitoring lets developers identify when there is an anomaly, while observability gives insight into the source of the issue. Monitoring is effectively a subset of, and therefore key to, observability: developers can only monitor systems that are already observable. And while monitoring only provides answers for previously identified problems, observability simplifies the DevOps process by allowing developers to pose new queries, whether to solve an already identified issue or to gain insight into the system as it is being developed.

Why are both essential?

Monitoring and observability are both critical to identifying and mitigating bugs and discrepancies within a system. But to fully realize the advantages of each approach, developers must do both thoroughly, and manually implementing and maintaining these approaches is an enormous task. Luckily, automated tools like Lightrun let developers focus their valuable time and skills on coding. The tool enables developers to add logs, metrics, and traces to running code in real time, without restarting or redeploying the software, preventing delays and guaranteeing fast deployment.

Top 9 Observability Tools in 2022
https://lightrun.com/observability-tools-2022/ — Sat, 21 May 2022

Cloud infrastructure is becoming more useful for companies, but also more complex. DevOps methods have become a critical way of maintaining control over this increasingly complex infrastructure. CEOs across industries are working on implementing the DevOps methodology to ensure functional cloud management, and one of its most effective practices is observability.

With over 74% of CEOs expressing concerns that increased complexity will continue to lead to performance management difficulties, it’s clear that investing in observability tools is a must. But with the wide range of tools available on the market, how do you choose the tool that’s right for you? And which features are most suitable for your organization’s needs? This article lists the top 9 observability tools of 2022 to help you make the best choice for your business.

What is an Observability Tool?

As infrastructure becomes increasingly complex, observability grows more challenging. Observability tools perform the tasks required for observability, including monitoring systems and applications through metrics and logs. In contrast to individual monitoring tools, observability tools allow organizations to receive constant insights and feedback from their systems. Organizations receive actionable insights into their business faster than they would from tools focusing solely on monitoring or logging. Observability tools allow organizations to understand system behavior, giving them the information they need to prevent system problems by predicting them before they occur.

Features to look for in Observability Tools

Before we look at specific tools, let’s examine some of the features you should look for when choosing the right observability tool for your organization. Some features to have in mind include:

  • A dashboard that provides a clear view of your system
  • Alerts in case of events or anomalies
  • The ability to track significant events
  • Long-term monitoring with comparisons, allowing the system to detect anomalies
  • Automated issue detection and analysis
  • Event logging for speedy resolution
  • SLA tracking that measures metadata and data quality against pre-set standards
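The “long-term monitoring with comparisons” item is worth unpacking: in its simplest form, anomaly detection compares a new measurement against a historical baseline. A toy sketch (the threshold and sample numbers are made up for illustration, and real tools use far more sophisticated models):

```python
from statistics import mean, stdev

def is_anomaly(history, latest, threshold=3.0):
    """Flag `latest` if it deviates from the historical mean by more
    than `threshold` standard deviations (a basic z-score check)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Hypothetical week of daily error counts, then today's reading:
baseline = [12, 15, 11, 14, 13, 12, 16]
print(is_anomaly(baseline, 14))   # prints False (a normal day)
print(is_anomaly(baseline, 90))   # prints True  (a spike worth alerting on)
```

This is the comparison an observability tool runs continuously across thousands of series, which is why automation matters.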

Top 9 Observability Tools for 2022

The market for observability tools continues to grow, and the variety of choices can become overwhelming. We’ve collected the top nine tools, divided into four categories: shift-left observability, serverless monitoring, incident response, and application performance monitoring.

Shift-Left Observability

The shift-left concept refers to taking processes traditionally performed during later stages of the product lifecycle and implementing them earlier on. Shift-left observability simply means implementing observability practices earlier into the product’s lifecycle.

1. Dynatrace


Dynatrace is a comprehensive SaaS tool that addresses a wide range of monitoring needs, particularly for large-scale organizations. The system uses an AI engine called Davis to automate anomaly detection and root cause analysis. Pricing varies by package, starting at $69 per month for 8 GB per host when billed annually.

Dynatrace’s AI and advanced anomaly detection tools have made it a popular option for large organizations looking to monitor complex infrastructure while quickly detecting vulnerabilities. Unfortunately, the solution does have its downsides, such as being on the more expensive end of observability solutions and lacking updated technical documentation.

2. Lightrun


Lightrun offers a developer-native observability platform that allows users to add logs, metrics, and traces to production and staging environments. It gives you full observability of your infrastructure, enabling you to quickly detect and mitigate potential issues without adding extra code. Logs and metrics can be added in real time, even while the application is running.

The solution offers a free 14-day trial and affordable pricing. It is already used by companies like Nokia, Taboola, DealHub, and WhiteSource. Lightrun facilitates early debugging in real-time for various systems, from monolith applications to microservices.

Serverless Monitoring

Serverless monitoring allows users to only access infrastructure and resources as they need them instead of pre-purchasing unnecessary server capacity from the get-go. By using serverless monitoring, organizations save money as they only pay for the resources they use.

3. Lumigo


Lumigo is a solution that builds a virtual stack trace of all the services that participate in a process. The tool presents all the data it gathers in a clear visual map with search and filter capabilities, allowing organizations to identify and mitigate issues quickly. Its features include creating data visibility across infrastructure and giving organizations the data they need to remove bottlenecks. A free version is available, and paid versions begin at $99 per month.

Although the system offers many unique capabilities and visual tools such as graphs and timelines, some reports are oversaturated, making it difficult to sort through them for relevant information. The solution is still in its early stages and is therefore missing some crucial capabilities.

Incident Response

Incident response is the process used by DevOps, IT, and dev teams to manage any issues or incidents, including damage control and prevention. It generally includes a guideline that delineates the response to follow in the event of an incident.

4. Lightstep


Lightstep is a solution that collects data and presents it clearly and concisely, allowing users to monitor their applications and respond to any unusual changes or anomalies. Lightstep enables users to minimize the effects of outages and other crises on operations. The company offers a free option and group prices starting at $100 per active service per month.

Lightstep provides clear visibility into the required tasks and gives teams insight into what they need to prioritize. Some users report that the solution can perform somewhat slowly at times and that the mobile application doesn’t perform as well as the desktop app.

Application Performance Monitoring

Application performance monitoring allows organizations to monitor their IT environment, assess whether it meets performance standards, and identify bugs and other potential problems. This allows organizations to improve their performance and offer a stellar user experience.

5. Anodot


Anodot’s solution uses machine learning to constantly assess and compare performance, allowing it to provide real-time anomaly alerts and even predictions of anomaly sources. The solution enables businesses to cut detection times and manage issues faster. Anodot states that most users miss out on 85% of their usable data, and claims to reduce detection and resolution time by 80%.

Anodot services are used by major tech companies such as Payoneer and TripAdvisor. The system offers a variety of payment plans and a free demo. Although it provides many valuable features, the system’s UI has room for improvement, and its algorithm is not always accurate.

6. Datadog


Datadog is a monitoring, security, and analytics platform designed for developers, security engineers, IT operations teams, and business users who interact with the cloud. The SaaS platform automates performance monitoring, infrastructure monitoring, and log management. The platform provides users with real-time observability across their entire architecture.

The solution offers a free version, a free trial, and two pricing options: per host, starting at $15, or per log, starting at $1.27 per million log events. Datadog is trusted by Shell, Samsung, 21st Century Fox, and many other well-known corporations. Despite its many benefits, the system can be challenging to navigate, and the documentation is not always up to par.

7. Grafana


Grafana is an observability tool that creates reports and usage insights for developers and builds dashboards that make data easily viewable and readable. Grafana is trusted by several large corporations, including Siemens, eBay, and PayPal. The platform can be used in conjunction with other similar platforms, including Datadog and Dynatrace, and can report on these platforms’ performance.

A free version of the platform is available, and paid versions start at $8 for a single user. Although the tool is free and includes features such as an alert and notification system, the platform has limited dashboard designs and organization.

8. Honeycomb


Honeycomb is an analysis tool that allows developers to identify application issues quickly and to resolve those problems using the same interface. The solution enables teams to better understand their software, simplifying the debugging and upgrading process and allowing the team to resolve issues more quickly.

The platform offers a free plan for individuals and a 14-day free trial for enterprises. It is excellent for analyzing systems and identifying the source of incidents, but it is less effective for traditional monitoring purposes.

9. New Relic


New Relic’s platform is designed to speed up the repair process and reduce downtime, increasing productivity and allowing engineers to focus on enhancing application performance. The system is easy to set up and offers real-time analytics to help developers troubleshoot their applications. The platform is flexible and can even provide teams with guidelines offering response suggestions.

The company offers a variety of pricing plans, including a free program and several plans that require contact with the company for pricing details. The system’s application monitoring and infrastructure monitoring stand out for their effectiveness. Still, the system is less effective as a proactive monitoring system and tends to send false alarms.

Observability Is Essential

Observability tools are critical to monitoring growing and increasingly complex infrastructure. While choosing the right tool can be complicated, finding the one most suitable for your organization’s needs can streamline your monitoring and maintenance processes. Lightrun offers a solution that provides both monitoring and observability services. To see for yourself, request a Lightrun demo today.

Dynamic Observability Tools for API Live Debugging
https://lightrun.com/dynamic-observability-tools-for-api-live-debugging/ — Wed, 14 Jun 2023

Intro

Application Programming Interfaces (APIs) are a crucial building block in modern software development, allowing applications to communicate with each other and share data consistently. APIs are used to exchange data inside and between organizations, and the widespread adoption of microservices and asynchronous patterns boosted API adoption inside the application itself.

The central role of APIs is also evident with the emergence of the API-first approach, where the application’s design and implementation start with the API, thus treating APIs as first-class citizens and developing reusable and consistent APIs.

In the last decade, Representational State Transfer (REST) APIs have come to dominate the scene, becoming the predominant API technology on the web. REST is more of an architectural approach than a strict specification. This flexibility is probably key to REST’s success: it has been essential in making REST popular and is one of the critical enablers of loose coupling between API providers and consumers. However, it sometimes bites back as a lack of consistency in API behavior and interfaces, which is often alleviated using specification frameworks like OpenAPI or JSON Schema.
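To make that concrete, here is a minimal OpenAPI 3 fragment for a hypothetical endpoint (the service, path, and fields are invented for illustration); a spec like this pins down exactly the behavior that free-form REST leaves implicit:

```yaml
openapi: "3.0.3"
info:
  title: Orders API        # hypothetical service
  version: "1.0"
paths:
  /orders/{orderId}:
    get:
      parameters:
        - name: orderId
          in: path
          required: true
          schema: { type: string }
      responses:
        "200":
          description: The requested order
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:    { type: string }
                  total: { type: number }
        "404":
          description: Order not found
```

Both provider and consumer can validate against this document, which removes a whole class of "the payload didn't look like I expected" disputes.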

It’s also worth pointing out the role of developers in designing and consuming APIs: developing an API frequently requires close collaboration between backend, frontend, and mobile developers, since the role of an API is to integrate different applications and systems.

Challenges in API integration

Despite being central to modern application development, API integration remains challenging. Those challenges mainly originate from the fact that the systems connected by APIs form a distributed system, with the usual complexities involved in distributed computing. Also, the connected systems are mostly heterogeneous (different tech stacks, data models, ownership, hosting, etc.), leading to integration challenges. Here are the most common ones:

  • Incorrect data. Improper data formatting or conversion errors (due to inaccurate data types or incompatible data structures) can cause issues with the exchanged data. This often results in malformed JSON, deserialization errors, and type-casting errors.
  • Lack of proper documentation. Poorly documented endpoints may require extensive debugging to infer the data format or API behavior. This is particularly problematic when dealing with third-party services, without access to the source code or the architecture.
  • Incorrect or unexpected logic or behavior. The loosely defined REST model does not allow the provider’s behavior to be specified formally, and such behavior can be undocumented or implemented incorrectly for some edge cases.
  • Poor query parameter handling. Query parameters are the way for the caller to modify the results an endpoint returns. Often, edge cases arise where parameters are not handled correctly, requiring a trial-and-error debugging process.
  • Error handling. Even though HTTP provides the basic mechanism of response codes for error handling, each API implementation tends to customize it, either using custom codes or adding JSON error messages. Error handling is not always coherent, even between different endpoints on the same system, and it may be undocumented.
  • Authentication and authorization errors. The way authorization is handled by the API producer can generate errors and unexpected behavior, sometimes manifesting as inconsistencies between different endpoints on the same system.

Errors can be present on the provider side or the consumer side. On the provider side, we often cannot intervene in the implementation, which necessitates implementing workarounds on the consumer side.

For errors on the consumer (wrong deserialization, incorrect handling of pagination, or states, etc.), troubleshooting usually involves examining logs for request/response patterns and adding logs to examine parameters and payloads.
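That kind of consumer-side troubleshooting can be sketched as a thin client wrapper that logs the request and the raw response before attempting deserialization, so a malformed payload is captured in the logs instead of vanishing into a stack trace (the URL and helper names here are hypothetical):

```python
import json
import logging
import urllib.request

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("api-client")

def parse_response(raw, url="<unknown>"):
    """Log the raw payload, then attempt deserialization; malformed
    payloads stay visible in the logs instead of being lost."""
    log.debug("response from %s (%d bytes): %.200s", url, len(raw), raw)
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        log.error("malformed JSON from %s: %s", url, err)
        return None

def get_json(url):
    """Fetch a URL and parse it as JSON, logging request and response."""
    log.debug("request: GET %s", url)
    with urllib.request.urlopen(url) as resp:
        return parse_response(resp.read().decode("utf-8"), url)
```

The limitation, of course, is that this logging has to be written and deployed in advance, which is exactly the gap the next section addresses.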

Lightrun Dynamic Observability for API debugging

Lightrun‘s Developer Observability Platform implements a new approach to observability that overcomes the difficulties of troubleshooting applications in a live setting. It enables developers to dynamically instrument applications running remotely on a production server by adding logs, metrics, and virtual breakpoints, with no code changes, redeployment, or application restarts.

In the context of API debugging, the ability to debug in the production environment provides significant advantages, as developers do not need to reproduce the entire API ecosystem surrounding the application locally, which can prove difficult: think, for example, of the need to authenticate to third-party APIs, or to provide a realistic database to run the application locally. It is also not always possible to reproduce realistic API calls locally, as the local development environment tends to be simplified compared to production.

Lightrun allows debugging API-providing and consuming applications directly on the live environment, in real-time and on-demand, regardless of the application execution environment. In particular, Lightrun makes it possible to:

  • Add dynamic logs. Adding new logs without stopping the application makes it possible to obtain the relevant information about an API exchange (request/response/state) without leaving the IDE and without losing state (for example, authentication tokens, complex API interactions, pagination, and real query parameters). It’s also possible to log conditionally, only when a specific code-level condition is true, for example, to isolate a particular API edge case out of a high volume of API requests.
  • Take snapshots. Adding virtual breakpoints that trigger on a specific code condition makes it possible to capture how request parameters and response payloads change over time.
  • Add Lightrun metrics for method duration and other insights. This makes it possible to measure API execution times and count how many times a specific endpoint is called.

Lightrun integrates with developer IDEs, making it ideal for developers, as it allows them to stay focused on their local environment. In this way, Lightrun acts as a debugger that works anywhere the application is deployed, allowing for a faster feedback loop during the API development and debugging phases.

Bottom Line

Troubleshooting APIs that return incorrect data or behave erratically is essential to ensure reliable communication between systems and applications. By understanding the common causes of these issues and using the right tools and techniques, developers can quickly identify and fix API problems, delivering a better user experience and ensuring smooth software operations. Lightrun is a developer observability platform that gives backend and frontend developers the ability to add telemetry to live API applications, making it an excellent answer to API integration challenges. Try it now on the playground, or book a demo!

 

Expert Guide to IntelliJ License Server
https://lightrun.com/expert-guide-to-intellij-license-server/ — Wed, 16 Dec 2020

JetBrains is a world-class vendor of developer tools that are loved by millions of geeks. IntelliJ IDEA, ReSharper, PhpStorm, PyCharm, and WebStorm are all JetBrains products that have become household names in their respective developer communities.

As development teams grow and get more diverse, companies start to purchase more subscriptions to JetBrains tools. However, buying subscriptions is just the first step. Engineering teams need to distribute licenses among existing developers, provide licenses to new developers as they come on board, and revoke licenses from developers as they leave or switch to a different technology stack.

License distribution takes time and effort, and neglecting license management leads to confusion, downtime, and overspending.

Fortunately, JetBrains provides a set of tools that take the pain away from license management. One of these tools is JetBrains Floating License Server.

If you are an engineering manager or CTO in an organization using JetBrains tools, this guide is for you. Here’s what we’ll cover in this comprehensive guide to JetBrains License Server:

  • A quick summary of what JetBrains License Server is
  • Reasons to consider using License Server
  • Common development team scenarios that License Server addresses
  • Your next steps, if you decide that your company needs to use License Server

What Is JetBrains License Server?

JetBrains Floating License Server (sometimes called “IntelliJ License Server” or just “License Server”) helps dynamically distribute licenses between JetBrains products used in your company. This means you do not need to issue and revoke licenses manually.

License Server gives you better control over product usage with features such as whitelists, blacklists, and priority lists. Additionally, it monitors the adoption of JetBrains tools in your company, letting you know how many licenses are currently in use, and how many are available.


License Server is a free on-premises application that you can install in your company’s internal network. It’s available to companies that have 50+ commercial subscriptions to any JetBrains products that are part of All Products Pack, namely:

  • Integrated development environments: IntelliJ IDEA Ultimate, WebStorm, PhpStorm, PyCharm, Rider, CLion, GoLand, DataGrip, RubyMine, and AppCode.
  • Visual Studio extensions: ReSharper, ReSharper C++, and dotCover.
  • Profilers: dotTrace and dotMemory.

Finally, License Server has recently started to support licenses for third-party plugins for JetBrains tools distributed via the JetBrains Marketplace. However, it does not support JetBrains team tools such as YouTrack, TeamCity, or Space.

If you work in a company with fewer than 50 commercial subscriptions, you’ll need to assign and revoke licenses manually. In that case, it’s worth looking at JetBrains Account and what it offers in terms of license management.

Why Would Your Company Need License Server?

If your company has a stable in-house developer workforce where each developer uses only one JetBrains product, you may not need License Server at all. However, License Server gets increasingly useful when:

  • Demand for licensed JetBrains tools changes over time
  • Many users only need part-time access to JetBrains tools
  • Developers often switch between different code bases
  • Your company uses contractors, consultants, or temporary employees

Let’s consider a few typical scenarios for License Server usage.

Use Case 1: One JetBrains Product Is Used Every Day, Other Products Are Used as Needed

Let’s say that each developer on your team uses one JetBrains tool as their primary development environment but occasionally uses different JetBrains tools. For example, on .NET development teams, it’s common for developers to use the following setup that includes multiple JetBrains tools:

  • JetBrains ReSharper is used with Microsoft Visual Studio for regular development activities like reading, writing, debugging, and refactoring code.
  • JetBrains dotTrace (performance profiler) and dotMemory (memory profiler) are used sparingly to investigate occasional performance problems or memory leaks.

If all developers on a team use All Products Pack or dotUltimate subscriptions, they’re covered for both ReSharper and profilers. But if the company’s procurement practices are focused on cost savings, this may not be the case. For example, if a team has 70 developers, it may have:

  • 65 ReSharper subscriptions (ReSharper is included, profilers are not).
  • 5 dotUltimate subscriptions (both ReSharper and profilers are included).

In a setup like this, how can you make sure that when a performance problem or memory leak occurs, any developer can use dotTrace or dotMemory?

  • Without License Server: licenses that include the profilers would be provided to a fixed group of 5 developers, and you’d have to manually reassign licenses using JetBrains Account.
  • With License Server: any developer who wants to use a profiler to investigate a performance problem can do it without any administrative overhead. They simply launch dotTrace or dotMemory, and License Server takes care of selecting the license that includes the profiler.

This setup only works as long as profilers are used by no more than five developers at the same time. But if concurrent profiler usage grows, you can buy additional dotUltimate subscriptions.

Use Case 2: Splitting Time Between New and Old Codebases

A related scenario is when a team writes a new application while maintaining a legacy application, and the two applications use different tech stacks. For example, let’s say the new application is developed in Node.js and React, and the legacy application uses Spring Boot.

Developers spend most of their time developing the new application in JetBrains WebStorm. From time to time, they need to update the legacy application’s codebase for maintenance purposes, and they use IntelliJ IDEA for this. Do all developers on the team need both WebStorm and IntelliJ licenses? Probably not. Instead, you can:

  • Provide all developers with WebStorm licenses for their main line of work.
  • Add a few IntelliJ licenses to maintain the legacy application.
  • Make the licenses available via IntelliJ License Server so that whoever occasionally needs IntelliJ IDEA can automagically obtain a license.

As soon as a maintenance task on the legacy codebase is completed and the developer closes IntelliJ IDEA, the license returns to the license pool and becomes available for use by any other developer.

Use Case 3: Providing Licenses to Part-Time Users

Another related scenario is provisioning licenses to employees who aren’t exactly full-time developers. For example, your team might have:

  • DevOps engineers who use a repository of automation scripts that are stable and don’t need regular maintenance.
  • Product Marketing Managers who change the copy on your website once in a while.
  • Security experts who are mostly focused on code review but sometimes need to refactor pieces of vulnerable code.

Having a License Server with a few JetBrains All Products Pack licenses available to these employees will help you reduce costs. Rather than purchasing licenses for each of them, you reassign the existing licenses to team members as they need them.

Use Case 4: Providing Licenses to Contractors, Consultants, or Interns

Even engineering teams with in-house developers might send some development tasks to freelance contractors. This is quite common when you need temporary help on a project or extra hands to help get a product to launch faster.

When you find a contractor, you may want to give them access to your VCS repository and provide them with JetBrains tools so they can effectively work with the rest of your team. If you take this approach, give them a link to your IntelliJ License Server so that they can use your company’s licenses.

When their contract expires, you simply revoke access to your License Server (along with any other internal resources that you make available to contractors). The license then goes back to the pool, ready to be used by the next freelancer you bring on board.

Similarly, you can use License Server to provide licenses to students who join your company for summer internships. In this scenario, the students come in batches and spend a few months on their projects. Using IntelliJ License Server takes a bit of the administrative burden off you.

Common Questions About JetBrains Floating License Server

Q. What impact does remote work have on using License Server? Is it more or less important for distributed teams and why?

A. Remote work is unlikely to be a factor in your decision to use License Server. Whether developers are co-located or distributed, they usually connect to a corporate VPN anyway – which is where License Server is usually set up for them.

Q. Is there a SaaS version of License Server?

A. Not right now, although based on this job opening, JetBrains is considering a SaaS version of License Server in the future. Integrating its functionality into JetBrains Account feels like the most natural path forward.

Q. My company uses IntelliJ License Server to manage pre-subscription JetBrains licenses purchased before 2015. Is this the same thing?

A. No, the current version of License Server is very different. See this deprecation notice for more details.

Make sure to also check out the FAQ section in JetBrains License Server docs.

Setting Up JetBrains Floating License Server

JetBrains does a great job documenting how to install, configure, and maintain License Server. Instead of diving deep into this, I’ll give you a quick checklist of the major steps you need to make IntelliJ License Server available in your company.

  1. Contact the JetBrains Sales team to enable License Server for your company’s JetBrains Account.
  2. If your company’s hardware meets system requirements (make sure you check out both Windows and Linux/macOS requirements), install and start License Server.
  3. Register your License Server with your company’s JetBrains Account.
  4. Add licenses from JetBrains Account to License Server.
  5. Configure usage reporting and notifications.
  6. Let your developers know that they can now use License Server.

Once developers have access to your server, they’ll need to configure each of their JetBrains products with License Server.

License Server Configuration

Anyone on your team can now start a supported JetBrains product and configure it to fetch licenses from your License Server installation.

Let’s see how to do this in PyCharm, IntelliJ IDEA, or any other JetBrains IDE:

  1. On the main menu, select Help | Register.
  2. Set Get license from: to License server.
  3. If you know the URL of your company’s License Server, paste it into the Server address: field. If you don’t, click Discover Server to have PyCharm look up License Server on your local network.
  4. Click Activate.

Configuring License Server access from PyCharm

For a more detailed procedure, see the Register section of your JetBrains IDE documentation. This page also shows you how to configure License Server discovery for silent IDE installations.

The process is slightly different for other JetBrains tools. Here’s how you make ReSharper, dotTrace, or dotCover connect to JetBrains License Server:

  1. On the main menu, select ReSharper | Help | License Information.
  2. In the License Information dialog, select Use license server.
  3. ReSharper will try to auto-detect your License Server, but if it fails, click the + icon to specify your License Server’s URL.

License Server configuration in ReSharper

For more details, see License Information dialog and Specify License Information in ReSharper docs. Among other things, the latter article describes differences between floating and permanent license tickets. Permanent tickets are useful if you need to use JetBrains products without a steady connection to your License Server (for example, if you’re about to catch a flight).

Wrapping Up

JetBrains Floating License Server is a great free tool for streamlining license distribution in large development teams.

License Server is worth exploring if:

  • Your company uses 50 or more subscriptions to JetBrains IDEs, extensions, profilers, or third-party plugins, and
  • You want to stop thinking about provisioning licenses every time someone joins or leaves a team.

Even if your company doesn’t qualify for License Server yet, don’t forget to check out what JetBrains Account can do for you. It can help you with other license-management tasks like bulk-assigning licenses, revoking and reassigning licenses, and managing auto renewals.

The post Expert Guide to IntelliJ License Server appeared first on Lightrun.

]]>
IllegalArgumentException in Java https://lightrun.com/illegalargumentexception/ Wed, 11 May 2022 12:21:15 +0000 https://lightrun.com/?p=7247 Let’s look at IllegalArgumentException, which is one of the most common types of exceptions that Java developers deal with. We’ll see when and why IllegalArgumentException usually occurs, whether it’s a checked or unchecked exception, as well as how to catch and when to throw it. We’ll use a few examples based on common Java library […]

The post IllegalArgumentException in Java appeared first on Lightrun.

]]>
Let’s look at IllegalArgumentException, which is one of the most common types of exceptions that Java developers deal with.

We’ll see when and why IllegalArgumentException usually occurs, whether it’s a checked or unchecked exception, as well as how to catch and when to throw it. We’ll use a few examples based on common Java library methods to describe some of the ways to handle IllegalArgumentException.

When and why does IllegalArgumentException usually occur in Java?

IllegalArgumentException is a Java exception indicating that a method has received an argument that is invalid or inappropriate for this method’s purposes.

This exception is normally used when further processing in the method depends on the invalid argument and cannot continue unless a proper argument is provided instead.

IllegalArgumentException is commonly used in scenarios where the type of a method’s parameter is not sufficient to properly constrain its possible values. For example, expect an IllegalArgumentException whenever a method takes a string argument that it then internally parses to match a specific pattern.

Is IllegalArgumentException checked or unchecked?

IllegalArgumentException is an unchecked Java exception (a.k.a. runtime exception). It derives from RuntimeException, which is the base class for all unchecked exceptions in Java.

Here’s the inheritance hierarchy of IllegalArgumentException:

Throwable (java.lang)
    Exception (java.lang)
        RuntimeException (java.lang)
            IllegalArgumentException (java.lang)

Because IllegalArgumentException is an unchecked exception, the Java compiler doesn’t force you to catch it. Neither do you need to declare this exception in your method declaration’s throws clause. It’s perfectly fine to catch IllegalArgumentException, but if you don’t, the compiler will not generate any errors.

What Java exceptions are related to IllegalArgumentException?

IllegalArgumentException is the most generic in a group of exceptions that indicate incorrect input data. It has a lot of inheritors in the JDK that represent more specific input errors. These include:

  • IllegalFormatException and its inheritors that are thrown when illegal syntax or format specifiers are detected in a format string.
  • InvalidPathException that is thrown when a string that is expected to represent a file system path can’t be converted into an object of type Path because it contains invalid characters.
  • NumberFormatException that is thrown when an attempt to convert a string to a numeric type fails because the string has an incompatible format.
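
Because all of these derive from IllegalArgumentException, a catch block for the parent type will also catch any of its subtypes. Here is a minimal, self-contained sketch of that behavior:

```java
public class SubtypeCatchDemo {
    public static void main(String[] args) {
        try {
            Integer.parseInt("not a number"); // throws NumberFormatException at runtime
        } catch (IllegalArgumentException e) {
            // NumberFormatException extends IllegalArgumentException, so the
            // more general handler catches it as well
            System.out.println(e.getClass().getSimpleName()); // NumberFormatException
        }
    }
}
```

Catching the parent type is convenient when any invalid argument should be handled the same way; catch the specific subtype instead when you need different recovery logic per error kind.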

In OpenJDK 17, the full list of exceptions derived from IllegalArgumentException is as follows:

IllegalArgumentException (java.lang)
    CSVParseException (com.sun.tools.jdeprscan)
    IllegalChannelGroupException (java.nio.channels)
    IllegalCharsetNameException (java.nio.charset)
    IllegalFormatException (java.util)
        DuplicateFormatFlagsException (java.util)
        FormatFlagsConversionMismatchException (java.util)
        IllegalFormatArgumentIndexException (java.util)
        IllegalFormatCodePointException (java.util)
        IllegalFormatConversionException (java.util)
        IllegalFormatFlagsException (java.util)
        IllegalFormatPrecisionException (java.util)
        IllegalFormatWidthException (java.util)
        MissingFormatArgumentException (java.util)
        MissingFormatWidthException (java.util)
        UnknownFormatConversionException (java.util)
        UnknownFormatFlagsException (java.util)
    IllegalSelectorException (java.nio.channels)
    IllegalThreadStateException (java.lang)
    InvalidKeyException (javax.management.openmbean)
    InvalidOpenTypeException (javax.management.openmbean)
    InvalidParameterException (java.security)
    InvalidPathException (java.nio.file)
    InvalidStreamException (com.sun.nio.sctp)
    KeyAlreadyExistsException (javax.management.openmbean)
    NumberFormatException (java.lang)
    PatternSyntaxException (java.util.regex)
    ProviderMismatchException (java.nio.file)
    SAGetoptException (sun.jvm.hotspot)
    UnresolvedAddressException (java.nio.channels)
    UnsupportedAddressTypeException (java.nio.channels)
    UnsupportedCharsetException (java.nio.charset)

How to catch IllegalArgumentException in Java

Since IllegalArgumentException is an unchecked exception, you don’t have to handle it in your code: it will compile just fine without a catch block.

In many cases, instead of trying to catch IllegalArgumentException, you can simply check that a value falls in the expected range before passing it to a method.

If you do choose to handle IllegalArgumentException with a try/catch block, depending on your business logic, you may want to substitute the offending argument with a default value, or modify the offending argument to make it fall inside the expected range.

When you handle IllegalArgumentException, note that it doesn’t provide any specialized methods other than those inherited from RuntimeException and ultimately from Throwable. When catching and handling an IllegalArgumentException, as with any other Java exception, you most commonly use standard Throwable methods like getMessage(), getLocalizedMessage(), getCause(), and printStackTrace().
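
As a minimal, self-contained illustration (not tied to any particular library), here is how these inherited Throwable methods look in a catch block:

```java
public class ThrowableAccessorsDemo {
    public static void main(String[] args) {
        try {
            // Wrap a lower-level failure as the cause of the IllegalArgumentException
            Throwable cause = new NumberFormatException("For input string: \"abc\"");
            throw new IllegalArgumentException("Expected a numeric string", cause);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());            // message passed to the constructor
            System.out.println(e.getCause().getMessage()); // message of the wrapped cause
            e.printStackTrace();                           // full stack trace, printed to stderr
        }
    }
}
```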

Let’s look at an example of how you can get an IllegalArgumentException when working with common Java library code. When handling it, we’ll log the exception and retry with a default value instead of an incorrect argument.

IllegalArgumentException example 1: Unrecognized log level in Java logging API

When you work with Java’s core logging API defined in module java.util.logging, you either use predefined log levels (SEVERE, WARNING, FINE, FINER, etc.) or provide a string that is then parsed to match a known log level or an integer:

String logLevel = "SEVERE";
LOGGER.log(Level.parse(logLevel), "Processing {0} entries in a list", list.size());

If you make a typo specifying a log level in a string — for example, pass in the string "SEVER" instead of "SEVERE" — the logger will not be able to parse the level and will throw IllegalArgumentException:

Exception in thread "main" java.lang.IllegalArgumentException: Bad level "SEVER"
    at java.logging/java.util.logging.Level.parse(Level.java:527)
    at com.lightrun.exceptions.Main.main(Main.java:29)

Process finished with exit code 1

You could prevent this exception altogether if you avoid parsing and stick to the predefined log levels. However, if for some reason you need to keep parsing log levels, one option would be to wrap logging in the try/catch block, and when you catch IllegalArgumentException, recover by rolling back to a default logging level:

String logLevel = "SEVER";
try {
    LOGGER.log(Level.parse(logLevel), "Processing {0} entries in a list", list.size());
} catch (IllegalArgumentException e) {
    Level defaultLogLevel = Level.WARNING;
    LOGGER.log(Level.INFO, "Provided invalid log level {0}, defaulting to WARNING", logLevel);
    LOGGER.log(defaultLogLevel, "Processing {0} entries in a list", list.size());
}

IllegalArgumentException example 2: Randomizer

Java’s standard random number generator, java.util.Random, has a method nextInt(int bound) that generates a random number in a range from 0 (inclusive) to the upper bound expressed by its bound parameter (exclusive). Since the parameter is of type int, you can pass 0 or a negative integer without breaking any type system constraints. However, in the context of this method, passing any number smaller than 1 does not make sense: the generator can’t generate a random number in the range from 0 (inclusive) to 0 (exclusive), nor will it generate a number in a negative range. Consider the following method:

public static int randomize(ArrayList<Integer> list){
    Random randomGenerator = new Random();
    return randomGenerator.nextInt(list.size());
}

This method takes a list and returns a random number from 0 to the length of the list. Now, if the list has any items in it, this code will work fine, but what if the list is empty?

ArrayList<Integer> integerList = getListFromElsewhere();
System.out.printf("The size of this list is %d%n", integerList.size());
int randomInteger = randomize(integerList);

Let’s see what happens when we run this code:

The size of this list is 0
Exception in thread "main" java.lang.IllegalArgumentException: bound must be positive
    at java.base/java.util.Random.nextInt(Random.java:322)
    at com.lightrun.exceptions.RandomSample.randomize(RandomSample.java:35)
    at com.lightrun.exceptions.RandomSample.illegalArgumentExceptionWithRandomizer(RandomSample.java:15)
    at com.lightrun.exceptions.Main.main(Main.java:11)

Process finished with exit code 1

As you can see, nextInt() throws an exception that remains unhandled, and the program exits.

Suppose you’re unable to modify the randomize() method. How can you handle this at the call site?

In principle, you could throw a new runtime exception, specify the IllegalArgumentException coming from nextInt() as its cause, and communicate up the call stack that the provided list should not be empty:

int randomInteger;
try {
    randomInteger = randomize(integerList);
} catch (IllegalArgumentException illegalArgumentException) {
    String description = String.join("\n",
            "The provided list is empty (size = 0).",
            "The randomizer can't generate a random number between 0 and 0.",
            "In order to use the size of a list as the upper bound for generating random numbers,",
            "please provide a longer list."
            );
    throw new RuntimeException(description, illegalArgumentException);
}

Another solution, and probably a more practical one, would be to check if the supplied list is empty, and if so, return a fixed value instead of calling the randomize() method:

ArrayList<Integer> integerList = getListFromElsewhere();
int randomInteger = integerList.size() > 0 ? randomize(integerList) : 1;

When and how to throw IllegalArgumentException in Java

You normally throw an IllegalArgumentException when validating input parameters passed into a Java method and you need to be more strict than the type system allows.

For example, if your method accepts an integer parameter that it uses to express a percentage, then you probably need to make sure that in order to make sense, the value of that parameter is between 0 and 100. If the value falls out of that range, you can throw an IllegalArgumentException:

public static int getAbsoluteEstimateFromPercentage(double percentOfTotal) {

    int totalPopulation = 143_680_117;

    if (percentOfTotal < 0 || percentOfTotal > 100) {
        throw new IllegalArgumentException("Percentage of total should be between 0 and 100, but was %f".formatted(percentOfTotal));
    }

    return (int) Math.round(totalPopulation * (percentOfTotal * 0.01));
}

It’s important to provide meaningful exception messages to make troubleshooting easier. This is why, instead of throwing an IllegalArgumentException with an empty message, we provide a message that:

  • Defines the valid range of parameter values.
  • Includes the exact value that was out of the valid range.

When you throw an IllegalArgumentException in your method, you don’t have to add it to the method’s throws clause because it’s unchecked. However, many developers tend to add selected unchecked exceptions to the throws clause anyway for documentation purposes:

public static int getAbsoluteEstimateFromPercentage(double percentOfTotal) throws IllegalArgumentException {}

Even if you do add IllegalArgumentException to throws, callers of your method will not be obliged to handle it.

An alternative way of documenting important unchecked exceptions is using the @throws Javadoc documentation tag. In fact, the Oracle guidelines on using Javadoc comments state that including unchecked exceptions such as IllegalArgumentException in a method’s throws clause is a bad programming practice and recommend using the @throws documentation tag instead:

/**
 * @param percentOfTotal Percentage of total
 * @return A rough estimate of the absolute number resulting from taking a percentage of total
 * @throws IllegalArgumentException if percentage is outside the range of 0..100
 */
public static int getAbsoluteEstimateFromPercentage(double percentOfTotal) {

How to avoid IllegalArgumentException

Because IllegalArgumentException is an unchecked exception, your IDE or Java code analysis tool will probably not help you see if code that you’re calling will throw this exception before you run your application.

What your IDE can sometimes do for you is detect if the argument that you pass to a library method is out of range for this method. If you’re using IntelliJ IDEA for Java development, it comes with a set of annotations for JDK methods: additional metadata that helps clarify how these methods should be used. Specifically, there’s a @Range annotation that describes the acceptable range of values for a method, and if you’re writing code that violates that range, the IDE will let you know.

For example, the nextInt() method of JDK’s Random type is annotated with @Range, and if you invoke the Quick Documentation popup on that method, you’ll see that it tells you about the acceptable range:

Range of nextInt() documented in IntelliJ IDEA

Even if you go ahead and pass a value to nextInt() that is known to be out of range for this method, IntelliJ IDEA will display a highlight in the code editor to warn you about the issue:

IntelliJ IDEA warns about an invalid value passed to nextInt()

However, if you’re using a method that is not annotated like this, don’t rely on your IDE: you’re on your own.

In general, when calling library methods, it’s a good practice to take note of throws clauses and @throws Javadoc documentation tags. In many cases, library developers use either or both of these tools to document why their methods could throw IllegalArgumentException.

Production debugging isn’t scary with Lightrun

Properly handling exceptions is one thing that you as a developer can do to ensure a smooth ride in production for your Java applications. Still, let’s face it: any non-trivial application will have bugs. You’re lucky if you can reproduce a bug in a local environment, debug and happily push a verified fix.

What if you can’t? Debugging remotely is tricky: you need to rely on existing logging, repeatedly redeploy updates with more logs and attempted fixes, and you’re even unable to set a proper breakpoint because you can’t afford to halt a production environment.

Take a look at Lightrun: our next-gen remote debugger for your production environment. With Lightrun, you can inject logs without changing code or redeploying, and add snapshots: breakpoints that don’t stop your production application. Lightrun supports Java, .NET, Python and Node.js applications, integrates with IntelliJ IDEA and VS Code. Set up a Lightrun account and check for yourself!

The post IllegalArgumentException in Java appeared first on Lightrun.

]]>
Live Debugging for Critical Systems https://lightrun.com/live-debugging-for-critical-systems/ Mon, 23 Oct 2023 08:38:01 +0000 https://lightrun.com/?p=12258 Live debugging refers to debugging software while running in production without causing any downtime. It has gained popularity in modern software development practices, which drives many critical systems across businesses and industries. In the context of always-on, cloud-native applications, unearthing severe bugs and fixing them in real time is only possible through live debugging. Therefore, […]

The post Live Debugging for Critical Systems appeared first on Lightrun.

]]>
Live debugging refers to debugging software while running in production without causing any downtime. It has gained popularity in modern software development practices, which drives many critical systems across businesses and industries. In the context of always-on, cloud-native applications, unearthing severe bugs and fixing them in real time is only possible through live debugging. Therefore, live debugging becomes an integral part of any developer’s skill set.

This post will explore the various types of critical software systems where live debugging becomes imperative. It will also emphasize the broader strategies for live debugging of such applications.

Type of Critical Systems Where Live Debugging is Important

By definition, a critical system must be highly reliable and retain that reliability as it evolves, without incurring performance degradation or prohibitive costs.

Broadly, critical systems can be classified as follows.

Safety Critical Systems

Safety critical systems are systems where failure or malfunction can lead to loss of lives or serious physical injury. In many cases, the malfunctioning also has a second order impact in the form of environmental damage or ecological imbalance.

Software that manages such systems must be designed to control the operational aspects of the systems such that any malfunction has a limited impact on human life, as well as the local flora and fauna of the impacted region. The most obvious example of such a system is the avionics software installed on an aircraft that controls flight surfaces, engine systems, landing gear, and other auxiliary subsystems.

Mission Critical Systems

Mission critical systems are designed around a set of important goals. Therefore, they are intended to facilitate the completion of the goals with clearly stated trade-offs, no matter what hurdles are encountered in the course.

A commonly used mission critical system is map-based navigation software. Most users of Google Maps and other app-based navigation systems know how this software works. It guides drivers to drive to their destination along the road in minimum time. In this case, the mission is to reach the destination, and the trade-off is the time. Therefore, these systems are designed to recommend the best route to the destination in the minimum possible time.

Similar systems are also installed aboard aircraft, ships, and spacecraft with more complex trade-offs around fuel consumption and arrival times.

Business Critical Systems

Business critical systems are systems where failure can prevent an organization from completing important business functions or meeting key objectives. The higher order impact of such failures can result in revenue and reputation loss, eventually leading to degraded performance in the stock market or during subsequent fiscal quarters.

Common examples of software driven business critical systems are payment processing systems or customer support systems. Failure in such a system often disrupts the process workflow. If not addressed in time, such situations can grow out of control, resulting in revenue loss or a decline in the net promoter score for the organization.

Parameters Governing the Health of Critical Systems

Live debugging of critical systems follows a radically different set of rules. Firstly, these systems are built with a fail-operational or fail-safe design methodology. In this way, they can continue functioning, or safely shut down a subsystem, in case of failure.

Live debugging of such systems in a production setup does not require the developer to dig into the innards of the source code to find the root cause. However, it is important to keep tabs on some key metrics that indicate the system’s overall health. Let’s take a look at how these metrics can be calculated at a high level.

Mean Time Between Failures (MTBF)

MTBF is a reliability metric. It is a measure of the average time between failures of a critical system or its subsystem components. A higher value for MTBF corresponds to less frequent failures and is, therefore, considered desirable.

MTBF helps in further statistical analysis across all components of a critical system. Comparing MTBF across components can contribute to system design. For example, a subsystem with high MTBF requires less redundancy for fail-operational working. Similarly, a subsystem with lower MTBF must be improved via redesign or rigorous testing.

Mean Time to Resolve (MTTR)

MTTR stands for Mean Time To Resolve (the R sometimes also stands for Recovery or Repair). It is a maintainability metric that measures the average time required to resolve a show-stopper bug in a failed system or component.

MTTR is important to assess a system’s availability and serviceability from the end user’s perspective. A lower value of MTTR is always desirable. A higher MTTR most likely corresponds to inefficient diagnosis procedures or lack of skilled resources.

Mean Time to Acknowledge (MTTA)

MTTA stands for Mean Time To Acknowledge. It is the average time from when a failure is triggered to when work begins on the issue. It indicates how soon the RCA (Root Cause Analysis) is conducted to arrive at the source of failure. A higher MTTA is undesirable and can be indicative of overly complex system design.

The MTTA metric is always lower than MTTR since it takes less time to acknowledge a failure than to resolve it completely. If this is not the case, the critical system is most likely in an unstable state and requires further analysis in a staged environment.
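
To make the relationship between these metrics concrete, here is a minimal sketch that computes MTBF, MTTA, and MTTR from a list of incident records. The Incident record and the sample numbers are illustrative assumptions, not part of any monitoring API:

```java
import java.util.List;

public class ReliabilityMetrics {
    // Illustrative incident record: all times in minutes since system start
    record Incident(long failedAt, long acknowledgedAt, long resolvedAt) {}

    // MTBF: total operational time divided by the number of failures
    static double mtbf(List<Incident> incidents, long totalUptimeMinutes) {
        return (double) totalUptimeMinutes / incidents.size();
    }

    // MTTA: average time from failure to the start of work on the issue
    static double mtta(List<Incident> incidents) {
        return incidents.stream()
                .mapToLong(i -> i.acknowledgedAt() - i.failedAt())
                .average().orElse(0);
    }

    // MTTR: average time from failure to full resolution
    static double mttr(List<Incident> incidents) {
        return incidents.stream()
                .mapToLong(i -> i.resolvedAt() - i.failedAt())
                .average().orElse(0);
    }

    public static void main(String[] args) {
        List<Incident> incidents = List.of(
                new Incident(1_000, 1_010, 1_060),  // acknowledged in 10 min, resolved in 60
                new Incident(5_000, 5_020, 5_110)); // acknowledged in 20 min, resolved in 110
        System.out.printf("MTBF=%.0f MTTA=%.0f MTTR=%.0f%n",
                mtbf(incidents, 10_000), mtta(incidents), mttr(incidents));
    }
}
```

In the sample data, MTTA (15 minutes) is lower than MTTR (85 minutes), matching the expectation described above for a stable system.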

Lightrun: A Reliable Observability Platform for Live Debugging of Critical Systems

Lightrun is a developer-centric observability platform. It empowers developers to ask intricate questions on production deployment and get answers in the form of logs, snapshots, and metrics. This approach enables live debugging of critical systems without causing downtime or performance degradation.

Lightrun is well suited for tracking MTBF in critical systems by injecting timestamped log messages within the running software. This feature creates a stream of dynamic logs that can capture the health-related metrics of the system for proactive remediation. It is also designed for dynamic instrumentation, allowing developers to investigate the software runtime in real time, resulting in reduced MTTA and MTTR.

Lightrun has been proven to reduce the MTTR by up to 60%, resulting in faster bug resolution. All these achievements have a direct impact on improving customer experience and increasing developer productivity.

To experience what it is like to perform live debugging on running production software, sign up for a free Lightrun trial and get started within minutes with your Java, Python, Node.js, or .NET applications. If you’d rather know more before you start, feel free to request a Lightrun demo.

The post Live Debugging for Critical Systems appeared first on Lightrun.

]]>
Putting Developers First: The Core Pillars of Dynamic Observability https://lightrun.com/putting-developers-first-the-core-pillars-of-dynamic-observability/ Sun, 24 Sep 2023 13:55:04 +0000 https://lightrun.com/?p=12241 Introduction Organizations today must embrace a modern observability approach to develop user-centric and reliable software. This isn’t just about tools; it’s about processes, mentality, and having developers actively involved throughout the software development lifecycle up to production release. In recent years, the concept of observability has gained prominence in the world of software development and […]

The post Putting Developers First: The Core Pillars of Dynamic Observability appeared first on Lightrun.

]]>
Introduction

Organizations today must embrace a modern observability approach to develop user-centric and reliable software. This isn’t just about tools; it’s about processes, mentality, and having developers actively involved throughout the software development lifecycle up to production release.

In recent years, the concept of observability has gained prominence in the world of software development and operations. Rooted in three foundational pillars—logging, metrics, and tracing—observability provides a comprehensive understanding of application behavior. These pillars allow teams to diagnose and address issues with greater precision and efficiency.

However, a notable challenge in observability is that many tools available today are designed by and for operations teams. Their primary focus often lies in monitoring, alerting, and system health from an infrastructural standpoint. This design bias can leave developers, who require a different granularity and data context, somewhat in the lurch. Instead of offering insights into code behavior, performance bottlenecks, or specific code-level issues, traditional observability tools may present data in a way that’s more aligned with operational needs. This mismatch underscores the importance of creating or adopting observability tools that cater explicitly to developers, ensuring that they can gain actionable insights from the system and application data in a manner that resonates with their specific workflow and challenges.

With the surge in adopting a platform engineering approach, there’s a profound shift in how organizations perceive and manage the Software Development Life Cycle. At the heart of this approach is providing developers with a robust platform that abstracts away infrastructural complexities and offers tools and services that accelerate development. As platform engineering becomes a catalyst for advanced SDLC management, there is a pressing need to elevate observability proficiency across organizations. Platform engineering, by design, involves a profound intersection of development and operations, which necessitates that the engineers possess a unique blend of skills. Among the emerging skill sets, debugging and observability stand out as paramount. 

Why Developer Ownership is Non-negotiable

Over recent years, the software engineering industry has recognized the importance of granting developers ownership of their products to ensure software reliability, agility, and ease of maintenance. Developers should have control over their code, from creation to deployment. They must be able to deploy, rollback, observe, and debug code in production in order to speed up the feedback loop at the core, enabling faster improvements.

The software and overall user experience could improve with the right tools and responsibilities. Real-time debugging in a production environment is invaluable as developers have more context and knowledge to quickly fix the issue as they understand the recent changes best.

The Lightrun Three Pillars of Dynamic Observability

Lightrun offers a suite of features designed to enhance developers’ capabilities. One standout aspect is Lightrun’s ability to debug applications right in the live environment, providing real-time, on-demand insights irrespective of where the application is running.

Pillar 1. Dynamic Logging

Text logging remains a fundamental debugging tool. However, using it in remote environments presents challenges. Centralized logging platforms have grown, offering centralized log ingestion with efficient search capabilities. Yet, they often fall short for real-time remote debugging, mainly because of inherent delays, focus on post-event analysis, and disconnection from the local development environment.

When debugging remote environments, traditional logging slows down the developer's feedback loop: adding a single log line usually requires at least a full CI/CD pipeline run, and deploying a new version to production frequently is often hard or impossible.

Many developers opt for overlogging to compensate, leading to increased storage, computation, and possible licensing costs, not counting the difficulty of navigating a massive amount of logs to find the required piece of information.

Finally, log tools are often poorly integrated into developers’ IDEs, resulting in an unnecessary learning curve and shifting developers’ attention away from their primary environment. In some extreme cases, developers lack direct access to production logs because the organization cannot offer a method for secure access.

On the other hand, Lightrun Dynamic Logging enables developers to add new logs without halting the application. This ensures uninterrupted access to crucial data directly from the developer IDE. There’s also the possibility to log only when a specific code-level condition is true, significantly reducing the amount of information that needs to be evaluated to pinpoint an issue.
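Conceptually, a conditional dynamic log evaluates a code-level expression against live state and emits output only when that expression is true. The sketch below illustrates the idea in pure Python; the `DynamicLog` class and condition syntax are hypothetical, not Lightrun's actual API:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")

class DynamicLog:
    """Illustrative sketch: emit a log line only when a condition over
    captured variables evaluates to True, mimicking a conditional dynamic log."""

    def __init__(self, message: str, condition: str):
        self.message = message      # e.g. "large order {order_id}"
        self.condition = condition  # e.g. "total > 1000"

    def fire(self, **context) -> bool:
        # Evaluate the condition against the captured variables only;
        # a real agent would sandbox this evaluation far more strictly.
        if eval(self.condition, {"__builtins__": {}}, context):
            logging.info(self.message.format(**context))
            return True
        return False

log_point = DynamicLog("large order {order_id}: total={total}", "total > 1000")
fired = [log_point.fire(order_id=i, total=t)
         for i, t in [(1, 250), (2, 4200), (3, 990)]]
# Only the order with total 4200 produces a log line.
```

Because the condition gates the emission, the log stream contains only the interesting cases instead of every request.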

Pillar 2. Snapshots

Traditional debugging methods often involve a fragmented approach: logs for raw data, metrics for system health overviews, traces for request flows across services, and the occasional breakpoint to dive deep into a specific problem. While each tool offers its distinct advantage, developers often find themselves bouncing between them, trying to piece together a comprehensive understanding of what’s happening within their code. This approach can slow debugging and leave significant gaps in understanding, especially when attempting to correlate high-level data with specific code behaviors. Moreover, the powerful debugging model in which a developer places breakpoints in an application cannot be translated directly to running live applications, because they cannot easily be blocked.

On the other hand, Lightrun Snapshots introduce a paradigm shift in the debugging process by acting as virtual breakpoints that don’t disrupt the flow of application execution. Unlike traditional breakpoints, which halt execution for inspection, Lightrun Snapshots seamlessly blend into the running application, allowing developers to add conditions, evaluate expressions, and delve deep into any code-level object without ever having to stop, restart, or redeploy the application. Integrated completely within the developer’s IDE, these snapshots not only offer a debugger-like experience but also enable a deeper connection to live applications by alerting developers when specific code segments are executed. This dynamic and continuous approach to debugging, compatible with a range of platforms like AWS, Azure, and Kubernetes, ensures that developers can gain deep insights into their applications right beside the source code, making debugging more intuitive and efficient.
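The essence of a non-breaking snapshot is capturing the stack and local variables at a point of interest without ever pausing execution. A minimal stdlib-only sketch of that idea (not Lightrun's implementation) might look like this:

```python
import inspect
import time
import traceback

def take_snapshot(condition=True):
    """Illustrative sketch of a 'virtual breakpoint': capture the call
    stack and the caller's local variables without pausing execution."""
    if not condition:
        return None
    caller = inspect.currentframe().f_back
    return {
        "timestamp": time.time(),
        # Frozen copy of the caller's locals at this moment.
        "locals": dict(caller.f_locals),
        # Human-readable stack trace, innermost frame last.
        "stack": traceback.format_stack(caller),
    }

def apply_discount(price, rate):
    discounted = price * (1 - rate)
    # Fires only when the suspicious case occurs; execution never stops.
    snap = take_snapshot(condition=discounted < 0)
    return discounted, snap

result, snapshot = apply_discount(100, 1.5)  # rate > 1 drives the price negative
```

Here the condition plays the role of a conditional breakpoint: the healthy path pays almost nothing, while the buggy path leaves behind a full record of its state for later inspection.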

Pillar 3. Metrics

Traditionally, just like with logs, developers have often felt the need to preemptively add many metrics, trying to cover all bases. This scattershot approach not only clutters the telemetry data but also risks overlooking that one critical metric needed during a production issue. Lightrun, however, challenges this paradigm by offering dynamic, code-level metrics. Instead of instrumenting the application with metrics upfront (or in addition to doing so), Lightrun allows for the real-time insertion of precise metrics directly into live applications, ensuring relevance and accuracy without compromising the execution or state of the application.

With its comprehensive suite of tools, developers can gain insights ranging from the frequency of a specific line being executed with the Counter, to the time efficiency of methods with Method Duration and even block-wise timing with TicToc. Custom Metrics further broaden the scope, granting the freedom to export any numeric expression into a trackable metric. 
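To make these three metric types concrete, here is a rough pure-Python sketch of a hit counter, a method-duration decorator, and a TicToc-style block timer. The names and mechanics are illustrative, not Lightrun's API, which inserts these metrics dynamically without code changes:

```python
import time
from collections import Counter
from contextlib import contextmanager
from functools import wraps

hits = Counter()   # label -> hit count
durations = {}     # label -> list of elapsed seconds

def counted(label):
    """Count how often a code path executes (a Counter metric, sketched)."""
    hits[label] += 1

def timed(func):
    """Record a method's wall-clock duration (Method Duration, sketched)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            durations.setdefault(func.__name__, []).append(
                time.perf_counter() - start)
    return wrapper

@contextmanager
def tictoc(label):
    """Time an arbitrary block of code (TicToc, sketched)."""
    start = time.perf_counter()
    yield
    durations.setdefault(label, []).append(time.perf_counter() - start)

@timed
def handle_request(n):
    counted("handle_request:entry")
    with tictoc("inner-loop"):
        return sum(range(n))

for _ in range(3):
    handle_request(10_000)
```

The key difference is that Lightrun attaches such probes at runtime, to a chosen line of a live process, rather than compiling them in ahead of time as this sketch does.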

In Summary

With its suite of features, including dynamic logging, snapshots, and real-time metrics, Lightrun integrates seamlessly with developers’ existing IDEs, positioning itself as an essential ally in the modern development toolkit. If you’re looking to stay ahead in the competitive development space, Lightrun might just be your answer. Dive into its functionalities on the playground, or schedule a demo to experience its capabilities firsthand!


The post Putting Developers First: The Core Pillars of Dynamic Observability appeared first on Lightrun.

]]>
Debugging Modern Applications: Advanced Techniques https://lightrun.com/debugging-modern-applications-advanced-techniques/ Tue, 10 Oct 2023 15:25:22 +0000 https://lightrun.com/?p=12256 Today’s applications are designed to be always available and serve users 24/7. Performing live debugging on such applications is akin to doctors operating on a patient. Since the advent of the “as a service” model, software is like a living, breathing entity, akin to an anatomical system. Operating on such entities requires more dexterity on […]

The post Debugging Modern Applications: Advanced Techniques appeared first on Lightrun.

]]>
Today’s applications are designed to be always available and serve users 24/7. Performing live debugging on such applications is akin to doctors operating on a patient.

Since the advent of the “as a service” model, software is like a living, breathing entity, akin to an anatomical system. Operating on such entities requires more dexterity on the developer’s part, to ensure that the software application lives on while being debugged and improved continuously.

Let’s look at time travel debugging, continuous observability, and more advanced debugging and live debugging techniques that are available to developers working on modern applications.

1: Time Travel Debugging for Live Issue Analysis

Time travel debugging allows developers to reconstruct and replay the historical runtime state of a running application. The runtime state consists of logs, snapshots, and other metrics, and data is captured with timestamps. Therefore, it can be time-traversed by going back and forth in time to understand the series of events that led to a bug.

Replaying the runtime execution sequence makes it possible to understand the system behavior better. Visualization also plays an important role in this process. There are several ways of visualizing the runtime state for assisting in time travel debugging, such as:

  • Timeline view: a chart of logged events plotted along a timeline.
  • Object graphs: a graph that depicts objects, their properties, and references between objects as nodes.
  • Memory heat maps to illustrate memory allocation and access patterns.

Apart from these visualization approaches, it is also possible to reconstruct a visual illustration of the runtime behavior based on standard UML (Unified Modelling Language) diagrams, such as state diagrams and sequence charts.
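The mechanics described above can be reduced to a small sketch: record timestamped state captures, then seek to any instant or replay a window of them in order. This is an illustrative data structure, not any particular tool's implementation:

```python
import bisect

class TimeTravelLog:
    """Illustrative sketch: timestamped runtime events that a debugging
    session can traverse backward and forward in time."""

    def __init__(self):
        self.timestamps = []  # kept sorted
        self.states = []      # one state snapshot per timestamp

    def record(self, timestamp, state):
        i = bisect.bisect_right(self.timestamps, timestamp)
        self.timestamps.insert(i, timestamp)
        self.states.insert(i, state)

    def state_at(self, timestamp):
        """Latest recorded state at or before the given instant."""
        i = bisect.bisect_right(self.timestamps, timestamp)
        return self.states[i - 1] if i else None

    def replay(self, start, end):
        """Events in order within [start, end] -- the 'timeline view'."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.states[lo:hi]))

log = TimeTravelLog()
log.record(1.0, {"orders": 1})
log.record(3.0, {"orders": 2, "error": "payment timeout"})
log.record(2.0, {"orders": 2})  # out-of-order arrival is fine

# Seek to just before the failure to see what the system looked like.
before_failure = log.state_at(2.5)
```

Seeking to `2.5` returns the state recorded at `2.0`, letting the developer inspect the moment immediately preceding the failure at `3.0`.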

2. Chaos Testing for Live Simulation of Disasters

Chaos testing is a technique that intentionally introduces various failures into a software system. The main goal of this test is to measure the resiliency of the software and its ability to recover from unpredictable conditions.

This is not a debugging technique to fix a specific problem. Instead, it is a strategic debugging approach for assessing software reliability in the face of extreme disasters.

Some of the primary approaches to performing chaos testing include:

  • Injecting failures. Failures like network delays, server crashes, and expired certificates are randomly simulated to trigger anomalous behaviors.
  • Exceeding thresholds. Deliberately increasing the load on the system to breach certain technical thresholds, such as network bandwidth, data storage, or computing power, causing resource exhaustion.
  • Global disruption. Disrupting essential services the system depends on, like databases, message queues, caches, and APIs, by stopping or killing processes or shutting down critical infrastructure such as servers or availability zones/regions.
  • Forced security intrusion. Forced security breaches in the way of simulated attacks, access loopholes, and failed authentication procedures to validate system sanity and understand attack vectors.
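The failure-injection approach in the list above can be sketched as a thin wrapper around a dependency call that fails with a configured probability. This is a toy illustration of the principle, not a production chaos tool:

```python
import random

class ChaosInjector:
    """Illustrative sketch of failure injection: wrap a dependency call and
    make it fail with a configured probability."""

    def __init__(self, failure_rate, seed=None):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded for reproducible experiments

    def call(self, func, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            # A real harness might instead inject latency, kill a process,
            # or sever a network link.
            raise ConnectionError("chaos: simulated dependency failure")
        return func(*args, **kwargs)

def fetch_inventory(item):
    """Stand-in for a call to a real downstream service."""
    return {"item": item, "stock": 7}

chaos = ChaosInjector(failure_rate=0.3, seed=42)
results = []
for _ in range(10):
    try:
        results.append(chaos.call(fetch_inventory, "sku-1")["stock"])
    except ConnectionError:
        results.append(None)  # resilient code would retry or fall back
failures = results.count(None)
```

Mature tools operate at the infrastructure level (killing containers, partitioning networks), but the measurement goal is the same: how gracefully does the system behave when `failures` is greater than zero?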

3. Shift Right Testing for Live Performance Predictions

Shift right testing is a DevOps practice. It mandates testing the software under real-world conditions, in or close to production. This approach is the opposite of the shift left methodology, which requires developers to perform quality and security checks during development, before the code reaches production.

Both approaches complement each other. However, achieving shift right testing is operationally intensive. That is because it involves reproducing the production environment and simulating heavy user traffic, which should be of the same order of magnitude as production traffic.

Like chaos testing, shift right testing is a broader debugging strategy. This approach de-risks the production deployment from unforeseen issues that may cause disruptions later due to undiscovered severe bugs.

4. Continuous Observability for Live Debugging

Continuous observability allows developers to observe and record the internal state of software during the entire DevOps cycle. More importantly, this is performed without any alteration at the source code level. This approach is best suited for live debugging of specific issues without halting runtime execution or forcing changes to the source code to capture telemetry data.

Continuous observability is best achieved by injecting an agent within the running software. The agent has a minimal footprint and captures the logs, snapshots, and other metrics required for analysis during live debugging. This technique also complements time travel debugging, since the data captured during live debugging can be sorted in time order to analyze the bug.
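As a rough mental model of such an agent, consider a small background thread that periodically samples application state while the application keeps running, untouched. This toy sketch only hints at the idea; a real agent instruments the runtime far more deeply and efficiently:

```python
import threading
import time

class MiniAgent:
    """Illustrative sketch of an in-process observability agent: a background
    thread samples application state with no change to the app's own logic."""

    def __init__(self, state_fn, interval=0.05):
        self.state_fn = state_fn   # callable returning the current state
        self.interval = interval
        self.samples = []          # timestamped state captures
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.samples.append((time.time(), self.state_fn()))
            self._stop.wait(self.interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

# The "application": a counter it updates as it works.
progress = {"processed": 0}
agent = MiniAgent(lambda: dict(progress))
agent.start()
for _ in range(100):
    progress["processed"] += 1
    time.sleep(0.002)
agent.stop()
# The agent observed the state evolving while the app never paused.
```

The timestamped `samples` list is exactly the kind of time-ordered record that feeds the time travel analysis described earlier.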

Supercharge Live Debugging with Lightrun

At Lightrun, we are passionate about helping developers improve their debugging productivity. Lightrun is designed to integrate with IDEs just like their native debuggers but with advanced live debugging support.

Unlike traditional debuggers, which halt the runtime execution of software during debugging, Lightrun allows developers to perform these steps dynamically while the runtime execution carries on. Behind the scenes, this capability is backed by dynamic logs, dynamic telemetry, and dynamic instrumentation.

Dynamic logs from Lightrun can be exported to a visualization platform for time travel debugging. Dynamic telemetry allows chaos and shift right tests to capture valuable data about system performance under various simulated load conditions. Above all, dynamic instrumentation allows developers to set virtual breakpoints anywhere in the source code for continuous observability of the software under production.

If you want to experience what it is like to perform live debugging on running production software, sign up for a free Lightrun trial and get started within minutes with your Java, Python, Node.js, or .NET applications. If you’d rather know more before you start, feel free to request a Lightrun demo.

The post Debugging Modern Applications: Advanced Techniques appeared first on Lightrun.

]]>
Effective Remote Debugging in PyCharm https://lightrun.com/effective-remote-debugging-in-pycharm/ Tue, 03 Oct 2023 08:25:41 +0000 https://lightrun.com/?p=12253 In a previous post, we looked at the remote debugging features of Visual Studio Code and how Lightrun takes the remote debugging experience to the next level. This post will examine how Lightrun enables Python remote debugging in PyCharm, the Python IDE from JetBrains. Remote Debugging in PyCharm PyCharm has many developer-friendly features, including an […]

The post Effective Remote Debugging in PyCharm appeared first on Lightrun.

]]>
In a previous post, we looked at the remote debugging features of Visual Studio Code and how Lightrun takes the remote debugging experience to the next level. This post will examine how Lightrun enables Python remote debugging in PyCharm, the Python IDE from JetBrains.

Remote Debugging in PyCharm

PyCharm has many developer-friendly features, including an integrated debugger. It also boasts several advanced debugging features not found in other IDEs.

Some of PyCharm’s key debugging features are:

  • Better support for Python. Being a Python IDE, PyCharm is well-suited for Python applications, including multithreaded processes. Other IDEs require additional code or configuration to enable smooth debugging in advanced scenarios.
  • Built-in profiler. PyCharm boasts a built-in profiler to help remove performance bottlenecks from your Python code.
  • Remote debugging: PyCharm also supports remote debugging where you can attach to one process or several processes running in parallel.

Drawbacks of Remote Debugging in PyCharm

Although PyCharm provides rich features for Python development, its native support for remote debugging is fairly limited.

  • Manual configuration. PyCharm’s remote debugging workflow requires setting up an SSH connection to a remote host and an additional run configuration to deploy the Python interpreter remotely.
  • Source code pollution. If you want to use PyCharm for remote debugging, you need to install and import a separate Python package, pydevd_pycharm. This means you introduce untracked temporary changes to your code, which isn’t a good practice.
  • Not suitable for production debugging. PyCharm’s debugger, whether local or remote, is suited for debugging applications in development and pre-production environments. Debugging a production application is not possible without making code and configuration alterations.
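To illustrate the "source code pollution" point, this is roughly what PyCharm's remote debugging asks you to add to your application (the package `pydevd-pycharm` is real; the host, port, and environment-variable gate shown here are placeholders for illustration):

```python
import os

def attach_pycharm_debugger(host: str, port: int) -> bool:
    """The code change PyCharm's remote debugging requires: import the
    helper package and call settrace() toward the IDE's debug server."""
    try:
        import pydevd_pycharm  # separate package: pip install pydevd-pycharm
    except ImportError:
        return False  # package absent (e.g. in CI) -- run undisturbed
    # Blocks until the debug server listening inside PyCharm accepts us.
    pydevd_pycharm.settrace(host, port=port,
                            stdoutToServer=True, stderrToServer=True)
    return True

# Gated behind an env var so the scaffolding doesn't fire in production --
# exactly the kind of untracked, temporary change the drawback describes.
if os.environ.get("REMOTE_DEBUG") == "1":
    attach_pycharm_debugger("203.0.113.5", port=12345)
```

Even with the guard, the import, the call, and the gate are all debugging scaffolding living inside application code, which is what Lightrun's agent-based approach avoids.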

The native support for remote debugging in PyCharm is comparable to other debuggers such as that of VS Code. All these debuggers still rely on the traditional “Halt, Inspect, and Resume” approach.

There are two primary shortcomings of these debuggers:

  1. Debugging by controlling the runtime execution. These debuggers are designed to halt and resume the runtime execution. This technique cannot be utilized for production applications, which must always be running to serve the users.
  2. Attachment to long-running processes. Traditional debuggers were designed for monolithic applications, which execute as a long-running process. It does not work in a cloud-native environment with hundreds of ephemeral processes.

Therefore, the traditional debugging approaches supported by PyCharm and other IDEs are only suitable for long-running processes in a non-production environment. With the advent of cloud-native applications designed for the modern deployment model, these shortcomings become a clear bottleneck for developers to debug effectively.

Leveling up PyCharm Remote Debugging with Lightrun

Lightrun is a developer-centric continuous observability platform that integrates with most popular IDEs for Java, Node.js, and Python. In the case of PyCharm, it is available as a plugin.

Remote debugging in PyCharm using Lightrun

Lightrun extends PyCharm’s remote debugging capabilities in a few ways:

  • Native IDE support for remote debugging. The Lightrun plugin integrates with PyCharm to offer all the visual controls for remote debugging. Developers can connect to a remotely executing Python application on the fly and perform debugging right in the IDE. Lightrun handles the connections and setup for accessing the remote Python application in real time.
  • Ready for cloud-native debugging. By embedding the Lightrun agent as a Python module within the Python applications, developers can control their production applications remotely and seamlessly perform debugging actions such as setting virtual breakpoints, extracting snapshots of the stack, and capturing metrics on multiple instances of a cloud-native Python application.
  • Highly secure remote debugging. Lightrun’s security architecture ensures that remote debugging is performed in a sandboxed environment. It is a robust, patented mechanism that ensures that every debugging action performed on a production application is secured to ensure the privacy of the source code and no ill effects on the application’s performance.

PyCharm + Lightrun = Production Grade Remote Debugging

With Lightrun’s integration into PyCharm, developers get higher debugging productivity. Instead of spending hours logging the debugging data across the entire Python source code and analyzing it later, they can capture all the data in PyCharm.

For companies, Lightrun’s dynamic observability capabilities help efficiently detect and address security vulnerabilities in corporate software development. Overall, it leads to a faster time to market.

If you are keen to know more, you can try Lightrun yourself using the playground, or book a demo for a guided introduction.

The post Effective Remote Debugging in PyCharm appeared first on Lightrun.

]]>
Effective Remote Debugging with VS Code https://lightrun.com/remote-debugging-vs-code/ Mon, 14 Aug 2023 18:19:40 +0000 https://lightrun.com/?p=12228 This post will discuss remote debugging in VS Code and how to improve the remote debugging experience to maximize debugging productivity for developers. Visual Studio Code, or VS Code, is one of the most popular IDEs. Within ten years of its initial release, VS Code has garnered the top spot among popularity indices, and its […]

The post Effective Remote Debugging with VS Code appeared first on Lightrun.

]]>
This post will discuss remote debugging in VS Code and how to improve the remote debugging experience to maximize debugging productivity for developers.

Visual Studio Code, or VS Code, is one of the most popular IDEs. Within ten years of its initial release, VS Code has garnered the top spot among popularity indices, and its community is growing steadily. Developers love VS Code not only for its simplicity but also due to its rich ecosystem of extensions, including the support for debugging.

VS Code Remote Debugging Features

Being an integrated environment, VS Code has built-in support for debugging in many languages. Support for Node.js applications is available by default. This includes JavaScript, TypeScript, and any other language that gets transpiled to JavaScript. Language extensions are also available for Python, C/C++, and most other popular programming languages.

VS Code’s remote debugging features allow developers to debug a process running on a remote machine or device. This scenario is the opposite of local debugging, where the debugging is performed on a process spawned within VS Code’s integrated environment.

VS Code’s mechanism for debugging relies on attaching the debugger to a process, which is the executable program to be debugged. VS Code offers a custom launch configuration that allows many ways of attaching the debugger to a process. When debugging locally, the process executes inside VS Code’s environment, and the debugger is attached automatically. When you use VS Code for remote debugging, the launch configuration is updated with parameters for the debugger to point to a process running on a remote host via the IP address.

Some of the features of VS Code remote debugging are:

  1. Consistent debugging UI. In VS Code, the user interface for debugging remains unchanged irrespective of local or remote debugging.
  2. Custom launch configuration. VS Code launch configurations offer many options to set parameters for remote debugging. This mainly includes:
    1. Port forwarding to set up communication between the VS Code debugger and the process running on the remote computer.
    2. Source paths to point to the correct source code version associated with the running process.
    3. Environment variables to set additional variables to control the debugging session.
  3. Multi-target debugging. VS Code supports multi-target debugging, wherein developers can launch more than one debugging session pointing to different processes.
  4. Debugging controls. Remote debugging in VS Code provides the same debugging controls developers use in a local debugging environment. These include setting breakpoints, log points and controls for stepping through the code manually.

Additionally, it supports multiple debug protocol adapters for different languages like C++, Python, Go, etc., with the extensibility to build custom debugging adapters for other platforms.
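For reference, a typical attach-style entry in `launch.json` for a Node.js process on a remote host looks like the fragment below; the address, port, and paths are placeholders you would adapt to your environment:

```json
{
  "type": "node",
  "request": "attach",
  "name": "Attach to remote Node.js",
  "address": "203.0.113.5",
  "port": 9229,
  "localRoot": "${workspaceFolder}",
  "remoteRoot": "/app"
}
```

The `localRoot`/`remoteRoot` pair is what maps breakpoints set in your local source tree onto the files actually running on the remote machine.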

The Paradigm Shift for Debugging

Despite all the rich debugging capabilities, the VS Code debugging interface has shortcomings. To understand these shortcomings, it is vital to know how classical debugging methodology evolved in software engineering.

The classical debugging workflow relies on three approaches:

  1. Halting the process execution. This is done using breakpoints to halt the runtime execution of the process at a certain point where the bug is most likely to reproduce.
  2. Examining the stack trace. This is done while the process execution is halted to examine the variable values.
  3. Manual control of business logic. This is done to step through the execution of the process, one source code line at a time, and optionally substituting variable values to understand the system behavior precisely.

Given the advancements in software design and deployment models, this traditional approach to debugging, supported by integrated environments like VS Code, falls short in many ways. This deficiency is due to a combination of paradigm shifts across multiple facets of software development: from desktop to cloud-hosted applications, from monoliths to microservices, and from legacy VMs to cloud-native deployments.

Disadvantages of Remote Debugging in VS Code

Given these sweeping paradigm shifts the industry has witnessed in the last few decades, VS Code’s local and remote debugging experience has the following disadvantages:

  1. Traditional debugging isn’t helpful in production environments. The classical debugging approach relies on halting and manual control of process execution, which is not an option for the production environment. With the advent of agile methodologies, developers spend more time fixing bugs in the project’s staging and production phases than in the development phase. Therefore, runtime observability and monitoring are gaining precedence over debugging.
  2. Debuggers were never designed for cloud-native applications. Cloud-native applications are distributed across multiple containers. While VS Code remote debugging supports containerized applications, they can only be used for long-running processes. In contrast, cloud-native deployment uses multiple ephemeral containers, which cannot be managed through the VS Code debugger interface. Also, the traditional debugging approach does not help unearth hard-to-find bugs that occur due to data races or deadlocks common in complex cloud-native applications running across hundreds of containers.
  3. Manipulating the control flow is less relevant in this age of AI. Artificial Intelligence-based applications rely on complex data models to make decisions instead of hand-coded control flow logic. The VS Code debugger interface cannot debug such processes, since they require a different level of observability and analysis beyond just manipulating the business logic.
  4. Security issues in remote debugging. Facilitating remote debugging also exposes specific ports on the remote computer where the process runs. Even though VS Code supports SSH-based connections for secured access, there are no additional measures to impose IAM (Identity & Access Management)-like permissions. This situation can result in debug-enabled applications running in production with credentials shared between development teams, leading to a potential security breach in the future.

Enhanced Remote Debugging in VS Code with Lightrun

Lightrun breaks the stereotype of classical debugging and enables debugging any application on any deployment.

The core approach for Lightrun revolves around developer observability, which allows developers to observe the internal behavior of an application at runtime. It surpasses the drawbacks of traditional debugging in the following ways:

  1. Designed for remote debugging in the cloud. All modern, cloud-hosted applications are offered through the “as a service” model, which requires them to be constantly running and available to serve the end user. Lightrun facilitates remote debugging of production applications running on cloud environments without custom configurations or manipulation in process runtime. This includes the popular deployment orchestration platforms such as Kubernetes.
  2. Designed for instant observability. Lightrun can capture live logs and instant snapshots of the running application, offering instant observability. The snapshots act like virtual breakpoints, which provide information about stack traces and variables without pausing the program execution.
  3. Designed for debugging entire applications instead of individual processes. Rather than attaching to every process instance, the Lightrun agent gets embedded within all the runtime workloads of the application. All the logs and snapshots collected from the multiple runtime process instances can be collated in one place for easier investigation of bugs.

Transcend from Remote Debugging to Live Debugging in VS Code

The best part about Lightrun is that it is available as a VS Code extension:

Lightrun extension for remote debugging in VS Code

Developers take advantage of a familiar interface to perform live debugging actions, right in VS Code:

Lightrun actions in VS Code

While debugging, Lightrun panel views inside VS Code display logs and detailed snapshot information related to the running application:

Lightrun debugging panels in VS Code

Behind the scenes, the Lightrun VS Code plugin connects to Lightrun agents embedded within the application to make all the live debugging magic happen.

If you are keen to explore Lightrun integration with VS Code further, check out the Lightrun documentation.

You can also sign up for a Lightrun account and get started with live debugging of Node.js, Java, Python, or .NET applications.

The post Effective Remote Debugging with VS Code appeared first on Lightrun.

]]>