Observability vs. Monitoring

Although all code is bound to have at least some bugs, they are more than just a minor issue. Having bugs in your application can severely impact its efficiency and frustrate users. To ensure that the software is free of bugs and vulnerabilities before applications are released, DevOps teams need to work collaboratively and effectively bridge the gap between the operations, development, and quality assurance teams.

But there is more to ensuring a bug-free product than a strong team. DevOps teams need to have the right methods and tools in place to better manage bugs in the system.

Two of the most effective methods are monitoring and observability. Although they may seem like the same process at a glance, they have some apparent differences beneath the surface. In this article, we look at the meaning of monitoring and observability, explore their differences and examine how they complement each other.

What is monitoring in DevOps?

In DevOps, monitoring refers to the supervision of specific metrics throughout the whole development process, from planning all the way to deployment and quality assurance. By being able to detect problems in the process, DevOps personnel can mitigate potential issues and avoid disrupting the software’s functionality.   

DevOps monitoring aims to give teams the information to respond to bugs or vulnerabilities as quickly as possible. 

DevOps Monitoring Metrics

To correctly implement the monitoring method, developers need to supervise a variety of metrics, including:

  • Lead time or change lead time
  • Mean time to detection
  • Change failure rate
  • Mean time to recovery
  • Deployment frequency

    What is Observability in DevOps?

    Observability is a property of a system whereby developers receive enough information from its external outputs to determine its current internal state. It allows teams to understand the system's problems by revealing where, how, and why the application is not functioning as it should, so they can address the issues at their source rather than relying on band-aid solutions. Moreover, developers can assess the condition of a system without interacting with its complex inner workings and affecting the user experience. There are a number of observability tools available to assist you with the software development lifecycle.

    The Three Pillars of Observability

    Observability requires the gathering and analysis of data released by the application’s output. While this flood of data can become overwhelming, it can be broken down into three fundamental data pillars developers need to focus on:

    1. Logs

    Logs refer to the structured and unstructured lines of text an application produces when running certain lines of code. The log records events within the application and can be used to uncover bugs or system anomalies. They provide a wide variety of details from almost every system component. Logs make the observability process possible by creating the output that allows developers to troubleshoot code by simply analyzing the logs and identifying the source of an error or security alert.
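    For illustration, a structured log entry can be produced with Python's standard logging module; the event name and fields below are hypothetical and no specific logging backend is assumed:

    import json
    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("checkout")

    def log_event(event, **fields):
        # Emit the event as a JSON object so it can be parsed, searched, and correlated later
        logger.info(json.dumps({"event": event, **fields}))

    log_event("payment_failed", order_id="1234", reason="card_declined")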

    2. Metrics

    Metrics numerically represent data that illustrates the application's functioning over time. They consist of a series of attributes, such as a name, label, value, and timestamp, that reveal information about the system's overall performance and any incidents that may have occurred. Unlike logs, metrics don't record specific incidents but return values representing the application's overall performance. In DevOps, metrics can be used to assess the performance of a product throughout the development process and identify any potential problems. In addition, metrics are ideal for observability, as it's easy to identify patterns across data points gathered over time and build a complete picture of the application's performance.
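    As a minimal sketch of such a metric sample (the metric name, label, and backend are hypothetical; in practice a metrics library or agent would collect this):

    import time

    def record_metric(name, value, labels=None):
        # A metric sample consists of a name, labels, a numeric value, and a timestamp
        sample = {"name": name, "labels": labels or {}, "value": value, "timestamp": time.time()}
        print(sample)  # a real system would push this to a metrics backend

    def process_order():
        time.sleep(0.1)  # stand-in for real work being measured

    start = time.monotonic()
    process_order()
    record_metric("order_processing_seconds", time.monotonic() - start, {"service": "shop"})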

    Logging, Tracing, Metrics

    3. Trace

    While logs and metrics provide enough information to understand a single system's behavior, they rarely clarify the lifetime of a request as it travels through a distributed system. That's where tracing comes in. Traces represent the passage of a request as it travels through all of the distributed system's nodes.

    Implementing traces makes it easier to profile and observe systems. By analyzing the data the trace provides, your team can assess the general health of the entire system, locate and resolve issues, discover bottlenecks, and select which areas are high-value and their priority for optimization.
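    As a rough sketch of the idea (no particular tracing framework is assumed), each request carries a trace ID that every node attaches to its own telemetry, so the hops of a single request can be stitched back together:

    import uuid

    def handle_request(trace_id=None):
        # Generate a trace ID at the edge, or reuse the one propagated by the caller
        trace_id = trace_id or str(uuid.uuid4())
        print(f"[trace={trace_id}] api-gateway: received request")
        call_inventory_service(trace_id)

    def call_inventory_service(trace_id):
        # Downstream services report the same trace ID, tying their work to the original request
        print(f"[trace={trace_id}] inventory-service: checking stock")

    handle_request()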

     

    Monitoring vs. Observability: What’s the Difference?

    We’ve compiled the below table to better distinguish between these two essential DevOps methods:

    Monitoring                                    | Observability
    Practically any system can be monitored       | The system has to be designed for observation
    Asks if your system is working                | Asks what your system is doing
    Includes metrics, events, and logs            | Includes traces
    Active (pulls and collects data)               | Passive (pushes and publishes data)
    Capable of providing raw data                 | Heavily relies on sampling
    Enables rapid response to outages             | Reduces outage duration
    Collects metrics                              | Generates metrics
    Monitors predefined data                      | Observes general metrics and performance
    Provides system information                   | Provides actionable insights
    Identifies the state of the system            | Identifies why the system failed

    Observability vs. Monitoring: What do they have in common?

    While we've established that observability and monitoring are different methods, this doesn't make them incompatible. On the contrary, monitoring and observability are generally used together, as both are essential for DevOps. Despite their differences, their commonalities allow the two methods to co-exist and even complement each other.

    Monitoring allows developers to identify when there is an anomaly, while observability gives insights into the source of the issue. Monitoring is almost a subset of, and therefore key to, observability. Developers can only monitor systems that are already observable. Although monitoring only provides solutions for previously identified problems, observability simplifies the DevOps process by allowing developers to submit new queries that can be used to solve an already identified issue or gain insight into the system as it is being developed.

    Why are both essential?

    Monitoring and observability are both critical to identifying and mitigating bugs or discrepancies within a system. But to fully utilize the advantages of each approach, developers must do both thoroughly. Manually implementing and maintaining these approaches is an enormous task. Luckily, automated tools like Lightrun allow developers to focus their valuable time and skills on coding. The tool enables developers to add logs, metrics, and traces to their code in real time, without restarting or redeploying the software, preventing delays and guaranteeing fast deployment.

Dynamic Observability Tools for API Live Debugging

Intro

Application Programming Interfaces (APIs) are a crucial building block in modern software development, allowing applications to communicate with each other and share data consistently. APIs are used to exchange data inside and between organizations, and the widespread adoption of microservices and asynchronous patterns boosted API adoption inside the application itself.

The central role of APIs is also evident with the emergence of the API-first approach, where the application’s design and implementation start with the API, thus treating APIs as first-class citizens and developing reusable and consistent APIs.

In the last decade, Representational State Transfer (REST) APIs have come to dominate the scene, becoming the predominant API technology on the web. REST is more of an architectural approach than a strict specification, and this free-formedness is probably the key to its success: it has been essential in making REST popular and is one of the critical enablers of loose coupling between API providers and consumers. However, it sometimes bites back as a lack of consistency in API behavior and interfaces, which can be alleviated by specification frameworks like OpenAPI or JSON Schema.
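For instance, a consumer can validate a response payload against a JSON Schema before processing it; below is a minimal sketch using the Python jsonschema package (the schema and payload are hypothetical):

from jsonschema import validate, ValidationError

order_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
    },
    "required": ["order_id", "amount"],
}

payload = {"order_id": "1234"}  # missing "amount"

try:
    validate(instance=payload, schema=order_schema)
except ValidationError as e:
    # Surface the contract violation early instead of failing later with a cryptic type error
    print("Response does not match the agreed schema:", e.message)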

It is also worth pointing out the role of developers in designing and consuming APIs: because the role of an API is to integrate different applications and systems, developing one frequently requires close collaboration between backend, frontend, and mobile developers.

Challenges in API integration

Despite being central to modern application development, API integration remains challenging. Those challenges mainly originate from the fact that the systems connected by APIs form a distributed system, with the usual complexities involved in distributed computing. Also, the connected systems are mostly heterogeneous (different tech stacks, data models, ownership, hosting, etc.), leading to integration challenges. Here are the most common ones:

  • Incorrect data. Improper data formatting or conversion errors (due to inaccurate data type or incompatible data structures) can cause issues with the exchanged data. This often results in malformed JSON, errors in deserialization, and type casting errors.
  • Lack of proper documentation. Poorly documented endpoints may require extensive debugging to infer data format or API behavior. This is particularly problematic when dealing with third-party services without access to the source code or the architecture.
  • Incorrect or unexpected logic or behavior. The loosely defined REST model does not allow the callee's behavior to be specified formally, and such behavior can be undocumented or implemented incorrectly for some edge cases.
  • Poor query parameter handling. Query parameters are the way for the caller to modify the results an endpoint provides. Often, edge cases arise where parameters are not handled correctly, requiring a trial-and-error debugging process.
  • Error handling. Even though HTTP provides the basic mechanism of response codes for error handling, each API implementation tends to customize it, either using custom codes or adding JSON error messages. Error handling is not always coherent, even between different endpoints on the same system, and it may be undocumented.
  • Authentication and authorization errors. The way authorization is handled on the API producer side can generate errors and unexpected behavior, sometimes manifesting as inconsistencies between different endpoints on the same system.

Errors can be present on the provider side or the consumer side. On the provider side, we often cannot intervene in the implementation, which necessitates implementing workarounds on the consumer side.

For errors on the consumer side (wrong deserialization, incorrect handling of pagination or state, etc.), troubleshooting usually involves examining logs for request/response patterns and adding logs to examine parameters and payloads.
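As a minimal sketch of the kind of defensive logging this usually boils down to on the consumer side, using the Python requests library (the URL, parameters, and response fields are hypothetical):

import requests

resp = requests.get("https://api.example.com/orders", params={"page": 2}, timeout=10)
print("request:", resp.request.url, "status:", resp.status_code)

try:
    data = resp.json()
except ValueError:
    # Malformed or non-JSON body: log a slice of the raw payload for inspection
    print("unexpected body:", resp.text[:500])
else:
    print("items returned:", len(data.get("items", [])))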

Lightrun Dynamic Observability for API debugging

Lightrun's Developer Observability Platform implements a new approach to observability by overcoming the difficulties of troubleshooting applications in a live setting. It enables developers to dynamically instrument applications that run remotely on a production server by adding logs, metrics, and virtual breakpoints, without the need for code changes, redeployment, or application restarts.

In the context of API debugging, the ability to debug in the production environment provides significant advantages, as developers do not need to reproduce locally the entire API ecosystem surrounding the application, which can be difficult: think, for example, of the need to authenticate against third-party APIs, or to provide a realistic database to run the application locally. Also, it is not always possible to reproduce realistic API calls locally, as the local development environment tends to be simplified compared to the production one.

Lightrun allows debugging API-providing and consuming applications directly on the live environment, in real-time and on-demand, regardless of the application execution environment. In particular, Lightrun makes it possible to:

  • Add dynamic logs. Adding new logs without stopping the application makes it possible to obtain the relevant information for the API exchange (request/response/state) without leaving the IDE and without losing state (for example, authentication tokens, complex API interactions, pagination, and real query parameters). It is also possible to log conditionally, only when a specific code-level condition is true, for example, to debug a particular API edge case out of a high number of API requests.
  • Take snapshots. Add virtual breakpoints that can be triggered on a specific code condition to show how request parameters and response payloads change over time.
  • Add Lightrun metrics for method duration and other insights. This makes it possible to measure the execution times of APIs and count the number of times a specific endpoint is called.

Lightrun is integrated with developer IDEs, making it ideal for developers, as it allows them to stay focused on their local environment. In doing so, Lightrun acts as a debugger that works everywhere the application is deployed, allowing for a faster feedback loop during the API development and debugging phases.
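As a rough sketch of what wiring the agent into an API service can look like (the Flask app, environment variable, and tag below are assumptions for illustration; the lightrun.enable call mirrors the one shown in the CI/CD and Celery posts later in this archive):

import os
from flask import Flask

app = Flask(__name__)

try:
    import lightrun
    # Register the service with the Lightrun server so logs, snapshots, and metrics
    # can later be added from the IDE without redeploying
    lightrun.enable(
        company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
        metadata_registration_tags='[{"name": "orders-api"}]'
    )
except ImportError as e:
    print("Error importing Lightrun: ", e)

@app.route("/orders/<order_id>")
def get_order(order_id):
    return {"order_id": order_id}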

Bottom Line

Troubleshooting APIs returning incorrect data or behaving erratically is essential to ensure reliable communication between systems and applications. By understanding the common causes of this issue and using the right tools and techniques, developers can quickly identify and fix API problems, delivering a better user experience and ensuring smooth software operations. Lightrun is a developer observability platform giving backend and frontend developers the ability to add telemetry to live API applications, thus representing an excellent resolution to API integration challenges. Try it now on the playground, or book a demo!

 

Lightrun Attendance at FinOps X 2023: Unveiling Key Insights, Highlights and Takeaways from the Show

Introduction

This week Lightrun attended the annual FinOps X event. The event was sold out and packed with great speakers, practitioners, and an amazing atmosphere. Compared to last year, which had over 300 attendees, this year the event brought over 1,200!

Above is a screenshot taken from the venue entrance, reminding the audience of the core principles of FinOps.

As background, Lightrun is a FinOps Certified Solution and dynamic observability platform with three FinOps certified practitioners (myself included :)), aiming to shift FinOps left to developers and optimize the overall spend earlier in the development lifecycle.

As emphasized in the recent State of FinOps report, a significant obstacle (cited by 29.5% of respondents) to the adoption and shift-left approach of FinOps is enhancing developers' awareness and accountability regarding the cost of their software development, and empowering them to take appropriate measures. Executives are increasingly recognizing the importance of cost awareness at various levels, such as feature development, tooling, and overall application development, prompting actions like cost sharing, spend optimization across departments, and more.

https://data.finops.org

In this post, we summarize the key takeaways from this conference, share some notable facts taken from the main keynotes and breakout sessions, and put the spotlight on what’s going to shape the future of FinOps.

Key Takeaways and Highlights

As mentioned above, this year's event attendance clearly proved that FinOps is becoming the common practice and language for businesses that are transforming their cloud usage and looking at ways to optimize their spending.

While there were dozens of sessions and chalk talks, it was hard to capture them all, so in this summary blog I'm covering the top insights and sessions that I was able to attend and that stood out to me.

To begin with the event summary, a prominent theme echoed by FinOps ambassadors, vendors, and major cloud service providers like Google (represented by Mich Razon), Microsoft (represented by Fred Delombaerde), and Oracle (represented by Phil Newman) revolved around the primary goal of cost optimization through FinOps. This key message emphasized that every dollar saved during the process should be directed towards fostering future growth, driving innovation, and ultimately achieving business success.

High-Level Key Takeaways

  • The FinOps practice is growing year over year, with more organizations adopting it and more practitioners getting certified
  • FinOps success is attributed to the combination of Tools, People Skills, Communication, and Organizational Culture Shift
  • Technology trends that are shaping and will continue to evolve FinOps maturity are related to the following:
    • Automation, Automation, Automation – across different practices, across teams, and via varying tools.
    • Gen-AI and ML for better and more rigorous data analysis at scale (more context on that later in this blog)
    • Collaboration between CSPs, vendors, and the FinOps Foundation is transforming FinOps and will reshape the future of the FinOps landscape
    • Shifting FinOps left and empowering developers to better own the cost per feature, CPU/memory consumption optimization, and other cost-related aspects earlier in the SDLC is key to driving success in maturing FinOps
  • The new open-source project FOCUS holds great promise for breaking down silos and creating a common ground of taxonomy, terminology, and understanding across the various FinOps attributes coming from different sources
  • While the FinOps framework evolves around the cycle of Inform, Optimize, and Operate across the varying maturity levels of Crawl, Walk, and Run, the FinOps practice isn't a one-and-done objective but rather a continuous effort that requires maintenance, investment, and optimization.
  • FinOps is launching new persona-based training to more effectively provide training throughout the different organizational departments, as well as re-classifying the FinOps landscape based on categories.
  • Throughout the sessions there was a clear correlation between the FinOps practice and the term "Efficiency Engineering" – many advanced organizations look at FinOps adoption and maturity as a way to steer their engineering organizations toward lower costs that go hand in hand with innovation and growth from a product perspective.

📢 Before diving deeper into the article, I wanted to share the insight below, taken from the keynote given by Natalie Daley from HSBC, which showed a best practice and recommendation for maturing FinOps via a crawl, walk, run approach divided into pricing, usage, and architecture categories. Natalie emphasized the continuous improvement that is required to succeed and advance the value and ROI from FinOps practices.

Spotlight on Event Themes at FinOps X 2023

There were many topics covered during the conference, including the new FOCUS project launch, generative AI and ML solutions to address limitations or challenges with existing tools, automation of FinOps practices toward shifting FinOps left to engineering in the early stages of design and code development, and more. Below is a high-level summary of these key themes.

FOCUS Project Launch!

📢 It started with the opening keynote and continued throughout the breakout sessions and chalk talks – FOCUS is one of the next big things in FinOps practice formalization.

As shown in the picture below, taken from the session by Udam Dewaraja (Chairperson of FOCUS, FinOps Foundation) and Mike Fuller (FinOps Foundation CTO), the FinOps Foundation is moving forward with an open-source initiative that aims to standardize the taxonomy, metrics, and terminology across all cloud computing solutions: CSPs, SaaS providers, and other vendors that contribute to the overall cloud spending. The end goal is to have a single language with a single pane of glass to measure, monitor, and optimize cloud cost, so all the different personas are aligned and informed and can make smart, data-driven decisions.

As highlighted above, Version 0.5 of the FOCUS specification is available, with support from preliminary vendors and contributors, and the team is currently working on version 1.0, which will include more standardization and additional contributions from the community.

Based on the full room of attendees, it is clear that there is both interest and need in the market for such standardization, and the next year will be fundamental to the growth of the spec and its adoption.

At FinOps X 2024, it would be very interesting to measure the journey of this new and exciting project and its outcomes.

Shift Left FinOps Practices Automation

Both tool providers exhibiting at the show (e.g. FinOut) and dedicated breakout sessions (Capital One) clearly covered the investment in FinOps automation to better drive adoption, awareness, and accountability around cloud spending.

📢 The above screenshot, taken from the Capital One session (Brent Segner and David Cahan), is only one of many examples of initiatives led by organizations to bridge gaps between siloed teams via precise data. Below is another example, taken from Under Armour, where Amy Ashby covered anomalies and use cases where better communication, combined with tooling and automation, can save a lot of time and money and mature the entire level of FinOps within the organization.

As highlighted in the above anomalies, there are often spikes in costs that show up in the reports; however, not every spike needs to cause panic. For example, when running load and performance testing against a given environment there will obviously be spikes in the system, and likewise when the cloud spending model is based on consumption, the pricing is high at the beginning of the month but then decreases. It all comes down to transparency, visibility, data accuracy and, most importantly, communication and tooling.

📢 Another great session with extremely important insights came from Walmart Global Tech, where Tim O'Brien covered the journey his organization underwent as they built and matured FinOps, especially focusing on engineering team adoption and shifting it left with automation.

📢 Complementing the above sessions was a great session from Target Corporation led by Kim Wier and Ron Tatro. They discussed the concept of FinOps as a product and how, using cultural shifts, technology, and automation, they are able to shift FinOps left, drive accountability, and reduce costs.

📢 A phenomenal session led by Benjamin Coles from Apple showed the journey Apple engineering took towards optimizing their spend through tooling, communication, and by tackling the common FinOps challenges one by one, as can be seen in the screenshot below.

In addition to the above, Benjamin highlighted the reports and dashboard below, showing weekly, monthly, and YoY cloud spend across the various hierarchies (development, testing, production, etc.), and covered the way Apple is looking to optimize CPU utilization as a way to reduce overall cloud spending.

📢 Lastly, Julia Harvey and Andrea Ratliff from Nationwide presented their homegrown COIN, an index for cloud cost optimization that allows better visibility (underutilized compute, idle resources, and more), forecasting, and cost alignment within the huge organization that is Nationwide. Specifically, it provides executives with an index-based list of opportunities for optimization, with the ability to then drill down into resource-level data for further action.

Generative-AI and Machine Learning within FinOps Practice

It was clear during the show that Gen-AI and ML are reshaping the entire technology stack, and FinOps is yet another focus area for vendors (e.g. CoreStack) to enhance and maximize the outcomes of their tools. As teams struggle to optimize their cloud spending and base their decision making on data analysis across tools, cloud providers, and their own internal processes, artificial intelligence becomes instrumental for resilient, reliable, and scalable outputs.

The use of AI was featured both in the Capital One breakout session as part of the internal tool stack that the tech bank is building and maintaining as part of the FinOps category, as well as in other sessions such as the CSPs panel discussion.

It was evident that to mature FinOps practices, one of the advancements should be around empowering developers to be aware of the costs per feature, user, and customer so they can strike the right balance throughout the SDLC for business profitability. In FinOps this is referred to as unit economics, and the practice is measuring unit costs to understand business growth.
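As a toy illustration of unit economics (the unit definition and figures here are hypothetical, not taken from the conference):

def cost_per_unit(total_cloud_cost, units_delivered):
    # Unit economics: cloud spend divided by a business unit such as transactions or active users
    return total_cloud_cost / units_delivered

# e.g. monthly cloud spend spread across processed transactions
print(round(cost_per_unit(42_000, 1_200_000), 4))  # cost per transaction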

📢 Two other sessions surrounding AI, from Microsoft and Adobe, were quite inspirational.

  • Microsoft introduced its new AI-based cloud cost optimization solution, which should allow organizations to monitor, analyze, and optimize their cloud costs through Gen-AI capabilities and chatbots with Q&A abilities tailored to cost optimization and rightsizing.
  • Adobe introduced their recent AI solution named Firefly, which similarly has the ability to analyze and optimize cloud costs and bridge the gaps between product engineering and central efficiency departments.

The Future of FinOps

So, where is FinOps heading? As highlighted throughout the article, there are a few key themes and investment areas where we should expect to see FinOps transforming and growing.

  • More tools and open-source initiatives, such as the new FOCUS project, to drive a consistent language and alignment between internal departments within organizations.
  • More investment from CSPs in FinOps solutions – it is clearly a common interest across all leading CSPs to be part of the FinOps community and contribute to its success. As listed above, we are already starting to see a few new solutions coming from both Google and Microsoft that involve AI and other advanced capabilities to lead the adoption of the FinOps framework.
  • Gen-AI and smarter FinOps practices – this cannot be more evident after the show. Each major vendor is looking at embedding AI capabilities into its solution to boost value and improve reliability and scale.
  • Investment in community and training will continue to grow, with more certifications, programs, and events that will bring together all levels of practitioners towards success.

Concluding Thoughts

This event was phenomenal, very well arranged, and with awesome keynotes, panel discussions, and breakout sessions.

I cannot wait to see what the future of FinOps will look like, but it is surely heading in the right direction with the above-mentioned focus areas.

Here at Lightrun we seek ways to further optimize remote and distributed workload software development and debugging costs, as we did with the recent launch of the LogOptimizer, and we plan on doubling down on more dev-focused advancements to support the shift left of FinOps.

Until next year …. A big thank you to the FinOps Foundation/Linux Foundation for setting up such a great and insightful event!

See you next year

 

Maximizing CI/CD Pipeline Efficiency: How to Optimize your Production Pipeline Debugging?

Introduction

At one time, a developer would spend a few months building a new feature. Then they'd go through the tedious, soul-crushing effort of "integration": merging their changes into an upstream code repository, which had inevitably changed since they started their work. This integration would often introduce bugs and, in some cases, might even be impossible or irrelevant, leading to months of lost work.

Hence continuous integration and continuous deployment (CI/CD) were born, enabling teams to build and deploy software at a much faster pace and allowing for extremely rapid release cycles.

Continuous integration aims to keep team members in sync through automated testing, validation, and immediate feedback. Done correctly, it will instill confidence that the code shipped adheres to the standards required to be production-ready.

However, although many positive factors are derived from the CI/CD pipeline, it has since evolved into a complex puzzle of moving parts and steps for organizations, where problems occur frequently.

Usually, errors that occur in the pipeline are discovered after the fact. Any number of pieces of the puzzle could fail, and even if you can resolve some of these issues by piping your logs to a centralized logging service and tracing from there, you are not able to replay the issue.

You may argue for the case of static debugging. In this process, one usually traces an error via a stack trace, exception, or error that occurs, and then makes calculated guesses about where the issue may have happened.

This is usually followed by some code changes and local testing to simulate the issue, then by deploying the code and going through a vicious cat-and-mouse cycle to identify issues.

Issues with CI/CD Pipelines and Debugging 

Let's break down some fundamental issues plaguing most CI/CD pipelines. CI/CD builds and production deployments rely on testing and performance criteria. Functional testing and validation testing can be automated, but remain challenging due to the scope of the different scenarios in place.

Identifying the root cause of the issue

It can be challenging to determine the exact cause of a failure within a CI/CD pipeline. Debugging complex pipelines that consist of many stages and interdependent processes makes it difficult to understand what went wrong and how to fix it.

At its core, a lack of observability, limited access to logs, or a lack of relevant information can make it challenging to diagnose issues, and at times the inverse, excessive logging and saturation, causes tunnel vision.

Another contributing factor is low code coverage: edge-case scenarios that could potentially be breaking your pipeline will be hard to discover. For those that work in a monorepo environment, issues are exacerbated where shared dependencies and configurations originate from multiple teams, and developers that push code without verification can cause a dependency elsewhere to break the build-and-deploy pipeline.

How to Optimize your CI/CD Pipeline?

There will be times when you believe you’ve done everything correctly, but something still needs to be fixed. 

  • Your pipeline should have a structured review process. 
  • You need to ensure the pipeline supports automated tests. 
  • Parallelization should be part of your design, with caching of artifacts where applicable. 
  • The pipeline should be built so it fails fast, with a feedback loop. 
  • Monitoring should be by design. 
  • Keep builds and tests lean. 

All these tips won’t help much if you don’t have a way to observe your pipeline.

Why Should Your CI/CD Pipeline be Observable?

A consistent, reliable pipeline helps teams to integrate and deploy changes more frequently. If any part of this pipeline fails, the ability to release new features and bug fixes grinds to a halt.

An observed pipeline helps you stay on top of any problems or risks to your CI/CD pipeline. 

An observed pipeline provides developers with visibility into their tests, and they will finally know whether the build process they triggered was successful or not. If it fails, the "where did it fail" question is answered immediately.

Not knowing what's going on in the overall CI/CD process, or lacking the visibility into how it's going and how it's performing, is no longer a topic of discussion.

Tracing issues via different interconnected services and understanding the processing they undergo end to end can be difficult, especially when reproducing the same problem in the same environment is complex or restrictive.

Usually, DevOps and Developers generally try to reproduce issues via their Local environment to understand the root cause, which brings additional complexities of local replication.

CI/CD Pipeline Architecture

To put things into context, let's work through an example of a typical production CI/CD pipeline.



CI/CD pipeline with GitHub Actions, AWS CDK, and AWS Elastic Beanstalk



The CODE

The CI/CD pipeline starts with the source code on GitHub, with GitHub Actions triggering the pipeline. GitHub provides ways to version code but does not track the impact of the commits developers make to the repository. For example:

  • What if a certain change introduces a bug? 
  • What if a branch merge resulted in a successful build but failed deployment? 
  • What if the deployment was successful, then a user received an exception, and it’s already live in production?

BUILD Process

The build process, with test cases for code coverage, is a critical point of failure for most deployments. If a build fails, the team needs to be notified immediately in order to identify and resolve the problem quickly. You may say there are options, like alerts to your Slack channel or email notifications that can be configured.

Those additional triggers can alert you, but they do not provide the ability to trace and debug the issues in a timely manner, as one still needs to dig into the code. Failure may be due to more elusive problems such as missing dependencies.

Unit & Integration TESTS

It's not enough to know that your build was successful. It also has to pass tests to ensure changes don't introduce bugs or regressions. Frameworks such as JUnit, NUnit, and pytest generate test result reports, though these reports show which cases failed but not how they failed.

Deploy Application

Most pipelines have pivoted to infrastructure as code, where code dictates how infrastructure provisioning is done. In our example, AWS CDK lets you manage the infrastructure as code using Python. While this empowers developers, it adds the complexity of additional code, which becomes hard to debug.

Post-Deploy Health Checks

Most deployments have an extra step to verify health, as illustrated in our pipeline. Such checks may include Redis health and database health. Since these checks are driven by code, we have yet another opportunity for failure that may hinder our success metric.
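A minimal sketch of what such a post-deploy health check could look like (the health endpoint, its response fields, and the URL are assumptions, not part of the example pipeline):

import sys
import requests

def check_health(base_url):
    # Hit the application's health endpoint and fail this pipeline step on any error
    try:
        resp = requests.get(f"{base_url}/health", timeout=5)
        resp.raise_for_status()
        status = resp.json()
    except (requests.RequestException, ValueError) as e:
        print("Health check failed:", e)
        return False
    # Expect the app to report the state of its dependencies, e.g. Redis and the database
    return status.get("redis") == "ok" and status.get("database") == "ok"

if __name__ == "__main__":
    sys.exit(0 if check_health("https://staging.example.com") else 1)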

Visually Illustrating Points of Failure in the CI/CD Pipeline

The figure below illustrates places that can potentially go wrong, i.e. points of failure. This gets exponentially more complex depending on how your CI/CD pipeline has been developed.




Points of failure in our example CI/CD pipeline

Dynamic Debugging and Logging to Remedy your Pipeline Ailments 

Let's take a look at how we can quickly figure out what is going on in our complex pipeline. A new approach is to shift observability left, that is, to incorporate observability into the early stages of the software development lifecycle by applying a Lightrun CI/CD pipeline observability pattern.

Lightrun takes a developer-native, observability-first approach with the platform; we can begin the process of strategically adding agent libraries to each component in our CI/CD pipeline, as illustrated below.


Lightrun CI/CD pipeline pattern

Each agent will be able to observe and introspect your code as part of the runtime, allowing you to hook into your pipeline directly from your IDE via a Lightrun plugin or CLI.

This will then allow you to add virtual breakpoints, with logging expressions of your choosing, to your code in real time directly from your IDE, i.e., in essence, remote debugging and remote logging such as you would do in your local environment, by directly linking into production.

Virtual breakpoints are non-intrusive and capture the application's context, such as variables and the stack trace, when they're hit. This means no interruptions to code executing in the pipeline, and no further redeployments are required to optimize your pipeline.

Lightrun Agents can be baked into Docker images as part of the build cycles. This pattern can be further extended by making a base Docker image that has your Lightrun unified configurations inherited by all microservices as part of the build, forming a chain of agents for tracking. 

Log placement in parts of the test and deploy build pipeline paired with real-time alerting when log points are reached can minimize challenges in troubleshooting without redeployments.

For parts of the code that do not have enough code coverage, all we will need to do is add a Lightrun counter metric to bridge the gap and form a coverage tree of dependencies, to assist in tracing and scoping what has been executed and how frequently.

Additional metrics are available via the Tic & Toc metric, which measures the elapsed time between two selected lines of code within a function or method, for measuring performance.

Customized metrics can further be added using custom parameters, with simple or complex expressions that return a long int result.

Log output will immediately be available for analysis via either your IDE or Lightrun Management Portal. By eliminating arduous and time-consuming CI/CD cycles, developers can quickly drill down into their application’s state anywhere in the code to determine the root cause of errors.

How to Inject Agents into your CI/CD pipeline?

Below we illustrate this using Python. You're free to replicate the same with other supported languages.

  1. Install the Lightrun plugin.
  2. Authenticate your PyCharm IDE with your Lightrun account.
  3. Install the Python agent by running python -m pip install lightrun.
  4. Add the following code to the beginning of your entrypoint function:
import os

LIGHTRUN_KEY = os.environ.get('YOUR_LIGHTRUN_KEY')
LIGHTRUN_SERVER = os.environ.get('YOUR_LIGHTRUN_SERVER_URL')

def import_lightrun():
    try:
        import lightrun
        # Register this process with the Lightrun server; the tag identifies the app in the Lightrun UI
        lightrun.enable(com_lightrun_server=LIGHTRUN_SERVER, company_key=LIGHTRUN_KEY,
                        lightrun_wait_for_init=True, lightrun_init_wait_time_ms=10000,
                        metadata_registration_tags='[{"name": "<app-name>"}]')
    except ImportError as e:
        print("Error importing Lightrun: ", e)

As part of the enable function call, you can specify lightrun_wait_for_init=True and lightrun_init_wait_time_ms=10000 as part of the Python agent configuration.

These two configuration parameters will ensure that the Lightrun agent starts up fast enough to work within short-running service functions, and apply a wait time of about 10,000 milliseconds before fetching Lightrun actions from the management portal. Take note that these are optional parameters that can be ignored if it doesn't make sense to apply them for long-lived code execution cycles, e.g. running a Django project or FastAPI microservice applications. If you're using another language, like Java, the same principles apply.

Once your agent is configured, a call to the import_lightrun() function can be made in the __init__.py of your pipeline code to ensure agents are invoked when the pipeline starts.
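For example, a minimal sketch of that hook (the module name holding import_lightrun is hypothetical):

# pipeline/__init__.py
from .lightrun_setup import import_lightrun  # hypothetical module containing the function above

# Invoke the agent as soon as the pipeline package is imported
import_lightrun()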

Deploy your code, and open your IDE with access to all your code, including your deployment code. 

Select the lines of code you wish to trace and open up the Lightrun terminal and console output window shipped with the agent plugin. 

Adding logging to live executing code with Lightrun directly from your IDE

 

Achieving Unified output to your favorite centralized logging service

If you wish to pipe out logs instead of using the IDE, you can tap into third-party integrations to consolidate the CI/CD pipeline, as illustrated below.

If you notice an unusual event, you can drill down to the relevant log messages to determine the root cause of the problem and begin planning for a permanent fix in the next triggered deploy cycle.

Validation of CI/CD Pipeline Code State

One of the benefits of an observed pipeline is that we can fix pipeline versioning issues. Without correct tagging, how do you know your builds have the expected commits? It gets hard to tell the difference without QA effort.

By adding dynamic log entries at strategic points in the code, we can validate new features and committed code introduced into the platform by examining the dynamic log output before it reaches production.

This becomes very practical if you work in an environment with a lot of guard rails and security lockdowns on production servers. You don't have to worry about contending with incomplete local replications.

Final thoughts 

A shift-left observability approach to CI/CD pipeline optimization can reduce your MTTR, the average time it takes to recover from production pipeline failures, which can have a high impact when critical bugs are deployed to production.

You can start using Lightrun today, or request a demo to learn more.

A Comprehensive Guide to Troubleshooting Celery Tasks with Lightrun

This article explores the challenges associated with debugging Celery applications and demonstrates how Lightrun’s non-breaking debugging mechanisms simplify the process by enabling real-time debugging in production without changing a single line of code.

Celery: Powerful but Challenging

Celery is not only a powerful, but also a widely adopted distributed task queue that allows developers to effectively manage and schedule tasks asynchronously. As evident from its GitHub repository, which shows 96k open-source projects utilizing it, Celery has become the go-to tool for Python developers, including those who work with popular frameworks such as Django, FastAPI, and Flask.
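For readers new to Celery, a minimal sketch of defining and enqueueing a task looks roughly like this (the broker URL and task body are placeholders):

from celery import Celery

# The broker URL is a placeholder; any supported broker (Redis, RabbitMQ, ...) works
app = Celery("myapp", broker="redis://localhost:6379/0")

@app.task
def add(x, y):
    return x + y

# .delay() enqueues the task; a worker started with `celery -A myapp worker` executes it asynchronously
result = add.delay(2, 3)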

Moreover, while Celery’s popularity is a testament to its usefulness, it also means that developers need to be prepared to deal with a high volume of issues. As of now, there are over 7.5k issues on Celery’s Github repository (both open and closed). While a larger number of issues can be indicative of a tool’s popularity, the complexity of Celery’s functionality also plays a role.

That being said, the complexity of Celery's functionality means that debugging can still be a challenge, even for experienced developers. Without the right debugging tools or approach, identifying the source of an issue can be a time-consuming and frustrating process. It's not uncommon for developers to spend hours reading through the documentation or manually debugging their code.

The complexity can be particularly daunting when developers need to make changes to their code, deploy it to multiple environments, test it thoroughly, and then push it to production.

Fortunately, Lightrun, the cloud-native observability platform, makes debugging Celery applications more accessible and efficient. Its non-breaking debugging mechanisms allow developers to debug Celery applications in real-time, even in production, without the need to modify the codebase.

This is what we are going to examine in this post. Read on to discover how.

The Code Used in this Example

We are going to start with an application that allows users to book online products and services based on their availability. When handling high-traffic web applications that involve booking services or products, Celery can play a crucial role in ensuring that the application is scalable and efficient. This is why we are going to use Celery mainly for the transactional part.

You can find the source here (celery branch).

These are the database schemas we will be using: a product table, a table for transactions, and a table for orders.


# Imports required by the models below (standard Django)
import uuid

from django.conf import settings
from django.core.mail import send_mail
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.urls import reverse

class Product(models.Model):
    product_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=100)
    description = models.TextField(default="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.")
    price = models.DecimalField(max_digits=10, decimal_places=2, default=10.00)
    stock_quantity = models.PositiveIntegerField(default=10)

    def get_absolute_url(self):
        return reverse("shop:product_detail", kwargs={"product_id": self.product_id})

class Transaction(models.Model): 
    transaction_id = models.UUIDField(primary_key=True, editable=False, blank=False, default=uuid.uuid4)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)    
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, blank=True, null=True)

class Order(models.Model):
    order_id = models.UUIDField(primary_key=True, editable=False, blank=False, default=uuid.uuid4)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, blank=True, null=True)
    transaction = models.OneToOneField(Transaction, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)

    def get_absolute_url(self):
        return reverse("shop:order_detail", kwargs={"transaction_id": self.transaction.transaction_id})
    
    def get_full_url(self):
        return "{}{}".format(settings.FULL_SITE_URL, self.get_absolute_url())

@receiver(post_save, sender=Transaction)
def create_order(sender, instance, created, **kwargs):
    if created:
        order = Order.objects.create(transaction=instance, product=instance.product, user=instance.user)
        sender = "admin@example.com"
        receiver = instance.user.email
        subject = "Order confirmation"
        message = f"Order for product {instance.product.name} has been confirmed. Your can view your order at {order.get_full_url()}"
        send_mail(subject, message, sender, [receiver], fail_silently=False)

We also have 4 views:

def product_detail(request, product_id):
    product = Product.objects.get(product_id=product_id)    
    return render(request, "shop/product_detail.html", {"product": product,})

def process_transaction(request, product_id):
    transaction_id = process_transaction_task.delay(product_id, request.user.id)
    return HttpResponseRedirect(reverse("shop:transaction_detail", kwargs={"transaction_id": transaction_id, "product_id": product_id}))

def transaction_detail(request, transaction_id, product_id):              
    message = """
    Order successful.<br>
    You will receive an email with your order details.<br>
    Transaction ID: {}<br>
    Product ID: {}<br>
    """.format(transaction_id, product_id)            
    return render(request, "shop/transaction_detail.html", {"message": message})
        
def order_detail(request, transaction_id):
    order = Order.objects.get(transaction__transaction_id=transaction_id)
    return render(request, "shop/order_detail.html", {"order": order,})

The process_transaction view should handle the payment using a Celery task process_transaction_task.

This is the task:

@task(name="myapp.tasks.process_transaction_task")
def process_transaction_task(product_id, user_id):
		# payment processing simulation
    time.sleep(2)
    product = Product.objects.get(product_id=product_id)
    user = User.objects.get(id=user_id)
    transaction = Transaction.objects.create(product=product, user=user)
    transaction_id = transaction.transaction_id
    product.stock_quantity -= 1
    product.save()    
    return transaction_id

Bug Hunting, the Tedious Way

After dockerizing, building, and deploying the application to a Kubernetes cluster, everything appears to be running smoothly until a customer reports an issue with their transaction details. Specifically, the customer claims that the transaction ID displayed on the post-checkout page is different from the one received in their email.

This is the id that shows on the post-checkout page (4cb73430-17e5-40fc-a153-d0c820230115)

While the one that is sent to the user by email is b9454d45-6b7c-462a-8aba-14829927eef4

Such problems are deadly for the coherence and integrity of data in this use case. So how did the transaction ID discrepancy occur in the first place? One possibility is that there was a bug in the code that caused the transaction ID to be generated incorrectly. Alternatively, there could have been an issue with the email-sending process that caused the incorrect ID to be included in the message.

In a standard development process, developers initiate by examining the code locally, identifying and resolving any existing issues. After rectifying these concerns, they redeploy the code to a testing environment. An automated test or a CI/CD pipeline then assesses the code to confirm that the implemented fixes do not negatively impact other functionalities. Once the code undergoes comprehensive testing and evaluation, it is deployed to production. This procedure might be iterative if the initial attempt at fixing the code is unsuccessful, or when developers require additional logs and traces to better comprehend issues occurring in the production environment.

Bug Hunting, The Lightrun Way

Using Lightrun, the debugging phase will only take a few seconds.

The first step is to add the following code to your celery.py file (where Celery is initiated). You also need to create a Lightrun account to get your key.

from celery.signals import task_prerun
import os

@task_prerun.connect()
def task_prerun(**kwargs):
    """
    This function is called before each task is executed. It enables Lightrun to track the task.
    """
    try:
        import lightrun
        lightrun.enable(
            company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
            metadata_registration_tags='[{"name": "dev"}]'
        )
    except ImportError as e:
        print("Error importing Lightrun: ", e)

If you are using VSCode, start by installing the Lightrun extension. Lightrun currently supports IntelliJ IDEA, PyCharm, WebStorm, Visual Studio Code (VSCode), VSCode for the web (vscode.dev), and code-server.

Now, let’s go back to the task code:

@task(name="myapp.tasks.process_transaction_task")
def process_transaction_task(product_id, user_id):
    # payment processing simulation
    time.sleep(2)
    product = Product.objects.get(product_id=product_id)
    user = User.objects.get(id=user_id)
    transaction = Transaction.objects.create(product=product, user=user)
    transaction_id = transaction.transaction_id
    product.stock_quantity -= 1
    product.save()
    return transaction_id

Right-click on the last line, click on "Lightrun", then choose "Insert a Snapshot" from the VSCode menu.

A snapshot is a one-time "breakpoint" that doesn't block the Celery task from running; as opposed to a traditional breakpoint, snapshots collect the stack trace and variables without interrupting the task at all. By replicating the steps taken by your users (in this case, a straightforward checkout), the Lightrun VSCode extension begins capturing the task's stack trace from the environment configured earlier through lightrun.enable.

For example, in our development environment, we are using:

lightrun.enable(
    company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
    metadata_registration_tags='[{"name": "dev"}]'
)

In our production environment, we can use:

lightrun.enable(
    company_key=os.environ.get('LIGHTRUN_COMPANY_KEY'),
    metadata_registration_tags='[{"name": "prod"}]'
)

You will find the different registration tags you manage in the same Lightrun panel. This is how it appears in VSCode:

Back to the snapshot we captured, which is available in the "Snapshot" tab of the same panel:

By clicking on the snapshot, you will be able to access the stack trace of the Celery job, including the function itself, the Celery call, the worker, and so on.

You can start by inspecting the first call; if it is not helpful, move to the second, and so on. For example, after accessing trace_task, which is the second trace, we can see the request dictionary processed by the task.

Surprisingly, the correlation_id has the same value as the transaction ID.

By definition, in the Celery protocol, correlation_id is the task UUID. This means that transaction_id is capturing the value of the Celery task ID instead of the real transaction ID.

def process_transaction(request, product_id):
>>  transaction_id = process_transaction_task.delay(product_id, request.user.id)
    return HttpResponseRedirect(reverse("shop:transaction_detail", kwargs={"transaction_id": transaction_id, "product_id": product_id}))

This is how, by navigating through the stack trace, we were able to understand that there was a bug. In other words, the transaction ID was getting the wrong value.
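To make the root cause concrete: process_transaction_task.delay() returns a Celery AsyncResult rather than the task's return value, so interpolating it into the redirect URL yields the task UUID. A small sketch of the distinction (assuming a configured result backend):

async_result = process_transaction_task.delay(product_id, user_id)

print(async_result.id)     # the Celery task UUID (what ended up on the checkout page)
print(async_result.get())  # blocks until the task finishes and returns the real transaction_id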

A fix here should be similar to the following:

def process_transaction(request, product_id):
    transaction_id = process_transaction_task.delay(product_id, request.user.id)
    # Wait for the Celery task to finish, then read its return value (the real transaction ID)
    transaction_id.wait()
    transaction_id = transaction_id.result
    return HttpResponseRedirect(reverse("shop:transaction_detail", kwargs={"transaction_id": transaction_id, "product_id": product_id}))

Conditional Filtering – More Accurate Debugging

In some scenarios, it becomes necessary to filter snapshots based on criteria such as the user, the product, or other objects or values. This can be achieved by adding a condition to the snapshot while applying the same logic, as in the sketch below:
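
Since the condition itself is entered in the snapshot’s condition field in the IDE rather than added to your code, here is a plain-Python sketch of the logic such a condition applies; the values are illustrative:

# The snapshot condition is an ordinary boolean expression over the task's variables.
# Expressed as plain Python, the filtering logic is simply:
def should_capture(user_id, product_id):
    return user_id == 1  # capture only for user 1; use e.g. product_id == 42 to filter by product instead

# Lightrun evaluates the equivalent expression (e.g. user_id == 1) inside the running task,
# so only matching executions produce a snapshot.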

In the example above, we capture traces exclusively for the user with an ID of 1. However, the condition could be different, such as one based on the product ID. The range of applicable conditions depends on your specific use case. Ultimately, this feature enables a more precise debugging experience.

Instant Access to Observability

What sets Lightrun apart is that it lets developers debug their code in production without adding a single line of code to their codebase. This means that developers can diagnose and fix issues in real time, as they arise, without having to go through the traditional debugging cycle of deploying, testing, and re-deploying their code.

Lightrun achieves this by using “non-breaking breakpoints” that allow developers to inspect and modify the state of their running application, without interrupting its execution. This means that developers can gain full visibility into the execution of their Celery tasks, as well as the values of variables and functions, in real-time.

Using Lightrun to debug Celery applications is a game-changer for software engineering teams, as it saves time, effort, and resources. As a result, software teams, including developers, operations, and observability teams, can achieve a quicker time-to-market, enhanced user experiences, and increased overall productivity.

Using Lightrun with Celery is straightforward, and it can be integrated seamlessly into any Celery-based application. With just a few clicks, we got immediate feedback on an ambiguous issue that arose in production, without the need to redeploy new code.

What’s next?

You can start using Lightrun today or request a demo to learn more. Alternatively, take a look at our Playground where you can play around with Lightrun in a real, live app without any configuration required.

The post A Comprehensive Guide to Troubleshooting Celery Tasks with Lightrun appeared first on Lightrun.

Lightrun Launches New .NET Production Troubleshooting Solution: Revolutionizing Runtime Debugging https://lightrun.com/lightrun-launches-new-net-production-troubleshooting-solution-revolutionizing-runtime-debugging/ Mon, 24 Apr 2023 11:31:49 +0000 https://lightrun.com/?p=11269

Introduction and Background

Lightrun, the leading Developer Observability Platform for production environments, announced today that it has extended its support to include C# on its plugins for JetBrains Rider, VSCode, and VSCode.dev. With this new runtime support, .NET developers can troubleshoot their apps against .NET Framework 4.6.1+, .NET Core 2.0+, and .NET 5.0+ technologies.

This new runtime support enables developers to seamlessly integrate Lightrun’s dynamic instrumentation and live debugging capabilities into their .NET applications running on these popular development platforms without requiring any code changes or redeployments.

It’s important to understand that Microsoft’s .NET technology has undergone a significant evolution in recent years, moving from a closed, Windows-only development technology to an open-source, cross-platform one: beginning with .NET Core and culminating in the .NET 5.0 release in late 2020, it is now publicly available on other operating systems such as Linux.

With the above transformation and growing adoption of .NET, development teams are faced with the ongoing task of delivering code at an increasingly rapid pace without sacrificing the critical components of quality and security. This challenge is exacerbated by the intricate and elaborate nature of distributed architecture. While this architecture allows for ease of development and scalability, it also presents a unique challenge in terms of identifying and resolving issues that may arise. 

Discovering an issue within your code, whether through personal observation or client feedback, can be an unpleasant experience. The cloud environment, particularly within a distributed and serverless context, can make it challenging to understand the root cause of the problem. As multiple microservices communicate with each other, it is difficult to gain a comprehensive view of the entire application. Moreover, recreating the same situation locally is almost impossible. The situation is compounded by the dynamic nature of the environment, where instances are deployed and removed unpredictably, making it even more challenging to gain visibility into the issue at hand.

Troubleshooting .NET – Current Options

As mentioned earlier, troubleshooting .NET applications can be a very complex task for developers. Within the native IDEs, .NET developers have various options to address issues encountered during the production phase. Here are some of them:

  • Crash dumps can provide insight into the state of the application at the moment of a crash, but analyzing them requires specific skills, and on their own they rarely reveal the root cause of the issue that led to the crash.
  • Metrics and counters collect and visualize data from remotely executing code and can provide better insight into the system. However, they offer limited ability to dynamically adjust the scope of focus to specific variables or methods.
  • Logging is a common and straightforward technique to help developers identify issues in running code. It provides a wealth of information, but irrelevant data can be stored, particularly when logging at scale. Consolidating logs into a single service, such as Application Insights, requires accurate filtering to separate important data from the noise. Additionally, static logging does not provide flexibility in what to log or the duration of the logging session.
  • Snapshot debugging allows for remote debugging of code within the IDE. When exceptions are thrown, a debug snapshot is collected and stored in Application Insights, including the stack trace. A minidump can also be obtained in Visual Studio to enhance visibility into the issue’s root cause. However, this is still a reactive solution, whereas problem-solving often requires a more proactive approach. It also only works with specific ASP.NET applications.
  • Native IDE debuggers – built-in debuggers, like the one that comes with the JetBrains Rider IDE, are good for local debugging, but they have a few drawbacks. They stop the running app at breakpoints and for other new telemetry collection, and cannot really help with recreating complex production issues.

Live Debugging of .NET Applications with Lightrun!

Lightrun provides a unique solution for developers who want to improve their code’s observability without modifying code at runtime. The key ingredient is the Lightrun agent, which enables dynamic logging and observability and is added to the solution as a NuGet package. Once included, it is initialized in your application’s startup code, and that is it. From there, the Lightrun agent takes care of everything else: you can connect to your application at any time to retrieve logs and snapshots (virtual breakpoints) on an ad hoc basis, without unnecessary code modification or instrumentation, or going through the rigorous cycle of code-change approval, testing, and deployment.

Getting Started with Lightrun for .NET within VSCode and Rider IDEs

As shown in the workflow diagram below, .NET developers can get started with Lightrun by following five simple steps.

Detailed Step by Step Instructions: VS Code IDE

STEP 1: Onboarding and Installation

To get started with Lightrun for .NET in VS Code IDE, create a Lightrun account.

Then follow the below steps to prepare your C# application to work with Lightrun (Code sample is available in this Github repository).

Once you’ve obtained your Lightrun secret key from the Lightrun management portal (https://app.lightrun.com), please install the VS Code IDE plugin from the extensions marketplace as shown in the below screenshot.

After installing the IDE plugin and authenticating, you should be able to start troubleshooting your C# application directly from your IDE, whether you run the application from the IDE, the command line, or GitHub Actions.

STEP 3: Snapshots

From the VS Code IDE, run your C# application. Once the app is running, developers can add virtual breakpoints (called Lightrun Snapshots) without stopping the running app, as well as dynamic and conditional logs. The below screenshot shows how a Lightrun snapshot looks inside the VS Code plugin and how it provides developers with full variable information and the call stack, so they can quickly understand what’s going on and resolve the issue at hand.

The above example demonstrates a Lightrun snapshot hit on the C# Fibonacci application (line 42). 

STEP 4: Conditional Snapshots 

In the same way that developers can use the Lightrun plugin to add snapshots, they can also include conditions to validate specific use cases throughout the troubleshooting workflow. As seen in the example below, developers can add a conditional snapshot that captures the Fibonacci number only when it is divisible by 5 (line 51 in the Fibonacci code sample).


STEP 5: Logs 

The below screenshot shows how a log can be used within the VS Code IDE plugin to gather specific variable outputs, which can be piped into the Lightrun IDE console or stdout. Such troubleshooting telemetry can also be easily shared from within the plugin via Slack, Jira, or a simple URL to facilitate cross-team collaboration.

STEP 6: Conditional Logs

To add a condition to the troubleshooting log output, you can use the conditional logs field within the IDE plugin as shown below. For the Fibonacci code sample, we pipe to the Lightrun console only the numbers (n) that are divisible by 5 (n % 5 == 0).



Detailed Step by Step Instructions: Rider IDE

In a similar manner to the VS Code IDE, developers can follow the steps below within the Rider IDE to troubleshoot their C# applications.

STEP 1: Onboarding and Installation

To get started with Lightrun for .NET in Rider IDE, create a Lightrun account.

Then follow the below steps to prepare your C# application to work with Lightrun.

Once you’ve obtained your Lightrun secret key from the Lightrun management portal as noted above, please install the Rider IDE plugin from the extensions marketplace as shown in the below screenshot.

After installing the IDE plugin and authenticating, you should be able to start troubleshooting your C# application directly from your IDE, whether you run the application from the IDE, the command line, or GitHub Actions.

STEP 3: Snapshots

Within the Rider IDE, once the application is running, developers can add virtual breakpoints (called Lightrun Snapshots) without stopping the running app, as well as dynamic and conditional logs. The below screenshot shows how a Lightrun snapshot looks inside the Rider plugin and how it provides developers with full variable information and the call stack, so they can quickly understand what’s going on and resolve the issue at hand.

The above example demonstrates a Lightrun snapshot hit on the C# Fibonacci application (line 52). 

STEP 4: Conditional Snapshots 

In the same way that developers can use the Lightrun plugin to add snapshots, they can also include conditions to validate specific use cases throughout the troubleshooting workflow. As seen in the example below, developers can add a conditional snapshot that captures the Fibonacci number only when it is divisible by 10 (line 52 in the Fibonacci code sample).

STEP 5: Logs 

The below screenshot shows how a log can be used within the Rider IDE plugin to gather specific variable outputs, which can be piped into the Lightrun IDE console or stdout. Such troubleshooting telemetry can also be easily shared from within the plugin via Slack, Jira, or a simple URL to facilitate cross-team collaboration.



The above example demonstrates a Lightrun logpoint that was added to the C# Fibonacci application (line 31) and the log output displayed in the dedicated Lightrun console. 

STEP 6: Conditional Logs

To add a condition to the troubleshooting log output, you can use the conditional logs field within the IDE plugin as shown below. For the Fibonacci code sample, we pipe to the Lightrun console only the numbers that are divisible by 10 (i % 10 == 0).

Bottom Line

With Lightrun, developers can identify and resolve issues proactively, reducing downtime, and enhancing the user experience while also reducing overall costs of logging. If you’re a .NET developer looking to simplify your debugging process, give Lightrun a try and experience the difference it can make in your development workflow.

To see it live, please book a demo here!

The post Lightrun Launches New .NET Production Troubleshooting Solution: Revolutionizing Runtime Debugging appeared first on Lightrun.

Mastering Complex Progressive Delivery Challenges with Lightrun https://lightrun.com/mastering-complex-progressive-delivery-challenges-with-lightrun/ Sun, 21 May 2023 15:08:37 +0000 https://lightrun.com/?p=11731

Introduction

Progressive delivery is a modification of continuous delivery that allows developers to release new features to users in a gradual, controlled fashion. 

It does this in two ways. 

Firstly, by using feature flags to turn specific features ‘on’ or ‘off’ in production, based on certain conditions, such as specific subsets of users. This lets developers deploy rapidly to production and perform testing there before turning a feature on. 

Secondly, they roll out new features to users in a gradual way using canary releases, which involves making certain features available to only a small percentage of the user base for testing before releasing them to all users.

Practices like these allow developers to get incredibly granular with how they release new features to their userbase to minimize risk. 

But there are downsides to progressive delivery: it can create very complex code that is challenging to troubleshoot. 

Troubleshooting Complex Code 

Code written for progressive delivery is highly conditional. It can contain many feature flag branches that respond differently to different subsets of users based on their data profile or specific configuration. You can easily end up with hard-to-follow code composed of complex flows with many conditional statements.
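
For illustration, a hypothetical flag-gated handler might look like the following sketch; the flag names, user fields, and the tiny Flags helper are all invented for the example:

# A toy feature-flag store: a flag is "on" if its name is in the enabled set.
class Flags:
    def __init__(self, enabled):
        self.enabled = set(enabled)

    def on(self, name, user):
        return name in self.enabled

def checkout(user, cart, flags):
    # Pricing depends on one flag...
    if flags.on("new_pricing_engine", user):
        total = sum(item["price"] for item in cart) * 0.9
    else:
        total = sum(item["price"] for item in cart)

    # ...tax handling depends on another flag *and* a user attribute...
    if flags.on("regional_tax", user) and user.get("region") == "EU":
        total *= 1.2

    # ...and loyalty on a third, so the number of possible paths multiplies quickly.
    if flags.on("loyalty_points", user):
        user["points"] = user.get("points", 0) + int(total)

    return total

# Which branches run depends entirely on the flag set and the user profile.
print(checkout({"region": "EU"}, [{"price": 100.0}], Flags({"new_pricing_engine", "regional_tax"})))

Even this toy handler already has eight possible paths (two per flag); a real service multiplies that across dozens of flags and user segments.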

This means that your code becomes very unpredictable. It’s not always clear which code path will be invoked as this is highly dependent on which user is active and in what circumstances. 

The difficulty comes when you discover an issue, vulnerability or bug that is related to one of these complex branches and only occurs under certain very specific conditions. It becomes very difficult to determine which code path contains the bug and what information you need to gather to fix it. 

This becomes a major barrier to identifying any problems and resolving them effectively. 

The Barriers To Resolving Issues In Progressive Delivery

When a problem arises in this complex progressive delivery context, your developers can spend a huge amount of time trying to discern the location and nature of the actual problem amidst all the complexity. 

There are three main ways this barrier manifests:

  • Parsing conditional statements in the code path

Developers have to determine the actual code path that is being executed when the problem arises, a non-trivial issue when there are many different feature flags that are being conditionally triggered by different users in unpredictable ways. 

Among all these different possibilities it is very hard to determine which conditional statements will run and therefore to statically analyze the code path that will be executed. 

Developers have to add new logs to track the flow of code, forcing them to redeploy the application. Sometimes many rounds of logging and redeployment are required before they get the information they need, which is incredibly time-consuming.

  • Emulating the production environment locally

Secondly, once the right code path has been isolated, they have to replicate that complex, conditional code on their local machine to test potential fixes. 

But if there are many feature flags and conditional statements, it is very hard to emulate that locally to reproduce and assess the problem given the complexity of the production environment. 

A huge amount of time and energy is needed to do this, with no guarantee that you will be able to perfectly replicate the production environment.

  • Generating synthetic traffic that matches the user profiles

Thirdly, when the code path that is executed is highly dependent on specific data (e.g. user data), it is hard to simulate the workloads synthetically in order to properly test the solution in a way that accurately mirrors the production environment.

Yet more time and energy must be expended to trigger the issue in the test environment in a way that gives developers the information they need to properly resolve the issue.

Using Lightrun to Troubleshoot Progressive Delivery

Developer time is extremely valuable. Developers can waste a lot of it dealing with these niggling hurdles to remediation when it could be spent creating valuable new features.

But there is a new approach that can overcome these barriers: dynamic observability. 

Lightrun is a dynamic observability platform that enables developers to add logs, metrics and snapshots to live applications—without having to release a new version or even stop the running process.

In the context of progressive delivery, Lightrun enables you to use real-time dynamic instrumentation to:

  • Identify the actual workflow affected by the issue
  • Capture the relevant information from that workflow

This means that you can identify and understand your bug or vulnerability without having to redeploy the application or recreate the issue locally, regardless of the complexity of the code.

There are two features of Lightrun that are particularly potent in this regard: Lightrun dynamic logs and snapshots. 

Dynamic Logs

You can deploy dynamic logs within each feature flag branch in real-time, providing full visibility into the progressive delivery process without having to redeploy the application.

Unlike regular logging statements, which are printed for all requests served by the system, dynamic logs can target specific users or user segments using conditions, making them more precise and much less noisy.

If there’s a new issue you want to track or a new feature flag branch you want to start logging, you can just add it on the fly. Then you can flow through the execution and watch your application’s state change with every flag flip using real user data, right from the IDE, without having to add endless ‘if/else’ statements in the process.
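
For contrast, this is roughly what the static alternative looks like: a hard-coded, user-specific log that has to be written, reviewed, and redeployed for every new question you want to ask. The flag store, segment name, and fields below are illustrative only:

import logging

logger = logging.getLogger(__name__)

ENABLED_FLAGS = {"new_pricing_engine"}  # illustrative flag store

def apply_new_pricing(user, total):
    if "new_pricing_engine" in ENABLED_FLAGS:
        # Static, baked-in log: it fires for every matching request unless you also
        # hard-code a user filter, and changing it means another review and release.
        if user.get("segment") == "beta":
            logger.info("new_pricing_engine active for user=%s total=%s", user.get("id"), total)
        return total * 0.9
    return total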

Granular Snapshots

Similarly, you can place Snapshots – essentially virtual breakpoints – inside any flag-created branch, giving you debugger-grade visibility into each rollout. This gives your developers on-demand access to whatever information they need about the code flows that are affected by your issue. 

All the classic features you know from traditional debugging tools, plus many more, are available in every snapshot, which:

  • Can be added to multiple instances, simultaneously
  • Can be added conditionally, without writing new code
  • Provides full-syntax expression evaluation
  • Is safe, read-only and performant
  • Can be placed, viewed and edited right from the IDE

Enabling your developers to track issues through complex code and gather intelligence on demand – all without having to redeploy the app or even leave the IDE – makes troubleshooting progressive delivery codebases much easier.

Developer Benefits Of Using Lightrun

  • Determine which workflow is being executed during a given issue

Developers can identify exactly which workflow is relevant. This means they no longer have to waste time troubleshooting sections of code that are not relevant to the issue because they are never executed, or redeploying the application just to insert log messages that track the code flow.

  • No need to locally reproduce issues

By dynamically observing the application pathways at runtime, you avoid the need to invest significant time and energy into reproducing your production environment locally along with all the complexity of feature flags and config options. 

  • No need to create highly-specific synthetic traffic

Similarly, there is no need to emulate customer workloads by creating highly conditional synthetic traffic to trigger the particular code path in question.

Overall, developers can save a huge amount of time and energy that was previously being sunk into investigating complexity in different ways. 

Final Thoughts

Dynamic observability gives you much deeper and faster insights into what’s going on in your code. 

With Lightrun’s revolutionary developer-first approach, we enable engineering teams to connect to their live applications and continuously identify critical issues without hotfixes, redeployments or restarts. 

If you need a hand troubleshooting complex code flows or dealing with highly conditional progressive delivery scenarios, get in touch.



The post Mastering Complex Progressive Delivery Challenges with Lightrun appeared first on Lightrun.

16 Top Java Communities, Forums and Groups: The Ultimate Guide https://lightrun.com/16-top-java-communities-forums-and-groups-the-ultimate-guide/ Wed, 29 Jul 2020 08:27:38 +0000 https://lightrun.com/?p=3054

Developers didn’t need the transformation to WFH to understand the value of online communities and forums for their professional development and work. Many developer groups and forums offer supporting professional communities that assist with questions, dilemmas and advice. In fact, there are so many developer groups, it might be hard to choose. We’ve prepared a guide, which maps out the most active and helpful Java groups, Java forums and Java communities.

Java is a strong, popular and reliable programming language, used widely around the world: by 9,000,000 Java developers, to be exact. Java supports multiple platforms, new architectures as well as older legacy systems, and enables the development of many powerful features. Whether you’re a Java newbie or a seasoned professional with 20 years of experience, you can benefit from Java user groups and forums.

We created a whitepaper with the top 16 Java forums, communities, and groups. Most of them cover a wide array of topics: from programming questions to discussions to job opportunities. Engaging in them can help your professional advancement, both by asking questions and by giving advice. They are also a place for networking and building your own personal brand.

Before posting in each, please make sure you check the appropriate type of content and questions on each community. Follow each community’s guidelines when joining them.

Here are the first three communities. You can get the full list by getting the whitepaper, here.

1. Java Tagged Questions on Stack Overflow

Stack Overflow is still the undisputed leader in developer forums and help. For anything Java-related, search for the questions tagged as Java, or just go here. You can ask and answer anything that has to do with learning and understanding Java itself. This is the place for inserting snippets of code and asking deep-tech questions.

We highly recommend searching for answers before posting and really making an effort to solve your problem on your own; otherwise, community members will not hesitate to downvote you. Don’t get discouraged and don’t take offense if they do; try to read the feedback and interact with the other developers. When wielded correctly, Stack Overflow can be a very powerful tool in your arsenal.

2. Java News, Tech and Discussions – Reddit

This Reddit community hosts dozens of questions, comments and discussions every day, and the members are engaged in the conversations. The purpose of this Java community is to discuss Java and share technical content. It’s not the place for beginner tutorials or programming questions, but rather for mature discussion and the sharing of knowledge and insights. Are you looking to learn Java? Take a look at the next community.

3. Java Help, Tutorials and Questions – Reddit

Now this Java Community is the one for asking for help with your Java code. With multiple posts and responses, this vibrant community is the one for you to join if you need help, and just as important – for you to provide help to your fellow developers.

Get more recommended communities and groups on Slack, meetup.com, online forums, dev.to and more, from our whitepaper. There are 13 more communities for you to discover. Get the whitepaper here.


The post 16 Top Java Communities, Forums and Groups: The Ultimate Guide appeared first on Lightrun.
