Get started with LangSmith
LangSmith is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence. With LangSmith you can:
- Trace LLM Applications: Gain visibility into LLM calls and other parts of your application's logic.
- Evaluate Performance: Compare results across models, prompts, and architectures to identify what works best.
- Improve Prompts: Quickly refine prompts to achieve more accurate and reliable results.
LangSmith integrates seamlessly with LangChain's open source frameworks langchain and langgraph, with no extra instrumentation needed.
If you're already using either of these, see the how-to guide for setting up LangSmith with LangChain or setting up LangSmith with LangGraph.
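For example, if your application already uses LangChain, enabling tracing is just a matter of setting the environment variables shown in step 3 below; no wrapper code is required. A minimal sketch, assuming the langchain-openai package is installed and an OpenAI key is configured:
- Python
# Sketch: with LANGCHAIN_TRACING_V2 and LANGCHAIN_API_KEY set (see step 3),
# LangChain calls are traced to LangSmith automatically, with no extra instrumentation.
from langchain_openai import ChatOpenAI  # assumes `pip install langchain-openai`

llm = ChatOpenAI(model="gpt-4o-mini")
llm.invoke("Hello, world!")  # this call appears as a trace in your LangSmith project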
LangSmith is a standalone platform that can be used on its own, no matter how you're creating your LLM applications.
In this tutorial, we'll walk you through logging your first trace in LangSmith using the LangSmith SDK and running an evaluation to measure the performance of your application. This example uses the OpenAI API, but you can use the provider of your choice.
1. Install LangSmith
- Python
- TypeScript
pip install -U langsmith openai
yarn add langsmith openai
2. Create an API key
To create an API key, head to the Settings page. Then click Create API Key.
3. Set up your environment
- Shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export OPENAI_API_KEY=<your-openai-api-key>
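If you prefer to configure these from within your script instead of the shell, you can set the same variables with Python's standard library before any LangSmith or OpenAI code runs (a sketch; substitute your real keys):
- Python
import os

# Equivalent to the shell exports above; set these before importing or calling traced code.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"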
4. Log your first trace
We provide multiple ways to log traces to LangSmith. Below, we'll highlight how to use traceable(). See more on the Annotate code for tracing page.
- Python
- TypeScript
import openai
from langsmith import wrappers, traceable

# Auto-trace LLM calls in-context
client = wrappers.wrap_openai(openai.Client())

@traceable  # Auto-trace this function
def pipeline(user_input: str):
    result = client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-4o-mini"
    )
    return result.choices[0].message.content

pipeline("Hello, world!")
# Out: Hello there! How can I assist you today?
import { OpenAI } from "openai";
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";

// Auto-trace LLM calls in-context
const client = wrapOpenAI(new OpenAI());

// Auto-trace this function
const pipeline = traceable(async (user_input: string) => {
  const result = await client.chat.completions.create({
    messages: [{ role: "user", content: user_input }],
    model: "gpt-4o-mini",
  });
  return result.choices[0].message.content;
});

await pipeline("Hello, world!");
// Out: Hello there! How can I assist you today?
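The wrapper is only needed to auto-trace LLM calls; traceable() works on any function, so you can trace non-LLM steps of your pipeline the same way. A minimal sketch (the retrieval helper below is hypothetical):
- Python
from langsmith import traceable

@traceable  # each call becomes its own run, nested under any parent trace
def retrieve_documents(query: str) -> list[str]:
    # Hypothetical retrieval step; replace with your own logic
    return [f"Document about {query}"]

retrieve_documents("LangSmith")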
Learn more about tracing in the observability tutorials, conceptual guide and how-to guides.
5. View your trace
By default, the trace will be logged to the project with the name default. You should see the following sample output trace logged using the above code.
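You can also fetch recent traces programmatically with the SDK's Client. A minimal sketch, assuming the trace above was logged to the default project:
- Python
from langsmith import Client

client = Client()

# List the most recent runs in the "default" project
for run in client.list_runs(project_name="default", limit=5):
    print(run.name, run.run_type, run.status)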
6. Run your first evaluation
Evaluations help assess application performance by testing the application against a given set of inputs. Evaluations require a system to test, data to serve as test cases, and evaluators to grade the results.
Here we run an evaluation against a sample dataset using a simple custom evaluator that checks whether the actual output exactly matches our gold-standard output.
- Python
- TypeScript
from langsmith import Client, traceable

client = Client()

# Define dataset: these are your test cases
dataset = client.create_dataset(
    "Sample Dataset",
    description="A sample dataset in LangSmith.",
)
client.create_examples(
    inputs=[
        {"postfix": "to LangSmith"},
        {"postfix": "to Evaluations in LangSmith"},
    ],
    outputs=[
        {"response": "Welcome to LangSmith"},
        {"response": "Welcome to Evaluations in LangSmith"},
    ],
    dataset_id=dataset.id,
)

# Define an interface to your application (tracing optional)
@traceable
def dummy_app(inputs: dict) -> dict:
    return {"response": "Welcome " + inputs["postfix"]}

# Define your evaluator(s)
def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["response"] == reference_outputs["response"]

# Run the evaluation
experiment_results = client.evaluate(
    dummy_app,  # Your AI system goes here
    data=dataset,  # The data to predict and grade over
    evaluators=[exact_match],  # The evaluators to score the results
    experiment_prefix="sample-experiment",  # The name of the experiment
    metadata={"version": "1.0.0", "revision_id": "beta"},  # Metadata about the experiment
    max_concurrency=4,  # Add concurrency
)

# Analyze the results via the UI or programmatically
# If you have 'pandas' installed you can view the results as a
# pandas DataFrame by uncommenting below:
# experiment_results.to_pandas()
import { Client } from "langsmith";
import { EvaluationResult, evaluate } from "langsmith/evaluation";

const client = new Client();

// Define dataset: these are your test cases
const datasetName = "Sample Dataset";
const dataset = await client.createDataset(datasetName, {
  description: "A sample dataset in LangSmith.",
});
await client.createExamples({
  inputs: [
    { postfix: "to LangSmith" },
    { postfix: "to Evaluations in LangSmith" },
  ],
  outputs: [
    { response: "Welcome to LangSmith" },
    { response: "Welcome to Evaluations in LangSmith" },
  ],
  datasetId: dataset.id,
});

// Define your evaluator(s)
const exactMatch = async ({ outputs, referenceOutputs }: {
  outputs?: Record<string, any>;
  referenceOutputs?: Record<string, any>;
}): Promise<EvaluationResult> => {
  return {
    key: "exact_match",
    score: outputs?.response === referenceOutputs?.response,
  };
};

// Run the evaluation
const experimentResults = await evaluate(
  (inputs: { postfix: string }) => ({ response: `Welcome ${inputs.postfix}` }),
  {
    data: datasetName,
    evaluators: [exactMatch],
    metadata: { version: "1.0.0", revision_id: "beta" },
    maxConcurrency: 4,
  }
);
- Click the link printed out by your evaluation run to access the LangSmith experiments UI, and explore the results of your evaluation.
- Learn more about evaluation in the tutorials, conceptual guide, and how-to guides.