[Beta] Bulk Exporting Trace Data
Please note that the Data Export functionality is in Beta and only supported for LangSmith Plus or Enterprise tiers.
LangSmith's bulk data export functionality allows you to export your traces into an external destination. This can be useful if you want to analyze the data offline in a tool such as BigQuery, Snowflake, Redshift, Jupyter Notebooks, etc.
An export can be launched to target a specific LangSmith project and date range. Once a batch export is launched, our system will handle the orchestration and resilience of the export process. Please note that exporting your data may take some time depending on the size of your data. We also have a limit on how many of your exports can run at the same time. Bulk exports also have a runtime timeout of 24 hours.
Destinations
Currently we support exporting to an S3 bucket or an S3 API-compatible bucket that you provide. The data is exported in the Parquet columnar format, which makes it easy to import into other systems. The export contains the same data fields as the Run data format.
Exporting Data
Destinations - Providing an S3 bucket
To export LangSmith data, you will need to provide an S3 bucket where the data will be exported to.
The following information is needed for the export:
- Bucket Name: The name of the S3 bucket where the data will be exported to.
- Prefix: The root prefix within the bucket where the data will be exported to.
- S3 Region: The region of the bucket - this is needed for AWS S3 buckets.
- Endpoint URL: The endpoint URL for the S3 bucket - this is needed for S3 API compatible buckets.
- Access Key: The access key for the S3 bucket.
- Secret Key: The secret key for the S3 bucket.
We support any S3-compatible bucket. For non-AWS buckets such as GCS or MinIO, you will need to provide the endpoint URL.
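The fields listed above map onto the JSON request body used in the cURL examples on this page. As a minimal sketch, the shell function below assembles that body from environment variables and includes region or endpoint_url only when the corresponding variable is set. The variable names (DEST_NAME, S3_BUCKET, and so on) are hypothetical, and leaving out an unset optional key is an assumption based on the GCS example on this page, which omits region.

```shell
# Print a JSON body for creating an S3 (or S3-compatible) destination.
# Reads hypothetical environment variables:
#   DEST_NAME, S3_BUCKET, S3_PREFIX, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY
#   S3_REGION        (optional, for AWS S3 buckets)
#   S3_ENDPOINT_URL  (optional, for S3 API-compatible buckets)
# Values are inserted verbatim, so they must not contain double quotes
# or backslashes.
destination_payload() (
  config="\"bucket_name\": \"$S3_BUCKET\", \"prefix\": \"$S3_PREFIX\""

  # Assumption: optional keys that are not set are simply left out.
  if [ -n "${S3_REGION:-}" ]; then
    config="$config, \"region\": \"$S3_REGION\""
  fi
  if [ -n "${S3_ENDPOINT_URL:-}" ]; then
    config="$config, \"endpoint_url\": \"$S3_ENDPOINT_URL\""
  fi

  printf '{"destination_type": "s3", "display_name": "%s", "config": {%s}, "credentials": {"access_key_id": "%s", "secret_access_key": "%s"}}\n' \
    "$DEST_NAME" "$config" "$S3_ACCESS_KEY_ID" "$S3_SECRET_ACCESS_KEY"
)
```

To send the result, the output could be piped into cURL with --data @-, which makes cURL read the request body from standard input.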
Preparing the Destination
The following example demonstrates how to create a destination using cURL. Replace the placeholder values with your actual configuration details. Note that credentials will be stored securely in an encrypted form in our system.
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My S3 Destination",
    "config": {
      "bucket_name": "your-s3-bucket-name",
      "prefix": "root_folder_prefix",
      "region": "your aws s3 region",
      "endpoint_url": "your endpoint url for s3 compatible buckets"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'
Use the returned id to reference this destination in subsequent bulk export operations.
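Scripts usually need the returned id in a variable. The helper below is a rough sketch that pulls the first "id" string out of a JSON response on standard input. It assumes the response carries the identifier in an "id" field whose value is a quoted string, which this page implies but does not spell out. A JSON-aware tool such as jq is more robust where it is available.

```shell
# Print the value of the first "id" string field found in the JSON on stdin.
# Assumption: the create response looks like {"id": "<uuid>", ...}.
# Keys such as "tenant_id" are not matched, because the pattern requires
# the opening quote directly before id.
extract_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}
```

For example, appending | extract_id to the cURL command that creates a destination, and wrapping the whole pipeline in DESTINATION_ID=$( ... ), would store the id for later requests.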
GCS XML S3 compatible bucket
Here is an example of what to use for endpoint_url if you want to use the GCS XML API, which is compatible with S3:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My GCS Destination",
    "config": {
      "bucket_name": "my_bucket",
      "prefix": "data_exports",
      "endpoint_url": "https://storage.googleapis.com"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'
See the Google documentation for more information.
Create an export job
To export data, you will need to create an export job. This job will specify the destination, the project, and the date range of the data to export.
You can use the following cURL command to create the job:
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-02T23:59:59Z"
  }'
Use the returned id to reference this export in subsequent bulk export operations.
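Because an export job is defined by a destination, a project, and a date range, it can help to check the range before sending the request. The function below is a small sketch that builds the request body and refuses a range whose start is not earlier than its end. It assumes both timestamps use the UTC form shown above (YYYY-MM-DDTHH:MM:SSZ), in which alphabetical order matches chronological order. The function name and argument order are hypothetical.

```shell
# Print a JSON body for creating a bulk export, or fail on an empty range.
# Arguments: destination id, project (session) id, start time, end time.
# Assumption: both timestamps are UTC strings like 2024-01-01T00:00:00Z,
# so a plain string comparison orders them correctly.
export_payload() (
  dest_id=$1
  session_id=$2
  start=$3
  end=$4

  if ! [ "$start" \< "$end" ]; then
    echo "start_time must be earlier than end_time" >&2
    exit 1
  fi

  printf '{"bulk_export_destination_id": "%s", "session_id": "%s", "start_time": "%s", "end_time": "%s"}\n' \
    "$dest_id" "$session_id" "$start" "$end"
)
```

The printed body could then be passed to the export request, for instance by piping it into cURL with --data @-.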
Monitoring the Export Job
Monitor Export Status
To monitor the status of an export job, use the following cURL command:
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
Replace {export_id} with the ID of the export you want to monitor. This command retrieves the current status of the specified export job.
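Since exports can take a while, the status request is typically repeated until the job finishes. The sketch below polls the status endpoint at a fixed interval. Several details are assumptions not confirmed on this page: that the response contains a "status" string field, that Completed, Failed, Cancelled, and TimedOut are the terminal values, and the shell variables LANGSMITH_API_KEY and LANGSMITH_WORKSPACE_ID. The default of 1440 checks at 60 seconds matches the 24-hour runtime limit mentioned earlier.

```shell
# Fetch the raw JSON for one export job.
# LANGSMITH_API_KEY and LANGSMITH_WORKSPACE_ID are hypothetical variable names.
fetch_export() {
  curl --silent --fail --request GET \
    --url "https://api.smith.langchain.com/api/v1/bulk-exports/$1" \
    --header "X-API-Key: $LANGSMITH_API_KEY" \
    --header "X-Tenant-Id: $LANGSMITH_WORKSPACE_ID"
}

# Print the first "status" string field found in the JSON on stdin.
# Assumption: the job status is exposed under a "status" key.
status_of() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Poll until the export reaches a terminal status, then print that status.
# Arguments: export id, seconds between checks (default 60),
# maximum number of checks (default 1440, about 24 hours).
wait_for_export() (
  export_id=$1
  interval=${2:-60}
  max_checks=${3:-1440}
  checks=0

  while [ "$checks" -lt "$max_checks" ]; do
    status=$(fetch_export "$export_id" | status_of)
    case "$status" in
      # Assumed terminal status names; adjust to what the API returns.
      Completed|Failed|Cancelled|TimedOut)
        echo "$status"
        exit 0
        ;;
    esac
    checks=$((checks + 1))
    sleep "$interval"
  done

  echo "gave up waiting for export $export_id" >&2
  exit 1
)
```

Calling wait_for_export with an export id blocks until one of the assumed terminal statuses appears, so a script can decide afterwards whether to read the exported files or report a failure.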
List Runs for an Export
An export is typically broken up into multiple runs, each of which corresponds to a specific date partition to export. To list all runs associated with a specific export, use the following cURL command:
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}/runs' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
This command fetches all runs related to the specified export, providing details such as run ID, status, creation time, rows exported, etc.
List All Exports
To retrieve a list of all export jobs, use the following cURL command:
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
This command returns a list of all export jobs along with their current statuses and creation timestamps.