
Ocean Access (S3-compatible API)

Ocean provides an S3-compatible API for accessing your licensed datasets programmatically. This is the recommended approach for developers and for downloading large datasets efficiently.

To access datasets via Ocean, you need:

  1. The organization and dataset slugs, which are displayed on the dataset access page
  2. An access key (created in Settings → Access Keys)

To create an access key:

  1. Log into app.humannative.ai
  2. Navigate to Settings → Access Keys
  3. Click “New access key” and follow the form
  4. Important: Copy the secret immediately; it is only shown once!

Add the following to your ~/.aws/config file:

[profile ocean]
endpoint_url = https://ocean.humannative.ai
s3 =
  addressing_style = path

Add the following to your ~/.aws/credentials file:

[ocean]
aws_access_key_id = AKIA.... # Your access key ID
aws_secret_access_key = <your-secret-access-key>

Set the AWS profile environment variable:

export AWS_PROFILE=ocean
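As a quick sanity check, the profile block above can be parsed with Python's standard-library configparser (a rough sketch only; the AWS CLI's own parser is more lenient, so this merely catches gross typos):

```python
import configparser

# The same profile block as above, inlined for illustration.
EXAMPLE_CONFIG = """\
[profile ocean]
endpoint_url = https://ocean.humannative.ai
s3 =
    addressing_style = path
"""

def ocean_endpoint(config_text: str) -> str:
    """Parse an ~/.aws/config snippet and return the ocean profile's endpoint."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("profile ocean", "endpoint_url")

print(ocean_endpoint(EXAMPLE_CONFIG))  # https://ocean.humannative.ai
```

Point `read_string` at the contents of your real ~/.aws/config to check the file you actually wrote.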

Once configured, you can use the AWS CLI as normal:

# List datasets for your organization
aws s3 ls <org-slug>
# Download a specific file
aws s3 cp s3://<org-slug>/<dataset-slug>/filename.ext ./
# Sync an entire dataset
aws s3 sync s3://<org-slug>/<dataset-slug>/ ./local-folder/

obstore is a Python library that provides async access to object stores. Here are examples of how to use it with Ocean:

pip install obstore

For the advanced examples below, you’ll also need:

pip install obstore boto3

Basic Usage with credentials set on environment


This is the bare minimum of code needed to access a file with obstore. Make sure to set OCEAN_ACCESS_KEY_ID and OCEAN_SECRET_ACCESS_KEY.

import os

import obstore as obs
from obstore.store import S3Store

store = S3Store(
    "<org-slug>",  # The org and dataset slug can be retrieved on the dataset page
    prefix="<dataset-slug>",
    endpoint="https://ocean.humannative.ai",
    access_key_id=os.getenv("OCEAN_ACCESS_KEY_ID"),
    secret_access_key=os.getenv("OCEAN_SECRET_ACCESS_KEY"),
    virtual_hosted_style_request=False,
)

# List files
files = obs.list(store).collect()
print(files)

# Download a file
resp = obs.get(store, "example.mp3")
print(resp.bytes())

Usage with an AWS profile

See the AWS CLI setup section above for configuring your AWS credentials. Then export AWS_PROFILE=ocean and run this code:

import obstore as obs
from obstore.store import S3Store
from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider

# Use AWS profile for authentication (recommended)
boto_session = Session()
credential_provider = Boto3CredentialProvider(boto_session)

store = S3Store(
    "<org-slug>",  # The org and dataset slug can be retrieved on the dataset page
    prefix="<dataset-slug>",
    endpoint="https://ocean.humannative.ai",
    virtual_hosted_style_request=False,
    credential_provider=credential_provider,
)

# List files
files = obs.list(store).collect()
print(files)

# Download a file
resp = obs.get(store, "example.mp3")
print(resp.bytes())

Advanced Example: Downloading Entire Datasets


For downloading complete datasets with concurrent downloads:

import asyncio
from pathlib import Path
from typing import TYPE_CHECKING

from boto3 import Session
from obstore.auth.boto3 import Boto3CredentialProvider
from obstore.store import S3Store

if TYPE_CHECKING:
    from obstore.store import ClientConfig


async def download_file(store: S3Store, obj_meta: dict, local_file_path: Path, semaphore: asyncio.Semaphore):
    """Download a single file with concurrency control"""
    async with semaphore:
        try:
            print(f"Downloading {obj_meta['path']} to {local_file_path} ({obj_meta['size']} bytes)...")
            get_result = await store.get_async(obj_meta["path"])
            with open(local_file_path, "wb") as f:
                f.write(get_result.bytes())
            print(f"Successfully downloaded {local_file_path}")
        except Exception as e:
            print(f"Error downloading {obj_meta['path']}: {e}")


async def download_dataset(org_slug: str, dataset_slug: str, local_path: str = ".", concurrency: int = 2):
    """Download an entire dataset from Ocean"""
    # Setup authentication
    boto_session = Session()
    credential_provider = Boto3CredentialProvider(boto_session)

    # Raise the client-side timeout for large files (see note below)
    client_config: ClientConfig = {}
    client_config["timeout"] = "2h"

    # Create store
    store = S3Store(
        bucket=org_slug,
        prefix=dataset_slug,
        credential_provider=credential_provider,
        client_options=client_config,
        endpoint="https://ocean.humannative.ai",
        virtual_hosted_style_request=False,
    )

    # Create local directory
    local_download_path = Path(local_path) / dataset_slug
    local_download_path.mkdir(parents=True, exist_ok=True)

    print(f"Listing objects for dataset {dataset_slug} from organization {org_slug}...")

    # List all objects
    objects = await store.list().collect_async()
    print(f"Found {len(objects)} objects")
    if not objects:
        print("No objects found")
        return

    # Create a semaphore to limit concurrent downloads
    semaphore = asyncio.Semaphore(concurrency)
    download_tasks = []
    for obj_meta in objects:
        # Skip directories (objects with size 0 and no extension)
        if obj_meta["size"] == 0 and "." not in obj_meta["path"]:
            print(f"Skipping directory: {obj_meta['path']}")
            continue
        local_file_path = local_download_path / obj_meta["path"]
        local_file_path.parent.mkdir(parents=True, exist_ok=True)
        task = download_file(store, obj_meta, local_file_path, semaphore)
        download_tasks.append(task)

    print(f"\nStarting parallel downloads to {local_download_path}...")
    print(f"Concurrency: {concurrency}")
    await asyncio.gather(*download_tasks)
    print("Download completed")


if __name__ == "__main__":
    # The org and dataset slug can be retrieved on the dataset page
    asyncio.run(download_dataset("<org-slug>", "<dataset-slug>", local_path="./downloads"))
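One detail in the script above deserves a note: S3-compatible stores often represent “folders” as zero-byte marker objects, which is what the size-and-extension check filters out. Isolated as a standalone function (a sketch of the same heuristic):

```python
def is_directory_marker(path: str, size: int) -> bool:
    # Mirrors the check in the script above: zero-byte objects whose
    # path has no file extension are treated as directory placeholders.
    return size == 0 and "." not in path

print(is_directory_marker("audio/session1", 0))   # True: zero bytes, no extension
print(is_directory_marker("audio/track.mp3", 0))  # False: has an extension
```

Note that a genuine zero-byte file without an extension would also be skipped, so adjust the heuristic if your datasets contain such files.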

obstore has a client-side timeout that defaults to 30 seconds, which may not be enough when downloading large files. To increase it, add a client_options map to the S3Store parameters.

store = S3Store(
    # ...existing params
    client_options={
        "timeout": "10m",
    },
)

Authentication Errors

  • Verify your access key ID and secret are correct
  • Ensure your AWS profile is properly configured
  • Check that AWS_PROFILE=ocean is set if using profiles

Connection Issues

  • Verify the endpoint URL: https://ocean.humannative.ai
  • Ensure virtual_hosted_style_request=False is set
  • Check your network connectivity

Permission Errors

  • Confirm you have access to the specified organization and dataset
  • Verify your access key has the necessary permissions
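When accessing Ocean from scripts, these failures surface as standard S3 error codes (InvalidAccessKeyId, SignatureDoesNotMatch, AccessDenied, and so on). A small lookup table, purely as a convenience sketch, can map them to the checks above:

```python
# Hypothetical helper: map common S3 error codes to the troubleshooting
# steps listed above. The codes are standard S3 ones; the hints are local.
HINTS = {
    "InvalidAccessKeyId": "Authentication: verify your access key ID",
    "SignatureDoesNotMatch": "Authentication: verify your secret access key",
    "AccessDenied": "Permissions: confirm access to this organization and dataset",
    "NoSuchBucket": "Check the organization slug",
    "NoSuchKey": "Check the dataset slug and file path",
}

def hint_for(error_code: str) -> str:
    return HINTS.get(error_code, "Connection: verify the endpoint URL and network connectivity")

print(hint_for("AccessDenied"))  # Permissions: confirm access to this organization and dataset
```

With boto3, the code is available as `e.response["Error"]["Code"]` on a caught ClientError.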