Firecrawl | Composio Docs

Overview

Enum

FIRECRAWL

Description

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Authentication Details

API_KEY

api_key

string (required)

base_url

string (defaults to https://api.firecrawl.dev/v1)
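The two authentication fields above can be modeled as a small value object. The following is a minimal sketch, not Composio's or Firecrawl's implementation: the `FirecrawlAuth` class and its `endpoint` helper are hypothetical names, and the bearer-token `Authorization` header is an assumption about how the key is sent.

```python
from dataclasses import dataclass

# Default taken from the base_url field documented above.
DEFAULT_BASE_URL = "https://api.firecrawl.dev/v1"


@dataclass(frozen=True)
class FirecrawlAuth:
    """Hypothetical holder for the api_key and base_url fields."""

    api_key: str
    base_url: str = DEFAULT_BASE_URL

    def __post_init__(self):
        # api_key is marked as required in the field list.
        if not self.api_key:
            raise ValueError("api_key is required")

    def headers(self) -> dict:
        # Assumption: the key is sent as a bearer token.
        return {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

    def endpoint(self, path: str) -> str:
        # Join base_url and a path without doubling or dropping slashes.
        return f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"
```

Overriding `base_url` is only needed when pointing at a different deployment; otherwise the documented default applies.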

Actions

FIRECRAWL_CANCEL_CRAWL_JOB

Cancels an active or queued web crawl job by its ID. Attempting to cancel a completed, failed, or previously canceled job does not change its state.

Action Parameters

string (required)

Action Response

data

object

error

successful

boolean
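Every action on this page returns the same three response fields: `data`, `error`, and `successful`. A minimal sketch of unwrapping that envelope follows; the `ActionError` class and `unwrap` function are hypothetical helpers, and the response is assumed to arrive as a plain dictionary.

```python
class ActionError(RuntimeError):
    """Hypothetical error raised when an action reports failure."""


def unwrap(response: dict):
    """Return the data payload, or raise if the action was not successful."""
    if not response.get("successful", False):
        # Fall back to a generic message when no error text is supplied.
        raise ActionError(response.get("error") or "action failed without an error message")
    return response.get("data")
```

Checking `successful` before touching `data` keeps failure handling in one place instead of scattering it across call sites.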

FIRECRAWL_CRAWL_JOB_STATUS

Retrieves the current status, progress, and details of a web crawl job, using the job ID returned when the crawl was initiated.

Action Parameters

string (required)

Action Response

data

object

error

successful

boolean
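A status action like this one is typically called in a loop until the job reaches a final state. The sketch below shows one way to do that with an injected status function. It rests on assumptions that this page does not confirm: that `data` carries a `status` field, and that `completed`, `failed`, and `cancelled` are the terminal values. `wait_for_crawl` and `get_status` are hypothetical names.

```python
import time
from typing import Callable

# Assumption: these status strings mark a finished job.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_crawl(
    get_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status(job_id) until the job reaches a terminal status."""
    for _ in range(max_polls):
        response = get_status(job_id)
        if not response.get("successful"):
            raise RuntimeError(response.get("error") or "status check failed")
        data = response.get("data") or {}
        # Assumption: the job state lives under data["status"].
        if data.get("status") in TERMINAL_STATUSES:
            return data
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `get_status` and `sleep` as arguments keeps the loop independent of any particular client and makes it easy to exercise with a stub.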

FIRECRAWL_CRAWL_URLS

Initiates a Firecrawl web crawl from a given URL, applying filtering and content extraction rules, and polls until the job is complete. Ensure the URL is accessible and that any regex patterns for paths are valid.

Action Parameters

allowBackwardLinks

boolean

allowExternalLinks

boolean

delay

integer

excludePaths

array

ignoreQueryParameters

boolean

ignoreSitemap

boolean (defaults to True)

includePaths

array

limit

integer (defaults to 10)

maxDepth

integer (defaults to 2)

maxDiscoveryDepth

integer

scrapeOptions_actions

array

scrapeOptions_blockAds

boolean

scrapeOptions_changeTrackingOptions

object

scrapeOptions_excludeTags

array

scrapeOptions_formats

array (defaults to ['markdown'])

scrapeOptions_headers

object

scrapeOptions_includeTags

array

scrapeOptions_jsonOptions

object

scrapeOptions_location

object

scrapeOptions_maxAge

integer

scrapeOptions_mobile

boolean

scrapeOptions_onlyMainContent

boolean (defaults to True)

scrapeOptions_parsePDF

boolean

scrapeOptions_proxy

string

scrapeOptions_removeBase64Images

boolean

scrapeOptions_skipTlsVerification

boolean

scrapeOptions_storeInCache

boolean

scrapeOptions_timeout

integer

scrapeOptions_waitFor

integer (defaults to 123)

url

string (required)

webhook

string

Action Response

data

object

error

successful

boolean
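Because the description asks for valid regex patterns in the path filters, it can help to check them before submitting a crawl. The following is a minimal sketch of a parameter builder that applies the defaults listed above and validates `includePaths` and `excludePaths` locally. `build_crawl_params` is a hypothetical helper, and Python's `re` syntax is only an approximation of whatever regex dialect the service uses.

```python
import re

# Defaults copied from the parameter list above.
CRAWL_DEFAULTS = {
    "ignoreSitemap": True,
    "limit": 10,
    "maxDepth": 2,
    "scrapeOptions_formats": ["markdown"],
    "scrapeOptions_onlyMainContent": True,
    "scrapeOptions_waitFor": 123,
}


def build_crawl_params(url: str, **overrides) -> dict:
    """Merge documented defaults with overrides and validate path patterns."""
    if not url:
        raise ValueError("url is required")
    params = {**CRAWL_DEFAULTS, **overrides, "url": url}
    # Copy the list so callers never mutate the shared default.
    params["scrapeOptions_formats"] = list(params["scrapeOptions_formats"])
    for key in ("includePaths", "excludePaths"):
        for pattern in params.get(key, []):
            try:
                # Assumption: Python's regex syntax is close enough for a pre-check.
                re.compile(pattern)
            except re.error as exc:
                raise ValueError(f"{key}: invalid regex {pattern!r}: {exc}") from exc
    return params
```

For example, `build_crawl_params("https://example.com", limit=25, includePaths=["^/blog/.*"])` keeps `maxDepth` at its default of 2 while raising the page limit.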

FIRECRAWL_EXTRACT

Extracts structured data from web pages by initiating an extraction job and polling for completion. Requires a natural-language `prompt` or a JSON `schema`; one of the two must be provided.

Action Parameters

enable_web_search

boolean

prompt

string

schema

object

urls

array (required)

Action Response

data

object

error

successful

boolean
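The rule that a `prompt` or a `schema` must accompany `urls` can be enforced before the action is called. Below is a minimal sketch of such a check; `build_extract_params` is a hypothetical helper, and the example schema shown afterwards is illustrative only.

```python
def build_extract_params(urls, prompt=None, schema=None, enable_web_search=None) -> dict:
    """Validate and assemble parameters for the extract action."""
    # Accept a single URL string as a convenience.
    if isinstance(urls, str):
        urls = [urls]
    urls = list(urls)
    if not urls:
        raise ValueError("urls must contain at least one URL")
    # The description requires a prompt or a schema.
    if not prompt and not schema:
        raise ValueError("provide a prompt or a schema")
    params = {"urls": urls}
    if prompt:
        params["prompt"] = prompt
    if schema:
        params["schema"] = schema
    # Only send the flag when the caller set it explicitly.
    if enable_web_search is not None:
        params["enable_web_search"] = enable_web_search
    return params
```

A call such as `build_extract_params(["https://example.com/pricing"], schema={"type": "object", "properties": {"plan": {"type": "string"}}})` passes validation, while omitting both `prompt` and `schema` raises `ValueError`.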

FIRECRAWL_MAP_URLS

Maps a website by discovering URLs from a starting base URL, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits. Search effectiveness is site-dependent.

Action Parameters

ignoreSitemap

boolean (defaults to True)

includeSubdomains

boolean

limit

integer (defaults to 5000)

string

url

string (required)

Action Response

data

object

error

successful

boolean

FIRECRAWL_SCRAPE_EXTRACT_DATA_LLM

Scrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, and returns the content in the specified formats.

Action Parameters

actions

array

excludeTags

array

formats

array (defaults to ['markdown'])

includeTags

array

jsonOptions

object

location

object

onlyMainContent

boolean (defaults to True)

timeout

integer (defaults to 30000)

url

string (required)

waitFor

integer

Action Response

data

object

error

successful

boolean
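The scrape parameters interact: structured extraction uses both `formats` and `jsonOptions`. The sketch below assembles a parameter dictionary with the documented defaults. It relies on three assumptions not stated on this page: that `timeout` is in milliseconds, that the format name `json` is what triggers LLM extraction, and that `jsonOptions` accepts a `prompt` key. `build_scrape_params` is a hypothetical helper.

```python
def build_scrape_params(url, formats=None, json_options=None, timeout_s=30.0, **extra) -> dict:
    """Assemble scrape parameters using the defaults listed above."""
    if not url:
        raise ValueError("url is required")
    formats = list(formats) if formats is not None else ["markdown"]
    # Assumption: requesting the "json" format needs accompanying jsonOptions.
    if "json" in formats and json_options is None:
        raise ValueError("jsonOptions is needed when 'json' is requested in formats")
    params = {
        "url": url,
        "formats": formats,
        "onlyMainContent": True,
        # Assumption: timeout is expressed in milliseconds (default 30000).
        "timeout": int(timeout_s * 1000),
    }
    if json_options is not None:
        params["jsonOptions"] = json_options
    # Pass through any other documented parameter, e.g. waitFor or includeTags.
    params.update(extra)
    return params
```

Taking the timeout in seconds and converting once avoids mixing units at call sites; pass `timeout_s=60` to get `60000`.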

FIRECRAWL_SEARCH

Performs a web search for a query, scrapes content from the top search results using Firecrawl, and returns details in the specified formats.

Action Parameters

country

string (defaults to us)

formats

array

lang

string (defaults to en)

limit

integer (defaults to 5)

query

string (required)

timeout

integer (defaults to 60000)

Action Response

data

array

error

success

boolean

successful

boolean

warning
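Unlike the other actions, the search response lists `data` as an array and adds a `warning` field. A minimal sketch of reading that shape follows, assuming the response arrives as a plain dictionary; `read_search_response` is a hypothetical helper, and the fields inside each result item are not documented here, so the code treats the items as opaque.

```python
def read_search_response(response: dict):
    """Return (results, warning) from a search response, or raise on failure."""
    if not response.get("successful"):
        raise RuntimeError(response.get("error") or "search failed")
    results = response.get("data") or []
    # The response schema above lists data as an array.
    if not isinstance(results, list):
        raise TypeError("expected data to be an array of results")
    # A warning may accompany a successful response; surface it to the caller.
    return results, response.get("warning")
```

Returning the warning alongside the results lets the caller decide whether to log it or treat it as a soft failure.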