Firecrawl

Learn how to use Firecrawl with Composio

Overview

Enum

FIRECRAWL

Description

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Authentication Details

api_key (string, required)
base_url (string, defaults to https://api.firecrawl.dev/v1)
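
As a minimal sketch, the two authentication fields can be modelled as a small settings object. The `FirecrawlAuth` class, the `auth_from_env` helper, and the `FIRECRAWL_API_KEY` / `FIRECRAWL_BASE_URL` environment variable names are illustrative assumptions, not part of Composio or Firecrawl; only the field names and the default base URL come from the list above.

```python
import os
from dataclasses import dataclass

# Default documented in the authentication details above.
DEFAULT_BASE_URL = "https://api.firecrawl.dev/v1"


@dataclass(frozen=True)
class FirecrawlAuth:
    """Hypothetical container for the two authentication fields."""

    api_key: str
    base_url: str = DEFAULT_BASE_URL

    def __post_init__(self) -> None:
        # api_key is required, so reject empty values early.
        if not self.api_key:
            raise ValueError("api_key is required")


def auth_from_env(env=os.environ) -> FirecrawlAuth:
    """Read the fields from a mapping; the variable names are assumed."""
    return FirecrawlAuth(
        api_key=env.get("FIRECRAWL_API_KEY", ""),
        base_url=env.get("FIRECRAWL_BASE_URL", DEFAULT_BASE_URL),
    )
```

Keeping the default in one constant means an override is only needed when pointing at a different endpoint.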

Actions

Cancels an active or queued web crawl job by its ID. Canceling a job that has already completed, failed, or been canceled does not change its state.

Action Parameters

id (string, required)

Action Response

data (object)
error
successful (boolean)

Retrieves the current status, progress, and details of a web crawl job, using the job ID returned when the crawl was initiated.

Action Parameters

id (string, required)
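
Both this action and the cancel action before it take a single required `id`. A small guard such as the following sketch can catch a missing ID before any request is made; `job_params` is a hypothetical helper name, not a Composio function.

```python
def job_params(job_id: str) -> dict:
    """Build the single-field parameter object for a crawl job action."""
    # id is the only parameter and it is required.
    if not isinstance(job_id, str) or not job_id.strip():
        raise ValueError("id is required and must be a non-empty string")
    return {"id": job_id.strip()}
```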

Action Response

data (object)
error
successful (boolean)

Initiates a Firecrawl web crawl from a given URL, applying filtering and content extraction rules, and polls until the job is complete. Ensure the URL is accessible and that any regex patterns for paths are valid.

Action Parameters

allowBackwardLinks (boolean)
allowExternalLinks (boolean)
delay (integer)
excludePaths (array)
ignoreQueryParameters (boolean)
ignoreSitemap (boolean, defaults to True)
includePaths (array)
limit (integer, defaults to 10)
maxDepth (integer, defaults to 2)
maxDiscoveryDepth (integer)
scrapeOptions_actions (array)
scrapeOptions_blockAds (boolean)
scrapeOptions_changeTrackingOptions (object)
scrapeOptions_excludeTags (array)
scrapeOptions_formats (array, defaults to ['markdown'])
scrapeOptions_headers (object)
scrapeOptions_includeTags (array)
scrapeOptions_jsonOptions (object)
scrapeOptions_location (object)
scrapeOptions_maxAge (integer)
scrapeOptions_mobile (boolean)
scrapeOptions_onlyMainContent (boolean, defaults to True)
scrapeOptions_parsePDF (boolean)
scrapeOptions_proxy (string)
scrapeOptions_removeBase64Images (boolean)
scrapeOptions_skipTlsVerification (boolean)
scrapeOptions_storeInCache (boolean)
scrapeOptions_timeout (integer)
scrapeOptions_waitFor (integer, defaults to 123)
url (string, required)
webhook (string)
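
The sketch below assembles a crawl parameter object from a few of the defaults listed above and checks path patterns locally before submitting them. `build_crawl_params` and `CRAWL_DEFAULTS` are hypothetical names, and Python's `re` module is only an approximation of whatever regex dialect the service applies, so a pattern that compiles here is not guaranteed to be accepted remotely.

```python
import copy
import re

# A subset of the defaults shown in the parameter list above.
CRAWL_DEFAULTS = {
    "ignoreSitemap": True,
    "limit": 10,
    "maxDepth": 2,
    "scrapeOptions_formats": ["markdown"],
    "scrapeOptions_onlyMainContent": True,
}


def build_crawl_params(url, include_paths=None, exclude_paths=None, **overrides):
    """Return a crawl parameter dict; raise ValueError on obvious mistakes."""
    if not url:
        raise ValueError("url is required")
    include_paths = list(include_paths or [])
    exclude_paths = list(exclude_paths or [])
    # Local sanity check only: compile each pattern with Python's regex engine.
    for pattern in include_paths + exclude_paths:
        try:
            re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid path regex {pattern!r}: {exc}") from exc
    # Deep copy so callers never mutate the shared default lists.
    params = copy.deepcopy(CRAWL_DEFAULTS)
    params.update(overrides)
    params["url"] = url
    if include_paths:
        params["includePaths"] = include_paths
    if exclude_paths:
        params["excludePaths"] = exclude_paths
    return params
```

For example, `build_crawl_params("https://example.com", include_paths=[r"^/blog/.*"], limit=25)` keeps the default `maxDepth` of 2 while raising the page limit.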

Action Response

data (object)
error
successful (boolean)

Extracts structured data from web pages by initiating an extraction job and polling for completion. Either a natural-language `prompt` or a JSON `schema` must be provided.

Action Parameters

enable_web_search (boolean)
prompt (string)
schema (object)
urls (array, required)
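
The "prompt or schema" requirement can be enforced before a job is submitted, as in this sketch. `build_extract_params` is a hypothetical helper; it assumes that supplying both `prompt` and `schema` is also acceptable, which the description above does not state explicitly.

```python
def build_extract_params(urls, prompt=None, schema=None, enable_web_search=None):
    """Return an extract parameter dict, enforcing the documented requirements."""
    # Accept a single URL string for convenience and wrap it in a list.
    if isinstance(urls, str):
        urls = [urls]
    if not urls:
        raise ValueError("urls is required and must not be empty")
    # At least one of prompt / schema must be present.
    if not prompt and not schema:
        raise ValueError("provide a prompt or a schema")
    params = {"urls": list(urls)}
    if prompt:
        params["prompt"] = prompt
    if schema:
        params["schema"] = schema
    if enable_web_search is not None:
        params["enable_web_search"] = enable_web_search
    return params
```

A schema-driven call might pass `schema={"type": "object", "properties": {"title": {"type": "string"}}}` together with one or more URLs.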

Action Response

data (object)
error
successful (boolean)

Maps a website by discovering URLs from a starting base URL, with options to customize the crawl via a search query, subdomain inclusion, sitemap handling, and result limits. How well the search query works depends on the site.

Action Parameters

ignoreSitemap (boolean, defaults to True)
includeSubdomains (boolean)
limit (integer, defaults to 5000)
search (string)
url (string, required)
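
A map request can be built the same way, with the two documented defaults made explicit. `build_map_params` is a hypothetical helper, and the check that `limit` is at least 1 is an assumption rather than a documented constraint.

```python
def build_map_params(url, search=None, include_subdomains=None,
                     ignore_sitemap=True, limit=5000):
    """Return a map parameter dict using the defaults from the list above."""
    if not url:
        raise ValueError("url is required")
    # Assumed sanity check; the documentation only gives the default of 5000.
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    params = {"url": url, "ignoreSitemap": ignore_sitemap, "limit": limit}
    # Optional fields are only sent when the caller sets them.
    if search:
        params["search"] = search
    if include_subdomains is not None:
        params["includeSubdomains"] = include_subdomains
    return params
```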

Action Response

data (object)
error
successful (boolean)

Scrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, to retrieve content in the specified formats.

Action Parameters

actions (array)
excludeTags (array)
formats (array, defaults to ['markdown'])
includeTags (array)
jsonOptions (object)
location (object)
onlyMainContent (boolean, defaults to True)
timeout (integer, defaults to 30000)
url (string, required)
waitFor (integer)
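
One way to keep the scrape parameters and their defaults in a single place is a dataclass that mirrors the list above and drops unset optional fields when serialized. `ScrapeParams` and `to_payload` are illustrative names, and the internal shapes of `actions`, `jsonOptions`, and `location` are not specified here, so they are left as untyped containers.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ScrapeParams:
    """Hypothetical mirror of the scrape parameter list."""

    url: str
    formats: list = field(default_factory=lambda: ["markdown"])
    onlyMainContent: bool = True
    timeout: int = 30000
    waitFor: Optional[int] = None
    includeTags: Optional[list] = None
    excludeTags: Optional[list] = None
    actions: Optional[list] = None
    jsonOptions: Optional[dict] = None
    location: Optional[dict] = None

    def to_payload(self) -> dict:
        # url is the only required field.
        if not self.url:
            raise ValueError("url is required")
        # Omit optional fields that were never set.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Calling `ScrapeParams(url="https://example.com", waitFor=500).to_payload()` yields the three defaulted fields plus `url` and `waitFor`.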

Action Response

data (object)
error
successful (boolean)
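
Every action above returns the same three fields, so a single helper can turn a failed response into an exception. This is a sketch under assumptions: `unwrap` and `ActionFailed` are hypothetical names, the response is assumed to arrive as a plain dict, and the type of `error` is not documented, so it is simply converted to a string.

```python
class ActionFailed(RuntimeError):
    """Raised when a response reports successful == False."""


def unwrap(response: dict) -> dict:
    """Return the data object, or raise ActionFailed with the reported error."""
    if not response.get("successful", False):
        # error has no documented type, so fall back to a generic message.
        message = response.get("error") or "action failed without an error message"
        raise ActionFailed(str(message))
    # A successful response with no data is treated as an empty object.
    return response.get("data") or {}
```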