Scraper API Commands

Comprehensive Scraper API commands and workflows for web scraping and data collection.

Basic API Requests

| Command | Description |
| --- | --- |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com"` | Basic scraping request |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com&render=true"` | Render JavaScript |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com&country_code=US"` | Use a specific country |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com&premium=true"` | Use premium proxies |
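The curl commands above translate directly to any HTTP client. A minimal sketch using only Python's standard library (the API key and target URL are placeholders):

```python
from urllib.parse import urlencode

API_KEY = "YOUR_KEY"  # placeholder: substitute your real API key

def build_request_url(target_url, **options):
    """Build a ScraperAPI request URL; extra keyword arguments become
    query-string flags such as render, country_code, or premium."""
    params = {"api_key": API_KEY, "url": target_url, **options}
    return "http://api.scraperapi.com?" + urlencode(params)

# Equivalent to the country-specific curl example above:
print(build_request_url("https://example.com", country_code="US"))
```

`urlencode` also takes care of percent-encoding the target URL, which the raw curl examples leave to chance.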

Python Implementation

| Command | Description |
| --- | --- |
| `pip install requests` | Install the requests library |
| `import requests` | Import the requests module |
| `response = requests.get('http://api.scraperapi.com', params={'api_key': 'YOUR_KEY', 'url': 'https://example.com'})` | Basic Python request |
| `response = requests.get('http://api.scraperapi.com', params={'api_key': 'YOUR_KEY', 'url': 'https://example.com', 'render': 'true'})` | Python request with JavaScript rendering |
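The snippets above can be folded into a small reusable helper. This is a sketch, not official client code; the 70-second timeout is an assumption:

```python
import requests  # installed with `pip install requests`

def scrape(url, api_key="YOUR_KEY", **options):
    """Fetch a page through ScraperAPI; extra keyword arguments
    (e.g. render='true') are passed through as query parameters."""
    params = {"api_key": api_key, "url": url, **options}
    response = requests.get("http://api.scraperapi.com",
                            params=params, timeout=70)
    response.raise_for_status()  # surface 4xx/5xx as exceptions
    return response.text

# html = scrape("https://example.com", render="true")
```

`raise_for_status()` turns the error codes listed later in this document into Python exceptions instead of silent bad responses.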

Node.js Implementation

| Command | Description |
| --- | --- |
| `npm install axios` | Install the axios library |
| `const axios = require('axios')` | Import the axios module |
| `axios.get('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com')` | Basic Node.js request |
| `axios.get('http://api.scraperapi.com', {params: {api_key: 'YOUR_KEY', url: 'https://example.com', render: true}})` | Node.js request with parameters |

PHP Implementation

| Command | Description |
| --- | --- |
| `$response = file_get_contents('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com');` | Basic PHP request |
| `$context = stream_context_create(['http' => ['timeout' => 60]]);` | Set a timeout context |
| `$response = file_get_contents('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com', false, $context);` | PHP request with timeout |

Ruby Implementation

| Command | Description |
| --- | --- |
| `require 'net/http'` | Import the HTTP library |
| `require 'uri'` | Import the URI library |
| `uri = URI('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com')` | Create a URI object |
| `response = Net::HTTP.get_response(uri)` | Make the Ruby request |

Java Implementation

| Command | Description |
| --- | --- |
| `import java.net.http.HttpClient;` | Import the HTTP client |
| `import java.net.http.HttpRequest;` | Import the HTTP request builder |
| `import java.net.URI;` | Import the URI class (needed for `URI.create`) |
| `HttpClient client = HttpClient.newHttpClient();` | Create an HTTP client |
| `HttpRequest request = HttpRequest.newBuilder().uri(URI.create("http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com")).build();` | Build the request |

Advanced Parameters

| Parameter | Description |
| --- | --- |
| `render=true` | Enable JavaScript rendering |
| `country_code=US` | Use a specific country proxy |
| `premium=true` | Use the premium proxy pool |
| `session_number=123` | Use a session for a sticky IP |
| `keep_headers=true` | Keep original headers |
| `device_type=desktop` | Set the device type |
| `autoparse=true` | Enable automatic parsing |
| `format=json` | Return structured JSON |
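Several of these parameters are commonly combined in one request. A sketch of the resulting query string, using only parameter names from the table above (all values travel as strings):

```python
from urllib.parse import urlencode

# Combining several parameters from the table above in a single request.
params = {
    "api_key": "YOUR_KEY",          # your API key (placeholder)
    "url": "https://example.com",   # target page (placeholder)
    "render": "true",               # JavaScript rendering
    "country_code": "US",           # route through a US proxy
    "session_number": "123",        # sticky IP across requests
    "device_type": "desktop",       # desktop user-agent pool
}
request_url = "http://api.scraperapi.com?" + urlencode(params)
print(request_url)
```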

Geolocation Options

| Country Code | Description |
| --- | --- |
| `country_code=US` | United States |
| `country_code=UK` | United Kingdom |
| `country_code=CA` | Canada |
| `country_code=AU` | Australia |
| `country_code=DE` | Germany |
| `country_code=FR` | France |
| `country_code=JP` | Japan |
| `country_code=BR` | Brazil |

Session Management

| Parameter | Description |
| --- | --- |
| `session_number=1` | Use session 1 |
| `session_number=2` | Use session 2 |
| `session_number=123` | Use a custom session |
| `session_number=random` | Use a random session |
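Requests that share a `session_number` are routed through the same IP, which matters for login flows and multi-page navigation. A small sketch (URLs are placeholders):

```python
from urllib.parse import urlencode

def session_url(target_url, session_number, api_key="YOUR_KEY"):
    """Requests sharing a session_number are served from the same IP."""
    params = {"api_key": api_key, "url": target_url,
              "session_number": str(session_number)}
    return "http://api.scraperapi.com?" + urlencode(params)

# Both requests share session 123, so both exit from the same IP:
login_url = session_url("https://example.com/login", 123)
dashboard_url = session_url("https://example.com/dashboard", 123)
```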

Error Handling

| Status Code | Description |
| --- | --- |
| 200 | Success |
| 400 | Bad Request |
| 401 | Unauthorized (invalid API key) |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Rate limit exceeded |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
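Only the transient codes in this table are worth retrying; client errors like 401 or 403 will fail the same way every time. A client-side sketch of that decision (the retryable set and backoff base are assumptions, not documented API behavior):

```python
RETRYABLE = {429, 500, 503}  # transient codes from the table above

def should_retry(status_code, attempt, max_attempts=3):
    """Retry transient failures; never retry client errors like 401/403."""
    return status_code in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt, base=1.0):
    """Exponential backoff: 1s, 2s, 4s, ..."""
    return base * (2 ** attempt)

# Typical loop body:
# if should_retry(code, attempt):
#     time.sleep(backoff_seconds(attempt))
```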

Response Formats

| Format | Description |
| --- | --- |
| `format=html` | Return raw HTML (default) |
| `format=json` | Return structured JSON |
| `format=text` | Return plain text |

Custom Headers

| Parameter | Description |
| --- | --- |
| `custom_headers={"User-Agent": "Custom Bot"}` | Set a custom user agent |
| `custom_headers={"Accept": "application/json"}` | Set the accept header |
| `custom_headers={"Referer": "https://google.com"}` | Set the referer header |

JavaScript Rendering

| Parameter | Description |
| --- | --- |
| `render=true` | Enable JavaScript rendering |
| `wait_for_selector=.content` | Wait for a specific element |
| `wait_for=2000` | Wait a fixed number of milliseconds |
| `screenshot=true` | Take a screenshot |

Batch Processing

| Command | Description |
| --- | --- |
| `curl -X POST "http://api.scraperapi.com/batch" -H "Content-Type: application/json" -d '{"api_key": "YOUR_KEY", "urls": ["url1", "url2"]}'` | Batch request |
| `async_batch=true` | Asynchronous batch processing |
| `callback_url=https://yoursite.com/callback` | Set a callback URL |
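The batch curl command posts a JSON body. Building that body in Python (the `/batch` endpoint and field names are taken from the example above, not verified against current docs; URLs are placeholders):

```python
import json

# JSON body matching the batch curl example above.
payload = {
    "api_key": "YOUR_KEY",
    "urls": ["https://example.com/page1", "https://example.com/page2"],
}
body = json.dumps(payload)
# Send with an HTTP POST and a Content-Type: application/json header, e.g.:
# requests.post("http://api.scraperapi.com/batch", data=body,
#               headers={"Content-Type": "application/json"})
```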

Account Management

| Command | Description |
| --- | --- |
| `curl "http://api.scraperapi.com/account?api_key=YOUR_KEY"` | Check account status |
| `curl "http://api.scraperapi.com/usage?api_key=YOUR_KEY"` | Check usage statistics |

Rate Limiting

| Parameter | Description |
| --- | --- |
| `concurrent_requests=5` | Set the concurrent request limit |
| `delay=1000` | Add a delay between requests |
| `throttle=true` | Enable automatic throttling |
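Independent of the server-side parameters above, pacing requests on the client is a simple safeguard against 429s. A sketch (the one-second delay is an assumption; tune it to your plan's limits):

```python
import time

def paced(urls, delay_seconds=1.0):
    """Yield URLs one at a time, sleeping between them so requests
    are spread out instead of fired in a burst."""
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)
        yield url

# for url in paced(["https://example.com/a", "https://example.com/b"]):
#     fetch(url)  # your request function
```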

Proxy Configuration

| Parameter | Description |
| --- | --- |
| `proxy_type=datacenter` | Use datacenter proxies |
| `proxy_type=residential` | Use residential proxies |
| `proxy_type=mobile` | Use mobile proxies |
| `sticky_session=true` | Enable sticky sessions |

Data Extraction

| Parameter | Description |
| --- | --- |
| `extract_rules={"title": "h1"}` | Extract the title from `h1` |
| `extract_rules={"links": "a@href"}` | Extract all links |
| `extract_rules={"text": "p"}` | Extract paragraph text |
| `css_selector=.product-price` | Use a CSS selector |
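The `extract_rules` value is itself a JSON object, so it must be serialized before it goes into the query string. A sketch (rule syntax as shown in the table above, not verified against current docs):

```python
import json
from urllib.parse import urlencode

# extract_rules is a JSON object, serialized into the query string.
rules = {"title": "h1", "links": "a@href"}
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(rules),
}
request_url = "http://api.scraperapi.com?" + urlencode(params)
```

`urlencode` percent-encodes the embedded JSON (quotes become `%22`), which is easy to get wrong when assembling the URL by hand.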

Webhook Configuration

| Parameter | Description |
| --- | --- |
| `webhook_url=https://yoursite.com/webhook` | Set the webhook URL |
| `webhook_method=POST` | Set the webhook method |
| `webhook_headers={"Authorization": "Bearer token"}` | Set webhook headers |

Monitoring and Debugging

| Parameter | Description |
| --- | --- |
| `debug=true` | Enable debug mode |
| `log_level=verbose` | Set verbose logging |
| `trace_id=custom123` | Set a custom trace ID |

Performance Optimization

| Parameter | Description |
| --- | --- |
| `cache=true` | Enable response caching |
| `cache_ttl=3600` | Set the cache TTL in seconds |
| `compression=gzip` | Enable compression |
| `timeout=30` | Set the request timeout in seconds |

Security Features

| Parameter | Description |
| --- | --- |
| `stealth_mode=true` | Enable stealth mode |
| `anti_captcha=true` | Enable CAPTCHA solving |
| `fingerprint_randomization=true` | Randomize the browser fingerprint |

Integration Examples

| Framework | Command |
| --- | --- |
| Scrapy | `SCRAPEOPS_API_KEY = 'YOUR_KEY'` |
| Selenium | `proxy = "api.scraperapi.com:8001"` |
| Puppeteer | `args: ['--proxy-server=api.scraperapi.com:8001']` |
| BeautifulSoup | `response = requests.get(scraperapi_url)` |
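The proxy-mode host:port shown for Selenium and Puppeteer can also be used from Python via a proxies dict. The `scraperapi:YOUR_KEY` credential format below is an assumption, not confirmed by this document; check your account dashboard for the exact proxy username and password:

```python
# Proxy-mode configuration using the host:port from the table above.
# The "scraperapi:YOUR_KEY" username/password convention is an assumption.
PROXY = "http://scraperapi:YOUR_KEY@api.scraperapi.com:8001"
proxies = {"http": PROXY, "https": PROXY}
# requests.get("https://example.com", proxies=proxies)
```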

Troubleshooting

| Issue | Solution |
| --- | --- |
| Rate limited | Reduce concurrent requests |
| Blocked IP | Use a different country code |
| JavaScript not loading | Enable `render=true` |
| Timeout errors | Increase the timeout value |
| Invalid response | Check URL encoding |