Comprehensive ScraperAPI commands and workflows for web scraping and data collection.
Basic API Requests
| Command | Description |
|---|---|
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com"` | Basic scraping request |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com&render=true"` | Render JavaScript |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com&country_code=US"` | Use a specific country |
| `curl "http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com&premium=true"` | Use premium proxies |
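When building the request URL by hand, the target URL must be percent-encoded so that its own `?` and `&` characters are not mistaken for ScraperAPI parameters. A minimal stdlib sketch (no network call; `YOUR_KEY` is a placeholder):

```python
from urllib.parse import urlencode

# Percent-encode the target URL so its query string does not
# collide with ScraperAPI's own parameters.
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example.com/search?q=web scraping",
}
request_url = "http://api.scraperapi.com?" + urlencode(params)
print(request_url)
```

Client libraries that accept a params dict (such as `requests` below) perform this encoding automatically.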
Python Implementation
| Command | Description |
|---|---|
| `pip install requests` | Install the requests library |
| `import requests` | Import the requests module |
| `response = requests.get('http://api.scraperapi.com', params={'api_key': 'YOUR_KEY', 'url': 'https://example.com'})` | Basic Python request |
| `response = requests.get('http://api.scraperapi.com', params={'api_key': 'YOUR_KEY', 'url': 'https://example.com', 'render': 'true'})` | Python with JavaScript rendering |
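For production use, transient failures (timeouts, 429s, 5xx responses) are worth retrying with exponential backoff. A minimal, client-agnostic sketch; the commented `requests` call at the end is an illustration, not part of any ScraperAPI SDK:

```python
import time

def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0):
    """Call fetch() until it succeeds, backing off exponentially.

    fetch should raise an exception on failure; adapt the failure
    test to whatever HTTP client you use.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Example usage with requests (assumed installed):
# result = fetch_with_retry(lambda: requests.get(
#     'http://api.scraperapi.com',
#     params={'api_key': 'YOUR_KEY', 'url': 'https://example.com'},
#     timeout=60))
```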
Node.js Implementation
| Command | Description |
|---|---|
| `npm install axios` | Install the axios library |
| `const axios = require('axios')` | Import the axios module |
| `axios.get('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com')` | Basic Node.js request |
| `axios.get('http://api.scraperapi.com', {params: {api_key: 'YOUR_KEY', url: 'https://example.com', render: true}})` | Node.js with parameters |
PHP Implementation
| Command | Description |
|---|---|
| `$response = file_get_contents('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com');` | Basic PHP request |
| `$context = stream_context_create(['http' => ['timeout' => 60]]);` | Create a timeout context |
| `$response = file_get_contents('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com', false, $context);` | PHP request with timeout |
Ruby Implementation
| Command | Description |
|---|---|
| `require 'net/http'` | Import the HTTP library |
| `require 'uri'` | Import the URI library |
| `uri = URI('http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com')` | Create a URI object |
| `response = Net::HTTP.get_response(uri)` | Make the Ruby request |
Java Implementation
| Command | Description |
|---|---|
| `import java.net.URI;` | Import the URI class |
| `import java.net.http.HttpClient;` | Import the HTTP client |
| `import java.net.http.HttpRequest;` | Import the HTTP request builder |
| `HttpClient client = HttpClient.newHttpClient();` | Create an HTTP client |
| `HttpRequest request = HttpRequest.newBuilder().uri(URI.create("http://api.scraperapi.com?api_key=YOUR_KEY&url=https://example.com")).build();` | Build the request |
Advanced Parameters
| Parameter | Description |
|---|---|
| `render=true` | Enable JavaScript rendering |
| `country_code=US` | Use a specific country proxy |
| `premium=true` | Use the premium proxy pool |
| `session_number=123` | Use a session for a sticky IP |
| `keep_headers=true` | Forward your request headers to the target |
| `device_type=desktop` | Set the device type |
| `autoparse=true` | Enable automatic parsing |
| `format=json` | Return structured JSON |
Geolocation Options
| Country Code | Description |
|---|---|
| `country_code=US` | United States |
| `country_code=UK` | United Kingdom |
| `country_code=CA` | Canada |
| `country_code=AU` | Australia |
| `country_code=DE` | Germany |
| `country_code=FR` | France |
| `country_code=JP` | Japan |
| `country_code=BR` | Brazil |
Session Management
| Command | Description |
|---|---|
| `session_number=1` | Use session 1 |
| `session_number=2` | Use session 2 |
| `session_number=123` | Use a custom session |
| `session_number=random` | Use a random session |
Error Handling
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request |
| 401 | Unauthorized (invalid API key) |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Rate limit exceeded |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
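Client code can branch on these codes: 429 and most 5xx responses are transient and safe to retry, while 400/401/403/404 indicate a problem with the request itself. A small sketch of that decision:

```python
# Transient codes worth retrying vs. permanent client-side errors.
RETRYABLE = {429, 500, 503}

def classify(status_code):
    """Return 'ok', 'retry', or 'fail' for a response status code."""
    if status_code == 200:
        return "ok"
    if status_code in RETRYABLE:
        return "retry"
    return "fail"  # 400/401/403/404: fix the request, don't retry
```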
Response Formats
| Format | Description |
|---|---|
| `format=html` | Return raw HTML (default) |
| `format=json` | Return structured JSON |
| `format=text` | Return plain text |
Custom Headers
| Command | Description |
|---|---|
| `custom_headers={"User-Agent": "Custom Bot"}` | Set a custom user agent |
| `custom_headers={"Accept": "application/json"}` | Set the accept header |
| `custom_headers={"Referer": "https://google.com"}` | Set the referer header |
JavaScript Rendering
| Command | Description |
|---|---|
| `render=true` | Enable JavaScript rendering |
| `wait_for_selector=.content` | Wait for a specific element |
| `wait_for=2000` | Wait for a number of milliseconds |
| `screenshot=true` | Take a screenshot |
Batch Processing
| Command | Description |
|---|---|
| `curl -X POST "http://api.scraperapi.com/batch" -H "Content-Type: application/json" -d '{"api_key": "YOUR_KEY", "urls": ["url1", "url2"]}'` | Batch request |
| `async_batch=true` | Asynchronous batch processing |
| `callback_url=https://yoursite.com/callback` | Set the callback URL |
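The batch payload is plain JSON, so building it with `json.dumps` avoids quoting mistakes. A sketch of the request body only; the endpoint and field names follow the curl example above, so verify them against the current API docs:

```python
import json

# Assemble the batch body; field names mirror the curl example above.
payload = json.dumps({
    "api_key": "YOUR_KEY",
    "urls": ["https://example.com/a", "https://example.com/b"],
})

# Send with (requests assumed installed):
# requests.post("http://api.scraperapi.com/batch",
#               data=payload,
#               headers={"Content-Type": "application/json"})
```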
Account Management
| Command | Description |
|---|---|
| `curl "http://api.scraperapi.com/account?api_key=YOUR_KEY"` | Check account status |
| `curl "http://api.scraperapi.com/usage?api_key=YOUR_KEY"` | Check usage statistics |
Rate Limiting
| Command | Description |
|---|---|
| `concurrent_requests=5` | Set the concurrent request limit |
| `delay=1000` | Add a delay between requests |
| `throttle=true` | Enable automatic throttling |
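A delay between requests can also be enforced client-side, so bursts never exceed your plan's limits regardless of how the calling code is structured. A minimal stdlib sketch (not part of any ScraperAPI SDK):

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, delay_ms):
        self.interval = delay_ms / 1000.0
        self.last = None

    def wait(self):
        # Sleep just long enough to honor the configured delay.
        now = time.monotonic()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                time.sleep(remaining)
        self.last = time.monotonic()
```

Call `wait()` immediately before each outgoing request; the first call returns at once, later calls block until the interval has passed.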
Proxy Configuration
| Command | Description |
|---|---|
| `proxy_type=datacenter` | Use datacenter proxies |
| `proxy_type=residential` | Use residential proxies |
| `proxy_type=mobile` | Use mobile proxies |
| `sticky_session=true` | Enable sticky sessions |
Data Extraction
| Command | Description |
|---|---|
| `extract_rules={"title": "h1"}` | Extract the title from h1 |
| `extract_rules={"links": "a@href"}` | Extract all links |
| `extract_rules={"text": "p"}` | Extract paragraph text |
| `css_selector=.product-price` | Use a CSS selector |
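`extract_rules` is a JSON value passed inside the query string, so it needs JSON serialization first and URL encoding second. A sketch of building such a request URL (no network call; check the exact rule syntax against the current docs):

```python
import json
from urllib.parse import urlencode

rules = {"title": "h1", "links": "a@href"}
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example.com",
    "extract_rules": json.dumps(rules),  # JSON first, then URL-encoded
}
request_url = "http://api.scraperapi.com?" + urlencode(params)
```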
Webhook Configuration
| Command | Description |
|---|---|
| `webhook_url=https://yoursite.com/webhook` | Set the webhook URL |
| `webhook_method=POST` | Set the webhook method |
| `webhook_headers={"Authorization": "Bearer token"}` | Set webhook headers |
Monitoring and Debugging
| Command | Description |
|---|---|
| `debug=true` | Enable debug mode |
| `log_level=verbose` | Set verbose logging |
| `trace_id=custom123` | Set a custom trace ID |
Performance and Caching
| Command | Description |
|---|---|
| `cache=true` | Enable response caching |
| `cache_ttl=3600` | Set the cache TTL in seconds |
| `compression=gzip` | Enable compression |
| `timeout=30` | Set the request timeout |
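Caching can also be done on the client to avoid re-spending API credits on identical URLs. A minimal in-memory TTL cache sketch (stdlib only; this is client-side code, distinct from the server-side `cache` parameters above):

```python
import time

class TTLCache:
    """Tiny in-memory cache keyed by URL, expiring after ttl seconds."""

    def __init__(self, ttl=3600.0):
        self.ttl = ttl
        self._store = {}

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[url]  # expired: drop and report a miss
            return None
        return value

    def put(self, url, value):
        self._store[url] = (value, time.monotonic())
```

Check the cache before each scrape and `put()` the response body after a successful fetch.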
Security Features
| Command | Description |
|---|---|
| `stealth_mode=true` | Enable stealth mode |
| `anti_captcha=true` | Enable CAPTCHA solving |
| `fingerprint_randomization=true` | Randomize the browser fingerprint |
Integration Examples
| Framework | Command |
|---|---|
| Scrapy | `SCRAPERAPI_KEY = 'YOUR_KEY'` |
| Selenium | `proxy = "proxy-server.scraperapi.com:8001"` |
| Puppeteer | `args: ['--proxy-server=proxy-server.scraperapi.com:8001']` |
| BeautifulSoup | `response = requests.get(scraperapi_url)` |
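The Selenium and Puppeteer rows above rely on proxy mode, where the API key travels in the proxy credentials instead of the URL. A sketch of the equivalent `requests` usage; the host, port, and `scraperapi` username are the commonly documented proxy-mode values, so verify them against your account dashboard before use:

```python
API_KEY = "YOUR_KEY"

# Proxy-mode credentials: username "scraperapi", password = your API key.
proxy = f"http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001"
proxies = {"http": proxy, "https": proxy}

# Usage with requests (assumed installed):
# response = requests.get("https://example.com", proxies=proxies)
```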
Troubleshooting
| Issue | Solution |
|---|---|
| Rate limited | Reduce concurrent requests |
| Blocked IP | Try a different country code or the premium proxy pool |
| JavaScript not loading | Enable `render=true` |
| Timeout errors | Increase the timeout value |
| Invalid response | Check URL encoding of the target URL |