Puppeteer
Puppeteer is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol for automated testing and web scraping.
Installation
npm install puppeteer
npm install puppeteer-extra puppeteer-extra-plugin-stealth
Basic Usage
Launch Browser & Navigate
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
Page Navigation
// Go to URL
await page.goto('https://example.com');
// Go to URL and wait until the network is nearly idle
// (no more than 2 open connections for at least 500 ms)
await page.goto(url, {waitUntil: 'networkidle2'});
// Go back/forward
await page.goBack();
await page.goForward();
// Reload page
await page.reload();
Interacting with Elements
| Command | Description |
|---|---|
| `page.click(selector)` | Click element |
| `page.type(selector, text)` | Type text into input |
| `page.focus(selector)` | Focus element |
| `page.$eval(selector, fn)` | Evaluate function on element |
| `page.$$eval(selector, fn)` | Evaluate function on elements |
| `page.waitForSelector(selector)` | Wait for element |
| `page.waitForNavigation()` | Wait for page navigation |
Example Interactions
// Click a button
await page.click('button.submit');
// Type into input
await page.type('input#email', 'user@example.com');
// Select dropdown
await page.select('select#country', 'US');
// Get element text
const title = await page.$eval('h1', el => el.textContent);
// Get multiple elements
const links = await page.$$eval('a', elements =>
  elements.map(el => ({href: el.href, text: el.textContent}))
);
// Wait for element to appear
await page.waitForSelector('div.loaded');
Screenshots & PDFs
// Take screenshot
await page.screenshot({path: 'page.png'});
// Full page screenshot
await page.screenshot({path: 'full.png', fullPage: true});
// Element screenshot
const element = await page.$('div.content');
await element.screenshot({path: 'element.png'});
// Save as PDF
await page.pdf({path: 'document.pdf'});
// PDF with custom settings
await page.pdf({
  path: 'document.pdf',
  format: 'A4',
  margin: {top: '1cm', right: '1cm', bottom: '1cm', left: '1cm'}
});
Form Submission
// Fill form
await page.type('input[name="username"]', 'testuser');
await page.type('input[name="password"]', 'password123');
// Submit form
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type="submit"]')
]);
// Or press Enter in the focused field, still waiting for the navigation
await Promise.all([
  page.waitForNavigation(),
  page.keyboard.press('Enter')
]);
Web Scraping
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Scrape data; a missing child element yields null instead of throwing
  const data = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('article')).map(article => ({
      title: article.querySelector('h2')?.textContent ?? null,
      link: article.querySelector('a')?.href ?? null,
      date: article.querySelector('.date')?.textContent ?? null
    }));
  });
  console.log(data);
  await browser.close();
})();
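One way to speed up a scrape like the one above is to skip images and stylesheets through request interception. The sketch below is a minimal illustration, not a definitive setup: the names `BLOCKED_TYPES`, `shouldBlock`, `handleRequest`, and `blockHeavyResources` are hypothetical, and the list of blocked resource types is an assumption to adjust per site.

```javascript
// Resource types to skip while scraping (an assumed list; adjust per site).
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Decide whether a request should be aborted, based on its resource type.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Request handler: abort blocked resource types, let everything else through.
function handleRequest(request) {
  return shouldBlock(request.resourceType())
    ? request.abort()
    : request.continue();
}

// Hypothetical helper: turn on interception for a Puppeteer page
// and attach the handler above.
async function blockHeavyResources(page) {
  await page.setRequestInterception(true);
  page.on('request', handleRequest);
}
```

Calling `await blockHeavyResources(page)` before `page.goto(...)` should apply the filter to that navigation. Once interception is enabled, every request has to be either continued or aborted, which is why the handler always does one of the two.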
Advanced Features
Performance Metrics
// Get page metrics
const metrics = await page.metrics();
console.log(`Memory: ${metrics.JSHeapUsedSize / 1048576 | 0} MB`);
// Measure performance
const perfMetrics = await page.evaluate(() => {
  // PerformanceNavigationTiming has no navigationStart property;
  // its startTime (always 0) marks the start of the navigation
  const navigation = performance.getEntriesByType('navigation')[0];
  return {
    navigationStart: navigation.startTime,
    loadComplete: navigation.loadEventEnd,
    duration: navigation.loadEventEnd - navigation.startTime
  };
});
Keyboard & Mouse
// Type with delay
await page.keyboard.type('Hello', {delay: 100});
// Mouse movement
await page.mouse.move(100, 100);
await page.mouse.click(100, 100);
// Drag and drop
await page.mouse.move(100, 100);
await page.mouse.down();
await page.mouse.move(200, 200);
await page.mouse.up();
Handling Dialogs
// Log and accept any dialog (alert, confirm, prompt)
page.on('dialog', async dialog => {
  console.log(`Dialog: ${dialog.message()}`);
  await dialog.accept();
});
// Alternative: dismiss dialogs instead (register only one of these handlers per page)
page.on('dialog', async dialog => {
  await dialog.dismiss();
});
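Since a dialog can be accepted or dismissed only once, a single listener that branches on `dialog.type()` can cover both cases above. This is a minimal sketch under stated assumptions: the function name `handleDialog`, the `promptText` parameter, and the policy (accept alerts and prompts, dismiss everything else) are hypothetical choices for illustration.

```javascript
// Hypothetical policy: accept alerts and prompts, dismiss everything else
// (for example confirm and beforeunload dialogs).
function handleDialog(dialog, promptText = '') {
  console.log(`Dialog (${dialog.type()}): ${dialog.message()}`);
  switch (dialog.type()) {
    case 'alert':
      return dialog.accept();
    case 'prompt':
      // The text passed to accept() is entered into the prompt.
      return dialog.accept(promptText);
    default:
      return dialog.dismiss();
  }
}
```

It could be registered with `page.on('dialog', dialog => handleDialog(dialog, 'some answer'));` so that each dialog is handled exactly once.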
Best Practices
- Use `headless: true` for production
- Set `defaultViewport` for consistent screenshots
- Close browser instances to free resources
- Handle timeouts with try/catch
- Use `waitForNavigation()` before page transitions
- Disable images/CSS when scraping to improve speed
- Respect robots.txt and website terms of service
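Two of the practices above, closing browser instances and handling timeouts with try/catch, can be sketched as small helpers. The names `withPage` and `appears` are hypothetical, and the timeout check assumes the thrown error carries the name `TimeoutError`; treat this as an illustration rather than a definitive pattern.

```javascript
// Run `task(page)` against a fresh page and always close the browser,
// even when the task throws. `launch` is any function that returns a
// browser, for example () => puppeteer.launch({headless: true}).
async function withPage(launch, task) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    await browser.close();
  }
}

// Wait for a selector and return false on timeout instead of throwing.
// Assumes timeouts surface as errors named 'TimeoutError';
// any other error is rethrown unchanged.
async function appears(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, {timeout});
    return true;
  } catch (err) {
    if (err && err.name === 'TimeoutError') return false;
    throw err;
  }
}
```

A call such as `await withPage(() => puppeteer.launch(), page => appears(page, 'div.loaded'))` would then resolve to `true` or `false` and leave no browser process behind either way.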
Last updated: 2025-07-06