Puppeteer

Puppeteer is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol for automated testing and web scraping.

Installation

npm install puppeteer
# Optional: puppeteer-extra with the stealth plugin
npm install puppeteer-extra puppeteer-extra-plugin-stealth

Basic Usage

Launch Browser & Navigate

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
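
// Go to URL
await page.goto('https://example.com');

// Go to URL and wait until the network is nearly idle
// (no more than 2 network connections for at least 500 ms)
await page.goto('https://example.com', {waitUntil: 'networkidle2'});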

// Go back/forward
await page.goBack();
await page.goForward();

// Reload page
await page.reload();
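The launch snippet above closes the browser only if every step succeeds; if page.goto() throws, the Chrome process keeps running. A minimal sketch of one way to guarantee cleanup with try/finally follows. `withBrowser` is a hypothetical helper, not part of Puppeteer; it accepts any object with a `launch()` method, such as the `puppeteer` module.

```javascript
// Hypothetical helper (not part of Puppeteer): launch a browser, run `task`,
// and always close the browser, even when `task` throws.
// `launcher` is any object with a launch() method, e.g. require('puppeteer').
async function withBrowser(launcher, task, launchOptions = {}) {
  const browser = await launcher.launch(launchOptions);
  try {
    return await task(browser);
  } finally {
    // Runs after success and after errors, so no Chrome process is left behind
    await browser.close();
  }
}

// Usage (requires `npm install puppeteer`):
// const puppeteer = require('puppeteer');
// withBrowser(puppeteer, async browser => {
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
//   return page.title();
// }).then(console.log);
```

The task's return value is passed through, so the helper can wrap both scraping and screenshot jobs.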

Interacting with Elements

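Command                           Description
page.click(selector)              Click the first element matching selector
page.type(selector, text)         Type text into an input, one keystroke per character
page.focus(selector)              Focus an element
page.select(selector, ...values)  Select option(s) in a <select> element
page.$eval(selector, fn)          Run fn in the page on the first matching element
page.$$eval(selector, fn)         Run fn in the page on an array of all matching elements
page.waitForSelector(selector)    Wait for a matching element to appear
page.waitForNavigation()          Wait for the next page navigation to finish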

Example Interactions

// Click a button
await page.click('button.submit');

// Type into input
await page.type('input#email', 'user@example.com');

// Select dropdown
await page.select('select#country', 'US');

// Get element text
const title = await page.$eval('h1', el => el.textContent);

// Get multiple elements
const links = await page.$$eval('a', elements =>
  elements.map(el => ({href: el.href, text: el.textContent}))
);

// Wait for element to appear
await page.waitForSelector('div.loaded');
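The calls above throw if the element never shows up: page.waitForSelector() rejects after its timeout (30 s by default) and page.$eval() rejects when nothing matches. Below is a sketch of a more tolerant text getter, assuming an existing Puppeteer `page`. `getTextOrNull` is a hypothetical name, and the timeout check assumes Puppeteer's TimeoutError carries `name === 'TimeoutError'`.

```javascript
// Hypothetical helper: wait up to `timeout` ms for `selector`, then return the
// element's trimmed text. Returns null instead of throwing if it never appears.
async function getTextOrNull(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, {timeout});
  } catch (err) {
    // Assumption: Puppeteer rejects with an error named 'TimeoutError';
    // any other error is re-thrown unchanged
    if (err.name === 'TimeoutError') return null;
    throw err;
  }
  return page.$eval(selector, el => el.textContent.trim());
}

// Usage: const heading = await getTextOrNull(page, 'h1', 2000);
```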

Screenshots & PDFs

// Take screenshot
await page.screenshot({path: 'page.png'});

// Full page screenshot
await page.screenshot({path: 'full.png', fullPage: true});

// Element screenshot
const element = await page.$('div.content');
await element.screenshot({path: 'element.png'});

// Save as PDF
await page.pdf({path: 'document.pdf'});

// PDF with custom settings
await page.pdf({
  path: 'document.pdf',
  format: 'A4',
  margin: {top: '1cm', right: '1cm', bottom: '1cm', left: '1cm'}
});
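`page.$()` resolves to `null` when nothing matches, so the element screenshot above fails with a TypeError on a missing selector. The following sketch reports that case clearly and releases the element handle afterwards. `screenshotElement` is a hypothetical helper that assumes an existing Puppeteer `page`.

```javascript
// Hypothetical helper: screenshot one element, failing with a clear message
// when the selector matches nothing (page.$() resolves to null in that case).
async function screenshotElement(page, selector, path) {
  const element = await page.$(selector);
  if (element === null) {
    throw new Error(`No element matches "${selector}"`);
  }
  try {
    return await element.screenshot({path});
  } finally {
    // Release the element handle once the screenshot has been taken
    await element.dispose();
  }
}

// Usage: await screenshotElement(page, 'div.content', 'element.png');
```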

Form Submission

// Fill form
await page.type('input[name="username"]', 'testuser');
await page.type('input[name="password"]', 'password123');

// Submit form
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type="submit"]')
]);

// Or press Enter in the focused field (also waiting for the navigation)
await Promise.all([
  page.waitForNavigation(),
  page.keyboard.press('Enter')
]);
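Filling and submitting a form can be wrapped into one helper. Here is a minimal sketch, assuming an existing Puppeteer `page`. `fillAndSubmit` and its `fields` object (CSS selector mapped to the text to type) are hypothetical, not Puppeteer API. The navigation wait starts in the same Promise.all as the click so that a fast navigation is not missed.

```javascript
// Hypothetical helper: type into several inputs, then click submit and wait
// for the resulting navigation. `fields` maps CSS selectors to text.
async function fillAndSubmit(page, fields, submitSelector) {
  for (const [selector, value] of Object.entries(fields)) {
    await page.type(selector, value);
  }
  // Start waiting before the click resolves so the navigation is not missed
  const [response] = await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector)
  ]);
  return response; // main-resource HTTPResponse, or null
}

// Usage:
// await fillAndSubmit(page, {
//   'input[name="username"]': 'testuser',
//   'input[name="password"]': 'password123'
// }, 'button[type="submit"]');
```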

Web Scraping

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

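  // Scrape data; optional chaining avoids a TypeError when an article
  // lacks one of the child elements (the field becomes null instead)
  const data = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('article')).map(article => ({
      title: article.querySelector('h2')?.textContent.trim() ?? null,
      link: article.querySelector('a')?.href ?? null,
      date: article.querySelector('.date')?.textContent.trim() ?? null
    }));
  });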

  console.log(data);
  await browser.close();
})();
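The same scrape can be written with `page.$$eval()`, which passes every matching element to the callback as an array. This sketch assumes an existing Puppeteer `page` and the same markup (`article`, `h2`, `a`, `.date`); `scrapeArticles` is an illustrative name. The callback is serialized and executed in the browser, so it cannot use variables or functions from the Node.js scope; the small `text` helper is therefore defined inside it.

```javascript
// Sketch: scrape all articles with page.$$eval(). The callback runs inside the
// page, so everything it needs must be defined within it.
async function scrapeArticles(page) {
  return page.$$eval('article', articles =>
    articles.map(article => {
      // Missing child elements yield null instead of throwing a TypeError
      const text = sel => article.querySelector(sel)?.textContent.trim() ?? null;
      return {
        title: text('h2'),
        link: article.querySelector('a')?.href ?? null,
        date: text('.date')
      };
    })
  );
}

// Usage: console.log(await scrapeArticles(page));
```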

Advanced Features

Performance Metrics

// Get page metrics
const metrics = await page.metrics();
console.log(`JS heap used: ${(metrics.JSHeapUsedSize / 1048576).toFixed(1)} MB`);

// Measure performance (Navigation Timing Level 2: times are in ms,
// relative to startTime, which is 0 for the navigation entry)
const perfMetrics = await page.evaluate(() => {
  const navigation = performance.getEntriesByType('navigation')[0];
  return {
    navigationStart: navigation.startTime,
    loadComplete: navigation.loadEventEnd,
    duration: navigation.loadEventEnd - navigation.startTime
  };
});
console.log(perfMetrics);
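A Navigation Timing Level 2 entry exposes more phases than the load time alone. The sketch below turns such an entry into per-phase durations in milliseconds. `summarizeNavigationTiming` is a hypothetical helper that runs in Node.js on the plain object returned by `page.evaluate()` (the entry's `toJSON()` output).

```javascript
// Sketch: per-phase durations (ms) from a Navigation Timing Level 2 entry.
// In Puppeteer the entry could be obtained with (page already loaded):
//   const entry = await page.evaluate(() =>
//     performance.getEntriesByType('navigation')[0].toJSON());
function summarizeNavigationTiming(entry) {
  const round = ms => Math.round(ms * 10) / 10; // one decimal place
  return {
    dns: round(entry.domainLookupEnd - entry.domainLookupStart),
    tcpConnect: round(entry.connectEnd - entry.connectStart),
    timeToFirstByte: round(entry.responseStart - entry.startTime),
    domContentLoaded: round(entry.domContentLoadedEventEnd - entry.startTime),
    loadComplete: round(entry.loadEventEnd - entry.startTime)
  };
}
```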

Keyboard & Mouse

// Type with a 100 ms delay between key presses
await page.keyboard.type('Hello', {delay: 100});

// Mouse movement
await page.mouse.move(100, 100);
await page.mouse.click(100, 100);

// Drag and drop
await page.mouse.move(100, 100);
await page.mouse.down();
await page.mouse.move(200, 200);
await page.mouse.up();
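A drag that jumps straight to the target may not be recognized, because `page.mouse.move()` sends a single mousemove event unless its `steps` option asks for intermediate ones. This is a sketch of a reusable drag, assuming an existing Puppeteer `page` and a page that implements dragging with mouse events. `dragAndDrop` and the `{x, y}` point objects are hypothetical.

```javascript
// Hypothetical helper: press at `from`, move to `to` in `steps` increments,
// release. Intermediate mousemove events help JS drag libraries track the drag.
async function dragAndDrop(page, from, to, steps = 10) {
  await page.mouse.move(from.x, from.y);
  await page.mouse.down();
  await page.mouse.move(to.x, to.y, {steps});
  await page.mouse.up();
}

// Usage: await dragAndDrop(page, {x: 100, y: 100}, {x: 200, y: 200});
```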

Handling Dialogs

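// Option 1: log and accept every dialog (alert, confirm, prompt, beforeunload)
page.on('dialog', async dialog => {
  console.log(`Dialog: ${dialog.message()}`);
  await dialog.accept();
});

// Option 2: dismiss confirmations, accept everything else.
// Register only one of these handlers: a dialog can be handled only once.
page.on('dialog', async dialog => {
  if (dialog.type() === 'confirm') {
    await dialog.dismiss();
  } else {
    await dialog.accept();
  }
});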
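Prompt dialogs take the text to enter as the argument of `dialog.accept()`. This sketch handles all dialog types in a single listener. `handleDialogs` and its options object are hypothetical, and it assumes an existing Puppeteer `page`.

```javascript
// Hypothetical helper: one 'dialog' listener that decides per dialog type.
// A single listener avoids handling the same dialog twice, which throws.
function handleDialogs(page, {acceptConfirm = true, promptText = ''} = {}) {
  page.on('dialog', async dialog => {
    const type = dialog.type(); // 'alert' | 'confirm' | 'prompt' | 'beforeunload'
    console.log(`Dialog (${type}): ${dialog.message()}`);
    if (type === 'prompt') {
      await dialog.accept(promptText); // text entered into the prompt
    } else if (type === 'confirm' && !acceptConfirm) {
      await dialog.dismiss();
    } else {
      await dialog.accept();
    }
  });
}

// Usage: handleDialogs(page, {acceptConfirm: false, promptText: 'Jane'});
```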

Best Practices

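  • Run headless in production (puppeteer.launch() is headless by default)
  • Set defaultViewport (or call page.setViewport()) for consistent screenshots
  • Close browser instances, ideally in a finally block, to free resources
  • Handle timeouts with try/catch: Puppeteer throws a TimeoutError when a wait or navigation exceeds its timeout (30 s by default)
  • Start waitForNavigation() together with the action that triggers the navigation (e.g. inside Promise.all), not after it
  • Block images, stylesheets, and fonts with request interception when scraping to improve speed
  • Respect robots.txt and website terms of service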
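For the request-blocking tip, here is a minimal sketch using request interception. It assumes an existing Puppeteer `page` and that the helper is called before page.goto(). Once interception is enabled, every request stays paused until the handler calls `continue()` or `abort()`, so the handler must resolve each one. `blockHeavyResources` and the set of blocked resource types are illustrative choices.

```javascript
// Sketch: abort image, stylesheet, font and media requests; let the rest through.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

async function blockHeavyResources(page) {
  // With interception on, each request waits for continue() or abort()
  await page.setRequestInterception(true);
  page.on('request', request => {
    if (BLOCKED_TYPES.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage: await blockHeavyResources(page); await page.goto('https://example.com');
```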


Last updated: 2025-07-06