Enabling Fluid Compute Broke Playwright Scraping

I have been using Playwright in my Node.js app and set it up with Sparticuz/chromium (Chromium (x86-64) for Serverless Platforms) to get scraping working on Vercel. Sometimes the scraping takes a long time, so I wanted to try Fluid Compute to increase speed and give the function a larger maximum run time. However, when I enable it, the scraping breaks.
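For context, raising the limit itself is just configuration. A minimal sketch, assuming a Next.js App Router route handler deployed to Vercel (the exact ceiling depends on the plan and on whether Fluid Compute is enabled):

// Route segment config (assumed Next.js App Router route); Vercel uses this as the function's max duration
export const maxDuration = 300 // seconds; the allowed maximum depends on plan and Fluid Compute settings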

The error in the logs:

browserType.launch: Target page, context or browser has been closed

[pid=21][err] /tmp/chromium: error while loading shared libraries: libnspr4.so: cannot open shared object file: No such file or directory

I found a user who seemed to have a similar issue, but adding the env variable did not help me.

Here is my code for setting up the Playwright browser:

import chromium from '@sparticuz/chromium'
import { addExtra } from 'playwright-extra'
import { chromium as pw, Browser, Page } from 'playwright'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
//import { chromium as playwrightChromium } from 'playwright'
//import { chromium as playwrightChromium } from 'playwright-extra'
import 'puppeteer-extra-plugin-stealth/evasions/chrome.app/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/chrome.csi/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/chrome.loadTimes/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/iframe.contentWindow/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/media.codecs/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/navigator.hardwareConcurrency/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/navigator.languages/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/navigator.permissions/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/navigator.plugins/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/navigator.vendor/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/navigator.webdriver/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/sourceurl/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/webgl.vendor/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/window.outerdimensions/index.js'
import 'puppeteer-extra-plugin-stealth/evasions/defaultArgs/index.js'
import 'puppeteer-extra-plugin-user-preferences/index.js'
import 'puppeteer-extra-plugin-user-data-dir/index.js'

export const defaultHeaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    Connection: 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

export async function setupBrowser(): Promise<{ browser: Browser; page: Page }> {
    // Wrap Playwright's chromium launcher so the stealth plugin can be applied
    const playwrightChromium = addExtra(pw)
    playwrightChromium.use(
        StealthPlugin({
            enabledEvasions: new Set([
                'chrome.runtime',
                'defaultArgs',
                'iframe.contentWindow',
                'media.codecs',
                'navigator.languages',
                'navigator.permissions',
                'navigator.plugins',
                'navigator.vendor',
                'navigator.webdriver',
                'sourceurl',
                'user-agent-override',
                'webgl.vendor',
                'window.outerdimensions',
            ]),
        })
    )

    const browser = await playwrightChromium.launch({
        headless: true,
        // Use the Sparticuz Chromium binary when running on AWS/Vercel; otherwise fall back to the locally installed Playwright browser
        executablePath: process.env.AWS_EXECUTION_ENV ? await chromium.executablePath() : undefined,
        args: [
            ...chromium.args,
            '--no-sandbox',
            '--disable-gpu',
            '--disable-setuid-sandbox',
            '--disable-web-security',
            '--disable-features=IsolateOrigins,site-per-process',
            '--disable-site-isolation-trials',
        ],
    })

    const page = await browser.newPage()
    await page.setExtraHTTPHeaders(defaultHeaders)

    return { browser, page }
}
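For reference, a minimal sketch of how setupBrowser could be called, assuming a Next.js App Router route handler (the import path and query parameter are illustrative, not my exact production code):

// Usage sketch (assumed Next.js App Router route handler)
import { setupBrowser } from '@/lib/browser' // import path is illustrative

export async function GET(request: Request) {
    const url = new URL(request.url).searchParams.get('url') ?? 'https://example.com'
    const { browser, page } = await setupBrowser()
    try {
        await page.goto(url, { waitUntil: 'domcontentloaded' })
        return Response.json({ title: await page.title() })
    } finally {
        await browser.close()
    }
}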

This seems like a known issue on Node 20: Browser launch throwing error after upgrading to Node20 for AWS Lambda · Issue #78 · JupiterOne/playwright-aws-lambda

I just attempted to use Node 22.14.0 instead and got the same result, even with Fluid Compute off. With Node 20 and Fluid Compute off, the scraping works fine.

Can you share a minimal repository so we can take a look?

So after playing around with it, I was actually able to get the initial scraping part to work. However, I noticed that the specific site I am scraping returns a 403 error only when Fluid Compute is on. It seems to be getting caught by Cloudflare where it was not before. I am not sure how Fluid would affect the browser settings or the stealth plugin I am using to set up the Playwright browser.
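A small sketch of how the 403 could be confirmed as a Cloudflare block rather than an origin error, assuming the setupBrowser function above (only the main-document response is inspected, nothing else changes):

// Hedged debugging sketch: log the main-document response to see where the 403 comes from
import { setupBrowser } from './browser' // import path is illustrative

export async function checkTarget(): Promise<void> {
    const { browser, page } = await setupBrowser()
    try {
        const response = await page.goto('https://www.kennedyfloral.com/', { waitUntil: 'domcontentloaded' })
        console.log('status:', response?.status())
        // 'server: cloudflare' together with a cf-ray header means the request was answered at Cloudflare's edge, not the origin
        console.log('server:', response?.headers()['server'])
        console.log('cf-ray:', response?.headers()['cf-ray'])
    } finally {
        await browser.close()
    }
}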

I created a much lighter version of my current repo, but it contains the same Playwright setup.

If you send a GET request to this URL it runs the Vercel version
https://playwright-test-psi.vercel.app/

Here is the site I am trying to scrape
https://www.kennedyfloral.com/

I confirmed that the code worked until I turned on Fluid Compute.

Actually, I fear that more things are now broken. My Playwright success rate is at about 50% after being close to 100% before. I even tried an old deployment from a month ago and am getting errors that I have never seen. Is it possible that flipping Fluid Compute on and then off could affect these old deployments?

So I am thinking this error was probably caused by Fluid Compute in some way, since it started right when I enabled it. After a lot of generic Playwright errors, I finally found one mentioning “insufficient resources”. It seems the folders in /tmp that Playwright uses were filling up, and the scraper was then struggling to run on Vercel. I was able to fix it with help from [BUG] [Playwright] Lambda /tmp fills up quickly · Issue #231 · Sparticuz/chromium.
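In case it helps anyone else, here is a rough sketch of the kind of cleanup that issue suggests, run before each scrape. The directory prefix is an assumption on my part and should be checked against whatever actually piles up in /tmp:

// Hedged sketch: remove leftover Playwright profile directories from /tmp between invocations,
// since a warm (Fluid) instance reuses /tmp across requests.
import { readdir, rm } from 'node:fs/promises'
import path from 'node:path'

export async function cleanPlaywrightTmp(): Promise<void> {
    const entries = await readdir('/tmp').catch(() => [] as string[])
    await Promise.all(
        entries
            .filter((name) => name.startsWith('playwright')) // assumed prefix for temp profile dirs; verify before deleting
            .map((name) => rm(path.join('/tmp', name), { recursive: true, force: true }).catch(() => undefined))
    )
}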

Thanks for the details. Using /tmp is a very hacky workaround and not very reliable; it also has very limited access and resources. I am glad you were able to fix the issue.
