
Twitter Scraping GitHub: Best Tools & Libraries

A no-BS breakdown of every major open-source Twitter scraper, their limitations, and when to stop fighting X's API and use better lead sources.

Why Everyone Is Searching for Twitter Scraping on GitHub

Here's the situation: X (formerly Twitter) killed its free API tier and now charges serious money for access. Developers and marketers who used to pull data freely are now hunting GitHub for open-source alternatives. I get it. I've been in that same spot - trying to build prospect lists from social data without paying enterprise rates for a platform that's increasingly hostile to developers.

But before you spend three days setting up a Python scraper that breaks next week, let's talk about what's actually happening, which GitHub tools are worth your time, and when it makes more sense to route around Twitter entirely for your lead gen goals.

The Twitter/X API Reality Check

The official X API used to be developer-friendly. That era is over. The free tier is now effectively useless for any real data work - you're limited to a tiny read-only quota that resets on a 24-hour window. And the paid tiers are no joke: the Basic tier has doubled in cost to $200/month, while the Pro tier sits at $5,000/month. Enterprise plans carry custom pricing that runs into the tens of thousands per month - a price point reserved for the largest media and data companies on the planet.

To put that in perspective: the free tier's experimental read API is limited to just 100 requests, and post limits were slashed from 1,500 down to 500 per month. For any developer who built something real on the old Twitter API, those numbers make the platform nearly unusable without a significant budget commitment.

The downstream effect has been brutal. Many formerly Twitter-connected apps have simply shut down or stopped working entirely. Tools that relied on affordable API access for scheduling, analytics, and outreach have disappeared or dramatically cut features. The developer ecosystem that once made Twitter valuable to third-party builders has been hollowed out.

This is why GitHub is flooded with scraping projects. Developers needed alternatives fast, and the open-source community delivered - though with varying degrees of reliability.

When Twitter's source code leaked on GitHub back in 2023, it highlighted something I've been telling clients for years: relying on official APIs puts your entire prospecting system at the mercy of someone else's business decisions.

One agency I worked with had built their entire lead gen around Twitter API access, and when pricing changed overnight, their cost per lead jumped 300%. That's why I always recommend having a backup system that doesn't depend on platform cooperation.

The Best Twitter Scraping Libraries on GitHub

There are dozens of repositories under the twitter-scraper topic on GitHub. Most of them are abandoned, broken, or built for niche research purposes. Here are the ones actually worth evaluating.

1. twscrape (Python, actively maintained)

GitHub: vladkens/twscrape

This is the one I'd point you to first if you're going the Python route. It uses Twitter's internal GraphQL API with authorization support, meaning it works by authenticating through real X accounts rather than the official developer API. It handles search results, user profiles, follower and following lists, tweet favoriters, retweeters, and more. The library is actively maintained, with a steady release history that keeps pace with X's platform changes.

The key feature: it does automatic account switching to smooth out rate limits. Since X enforces rate limits per account (resetting every 15 minutes per endpoint), rotating across multiple accounts is the practical workaround. The library manages that rotation for you out of the box, which removes a significant amount of manual plumbing from your setup.

Setup is clean: pip install twscrape, add your account credentials, and you're pulling data. The library handles session management - after the initial login, account cookies are saved to a local database file so you don't re-authenticate on every run. It also supports IMAP-based email verification, so it can handle the verification code flow automatically if X challenges the login.

The catch: you need authorized X/Twitter accounts to run it. You can register new ones, but X has tightened verification significantly and ban rates for scraper accounts are real. The project's own docs recommend using cookie-based accounts and proxies for anything production-grade. For optimal performance, rotating residential proxies are the standard approach.

2. Scweet (Python, production-ready)

GitHub: Altimis/Scweet

Scweet scrapes Twitter's web GraphQL API directly - the same endpoints the browser uses - rather than the official developer API. It keeps local state in SQLite for account management, lease tracking, and resume checkpoints, so if a session gets rate-limited mid-scrape, it can pick up where it left off. It also supports cursor handoff: if one account hits a rate limit, it hands the cursor to another account automatically.

The cursor handoff feature is worth highlighting. You can configure cursor_handoff=True and set a max_account_switches cap, which means a long-running scrape can survive multiple rate limit events without manual intervention. The resume=True parameter lets you restart an interrupted scrape from its last checkpoint - useful for any large dataset pull. For production workflows, the maintainers recommend running Scweet via an Apify Actor rather than the raw library, which handles scaling and reliability for you. That's worth noting - the open-source version works, but "works in production" is a different bar than "works on my laptop."

3. selenium-twitter-scraper (Python, Selenium-based)

GitHub: godkingjay/selenium-twitter-scraper

This one takes a different approach - instead of hitting the internal API directly, it uses Selenium to control a real browser and scrape the rendered page. It's capable of scraping tweets from home timelines, user profiles, hashtags, keyword searches, and advanced search queries. The command-line interface is clean and flexible: -u elonmusk for a profile, -ht python for a hashtag, -q "your query" for keyword search, and -a pd to additionally capture the poster's follower and following counts.

The Selenium approach is slower than API-based scrapers and requires ChromeDriver installed locally, but it has one significant advantage: it mimics real human browser behavior more convincingly. That makes it harder for X's bot detection to flag. The downside is that it's resource-intensive - running multiple instances in parallel to get around rate limits means a real server CPU cost, not just network overhead.

Use this one when you want tweets from a very specific source (a competitor's account, a niche hashtag) and you're running small enough volume that browser automation speed isn't a bottleneck.

4. twitter-scraper-selenium (Python, PyPI package)

GitHub: shaikhsajid1111/twitter-scraper-selenium

A packaged version of the Selenium approach that you can install directly via pip: pip install twitter-scraper-selenium. It scrapes profile details, tweets, and user information, outputting results in both JSON and CSV formats. It also has a dual-mode approach: an HTTP request method that hits Twitter's API directly, and a browser automation mode that visits pages and scrolls to collect data. The output structure is well-documented and includes tweet text, timestamps, hashtags, mentions, image URLs, video URLs, and external links - everything you'd want for downstream processing.

Worth noting: this scraper only collects public data available to unauthenticated users, which limits what it can access but also reduces the legal exposure of running it.

5. Twint (Python, largely broken)

GitHub: twintproject/twint

Twint is the one you'll find referenced in older blog posts everywhere. It was an advanced scraping and OSINT tool that worked without Twitter's API at all - no authentication, no Selenium, no rate limitations in the traditional sense. It could pull followers, following lists, tweets going back years, and even surface email addresses and phone numbers mentioned in tweet content.

The problem: X's platform changes have broken Twint repeatedly, and the main repository hasn't been actively maintained to keep pace. If you're seeing it recommended without a caveat about its current state, that's a red flag. Test it before building anything on top of it. For most users, it currently produces errors or empty results rather than working data. It's included here because it's still widely referenced and you'll encounter it - not because you should use it.

6. twikit (Python, API-key-free)

GitHub: d60/twikit

Twikit lets you post, search, and interact with Twitter without an official API key by going through Twitter's internal API. It's actively maintained and has a Discord community. Useful if you need to combine read and write operations without paying for API access. Comes with a Grok AI extension (twikit_grok) as well if that's relevant to your workflow. It supports async operations which makes it efficient for higher-volume collection when you're managing multiple accounts.

7. twitter-scraper in Go (n0madic)

GitHub: n0madic/twitter-scraper

If you're working in Go rather than Python, this is your best option. It reverse-engineers Twitter's frontend JavaScript API. Originally it required no authentication, but that's changed: now all methods require login. You authenticate once, save cookies, and restore them between sessions to avoid re-authenticating on every run. It's fast, supports the full set of Twitter search operators, and fits well into Go-based data pipelines.

8. TweeterPy (Python, profile-focused)

GitHub: iSarabjitDhiman/TweeterPy

TweeterPy is a Python library specifically designed to extract data from Twitter user profiles. It covers username lookups, user IDs, bios, followers and following lists, profile media, and tweet history. It's a good fit if your primary use case is deep profiling of specific accounts rather than broad keyword search - for example, pulling everything public about a specific set of target handles before reaching out.

Free Download: Best Lead Strategy Guide

Drop your email and get instant access.

By entering your email you agree to receive daily emails from Alex Berman and can unsubscribe at any time.


How These Scrapers Actually Work: The Technical Picture

Understanding the mechanics helps you pick the right tool and set realistic expectations. There are three fundamental approaches you'll see across these GitHub projects:

Internal GraphQL API scraping

This is what twscrape and Scweet use. Twitter's web app communicates with its own backend through a GraphQL API. These endpoints aren't documented publicly, but they're visible in browser developer tools. Scrapers that hit these endpoints directly are fast and return structured data, but they're also the most likely to break when Twitter updates its internal API structure - which happens regularly.

Browser automation (Selenium/Playwright)

Tools like selenium-twitter-scraper control a real browser - Chrome or Firefox - and interact with Twitter the same way a human would: click, scroll, extract rendered HTML. This approach is slower and more resource-intensive, but it's harder to detect and block. The page output is less structured, so you're parsing HTML rather than clean JSON, which means more post-processing work.

Account pool rotation

Most production-grade scrapers rely on multiple accounts to distribute requests. X enforces rate limits per authenticated account - typically 500 search requests per 15-minute window per account. With a pool of 10 accounts, you effectively multiply your throughput by 10. The libraries that handle this automatically (twscrape, Scweet) save significant engineering time compared to building the rotation logic yourself.
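
The rotation these libraries do internally is easy to picture. Here's a minimal sketch - not twscrape's actual implementation - of a round-robin pool where each account carries its own 15-minute request budget. The 500-requests-per-window figure mirrors the number above and is illustrative, not a documented limit:

```python
import time
from collections import deque

# Illustrative sketch of account-pool rotation: each account gets its own
# 15-minute rate window, and the pool hands out whichever account still
# has budget left. Not twscrape's real implementation.

WINDOW_SECONDS = 15 * 60
REQUESTS_PER_WINDOW = 500  # per-account budget; the real limit varies by endpoint

class AccountPool:
    def __init__(self, usernames):
        # queue of (username, requests_used, window_started_at)
        self.accounts = deque((u, 0, 0.0) for u in usernames)

    def acquire(self, now=None):
        """Return an account with budget left, rotating round-robin."""
        now = time.time() if now is None else now
        for _ in range(len(self.accounts)):
            name, used, started = self.accounts.popleft()
            if now - started >= WINDOW_SECONDS:
                used, started = 0, now  # window expired, reset the budget
            if used < REQUESTS_PER_WINDOW:
                self.accounts.append((name, used + 1, started))
                return name
            self.accounts.append((name, used, started))
        return None  # every account is rate-limited right now

pool = AccountPool(["acct_a", "acct_b", "acct_c"])
print([pool.acquire(now=100.0) for _ in range(4)])
# → ['acct_a', 'acct_b', 'acct_c', 'acct_a']
```

With 10 accounts the pool simply cycles through ten budgets instead of three, which is where the 10x throughput claim comes from.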

Setting Up twscrape: The Quick-Start Guide

If you've decided to run a GitHub scraper and want to start with the most actively maintained option, here's the practical setup for twscrape:

Install via pip:

pip install twscrape

Add your accounts. Cookie-based authentication is more stable than username/password:

from twscrape import API
import asyncio
from twscrape import API

async def main():
    api = API()  # stores sessions in accounts.db by default
    cookies = "ct0=your_ct0_cookie; auth_token=your_auth_token"
    await api.pool.add_account("username", "password", "email@example.com", "email_pass", cookies=cookies)
    await api.pool.login_all()

asyncio.run(main())

Or use the CLI directly:

# Add accounts
twscrape add_accounts accounts.txt --cookies
# Search tweets
twscrape search "cold email B2B" --limit=100
# Get user profile
twscrape user_by_login alexberman
# Pull follower list
twscrape followers USER_ID --limit=500
# Pull following list
twscrape following USER_ID --limit=500
# Get tweet details
twscrape tweet_details TWEET_ID

The output is newline-delimited JSON by default, which you can pipe directly to a file or process with jq:

twscrape search "agency owners" --limit=200 > tweets.jsonl
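
From there you can process the JSONL with nothing but the standard library. A small sketch - the field names here (id, rawContent, user.username) match what recent twscrape versions emit, but verify them against your own output before relying on them:

```python
import json

# Minimal post-processing of JSONL scraper output: deduplicate by tweet id
# and keep only the fields you need downstream.

def load_tweets(jsonl_text):
    seen, rows = set(), []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        t = json.loads(line)
        if t["id"] in seen:
            continue  # overlapping search pages produce duplicates
        seen.add(t["id"])
        rows.append({
            "id": t["id"],
            "user": t["user"]["username"],
            "text": t["rawContent"],
        })
    return rows

sample = "\n".join([
    '{"id": 1, "user": {"username": "alice"}, "rawContent": "cold email tips"}',
    '{"id": 1, "user": {"username": "alice"}, "rawContent": "cold email tips"}',
    '{"id": 2, "user": {"username": "bob"}, "rawContent": "agency growth"}',
])
print(load_tweets(sample))  # two rows: ids 1 and 2
```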

For proxy support, configure your proxy settings in the account pool. Rotating residential proxies are the standard for any volume above test-level scraping - IP-based restrictions are real and a single residential IP will hit rate limits fast at production scale.

After you have your data in JSONL or CSV format, you can pipe it into Clay for enrichment workflows - matching Twitter handles to verified contact info is where the B2B value actually unlocks.

The Practical Problems with GitHub Twitter Scrapers

Let me be straight with you. Every single one of these tools has the same core problem: X is actively trying to break them. In practice that means scrapers that silently return empty results after an unannounced internal API change, banned accounts, blocked IPs, proxy bills, and a permanent maintenance tax on your engineering time.

Here's what I see agencies mess up constantly: they spend weeks setting up scrapers, then realize they have no system to actually use the data. I had one client who scraped 50,000 Twitter profiles, got excited about the data goldmine, then watched it sit unused for three months because they had no outreach infrastructure. The scraping part is honestly the easy part. The hard part is having warmed-up domains, a team that can handle replies under 5 minutes, and knowing your KPIs (I look for 80%+ open rates and 6% positive reply rate minimum).

Need Targeted Leads?

Search unlimited B2B contacts by title, industry, location, and company size. Export to CSV instantly. $149/month, free to try.

Try the Lead Database →

The question I get most often: is scraping Twitter actually illegal?

The short answer is: it depends, and the law is genuinely unsettled. The clearest guidance comes from the hiQ v. LinkedIn case, which went through years of litigation. The Ninth Circuit affirmed that scraping publicly available data does not violate the Computer Fraud and Abuse Act - you can't be criminally prosecuted for pulling data that's publicly visible to anyone on the internet. That ruling remains valid precedent.

But here's the nuance most people miss: the case ultimately settled with a $500,000 judgment against hiQ for breach of LinkedIn's user agreement and violations of California state law. The CFAA argument failed, but the contract-based argument succeeded. Which means: scraping public data likely won't get you prosecuted under federal computer fraud law, but it can absolutely get you sued for breach of contract under state law if you've agreed to the platform's terms of service by creating an account.

The practical risk calculus for most developers and marketers running small-scale Twitter scrapers for research or prospecting: the legal exposure is real but the enforcement priority is low unless you're pulling data at significant commercial scale. X is more likely to address you technically (account bans, IP blocks) than legally. But don't take that as a green light - platforms do escalate when they identify systematic scrapers operating at scale.

Run it at your own risk, stay informed on legal developments in your jurisdiction, and don't build a critical business dependency on data from a source that can pull the plug on you at any moment - legally or technically.

What Twitter Scraping Is Actually Good For (B2B Use Cases)

Let's talk about why someone in a B2B sales or agency context would want to scrape Twitter in the first place. The real use cases are:

- Spotting buying signals: hiring posts, funding announcements, product launches
- Building influencer and creator outreach lists
- Seeing who engages with specific hashtags or follows competitor accounts
- Finding people tweeting about the pain points your product solves

That's actually a pretty targeted list, and for most of them, Twitter data alone doesn't get you to a meeting. A Twitter handle isn't an email address. A follower list isn't a prospect list with verified contacts. You still need to cross-reference that social data with actual contact info before any of it becomes actionable for outreach.

If you're building influencer or creator outreach lists specifically, a dedicated tool like ScraperCity's YouTuber Email Finder already solves the problem of going from creator profile to verified email - no DIY scraper setup required.

The B2B play with Twitter scraping isn't complicated, but most people overthink it. I use it to identify companies showing buying signals (hiring posts, funding announcements, product launches), then immediately move to email outreach. One agency I consulted scaled from $10.5k to $25k per month in sales by using Twitter as a signal layer, not a prospecting channel. They'd scrape for intent, verify contacts separately, then hit email hard. Twitter told them WHO to email and WHEN. The actual conversation happened in the inbox, where you can control deliverability and track real metrics.

Turning Twitter Signal Into Actual Prospect Data

Here's the workflow that actually converts. Twitter is a signal source, not a contact database. Use it as the first step of a multi-stage process:

Step 1 - Signal identification: Use a GitHub scraper or a tool like TweetHunter to identify who's engaging with specific hashtags, following relevant accounts, or tweeting about pain points in your target market. Export Twitter handles and any profile data available (bio, location, company mentions).

Step 2 - Identity resolution: Take those Twitter handles and match them to real identities. Sometimes the bio gives you enough - name, company, job title. Sometimes you need to cross-reference against LinkedIn or a people search tool. If you have names and companies but need contact info, a people finder tool can surface contact details for individuals you've already identified.

Step 3 - Contact enrichment: Now that you know who the person is, get their verified email or phone number. For email, an email finding tool will be faster and more reliable than trying to parse contact info from tweet text or bios. For phone prospecting, a mobile finder gives you direct dials rather than corporate switchboards.

Step 4 - List cleaning: Before you send to any email list built this way, run it through an email validator to verify deliverability and strip addresses that will bounce. Bounce rates above 5% damage your sender reputation fast.
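
Before the validator step, a cheap syntax pre-filter catches the obviously broken addresses scraped from bios and tweet text. This is a minimal sketch: it checks format only and says nothing about deliverability, which still requires a real validation service:

```python
import re

# Syntax-level pre-filter before sending a list to an email validator.
# Catches malformed addresses and duplicates only; dead mailboxes and
# catch-alls still need SMTP-level validation.

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def prefilter(addresses):
    cleaned, dropped = [], []
    for a in addresses:
        a = a.strip().lower()
        (cleaned if EMAIL_RE.match(a) else dropped).append(a)
    cleaned = list(dict.fromkeys(cleaned))  # dedupe, preserving order
    return cleaned, dropped

emails = ["Jane@Acme.com", "broken@@mail", "jane@acme.com", "cto@start.up.io"]
good, bad = prefilter(emails)
print(good)  # ['jane@acme.com', 'cto@start.up.io']
print(bad)   # ['broken@@mail']
```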

Step 5 - Outreach: Now you have a verified, enriched list of people who are demonstrably interested in your market. That's a completely different quality of outbound than cold-buying a generic list. Sequence them in Smartlead or Instantly with personalized context from their Twitter activity.

That workflow - social signal → identity → contact enrichment → outreach - converts dramatically better than cold DMs alone or blasting a generic database.
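
Sketched as code, that workflow is just a filter chain. The `resolve_identity` and `find_email` functions below are placeholder stubs, not real services - they stand in for a people-search lookup and an email-finder API:

```python
# End-to-end sketch of signal → identity → contact → outreach list.
# resolve_identity and find_email are hypothetical stubs standing in for
# real identity-resolution and email-finder steps.

def resolve_identity(handle, bio):
    # Stub: in practice, parse the bio or cross-reference LinkedIn.
    company = bio.split(" at ")[-1] if " at " in bio else None
    return {"handle": handle, "company": company}

def find_email(identity):
    # Stub: in practice, call an email finder, then validate before sending.
    if identity["company"]:
        return f'{identity["handle"]}@{identity["company"].lower()}.example'
    return None

signals = [
    {"handle": "jane_doe", "bio": "Head of Growth at Acme"},
    {"handle": "mystery", "bio": "just vibes"},
]

prospects = []
for s in signals:
    ident = resolve_identity(s["handle"], s["bio"])
    email = find_email(ident)
    if email:  # only enriched, contactable people reach the outreach list
        prospects.append({**ident, "email": email})

print(prospects)
```

The point of the structure: every stage narrows the list, so only verified, reachable people make it to the sequencer.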


When to Use GitHub Scrapers vs. Built Tools

Here's the honest breakdown:

Use a GitHub scraper if: You're a developer comfortable with Python or Go, you need highly custom data (specific search operators, raw tweet data for NLP or research), you have the time to maintain and debug the tool when it breaks, and you're not running it at commercial scale. Research projects, competitive analysis, and one-time data pulls are good fits.

Use a purpose-built tool if: You need reliable, ongoing prospect data. You don't want to spend engineering cycles on a fragile scraper. You need contact info - emails, phone numbers - not just Twitter handles. You're running outbound at any real volume. You want something that works today and next month without maintenance.

For the latter, this B2B lead database lets you filter by job title, industry, location, and company size and pull verified contact data directly - without wrestling with X's anti-bot systems. That's just a more direct path to the meetings you actually want.

The reality is that for most B2B sales use cases, fighting Twitter's scraping defenses is the hard way to build a prospect list. The signal is valuable. The contact data pipeline built on top of scraped Twitter data requires significant engineering. Purpose-built lead tools short-circuit that complexity.

I test new tools every single year because the landscape changes fast, but here's my rule: if you're sending under 500 emails a day, built tools are fine. Once you scale past that, you need to think like a systems person, not a tools person. A client came to me doing all his own list building and cold emailing, spending 4+ hours daily on manual tasks. We outsourced both immediately, but the key was thinking through how VAs would mess it up BEFORE handing it off. When you scale, you're not buying tools anymore. You're preventing failure modes and maintaining quality metrics across a team.

Twitter Automation Tools Worth Knowing

Beyond scraping raw data, there's a separate category of Twitter/X tools focused on growth automation and outreach. These are worth knowing about because they address adjacent use cases that often come up when people are researching Twitter scraping.

Drippi - Built exactly for Twitter DM automation and lead generation. If your goal is outreach directly on the platform - finding accounts that match your ICP and starting conversations via DM - Drippi handles the sequencing and targeting without manual effort.

TweetHunter - Good for both content scheduling and finding leads through Twitter search in a managed way. The search features let you identify accounts engaging with specific topics and keywords without running a scraper. Lower technical lift than a GitHub solution for discovery use cases.

Taplio - The LinkedIn equivalent of TweetHunter. If your target market is more active on LinkedIn than Twitter, Taplio covers content-driven growth and outreach on that platform. Worth having both angles covered depending on where your buyers spend their time.

Proxies and Account Management: What You Actually Need

If you're going to run any of these GitHub scrapers seriously, the proxy and account setup is where most people underinvest. Here's the practical breakdown:

Proxies: Rotating residential proxies are the standard. Datacenter proxies get flagged quickly by X's bot detection. Services like Smartproxy, Oxylabs, or Bright Data provide residential IPs that rotate automatically. Expect to pay $50-150/month for a legitimate residential proxy pool adequate for small-scale scraping.

Accounts: You need multiple authenticated accounts. Building a pool of 5-10 accounts gives you enough rotation to scrape at meaningful volume without hitting account-level rate limits continuously. Managing these accounts - keeping sessions alive, handling verification challenges, rotating through them without getting the whole pool flagged - is its own operational overhead.

Monitoring: Set up basic monitoring to detect when your scraper starts returning empty results or errors. X changes its internal API structure without notice, and you need to know immediately when a tool breaks so you can switch to a working version or a different approach.
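
A bare-bones version of that check might look like this; `alert()` is a placeholder for whatever notifier you actually use:

```python
# Basic scraper health check: flag runs that come back empty or error-heavy,
# which is usually the first symptom of X changing its internal API.
# alert() is a stand-in for a Slack/email/pager notification.

def alert(message):
    print(f"ALERT: {message}")

def check_run(results, expected_min=1, max_error_rate=0.2):
    """Return True if the run looks healthy, firing an alert otherwise."""
    if len(results) < expected_min:
        alert("scrape returned no results - endpoint may have changed")
        return False
    errors = sum(1 for r in results if r.get("error"))
    if errors / len(results) > max_error_rate:
        alert(f"error rate {errors}/{len(results)} - check accounts and proxies")
        return False
    return True

print(check_run([{"id": 1}, {"id": 2}]))  # healthy run → True
print(check_run([]))                      # empty run fires an alert → False
```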

Storage: twscrape outputs to JSON by default. For anything at scale, pipe to a database rather than flat files - PostgreSQL or SQLite handle the volume fine and make it easier to query and deduplicate your data downstream.
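
A minimal version of that pattern with the stdlib sqlite3 module. The record shape is an assumption modeled on twscrape-style output, so adapt the columns to your actual data:

```python
import json
import sqlite3

# Piping scraped tweets into SQLite instead of flat files. The PRIMARY KEY
# plus INSERT OR IGNORE deduplicates overlapping scrape runs for free.

conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        id INTEGER PRIMARY KEY,  -- tweet id, deduped automatically
        username TEXT,
        content TEXT,
        raw TEXT                 -- full JSON record, for fields you need later
    )
""")

def store(records):
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, username, content, raw) VALUES (?, ?, ?, ?)",
        [(r["id"], r["user"]["username"], r["rawContent"], json.dumps(r)) for r in records],
    )
    conn.commit()

batch = [
    {"id": 1, "user": {"username": "alice"}, "rawContent": "hiring SDRs"},
    {"id": 1, "user": {"username": "alice"}, "rawContent": "hiring SDRs"},  # dupe
    {"id": 2, "user": {"username": "bob"}, "rawContent": "we just raised"},
]
store(batch)
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # → 2
```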

The proxy conversation is where people waste money. Yes, you need them for scraping at scale, but I've watched companies spend $500/month on residential proxies when they're only scraping 1,000 profiles a week. Here's the real bottleneck: account management. If you're rotating Twitter accounts, you need a system where appointment setters can reply to positive responses within 5 minutes, which means your CRM integration matters more than your proxy setup. I've seen agencies hit 40% of positive replies converting to meetings just by optimizing response time, not by buying better proxies.


FAQ: Twitter Scraping on GitHub

Is there a Twitter scraper that works without an account?

Historically yes - the old version of Twint worked without authentication, and some early scrapers hit entirely public endpoints. That window has largely closed. X now requires authentication for most useful data endpoints. The browser-automation approach (Selenium) can access some public data without logging in, but it's limited to what an unauthenticated visitor can see, which has been reduced significantly. In practice, any scraper you want to use for real data collection is going to require at least one authenticated account.

Can I scrape Twitter follower lists with these tools?

Yes - twscrape, Scweet, and TweeterPy all support follower and following list extraction. The rate limits are real though: X returns follower data in paginated batches, and each page counts against your rate limit. A large account with hundreds of thousands of followers will take significant time to pull completely, even with account rotation. Plan for it.
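
You can estimate the wall-clock time with simple arithmetic. The figures below (100 followers per page, 50 page requests per account per 15-minute window) are illustrative assumptions, not X's documented limits - plug in whatever your accounts actually get:

```python
import math

# Back-of-envelope estimate for a full follower pull. Page size and
# per-window request budget are assumed values for illustration.

def hours_to_pull(followers, accounts, page_size=100,
                  requests_per_window=50, window_minutes=15):
    pages = math.ceil(followers / page_size)
    pages_per_window = accounts * requests_per_window
    windows = math.ceil(pages / pages_per_window)
    return windows * window_minutes / 60

# A 500k-follower account with a 5-account pool:
print(hours_to_pull(500_000, accounts=5))  # → 5.0 hours
```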

What data fields can I actually pull from these scrapers?

For tweet data: tweet ID, text content, timestamp, engagement counts (likes, retweets, replies, views), hashtags, mentions, external links, media URLs, and the quote tweet or reply chain structure.

For user/profile data: username, display name, bio, location (as entered by the user - not verified), website, follower count, following count, tweet count, account creation date, and verification status.

What you can't reliably get: verified email addresses, phone numbers, actual company names (unless they're in the bio), or any data behind Twitter's authentication wall (DMs, draft tweets, private list memberships).
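
If you're exporting those fields to CSV, the nested records need flattening first. A minimal sketch - the input shape here is a simplified stand-in, so adjust the key names to whatever your scraper actually emits:

```python
import csv
import io

# Flatten a nested tweet record into the row shape CSV export wants.
# The input keys (user.username, rawContent, likeCount) are an assumed
# simplified shape, not any specific scraper's exact schema.

def flatten(tweet):
    return {
        "id": tweet["id"],
        "username": tweet["user"]["username"],
        "text": tweet["rawContent"],
        "likes": tweet["likeCount"],
        "hashtags": " ".join(tweet.get("hashtags", [])),
    }

tweets = [{
    "id": 42,
    "user": {"username": "alice"},
    "rawContent": "We're hiring SDRs #sales",
    "likeCount": 17,
    "hashtags": ["sales"],
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "username", "text", "likes", "hashtags"])
writer.writeheader()
writer.writerows(flatten(t) for t in tweets)
print(buf.getvalue())
```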

How do I handle the data after I've scraped it?

The most common pipeline: scrape to JSON/CSV, clean and deduplicate, cross-reference against LinkedIn or a contact database, enrich with verified emails, load into your CRM or outreach tool. Clay is built for exactly this enrichment workflow - you can feed it raw Twitter data and waterfall through multiple enrichment sources to get verified contact info.

Does scraping Twitter work for finding B2B leads directly?

Partially. Twitter data gives you good signal - who's engaged in your market, what they care about, what companies they work for. But it rarely gives you direct contact info. The handles and bios are the starting point, not the endpoint. You need a contact enrichment step. That's where tools like ScraperCity - which lets you filter a B2B email database by title, industry, and company size - or a dedicated email finder save you the manual research work.

The Smarter Outbound Play

Here's what I've seen work across thousands of clients: Twitter is a great signal source but a terrible source of contact data. Use it to identify who to target - people engaging with specific hashtags, followers of competitor accounts, users who mention the pain point your product solves. Then take those names and run them through proper contact enrichment to get email addresses and phone numbers you can actually use.

The GitHub scraper gets you halfway. The tools that get you the rest of the way are the ones worth spending time on. If you want to go deeper on building that kind of outbound system - social signal to enriched contact to booked meeting - check out the Daily Ideas Newsletter for specific tactics I'm running, or the Purpose Framework for how to think about the strategy behind it all.

For the technical side of building prospect lists at scale - beyond just Twitter - I cover this kind of systematic outbound approach inside Galadon Gold.

The GitHub scraper is a tool. What you do with the data after is the actual business decision.

Look, I've helped generate hundreds of millions in leads through cold outreach, and here's what I know: the scraping part is never the limiting factor. I had one CEO generate 50k in business in ONE HOUR after we fixed her follow-up system, not her data collection. Your Twitter scraper can be perfect, but if you're not optimizing every 250 opens, if your follow-ups aren't tight, if you can't handle replies fast, you're just collecting data for no reason. Start with email infrastructure that works, THEN add Twitter signals to make it smarter. That's the play that actually scales.

Ready to Book More Meetings?

Get the exact scripts, templates, and frameworks Alex uses across all his companies.

By entering your email you agree to receive daily emails from Alex Berman and can unsubscribe at any time.
