Track what retail traders are saying, what Reddit is hyping, and what the crowd sentiment looks like across every platform. Build a multi-source sentiment aggregator from scratch.
Markets are driven by two forces: fundamentals and sentiment. While Part 2 covered the "rational" side (financial statements, earnings), this chapter explores the "emotional" side — what millions of traders are thinking, saying, and feeling about stocks in real time.
Sentiment data has become a legitimate edge since the GameStop saga of 2021 proved that retail crowd behavior moves markets. Today, institutional desks monitor Reddit, StockTwits, and Twitter as part of their alpha signals.
Here's the paradox: extreme sentiment is typically a contrarian signal. When everyone is euphoric, the market often tops. When everyone is terrified, the market often bottoms. The trick is measuring "extreme" vs. "normal" sentiment — and that requires data from multiple sources, aggregated and normalized over time.
- **Mention volume** — how many people are talking about a stock. A spike in mentions means something is happening. Leading indicator.
- **Sentiment polarity** — are posts bullish or bearish? NLP analysis of text. Most useful as a contrarian signal at extremes.
- **News sentiment** — headline tone from major news outlets. More of an institutional signal than social media.
- **Search interest** — Google Trends as a proxy for retail attention. Works surprisingly well for meme stocks and crypto.
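One simple way to put "extreme vs. normal" on a common scale — a sketch, not a prescribed method — is to z-score today's reading for each source against that ticker's own trailing history. The function name and the |z| > 2 threshold below are illustrative assumptions:

```python
import numpy as np

def sentiment_zscore(history, current):
    """Score today's sentiment reading against its own trailing history.

    history: past daily bull ratios (0..1) for one ticker
    current: today's bull ratio
    Returns the z-score; |z| > 2 is a reasonable "extreme" threshold.
    """
    mean = np.mean(history)
    std = np.std(history)
    if std == 0:
        return 0.0
    return (current - mean) / std

# A ticker that usually sits near 60% bullish suddenly prints 95% bullish
past = [0.58, 0.62, 0.55, 0.60, 0.63, 0.57, 0.61, 0.59, 0.64, 0.56]
z = sentiment_zscore(past, 0.95)
print(f"z-score: {z:.1f}")  # far above 2 → euphoric extreme, contrarian caution
```

The same normalization works for mention counts, search volume, or news polarity, which is what makes cross-source aggregation possible later in the chapter.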
StockTwits is the Twitter of finance — 6M+ users posting about stocks with cashtag symbols ($AAPL, $TSLA). Their API is free and requires no authentication for basic endpoints. You get real-time messages, user-tagged sentiment (bullish/bearish), and trending tickers. It's the easiest sentiment data source to start with.
```python
import requests

# StockTwits — no API key needed for the basic streams endpoints
def get_stocktwits_sentiment(symbol):
    url = f"https://api.stocktwits.com/api/2/streams/symbol/{symbol}.json"
    messages = requests.get(url).json()["messages"]
    # entities.sentiment is null when the poster didn't tag a sentiment,
    # so coalesce None to {} before reading "basic"
    labels = [(m.get("entities", {}).get("sentiment") or {}).get("basic")
              for m in messages]
    bullish = labels.count("Bullish")
    bearish = labels.count("Bearish")
    total = bullish + bearish
    return {
        "symbol": symbol,
        "total_messages": len(messages),
        "bullish": bullish,
        "bearish": bearish,
        "bull_ratio": bullish / total if total > 0 else 0.5,
        "sample_message": messages[0]["body"] if messages else None,
    }

# Usage
sentiment = get_stocktwits_sentiment("AAPL")
print(f"AAPL: {sentiment['bullish']} bullish / {sentiment['bearish']} bearish "
      f"({sentiment['bull_ratio']:.0%})")

# Trending tickers
trending = requests.get("https://api.stocktwits.com/api/2/trending/symbols.json").json()
for t in trending["symbols"][:10]:
    print(f"  ${t['symbol']} — {t['title']}")
```
Reddit's API, accessed through the excellent Python library PRAW (Python Reddit API Wrapper), gives you access to every post and comment on r/wallstreetbets (15M+ members), r/stocks, r/investing, and r/options. Extract ticker mentions, measure hype levels, and detect emerging meme stocks before they hit mainstream.
```python
import re
from collections import Counter

import praw

# Set up the Reddit API client (create credentials at reddit.com/prefs/apps)
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_SECRET",
    user_agent="FinanceAPIs/1.0",
)

# Common words and finance slang that match the ticker pattern but aren't tickers
STOPWORDS = {
    "THE", "FOR", "AND", "ARE", "BUT", "NOT", "ALL", "CAN", "HAS", "HER",
    "WAS", "ONE", "OUR", "OUT", "DAY", "HAD", "HOT", "OIL", "OLD", "RED",
    "SIT", "TOP", "TWO", "WAR", "WIN", "YES", "BIG", "USE", "CEO", "IPO",
    "ETF", "IMO", "DD", "ITM", "OTM", "ATH",
}

def scan_wsb_mentions(limit=100):
    """Scan r/wallstreetbets hot posts for ticker mentions."""
    ticker_pattern = re.compile(r"\$([A-Z]{1,5})\b|\b([A-Z]{2,5})\b")
    mentions = Counter()
    for post in reddit.subreddit("wallstreetbets").hot(limit=limit):
        text = post.title + " " + (post.selftext or "")
        for match in ticker_pattern.findall(text):
            ticker = match[0] or match[1]  # cashtag group or bare-word group
            if ticker not in STOPWORDS:
                mentions[ticker] += 1
    return mentions.most_common(20)

# Top 20 most mentioned tickers on WSB right now
for ticker, count in scan_wsb_mentions(200):
    print(f"  ${ticker}: {count} mentions")
```
ApeWisdom does the Reddit scraping for you. Their free API provides ticker mention rankings across r/wallstreetbets, r/stocks, r/options, and other finance subreddits, updated every few minutes. No authentication needed — just hit the endpoint.
```python
import requests

# ApeWisdom — top mentioned tickers across finance subreddits (no API key)
data = requests.get("https://apewisdom.io/api/v1.0/filter/all-stocks/page/1").json()

print("Top 10 most mentioned stocks on Reddit:")
for stock in data["results"][:10]:
    print(f"  #{stock['rank']} ${stock['ticker']} — "
          f"{stock['mentions']} mentions ({stock['upvotes']} upvotes)")

# Filter by subreddit and compare mentions against 24 hours ago
wsb = requests.get("https://apewisdom.io/api/v1.0/filter/wallstreetbets/page/1").json()
for stock in wsb["results"][:5]:
    prev = int(stock.get("mentions_24h_ago") or 0)
    change = int(stock["mentions"]) - prev
    print(f"  ${stock['ticker']} — {stock['mentions']} mentions ({change:+d} vs. 24h ago)")
```
ChartExchange aggregates Reddit sentiment alongside dark pool data, creating a unique signal that combines social hype with institutional activity. When a stock is trending on Reddit AND seeing unusual dark pool prints, that's a high-conviction setup.
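The convergence idea itself is easy to sketch independently of any one provider: flag a ticker only when social mentions are spiking *and* off-exchange (dark pool) volume share is elevated. The function, field names, and thresholds below are illustrative assumptions, not ChartExchange's methodology:

```python
def high_conviction_setup(mention_zscore, dark_pool_share, avg_dark_pool_share):
    """Flag tickers where social hype and institutional activity line up.

    mention_zscore:      how unusual today's Reddit mention count is
    dark_pool_share:     today's off-exchange volume share (0..1)
    avg_dark_pool_share: the ticker's typical off-exchange share
    """
    social_hot = mention_zscore > 2.0  # unusual retail chatter
    institutional_hot = dark_pool_share > 1.25 * avg_dark_pool_share
    return social_hot and institutional_hot

print(high_conviction_setup(3.1, 0.52, 0.38))  # True: both signals elevated
print(high_conviction_setup(3.1, 0.30, 0.38))  # False: hype without dark pool prints
```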
NewsAPI aggregates headlines from 150,000+ news sources worldwide. The free tier gives you 100 requests/day with 1-month history — enough to track daily sentiment for a portfolio. Their "everything" endpoint lets you search for any keyword across all sources.
```python
from datetime import datetime, timedelta

import requests
from textblob import TextBlob  # pip install textblob

NEWS_KEY = "YOUR_NEWSAPI_KEY"

def get_news_sentiment(query, days=7):
    """Fetch news headlines from the last N days and score them with TextBlob."""
    frm = (datetime.utcnow() - timedelta(days=days)).strftime("%Y-%m-%d")
    url = (
        "https://newsapi.org/v2/everything"
        f"?q={query}&from={frm}&sortBy=publishedAt&language=en"
        f"&pageSize=50&apiKey={NEWS_KEY}"
    )
    articles = requests.get(url).json()["articles"]
    sentiments = []
    for a in articles:
        text = a["title"] + ". " + (a["description"] or "")
        blob = TextBlob(text)
        sentiments.append({
            "title": a["title"],
            "source": a["source"]["name"],
            "polarity": blob.sentiment.polarity,          # -1 to +1
            "subjectivity": blob.sentiment.subjectivity,  # 0 to 1
            "date": a["publishedAt"][:10],
        })
    if not sentiments:
        return {"query": query, "articles_analyzed": 0, "signal": "NO_DATA"}
    avg_polarity = sum(s["polarity"] for s in sentiments) / len(sentiments)
    return {
        "query": query,
        "articles_analyzed": len(sentiments),
        "avg_sentiment": avg_polarity,
        "signal": "BULLISH" if avg_polarity > 0.1
                  else ("BEARISH" if avg_polarity < -0.1 else "NEUTRAL"),
        "top_positive": max(sentiments, key=lambda s: s["polarity"]),
        "top_negative": min(sentiments, key=lambda s: s["polarity"]),
    }

result = get_news_sentiment("AAPL Apple")
print(f"Apple news sentiment: {result['signal']} ({result['avg_sentiment']:.3f})")
```
Benzinga is the premium news API for finance. They provide pre-categorized news with built-in sentiment scores, earnings call transcripts, analyst ratings changes, and SEC filing alerts. If you need production-grade news sentiment without building your own NLP pipeline, Benzinga is the gold standard.
MarketAux provides financial news with built-in entity-level sentiment — they don't just tell you the headline is positive, they identify which specific ticker the sentiment applies to. A headline like "Apple beats but Intel warns" correctly assigns positive sentiment to AAPL and negative to INTC.
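Working with that entity-level output is straightforward once you have the response. The sample payload below is an assumption modeled on MarketAux's documented entity objects (`symbol`, `sentiment_score`) — verify field names against the live `/v1/news/all` response before relying on them:

```python
# Sample shaped like one article from a MarketAux /v1/news/all response
# (fields are assumptions based on their documented entity objects)
article = {
    "title": "Apple beats but Intel warns",
    "entities": [
        {"symbol": "AAPL", "sentiment_score": 0.62},
        {"symbol": "INTC", "sentiment_score": -0.48},
    ],
}

def entity_sentiment(article):
    """Map each tagged ticker in an article to its own sentiment score."""
    return {e["symbol"]: e["sentiment_score"] for e in article.get("entities", [])}

print(entity_sentiment(article))  # {'AAPL': 0.62, 'INTC': -0.48}
```

This per-ticker attribution is exactly what headline-level NLP (like the TextBlob approach above) cannot give you: a single averaged polarity would wash out the opposing AAPL and INTC signals.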
Unusual Whales combines social sentiment with options flow data — a unique combination. They track what retail traders are saying AND what institutions are doing in the options market. Their "Social Sentiment" feed aggregates StockTwits, Reddit, and Twitter mentions with volume-weighted scoring.
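Volume-weighted scoring is worth understanding even if you build it yourself: a mildly bullish reading backed by heavy message volume should count for more than a strongly bearish reading from a handful of posts. A minimal sketch (this is the general technique, not Unusual Whales' internal formula):

```python
def volume_weighted_sentiment(readings):
    """Weight each platform's score by its message volume.

    readings: list of (score, volume) pairs, score in -1..+1
    """
    total_volume = sum(v for _, v in readings)
    if total_volume == 0:
        return 0.0
    return sum(score * v for score, v in readings) / total_volume

# StockTwits mildly bullish on heavy volume outweighs a thin bearish Reddit read
score = volume_weighted_sentiment([(0.4, 900), (-0.6, 100)])
print(round(score, 2))  # 0.3
```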
Google Trends data is a surprisingly effective proxy for retail investor attention. Academic research has shown that spikes in Google search volume for a stock ticker precede price moves by 1-3 days. The pytrends library makes it easy to pull this data programmatically.
```python
import numpy as np
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US")

# Search interest for "AAPL stock" over the last 90 days
pytrends.build_payload(["AAPL stock"], timeframe="today 3-m")
interest = pytrends.interest_over_time()
print(interest.tail(10))

# Compare multiple tickers on the same relative 0-100 scale
pytrends.build_payload(["AAPL stock", "TSLA stock", "NVDA stock"], timeframe="today 3-m")
comparison = pytrends.interest_over_time()

# Related queries (what else people are searching for)
pytrends.build_payload(["AAPL stock"])
related = pytrends.related_queries()
print("Rising queries:", related["AAPL stock"]["rising"])

# Detect search spikes (z-score > 2 = unusual interest)
mean = interest["AAPL stock"].mean()
std = interest["AAPL stock"].std()
spikes = interest[interest["AAPL stock"] > mean + 2 * std]
print(f"Spike dates: {spikes.index.tolist()}")
```
Another underutilized signal: YouTube video count. When a stock starts getting mentioned in YouTube thumbnails, it's typically entering the "euphoria" phase. The YouTube Data API (free, 10K units/day) lets you count how many videos mention a ticker in their title. A spike in YouTube content about a stock often marks the top of retail interest — useful as a contrarian exit signal.
```python
from datetime import datetime, timedelta

import requests

YOUTUBE_KEY = "YOUR_YOUTUBE_API_KEY"

def count_youtube_videos(query, days=7):
    """Count YouTube videos about a stock published in the last N days."""
    # RFC 3339 timestamp in UTC, as required by the publishedAfter parameter
    after = (datetime.utcnow() - timedelta(days=days)).isoformat("T") + "Z"
    url = (
        "https://www.googleapis.com/youtube/v3/search"
        f"?part=snippet&q={query}+stock&type=video"
        f"&publishedAfter={after}&maxResults=50&key={YOUTUBE_KEY}"
    )
    return requests.get(url).json()["pageInfo"]["totalResults"]

# Compare hype levels
for ticker in ["AAPL", "TSLA", "NVDA", "GME"]:
    print(f"  {ticker}: {count_youtube_videos(ticker)} videos in last 7 days")
```
No single sentiment source is reliable on its own. The real edge comes from aggregating multiple sources and looking for convergence. Here's a production-ready multi-source sentiment aggregator:
```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np

@dataclass
class SentimentSignal:
    source: str
    score: float       # -1.0 (bearish) to +1.0 (bullish)
    confidence: float  # 0.0 to 1.0
    volume: int        # number of data points behind the score

class SentimentAggregator:
    """Aggregate sentiment from multiple sources with weighted scoring."""

    WEIGHTS = {
        "stocktwits": 0.20,
        "reddit": 0.15,
        "news": 0.30,
        "google_trends": 0.10,
        "options_flow": 0.25,
    }

    def aggregate(self, signals: List[SentimentSignal]) -> Dict:
        if not signals:
            return {"composite": 0, "signal": "NO_DATA"}

        # Weighted average, with each source's weight scaled by its confidence
        weighted_sum = 0.0
        weight_total = 0.0
        for s in signals:
            w = self.WEIGHTS.get(s.source, 0.1) * s.confidence
            weighted_sum += s.score * w
            weight_total += w
        composite = weighted_sum / weight_total if weight_total > 0 else 0

        # Map the composite score onto a discrete signal
        if composite > 0.5:
            signal = "STRONG_BULL"
        elif composite > 0.2:
            signal = "BULLISH"
        elif composite > -0.2:
            signal = "NEUTRAL"
        elif composite > -0.5:
            signal = "BEARISH"
        else:
            signal = "STRONG_BEAR"

        # Convergence: 1.0 when every non-neutral source agrees on direction
        signs = [np.sign(s.score) for s in signals if abs(s.score) > 0.1]
        convergence = abs(sum(signs)) / len(signs) if signs else 0

        return {
            "composite": round(composite, 3),
            "signal": signal,
            "convergence": round(convergence, 2),
            "sources": len(signals),
            "high_conviction": convergence > 0.8 and abs(composite) > 0.3,
        }

# Example usage
agg = SentimentAggregator()
result = agg.aggregate([
    SentimentSignal("stocktwits", 0.65, 0.8, 150),
    SentimentSignal("reddit", 0.45, 0.6, 80),
    SentimentSignal("news", 0.30, 0.9, 25),
    SentimentSignal("google_trends", 0.55, 0.5, 1),
])
print(f"Composite: {result['composite']}, Signal: {result['signal']}, "
      f"Convergence: {result['convergence']}")
print(f"High conviction: {result['high_conviction']}")
```
| Source | Price | Data Type | Latency | Signal Quality | Best For |
|---|---|---|---|---|---|
| StockTwits | Free | Messages + sentiment tags | Real-time | Medium | Real-time sentiment gauge |
| Reddit (PRAW) | Free | Posts + comments | Near real-time | Medium | Meme stock detection |
| ApeWisdom | Free | Mention rankings | Minutes | Medium | Quick WSB pulse check |
| Twitter/X | $100+/mo | Tweets + cashtags | Real-time | High | Breaking news detection |
| NewsAPI | Free / $449 | Headlines | Hours | High | Headline sentiment NLP |
| Benzinga | $199+/mo | News + sentiment scores | Real-time | Very High | Production news sentiment |
| Google Trends | Free | Search volume index | Hours-days | Medium | Retail attention proxy |
| Unusual Whales | $40/mo | Social + options flow | Real-time | High | Combined sentiment + flow |
For most individual developers, no. The Basic plan ($100/mo) caps you at 10K tweets/month of search, which is not enough for comprehensive sentiment analysis, and the Pro plan ($5,000/mo) is what institutional desks use. For far less than meaningful Twitter access costs, Benzinga ($199/mo) provides pre-processed financial news with sentiment scores — much more signal per dollar.
FinBERT (a BERT model fine-tuned on financial text) is the state-of-the-art for financial sentiment analysis. It outperforms TextBlob by 15-20% on financial text classification. Available free on Hugging Face: ProsusAI/finbert. The trade-off: it requires a GPU for fast inference (or Hugging Face's inference API). For batch processing, run it on a GPU instance; for real-time, use the Hugging Face API ($0.06/1K chars).
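Wiring FinBERT in via Hugging Face `transformers` is a few lines. FinBERT emits `positive` / `negative` / `neutral` labels with confidence scores; the polarity-mapping helper below is a convention of this sketch, not part of the model:

```python
def finbert_polarity(prediction):
    """Collapse a FinBERT {label, score} prediction to a -1..+1 polarity."""
    label = prediction["label"].lower()
    if label == "positive":
        return prediction["score"]
    if label == "negative":
        return -prediction["score"]
    return 0.0  # neutral

# Actual inference (downloads the model on first run):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="ProsusAI/finbert")
#   prediction = clf("Apple beats earnings expectations")[0]
# Sample prediction shape, for illustration:
prediction = {"label": "positive", "score": 0.94}
print(finbert_polarity(prediction))  # 0.94
```

The resulting -1..+1 polarity drops straight into the `SentimentSignal` dataclass of the aggregator above as a higher-quality replacement for TextBlob's `polarity`.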