In this tutorial, you will learn how to stream tweets with the Tweepy library in Python.
Twitter API overview
Twitter offers several APIs, or methods, that you can use to retrieve tweets:
- Streaming API
- Search API
What we are interested in here is the Streaming API. We will use it to stream statuses (the name given to tweets in Twitter’s API documentation). We want to connect to the "statuses/sample" endpoint (documentation).
Note that this endpoint provides only a very small percentage of all public tweets, unlike the Twitter Firehose, which streams all tweets in real time.
For this tutorial, you will need tweepy. It is available via pip install tweepy, or via conda install -c conda-forge tweepy if you use Anaconda.
Get your credentials from Twitter.com
Visit developer.twitter.com and log into your Twitter account. Create a new app and, under Keys and tokens, copy your credentials.
# Replace the "None"s with your own credentials
ACCESS_TOKEN = None
ACCESS_TOKEN_SECRET = None
CONSUMER_KEY = None
CONSUMER_SECRET = None
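Hard-coding secrets in a script makes them easy to leak by accident. As one possible alternative, here is a minimal sketch that reads the four values from environment variables. The variable names (TWITTER_ACCESS_TOKEN and so on) and the load_credentials helper are assumptions made for this example. Neither Twitter nor Tweepy requires them.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = {
    "ACCESS_TOKEN": "TWITTER_ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET": "TWITTER_ACCESS_TOKEN_SECRET",
    "CONSUMER_KEY": "TWITTER_CONSUMER_KEY",
    "CONSUMER_SECRET": "TWITTER_CONSUMER_SECRET",
}


def load_credentials(environ=os.environ):
    """Return a dict of credentials, raising if any variable is missing or empty."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[env] for name, env in ENV_NAMES.items()}
```

With this in place, you would call creds = load_credentials() and then assign, for example, ACCESS_TOKEN = creds["ACCESS_TOKEN"].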
Set up the connection
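from tweepy import OAuthHandler
from tweepy import API
from tweepy import Stream

# Authenticate with the consumer key pair, then attach the access token pair
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
# wait_on_rate_limit_notify exists in Tweepy 3.x only (it was removed in Tweepy 4)
api = API(auth, wait_on_rate_limit=True,
          wait_on_rate_limit_notify=True)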
First, we import all the necessary classes from the tweepy library.
- OAuthHandler stores your credentials
- API class sets up the connection to the Twitter API using your credentials
- Stream class streams the downloaded data
We set up the connection by providing our credentials to an OAuthHandler instance and passing it to an API instance.
Tweepy provides two useful parameters for handling the Twitter API's rate limits:
- wait_on_rate_limit
  - when the rate limit is reached, the program waits and tries again once the limit is replenished
- wait_on_rate_limit_notify
  - a notification is printed when the rate limit is reached
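Conceptually, the two flags amount to "sleep until the limit resets" and "say so before sleeping". The sketch below illustrates that idea with the standard library only. It is not Tweepy's implementation, and the function names and the reset_epoch parameter (a Unix timestamp at which the limit is assumed to reset) are hypothetical.

```python
import time


def seconds_until_reset(reset_epoch, now=None):
    """Return how long to wait until the rate limit resets (never negative)."""
    if now is None:
        now = time.time()
    return max(0.0, reset_epoch - now)


def wait_for_reset(reset_epoch, notify=True, sleep=time.sleep, now=None):
    """Sleep until reset_epoch, optionally printing a notification first."""
    delay = seconds_until_reset(reset_epoch, now)
    if notify and delay > 0:
        print("Rate limit reached. Sleeping for {:.0f} seconds.".format(delay))
    sleep(delay)
    return delay
```

The sleep and now parameters are there only so the waiting logic can be tried without actually pausing the program.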
The listener
The next step is to define our Listener, which will react to events from the API and handle the received data.
import sys
from tweepy.streaming import StreamListener


class Listener(StreamListener):
    def __init__(self, output_file=sys.stdout):
        super(Listener, self).__init__()
        self.output_file = output_file

    def on_status(self, status):
        # Called for every tweet received: write its text to the output file
        print(status.text, file=self.output_file)

    def on_error(self, status_code):
        # Called when the API returns an error: report it and stop the stream
        print(status_code)
        return False
on_status is the method that will be called when a tweet is downloaded. We print the tweet to the file output_file. This file is by default the standard output, as we specify in the __init__ method.
on_error handles errors returned by the API. Here it prints the status code and returns False, which stops the stream.
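To see what these methods do without connecting to Twitter, the same logic can be exercised offline. The sketch below uses a stand-in class, DemoListener, that mirrors Listener but does not inherit from Tweepy's StreamListener, and a fake status object built with SimpleNamespace. Both are assumptions made for illustration only.

```python
import io
import sys
from types import SimpleNamespace


class DemoListener:
    """Stand-in with the same methods as Listener, minus the Tweepy base class."""

    def __init__(self, output_file=sys.stdout):
        self.output_file = output_file

    def on_status(self, status):
        # Write the tweet text to the chosen file, one tweet per line.
        print(status.text, file=self.output_file)

    def on_error(self, status_code):
        print(status_code)
        # Mirrors Listener: returning False asks the stream to stop.
        return False


buffer = io.StringIO()  # in-memory file instead of standard output
demo = DemoListener(output_file=buffer)
demo.on_status(SimpleNamespace(text="hello world"))  # fake status object
captured = buffer.getvalue()
```

After this runs, captured holds the text that a real stream would have written to the output file, followed by a newline.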
Then, we have to create an instance of the Listener class. You can either write the downloaded tweets to a file:
output = open('stream_output.txt', 'w')
listener = Listener(output_file=output)
or send them to standard output by not specifying any file, instantiating with listener = Listener().
Streaming
We are now ready to stream:
stream = Stream(auth=api.auth, listener=listener)
try:
    print('Start streaming.')
    stream.sample(languages=['en'])
except KeyboardInterrupt:
    print("Stopped.")
finally:
    print('Done.')
    stream.disconnect()
    # Only needed if you opened an output file above
    output.close()
It is good practice to enclose the call to stream.sample() in a try statement, for example to handle KeyboardInterrupt.
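One wrinkle in the cleanup step is that output.close() only makes sense when a file was actually opened. A possible way to handle both cases is a small context manager that yields either a real file or standard output. The open_output helper below is hypothetical and not part of Tweepy. It is a sketch using only the standard library.

```python
import contextlib
import sys


@contextlib.contextmanager
def open_output(path=None):
    """Yield a writable file: the file at `path`, or standard output if path is None."""
    if path is None:
        # Nothing to close: standard output must stay open.
        yield sys.stdout
    else:
        with open(path, "w", encoding="utf-8") as handle:
            # The file is closed automatically, even after KeyboardInterrupt.
            yield handle
```

You could then write with open_output('stream_output.txt') as output: and create the listener and the stream inside that block, which removes the need to call output.close() by hand.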
Final words
Tweepy is quite a powerful library that provides access to the Twitter APIs. You can check the documentation here for more information.
Here is the complete code