Adam Oudad

(Machine) Learning log.

3 minutes read

In this tutorial, you will learn how to stream tweets with the Tweepy library in Python.

Twitter API overview

Twitter offers several APIs you can use to retrieve tweets, including:

  • Streaming API
  • Search API

What we are interested in here is the streaming API. We will use it to stream statuses (this is the name given to tweets in Twitter’s API documentation). We want to connect to the "statuses/sample" endpoint (see its documentation).

Note that this endpoint provides only a very small percentage of all public tweets. The full real-time stream of every tweet is known as the Twitter Firehose.

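For this tutorial, you will need tweepy. It is available via pip install tweepy, or conda install -c conda-forge tweepy if you use Anaconda. The code below uses the Tweepy 3.x interface (StreamListener was merged into Stream in Tweepy 4.0), so you may need to pin the version, for example pip install "tweepy<4".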

Get your credentials from Twitter.com

Visit developer.twitter.com and log into your Twitter account. Create a new app and, under Keys and tokens, copy your credentials:

# Replace the "None"s by your own credentials
ACCESS_TOKEN = None
ACCESS_TOKEN_SECRET = None
CONSUMER_KEY = None
CONSUMER_SECRET = None
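
If you would rather not paste secrets into the script, one option is to read them from environment variables. The sketch below is only an illustration: the helper load_credentials and the choice of variable names are assumptions, not something Tweepy or Twitter requires.

```python
import os

# Hypothetical variable names; use whatever names you exported in your shell.
CREDENTIAL_NAMES = (
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
)


def load_credentials(environ=os.environ):
    """Return a dict of the four credentials, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_NAMES}
```

You could then write creds = load_credentials() and use creds["CONSUMER_KEY"] and the other entries in place of the constants above.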

Set up the connection

  from tweepy import OAuthHandler
  from tweepy import API
  from tweepy import Stream

  # Authenticate with the credentials defined above
  auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
  auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
  api = API(auth, wait_on_rate_limit=True,
            wait_on_rate_limit_notify=True)

First, we import all the necessary classes from the tweepy library.

  • OAuthHandler stores your credentials
  • API class sets the connection to Twitter API using your credentials
  • Stream class streams the downloaded data

We set up the connection by providing our credentials to an OAuthHandler instance, and giving it to an API instance.

Tweepy provides two useful parameters for handling download restrictions from the Twitter API:

  • wait_on_rate_limit: when the download limit is reached, the program waits and retries once the rate limit is replenished
  • wait_on_rate_limit_notify: a notification is printed when the download limit is reached
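
To make the behaviour of these two parameters concrete, here is a minimal sketch of a wait-and-retry loop. It is not Tweepy's implementation: RateLimited and call_with_wait are hypothetical names, and the sleep argument exists only so the waiting can be replaced in a test.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the seconds left until the limit resets."""

    def __init__(self, retry_after):
        super().__init__("rate limit reached")
        self.retry_after = retry_after


def call_with_wait(request, notify=True, sleep=time.sleep):
    """Call request() until it succeeds, sleeping whenever it is rate limited."""
    while True:
        try:
            return request()
        except RateLimited as err:
            if notify:
                # Plays the role of wait_on_rate_limit_notify
                print("Rate limit reached. Sleeping for:", err.retry_after)
            # Plays the role of wait_on_rate_limit
            sleep(err.retry_after)
```

The point of the sketch is the division of labour: one flag decides whether to wait and retry, the other only decides whether to tell you about it.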

The listener

The next step is to define our Listener, which will react to events from the API and handle the received data.

  import sys
  from tweepy.streaming import StreamListener

  class Listener(StreamListener):
      def __init__(self, output_file=sys.stdout):
          super(Listener, self).__init__()
          self.output_file = output_file

      def on_status(self, status):
          # Called for each received tweet
          print(status.text, file=self.output_file)

      def on_error(self, status_code):
          # Returning False disconnects the stream
          print(status_code)
          return False

on_status is the method that will be called when a tweet is downloaded. We print the tweet to the file output_file. This file is by default the standard output, as we specify in the __init__ method.

on_error handles errors returned by the API. Here it prints the status code and returns False, which disconnects the stream.
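
Tweets often contain line breaks, so printing status.text directly can spread one tweet over several lines of the output file. One possible alternative, sketched below, is to write each tweet as a single JSON line. The helper write_status is hypothetical, and the SimpleNamespace object only stands in for a real status. The sketch assumes the status exposes id_str and text attributes.

```python
import json
import sys
from types import SimpleNamespace


def write_status(status, output_file=sys.stdout):
    """Write one tweet as a single JSON line (id and text only)."""
    record = {"id": status.id_str, "text": status.text}
    # json.dumps escapes embedded newlines, so each tweet stays on one line.
    print(json.dumps(record), file=output_file)


# Stand-in for a status object, for illustration only.
example = SimpleNamespace(id_str="1", text="first line\nsecond line")
write_status(example)
```

Inside the Listener, on_status could call write_status(status, self.output_file) instead of print.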

Then, we have to create an instance of the Listener class. You can either write the downloaded tweets to a file:

  output = open('stream_output.txt', 'w')
  listener = Listener(output_file=output)

or send them to standard output by not specifying any file, instantiating with listener = Listener().

Streaming

We are now ready to stream:

  stream = Stream(auth=api.auth, listener=listener)
  try:
      print('Start streaming.')
      stream.sample(languages=['en'])
  except KeyboardInterrupt:
      print("Stopped.")
  finally:
      print('Done.')
      stream.disconnect()
      output.close()  # only needed if you opened an output file above

It is good practice to enclose the call to stream.sample() in a try statement, for example to handle KeyboardInterrupt and to make sure the stream is disconnected and the output file is closed.
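
The control flow of try, except, and finally can be checked without connecting to Twitter. In the sketch below, FakeStream is a hypothetical stand-in (it is not part of Tweepy) whose sample() method simulates the user pressing Ctrl+C.

```python
class FakeStream:
    """Hypothetical stand-in for a stream; it is not part of Tweepy."""

    def __init__(self):
        self.connected = False

    def sample(self):
        self.connected = True
        # Simulate the user pressing Ctrl+C while tweets are arriving.
        raise KeyboardInterrupt

    def disconnect(self):
        self.connected = False


def run(stream):
    """Run stream.sample() and always disconnect; return how it ended."""
    try:
        stream.sample()
        return "finished"
    except KeyboardInterrupt:
        return "stopped"
    finally:
        # Runs on normal exit, on Ctrl+C, and on any other exception.
        stream.disconnect()


fake = FakeStream()
print(run(fake), fake.connected)  # prints: stopped False
```

Whatever happens inside sample(), the finally clause runs, which is why the cleanup calls belong there.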

Final words

Tweepy is quite a powerful library providing access to Twitter APIs. You can check its documentation for more information.

Here is the complete code

My original post on Medium.
