Download Tweets and Save to MongoDB

I have written some simple Python code that downloads Tweets from Twitter using Tweepy and saves them to MongoDB.

MongoDB

Download the MongoDB installer and install it on your machine. The installer sets up the MongoDB engine, and the default connection string is mongodb://localhost:27017

To view the content of MongoDB easily, you can also download a MongoDB client. I’d recommend MongoDB Compass.

Launch MongoDB Compass and click “Connect”. Compass connects to the MongoDB server at mongodb://localhost:27017.

Next, create a new database in MongoDB for your project, for example political_data_analysis. This database will hold the collection (MongoDB's equivalent of a table) that stores the tweet data.

Python Code

Launch PyCharm or any other Python IDE, create a new project (e.g. PoliticalDataAnalysis), and make sure the Python version is 3.7 or above.

Download Required Libraries

Create an empty text file named requirements.txt to list the required libraries, then copy and paste the following libraries and versions into it.

wheel==0.37.0
gunicorn==20.1.0
flask==2.0.2
pymongo==4.0
black==21.11b1
git+https://git@github.com/yeetornghoo/TwitterDownloadApp.git@0.0.1#egg=TwitterDownloadApp

The last line above downloads the Python code I wrote (TwitterDownloadApp) into your project. You can find the latest version in its GitHub repository (https://github.com/yeetornghoo/TwitterDownloadApp). In this article, I am using the first release, 0.0.1.

Once requirements.txt is created, open the terminal in PyCharm and run the following command to install all the required libraries:

pip install -r requirements.txt
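
To confirm that the install worked, a small check like the one below can help. It is only a minimal sketch that uses the standard library's importlib.metadata, which is available from Python 3.8, so it assumes a slightly newer interpreter than the 3.7 minimum. The names PINNED, installed_version and report are made up for this illustration and are not part of TwitterDownloadApp.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the requirements.txt shown in this article.
PINNED = {
    "wheel": "0.37.0",
    "gunicorn": "20.1.0",
    "flask": "2.0.2",
    "pymongo": "4.0",
    "black": "21.11b1",
}


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def report(pinned):
    """Build one status line per pinned package."""
    lines = []
    for name, wanted in pinned.items():
        found = installed_version(name)
        if found is None:
            status = "NOT INSTALLED"
        elif found == wanted:
            status = "ok"
        else:
            # A plain string comparison; it does not normalise version formats.
            status = f"installed {found}, expected {wanted}"
        lines.append(f"{name}: {status}")
    return lines


for line in report(PINNED):
    print(line)
```

Any line that does not end in "ok" points at a package worth reinstalling before moving on.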

Setup Configuration File for MongoDB Connection

In PyCharm, create a MongoDB configuration file at config/db_config.ini and set the MongoDB details in it. Make sure MONGO_DB_NAME matches the name of the database you created above.

MONGO_DB_CONNECTION_STRING=mongodb://localhost:27017
MONGO_DB_NAME=political_data_analysis
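
If you want to sanity-check the file yourself, the sketch below reads plain KEY=VALUE lines like the two above. It is not TwitterDownloadApp's own loader (this article does not show that code); parse_key_values is a hypothetical helper, and the assumptions are that blank lines and lines starting with # can be ignored.

```python
def parse_key_values(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"Expected KEY=VALUE, got: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


# Same content as config/db_config.ini in this article.
sample = """\
MONGO_DB_CONNECTION_STRING=mongodb://localhost:27017
MONGO_DB_NAME=political_data_analysis
"""

settings = parse_key_values(sample)
print(settings["MONGO_DB_CONNECTION_STRING"])
print(settings["MONGO_DB_NAME"])
```

To check the real file, pass the result of open("config/db_config.ini").read() instead of the sample string.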

Setup Twitter Developer Credential

In this step you set up the Twitter Developer credentials needed to connect to Twitter and download data. You need the following 4 details from your Twitter developer account; you can obtain them by following https://developer.twitter.com/en/docs/authentication/oauth-1-0a/api-key-and-secret

access_token=<FROM TWITTER DEVELOPER ACCOUNT>
access_token_secret=<FROM TWITTER DEVELOPER ACCOUNT>
consumer_key=<FROM TWITTER DEVELOPER ACCOUNT>
consumer_secret=<FROM TWITTER DEVELOPER ACCOUNT>

Create the Twitter credential file config/twitter_api.key in your project and put the above 4 items in it.
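
A missing or unreplaced credential is an easy mistake to make here, so a quick check before running anything can save time. The sketch below assumes the key file uses the same key=value layout as the 4 lines above; REQUIRED_KEYS and find_credential_problems are names invented for this example, not functions from TwitterDownloadApp or Tweepy.

```python
REQUIRED_KEYS = (
    "access_token",
    "access_token_secret",
    "consumer_key",
    "consumer_secret",
)


def find_credential_problems(text):
    """Return a list of human-readable problems found in the key file text."""
    found = {}
    for line in text.splitlines():
        key, separator, value = line.strip().partition("=")
        if separator:
            found[key.strip()] = value.strip()

    problems = []
    for key in REQUIRED_KEYS:
        value = found.get(key)
        if value is None:
            problems.append(f"{key} is missing")
        elif not value or value.startswith("<"):
            # An empty value or a <...> placeholder that was never replaced.
            problems.append(f"{key} still has a placeholder or empty value")
    return problems


# A deliberately incomplete example file.
example = """\
access_token=<FROM TWITTER DEVELOPER ACCOUNT>
access_token_secret=abc123
consumer_key=
"""

for problem in find_credential_problems(example):
    print(problem)
```

An empty list means all 4 keys are present and filled in; it does not prove that Twitter will accept them.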

Create Python Code

Create a new file named main.py for the code that downloads the data. First, import TweepyStream from the TwitterDownloadApp library to use Tweepy's stream function. The library will automatically read the config/twitter_api.key file you created above.

from twitterdownloadapp.data_scraper.tweepy.process_tweepy_stream import TweepyStream

Define the MongoDB collection (the equivalent of a table) where you want to store the tweets. In this article, I am storing them in raw_malaysia_politic_tweets. Make sure you also set save_to_db = True; if you set it to False, the code will only print the tweets to your console.

# MONGODB CONFIGURATION
mongo_db_collection = "raw_malaysia_politic_tweets"
save_to_db = True

Define the keywords you want to download tweets for.

# KEYWORDS
search_words = ["umno", "Pakatan Harapan"]
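
To get a feel for how a multi-word keyword such as "Pakatan Harapan" may be matched, here is a rough local approximation. It assumes the keywords are passed on to Twitter's v1.1 streaming track filter, whose documentation says a phrase matches when all of its terms appear in the Tweet, in any order and ignoring case. The function matches_any_keyword is hypothetical, and its simple word splitting is not Twitter's real tokenizer.

```python
import re


def matches_any_keyword(tweet_text, search_words):
    """Return True if any phrase has all of its terms present in the text."""
    # Lowercase and split into word tokens, so "#UMNO" yields the token "umno".
    tokens = set(re.findall(r"\w+", tweet_text.lower()))
    for phrase in search_words:
        terms = phrase.lower().split()
        if terms and all(term in tokens for term in terms):
            return True
    return False


search_words = ["umno", "Pakatan Harapan"]

print(matches_any_keyword("UMNO holds a meeting", search_words))
print(matches_any_keyword("Harapan baru untuk Pakatan", search_words))
print(matches_any_keyword("Pakatan rakyat", search_words))
```

The third text contains only one of the two terms in "Pakatan Harapan" and no "umno", so it is the one that does not match.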

Define the area you want to get the Tweets from.

# AREA
bottom_left_long = 100.724754
bottom_left_lat = 0.738467
top_right_long = 104.61595
top_right_lat = 6.765193

The geographic area is a rectangle (a bounding box) defined by two corners, the bottom-left and the top-right. You need to look up the longitude and latitude of both corners manually. This URL can help you to get it
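
Since it is easy to swap longitude and latitude or mix up the two corners, the sketch below checks a box and tests whether a point falls inside it. BoundingBox is a helper written for this illustration only, not a class from TwitterDownloadApp or Tweepy, and it assumes the box does not cross the 180° meridian.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """A rectangle given by its bottom-left and top-right corners, in degrees."""

    bottom_left_long: float
    bottom_left_lat: float
    top_right_long: float
    top_right_lat: float

    def validate(self):
        """Raise ValueError if a value is out of range or the corners are swapped."""
        for longitude in (self.bottom_left_long, self.top_right_long):
            if not -180 <= longitude <= 180:
                raise ValueError(f"longitude out of range: {longitude}")
        for latitude in (self.bottom_left_lat, self.top_right_lat):
            if not -90 <= latitude <= 90:
                raise ValueError(f"latitude out of range: {latitude}")
        if self.bottom_left_long >= self.top_right_long:
            raise ValueError("bottom-left longitude must be west of top-right")
        if self.bottom_left_lat >= self.top_right_lat:
            raise ValueError("bottom-left latitude must be south of top-right")

    def contains(self, longitude, latitude):
        """Return True if the point lies inside the box, edges included."""
        return (
            self.bottom_left_long <= longitude <= self.top_right_long
            and self.bottom_left_lat <= latitude <= self.top_right_lat
        )


# The same four values used in this article.
area = BoundingBox(100.724754, 0.738467, 104.61595, 6.765193)
area.validate()

print(area.contains(101.6869, 3.1390))   # Kuala Lumpur
print(area.contains(100.5018, 13.7563))  # Bangkok
```

Kuala Lumpur falls inside the box and Bangkok falls outside it, which is a quick way to confirm the corners were entered in the right order.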

Add the last part of the code to pull the data and save it to MongoDB.

# PROCESS
TweepyStream(save_to_db, mongo_db_collection).run(search_words, bottom_left_long, bottom_left_lat, top_right_long, top_right_lat)

Below is the complete code; you can just copy and paste it into main.py.

from twitterdownloadapp.data_scraper.tweepy.process_tweepy_stream import TweepyStream

# MONGODB CONFIGURATION
mongo_db_collection = "raw_malaysia_politic_tweets"
save_to_db = True

# KEYWORDS
search_words = ["umno", "Pakatan Harapan"]

# AREA
bottom_left_long = 100.724754
bottom_left_lat = 0.738467
top_right_long = 104.61595
top_right_lat = 6.765193

# PROCESS
TweepyStream(save_to_db, mongo_db_collection).run(
    search_words,
    bottom_left_long,
    bottom_left_lat,
    top_right_long,
    top_right_lat,
)

Run

Run main.py from PyCharm. In the console you should be able to see the tweets being downloaded from Twitter and saved to MongoDB.

Check MongoDB, and you should be able to see the newly added Tweets in the database.
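
If you prefer to check from Python rather than from Compass, counting the documents in the collection is one option. The sketch below is an assumption-laden example: count_saved_tweets and open_collection are hypothetical helpers, and FakeCollection exists only so the counting helper can be tried without a database. MongoClient and count_documents are real PyMongo calls, but open_collection needs pymongo installed and a MongoDB server running to be useful.

```python
def count_saved_tweets(collection):
    """Count all documents in a pymongo-style collection object."""
    return collection.count_documents({})


def open_collection(uri, db_name, collection_name):
    """Return a handle to the collection; needs pymongo and a running server."""
    # Imported here so count_saved_tweets can be tried without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=2000)
    return client[db_name][collection_name]


class FakeCollection:
    """Stand-in used only to demonstrate the helper without a database."""

    def __init__(self, documents):
        self._documents = documents

    def count_documents(self, query):
        # This stand-in ignores the query and counts everything.
        return len(self._documents)


demo = FakeCollection([{"text": "tweet one"}, {"text": "tweet two"}])
print(count_saved_tweets(demo))
```

Against the real database, the call would look like count_saved_tweets(open_collection("mongodb://localhost:27017", "political_data_analysis", "raw_malaysia_politic_tweets")), and the number should grow while main.py is running.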

Currently a Master’s in Data Science Postgraduate Student at UKM. Previously worked as Software Release Management in SAP Lab Ireland.

Carlson Hoo
