Sentiment analysis on newspaper headlines, stored in MongoDB

I wanted to combine a few things that I know or want to learn into a small project. The project calls a public API that serves news headlines, runs the title of each news item through Google Cloud's Natural Language API for sentiment analysis, and stores the results in a MongoDB database in the cloud.
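To make the data flow concrete, here is a minimal sketch of the three stages wired together. It is not the code from the repository: `run_pipeline` and the record shape (a dict with a `title` key) are hypothetical, and each stage is passed in as a function so the flow can be tried with stand-ins before any API keys are set up.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record shape: each headline is a dict with at least a "title" key.
Headline = Dict[str, str]


def run_pipeline(
    fetch_headlines: Callable[[], Iterable[Headline]],
    analyze: Callable[[str], Dict[str, float]],
    store: Callable[[Dict], None],
) -> int:
    """Fetch headlines, score each title, and store one record per headline.

    Returns the number of records stored.
    """
    count = 0
    for item in fetch_headlines():
        title = item.get("title")
        if not title:
            continue  # skip entries without a usable title
        sentiment = analyze(title)  # e.g. {"score": ..., "magnitude": ...}
        record = {**item, **sentiment}  # merge headline fields with sentiment fields
        store(record)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in stages so the flow runs without any network access.
    fake_news = lambda: [{"title": "Markets rally"}, {"title": ""}, {"title": "Storm hits coast"}]
    fake_analyze = lambda text: {"score": 0.0, "magnitude": 0.0}
    saved: List[Dict] = []
    print(run_pipeline(fake_news, fake_analyze, saved.append))
```

Keeping the stages as plain functions means the real news API call, the Natural Language call and the MongoDB insert can each be swapped in one at a time.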

I developed and tested the code on my local computer, but here I show how to deploy it on a Compute Engine VM in Google Cloud.

Step 1: Create a new VM in Google Cloud

I won't elaborate on this step. You need billing enabled on your account; then create a project and set up a VM in it. If this is your first time using Google Cloud, you will get some free credit that will come in handy. If not, choose the smallest CPU and memory configuration; it costs about USD 0.01 an hour. Just remember to delete the VM when you are done.

Step 2: Install the necessary packages

After the VM is created, SSH into it and do the following. Git is used to clone the repository containing my code, and pip is used to install the few Python packages that are required.

$ sudo apt-get install git

$ sudo apt-get install python3-pip

Step 3: Clone the code from my GitHub repository

$ git clone https://github.com/sckhoo/sentiment_analysis.git

Step 4: Set up the environment

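Change your working directory to the folder created by the git clone command, then install the Python dependencies, select your project, enable the Natural Language API and log in for Application Default Credentials.

$ cd sentiment_analysis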

$ pip3 install -r requirements.txt

$ gcloud config set project <PROJECT_ID>

$ gcloud services enable language.googleapis.com

$ gcloud auth application-default login

The last command prints a link that you need to open in your browser to authenticate with your Google account. After you have authenticated, you will be given a code; paste it into the terminal prompt to complete the login.
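If you want to confirm that the login worked before running the scripts, a small check like the one below can help. It is a sketch, not part of the repository: `find_adc_file` is a hypothetical helper. It honours the `GOOGLE_APPLICATION_CREDENTIALS` variable first and otherwise looks in the default Linux location that `gcloud auth application-default login` writes to (assuming you have not changed the gcloud config directory). Note that on a Compute Engine VM the client libraries can also fall back to the VM's attached service account, so a missing file does not necessarily mean there are no credentials at all.

```python
import os
from pathlib import Path
from typing import Optional


def find_adc_file() -> Optional[Path]:
    """Return the Application Default Credentials file the client libraries
    would pick up on a Linux machine, or None if no such file exists."""
    # An explicit key file set through this variable takes precedence.
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None
    # Default location written by `gcloud auth application-default login` on Linux.
    default = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    print(f"ADC file: {found}" if found else "No ADC file found; rerun the login command.")
```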

Next, create a file named ".env" that will hold the API token and the database credentials. The following is a sample with an invalid token. Create your file in the same format, but use your own credentials.
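For reference, this is roughly how such a file can be read into environment variables. In practice the python-dotenv package's `load_dotenv()` is the usual way to do it; the loader below is a minimal standard-library stand-in, and `load_env_file` as well as the variable names `NEWS_API_TOKEN` and `MONGODB_URI` are hypothetical, not necessarily what the repository uses.

```python
import os
from pathlib import Path
from typing import Dict

# Example .env contents (hypothetical names, placeholder values):
#   NEWS_API_TOKEN="replace-with-your-token"
#   MONGODB_URI=mongodb+srv://user:password@cluster.example.net/


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file and export them to os.environ.

    Blank lines, comment lines starting with '#', and lines without '=' are
    ignored; matching single or double quotes around a value are stripped.
    Existing environment variables are kept unless override is True.
    """
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Calling `load_env_file()` once at the top of a script makes the values available through `os.environ`, which keeps the credentials out of the code and out of the Git repository (add `.env` to `.gitignore`).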

The following is the code for the program. There are three main Python scripts: main.py, mdb.py and sent.py.
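As an illustration of what the sentiment step involves, here is a sketch using the google-cloud-language client library (`language_v1`). It is not the author's sent.py: `analyze_headline` and `label_sentiment` are hypothetical names, and the 0.25 threshold for the labels is an arbitrary choice. The API returns a `score` between -1.0 (negative) and 1.0 (positive) and a non-negative `magnitude` that reflects how strong the overall emotion is.

```python
from typing import Dict


def analyze_headline(text: str, client=None) -> Dict[str, float]:
    """Return the Natural Language API document sentiment for one headline."""
    # Imported here so label_sentiment() can be used without the library installed.
    from google.cloud import language_v1

    if client is None:
        # Picks up Application Default Credentials set up earlier.
        client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    return {"score": sentiment.score, "magnitude": sentiment.magnitude}


def label_sentiment(score: float, threshold: float = 0.25) -> str:
    """Map a score in [-1.0, 1.0] to a coarse label (threshold is arbitrary)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

Storing both `score` and `magnitude` leaves room to decide on labels later, since a near-zero score with a high magnitude usually means mixed rather than neutral content.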

I am using a free news API to pull the news headlines. You can get a free API token from their website.
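Since the provider is not named here, the sketch below is deliberately generic and uses only the standard library. Everything provider-specific is a hypothetical placeholder: the base URL and query parameters you pass to `fetch_json`, and the response shape assumed by `extract_headlines` (a JSON object with an `articles` list whose items carry `title`, `author`, `url` and `publishedAt`). Check your provider's documentation and adjust the keys.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Sequence


def fetch_json(base_url: str, params: Dict[str, str], timeout: float = 10.0) -> Any:
    """GET base_url with URL-encoded query parameters and decode the JSON body."""
    url = f"{base_url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_headlines(
    payload: Dict[str, Any],
    list_key: str = "articles",  # hypothetical key; depends on the provider
    fields: Sequence[str] = ("title", "author", "url", "publishedAt"),
) -> List[Dict[str, Any]]:
    """Keep only the chosen fields from each article that has a title."""
    records = []
    for article in payload.get(list_key, []):
        if article.get("title"):
            records.append({f: article.get(f) for f in fields})
    return records
```

Separating the HTTP call from the parsing makes it easy to test the parsing against a saved response without spending API quota.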

If you have done everything correctly, running the main.py script will produce the following output.

Next, check your MongoDB instance to see the data inserted by the script. The following records were filtered from my collection in the database.
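The same kind of filtering can be done from Python with pymongo instead of a database UI. This is a sketch, not the author's mdb.py: the helper names, the database and collection names you pass in, and the `author` and `score` field names are assumptions that match the records built in the other sketches. `build_filter` only constructs the query document; `insert_and_find` does the actual round trip.

```python
from typing import Any, Dict, List, Optional


def build_filter(author: Optional[str] = None,
                 max_score: Optional[float] = None) -> Dict[str, Any]:
    """Build a MongoDB query document, e.g. negative headlines by one author."""
    query: Dict[str, Any] = {}
    if author is not None:
        query["author"] = author
    if max_score is not None:
        query["score"] = {"$lte": max_score}  # score less than or equal to max_score
    return query


def insert_and_find(uri: str, db_name: str, coll_name: str,
                    docs: List[Dict[str, Any]],
                    query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Insert docs (if any), then return matching records without their _id."""
    from pymongo import MongoClient  # imported here so build_filter works without pymongo

    client = MongoClient(uri)
    try:
        coll = client[db_name][coll_name]
        if docs:  # insert_many rejects an empty list
            coll.insert_many(docs)
        return list(coll.find(query, {"_id": 0}))
    finally:
        client.close()
```

For example, `build_filter(max_score=-0.25)` selects every headline scored at or below -0.25.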

I will come back in a couple of weeks to write a front end that visualizes the data, shows the trend in news sentiment, and reveals whether any news author consistently writes negative or positive articles.
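The per-author question boils down to averaging scores grouped by author. Here is a small sketch of that calculation on records already pulled from the collection; `author_sentiment` is a hypothetical helper, and the `min_articles` cutoff is an arbitrary guard so that one or two headlines don't make an author look consistently positive or negative. MongoDB's aggregation pipeline (`$group` with `$avg`) could do the same grouping on the server.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def author_sentiment(records: Iterable[Dict], min_articles: int = 3) -> Dict[str, float]:
    """Average sentiment score per author, for authors with at least min_articles."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        author = rec.get("author")
        if author and rec.get("score") is not None:
            scores[author].append(rec["score"])
    return {a: mean(s) for a, s in scores.items() if len(s) >= min_articles}
```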

Onwards, 1% better every day.

Comments, suggestions, criticism and pizza are welcome.
