
Create a Twitter Bot in 4 Simple Steps

Bots are usually created to automate repetitive tasks. For example, instead of visiting your favourite websites regularly to read the latest posts, you could create a Bot that notifies you with the URL of new content worth checking out. As an example of such a Bot, we will create one that aggregates memes from 9GAG and Reddit and posts them on the Bot's Twitter account.

#What will this Bot do?

  1. Fetch images from Reddit and 9GAG and tweet them at a regular interval (say, every 15 minutes).
  2. Whenever we get a new follower, tweet them a friendly welcome message.

#What do you need?

  1. Twitter account - you have to log in to the developer portal to get credentials for the bot
  2. Node.js
  3. Redis (for the message queue) - used for spam control; more on this later
  4. Third-party npm packages like twit, bull, etc.

#Step 1: Creating the Twitter app

If you have a regular Twitter account, you can log in at apps.twitter.com and create a new application. Once you have created the application, go to the Permissions tab and select the Read and Write permission. Then go to the Keys and Access Tokens tab and copy the following four keys:

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret

These tokens are used by the twit library for authentication; without them you cannot send a tweet from your Twitter account.

#Step 2: Now you have my permission to code

First, let's create an npm project and install the necessary packages. We need the twit package to interact with the Twitter API and bull as a message queue.

You can clone the GitHub repository of this project to get the code and follow along to understand the project.

Required npm packages:

  1. twit - Interact with the Twitter API
  2. bull - Redis-backed message queue library
  3. feedparser - Parse 9GAG feeds
  4. cheerio - HTML parser

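#Basic setup

  1. Initialize the project with the command npm init and fill in the requested details

  2. Install the necessary packages for the project with the command npm install --save twit bull feedparser cheerio request lodash stopword (request, lodash and stopword are used by the later snippets)

  3. Hello world tweet code: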

var Twit = require('twit');

/* fill this object with the keys you got from your Twitter developer account */
var config = {
  consumer_key: '',
  consumer_secret: '',
  access_token: '',
  access_token_secret: ''
};

/* create instance */
var Twitter = new Twit(config);

/* post a tweet */
Twitter.post('statuses/update', { status: 'hello world!' }, function (err, data, response) {
  if (err) {
    // log the error object too, so failures (e.g. bad credentials) can be diagnosed
    console.log('Error posting tweet', err);
  } else {
    console.log('Congrats! tweet posted');
  }
});
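
Rather than pasting the four keys into the source file, one option is to read them from environment variables. The following is a minimal sketch of that idea; the environment variable names and the helper loadTwitterConfig are hypothetical and not part of the original project or of twit.

```javascript
// Hypothetical environment variable names; any naming scheme would work.
const REQUIRED_KEYS = {
  consumer_key: 'TWITTER_CONSUMER_KEY',
  consumer_secret: 'TWITTER_CONSUMER_SECRET',
  access_token: 'TWITTER_ACCESS_TOKEN',
  access_token_secret: 'TWITTER_ACCESS_TOKEN_SECRET'
};

// Build the config object used in the hello world example from an
// env-like object such as process.env. Throws if any key is missing.
function loadTwitterConfig(env) {
  const config = {};
  const missing = [];
  for (const [field, envName] of Object.entries(REQUIRED_KEYS)) {
    if (env[envName]) {
      config[field] = env[envName];
    } else {
      missing.push(envName);
    }
  }
  if (missing.length > 0) {
    throw new Error('Missing Twitter credentials: ' + missing.join(', '));
  }
  return config;
}
```

With this in place, the instance could be created as new Twit(loadTwitterConfig(process.env)), and a missing key fails early with a readable message instead of an authentication error at tweet time.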

Now that we can tweet programmatically, our next task is to post some meaningful content, maybe some funny pictures, quotes, or facts. This takes us to the next step.

#Step 3: Fetching the content for your Bot

We will fetch 9GAG and Reddit feeds and extract the title and URL of each meme from the feed. There are no official 9GAG RSS feeds, but there are some unofficial sources which can help us. The feeds are in XML format and will be parsed using the npm package feedparser. The parsed items still need some cleaning, so we use the cheerio library to parse the HTML tags and extract the required text and image URL from the HTML content.

Below is the code for fetching and parsing the feeds; adding the parsed content to a queue is covered later in this step.

var FeedParser = require('feedparser');
var request = require('request');
var cheerio = require('cheerio');

var feedparser = new FeedParser();
var gagFeedUrl = 'http://9gag-rss.com/api/rss/get?code=9GAG&format=1';
var req = request(gagFeedUrl);

req.on('error', function (error) {
  console.log('Feed fetching', error);
});

// stream the response to feedparser, we don't have to wait for
// the entire response to arrive
req.on('response', function (res) {
  let stream = this;

  if (res.statusCode !== 200) {
    stream.emit('error', new Error('Bad status code'));
  } else {
    // streaming the response
    stream.pipe(feedparser);
  }
});

// error handling
feedparser.on('error', function (error) {
  console.log('Feed Error', error);
});

// parse response by feedparser
feedparser.on('readable', function () {
  let stream = this;
  let item;

  while ((item = stream.read())) {
    console.log('Reading feed');
    // a fresh object per item, so one item's fields never leak into the next
    let content = {};
    // parsing the messy XML data
    content.title = cheerio.load(item.title).text();
    const $ = cheerio.load(item.description);
    // prefer <img> tags and fall back to <source> tags when there are none
    let links = $('img');
    if (links.length === 0) {
      links = $('source');
    }
    // if several tags match, the last one wins
    links.each(function (i, link) {
      content.url = $(link).attr('src');
    });
    console.log('Content : ', content);
    /* content object format
    content = {
      url: 'image_url',
      title: 'title of the meme'
    }
    */
  }
});

Reddit, on the other hand, responds in JSON format, so there is not much to do here: just fetch the content and convert it into the same content object format shown in the code above.

var request = require('request');
var _ = require('lodash');

/* fetch posts in JSON format from the funny subreddit */
var redditUrl = 'https://www.reddit.com/r/funny.json';
request.get({
  method: 'GET',
  url: redditUrl,
  json: true
}, function (err, response, body) {
  // with json: true, the third argument is the already-parsed response body
  if (err || !body || !body.data) {
    console.log('Reddit fetching', err);
    return;
  }
  _.each(body.data.children, (post) => {
    let content = {
      url: post.data.url,
      title: post.data.title
    };
    console.log('Content : ', content);
  });
});
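
A subreddit listing may also contain posts whose url does not point directly at an image (for example text posts or links to other pages), which would not be usable as tweet media. The sketch below shows one way to convert a listing into content objects while keeping only direct image links. The function name toContentList and the file-extension filter are assumptions made for illustration, not part of the original project.

```javascript
// Matches direct links to common image formats (an assumed, non-exhaustive filter).
const IMAGE_URL = /\.(jpe?g|png|gif)(\?.*)?$/i;

// Convert a Reddit listing (the parsed JSON body) into content objects,
// dropping posts whose url is not a direct image link.
function toContentList(listing) {
  const children = (listing && listing.data && listing.data.children) || [];
  return children
    .map((post) => ({ url: post.data.url, title: post.data.title }))
    .filter((content) => typeof content.url === 'string' && IMAGE_URL.test(content.url));
}
```

Because the function only works on plain objects, it can be tried against a saved copy of the JSON without making any network request.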

We will fetch the feeds at regular intervals, and the frequency differs per source, because content is posted on 9GAG much more often than on Reddit. If we fetch the feeds too frequently, we will get the same items again (which is of no use), and our IP might get flagged as spam. To avoid that, we will fetch content from 9GAG every 30 minutes and from Reddit every 60 minutes.
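
Even at these intervals, consecutive fetches can return some of the same items. One possible safeguard, sketched below, is to remember the URLs already seen and drop repeats before queueing them. The helper createSeenFilter is hypothetical and keeps its state in memory only, so it forgets everything on restart; a persistent variant could keep the URLs in Redis instead.

```javascript
// Returns a function that filters out content objects whose url
// has already been passed through it. State lives in memory only.
function createSeenFilter() {
  const seen = new Set();
  return function keepNew(contents) {
    return contents.filter((content) => {
      if (seen.has(content.url)) {
        return false; // already handled in an earlier fetch
      }
      seen.add(content.url);
      return true;
    });
  };
}
```

Each source could share one filter, so a meme that shows up on both 9GAG and Reddit under the same URL is only tweeted once.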

We should not tweet all the content we fetch right away: this would flood the timelines of the people following us, and we might hit the Twitter API limit.

So we have a couple of problems at hand to solve:

  1. Fetch feeds from different sources at different intervals; the interval may differ for each source
  2. Post tweets to the Twitter account at a regular interval (say, every 15 minutes)
  3. Persist the content objects in case our program crashes

To solve this we will use queues. There is a nice Redis-backed queue library called bull. For our first requirement, bull has a repeatable jobs feature, which adds the same message to the queue at a defined interval. Each message triggers a processing function that receives the message (job) as a parameter, along with a callback to call once processing is done, so that bull knows the job finished successfully rather than logging an error. For our last requirement, bull persists messages in Redis, so we don't have to worry about losing parsed feeds if the program crashes.
Once we have fetched the feeds from Reddit and 9GAG, we push the parsed content to another queue called tweet_queue. Adding a message to tweet_queue triggers a processing function that tweets the image with its title. The code below shows how this is carried out.

var Queue = require('bull');
var redis_config = {
  'port': 6379,
  'host': '127.0.0.1'
};

/* create the queues; bull expects the connection settings under the `redis` key */
var nineGagQueue = new Queue('9gag_queue', { redis: redis_config });
var tweetQueue = new Queue('tweet_queue', { redis: redis_config });
var redditQueue = new Queue('reddit_queue', { redis: redis_config });

/* actual processing of the queue message is done in this callback function */
nineGagQueue.process((job, done) => {
  /* fetch feeds from 9GAG here
   * code to fetch the content is shown above
   * */
  var content = {
    'url': 'image url',
    'title': 'post title'
  };

  // add content to twitter queue
  tweetQueue.add(content);
  done();
});

redditQueue.process((job, done) => {
  // fetch feeds from Reddit here
  // code to fetch the content is shown above
  var content = {
    'url': 'image url',
    'title': 'post title'
  };

  // add content to twitter queue
  tweetQueue.add(content);
  done();
});

/* add repeatable messages to the queues */
nineGagQueue.add({
  'url': 'http://9gag-rss.com/api/rss/get?code=9GAG&format=1'
}, {
  repeat: {
    cron: '*/30 * * * *' // every 30 minutes
  }
});

redditQueue.add({
  'url': 'https://www.reddit.com/r/funny.json'
}, {
  repeat: {
    cron: '6 */1 * * *' // once an hour, at minute 6
  }
});
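
The setup above does not by itself space the tweets 15 minutes apart: jobs added to tweet_queue are processed as soon as a worker is free. Bull's queue options include a limiter setting (max jobs per duration in milliseconds) that may be a good fit for this. The fragment below is a sketch of such an options object; the variable names are illustrative, and the one-tweet-per-15-minutes value simply mirrors the interval mentioned earlier.

```javascript
// Illustrative options object for Bull's Queue constructor.
const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;

const tweetQueueOptions = {
  // Redis connection settings, same values as used for the other queues
  redis: { port: 6379, host: '127.0.0.1' },
  // process at most `max` jobs per `duration` milliseconds
  limiter: { max: 1, duration: FIFTEEN_MINUTES_MS }
};
```

It would be passed when creating the queue, as in new Queue('tweet_queue', tweetQueueOptions), while the two feed queues keep their plain options.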

Now that we have the Bot and the content, we are at the final stage: making them work together. We also need to make our content work for us. Just tweeting is not enough; we also have to try to get noticed without spamming and irritating users. This takes us to the next step.

#Step 4: Getting your Bot noticed

Creating a Bot is not enough; we also need some discovery mechanism that helps our Bot get discovered and gain followers. One method is to put hashtags in tweets. The question is, how do we generate meaningful hashtags? The title we extract from the feed can help. Since the words of the title carry some context about the image, choosing the hashtag words from the title is a sensible approach. But there will be many words in the title, and we won't hashtag all of them, only a few. The heuristic for choosing the words is very simple: skip the stopwords and pick a few of the remaining words from the title text at random. There is a nice npm package, stopword, which has all the English stop words.

Stopwords are the most frequent words in a language, which contribute very little to the meaning of a sentence. They are usually connecting words like a, the, etc.

The code for tweeting and hashtagging words is as follows:

/*
 * this util method uploads media from a remote URL directly to Twitter
 */
var uploadRemoteMedia = require('../utils').uploadRemoteMedia;
var sw = require('stopword');
var _ = require('lodash');

// `twitter` is the Twit instance and `tweetQueue` is the queue created earlier
tweetQueue.process(function (job, done) {
  var content = job.data;
  uploadRemoteMedia(twitter, content.url)
    .then(function (bodyObj) {
      // words of the title that are not stopwords
      const candidates = sw.removeStopwords(content.title.split(' '));
      if (candidates.length > 0) {
        let already = {};
        // make up to three random picks; a word picked twice is tagged only once
        for (let i = 0; i < 3; i++) {
          // _.random is inclusive at both ends, hence length - 1
          let replace_word = candidates[_.random(0, candidates.length - 1)];
          if (!already[replace_word]) {
            content.title = content.title.replace(replace_word, '#' + replace_word);
          }
          already[replace_word] = true;
        }
      }
      let status = {
        status: content.title,
        media_ids: [bodyObj.media_id_string] // pass the media id string
      };
      twitter.post('statuses/update', status, function (error, tweet, response) {
        if (error) {
          console.log('tweeting error', error);
        } else {
          console.log('tweeted');
        }
        done();
      });
    })
    .catch(function (err) {
      console.log('upload err', err);
      done();
    });
});
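
The hashtag heuristic can also be pulled out into a small pure function, which makes it easy to try on sample titles without touching Twitter. The sketch below is a deterministic variant that tags the first few non-stopwords instead of picking at random. The function hashtagTitle and its tiny inline stop list are assumptions for illustration; the stopword package ships a far more complete list.

```javascript
// Tiny illustrative stop list, standing in for the `stopword` package.
const STOP_WORDS = new Set([
  'a', 'an', 'the', 'is', 'are', 'of', 'to', 'in', 'on', 'and', 'or',
  'my', 'i', 'when', 'you', 'it', 'this', 'that'
]);

// Prefix the first `maxTags` suitable words of the title with '#'.
// A word is suitable if it starts with a letter, contains only letters
// and digits (so trailing punctuation is never tagged), and is not a stopword.
function hashtagTitle(title, maxTags) {
  let tagged = 0;
  return title
    .split(' ')
    .map((word) => {
      const isPlainWord = /^[A-Za-z][A-Za-z0-9]*$/.test(word);
      if (tagged < maxTags && isPlainWord && !STOP_WORDS.has(word.toLowerCase())) {
        tagged += 1;
        return '#' + word;
      }
      return word;
    })
    .join(' ');
}

console.log(hashtagTitle('When the cat sees a cucumber', 2));
// prints "When the #cat #sees a cucumber"
```

Working word by word also avoids a quirk of calling replace on the whole title, where the chosen word could match inside a longer word that appears earlier in the text.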

uploadRemoteMedia is a helper method that uploads an image from a URL directly to Twitter, without first downloading it to the local system and then uploading it again. It uses the Twitter stream API to stream the media content. Code for the function is in this link.

Once we have a follower, it would be nice to greet them with a message like "Nice to meet you virtually". This makes our Bot a little interactive, rather than just sitting passively and tweeting memes. There is a Twitter stream API that can help us achieve this. Here is a code snippet that does it:


var stream = twitter.stream('user');

stream.on('follow', function (stream_event) {
  console.log('Someone just followed us');
  var user_name = stream_event.source.screen_name;
  var tweet = {
    status: 'Hi @' + user_name + ' ! Nice to meet you virtually.'
  };

  twitter.post('statuses/update', tweet, function (err, data, response) {
    if (err) {
      console.log('Error tweeting');
    } else {
      console.log('tweeted successfully');
    }
  });
});

All the code above is in bits and pieces. To run the project, clone the GitHub repository, edit the config file with your Twitter credentials, and run npm start to see the Bot in action.

#Conclusion

As we saw, it is fairly easy to create a bot that aggregates images from various sources and tweets them. The purpose of a bot is to automate some task. In our case it was aggregating memes, but you could do other sorts of automation, like aggregating feeds from various news sites or posting links to top Hacker News posts. The possibilities are limitless.

#Useful links

  1. Link to github repository
  2. Twitter Dev portal
  3. Twitter API Documentation
  4. Twit docs
  5. Bull docs