
Dev Blog 1: Wu-Trump

Make America Swarm Again
11/4/19

Foreword: This is not any sort of endorsement of President Trump. I know I now have two Trump-related things on my site (see: This Is Trump); he's an easy target and apparently a great source of inspiration for side projects.

I was sitting on the couch one day listening to Wu-Tang Clan and it hit me: I should write a Twitter bot that reads in Wu-Tang lyrics and Donald Trump's tweets and generates its own tweets based on the two!

OK, I don't remember exactly how the idea came about, but it was something like that.

This was a wacky side project that drew on my experience from previous classes. A few years ago, in Advanced Programming Techniques, we learned how to develop a Markov chain that reads in a bunch of sample text, builds a rudimentary model of which words tend to follow which, and generates simple (usually nonsensical) sentences. Later I took Web and Mobile Development, where we learned to use REST APIs and make simple GET/POST requests.
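The course implementation isn't shown here, so the following is only a minimal word-level sketch of the idea (not Markovify's code and not the class assignment): every run of `order` words maps to the list of words observed right after it, and generation is a random walk from a start marker until an end marker comes up. The names `build_chain` and `generate` and the `<START>`/`<END>` sentinels are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(lines, order=1):
    """Map each tuple of `order` words to the list of words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        # pad with sentinels so the model also learns how sentences start and end
        words = ["<START>"] * order + line.split() + ["<END>"]
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, order=1, max_words=30, rng=random):
    """Random walk from the start state; assumes the chain was built from non-empty input."""
    state = ("<START>",) * order
    out = []
    while len(out) < max_words:
        # repeated followers appear more often in the list, so they're picked more often
        word = rng.choice(chain[state])
        if word == "<END>":
            break
        out.append(word)
        state = state[1:] + (word,)
    return " ".join(out)


corpus = [
    "the quick brown fox jumps",
    "the lazy dog sleeps",
]
chain = build_chain(corpus)
print(generate(chain, rng=random.Random(0)))
```

With two short lines that share a first word, each output starts with "the" and then follows one line or the other; on real input, sentences can cross over wherever the two sources share a word, which is where the mashups come from.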

For this project I used Markovify, a simple open-source Markov chain implementation in Python. It let me control things like sentence length, so I could make sure the generated tweets wouldn't exceed Twitter's 280-character limit. If you're interested in learning more about the algorithm, check out this article; I read it before starting this project, and it was helpful to see how it works in depth.
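One simple way to enforce a length limit on a random generator is rejection: generate a candidate, discard it if it's too long, and try again a bounded number of times. Markovify's `make_short_sentence` takes the maximum character count as its first argument; the helper below is a hypothetical stdlib-only sketch of that generate-and-reject idea, not Markovify's code. `make_short`, `toy_generator`, and the default of 10 tries are assumptions.

```python
import random
from typing import Callable, Optional

TWEET_LIMIT = 280  # Twitter's per-tweet character limit


def make_short(generate_fn: Callable[[], str], max_chars: int = TWEET_LIMIT,
               tries: int = 10) -> Optional[str]:
    """Call generate_fn up to `tries` times; return the first result that fits."""
    for _ in range(tries):
        candidate = generate_fn()
        # reject empty results and anything over the limit
        if candidate and len(candidate) <= max_chars:
            return candidate
    return None  # nothing fit; the caller can retry or skip this round


# Hypothetical stand-in for a Markov generator: 1 to 40 space-separated "ha"s
rng = random.Random(1)


def toy_generator() -> str:
    return " ".join(["ha"] * rng.randint(1, 40))


print(make_short(toy_generator, max_chars=60))
```

Returning `None` instead of raising keeps the calling loop simple, but it means callers (like a loop printing sentences) should expect an occasional `None`.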

I tested it by feeding it some Wu-Tang; the lyrics to every song on Enter the Wu-Tang (36 Chambers) were easy to find on Genius.com. I had to clean up and cut some of the text in the lyrics files, since RZA threw in lots of random conversational bits and Genius added a bunch of annotations.
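The cleanup here was done by hand, but the mechanical part is easy to script. This sketch assumes the copied lyrics use Genius-style section headers on their own lines (such as `[Chorus]`) and simply drops those along with blank lines, leaving one lyric line per line. `clean_lyrics` is a hypothetical helper; skits and inline annotations would still need manual editing.

```python
import re

# Assumed format: Genius-style section headers on their own line, e.g. "[Chorus]"
SECTION_HEADER = re.compile(r"^\s*\[[^\]]*\]\s*$")


def clean_lyrics(raw: str) -> str:
    """Drop section-header lines and blank lines; keep one lyric line per line."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or SECTION_HEADER.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)


sample = """[Intro]
First line of a verse

[Chorus]
  Second line with extra spaces
"""
print(clean_lyrics(sample))
```

Keeping exactly one lyric per line matters if the text is later fed to a newline-delimited model, where each line is treated as a separate sentence.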

After cleaning up the input, I was able to start generating some Wu-Tang lyrics.

Not my goal, but an enjoyable milestone.

The next step was to pull tweets from Donald Trump's Twitter account. I found Tweepy, a Python library for the Twitter API, and set up my consumer key/secret and access token/secret. Pulling Trump's tweets was fairly easy, so I set it up to fetch his 1000 most recent tweets and write them to a file:

import html
import tweepy

## Authenticate using API keys...

# (`api` and `filename` are assumed to be set up in the elided step above)
limit = 1000
i = 0

# Open file for writing (created if it doesn't exist); UTF-8 avoids encode errors
with open(filename, "w", encoding="utf-8") as f:

    # fetch tweets; the API returns at most 200 per request, so Cursor pages through them
    for status in tweepy.Cursor(api.user_timeline, screen_name='realDonaldTrump',
                                count=200, tweet_mode='extended').items():

        # skip retweets
        if hasattr(status, 'retweeted_status'):
            continue

        # get full text and decode HTML entities (ex. &amp; -> &)
        tweet = html.unescape(status.full_text)

        # write one tweet per line
        f.write(tweet + "\n")

        # stop if reached limit
        i += 1
        if i >= limit:
            break

print("wrote", i, "tweets to", filename)

Then all I had to do was write a simple script that fed all the Wu-Tang lyrics and Trump tweets into Markovify and generated sentences for tweets:

import markovify

def ReadFolder(folder_name):
    ## read in text from each file in folder...

# Read in Wu-Tang lyrics and Trump tweets
wutang = ReadFolder("WuTang")
trump = ReadFolder("Trump")

# Put them together (the newline keeps the last lyric and first tweet on separate lines)
# and build the model; NewlineText treats each line as its own sentence
text = wutang + "\n" + trump
text_model = markovify.NewlineText(text)

# Print 10 randomly generated sentences of no more than 280 characters
for i in range(10):
    print(text_model.make_short_sentence(280))

Running this spits out a handful of sentences generated from both text sources.

You can see that all of them are essentially gibberish, and some clearly draw more from the Wu-Tang text than from the Trump tweets (and vice versa for others). There's also a very neat, unintended side effect of this program, though: some sentences include hashtags and even quote-tweets lifted from Donald's tweets. Twitter seems to treat these as raw text in the tweet body and renders them as embedded content when viewed on Twitter. I didn't expect this at all, but it was really entertaining to see some of these tweets aimed at random accounts from Donald's tweets.
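Because mentions and hashtags are just text in the tweet body, a generated sentence can be checked for them before it's posted. This is a rough sketch: the patterns approximate Twitter's rules (handles of 1 to 15 letters, digits, or underscores; hashtags as `#` followed by word characters) and are assumptions, since Twitter's real parsing is more involved. `find_tags` is a hypothetical name.

```python
import re

# Rough approximations (assumption), not Twitter's exact parsing rules
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")
HASHTAG = re.compile(r"(?<![A-Za-z0-9_])#(\w+)")


def find_tags(sentence: str) -> dict:
    """Return the @-mentions and #hashtags that would likely show up as links."""
    return {
        "mentions": MENTION.findall(sentence),
        "hashtags": HASHTAG.findall(sentence),
    }


print(find_tags("Big news from @SomeAccount today #SomeHashtag"))
```

The lookbehind keeps email addresses like `me@example.com` from counting as mentions, and the 15-character cap rejects strings that couldn't be real handles.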

Unfortunately, I couldn't find a reliable way to automate turning these sentences into tweets. In theory I could set it up to generate a sentence and tweet it automatically every day or so, but I wanted to make sure each tweet had a good balance of Tang and Trump. More importantly, each tweet had to be funny! I decided to just cherry-pick my favorite generated sentences and tweet them manually from the associated Twitter account.
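The "good balance of Tang and Trump" part could be roughly approximated by checking which corpus each adjacent word pair in a sentence came from. This is a hypothetical heuristic, not something this project did: `source_balance`, `is_balanced`, and the 0.3 to 0.7 thresholds are made up for illustration, and no score can tell whether a tweet is funny.

```python
from typing import Optional


def bigrams(text: str) -> set:
    """Adjacent word pairs, computed per line so pairs never span two lines."""
    pairs = set()
    for line in text.lower().splitlines():
        words = line.split()
        pairs.update(zip(words, words[1:]))
    return pairs


def source_balance(sentence: str, corpus_a: str, corpus_b: str) -> Optional[float]:
    """Fraction of the sentence's attributable word pairs that come from corpus_a.

    A pair is attributable if it appears in exactly one corpus; returns None
    when no pair is attributable (e.g. one-word sentences).
    """
    pairs = bigrams(sentence)
    a_pairs, b_pairs = bigrams(corpus_a), bigrams(corpus_b)
    from_a = len((pairs & a_pairs) - b_pairs)
    from_b = len((pairs & b_pairs) - a_pairs)
    if from_a + from_b == 0:
        return None
    return from_a / (from_a + from_b)


def is_balanced(score: Optional[float], low: float = 0.3, high: float = 0.7) -> bool:
    """Hypothetical cutoff: keep sentences that draw substantially on both sources."""
    return score is not None and low <= score <= high


corpus_a = "the cat sat on the mat"
corpus_b = "the dog ran in the park"
print(source_balance("the cat sat in the park", corpus_a, corpus_b))  # 0.5
```

For real corpora it would be cheaper to compute the two corpus bigram sets once and reuse them, rather than rebuilding them on every call as this sketch does.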

This was a fun project to work on outside of my work at Night Kitchen. My friends, coworkers, and even professors really enjoyed it! In fact, the professor who teaches the class where I learned this stuff asked if he could share it with his students as motivation for the Markov chain assignment. I'm glad I could use what I've learned to make something fun and cool with swear words and computer science.

Check out the GitHub repository here.