Now that you have a set of followers, wouldn't it be nice to know a little more about them? In this blog post, I provide some Python scripts that can help you to get to know your followers better. By having this information, you could cater your tweets to their interests.
First, not all followers are created equal. Some of your followers might be following thousands of people, and so your tweets are likely to go unnoticed, lost in the mix of tweets from other people. As an aside, you may be asking how I keep up with the tweets of hundreds of followers. The examples below might give you some hints as to the software I have written to keep up on the statuses of those that I follow.
Personally, I feel that my most important followers are those that have engaged in conversations with me. (Numerically speaking, I give those followers a high "weight" value.) Send me a "custom" Direct Message or better yet, a meaningful @amyiris reply, and I know you are engaged in the discussion. ReTweets also win points in the weighting system! (Here's your chance to increase your value to me!...)
If you like this post, please ReTweet It.
Let's call the remaining followers "silent" followers - they've never engaged directly with me, or said @amyiris in a public Tweet. Even those followers are not created equal. For the sake of argument, I'll assume that someone following me who has 1000 followers is "more valuable" than someone who only has 1 follower. So it may be important to know who those people are, perhaps to aim Tweets at them to try to engage them, and move them out of the "silent" group.
So I decided to write a Python Script to allow me to get to know my followers. First I wanted to find out how many followers each one had. Using the Python Twitter library (with only a slight modification to handle pagination of my followers - since I have more than 100), I came up with the following script:
Note, I have customized my twitter.py file to do this. See http://www.pastie.org/395307 for an updated copy. (I'm not thrilled with this version, so it's not quite ready for Prime Time, but if you are experimenting, you'll want this file, or something like it.) Also see http://www.pastie.org/395445 for a version of this program, since Blogger keeps killing my indentation! Sorry!
import csv, math
import twitter as twitterapi

pwd='removed, of course'
api = twitterapi.Api(username='amyiris', password=pwd)
me = api.GetUser('amyiris')

# Followers come back 100 per page, so work out how many pages to request
numpages = int(math.ceil(me.followers_count/100.0))

# Generator of pages, then a generator that flattens pages into followers
followerpage = (api.GetFollowers(page=x)
                for x in range(1,numpages+1))
myfollower = (y[z]
              for y in followerpage
              for z in range(len(y)))

# One CSV row per follower: screen name, follower count, current status
csvfile=open('followers6.csv','a')
csvout=csv.writer(csvfile,dialect='excel')
for f in myfollower:
    csvout.writerow([f.screen_name,
                     f.followers_count,
                     f.status,
                     ])
csvfile.close()
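If the paging logic is hard to follow once Blogger has eaten the indentation, here is a minimal sketch of the same idea on its own. It is written for Python 3 and does not touch Twitter at all: `fake_fetch` is a made-up stand-in for `api.GetFollowers(page=x)`, and `page_count` and `iter_followers` are names I invented for illustration.

```python
import math

PAGE_SIZE = 100  # followers per page, matching the 100.0 used in the script


def page_count(total_followers, page_size=PAGE_SIZE):
    """Number of pages needed to cover every follower."""
    return math.ceil(total_followers / page_size)


def iter_followers(fetch_page, total_followers):
    """Yield followers one at a time, requesting pages 1..N only as needed."""
    for page in range(1, page_count(total_followers) + 1):
        for follower in fetch_page(page):
            yield follower


# Hypothetical stand-in for api.GetFollowers(page=x): 250 fake followers.
fake_followers = ["user%d" % i for i in range(250)]


def fake_fetch(page):
    start = (page - 1) * PAGE_SIZE
    return fake_followers[start:start + PAGE_SIZE]


followers = list(iter_followers(fake_fetch, len(fake_followers)))
print(page_count(250), len(followers))
```

The point of the generators is that a page is only requested when the loop actually reaches it, so nothing is held in memory beyond the current page.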
Anyway, this gave me a very cool CSV file, that has a row for each follower. Each row contains my follower's screen_name (Column A), how many followers he or she has (Column B), and their current status (Column C). Here's what it looked like in Excel:

Opening that file in Excel, I could manipulate it to get some stats:
My top follower has 39,060 followers of their own.
I have five followers with zero people following them, and five more that only have one.
My average follower has 386 followers of his or her own, which is significantly skewed thanks to a few big ones. The 50th percentile (median) follower has 156 followers of his or her own.
My "reach" is currently about 2000 people. That is, if I tweet something, 2000 people get it. But there are 769,000 people that are one step removed (including duplicates). That is, if I send out a tweet, and in the unlikely event that all 2000 people that I reach ReTweet it, then the ReTweet would be received 769,000 times (some people receiving it multiple times). Interesting, but not necessarily something to celebrate. This is all experimentation.
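I did these sums in Excel, but the same numbers can be pulled straight from the CSV. Below is a rough Python 3 sketch, assuming the three-column layout described above. The rows are made up for illustration (they are not my real followers), and it treats the "one step removed" figure as simply the sum of Column B, duplicates and all.

```python
import csv
import io
import statistics

# Made-up rows in the same layout as the CSV: screen_name, followers_count, status
sample = io.StringIO(
    "alice,0,hello world\n"
    "bob,1,good morning\n"
    "carol,12,writing code\n"
    "dave,156,lunch\n"
    "erin,180,python rocks\n"
    "frank,420,new blog post\n"
    "grace,39060,big announcement\n"
)

counts = [int(row[1]) for row in csv.reader(sample)]

top = max(counts)                    # biggest follower
mean = statistics.mean(counts)       # pulled upward by a few big accounts
median = statistics.median(counts)   # the "typical" follower
reach = len(counts)                  # people who get a tweet directly
one_step_removed = sum(counts)       # deliveries if everyone ReTweets, duplicates included

print("top:", top)
print("mean:", round(mean), "median:", median)
print("reach:", reach, "one step removed:", one_step_removed)
```

With a real file you would swap the `io.StringIO` sample for `open('followers6.csv', newline='')`. The gap between the mean and the median is the quickest way to see how much a handful of large accounts skews the average.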
Once it's in Excel, you can sort and analyze it. Here's my "long tail" graph, showing how many followers each of my followers has. 32 of my followers have more than 2000 of their own followers, but I cut the graph off there, so that it's a little more meaningful:

Now I have access to some really cool information about my followers. For instance, I've captured their most recent tweets. So I can see what's on everyone's mind, at this precise moment.
You've seen those "tag clouds" (here's a sample, from Wikipedia; credit: Markus Angermeier, so you know what I am referring to):
Imagine if I could get a snapshot of what my followers are talking about or thinking about at this very moment.
I modified my program to parse the statuses of my followers, and to provide a short list of the top 25 words that my followers are Tweeting. Here were the results of my first attempt:
Top 25 words in my Followers Statuses, and the frequency with which they appear:
652 the
611 to
522 a
427 I
325 for
308 of
303 and
291 in
257 is
219 on
(etc.)
Obviously this isn't very helpful. What I'd really like to do is compare the frequency of words to a standard word frequency table. Then I could see if a word stands out from expected conversation.
One way to do this would be to maintain my own running table of commonly tweeted words. Another might be to start with a word frequency table downloaded from the internet.
The strategy I took was to build a frequency table by grabbing words from the Public Timeline until I had about 1000 distinct ones, and then to compare the frequency of words in my followers' statuses against that table. Not a perfect solution, but something that can be done rather quickly with the Python Twitter interface.
You can examine the code below. I chose to normalize words by making them lower case (so the words "I" and "i" get counted together), removing apostrophes (so "can't" and "cant" are counted together), and changing all other punctuation and special characters to spaces. Then I ranked words by how much more frequently they appear in my followers' statuses than on the public timeline. If a word appears in my followers' statuses but never appeared on the public timeline, I arbitrarily pretend that it appeared .5 times on the public timeline (to avoid a divide-by-zero condition).
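To make the ranking concrete, here is a small Python 3 sketch of the same steps, run on a few made-up statuses instead of live Twitter data. The function names `count_words` and `relative_frequency` are mine, chosen for illustration. Like the approach described above, it divides raw counts and makes no attempt to adjust for the two samples being different sizes.

```python
from collections import Counter


def normalize(text):
    """Lower-case, drop apostrophes, turn everything else non-alphanumeric into spaces."""
    text = text.lower().replace("'", "")
    return "".join(c if ("a" <= c <= "z" or "0" <= c <= "9") else " " for c in text)


def count_words(statuses):
    """Count normalized words across a list of status strings."""
    counts = Counter()
    for status in statuses:
        counts.update(normalize(status).split())
    return counts


def relative_frequency(follower_counts, public_counts, floor=0.5):
    """Follower count divided by public count; unseen public words count as `floor`."""
    return {word: n / public_counts.get(word, floor)
            for word, n in follower_counts.items()}


# Made-up sample data, for illustration only
follower_statuses = ["Can't wait to ship this Python code", "python code review tonight"]
public_statuses = ["I cant sleep", "Going to the gym", "new code today"]

ratios = relative_frequency(count_words(follower_statuses), count_words(public_statuses))
for word, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(word, ratio)
```

In this toy data "python" never shows up in the public sample, so the .5 floor kicks in and it floats to the top, while "cant" appears once on each side and scores a neutral 1.0.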
The results? Here are some of the interesting keywords, and how much more frequently they appeared in my followers' statuses as opposed to the public timeline (at the moment that I ran the program).
night 4.2x
morning 3.3x
google 2.8x
tomorrow 2.5x
thinking 2.5x
project 2.5x
wow 2.4x
code 2.4x
follow 2.4x
media 2.3x
tweet 2.0x
working 1.9x
facebook 1.9x
twurl 1.7x
sorry 1.7x
software 1.7x
mac 1.7x
iphone 1.7x
rt 1.6x
website 1.6x
twitter 1.6x
python 1.6x
html 1.6x
try 1.5x
reason 1.5x
design 1.5x
awesome 1.4x
firefox 1.3x
feed 1.3x
ruby 1.2x
programming 1.1x
presentation 1.1x
And words that my followers spoke about far less than the public timeline: dragon, headache, stay, god, gym, accountable, aesthetically, reasons, 4corners.
I believe that you could compile a very good profile of your followers, using these methods over time. This small sample tells me something that I suspected. My followers appear to be more likely than the average Twitter user to be interested in Python, Ruby, programming and code projects. They seem to work in the software industry, more so than the average Twitter user. You might also guess that they are apologetic that their code isn't awesome, that they don't think or talk about the gym, god, or being accountable!
The word 4corners on the public timeline is an example of a "hot" topic. Since my followers' tweets may be several hours or even days old, hot topics will not appear with the same frequencies as the public timeline. Apparently at the time of my program execution, there was a link circulating about a fire at the 4corners, and that's how that got picked up as one of the 1000 most frequently used words on the public timeline. This brings up another great way to use this data - informing your followers of hot topics that appear with abnormal frequency on the public timeline.
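I haven't built the hot-topic idea, but it might look something like the Python 3 sketch below: flip the ratio around and look for words that are busy on the public timeline yet rare among your followers. Everything here is an assumption on my part, including the `hot_topics` name, the `min_public` cutoff (to ignore words seen only once or twice), and the made-up counts.

```python
def hot_topics(public_counts, follower_counts, min_public=3, floor=0.5):
    """Words common on the public timeline but rare among followers, hottest first."""
    scored = [
        (count / follower_counts.get(word, floor), word)
        for word, count in public_counts.items()
        if count >= min_public  # hypothetical cutoff to skip one-off words
    ]
    return [word for score, word in sorted(scored, reverse=True)]


# Made-up counts, for illustration only
public_counts = {"4corners": 6, "the": 40, "fire": 5, "gym": 2}
follower_counts = {"the": 38, "fire": 1, "python": 9}

print(hot_topics(public_counts, follower_counts))
```

Common words like "the" score close to 1 and sink to the bottom, while a word your followers have not picked up yet rises to the top.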
Here's how my code ended up:
import csv, math, time
import twitter as twitterapi

pwd='removed, of course'
api = twitterapi.Api(username='amyiris', password=pwd)
me = api.GetUser('amyiris')
numpages = int(math.ceil(me.followers_count/100.0))
followerpage = (api.GetFollowers(page=x)
                for x in range(1,numpages+1))
myfollower = (y[z]
              for y in followerpage for z in range(len(y)))
csvfile=open('followers13.csv','a')
csvout=csv.writer(csvfile,dialect='excel')
tagcloud={}  # word -> count in my followers' statuses
pubcloud={}  # word -> count on the public timeline

def getpublicwords():
    # Add the words from one fetch of the public timeline to pubcloud
    statuses=api.GetPublicTimeline()
    for s in statuses:
        for word in normalize(s.text).split():
            pubcloud[word]=pubcloud.get(word,0)+1

def normalize(text):
    utext=unicode(text).lower()
    utext=utext.replace("'","") # drop apostrophes ("can't" -> "cant")
    rtext=""
    for c in utext:
        if ("a" <= c <="z") or ("0" <= c <= "9"):
            rtext += c
        else:
            rtext += " "
    return rtext  # return only after the loop has finished

for f in myfollower:
    if f.status:
        fstatus = repr(f.status.text)[2:-1]
    else:
        fstatus=""
    csvout.writerow([f.screen_name,
                     f.followers_count,
                     fstatus,
                     ])
    for word in normalize(fstatus).split():
        tagcloud[word]=tagcloud.get(word,0)+1
csvfile.close()

# Twitter caches the public timeline for 60 seconds, so wait between fetches
while len(pubcloud)<1000:
    getpublicwords()
    time.sleep(61.0)

# Turn raw counts into ratios; words unseen on the public timeline count as .5
for word,count in tagcloud.items():
    tagcloud[word]=count/float(pubcloud.get(word,.5))

print "Top 25 words in my Followers Statuses:"
for count, word in sorted([(v,k)
                           for k,v in tagcloud.items()],
                          reverse=True)[:25]:
    print count, word
Imagine how much more interesting you can be, if you talk about things that interest your followers!

10 comments:
I think you have the makings of a pretty cool web service here, if you wanted to. Maybe host it on Google App Engine, since you're already doing it in python. :)
Seeing (near) real time tag clouds of what your followers are talking about would be great. Maybe could even cluster them and produce tag clouds for each cluster. Assuming that any decent following consists of a fairly diverse group of people, being able to split them into related groups would be very handy.
Amy, like the scripts, but your hacked version of twitter.py is falling far behind. How about you coordinate w/ the project and get your changes merged in?
Liked Jason's comment about making it a service.
Thanks, John. I'm sure I started with an old version of Python-Twitter, but when I searched for a newer version, I came up empty.
Guess I need to look harder!
I'm happy to have my changes incorporated into the original, but don't want to pretend to be as sharp as the original author, or claim ownership.
I'm just experimenting and tossing it out to the community! Hope it's of value to someone.
Jason, great idea. I bet there are some budding entrepreneurs out there that will see your message, and have it created by morning!
FYI, second script has barfed with the following. Don't have more time to debug it, sorry
IMPORTED TWITTER
Traceback (most recent call last):
File "stdin", line 56, in module
File "stdin", line 23, in getpublicwords
AttributeError: 'NoneType' object has no attribute 'split'
ps. I had to remove angle brackets from around stdin and module to get blogger to accept this
Amy, this link will get you to the SVN repository
http://code.google.com/p/python-twitter/
John, thanks for all the comments and the links.
Regarding the script failing, it appears to be failing to retrieve the Public Timeline.
I guess I should have mentioned, you may need to substitute your user name and password into the file. That's the change that I made prior to publishing the script (I took my password out).
Retrieval of the public timeline may require authentication by Twitter (I didn't think so, but it's possible).
Otherwise, for some reason, you appear to be retrieving a public timeline with no status messages. Could be a problem with my software, it wouldn't surprise me. Or it could be that the Twitter API is busy (?)
But this worked this morning!
Hope it works for you!
Thanks for the hint, Amy. I had put in my own creds. I've tried it again and it's been running for a *long* time, I'll let it keep consuming cycles and see what happens
BTW, for anyone out there who already has an older version of twitter.py installed, you can change the name of Amy's file to twitterAmy.py, and import twitterAmy as twitterapi in the followers.py example. To make this work you additionally need to change the import statement in the twitterAmy.py file to be import twitterAmy and at line 1491 or something like that, you need to change the reference to twitter to twitterAmy and you are off to the races.
John,
Blogger really messed up my indentation - every time I edit the post, it seems to change the indentation.
You may want to check out the pastie ( http://www.pastie.org/395445 ) to make sure that indentation doesn't mess you up.
The other thing to consider is to toss a print statement in there, right before the "csvout" call. Print all those things that are being sent to the csv file. That will give you a sense of progress.
Also, it takes a while to get 1000 words off the public timeline. You can only retrieve every 60 seconds (Twitter caches it, so that's why I have the 61 second delay in there). And to get 1000 unique words, I think it takes about 10 separate calls, times 20 messages. So around 10 minutes in that loop alone.
If it's taking more than 20 minutes, and it has a lot of activity (as opposed to wait time), then you probably have a bug (sorry).
Unless you have a lot of followers. Like I said, I ran it with 2000 followers in about 12 minutes at 7 AM.
Keep me posted, and thanks for trying the software!
Amy, thanks for your help, The version on Pastie.org worked, so you were correct about the indentation. Not sure what the output means yet, so more digging ;-)
For some reason it runs in a never-ending loop, each time fetching the same 100 followers (I have more than 100).
It's not an indentation thing.
Help ?