Tag Archives for twitter
I have complained about Twitter’s search system before. I always used to think it was strange that you had to go to a subdomain, search.twitter.com, just to search anything at all. Eventually they integrated the search engine right onto the home page, so that problem went away. But as we’ve known for a while, it is nearly impossible to find older tweets since Twitter Search only keeps an index of tweets searchable for about two weeks.
Now of course that is probably small potatoes for most: why do you want to be able to search old tweets when Twitter is all about real time information? And now that the Library of Congress is taking over, who cares? These are good questions, and ultimately, the times are few when I truly need to find an old tweet through a search engine. But these things are steeped in principle: I should be able to if I wanted to. Moreover, there should be a better way to find a month-old tweet (one that I wrote, no less) than paging back through my entire timeline by clicking “more” 40 times.
Today a quirk of Twitter search was exposed that is even stranger. Apparently, Twitter is filtering out search results that contain the word “RT.” This cuts down on repetitive results, to be sure, but again, a tailored list of results is not what we should be receiving from searching. Personally I think repetitive RT tweets are 1) easy to ignore and 2) a visual cue for gauging the popularity and, in some ways, the importance of a particular tweet.
I was a bit skeptical of this glitch even happening since it sounds like a bug, but it does indeed occur. Here’s the rub: it only happens when you are logged in. So logged-out or non-users searching the Twitter homepage get full results, as does anyone searching on search.twitter.com.
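For the curious, here is roughly what that behaviour amounts to, written as a minimal Python sketch. This is only my guess at the logic, not Twitter's actual code: the function name `filter_results`, the `logged_in` flag, and the pattern are all hypothetical, and I am assuming the filter simply looks for the standalone token "RT" in the tweet text.

```python
import re

# Assumption: a "manual RT" is any tweet containing the standalone token "RT".
# This is a guess at the rule, not Twitter's documented behaviour.
MANUAL_RT = re.compile(r"\bRT\b")


def filter_results(tweets, logged_in):
    """Return search results, hiding manual RTs only for logged-in users."""
    if not logged_in:
        # Logged-out visitors (and search.twitter.com) see everything.
        return list(tweets)
    return [text for text in tweets if not MANUAL_RT.search(text)]


if __name__ == "__main__":
    sample = [
        "RT @danhooker: great post on search",
        "Great post on search",
        "SMART idea about archives",
    ]
    print(filter_results(sample, logged_in=True))
    print(filter_results(sample, logged_in=False))
```

The word boundaries in the pattern keep a word like "SMART" from being caught by accident, which is about the only subtlety in a filter this blunt.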
It is likely that Twitter is doing this on purpose, and for several reasons.
- They want to clean up results for users and see it as a wanted convenience and added value (which it may indeed be for many people).
- They want to encourage use of their “Retweet” button and are underhandedly “forcing” users to use it if they want to be included in search results.
- Building on the two above, @josiefraser from the Digizen project in the UK thinks they are doing this to follow up on search deals with Google and Microsoft, to have Twitter’s new and approved Retweets as a way to easily and quantitatively rank tweet relevance.
So anyway, just another thing to be skeptical of when using Twitter for searching. With the addition of “sponsored tweets” and now the elimination of manual RTs, Twitter Search has really hit rock-bottom in terms of transparent and accessible search results. Worse yet, the users who do use the Retweet button do not get credit for having done so. All that appears is a note, saying that a certain tweet has been retweeted by x number of others (see picture above).
Hopefully the serious research archive at the Library of Congress will be useful for some who need a real archive. For the rest of you, hopefully most of you will see this as an improvement, an elimination of noise, and tell me to quit my bellyaching. As for me, well, I’ll just go crawl back under my griping rock and wait until someone finds me an easy way to search old tweets (that doesn’t involve Google or equally unappealing RSS parsing).
Update: The Next Web has weighed in saying, “If I were a betting blogger, I’d place my wager on Twitter addressing the filter as a “coding error” that will soon be “corrected”.” We’ll see.
By now everyone is up to their ears with tweets about the Library of Congress’s announcement that they will archive every Tweet. Here are my initial concerns and lauds.
- Cost. Library Journal has already questioned this. How much storage space is this going to require? How will it be sustainable? And how often are they planning on doing updates to the data stream? Will they begin collecting Tweets in real time? Monthly? Yearly?
- Content and archival quality. What about all those shortened bit.ly links? Or the old ones from services that have shut down, like Twurl? Or the really old ones that might be full URLs but that have rotted away? We can’t expect this to be perfect, but is LOC planning on trying to capture anything external to what the tweets may refer to? I got this idea from @dancohen. He suggests that LOC may need to take snapshots of the linked websites, and I think that sounds almost essential in a way albeit messy and difficult.
- Searchability. This could either be the greatest thing to happen to Twitter search, or a huge disappointment. Will LOC make their database of Tweets searchable? Right now, Twitter search is good for about two weeks. Library of Congress has a huge opportunity to blast that wide open, and we can only hope that they are able (infrastructure and $$$-wise) to do so.
- Privacy. A commenter posted on the LJ blog about this issue. Is there a privacy problem here? Yes, our tweets are public, but is it somehow unethical, even if it may not be a violation of copyright, to republish Tweets in what could become a public archive? Don’t ask me for an answer. Because I’ll say “no, it isn’t.”
- Metadata. How will the data about the tweets and their authors be captured and stored? Furthermore, Twitter is about to let us start adding annotations and other metadata to tweets in our stream. Will this sort of marginalia be lost?
All in all I have a feeling that this project is going to set a tone for social media archiving practice. One of the most talked about services being archived by one of the world’s largest libraries. If they truly think this is important (and I am tempted to agree), I think there is an excellent opportunity here to demonstrate that importance publicly. Essentially, I think the LOC is about to create the standard and best practices for social media archiving with this project, for better or for worse. If it is not implemented well in the beginning, it has the potential to set the bar too low (in both the technical and the public eye) for future endeavours seeking to capture online content.
In any case, this is a very exciting development to round off my library education. Two more days!
UPDATED Apr 15:
- ReadWriteWeb has some more good questions. Among them: “Will the archive include friend/follower connection data? Will it be usable for commercial purposes? Will there be a Web interface for searching it, and will that change the face of Twitter search for good? Is there any way that the much larger archive of Facebook data could be submitted to the same body for analysis of the same kind?” The answer to some of these is already known: no commercial use, there will [sounds like] be little web interface for searching–instead they will present a curated set for public use, while the entire archive will remain for serious research only.
- To address the problem of search, Google Replay was announced yesterday as well. This is Google’s attempt to capture what SearchEngineBLog calls a “vox populi” view of historical events. You can essentially search Google’s index of tweets easily for a specific date or range and keywords to get a sense of what was said about topics such as health care reform. With Twitter handling a reported 19-billion searches a month on their junky index, it’s about time we got another option. Google Replay, just like in their real-time results display, resolves those shortened links, but I don’t know whether or not the full URL is saved within the index or if it is resolved on the fly. My guess is the latter.
What I want and have always wanted was a way to search for specific tweets by specific users. Sometimes I can recall a fuzzy thing like, “I know @somebody tweeted something about “Topic A” like a month ago.” With Google Replay, we’re getting closer, but it’s not perfect, yet. It does effectively use Twitter handles as a search term, for example: “iphone @danhooker” brings up some tweets (but not all) that I have sent or that were RTd by me. I hope it will get better. Google has that habit, so I fully expect–and pray–this will be a workable option for meaningful Twitter search in the future.
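To make that wish concrete, here is a minimal Python sketch of the kind of search I have in mind. It assumes you already have your own tweets saved locally in some form; the `Tweet` class, the `search_tweets` function, and the sample data are all hypothetical, and none of this reflects how Google Replay or Twitter Search actually work.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tweet:
    """Hypothetical record for one archived tweet."""
    author: str
    text: str
    posted: date


def search_tweets(tweets, author=None, keyword=None, start=None, end=None):
    """Filter tweets by handle, keyword, and an inclusive date range."""
    results = []
    for tweet in tweets:
        # Accept the handle with or without a leading "@".
        if author and tweet.author.lower() != author.lstrip("@").lower():
            continue
        # Simple case-insensitive substring match on the tweet text.
        if keyword and keyword.lower() not in tweet.text.lower():
            continue
        if start and tweet.posted < start:
            continue
        if end and tweet.posted > end:
            continue
        results.append(tweet)
    return results


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    archive = [
        Tweet("danhooker", "Trying the new iPhone client", date(2010, 3, 14)),
        Tweet("danhooker", "Library of Congress news", date(2010, 4, 14)),
        Tweet("somebody", "iPhone apps everywhere", date(2010, 3, 20)),
    ]
    for hit in search_tweets(archive, author="@danhooker", keyword="iphone",
                             start=date(2010, 3, 1), end=date(2010, 3, 31)):
        print(hit.posted, hit.text)
```

Nothing fancy, and a substring match is a long way from real relevance ranking, but a handle plus a keyword plus a rough date range is exactly the "fuzzy thing" I usually remember about a tweet I am trying to find.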
Twitter has just made a number of pretty big announcements in the past two days. First, they announced a potential “huge” overhaul of their web UI. Then yesterday, they released Twitter for Blackberry AND they announced that they have acquired Atebits, the little company that makes Tweetie, a popular Mac and iPhone Twitter client, and are going to turn it into Twitter for iPhone.
What does this all mean? I don’t see the business model here yet, but they are clearly working on something. Twitter for iPhone (aka Tweetie) is moving from a $2.99 app to become free, so they are not monetizing the app purchase so far. One thing that Tweetie for Mac and other clients have done is put ads in the stream in order to get a little bit of revenue that way. Is that something Twitter is hiding up their sleeve? We don’t know now, and until we do, I guess all we can do is be happy. (Or dismayed at the proliferation of mobile phone “apps” instead of standards-based mobile web sites). The attitude of one Twitter funder is expressed this way:
Much of the early work on the Twitter Platform has been filling holes in the Twitter product. It is the kind of work General Computer was doing in Cambridge in the early 80s. Some of the most popular third party services on Twitter are like that. Mobile clients come to mind. Photo sharing services come to mind. URL shorteners come to mind. Search comes to mind. Twitter really should have had all of that when it launched or it should have built those services right into the Twitter experience.
With the launch of Twitter for iPhone and Blackberry it seems that some of those services are getting built in as we speak. One thing that dismays me a little bit is that there are no rumours about Twitter Search being improved, or the indexing and archiving processes getting any better. Maybe this is the librarian in me rearing its ugly head (or the subject of another blog post), but we need an effective and non-maddening way to get to old tweets. I guess I’ll just hang my hat on that one, and go back to clicking that “more” button.
I have been considering recently why we use Twitter. I know why *I* use it: it rocks for networking and collecting and sharing resources, and filtering the web-at-large through my network of other librarians and educators and web junkies. Collectively speaking however, I hear a lot of chatter about the concept of the “real-time web” and Twitter’s usefulness as a real-time search tool. ReadWriteWeb ran an interesting series recently about this topic of the “real-time web”, and it necessarily focused a lot on Twitter because right now it is the focal point of content coming in real-time. (Friendfeed, of course, is also out there, but so many fewer people use it that it gets ignored mostly.)
Hammering home this particular emphasis, the somewhat recent re-design of Twitter’s home page positioned the service as one that is based around Trending Topics and searching. My question is: why is there all this buzz around the real-time web and searching on Twitter, when we all know that the Trending Topics are routinely clogged with spam, and even if they weren’t, are mostly about things like Chris Brown’s raunchy bowtie on Larry King, or what Miley Cyrus and Katy Perry are chatting about today? (Not to mention the fact that Twitter only indexes your Tweets in their search engine for around a week and a half.)
With the exception of tracking certain conferences via hashtags (some of which now are even hard to follow due to spam), sometimes I feel like I am the only one who has never found Twitter Search to be useful. I find it very difficult to argue Twitter’s professional utility when the first thing a new user sees upon visiting the home page is a bunch of strange looking “topics” with odd ‘#’ symbols everywhere and the name Jay-Z three times in big blue font.
And that is not Twitter’s only problem if they want to be taken seriously as a real-time web tool. Slate ran an article recently about why microblogging (and implicitly, the real-time web) is too important for Twitter to be the only service out there. It was produced in the wake of the DDOS attacks that took Twitter offline briefly about a month ago. Imagine your RSS reader: if one blog or feed is down, you only lose access to that particular stream. On Twitter, as it stands now, if one stream is down, they’re all down. As I was writing this, I lost access to my Twitter home page for a couple minutes. What’s real-time about that?
One of the best critiques of Twitter I have read recently came from James Clay in his post entitled “Ten reasons why Twitter will eventually wither and die…” I would love to argue with him about why I think Twitter could outlast some of these things, but instead I am begrudgingly inclined to agree. Unless Twitter realizes its own importance, or the real-time web junkies start exploring new avenues for making this data decentralized, then I think we are in for (or in need of) a real wake-up call sooner rather than later.