Monday, February 14, 2011

HUCS Quest 2011

The CS department is offering a $1000 scholarship for the first student who successfully completes the "Quest". You may obtain a Registration Form from Dr. Baird; it must be turned in to the secretaries in Science 100 by noon Tuesday, Feb 15. The first clue will be posted at 4 pm on the same day. The following information is from Dr. Baird:
HUCS Quest (Harding University Computer Science Quest) is intended to be a fun, challenging and intellectually stimulating game for computer science and computer engineering majors. A series of clues and hints will be given which will ultimately lead to the final clue, which contains directions for claiming the prize. The prize will be a $1000 scholarship awarded for the following semester to the winner of the quest. The winner of the quest must be eligible to receive the scholarship, or willing to designate someone who is eligible to whom the scholarship will be given. The recipient of the scholarship must be a computer science or computer engineering major.

Contestants must sign up to be in the contest and, when eligibility has been verified, they will be added to a special "HUCS Quest Roster" on the Easel system. There will be a series of clues and hints given out over the Easel system. The timing of the release of clues and hints will be determined by the judges in an attempt to influence the length of the contest. (We would like for the contest to last no longer than three weeks.) The clues and hints will ultimately lead to the final clue which is a microSD memory chip which will be hidden somewhere on the Harding campus. This chip will contain directions for claiming the prize. Contestants must follow those directions to claim the prize.

Clues and hints will involve several aspects of computer science, such as cryptography, algorithms, file structures, etc. Solving the clues may require construction of a computer program. Almost all of the clues and hints also involve knowledge of some other area of learning such as languages, art, literature, music, history, mathematics, or one of the sciences. Many of the clues will cause contestants to research both online and in the library and to consult with friends in other disciplines.

Students may form teams to work together on the quest. Teams may be from 1 to 3 persons and must be declared when registering for the contest. The scholarship prize will be divided evenly between team members and all eligibility rules apply to each member of the team.

When students sign up to participate in this contest, they will also sign a form indicating their willingness to abide by the contest rules, which are given in the attached document. At the time of registration, your major (which must be CS or CE) will be confirmed on Pipeline before your registration form is accepted. If you are currently one of these majors and your major is not listed as such, please contact the Registrar's office to get it changed before you come to sign up for the contest.

If you have any questions about this quest, please contact me. We had a very successful HUCS Quest contest in February 2009. I hope this one will be fun for all who participate.

Tim Baird, Ph.D.
Chair, Department of Computer Science


Update:

The contest is over. First place: Brandon Huber, Robert Craig, and James Robbins. Second place: Nathan Hourt

Friday, February 11, 2011

Saving 172 BBC websites with BitTorrent

A recent budget cut at the BBC meant that as many as 200 websites were going to be shut down. However, an individual named Ben Metcalfe crawled 172 of the websites before they were deleted and has made them available via BitTorrent. Metcalfe probably didn't need to expend the effort, since the Internet Archive has likely archived the sites already or will shortly, but it's nice to see an individual being proactive in ensuring the sites would not be lost. Their removal is a real blow to those who have contributed significant content to the sites over the years.

Friday, February 04, 2011

Bing is "copying" Google's search results?

Earlier this week, Google revealed the results of a sting operation that appeared to catch Bing red-handed. Microsoft has apparently been using Internet Explorer to determine which queries and search results their users were clicking on when using Google, and they incorporated that information into Bing's search results. Although Google says this is unfair copying of search results, Microsoft is claiming that the information is just a small part of their overall formula for ranking search results.

The debate has received a lot of attention. Even Colbert had something to say about it: "Evidently, hiybbprqag is a word meaning 'You got serverd!'"

This debate has been good fodder for my Web Science course, and it was even relevant to my Seminar class which discussed ethics and intellectual property this morning. Search engine results are intellectual property, so is Microsoft's use of clickstream data fair use, or did they cross the line?

Update:

This post generated some interest on my Facebook account. One of the comments included this link to Danny Sullivan's article that gives more analysis of the situation.

Thursday, February 03, 2011

A short history of computing

I finally produced a set of 40 slides on the history of computing in both PowerPoint and PDF formats. I injected a little humor, including some "infamous" quotes like Ken Olson's: "There is no reason anyone would want a computer in their home."

Along with standard events like the Analytical Engine, ENIAC, and the Internet, I've also included some of my favorites like Tron, the first movie to use extensive 3D graphics. Most of the photos were obtained from Wikipedia, but I included a few of my own, like this photo of the first Google server that is currently housed in the lobby of the Gates Building at Stanford.

I know there was a lot of stuff I left out. If there's something you think I should really include in my slides, let me know, and I'll give it due consideration... my CS1 students will also thank you for suggesting more information they have to remember for their first exam. ;-)

You can also check out some of my other historical slides on graphical user interfaces and the Internet.

Thursday, January 27, 2011

Introduction to Web Science

The spring semester is at full throttle, and I figured it was time to write my first blog post of 2011. I'm teaching Introduction to Web Science this semester (I mentioned this back in November). It's an upper-level elective for CS majors that approaches Web Science from a computing perspective. There have been a few other undergraduate Web Science courses offered at other universities, but they are quite rare.

There is no single book on the topic of Web Science, so I'm using a combination of texts (see the class web page) that focus on web search engines, networks, and collective intelligence.

If you'd like to follow along with the class, I'm placing our PowerPoint presentations online on the class web page.

Thursday, December 23, 2010

Merry Christmas from the McCowns

Merry Christmas!

Photo by Stacy Schoen
"For unto you is born this day
in the city of David a Savior,
who is Christ the Lord." - Luke 2:11

Friday, December 17, 2010

Tron's legacy

There's probably no one in Searcy, Arkansas, who is more excited to see TRON: Legacy tonight than I am. If you question my passion for the film, take a look at what is hanging on the most prominent wall in my office:


I saw the first Tron movie when I was about 9 years old. It presented a fictional world inside of computers where programs fought for survival by hurling light discs at each other and racing motorcycle-like vehicles that left a deadly trail of light behind them.

To say that it left an impression on me is an understatement. Ask NASA scientists why they entered their field, and many will point to Star Trek. Ask computer scientists of a certain age why they got into computing, and many will point to Tron (and perhaps WarGames).

I'm going into the movie with middle-of-the-road expectations. It's got to be tough making a film that pleases the original Tron fanatics and a younger mainstream audience at the same time. I'll report back later after seeing the film.

In the meantime, check out this Tron music video which features an "alternative ending" to Tron.

End of line.

Update

I really enjoyed the film... 3 out of 4 stars. It had a semi-original plot, but the most interesting thing was simply the inner world of Tron. The CG that produced a young Flynn was pretty good, but there is still obvious room for improvement. The 3D was also well done and well suited for this type of movie. I think my favorite scene was in the End of Line Club where Daft Punk had a cameo appearance. I wish the main character, Sam, had his father's zany sense of humor... he's a bit too cool. I'm definitely going to see it again while it's on the big screen.

Tuesday, December 07, 2010

Android Workshop at SIGCSE 2011

I'm offering a workshop on Android application development at SIGCSE 2011 in March 2011: Audacious Android Application Programming. (The workshop name was shamelessly stolen from Michael Rogers' iPhone workshop from last year's SIGCSE.)

Here's a brief description:
As smartphones and mobile devices become ubiquitous, many CS departments are adding mobile computing electives to their curriculum. Google’s Android OS is a freely available and popular smartphone platform with applications programmed in Java. Workshop participants will be introduced to mobile app development and the Android SDK. We will write some simple Android apps with Eclipse and run them on an emulator. For those interested in teaching an upper-level Android course, reusable programming labs and projects will be distributed, and we will discuss some teaching strategies. Participants should be capable of writing Java programs in Eclipse and should bring their own laptop preloaded with Eclipse and the Android SDK.

The workshop will be held Wednesday, March 9, from 7:00 - 10:00 pm at the Sheraton Dallas Hotel in Dallas, Texas. Cost is $65. More details will be made available soon on the workshop website.

Thursday, December 02, 2010

Memento wins Digital Preservation Award 2010

Congratulations to Herbert Van De Sompel and Michael Nelson for being awarded the Digital Preservation Coalition's Digital Preservation Award 2010 for the development of Memento.
"‘Memento offers an elegant and easily deployed method that reunites web archives with their home on the live web,’ explained Richard Ovenden, chair of the Digital Preservation Coalition. ‘It opens web archives to tens of millions of new users and signals a dramatic change in the way we use and perceive digital archives.’"

I've been working with Herbert and Michael on the development of the Memento Browser for Android. It's great to see these guys being recognized for their hard work.

Tuesday, November 30, 2010

Thoughts on cheating

I've been thinking a lot about cheating these past few weeks. It was triggered by a cheating incident that occurred in my CS1 course where a student had copied source code found on the web. Interestingly enough, this coincided with news that Oracle had amended its patent infringement lawsuit against Google to include a line-by-line comparison of code it claimed Google illegally copied.

About the same time, a massive cheating scandal was uncovered at the University of Central Florida involving 200 students who cheated on their midterm exam (approximately one-third of the class). And just a few weeks before, I had read that cheating in CS accounted for 23% of all honor code violations at Stanford University, even though the students involved make up only 6.5% of the student body. Oh, and did I mention the article in the Chronicle of Higher Education about the "Shadow Scholar" who has cashed in by writing innumerable papers on behalf of college students?

With so much cheating going on, one can't help but wonder: do students today value honesty less than previous generations, or is it just easier to cheat (and to catch cheating) today? Is there something we could do as CS educators to reduce the amount of cheating going on in CS?

On Monday my chairman placed an article in my mailbox entitled Cheating in Computer Science by William Murray, a faculty member at the Naval Postgraduate School who is well known in the computer security field. Murray thinks current CS teaching practices, in which students must write original programs, actually promote cheating by creating an artificial problem for which cheating is often the easiest strategy. Instead, Murray suggests we should employ practices that disincentivize cheating, perhaps by promoting skills that are highly valued in the workplace like code re-use and teamwork.

I certainly agree that code re-use and teamwork can have positive benefits when learning programming. Pair programming is something I've been using for several years with positive results. I also promote limited code-reuse in my upper level courses when it's code that can augment my students' final projects, as long as the reuse is documented thoroughly and the re-user can adequately explain the code she's reused.

However, I don't see code re-use or pair programming as a panacea for reducing cheating. When I questioned my student who was caught copying code from the Web, it was clear that he didn't really understand what he had copied. He didn't understand it well enough even to fashion it into a solution resembling the program specification I had given. How did code re-use help him learn anything? I'm not saying he could not have potentially learned something, just that code re-use is not an immediate fix for cheating.

Teamwork is also not an immediate fix. I still remember vividly, during my undergraduate years, working on a team with someone who understood the class material significantly better than the rest of us. We relied on him heavily to get good grades on our projects, and many times we failed to fully understand our (his) solutions to the problems. And in pair programming, there are times when the weaker partner is just not going to "get" what is so obvious to the stronger partner, and it's easier to just turn in the finished assignment rather than struggle with the solution individually.

Murray goes on to suggest that teaching programming skills using existing programs will remove the incentive to cheat:
"I no longer teach programming by teaching the features of the language and asking the students for original compositions in the language. Instead I give them programs that work and ask them to change their behavior. I give them programs that do not work and ask them to repair them. I give them programs and ask them to decompose them. I give them executables and ask them for source, un-commented source and ask for the comments, description, or specification. I let them learn the language the same way that they learned their first language. All tools, tactics and strategies are legitimate." (emphasis mine)
Let me get this straight: a student can use the strategy of copying someone else's solution, cite the person who did all the work developing the solution, and get credit for the work? Surely this is not what Murray is advocating. These are certainly worthwhile approaches that students could learn a lot from, but Murray does not make it clear how they are any more resistant to dishonest practices.

Murray later states that "Nice people do not put others in difficult ethical dilemmas," suggesting that I am somehow a mean guy for putting my students in a difficult situation when I ask them to write original code. I'm sure some of my students would agree when they start working on their assignments just hours before they are due. Perhaps every department on campus is guilty of the same thing since we all create "artificial" situations where students must come up with original solutions instead of borrowing others'.

My goal is not necessarily just to be nice, but to hold my students to a level of rigor where, if they take it seriously and put in the time and honest effort, they will be well prepared to enter the job market and have a basis of knowledge from which they can learn new skills. Many times this will require original work to problems that others may have already solved. However, having students solve these problems on their own or in pairs will put their brain through a mental workout that will prepare them to be a productive member of a development team in the future.

So what can CS instructors do to make cheating less appealing? Coming up with new assignments that are engaging, changing exam questions, and all those other time-consuming tasks are certainly beneficial. But I think a more successful approach is simply to make the case for academic integrity as a relationship between teacher and student, a relationship that is harmed when deceit is allowed to enter the picture. Deception will potentially harm the student's self-image more than anything and cause serious regrets in the long term. For those of us who seek to follow God, the relationship is three-way, and deception in a relationship with God is a non-starter.

I think we have to realize that many college students are still quite young and lack the maturity to take the high road. Our job as faculty is to help those who mess up learn from their mistakes and exhort them to exercise integrity in the small things and the big things. This is something I'm still learning to apply to myself.

Friday, November 26, 2010

Firefox... you're killing me!

I'm trying to get caught up on my grading before the students return from the Thanksgiving break. Unfortunately, Firefox is driving me nuts, so pardon the short rant.

My web development class is mainly composed of freshmen and sophomores, many of whom have only been writing programs for a semester or two, and occasionally they will write CGI programs with infinite loops. Accessing these CGI programs causes an endless stream of data to be sent to the browser, and using Firefox to access these URLs often causes the entire browser to lock up, as shown below.



So I'm stuck using the Task Manager to kill Firefox and then restart the browser, just because one tab has gone haywire. This adds several minutes to my grading time each time I have to restart. Internet Explorer doesn't do much better; only Chrome lets me kill the offending tab without restarting the entire browser.

I suppose this wouldn't bother me so much if it weren't such an obvious pitfall that a seasoned browser should be able to handle.

Listen up GUI students... this is a prime example of when threading is necessary so your UI thread can continue to respond to the user!
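For the curious, the idea can be sketched in a few lines of Python (the names and timings here are my own illustration; real GUI toolkits each have their own event loop and thread-affinity rules): the slow work runs on a background thread while the main thread stays free to service events.

```python
import threading
import queue
import time

def long_running_task(results: queue.Queue) -> None:
    """Simulates a slow operation, such as reading a huge HTTP response."""
    time.sleep(0.2)                    # stand-in for blocking I/O
    results.put("page data")

# Hand the slow work to a background thread...
results = queue.Queue()
worker = threading.Thread(target=long_running_task, args=(results,), daemon=True)
worker.start()

# ...while the "UI" thread keeps responding to the user
events_handled = 0
while worker.is_alive():
    events_handled += 1                # stand-in for pumping the event loop
    time.sleep(0.01)

data = results.get()
print(data)                            # page data
```

Had the main thread called long_running_task directly, it would have been frozen for the duration, which is exactly what happens to a browser tab that blocks its UI thread.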

Saturday, November 20, 2010

Get flag images from Wikipedia

I needed a large number of national flag images at a certain resolution for a project my Web Development course was working on. By examining some flag images on Wikipedia, I noticed that the thumbnails were being created on the fly.

For example, to create the United Kingdom's flag that is 100 pixels wide, you can access this URL:

http://upload.wikimedia.org/wikipedia/commons/thumb/4/45/Flag_of_the_United_Kingdom.svg/100px-Flag_of_the_United_Kingdom.svg.png

which produces this flag:

So I developed a Perl script that automates this process. I've included it below for anyone else who might need flag images. Note that I had to set the user agent string, or Wikipedia would not respond properly to the HTTP request. If you use this script to download a lot of images, please be nice and throttle your requests with the sleep() function.


#!/usr/bin/perl

# This script will attempt to download the national flag
# image produced by Wikipedia using the $flag country name
# and $image_size as the image width. By Frank McCown.

use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);

# Width of the image
my $image_size = 100;

# Country's name
my $flag = 'the United Kingdom';

my $filename = lc $flag;
$filename =~ s/\s/_/g;
$filename = $filename . "_" . $image_size . ".png";

my $url_filename = $flag;
$url_filename =~ s/\s/_/g;

my $img_name = "Flag_of_" . $url_filename . ".svg";

# Wikipedia stores images in hashed directories derived from the MD5
# of the image filename: first hex digit, then first two hex digits
my $md5 = md5_hex($img_name);
my $hash_dir = substr($md5, 0, 1) . "/" . substr($md5, 0, 2);

my $img_url = "http://upload.wikimedia.org/wikipedia/commons/thumb/" .
    "$hash_dir/$img_name/" . $image_size . "px-" . $img_name . ".png";

print "Getting $img_url\n";

# Wikipedia will not respond properly without a user agent string
my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0 Firefox 5.6');
$ua->from('your@email.com');

my $response = $ua->get($img_url);

if ($response->is_success) {
    print "Writing to $filename\n";

    open(IMG, ">", $filename) or die "Could not open $filename: $!\n";
    binmode(IMG);
    print IMG $response->content;
    close(IMG);
}
else {
    print "ERROR: Could not download: " . $response->status_line . "\n";
}

Thursday, November 04, 2010

New Web Science course offered in Spring 2011

I'll be offering an Introduction to Web Science course this Spring. Web Science is an emerging field of study which encompasses computer science, law, economics, and a number of other disciplines. This course is for upper-level CS majors and will therefore focus mainly on computing aspects of Web Science. Below is a description of the course. If you are a Harding CS major looking for a challenging and enlightening elective, I hope you'll consider taking it.

The Web has fundamentally changed how we learn, play, communicate, and work. Its influence has become so monumental that it has given birth to a new science: Web Science, or the science of decentralized information structures. Although Web Science is interdisciplinary by nature, this course will be focusing mainly on the computing aspects of the Web: how it works, how it is used, and how it can be analyzed. We will examine a number of topics including: web architecture, web characterization and analysis, web archiving, Web 2.0, social networks, collaborative intelligence, search engines, web mining, information diffusion on the web, cloud computing, and the Semantic Web.

Programming projects will use Python, HTML & JavaScript, some Google APIs, and the Facebook API.

Prerequisites: COMP 245 & 250

Friday, October 08, 2010

Facebook adds ability to download your Facebook data

A few years ago I thought it would be really helpful to create a tool that would allow anyone to archive their Facebook account, just in case something happened to it. Think about it... 20 years from now, wouldn't it be interesting to see what was going on in your day-to-day life? And what if Facebook were to start charging fees to access your account or, Lord forbid, to disappear?

Last year we finally released the ArchiveFacebook Firefox add-on, which allows you to save your Facebook account to your hard drive, just as it appears in your web browser.

My hope was that this tool would have a limited life span. I wanted it to nudge Facebook into providing a method to download and even transport user data to other social networks. Finally, it looks like Facebook has caved in.

Coming soon, you will have the option to download a zip file from Facebook that contains all your wall posts, photos, messages, etc. You can browse the contents of the zip file in your browser.

The video below shows how this will work.



I have not yet been given access to the feature, but I will report back later once I've had a chance to use it. I'm not sure if it will be possible to upload the archived data into another social network. My guess is that someone will need to write a program that converts the zip file into an open format that can then be transported.

Thank you to Carlton Northern, Hany SalahEldeen, and others who have put a lot of time into the numerous and painful modifications needed to keep ArchiveFacebook working as Facebook changed its website. It may finally be time for it to retire.


Update on 10-20-2010

I was able to download my entire Facebook account today. Only a few minutes after I requested the archive, Facebook made it available to me as a 6MB zip file. As you can see below, it's a spartan set of pages with all your Wall posts, photos, messages, etc.:



I scrolled down the very long Wall page and found my very first Wall post dated September 28, 2006 at 9:03 pm from my friend Stacey: "Welcome to the ridiculous! How's Bean? How are you?" According to the Facebook Wikipedia article, this was two days after Facebook had opened to the general public. I guess that makes me an early adopter (for once). ;-)

One technical problem I ran across: Facebook has mangled the image src attribute (src="../photos%2FProfile%20Pictures%2F514544861521.jpg" should be src="../photos/Profile%20Pictures/514544861521.jpg"), so I couldn't see my Photos in Firefox. I had no problem seeing them in Chrome.
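If you hit the same problem, the fix is mechanical enough to script. Here's a quick Python sketch (my own illustration, not anything Facebook provides) that rewrites only the percent-encoded slashes in src attributes, leaving legitimate escapes like %20 intact:

```python
import re

# A sample of the mangled markup from the archive's pages
html = '<img src="../photos%2FProfile%20Pictures%2F514544861521.jpg">'

# Decode only %2F (slash) inside src attributes; %20 (space) is a
# legitimate escape and must be left alone
fixed = re.sub(r'src="([^"]*)"',
               lambda m: 'src="' + m.group(1).replace('%2F', '/') + '"',
               html)

print(fixed)  # <img src="../photos/Profile%20Pictures/514544861521.jpg">
```

Run over each saved HTML file, this would make the Photos visible in Firefox as well.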

Saturday, September 18, 2010

Memento Browser for Android is available

I've just created a home for the Memento Browser for Android, a project I started working on this past summer. The free Android app allows you to view older versions of web pages by merely selecting a date. The browser uses the Memento protocol to find archived versions of the page and displays whatever page is closest to the requested date.

For example, the screenshot below shows the browser viewing cnn.com:



If you wanted to see what this page looked like on Sept 7, 2007, you could select that date, and in a few seconds be looking at this archived page from WebCite:



Note that the page displayed is actually one day later than the requested date. That's because the browser was not able to find an archived copy on the exact date requested. The browser only displays archived copies from the Internet Archive, WebCite, and a few other archives. While they have a huge amount of the web archived, they certainly don't have everything.
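For those curious about the protocol itself, the essence is an ordinary HTTP request carrying an Accept-Datetime header, sent to a "TimeGate" that redirects to the archived copy closest to the requested date. A minimal Python sketch (the TimeGate URL below is a made-up placeholder; real aggregator endpoints vary):

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate for cnn.com; real endpoints vary by aggregator
timegate = "http://mementoproxy.example.org/timegate/http://www.cnn.com/"

# The Memento protocol expresses the desired date as an RFC 1123
# timestamp in the Accept-Datetime request header
wanted = datetime(2007, 9, 7, tzinfo=timezone.utc)
accept_datetime = format_datetime(wanted, usegmt=True)

req = Request(timegate)
req.add_header("Accept-Datetime", accept_datetime)

print(accept_datetime)  # Fri, 07 Sep 2007 00:00:00 GMT
```

A real TimeGate would answer this request with a redirect to the memento nearest the requested date, which is how the browser can land on a copy one day off from what you asked for.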

You can download Memento Browser here. I am working on an iPhone version of the app with a colleague of mine, but I don't have an ETA for it yet.

If you don't have an Android device, you can still download the MementoFox add-on for the Firefox browser which does the same thing.

Finally, you can watch a demo of the browser in action here.

Friday, September 03, 2010

Loving my Droid X

I've had it less than a week, and I'm hooked. The Droid X's screen is large (4.3"), sharp, and bright. The keypad is easy to type on, the touch interface is extremely accurate and responsive, and reaction times are quick. Video playback is fantastic. It's quick to connect to the Verizon 3G network or local wifi, and I've gone nearly three days without having to recharge. It's running Android 2.1, but Froyo (2.2) is supposedly coming out soon.

Below is a photo I took from chapel this morning using the Droid X's 8-megapixel camera. Not bad considering the lighting. I'm standing in the pit at the front of the auditorium with my fellow faculty members. (The singing, as you could imagine, is awesome with a packed auditorium.)


One small negative: When I first entered my Facebook account credentials, it sucked up all my "friends" and put them in my list of contacts. Now my 500+ contact list is full of people I haven't seen in years and certainly don't contact on a day-to-day basis. If I want to remove an individual from my list of contacts, I'm told I have to remove them from my list of friends on Facebook. Boo.

I haven't added many apps yet (the phone comes with approximately 30 apps pre-installed). But one I did add was the Bible app from YouVersion. It allows me to simply say "Genesis chapter five verse twenty", and boom, I'm there. Won't this be fun to play with during Bible class on Sunday? ;-)

Hey Apple, when are you going to create an iTunes for Android? This is probably the only reason I will hang on to my iPod Touch for now. (Yes, I've heard of doubleTwist, and I'll give it a try soon.)

Any other apps I should install?

Tuesday, August 31, 2010

Some computing history

The fall semester is in full swing here at Harding, and I've decided to convert some of my notes on historical events in computing to slides. If you are interested, here are my slides on Internet and Web history and history of graphical user interfaces (GUIs). I'll admit the GUI slides are slanted toward Microsoft because we focus on Windows programming in my GUI course.

I'm still working on my general history of computing and will post an update later.

Tuesday, August 17, 2010

Why I left Wikipedia

An article in this week's Newsweek reports that Wikipedia has been floundering since the spring: "Thousands of volunteer editors, the loyal Wikipedians who actually write, fact-check, and update all those articles, logged off-- many for good." The WSJ first reported the fallout almost a year ago when it was discovered that 49,000 English editors left Wikipedia during the first three months of 2009 compared to a loss of 4,900 during the same period in 2008.

Update: As one of the comments below states, the WSJ article was hasty in its conclusions. It all hinges on what you call an "editor", and a more balanced definition suggests that editors are not leaving Wikipedia in droves.

As the Newsweek article points out, there are a number of reasons why Wikipedia may be stagnating. There are so many articles already present that there is little new ground to break. Some may be scared away or frustrated by overly aggressive editors. Or perhaps "most people simply don't want to work for free."

Some research at Georgia Tech shows that editing a Wikipedia article is very challenging for computing newbies; the "Editing this way will cause your IP address to be recorded publicly" message causes lots of confusion, and this certainly prevents many from joining the ranks of Wikipedia editors.

I have always been a Wikipedia fan. I first started making serious contributions in 2004 when I was beginning my PhD research and discovered that many of the new concepts I was being introduced to simply didn't exist in Wikipedia.

I wrote a number of articles from scratch like web archiving, web search query, adversarial information retrieval, and URL normalization and made a significant number of edits on other technical topics. I was motivated in part by being the first to write the articles and the fact that I would likely refer back to them as reference material as I continued my research.

However, I found that keeping vandalism at bay and fighting poor edits was quite time-consuming. Some articles that I valued quite highly like web crawler needed tons of work, and although the desire was there, I just didn't have the time... I was trying to complete my PhD, and maintaining Wikipedia articles was not paying the bills.

I had an ah-ha moment at a conference a few years ago when someone quoted from Wikipedia's article on digital preservation, and I could have sworn I had been the sole author of the quoted piece. Wikipedia was given credit as the source, not me. That didn't bother me all that much, but it did make me realize that contributing to Wikipedia is often not in the interests of academics who are often judged by the amount of citable material they produce. Someone citing what you wrote in Wikipedia doesn't "count" like someone citing what you wrote in a journal article.

Over the past year or so, I have simply lacked the motivation to put time into an anonymous forum. My time is expensive, and Wikipedia is not paying. It's hard enough just finding time to edit my blog!

I still think Wikipedia is extremely valuable, and I hope it never goes away. I regularly send my students there and encourage them to make a serious contribution.

Have you seen The Book of Eli? At the end of the movie, a group of people are attempting to restore some of the greatest literary works of mankind. They are quite happy to have nearly a complete set of Britannica encyclopedias. No mention is made about the remnants of Wikipedia. :-(

Thursday, August 05, 2010

Students needed to work the WAC

I just received word that my grant proposal with the NSF has been funded. The project is called the "Web Archive Cooperative" or WAC. It's a 3-year grant with Hector Garcia-Molina (Stanford University), Andreas Paepcke (Stanford University), Michael L. Nelson (Old Dominion University), and myself.

In short, the WAC is our attempt to provide services, tools, and data access to web scientists. We are researching methods to provide access to web data, like query logs, tag annotations, blogs, profiles, and Twitter messages, that is often scattered across disparate archives. We are working on finding this data, building software tools for combining and analyzing it, and developing methods to preserve it for the long term.

What this means is that I will be looking for some highly talented/motivated CS students (currently enrolled at Harding) to work with me over the next 3 years during the summers. You will get to work closely with me and in conjunction with others at Stanford and ODU, and you will receive a stipend. If you think this is something you'd like to get involved with, please let me know.

Tuesday, July 13, 2010

CS library book analysis

For the past three years I've been the designated faculty member in charge of ordering computing books for our campus library. Although it can be tedious at times, I usually enjoy the job, especially since it gives me the chance to browse through the latest books on computing and order pretty much what I would like (and, of course, what I think the students would like :-)).

The other day, though, I got to wondering: how many of these books are our students actually checking out? Are paper-bound books still useful to them when you can find so much information on the Web?

So several months ago I asked our librarian to give me some usage data on our computing books. I was only able to analyze the data this week, and what I found was somewhat surprising.

First, to see the relative age of books in our library, I created a histogram of the 998 books based on publication date:


The earliest book (Computers and Society, edited by Nikolaieff) is from 1970. Almost half of the books (45%) were published between 1999 and 2003. Only three books published this year had made it into the library by the time this data was obtained.

The check-out data runs from 2001 to the present. Out of 998 books, 22% have never been checked out (at least not since 2001). Eighteen percent have been checked out only once, and only 25% have been checked out more than five times.
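For anyone curious, summary statistics like these are easy to reproduce. Here's a minimal Python sketch, assuming the librarian's export were reduced to a simple list of per-book check-out counts (the toy data below is made up for illustration, not the real 998-book dataset):

```python
# Toy per-book check-out counts; in practice these would come from the
# librarian's export (e.g. one row per book in a CSV file).
checkouts = [0, 0, 1, 3, 7, 12, 0, 1, 5, 2]

total = len(checkouts)
never = sum(1 for c in checkouts if c == 0)
once = sum(1 for c in checkouts if c == 1)
more_than_five = sum(1 for c in checkouts if c > 5)

print(f"Never checked out:    {never / total:.0%}")
print(f"Checked out once:     {once / total:.0%}")
print(f"More than five times: {more_than_five / total:.0%}")
```

The same three one-liners applied to the real data give the 22%, 18%, and 25% figures above.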

Below is a histogram on a log scale showing how many times our books have been checked out. The largest bar on the left is the 75% chunk of books that have been checked out 0-5 times. Only two books have been checked out 31-35 times, and only one book has been checked out more than 40 times.
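The binning behind that histogram (0-5, then 5-wide buckets of 6-10, 11-15, and so on) can be sketched in a few lines of Python; the check-out counts below are invented for illustration:

```python
from collections import Counter

# Invented check-out counts for illustration.
checkouts = [0, 2, 41, 7, 14, 33, 1, 5, 6, 22]

def bucket(c):
    """Map a check-out count to a 5-wide bin label: 0-5, 6-10, 11-15, ..."""
    if c <= 5:
        return "0-5"
    lo = ((c - 1) // 5) * 5 + 1
    return f"{lo}-{lo + 4}"

histogram = Counter(bucket(c) for c in checkouts)
for label in sorted(histogram, key=lambda b: int(b.split("-")[0])):
    print(f"{label:>6}: {histogram[label]}")
```

With real data, the resulting counts would just be plotted on a log-scale y-axis so the small bins on the right remain visible next to the dominant 0-5 bar.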



In case you were wondering, here are the top 10 most frequently checked-out computing books, along with the book's publication date and number of times checked out. Many of these books are not surprises:
  1. Introduction to Algorithms by Cormen, Leiserson, & Rivest (1990) - 41
  2. C++ Primer Plus: Teach Yourself Object-Oriented Programming by Prata (1995) - 35
  3. Applied Cryptography: Protocols, Algorithms, and Source Code in C by Schneier (1994) - 35
  4. Design Patterns: Elements of Reusable Object-Oriented Software by Gamma et al. (1995) - 30
  5. C++ How to Program: Introducing Object-Oriented Design with the UML by Deitel & Deitel (2001) - 26
  6. Computer Virus Crisis by Fites, Johnston, & Kratz (1992) - 26
  7. PASCAL: Programming and Problem Solving by Leestma & Nyhoff (1990) - 25
  8. The Mythical Man-Month: Essays on Software Engineering by Brooks (1995) - 25
  9. C#, A Programmer's Introduction by Deitel et al. (2003) - 25
  10. HTML and CGI Unleashed by December & Ginsburg (1995) - 25

So what about the books that no one checks out? Browsing through the list, I see titles I would have assumed were very popular, like Pattern Hatching: Design Patterns Applied by Vlissides (1998), Object Oriented Perl by Conway (2000), User Interface Design for Programmers by Spolsky (2001), SQL in a Nutshell by Kline et al. (2004), and iPhone SDK 3 Programming by Ali (2009).

To get a better overall picture, I looked at the percentage of books by publication year that have been checked out (at least once since 2001) as shown below.


There is a steady decline in check-out rates from 1995 on, which suggests that the longer a book has been around, the more likely it is to have been checked out. That certainly makes sense; however, the longer most computing books are around, the less useful they become.

For example, Designing with Web Standards by Zeldman (2007) has been checked out five times. It is arguably a relevant book, at least until HTML5 is released as a new web standard; then its value plummets. Browsing through the titles of our books, many of them fall into this category. Even among our most checked-out books, several are somewhat outdated (3?, 6, 7, 9, 10). This is the greatest problem I face when purchasing CS books for the library... I try to purchase books that I think will be immediately useful to our students and at the same time have a shelf life greater than one year. It's not an easy balance to maintain.

Returning to my original question, are library books still used by our CS students? The data seems to suggest that a fair number of books are eventually checked out at least once. However, if we estimate that a book costs around $50, and 218 books have never been checked out, that means roughly $10,900 worth of books are sitting unused on the library shelves. Ouch.

Of course, a more thorough analysis would involve surveying our students about their library usage. Why are they checking out a particular book? Are they actually reading what they check out? Is the information they are seeking in the book they've checked out? Are they finding what they need in the library? Are they finding equivalent information on the Web and therefore don't need the book? This would certainly make for an interesting study.

So, do you still find computing books useful? Should we be purchasing fewer books? What would be a better use for the money?