Thursday, December 31, 2009

ACU, iPhones, and Wired

My sister, an ACU alumna, emailed me an article from Wired.com a few weeks ago about her alma mater: How the iPhone Could Reboot Education. This is the second time Wired has written about ACU's iPhone/iPod initiative; ACU was the first university to give all incoming freshmen an iPhone or iPod Touch in an experiment to see how useful the devices could be in an academic setting.

ACU is one of the schools that Harding competes with for students, so you could imagine that some of us here at Harding are a little skeptical of the PR ACU's program has generated. Some think it's just a gimmick to attract new students.

One of Harding's admission officers told me recently that prospective students regularly ask, "Since ACU is offering me a free iPhone, what is Harding going to give me?" Of course ACU is not giving anything away to students for "free"; the article states that the dorm computer labs were shut down, and other expenses are likely paid for by increases in tuition or technology fees. But many 18-year-olds are unlikely to see the correlation between tuition costs and freebies.

I do, however, think having all your students fitted with the same mobile device presents some interesting opportunities. Some ACU professors quizzed students to see if they understood the lesson, and students could respond anonymously using their iPhones. (Some Harding professors are doing the same thing with specialized handsets.) Another ACU professor has students look up information in class, and they discuss how accurate or trustworthy the information is. I sometimes do something similar with my students, who often have a computer sitting in front of them.

So far, ACU reports that their Mobile Learning initiative is paying off. A quote from the ACU website:
The majority of students in specific courses where mobile devices have been routinely used rate themselves as having improved their academic performance (grades and organization) and engagement (active learning, contact with professors and teaching assistants, involvement and attention).
This quote is interesting in that the students themselves have said they do better and are more engaged in classes that use iPhones. Whether they are actually learning more is anyone's guess, but any time you can make a student think they are doing better in class, it is usually a good thing. (Of course, one wonders if students report that they like using their iPhones in class because a negative assessment might mean their iPhones are yanked away in the future. wink)

As an instructor who teaches in a computer lab where students are allowed to use their computers for taking notes and class activities, the biggest obstacle I've faced is getting students to pay attention to what's happening in the classroom and stay off Facebook, Google, games, etc. This is surely the temptation also faced by students when using iPhones in the classroom; the potential for distraction is very real. (See student comment #4.)

You're not likely to see Harding give iPhones to every incoming freshman anytime soon, but at some point in the near future I think it's likely that all freshmen will have some smart mobile device with them. Universities will need to be creative in using this personal accessory to their advantage.

P.S. Congratulations to my friend and former colleague Autumn Sutherland who was named a 2009-2010 Mobile Learning Fellow at ACU.

Thursday, December 24, 2009

Merry Christmas from the McCowns

The McCowns wish you a very Merry Christmas. May God bless your holidays.


Photo by Stacy Schoen

Friday, December 18, 2009

Recovering blog.stackoverflow.com and www.codinghorror.com

Several days ago, it was brought to my attention that two notable blogs, blog.stackoverflow.com and www.codinghorror.com, were completely lost due to a hard drive failure. Jeff Atwood, the owner of both blogs, tells all about his experience trying to recover the text and images from Internet caches and other locations. (I imagine Jeff's reaction was a little like the picture above when he discovered his blogs were gone.) Lucky for him, a computer science student at the University of Bologna had an almost complete mirror of the Coding Horror website.

Jeff also acknowledges that he's to blame for the loss, and I'm sure Jeff will be redoubling efforts to back up his sites in the future. However, these types of losses will continue for the foreseeable future... backing up your stuff is often the lowest priority for all of us because many of us believe it won't happen to us. Or we consciously know it will happen to us, but we have so many other things we need to get done today that setting up a backup routine is put off until tomorrow. I'm in the same camp... I didn't create a backup of my class work until over a month into the semester!

Even when you back up your stuff, you sometimes find that your backups weren't working or are inaccessible when you need them. That's what happened to Jeff.

Jeff tried to use Warrick initially, but it was giving him all kinds of problems. Yahoo and Microsoft have done some things that make Warrick break, and I'll be spending next week making fixes. I'll blog more about the fixes next week. For now, it's back to grading final exams.

Monday, December 07, 2009

CS Education Week

At Harding, this week is Dead Week, the week before final exams. But according to the U.S. House of Representatives, this week is also Computer Science Education Week (CSEdWeek). In an effort to raise awareness of how important CS is to society, the ACM has created a website with all kinds of information about CS.

Here are some facts about computer science that you may not have been aware of:
  • By 2016, current government projections show that more than 800,000 high-end computing jobs will be created in the economy making it one of the fastest growing occupational fields.

  • Five of the top ten fastest growing jobs will be in computing-related fields (i.e., computer software engineer jobs expected to grow 45% over the next five to seven years).

  • Computer science and computer engineering bachelor degrees are in high demand and command two of the top three average salary offers from employers among all majors.

Paper on teaching Web IR

In March I'll be presenting a paper entitled Teaching Web Information Retrieval to Undergraduates at SIGCSE 2010 in Milwaukee, WI. In this paper, I discuss how I built a curriculum for my search engine course and the types of projects I assigned. The first time I taught the course, students built a search engine from scratch. The second time, students modified the open source search engine Nutch. Teaching with Nutch turned out to be quite a challenge. If you want to know more, read my paper.

Tuesday, November 10, 2009

Super Scooter

One of the projects I had my game programming class complete was a scaled-down version of the classic Super Mario Bros. game for Nintendo. I had all 15 students work on the same project (not the best idea I've ever had) over the period of 4 weeks. While I encouraged them to use graphics created by others, some of the artists in the group decided to make their own. The final product was something everyone was quite proud of.

You can download and run Super Scooter if you are running Windows XP or better.

Wednesday, November 04, 2009

PyArkansas 2009

Dr. Steve Baber and I will be taking a group of students to PyArkansas 2009 next Saturday, Nov 14. PyAR is being held at the University of Central Arkansas in Conway. The one day conference has a number of classes on Python, Django, and Blender.

The CS department has rented two vans to ship everyone down there and back. Let me know if you are interested in attending... we only have a few more seats available. Or if you can provide your own transportation and wouldn't mind taking one or two others, please let me know.

Wednesday, October 28, 2009

Facebook: Memorialize the deceased

In a blog post on Monday, Facebook brought attention to a previously existing feature intended to "memorialize" Facebook users who have died. You can submit a "Deceased" form (pictured below) that notifies Facebook about "dead" accounts. Once Facebook determines that the account owner is indeed deceased and flips the switch, no one can log into the account anymore, and the person's face no longer appears in friend recommendations or Suggestions. However, you can still post messages on the departed's Wall.



The problem with memorializing the account is that the user's family or friends, if they had the user's password, can no longer access the user's Messages or other personal data. (This could be a good or bad thing.) For anyone in this predicament, I highly recommend you archive the deceased's account using ArchiveFacebook before their account is memorialized. Then you will always have a snapshot of the person's Facebook account on your own hard drive.

I'm giving a talk about the ArchiveFacebook Firefox add-on tomorrow afternoon in a Harding University Computing Seminar. If you're in Arkansas Thurs, feel free to stop by at 4:00 pm in Science 113.

P.S. This issue of "what happens to my data now that I'm gone?" is going to become more relevant as more of our data is stored in the cloud.

Saturday, October 24, 2009

Article in CACM

Check out my article Why web sites are lost (and how they're sometimes found) in the November edition of the Communications of the ACM. My co-authors were Cathy Marshall (Microsoft Research) and Michael Nelson (Old Dominion University).

If you don't have an ACM Digital Library subscription, you can access the pre-print here.

Abstract:
We have surveyed individuals who have lost their websites (through hard drive crashes, ISP bankruptcies, etc.) or have tried to recover websites that once belonged to others. We investigate why these websites were lost and how individuals reconstructed them, including how they recovered data from search engine caches and web archives. The findings suggest that digital data loss is likely to continue since backups are frequently neglected or performed incorrectly; furthermore, respondents perceive that loss is uncommon and that data safety is the responsibility of others. Finally we suggest that this benign neglect be countered by lazy preservation techniques.

Wednesday, October 21, 2009

My Archos 5 Internet Tablet

I just received my new Archos 5 Internet Tablet in the mail. It uses a touch interface and is running Google Android. Yes, my iPod Touch is jealous. But so far I'm having some issues.

I've plugged it in and tried to connect to our secured wireless network. Hmm... couldn't find it. Oh well, the guest network connected just fine.

So now I'm trying to ensure the firmware is updated. I follow the directions, click on "Firmware update", and get the following error message:
USB cable attached. Media Center features are not available during USB connection.

Two questions: 1) Why can't I connect while my USB cable is attached? It's ridiculous to force me to unplug it just to update my firmware. 2) What does updating the firmware have to do with the Media Center?

More to come.

Today (10/22/09) I was able to get the firmware updated. I had to first unplug the USB cable, and then later I was told to plug it back in... weird. Then the update went fine.

But after completing the update, I can't find the firmware update option anymore. It used to be under Menu > Settings > About device. Now the "About device" option has disappeared from Settings!

I tried out the web browser... you have to really press down hard to get the scroll up and down to work. And the two-finger zoom feature in the iPod Touch is apparently absent; you have to click on + or - buttons instead. So far I'm not real impressed.

Monday, September 28, 2009

Mobile Computing offered in Spring 2010

This spring I will be co-teaching with Gabriel Foust a new course called Mobile Computing (COMP 475) for 3 credit hours. The course will cover programming the iPhone and Google Android operating systems and development of mobile web applications. The course will meet from 3 to 4:15 pm on Mon and Wed. The prerequisite for this course is Data Structures (COMP 245).

Foust and I are excited to be offering this course for the first time. I hope it will become a course we offer on a regular basis in the future.

Thursday, September 24, 2009

Google: We're sorry...

I tried to access my school email account this morning, and I got this error screen:



It says:
"We're sorry... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now."
Google is sorry again that their automated query detector has been tripped. At least they aren't accusing me of having a virus this time.

Anyone else seeing this? Apparently yes.

Monday, September 21, 2009

Archive your Facebook account with ArchiveFacebook

It's finally here... a tool to archive your Facebook account. I've talked about the development of this tool in previous posts. It's a Firefox add-on called ArchiveFacebook which allows you to create a complete off-line, browseable archive of your Facebook account. ArchiveFacebook will archive your Wall, photos, messages... your entire life which has been recorded in Facebook.

You may not believe this, but Facebook will not always be around. Your Facebook account will not always be accessible. It's up to you to archive your data before it lands in the big bit-bucket in the sky.

Thanks to Carlton Northern who worked on this project for the past 6 months and to Michael Nelson who helped direct the development work.

Thursday, September 17, 2009

Facebook - content is currently unavailable

Someone tagged me in a photo on Facebook yesterday, but when I click on the link I received in my email, I get the very "helpful" error message:
This content is currently unavailable

The page you requested cannot be displayed right now. It may be temporarily unavailable, the link you clicked on may have expired, or you may not have permission to view this page.




If the page is temporarily unavailable, I should try again and again and again to access it. But if the link has expired, I am wasting my time trying to access it again and again. And if I don't have permission, how do I get it? There's no helpful tip given as to how to get permission to view the image.

Surely Facebook could tell me which of these is the true problem and suggest what I do next.

I would qualify this as a variation of GUI blooper #28.

Tuesday, September 08, 2009

Summer reading

Here's a list of the books I finished reading this summer. I'm looking for some titles to read next, so feel free to leave me a recommendation.

The Seven Faith Tribes by George Barna. This book is subtitled, "Who They Are, What They Believe, and Why They Matter," but I think a more accurate description of the book would be "Who They Are and How They Better Get Along Before All Hope Is Lost". Barna uses his massive amounts of survey data to identify seven faith tribes of America: Casual Christians (making up 2/3 of all Americans), Captive Christians (16%), Jews (2%), Mormons (1.5%), Pantheists (1.5%), Muslims (.5%), and Skeptics (11%). Barna outlines 20 shared values between the tribes (e.g., represent the truth well, develop inner peace and purity, seek peace with others, etc.) and calls all tribes to band together and push these values into the media, government, and families, to advance our common national interests. While I admire Barna's call to us all to unite and help our country, the lack of implementation specifics left me somewhat skeptical.

Outliers by Malcolm Gladwell. This book is an informative and entertaining weaving together of various studies and anecdotes that shed light on the (often overlooked) significant factors that lead to success. There's an excellent chapter (Ch 2) that talks about Bill Joy and other computing luminaries which is worth reading, even if you don't want to read the whole book.

Blink by Malcolm Gladwell. I enjoyed Outliers so much that I was inspired to read Blink. It focuses on the abilities and distractions caused by our unconscious minds. Gladwell focuses on "thin-slicing", the ability to determine what is important from just a very small amount of information, and how it can be influenced by prejudice and stereotypes. I enjoyed Blink, but not as much as I did Outliers. Tipping Point is next on my list.

The Bravehearted Gospel by Eric Ludy. Christianity has gone soft over the years, and Ludy calls for us to reclaim the Truth of the Bible. I really enjoyed the rallying cry, but I'm still digesting this one.

Surprised by Hope by N. T. Wright. OK, I've been reading this for more than a year and still have about 50 pages to go. It's tough reading, but Wright makes a good case that a Christian's hope should be based on the future resurrection, not "going to heaven." If you enjoy thinking deeply about eschatology, this book is for you.

Tuesday, September 01, 2009

Did You Know? 2009

This video by Jeff Brenman, Karl Fisch and Scott McLeod illustrates just how much today's world is changing, especially with regard to technology. Some facts from the video that should really hit home with my computer science students:
"It is estimated that 4 exabytes (4.0x10^19) of unique information will be generated this year. That is more than the previous 5,000 years. The amount of new technical information is doubling every 2 years. For students starting a 4 year technical degree this means that half of what they learn their first year of study will be outdated by their third year of study."
Interesting facts, but I can't say I totally agree with the conclusion (in bold). New information doesn't necessarily replace old information. Technologies do change, but the underlying ideas change at a much slower pace.

Friday, August 28, 2009

Vote for my SXSW panel proposal!

Kelly Elander (professor in the Communications dept at Harding) and I want to offer a panel at SXSW 2010 called How Educators Teach Web Skills: You're Doing What? There are over 2000 proposals, and only 300 will be chosen.

We need your vote!

Please vote for our panel by clicking the thumbs-up icon. Voting will close on Friday, September 4, at midnight.

Monday, August 24, 2009

How to be a successful student in CS

Today is the first day of the fall semester here at Harding. It's always exciting to see all the students back from the summer, and there's so much hope for the semester that you can almost feel it in the air.

I was recently sent a questionnaire from The Wall Street Journal asking me what it took for a student to be successful in computer science. I thought today would be an excellent day to share my responses.


Generally speaking, what actions can students take to prepare themselves to succeed in your class or similar classes?

Give plenty of time outside of class to do homework and review that day's information. Use your time wisely in class by taking good notes and asking questions when something doesn't make sense. Start on assignments as early as possible to give yourself plenty of time in case you run into difficulties later; this will allow you to seek help before it is too late and will enable you to get your assignments turned in on time.


Based on your knowledge of your college/university overall, what should incoming students do to generally be successful in school? (Success includes academic success, social success, career success, or however you wish to characterize it.)

Be prepared to spend lots of time wrestling with the difficult material. Do not overload yourself with a full-time job while you are a full-time student unless absolutely necessary. Get to know the people who sit next to you in class... they can be of great help when you miss class or need some extra help. Do your best to maintain a good relationship with the professor... visit him/her outside of class and show interest in the subject matter; professors enjoy students that show interest in the class and are more likely to write you a great letter of recommendation when you are seeking employment.


If you could tell parents one thing to help their children succeed in college, what would it be?

Let them fight their own battles, but be there for them if they get in over their heads. Your child is becoming a man/woman and needs to know how to be independent. Hopefully you've already started your child down that road, and college is another step along the road.


What qualities or activities differentiate your best students from others?


The best students sit up front and pay attention. They start on their assignments early and refuse to give anything less than their best. They take responsibility for their own learning and don't rely purely on the professor to spoon-feed them all the information they need to be successful in class.


If a student knew nothing about your discipline, how would you describe it to him/her?

It is the study of how to make computers do extraordinary things. It encompasses graphics, artificial intelligence, web development, video games, mobile computing, algorithmic thinking, and many other aspects that touch the lives of every living being. It is the future.

How would you "sell" your discipline to a student trying to decide what to major in? (For instance, what do students like best about this discipline? What might be most surprising?)

If the student seemed right for a computer science major (showed mathematical prowess and the ability to think logically), I would tell them that CS pervades every other science and field and is in desperate need of talented young people. It is hard to imagine a field more significant to the future of the world than CS; medicine, economics, education, physics, chemistry, biology, entertainment, and farming all are significantly impacted by advances in CS. The job market for software developers (many CS graduates take this route) has rarely been better, and software engineers have higher overall job satisfaction than most any other profession.

Saturday, August 01, 2009

Misunderstanding Markup comic strip

If you're confused about the difference between XHTML 1.0, 1.1, 2 and HTML 5, you should read this entertaining comic strip by Jeremy Keith. This will be required reading for my Internet Development classes.

Thursday, July 16, 2009

Report on InDP in D-Lib Magazine

My report on the Innovation in Digital Preservation workshop (InDP 2009) has just been published in D-Lib Magazine. Overall I think the workshop was a success, although we really missed having Andreas Rauber there. I'm not sure if I'll be the one to lead the 2nd InDP, but I hope there will be one in the future.

Thanks to Spencer Lee (Virginia Tech) who filmed the workshop and created a virtual presence for InDP in Second Life, where the memories of InDP will last forever (or five years, whichever comes first). Below are some screenshots from Second Life that Spencer sent me.




Wednesday, July 15, 2009

What are you doing this summer?

I've been asked a number of times what I'm doing this summer since I'm faculty and have no classes to teach. Last summer I was doing research in Los Alamos, but this summer has been very different. A lot of my time is spent at home, getting adjusted to life with a newborn and toddler and helping Becky get some extra sleep in the mornings.

Professionally, I've presented a few papers at a conference, co-chaired a workshop, and am working on a paper about my search engine courses.

But most of my working days are spent producing a series of instructional videos for Introduction to Programming with C++ (2nd ed) by Y. Daniel Liang. You can sample a video I made just this week on file I/O. I'm not sure if the videos will be available to book owners only or made freely available on the book's website. I'll hopefully wrap these up by the end of July and then start on videos for Liang's Introduction to Java (8th ed).

I'll be preparing soon for my Games Programming course. This course has only been offered once at Harding before, and it was taught by Dana Steil, who is currently away working on his PhD. I'm excited about teaching this course, but it's also a lot of work to teach a class for the first time, and it's a little disconcerting that I will likely not get to teach it again since Dana will probably want the course back when he returns.

So that's my summer. What are you doing?

Update:

I'm no longer doing Liang's Java book. I didn't finish the C++ videos until Aug... where does the time go?

Friday, July 10, 2009

Power.com: Give me your Facebook data!

TechCrunch is reporting that Power.com is suing Facebook over their lack of data portability. Power.com is a service which allows you to aggregate your various social networks into a single location, but Facebook's data, as indicated in their Terms of Service, is still off-limits to them. Disregarding the restrictions, Power.com tried using the Facebook API and screen-scraping to get their data until being sued earlier in the year by Facebook.

This is exactly what I've been working on (with a graduate student at ODU) for the last few months. But I'm doing this to preserve the data, not necessarily to aggregate it along with other social networks. However, there's no reason why a preserved Facebook account could not be uploaded into another service.

My guess is my approach won't be looked at kindly by Facebook, but they'll probably leave me alone since I'm only providing a service for individuals to archive their account, and I'm not aggregating the data to my own server.

Tuesday, July 07, 2009

Email Preservation Parser

Here's an excerpt from an email announcement I received from Riccardo Ferrante (Smithsonian Institution Archives) about a tool for preserving email. It was one of the tools developed by the Collaborative Electronic Records Project (CERP).
The Email Parser migrates an email account and its messages into a single XML file using the Email Account XML Schema developed in collaboration with the North Carolina State Archives and the EMCAP project.

The CERP Email Parser migrates an email account in MBOX format into XML, using the schema to preserve the full body of messages, together with their attachments, and keeps intact the account’s internal organization (e.g., an Inbox containing subfolders labeled Policies, Special Events, and Projects). The CERP team successfully preserved email accounts from a variety of applications including Microsoft Outlook, AppleMail, LotusNotes, and Netscape. All email messages retain their full header content, in contrast to some tools produced in earlier research efforts.
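To make the idea a little more concrete, here is a rough sketch in Python of what migrating an MBOX file into a single XML document can look like. This is not the CERP Email Parser and it does not follow the Email Account XML Schema; the function name mbox_to_xml and the element names Account, Message, Header, and Body are hypothetical ones I made up for illustration. It also skips attachments and folder structure, which the real tool preserves.

```python
import mailbox
import xml.etree.ElementTree as ET


def mbox_to_xml(mbox_path):
    """Build an XML tree holding one <Message> per email in an MBOX file.

    Element names here are hypothetical, not the CERP schema.
    """
    root = ET.Element("Account")
    # create=False: fail loudly instead of creating an empty mailbox file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            node = ET.SubElement(root, "Message")
            # Copy every header so the full header content is retained.
            for name, value in message.items():
                header = ET.SubElement(node, "Header", name=name)
                header.text = str(value)
            # Only simple single-part bodies are handled in this sketch.
            if not message.is_multipart():
                body = ET.SubElement(node, "Body")
                body.text = message.get_payload()
    finally:
        box.close()
    return ET.ElementTree(root)
```

Calling mbox_to_xml("inbox.mbox").write("inbox.xml", encoding="utf-8") would then save the result, assuming a file named inbox.mbox exists.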

Monday, June 22, 2009

Elrod on Twitter and Iran

I just found out that Harding Professor Mark Elrod was interviewed just a few days ago by Jessica Dean on KATV-7 about Iranians using Twitter (see the video below). Just a few months ago David Adams was interviewed by THV-11 about the history of the flu. Looks like the vast expertise of our history dept is starting to get tapped by the local press. wink

Thursday, June 18, 2009

I'm at JCDL 2009 in Austin

JCDL 2009 is about to wrap up. It's been a good conference with some interesting presentations, and I've enjoyed catching up with old friends. The conference is being held on the UT campus... short on grass but big on buildings. I think the UT football stadium is more impressive than many NFL stadiums I've visited. I guess that's what happens when you win a few national championships.

I especially enjoyed the two panels. The first panel, What should we preserve from a born-digital world?, basically came to the conclusion that everything should be saved. I concur... disk space is cheap, and it's hard to know what will truly be valuable years from now. I also enjoyed hearing about Megan Winget's work in preserving games.

The second panel, Google as Library Redux, discussed the unfortunate conclusion of Google's lawsuit with publishers and authors, agreeing to a settlement instead of pressing the court to settle the bigger questions in regards to copyright, orphaned works, etc. One of the more provocative statements came from Michael Lesk who said JCDL was irrelevant because there were no attendees from Google, Amazon, Microsoft, etc. We are being ignored. Ouch. But he may be right. I see plenty of guys from Google et al. at the WWW and SIGIR conferences.

I gave a couple of talks this year (see my slides below). There was a lot of interest particularly in my Facebook paper, What Happens When Facebook is Gone?, where I discuss the ramifications of having all our data locked-up in the walled garden of Facebook. Carlton Northern, a graduate student at ODU, is currently working on a Facebook archiving add-on for Firefox, and hopefully it will be available soon.




My second paper, A Framework for Describing Web Repositories, is work pulled from my dissertation. In it I discuss how we can view web repositories (everything from a search engine cache to a web archive) in a more abstract manner. I propose some new terminology and an API that web repositories could/should implement to be helpful to clients accessing the repository's contents.




Tomorrow I'll be co-hosting the InDP 2009 workshop. It's an all-day event, and I'll be flying home late tomorrow night. It'll be good to be back with the family.

Tuesday, June 09, 2009

I think I'm going to be sick...

No need to blatantly lie to your professor anymore... a new "service" helps students deceive their professors by giving them a corrupted file to turn in, possibly buying them a few more hours or days to work on their assignment. When the professor goes to access the assignment and notices the submitted file was corrupted, he'll just ask the student to re-submit her file. The student is happy to oblige, and this time she submits the completed assignment to the unsuspecting professor.

I'm not sure if I'm more sickened by the thought of someone developing such a service or the thought that they are likely to be quite successful.



Update on 6/22/09

I thought about this problem a little more, and there's really a simple solution for the technically-inclined.
  1. Have the student produce an MD5 hash of the file before it is emailed or submitted to the professor, and have the student email the hash to the professor.

  2. If the received file is corrupted, the professor should produce an MD5 hash of the file. If it matches the hash from the student, the professor received exactly what the student sent, so the student's original file was already corrupted. Let the student bring in his laptop and show you how the file opens successfully on his machine since it won't open on yours. Probably he won't be able to, so give him a zero.

  3. If the submitted file's hash does not match the submitted hash, the file got garbled in transmission or the student did not email the correct hash. The student should just resubmit the file... eventually the received file's hash should match the original hash. If the student is not able to produce a file that matches the original hash, he's either incompetent because he did not properly create the original hash, or he modified the original file (which he shouldn't do if it's finished), or he's trying to cheat. Either way, give him a zero. (Wow, I'm mean!)
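For the curious, the hashing side of the steps above can be sketched in a few lines of Python using the standard hashlib module. The function names md5_of_file and check_submission are just names I picked for illustration, and the "match"/"mismatch" labels are my own shorthand for steps 2 and 3. Any stronger hash from hashlib (such as sha256) could be swapped in the same way.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large submissions don't fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_submission(received_path, claimed_hash):
    """Compare a received file against the hash the student emailed.

    Returns "match" if the file is byte-for-byte what the student hashed
    (step 2), or "mismatch" if it differs (step 3).
    """
    actual = md5_of_file(received_path)
    # Tolerate stray whitespace or uppercase hex in the emailed hash.
    if actual == claimed_hash.strip().lower():
        return "match"
    return "mismatch"
```

The student would run md5_of_file on the finished assignment and email the result; the professor would run check_submission on whatever arrives.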

Tweet this: Manor one of 20 developers to follow

Elijah Manor, one of our Harding CS graduates, was just listed in 20 Developers to Follow on Twitter. Very cool.

Thursday, June 04, 2009

Google Squared & Wolfram Alpha

Structuring the world's unstructured data... this is the future of search. These last few weeks have seen some impressive attempts to do just this by Wolfram Alpha and Google Squared.

Wolfram Alpha, which launched on May 18, is pulling results from their highly curated, massive database which is likely built atop massive (possibly unstructured) data sets. Google Squared, launched on May 12, is pulling results straight from the unstructured Web. These two approaches are complementary, but they are also competitive.

I'll provide just a couple of examples.

Below is Wolfram Alpha's answer to the query passing touchdowns Dallas Cowboys, Denver Broncos. Wolfram Alpha is providing a graph of data they probably acquired from a trusted source (they give some source information, but nothing specific).



The same query against Google Squared won't produce a very useful result. But a query for NFL teams results in a table of results pulled from a variety of websites. The data making up the first row is from www.detroitlions.com, a travel website, and Wikipedia. Why they are not just taking information from a single trusted site like NFL.com is anyone's guess... it likely has to do with making their search algorithms more generic.


Give these search engines a try and let me know what you think.

Sunday, May 31, 2009

Thousands of websites about to bite the dust...

Yahoo announced a month ago that it was pulling the plug on GeoCities, one of the Web's first free web-hosting services. There doesn't appear to be any plan to migrate the thousands (millions?) of websites this will affect to other services. If you don't act by the end of the summer, your GeoCities website will disappear.

That is unless the Internet Archive has grabbed a copy, but they aren't likely to have many pages from each Geocities website archived. I've been conversing with someone who lost a backup of her Geocities website years ago, and IA only had a handful of pages archived. This is likely going to be a recurring story in the years ahead.

My first website was on Geocities. In fact, that's how I first learned how to use HTML in 1997. I'm so embarrassed by that first website that I'm keeping the address a secret. I fear the day the Internet Archive's Wayback Machine has full-text search, because someone's going to pull it up and post it on Facebook or something. That's one stream of bytes I'm not afraid of losing.

Tuesday, May 26, 2009

Flight simulator site AVSIM destroyed by hackers

This morning I got a call from an individual who alerted me to the AVSIM tragedy. Apparently this popular flight simulator website with 13 years of articles, forum posts, etc. was not being backed up properly, and a hacker took them out.




Tom Allensworth, the website's founder, stated:
"The method of the hack makes recovery difficult, if not impossible, to recover from. AVSIM is totally offline at this time and we expect to be so for some time to come. We are not able to predict when we will be back online, if we can come back at all."

It's possible Warrick could recover a significant amount of lost content, but I have not heard from anyone at AVSIM about it. Perhaps they are using it now as we speak.

Thursday, May 21, 2009

Braden William McCown has arrived!

Braden made his appearance at 2:38 this afternoon. He was 8 pounds, 4 ounces, and 22" long. He had a basketball in one hand and a tennis racket in the other, which made the birth very painful ;-), but we were very thankful he decided to come during the day instead of the middle of the night. Ethan was excited to meet Braden and even gave him a couple of kisses. Let's hope they remain good buddies!




Becky and I are very appreciative of all the calls, emails, and Facebook messages we've received. We are excited to introduce you all to the little guy.

God is good!

Wednesday, May 20, 2009

Java Sitemap Parser

I've just released the Java Sitemap Parser on SourceForge.net. The software is capable of reading Sitemaps in XML, Atom, RSS, and text format. As far as I can tell, this is the first open source Sitemap-parsing software available on the Web.

The Java Sitemap Parser was the final project for my Search Engine Development class. I talked about the project a few weeks ago and how prevalent Sitemaps are becoming. Originally we wanted to add Sitemap support to Nutch, but developing just the parser proved to be quite a task. By releasing it as an independent project, I'm hoping Nutch, Heritrix, and other open-source crawlers will integrate it into their systems.
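
To give a feel for what a Sitemap parser has to do, here is a minimal sketch in Python (not the Java Sitemap Parser's actual code or API). It handles only the XML and plain-text formats and assumes a well-formed file; the function names are made up for this example. The namespace URI is the one defined by the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML Sitemaps (sitemaps.org protocol, version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_xml_sitemap(xml_text):
    """Return a list of dicts, one per <url> entry in an XML Sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.iter(SITEMAP_NS + "url"):
        entry = {}
        # loc is required by the protocol; the other fields are optional.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            value = url.findtext(SITEMAP_NS + field)
            if value is not None:
                entry[field] = value.strip()
        if "loc" in entry:
            entries.append(entry)
    return entries


def parse_text_sitemap(text):
    """Return the URLs in a plain-text Sitemap (one URL per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A real parser also has to cope with Atom and RSS feeds, Sitemap index files, compressed files, and malformed input, which is where most of the work went.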

Tuesday, May 12, 2009

I love my teacher evaluations

Nearly every semester I get evaluated by my students (I just got my results back today). They answer questions like, "How effective has the instructor been in this course?" and "Rate the instructor's command of the subject matter." All responses are anonymous.

This is common practice at most universities, and it often creates terror in the hearts of many faculty. I've known colleagues who have never read their teacher evaluations for fear of what their students might say, and I've known others who can still recite word-for-word some of the cruelest comments made by students over 20 years ago.

I've received my share of poor evaluations, especially when I was a new teacher. It took me a few semesters to get the hang of teaching, and now my evaluations are generally good (not great, but typically a little higher than the average Harding professor).

What I've found over my 10+ years of teaching is that some students give really helpful comments that can help you improve your class next time around. "I wish we could have spent some time discussing how to apply some of the new principles we learned to our project." Some students are going to really like you and let you know it. "The professor had good teaching skills, was responsive and helpful to questions, and was very knowledgeable."

Other students... well... you have to take their comments with a grain of salt. You have to realize that some students are not going to like it if you require them to work hard (many students think they should receive a B just for attending every lecture). Some students are just poor at evaluating others' performance. Others have yet to realize that they are responsible for their own learning. Occasionally a student is going to be having a bad day, and your anonymous evaluation is going to be the perfect target.

What really helped me was learning how to properly interpret students' remarks and judge whether the criticism has merit or not. I think learning this skill is important to any new faculty member; otherwise you'll be crying yourself to sleep after reading your evaluations.

Here are a few comments I've received over the past couple of years, along with my interpretation of and response to each. :-)

  1. Student 1: The projects expected a lot from the students.
    Student 2: Smaller, less-brutal projects would not be a bad idea.

    Interpretation: I thought this class was supposed to be easy!

    Response: If computer science were easy, we wouldn't be getting paid like we are, and everyone would be doing it. The projects are tough because I'm preparing you for the far more difficult and complex projects you'll encounter when you enter the workforce. You'll thank me later.


  2. Have different projects that we can choose from instead of making everyone do the same project.

    Interpretation: I like my classes like my Burger King - my way!

    Response: I always entertain ideas for new projects, but it's unreasonable for any teacher to spend hours coming up with a menu of project choices to cater to every whim. In a software development job, you are unlikely to have a boss ask you which project you'd like to work on... you'll work on what needs to be completed.


  3. Instead of making us use the programming language you want us to use, let us use one we are already familiar with.

    Interpretation: Learning something new is highly overrated.

    Response: If you graduate from Harding being comfortable with only one or two languages, you should get your money back, because we haven't adequately prepared you. You'll need to learn new languages all the time as a working professional.


  4. Disable the Internet on the classroom computers so that we can only access web sites are necessary for class. Remove Solitaire, Minesweeper, Hearts, etc. from the computers.

    Interpretation: Save me from myself!

    Response: I appreciate this student's honesty. I asked our lab administrator today to remove all games. There are going to be some very disappointed students next fall. ;-)


  5. Student 1: The fast pace of the class made it difficult to fully learn concepts.
    Student 2: It felt like sometimes you paced the classes very slowly.

    Interpretation: The pace of the class is perfect!

    Response: If roughly the same number of students complain that the pace of the course is too fast and too slow, I know I'm covering it at just the right pace.


  6. You try to cover too much material for a semester. Your previous classes didn't have to learn as much as we've had to. :-(

    Interpretation: Curse you ever-evolving technology!

    Response: One of the enigmas of higher education is that the consumers (the students) are often happier to receive less for what they are paying for (education). Can you imagine the same student being upset if McDonald's gave him a large order of fries for the price of a medium? Harding should fire me if I quit trying to keep my classes current and just teach the exact same stuff every semester.


  7. Don't give us really hard assignments, and don't expect us to have them done by the next class period... we do have other classes and lives!

    Interpretation: I'm serious about "me" time.

    Response: You should schedule 2-3 hours of outside-class time for each hour you are in class. (This is a universal rule that applies to all your major courses, not just mine.) So if I give a homework assignment on Monday and expect it by Wednesday, you should have already allocated 2-3 hours (at least) to getting the assignment finished. If your assignments are taking much longer than that to complete on a regular basis, that's a sign that you need to start getting some extra help and adjust your schedule accordingly. Remember that half of the class thinks we're going too slowly (see #5 above).


  8. Weaknesses of the instructor: Calvinism

    Interpretation: ???

    Response: "Isms in my opinion are not good. A person should not believe in an ism - he should believe in himself. I quote John Lennon: 'I don't believe in Beatles - I just believe in me.' A good point there. Of course, he was the Walrus. I could be the Walrus - I'd still have to bum rides off of people." - Ferris Bueller

Update:

Inspired by Jordan's comments, I have added a little to my original post.

Friday, May 08, 2009

Spring semester is over

I wrapped up all my grading today. We have a senior reception tonight and the graduation ceremony tomorrow.

Below is the grade distribution for my Intro to Programming, Internet Development, and Search Engine courses. The average was 79.0, and the median 85.4. If I had time I'd compare this to my past semesters, but I don't think much has changed.


I guess I'm a little guilty of grade creep... the average student is supposed to get a C, right? I think my students would argue with that conclusion. A recent survey found that 30% of college students agree with the statement: "If I show up to every class, I deserve at least a B." Surely that percentage isn't nearly as high at Harding. ;-)

Wednesday, May 06, 2009

Team Digital Preservation

In an effort to bring digital preservation to the masses, DigitalPreservationEurope (DPE) is developing an entertaining series of short animations introducing and explaining digital preservation problems and solutions. Below is their first video. It's a throwback to animated cartoons of the 1960s, and it is fantastic. Watch as Team Digital Preservation thwarts Team Chaos' plans to disrupt digital information from a nuclear power plant.
"You fiend! It's essential to have long term stable and trusted information on how nuclear power plants are built and what's inside them!" - DigiMan




Future cartoons will be made available on DPE's YouTube channel.

Monday, May 04, 2009

Improving movie recommendations

If you haven't yet checked out the new CACM blogs, you need to soon. One of the posts that caught my attention was Greg Linden's What is a Good Recommendation Algorithm? Linden wonders if Netflix's one million dollar reward for a better recommendation engine is a little short-sighted. The goal for their recommendation system is to only show people how much they might like a movie. But Linden points out:
However, this might not be what we want. Even in a feature that shows people how much they might like any particular movie, people care a lot more about misses at the extremes. For example, it could be much worse to say that you will be lukewarm (a prediction of 3 1/2 stars) on a movie you love (an actual of 4 1/2 stars) than to say you will be slightly less lukewarm (a prediction of 2 1/2 stars) on a movie you are lukewarm about (an actual of 3 1/2 stars). Moreover, what we often want is not to make a prediction for any movie, but find the best movies. (emphasis mine)
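
Linden's two examples can be put in numbers with a small Python sketch. A plain root mean squared error (the kind of measure the Netflix prize scores on) treats the two misses as equally bad, while a measure that weights each miss by how far the true rating sits from the middle of the scale does not. The weighting below is a made-up illustration, not something Linden or Netflix proposes, and the function names are hypothetical.

```python
import math


def rmse(pairs):
    """Root mean squared error over (predicted, actual) star ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def extreme_weighted_error(pairs, midpoint=3.0):
    """Mean squared error with extra weight on movies rated near the extremes.

    Hypothetical weighting: 1 + distance of the actual rating from the
    scale midpoint, so a miss on a loved or hated movie counts for more.
    """
    total = 0.0
    for predicted, actual in pairs:
        weight = 1.0 + abs(actual - midpoint)
        total += weight * (predicted - actual) ** 2
    return total / len(pairs)


# Linden's two cases as (predicted, actual) pairs.
loved_but_predicted_lukewarm = [(3.5, 4.5)]
lukewarm_but_predicted_lower = [(2.5, 3.5)]

for case in (loved_but_predicted_lukewarm, lukewarm_but_predicted_lower):
    print(rmse(case), extreme_weighted_error(case))
```

Both cases are off by exactly one star, so RMSE cannot tell them apart; the weighted version penalizes the miss on the loved movie more heavily.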


Shifting gears a little, I want to talk about a couple of small fixes to an existing movie recommendation system that could make customers a lot happier.

I haven't used Netflix, but I've been using Blockbuster Online for over a year, and I've played with their recommendation feature a lot. I would assume their recommender is on par with Netflix (hint: someone needs to compare the two).

One feature Blockbuster offers allows you to select "Do not show me this movie again", a little icon on the side of each movie's ratings. I've clicked this icon a lot (is it just me, or is there a lot of garbage out there?), hoping Blockbuster would stop recommending these specific movies to me and others like them. However, the screen shot below is what I saw this morning when I logged into my account:


Note how I was recommended "Zack" and "Quarantine" despite having clicked on the no-show icon weeks ago. They also recommend "Changeling", a movie I've already rated (and therefore obviously seen). But since I didn't rent "Changeling" directly from Blockbuster, they still offer it as a movie I "might have missed."

These movies do not appear in my formal set of recommendations (the screen that results from clicking on the Recommendations link), so my guess is Blockbuster is using a different set of algorithms to populate their might-have-missed list than they use for their formal recommendation list. However, I suggest that the might-have-missed list should take advantage of previous ratings to improve overall customer satisfaction.

This should be common sense: Do not suggest a movie that a user has already marked "do not show me this movie again". Especially not on the first page the user sees when logging into your site.
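
In code, the fix amounts to one filtering step before anything is displayed. Here is a minimal Python sketch; I have no idea how Blockbuster's system is actually built, so the function name and the data layout (plain lists of titles) are assumptions for illustration.

```python
def filter_suggestions(candidates, hidden, rated):
    """Drop movies the user hid or already rated, keeping the original order.

    candidates: titles the recommender wants to show
    hidden:     titles marked "do not show me this movie again"
    rated:      titles the user has already rated (and so has seen)
    """
    excluded = set(hidden) | set(rated)
    return [title for title in candidates if title not in excluded]


# Titles from my own might-have-missed list, plus one made-up title.
might_have_missed = ["Zack", "Quarantine", "Changeling", "Some Other Movie"]
shown = filter_suggestions(
    might_have_missed,
    hidden=["Zack", "Quarantine"],
    rated=["Changeling"],
)
print(shown)
```

Only the made-up title survives the filter, which is exactly what I would have wanted to see on that first page.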

One more point. Below is a screen shot from the first page of recommendations made by Blockbuster. None of the movies below appeal to me, but I can see how they might have been recommended based on my viewing history and ratings.



But one movie really stands out as a bad recommendation: "Swing" (bottom-left). Note how it has only received two stars on average, equivalent to "I didn't like this movie".

Why would Blockbuster think I would like this movie when most people don't?

I know my taste in movies is probably not typical, but I don't think I've ever given a movie with an average rating of two stars a rating better than two stars. Even if Blockbuster thinks this movie matches my tastes, it would make much more sense to put movies with higher overall ratings on the first result page and bump lower rated movies back a few pages.
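
One simple way to do that bumping, sketched in Python: keep the recommender's own order, but move anything below some average-rating threshold behind everything else. The threshold of 3 stars, the function name, and the dictionary layout are all assumptions made up for this example, as are the two titles other than "Swing".

```python
def order_for_display(recommendations, min_average=3.0):
    """Put well-rated movies first and push poorly rated ones to the back.

    Each recommendation is a dict with "title" and "average_stars" keys.
    The recommender's original order is preserved within each group.
    """
    preferred = [r for r in recommendations if r["average_stars"] >= min_average]
    demoted = [r for r in recommendations if r["average_stars"] < min_average]
    return preferred + demoted


recommended = [
    {"title": "Swing", "average_stars": 2.0},
    {"title": "Hypothetical Movie A", "average_stars": 4.0},
    {"title": "Hypothetical Movie B", "average_stars": 3.5},
]
for movie in order_for_display(recommended):
    print(movie["title"], movie["average_stars"])
```

With this ordering "Swing" would still be recommended, just a few pages back instead of on the first screen.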

My experience in general has been that Blockbuster's recommendations don't really work. I've found one recommended movie in the past year that I thought looked interesting. Then again, I don't often try iffy movie recommendations because I'm not ready to gamble on two hours of a nice evening.

I'm looking forward to a time when the recommendation system really works well, but until then, I'll be consulting with my friends and family who have a much better idea of what I really like to see.

Saturday, May 02, 2009

Micah Pate has been found

If you haven't already heard, Micah Pate's body has been found. As of this morning, Micah's husband Thomas is being charged with the killing.

Micah Rine Pate was a Harding University graduate and Searcy native. Her parents are employees of Harding and Harding Academy. As you can imagine, the Searcy community has been rocked with this story. Our prayers go out to the Rine family and to Thomas' family.

The photo on the right is a screen shot of Micah's Facebook page. Many of her friends are posting sad farewells to her and telling her family how much they loved her. Her account will likely remain active as long as Facebook is around. I imagine her family is going to "capture" her Facebook account as well, as an artifact of remembrance. I'm presenting a paper on this subject in June at JCDL 2009.


Update:

Two vigils in Searcy were held for Micah and the Pates, one at Harding. KARK 4 News had a news story about it last night. One thing that comes across in the story and interviews is Micah's faith and the positive influence she has had on others.

Wednesday, April 29, 2009

Upload an image in PHP

I created this function for my Internet Development students which saves a single uploaded image to disk. Example:
// Assuming the web server has write permissions to /mydir
$error = SaveUploadedImage("/mydir/myimage.png");
if ($error != '')
    echo 'Upload failed: ' . htmlspecialchars($error);

The function can easily be modified to handle multiple filenames (change the parameter to accept an array of filenames and modify the final foreach block). Note that this is modified code from the webdeveloper.com forum. If you want to know more about uploading files in PHP, check out the PHP - File Upload tutorial.


// Return an empty string if the uploaded image is successfully saved as
// $image_filename; otherwise return an error message. $image_filename
// should be in a directory that the web server can write to.
function SaveUploadedImage($image_filename)
{
    // This function is greatly modified code from
    // http://www.webdeveloper.com/forum/showthread.php?t=101466

    // Possible PHP upload errors
    $errors = array(1 => 'php.ini max file size exceeded',
                    2 => 'html form max file size exceeded',
                    3 => 'file upload was only partial',
                    4 => 'no file was attached');

    // Store the keys of nonempty files in the active_keys array
    $active_keys = array();
    foreach ($_FILES as $key => $file)
    {
        if (!empty($file['name']))
            $active_keys[] = $key;
    }

    // Check at least one file was uploaded
    if (count($active_keys) == 0)
        return 'No files were uploaded';

    // Check for standard uploading errors. Report the original file name
    // because tmp_name is empty when the upload itself failed.
    foreach ($active_keys as $key)
    {
        $code = $_FILES[$key]['error'];
        if ($code > 0)
            return $_FILES[$key]['name'] . ': ' .
                (isset($errors[$code]) ? $errors[$code] : 'upload error ' . $code);
    }

    // See if the file we are working on really was an HTTP upload
    foreach ($active_keys as $key)
    {
        if (!is_uploaded_file($_FILES[$key]['tmp_name']))
            return $_FILES[$key]['tmp_name'] . ' not an HTTP upload';
    }

    // Make sure the image uploaded appears to be an actual image
    foreach ($active_keys as $key)
    {
        if (!getimagesize($_FILES[$key]['tmp_name']))
            return $_FILES[$key]['name'] . ' is not an image';
    }

    // Save every uploaded file to the same filename (normally we'd want to
    // save each file with its own unique name, but we are assuming there
    // is only one file).
    foreach ($active_keys as $key)
    {
        if (!move_uploaded_file($_FILES[$key]['tmp_name'], $image_filename))
            return 'receiving directory for ' . $image_filename . ' has insufficient permission';
    }

    // If you got this far, everything has worked and the file has been
    // successfully saved.
    return '';
}

Wednesday, April 22, 2009

Nutch, Sitemaps, and Google's findings

My search engine class is winding down, but our final project is to implement a Sitemap Protocol parser for Nutch, a popular open-source search engine. I mentioned a while back that Nutch is not for wimps... my students would certainly vouch for the steep learning curve involved in making code modifications. I've even had to scale back how much work my students do because of the complexity of changes required. I'm going to do the difficult part of integrating their code with the innards of Nutch sometime in the next few weeks.

The reason I mention our Sitemap project is that WWW 2009 is meeting in Madrid this week, and a paper entitled Sitemaps: Above and Beyond the Crawl of Duty is being presented today by Uri Schonfeld (UCLA) and Narayanan Shivakumar (Google). This is the first paper to report on widespread usage of Sitemaps in the Web using Google's crawling history.

Schonfeld & Shivakumar report that Sitemaps were used by approximately 35 million websites in late 2008, exposing several billion URLs. 58% of the URLs included last modification dates, 7% included change frequency, and 61% a priority. About 76.8% of Sitemaps used XML formatting, and only 3.4% used plain text. Interestingly, 17.5% of Sitemaps are formatted incorrectly.

The figure below represents how many URLs Google discovered via Sitemaps (red) vs. regular crawling (green) for cnn.com. Notice that on any given day, more URLs could normally be discovered via Sitemaps.



Another interesting figure (below) shows when a URL was discovered via Sitemaps vs. regular web crawling for cnn.com. In most cases URLs were discovered at the same rate, but there are a number of them (dots below the line) that were discovered via Sitemaps much earlier than web crawling.


CNN's website is not typical. Schonfeld & Shivakumar report that in a dataset of 5 billion+ URLs, 78% were discovered via Sitemaps first compared to 22% via web crawling.

The paper also describes an algorithm that search engines can use to prioritize URLs discovered via both web crawling and Sitemaps. I've covered the highlights, but I recommend you read the paper if you're interested in some of the finer details.

Friday, April 17, 2009

Looks can be deceiving

It's been busy around here... Spring Sing, Easter, Tax Day, etc.

This morning Steve Baber presented a devotional thought at our computing seminar that I thought I'd share with you all. He talked about how easily our eyes can be deceived. Are you seeing a man on the left or the word Liar?

This is especially true when it comes to how we perceive others. How often do you catch yourself judging someone by their looks, their clothes, the house they are living in and the car they are driving?

James warns against this practice in James 2:1-4:
My brothers, as believers in our glorious Lord Jesus Christ, don't show favoritism. Suppose a man comes into your meeting wearing a gold ring and fine clothes, and a poor man in shabby clothes also comes in. If you show special attention to the man wearing fine clothes and say, "Here's a good seat for you," but say to the poor man, "You stand there" or "Sit on the floor by my feet," have you not discriminated among yourselves and become judges with evil thoughts?
Here are three contestants from Britain's Got Talent whose appearances are misleading: Susan Boyle, Paul Potts, and Andrew Johnston.

Friday, April 03, 2009

Day 2 at DigCCurr 2009

This was a full day of presentations. One of my favorite panels was on personal digital archiving with Jeremy John, Cathy Marshall, David Pearson, and Andreas Rauber. My presentation seemed to go well... the room was packed with people sitting on the floor. Overall I was very pleased with the conference and met a good number of interesting people.

After the conference ended, I took advantage of the 70-degree weather and took a walk around the UNC campus. I then headed up to Franklin St. where a mass of well-dressed college students was gathering. (Franklin St. is where all the cool places to hang out are located. It's also the place where students jump over bonfires after big UNC wins.) The 2-mile walk back to the hotel was fantastic... the homes on Franklin St. are some of the most beautiful and unique homes I've seen.

Now I'm sitting in my hotel room (11 pm) missing my family while a number of college students stand outside my window talking as if no one else but them were at the hotel. It'll be a lot worse tomorrow night if UNC wins, but I'll be back in Searcy by then.

Update:

Looks like UNC is going to the championship, and the partying on Franklin Street continues.

Thursday, April 02, 2009

I'm at DigCCurr 2009

I flew into Raleigh/Durham late last night, and today I am attending the DigCCurr 2009 conference (Digital Curation Practice, Promise and Prospects) in Chapel Hill, NC. Tomorrow I'll be presenting a paper based on my summer work at LANL: Everyone is a Curator: Human-Assisted Preservation for ORE Aggregations. This was work I did with Herbert Van de Sompel and Michael Nelson (my former adviser at ODU).

There are 270 people registered for DigCCurr, but I only know a handful of them. So it was good to see Michael Nelson this morning getting coffee in the lobby... I had no idea he'd be here. Of course now I have to spruce up my presentation and remove my disparaging remarks about Herbert and Michael. ;-)

Tuesday, March 31, 2009

Happy birthday, Ethan!

My son is 2 today! He's becoming a big boy, and I'm very proud of him.

Monday, March 30, 2009

Enrollment in computer science is finally increasing

Good news: A survey by the Computing Research Association shows that enrollment in computer science courses in the 2007-2008 academic year was up 6.2%, the first increase since the dot-com bust six years ago. The number of new undergraduates majoring in CS is up 9.5%.

Bad news: Women still only receive 11.8% of CS degrees.

So why are enrollments increasing? Fear of the bad economy? The coolness of the iPhone?

We haven't yet seen an increase here at Harding, but I'm betting we will soon.

Wednesday, March 25, 2009

The Web in a box

The Internet Archive and Sun Microsystems have just announced the launch of a new data center that stores IA's entire web archive and serves the Wayback Machine. According to Brewster Kahle:
This 3 Petabyte (3 million gigabyte) datacenter will handle the 500 requests per second as it takes over the full Wayback load.

Tuesday, March 17, 2009

Workshop on Innovation in Digital Preservation (InDP 2009)

In conjunction with JCDL 2009, Andreas Rauber and I will be hosting the first workshop on Innovation in Digital Preservation (InDP 2009). We are soliciting full and short research papers as well as position papers. Read more about the workshop below and on the workshop website.

Digital Preservation (DP) research is often driven by traditional needs and approaches to solve the challenges arising. This is partially due to the rather traditional settings in which the challenges of obsolescence of digital objects have first been identified and dealt with, as well as partially due to the high levels of quality and auditability that these mostly very professional settings require.

But increasingly we are facing non-traditional DP challenges, ranging from non-traditional data collection (such as the Web, especially Web 2.0) to non-traditional institutions and actors, such as SMEs or private/home users. Additionally, non-traditional approaches to maintain digital objects, such as retargetable binary code or self-aware objects are gaining momentum.

This full-day workshop aims to provide a forum where researchers can share and discuss the latest innovations in DP by non-traditional methods. Topics include but are not limited to:

  • Personal archiving and personal information management
  • Archiving Web 1.0, 2.0, and Deep Web
  • Innovative approaches to preservation actions
  • Self-aware objects
  • Archiving solutions for small institutions
  • Binary retargetable code
  • Disaster recovery
  • Theoretical models of information preservation
  • Value of information and forgetting
  • Preserving electronic art

InDP 2009 will be held in conjunction with the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) in Austin, Texas on June 19, 2009.

Saturday, March 14, 2009

Map of Science

This past summer I worked with the Digital Library Research and Prototyping Team at LANL. One of the projects they were working on, a "map of science", was just featured on the Wired Science Blog. The graph below is based on a massive collection of scholarly usage data (click on the image to get a more detailed look at it). Basically, the graph shows how users accessing scholarly work in one field (e.g., Cognitive Science) may also access work in another related field (e.g., Sports Medicine).



You can find more information about this work in:

Clickstream Data Yields High-Resolution Maps of Science by Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Public Library of Science ONE, March 11, 2009.

Wednesday, March 11, 2009

Harding and the Economy

An article just published in the Christian Chronicle reports how the economy is affecting Harding University and some of our other sister institutions. Cascade College is closing its doors at the end of the spring semester (see the screen shot from their website below). Pepperdine University is eliminating 50 full- and part-time staff members as well as men's track and the women's swimming and diving program.


Things at Harding aren't quite as dim. The endowment is down, but no staff or faculty jobs are being cut. What the CC didn't note is that next year's enrollment numbers look really good. There is a budget freeze, and it's been rumored we will not receive any pay raises next year, but there's nothing to lose sleep over.

In general, post-secondary education is usually a winner in tough economic times. Individuals out of work will re-tool to make themselves more competitive. Some states are even putting more money into higher education, realizing the positive, long-term economic impact it can have.

At $423 a credit hour, Harding is not cheap, but it is less expensive than many private universities and many state universities. That's going to help us weather the storm.

Everyone is going to need to tighten their belts a little, but Lord willing, Harding is going to emerge from this economic downturn intact. I pray the same is true for our sister institutions.