Saturday, December 30, 2006

Favorites of 2006

Here are some of my favorite things from 2006.

On-line laughs:
  1. The most-viewed YouTube video of 2006: The Evolution of Dance by Judson Laipply
  2. Possibly the worst recording of O Holy Night, ever
  3. David Brent's Microsoft training video
  4. The Iraq Report with subtitles
  5. Two sons try to take a Mother’s Day photo (these guys must know my brother and me)

TV commercials:
  1. Liberty Mutual "pay-it-forward" commercial
  2. LA County Fair - "Duh, Ashley, all wool comes from a cow..."
  3. “That should kill him…” Ameriquest doctor commercial
  4. Peyton Manning supporting his team

Movies I saw:
  1. The Prestige - It had me on the edge of my seat the entire time
  2. X-Men: The Last Stand - I hope this won’t be the last of the series
  3. Facing the Giants - Created by a church with no professional actors, it's a very moving and inspirational film
  4. Invincible - Almost makes me want to be a Philly fan
  5. Casino Royale - A little on the violent side, but probably the best Bond yet
  6. Flags of Our Fathers - War is tragic

Books I read:
  1. The Language of God by Francis Collins - Collins does a great job of dispelling the myth that evolution = atheism
  2. Freakonomics by Steven D. Levitt and Stephen J. Dubner - Some interesting things to think about
  3. Sacred Marriage by Gary Thomas - Rather than make us "happy", marriage is designed by God to make us more holy
  4. Finding God at Harvard by Kelly Monroe - A series of encouraging stories from various Harvard alums about their path to faith

Websites I love to visit:
  1. Wikipedia.org – Sometimes you have to sort through a lot of “truthiness”, but it’s an extremely useful tool for answering the question “What is X?”
  2. YouTube.com – It feeds my addiction for funny commercials
  3. Google Scholar – OK, I don’t love to visit this site, but it's probably the #1 reason why getting my Ph.D. will take less than 4 years. It’s much larger and faster than Citeseer and far easier to use than Windows Live Academic

Friday, December 29, 2006

Goodbye to 2006

We just returned home from visiting my parents in St. Louis. Sara was able to fly up too, but John had to stay in Dallas and work the 26th. We had a really good visit... plenty of R&R and quality time with the fam. It was fun having us all together and watching the Broncos win and the Cowboys… well… we’ll try to forget about that game. I worked just a little- fixed and started several scripts which needed to run over the break, but a power-outage on the 26th undid all my work.

Although the Bean is not yet born, he got a majority of the gifts, including a really cute Broncos outfit with matching booties and bib. Sara also got him an “I love Daddy” jumper and a hand-made outfit from Bolivia. Can’t wait for him to be here!

In our family fantasy football league, I whooped up on my dad and sister to become the first family league champion. In the Bayside League, I finished 6th out of 14. Not as good as my second place finish last year, but it was fun nonetheless. This is probably the last time I’ll play in the Bayside League since we won’t be living here next fall. Hope someone will pick it up after I leave.

On a sadder note, I found out that Marilyn Fowler passed away on Christmas Eve. I used to go to church with Al and Marilyn in Searcy. She had been battling leukemia and had completed her third round of chemotherapy and was so close to beating it. I’m glad she’s out of pain now, but it’s got to be hard on Al and the family.

Only 3 days left of 2006. This has been a really good year. A few highlights:
  • Apr 7: Passed my Ph.D. candidacy exam
  • May: Finished taking the last course of grad school
  • July 19: Becky and I celebrated our third anniversary
  • July 28: Found out Becky was pregnant!
  • Becky won 2 awards at work (Aug and May)
  • Oct 8-14: Our first cruise- to the Bahamas
  • Nov 17: Found out we’re having a boy!
  • Nov-Dec: Taught a class on Christian Apologetics (toughest class I’ve ever taught)
  • Today: Posted my 100th blog post


A few goals for 2007:
  • Name our newborn son
  • Change my first diaper
  • Move back to Arkansas
  • Begin teaching at Harding University
  • Graduate with my Ph.D.

Some books I want to read in 2007:

May God richly bless us all in 2007.

Saturday, December 23, 2006

End of the semester at ODU

I’m in my office at ODU overlooking a very barren campus. Everything is closed down, and all the faculty, staff, and students have started their vacations (mine starts tomorrow when the woman, the bean, and I fly out to see my parents in St. Louis).

A few days ago I took some photos of the many changes going on around campus:




Here's my building (ECS). Nothing has changed here, but I thought I'd get a picture of it anyway. My office window is 3 floors up, left of the lobby windows.
This is what the new dorms look like from outside my office window.
These are the new dorms that are going up across from the ECS building. They are on top of what was the parking lot to the gym. Notice the stylish port-a-potties.
This is a close-up of the finished dorm. The other will be finished in a few months.
The old gym has been gutted... it looks like a hurricane swept out the old basketball courts.
These are two new structures next to the gym. I think they're going to be the new indoor tennis courts, but I could be wrong.



ODU is also building a new bookstore and a research center near the village.

Some interesting, news-worthy items about the 76-year-old university:
  • According to the New York Times, ODU is the most racially diverse four-year institution in the country.
  • ODU is getting a football team in 2009- a little late for me, but just in time for the class of '09.
  • Sometimes when the university upgrades their software, you get thousands of dollars loaned to you with 0% interest.

Tuesday, December 19, 2006

Tom cat

Tom was put down yesterday after failing to respond positively to medication for a blood clot. He was a great old cat. We reluctantly adopted Tom when he was only a kitten; he followed me home one night after I’d been out TP’ing the neighborhood. He moved with my parents and sister from Denver to Kansas City, then to Bella Vista, and finally to St. Louis. We will miss you, buddy.

Monday, December 18, 2006

Saturday, December 16, 2006

Rhonda Frazier - See you on the other side

I just learned today that a good friend of mine from my post-college years in Denver has passed away. The following is from the Harding University alumni newsletter:
Rhonda Frazier (’94)

A celebration of life will be held at 1 p.m. Saturday, Dec. 16, at the Church of Christ in Prineville, Oregon, for Rhonda Gaylene Frazier of Madras, formerly of Lane County, who died Dec. 8 of Alzheimer's disease. She was 34.

She was born Aug. 23, 1972, in Heidelberg, Germany, to Doug and Dawn Kimball Frazier. She worked in fashion merchandising. She was a member of Chi Omega Phi and the women’s track team at Harding University.

Survivors include her parents; a son, Clay; a grandmother, Azalea Kimball Hatfield; a sister, Janelle Strong of Prineville; and a brother, Justin of Eugene.

Arrangements by Autumn Funerals in Redmond. Remembrances to Clay Frazier Trust Fund at Mid-Oregon Credit Union.

Rhonda and I were buddies when I graduated from Harding in '96 and moved back to Denver. She had also made the move to Denver after graduating from Harding, and she lived in the same apartment complex as Mark Story, another good friend of mine. I was probably at either her place or Mark's at least once a week.

I lost touch with Rhonda after I moved back to Searcy and she moved to Oregon. One Christmas she sent me a Christmas card with a photo of her and her son, but I haven’t talked to her in years. She was a Christian and a very thoughtful friend, and I pray that the Lord takes her home and provides comfort to her family.

Here’s a photo of Rhonda and me playing in a mud volleyball tournament in 1996.



This photo was taken at a Christmas party in 1996. Rhonda is on the far left.

Monday, December 11, 2006

Link rot in CACM

I was really surprised this afternoon to see an article in the Communications of the ACM that cited a cached URL from the MSN search engine in place of a missing web page:
The link to the U.S. Secret Service “Operation 4-1-9” report at www.secretservice.gov/alert419.htm appeared to be broken when this column was written, but cached copies remain available (for example, cc.msnscache.com/cache.aspx?q=3910458378891&lang=en-US).
Communications of the ACM, Volume 49, Number 12 (2006), Page 18.
The editors of CACM may not be aware of this, but search engines do not keep cached copies of pages long. In fact, they will often purge their caches of any web page that returns a 404 when crawling. (You can read my paper on an experiment which illustrates this.) Citing a cached page from a search engine should never be done in academic writing. Instead of citing just one broken URL, CACM has now cited two.

If you are interested in learning more about link rot and how to combat it, check out this Wikipedia article that I contribute to.

Speaking of link rot, Baden Hughes of the University of Melbourne has recently published a study entitled Link? Rot. URI Citation Durability in 10 Years of AusWeb Proceedings. (Not sure why he used a question mark in his title.) He used many of the methodologies that I used when examining link rot in D-Lib Magazine last year. Turns out AusWeb URLs have a much shorter half-life (6 years) than D-Lib article URLs (10 years). This is probably because authors of D-Lib articles are more aware of link rot than authors in other professions.

I’m curious if any other on-line magazine or journal can beat D-Lib’s 10 year half-life. I have a suspicion JMIR articles could since many of them use WebCite.
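For anyone wondering what a half-life means for URLs, here is a rough sketch in Python. It assumes links die off exponentially, which is a simplification; it is not the method used in either study, and the function names are my own.

```python
import math


def half_life(elapsed_years, fraction_alive):
    """Half-life implied by a single survival measurement.

    Assumes exponential decay: N(t) = N0 * 0.5 ** (t / h),
    so h = t * ln(2) / ln(1 / fraction_alive).
    """
    if not 0 < fraction_alive < 1:
        raise ValueError("fraction_alive must be strictly between 0 and 1")
    return elapsed_years * math.log(2) / math.log(1 / fraction_alive)


def fraction_alive_after(years, half_life_years):
    """Fraction of URLs expected to still resolve after `years`."""
    return 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    # Compare the two half-lives mentioned above over the same 10-year span.
    print(fraction_alive_after(10, 6))   # 6-year half-life
    print(fraction_alive_after(10, 10))  # 10-year half-life
```

Under that assumption, a collection with a 6-year half-life keeps a bit under a third of its links after a decade, while one with a 10-year half-life keeps half.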

Friday, December 08, 2006

Agassi and Blake in Norfolk

Last night I finally got to see Andre Agassi in person at Anthem LIVE! at the Constant Center. Anthem LIVE! was hosted by James Blake to raise money for cancer research (James’ father died of cancer several years ago). Becky surprised me with a ticket to the event a few weeks ago. She couldn’t go with me though since this weekend she was in her friend Amy’s wedding in Memphis.


This is only the second time I’ve seen a live professional tennis match- the first was years ago in Denver when I saw an exhibition with Jimmy Connors and David Wheaton. And then there was the time I ran into Steffi Graf (now Andre’s wife) years ago in an elevator in San Antonio (nice legs), but she wasn't exactly swinging a racquet. So I was really excited about the event.

Anthem LIVE! opened with Boyd Tinsley (of the Dave Matthews Band) and the Blake brothers playing doubles against the Bryan brothers. Bob and Mike Bryan are currently the number 1 ranked doubles team in the world. After 3 games, James and his brother Thomas took on the Bryan brothers in a lively 8 game set with the Bryan brothers ending up on top.

After the doubles match, I ran down to the entrance way where Agassi was going to enter and took some snaps with my phone (Becky took our digital camera with her to Memphis). The crowd went nuts when Agassi came out of the tunnel. I was about 10 feet away despite the usher who kept yelling at me to return to my seat. C’mon- I’m not going to stab the guy.

Andre and James played an entertaining two set match, ending in a win for James after a third set tie-breaker. There were some great shots by both players, and the crowd loved it. Most people stayed to the very end (around 10:30 pm).

An interesting fact I learned last night: Agassi and Graf are the only two tennis players to have won every Grand Slam tournament and a gold medal in tennis. Can you imagine the genes their kids must have?!

I may not have gotten to see Agassi play in the US Open or Wimbledon, but this was pretty awesome. Now if I can just get Becky to let me name our son Agassi...

Saturday, December 02, 2006

Forum posts lost from Beryl Project

This week I received an email from Paul Dorman who was wanting to use Warrick to recover http://forums.beryl-project.org. Beryl is a combined window manager and compositing manager that runs on top of Xgl or AIGLX. According to Paul, the site was lost when a hard drive crashed and no backups were available. There is some discussion of recovering the website here.

Apparently one of the forum members named Treviño used Warrick to recover quite a few of the pages and has them hosted here. One of the forum members praised Google for "their excellent off site backup that is watchin us all." Kinda gives you a warm cozy feeling.

Friday, December 01, 2006

Hasta la Vista

Microsoft finally released Windows Vista yesterday. Robert Vamosi has written about the Five reasons to love (and hate) Windows Vista, and Joel Spolsky has written an interesting bit about the complexity of Vista's off button.

I’m looking forward to giving Vista a try, but not anytime soon… my current laptop runs like a dog with XP, much less with Vista. Better to purchase a new computer in 2007 with the OS already installed.

Thursday, November 23, 2006

Today is Thanksgiving

Today is Thanksgiving, and I certainly have a lot to be thankful for. Sometimes it’s good to actually list your blessings, especially when you feel like things just aren’t going your way. So here’s a list of a handful of things I’m thankful for in no particular order:
  • My stunningly beautiful, intelligent, and hilarious wife
  • My first child to be born in April
  • My two favorite teams playing today- Cowboys and Broncos
  • My parents and brother who will be at the Cowboys game, and my sister who is soaking up the sun in Mexico today
  • Andy and Stephanie Walz who have invited us over today to eat turkey!
  • Finishing up a rather time-consuming paper
  • A fantastic advisor with the unique combination of basketball and OAI-PMH skillz
  • Funding for my Ph.D.
  • A church that Becky and I feel particularly blessed to be a part of
  • The class I’m teaching at church on Christian apologetics that seems to be going really well
  • My friends, the Hornes, who are moving back to Virginia Beach
  • Our beat-up Geo Metro that still runs
  • Harding students that are a light to the world
  • Hilarious Peyton Manning commercials
  • The Office
  • BBQ chicken pizza at California Pizza Kitchen

Wednesday, November 22, 2006

Search engine API study

I submitted my paper comparing search engine API results with WUI results to the WWW’07 conference on Monday. If you are interested in reading it, feel free to contact me. I wrote a little about it a few weeks ago.

Today I posted the hundreds of graphs that we couldn’t fit into the paper on my website. The graphs were created with R scripts which were a monster to write. I’ll probably be posting those on my website soon for anyone who is interested.

Monday, November 20, 2006

Harding students at HUFS

The other night I was reading the Harding alumni magazine and came across the article The Tour de France and Switzerland to Boot. The article is about the experiences of the first group of Harding students to study at the new international program called HUFS (Harding University in France/Switzerland). Robert McCready, the author and HU professor who chaperoned the group, gave a really encouraging report about the students’ conduct:
As we approached Toulouse, our quiet bus driver asked the guide for the microphone. He proceeded to tell us that in 37 years of chauffeuring, he had never met as fine a group and was impressed by the students’ respect for him and for one another, their wiping their feet before getting on the bus, and their joy in singing devotional songs. He will retire in two years and expressed his wish to do so with Harding students as his last group.
Not only that, but as a result of numerous positive encounters between the students and local Christians in Toulouse, a couple of women decided to enroll at Harding. I’m really proud of those students and the way they were a light to the world.

The magazine also noted that Ward Sandlin, a 1991 Harding alum, was awarded the Air Medal by the Coast Guard for his performance during Hurricane Katrina where he saved 161 lives immediately after the disaster. Congrats!

Saturday, November 18, 2006

We're having a boy!

Yesterday morning Becky and I found out that our little bean was a boy! Becky’s mother came into town on Wednesday evening so she could be with us for the ultrasound. It was a very emotional experience seeing my boy for the first time. When he moved his arms around I felt like this kid was the most incredible of all of God’s creations. There’s just something indescribably awesome about being a father. Next up: picking a name.

Thursday, November 16, 2006

Google Archive?

In September Google apparently registered multiple domain names that implied a Google Archive (or what I call Internet Archive Part Deux) is in the works. Garett Rogers of ZDNet was the first to break the story. It wouldn't be surprising if Google decided to quit throwing away their cached copies of the Web and allow users to search the Web through time. This functionality is something the folks at the Internet Archive have been working on for quite some time. Personally I'm glad they don't provide an archive search- I would be very embarrassed if people could see the first website I created back in 1997. Perhaps with Google's deep pockets we'll see a searchable (and more up-to-date) Internet archive before the year is out.

Saturday, November 11, 2006

WIDM 2006

I presented my paper Lazy Preservation: Reconstructing Websites by Crawling the Crawlers today at the Workshop on Web Information and Data Management (WIDM). I was also the session chair for the Web Organization session. Joan was able to fight through her cough and present her mod_oai paper as well.

This was a competitive workshop (only 11 of 51 submitted papers were accepted), but I was a little disappointed with the small number of attendees (only a dozen or so). The presentations though were quite good. My favorite was “Coarse-grained Classification of Web Sites by Their Structural Properties” where they looked at website characteristics like the number of slashes in a URL and average URL length to determine if a website was a blog, a personal site, a commercial site, etc. Who would have thought you could guess which category a website fell into by looking at URL properties?
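To give a feel for how little you need to compute those URL properties, here is a minimal sketch in Python. It only covers the two features I mentioned (slashes per URL and average URL length); the function name and the example URLs are made up for illustration, and it says nothing about the classifier the authors actually built.

```python
from urllib.parse import urlparse


def url_features(urls):
    """Summarize two simple structural properties of a site's URLs."""
    if not urls:
        raise ValueError("need at least one URL")
    # Count slashes in the path only, so the "//" after the scheme is ignored.
    slash_counts = [urlparse(u).path.count("/") for u in urls]
    return {
        "avg_slashes": sum(slash_counts) / len(urls),
        "avg_url_length": sum(len(u) for u in urls) / len(urls),
    }


if __name__ == "__main__":
    # Hypothetical URLs from one site.
    site = [
        "http://example.com/",
        "http://example.com/2006/11/widm.html",
        "http://example.com/2006/11/cikm.html",
    ]
    print(url_features(site))
```

A feature dictionary like this, computed per website, is the kind of input you could hand to any off-the-shelf classifier.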

I also really enjoyed the keynote speaker, Sihem Amer-Yahia from Yahoo Research, who talked about a project at Yahoo where they are trying to personalize web search based on the community interests of the searcher.

Next year WIDM is going to be in Portugal along with CIKM. Hmm…

CIKM days 2 and 3

Wednesday

The keynote speaker this morning was Gary Flake of Microsoft Labs who entitled his talk: How I Learned to Stop Worrying and Love the Imminent Internet Singularity. (I guess he really liked the title of my blog.) Some of the topics included power laws, long tails, network effects, and the Innovator’s Dilemma. Essentially the talk was about how human knowledge, the ability to analyze the online world, and the ability to create digital artifacts are all converging to create an Internet singularity which is going to take over the world (or at least seriously change the way we do things). During the Q&A session after the talk, Gary briefly spoke about the “parasitic relationship” between publishers and academia and how we should throw the bums out and publish only in on-line journals. Stevan Harnad would have been proud.

I sat in on several presentations that mostly focused on database enhancements- not really my thing. One of the few papers that I did find interesting though was Xiaoguang Qi’s paper entitled Knowing a Web Page by the Company that it Keeps. Xiaoguang presented an interesting way to know more about what a web page is about by examining the parents, siblings, and children of the page. They also used the Yahoo web search API to discover parents.

The banquet Wednesday evening was ok. I didn’t know anyone, but I had a decent chat with a fellow from Jordan who worked in the database area. He told me he was somewhat disappointed with the conference and suggested VLDB was much more interesting. I guess I was a little disappointed too since the focus of most of the research was only peripherally related to my own interests, but I probably should have expected that coming into the conference.

Thursday

The keynote speaker this morning was Joseph Kielman from the Dept of Homeland Security. Basically HS would like to model the behavior of the entire world and have the computer say, “Hey, I think Joe Mohammad is about to go jihad on us.” Kielman gave some indication that the bureaucracy at HS made getting things done very difficult.

The one presentation I really liked today was written by a group from Yahoo and Stanford and entitled Estimating Corpus Size via Queries. They showed a method that could be used to answer the question: How many pages in Chinese from US-registered servers are indexed by Yahoo? Their method requires several assumptions to hold; for example, a query must produce fewer than 1000 results since search engines do not give access to more than 1000 results.

I skipped out on the last session of the conference so I could catch a matinee showing of The Prestige. It’s a movie about two magicians who are obsessed with discovering each other’s secrets (excellent movie, by the way). It got me thinking… if CIKM would introduce a couple of magic tricks between presentations, maybe get the session chair to make boring speakers suddenly disappear in a flash of smoke, this might turn into one of the “can’t miss” conferences of the year. As it currently stands, I have to admit that librarians know how to have more fun (see JCDL).

Tuesday, November 07, 2006

CIKM 2006 in Arlington

This week I’m attending CIKM 2006 in Arlington, VA. I’ll be presenting a paper on lazy preservation on Friday at WIDM. In the meantime I can just sit back and enjoy the conference and the town. Unfortunately, Becky couldn't come up with me, but at least I got to meet my sister last night for dinner.

This morning Hector Garcia-Molina gave a talk on the research they are doing at Stanford on pair-wise entity resolution. Basically he talked about what entity resolution was and how they were taking an approach that may or may not end up being better than the approaches currently being presented. Hector did a great job of speaking clearly and engaging the audience. It was one of the few talks today that I didn’t find myself wanting to scream “Drop the laser pointer!” and “Quit talking to the projection screen!”

I attended the “Mining Reviews and Blogs” session this afternoon which had a number of interesting papers, and the poster presentations/reception this evening. I haven’t seen anything that’s very related to my work, but it’s nice to be exposed to some cutting-edge research in information retrieval.

Friday, November 03, 2006

Do the search engine APIs lie?

OK, the title of this post is a little strong. Search engine APIs don't intend to deceive anyone, but they typically do not give the same result as what the rest of the world sees when using the public web interfaces.

Every day for the past 5 months I’ve been sending thousands of queries to Google, MSN, and Yahoo on the Internets using the web user interface (WUI), the little box that everyone types their queries into, and using the web search APIs that each of the search engines makes available for free to the public. There have been a lot of questions as to whether the APIs give the same results as the WUIs, and I’m going to be the first to provide a strong quantitative analysis to see which APIs are the most synchronized with their WUIs.

In order to process the incredible amount of data I’ve been collecting, I’ve developed an elaborate set of Perl scripts that transform the raw collected data into tables that are then imported into MySQL. The scripts take several days to complete processing. Then I’ve developed numerous R scripts that pull data from MySQL and plot them to an array of graphs.

I’m currently working on writing up my findings for a conference. If you’d like a pre-print of my paper, I’d be happy to share it with you. Here’s a little teaser.


The graph above shows the daily Kendall tau distance between the top 100 search results obtained from Google’s WUI and API for the term carmen electra. The green line shows how the WUI results change every day, and the blue line shows how the API results change every day. If the results are exactly the same (including their ranking), the distance is 1, but if the results have nothing in common, the distance is 0. The red line shows the distance between the WUI and API results each day. You’ll notice that for the most part the WUI and API values don’t move in a synchronized way, and the WUI and API results are very dissimilar. Other popular search terms like stacy keibler, jessica simpson, and lindsay lohan exhibited similar patterns (although the WUI vs API distance was closer to about 0.8). When we examine search results for terms like nfl football or computational complexity, the WUI and API results are very synchronized, and the WUI vs API distance is closer to 0.9. Maybe they purposefully discriminate against air-heads?
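For the curious, a measure scaled the way I described (1 for identical rankings, 0 for nothing in common) can be sketched in a few lines of Python. This is not the code behind the graph, just a minimal illustration. It assumes one common convention for comparing top-k lists: a result missing from a list is treated as ranked just past the end of that list, and every pair of results the two lists order differently counts as a disagreement. The function name is my own.

```python
from itertools import combinations


def kendall_tau_similarity(list_a, list_b):
    """Return 1.0 for identical rankings and 0.0 for disjoint ones.

    Assumes each list has no duplicate entries. Items missing from a
    list are treated as tied just past the end of that list.
    """
    list_a, list_b = list(list_a), list(list_b)
    items = list(dict.fromkeys(list_a + list_b))  # ordered union
    if len(items) < 2:
        return 1.0
    rank_a = {item: pos for pos, item in enumerate(list_a)}
    rank_b = {item: pos for pos, item in enumerate(list_b)}

    def sign(x):
        return (x > 0) - (x < 0)

    pairs = 0
    disagreements = 0
    for i, j in combinations(items, 2):
        pairs += 1
        order_a = sign(rank_a.get(i, len(list_a)) - rank_a.get(j, len(list_a)))
        order_b = sign(rank_b.get(i, len(list_b)) - rank_b.get(j, len(list_b)))
        if order_a != order_b:
            disagreements += 1
    return 1.0 - disagreements / pairs


if __name__ == "__main__":
    wui = ["a", "b", "c"]
    api = ["a", "c", "b"]
    print(kendall_tau_similarity(wui, api))
```

With top-100 lists this loops over at most a few thousand pairs per comparison, so running it for every search term and every day is cheap.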



This graph shows the decay of the search results for the term subroutine for all three search engines. To compute decay, I compared the results obtained on each day with each of the results after that day using a normalized overlap measure. In other words, I computed the percentage of results that were shared between the results obtained on day 1 with day 2, 3, 4, etc. Yahoo shows a strong decay line with a half-life of 30 days (on day 30 half of the results were gone). Google and MSN show decay lines that actually un-decay (if there is such a word). After several months of the results becoming more different, the results start to return back to their starting point.
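Here is a minimal Python sketch of the overlap idea. It is a simplification of what I actually ran: it compares every later day against the first day only (rather than every day against every later day), and the function names and toy data are made up for illustration.

```python
def overlap(baseline, later):
    """Fraction of the baseline results that still appear in `later`."""
    baseline_set = set(baseline)
    if not baseline_set:
        raise ValueError("baseline result list is empty")
    return len(baseline_set & set(later)) / len(baseline_set)


def decay_curve(daily_results):
    """Overlap of each day's results with the first day's results."""
    first_day = daily_results[0]
    return [overlap(first_day, day) for day in daily_results]


def half_life_day(curve):
    """Index of the first day whose overlap is at or below one half."""
    for day, value in enumerate(curve):
        if value <= 0.5:
            return day
    return None  # never decayed to half within the observed window


if __name__ == "__main__":
    # Toy data: four days of four results each.
    days = [
        ["a", "b", "c", "d"],
        ["a", "b", "c", "x"],
        ["a", "b", "x", "y"],
        ["a", "x", "y", "z"],
    ]
    curve = decay_curve(days)
    print(curve, half_life_day(curve))
```

An "un-decaying" engine would show up here as a curve that dips and then climbs back toward 1.0, which is why the half-life helper only reports the first crossing.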



One last graph: how many times do the WUI and API agree when asked for the total number of results for a search term? For all three search engines, the answer is almost always zero! But if you look at the graph above, you’ll see that the MSN total results used to agree almost every time until day 58 (late July) when they changed something internally. Now about half of the time their WUI gives a larger number, and half of the time the API gives a larger number. By the way, the gap under day 107 was due to MSN invalidating our API license key. It took me 17 days before I replaced the key. Moral of the story- keep a close eye on your experiments!

Saturday, October 28, 2006

Clash of the Titans

Yesterday I attended the fourth annual Clash of the Titans event at Regent University. Bill O’Reilly moderated between former Israeli Prime Minister Ehud Barak and Palestinian Spokesperson Dr. Hanan Ashrawi as they debated the question: "Can Military Force Bring Lasting Peace to the Middle East?" All three participants did a really good job of conveying their point of view without being unnecessarily rude. In fact, Barak claimed that if Ashrawi was in charge of the Palestinians, there would be peace in the region.


The crowd was quite enamored with O’Reilly when he first came out on stage; I thought a few of the audience members looked as if they’d died and gone to heaven. And both Barak and Ashrawi received equal applause. There wasn’t any booing this year unlike the last debate, but I suppose that’s because last year’s subject, the war in Iraq, hit so close to home, and there was a large military presence in the audience last year.

I think Ashrawi did a really good job of conveying to the audience the Palestinians’ viewpoint. Both Barak and Ashrawi admitted the only solution to the current situation was the two-state solution, but neither thought this solution would come anytime soon, especially with Hamas ruling in Palestine (Ashrawi was very careful to avoid criticizing Hamas, stating they were in power due to the protest vote).

Barak stated that he wished Israel could be located somewhere else in the world. America has it pretty good since they aren’t situated in the middle of a hostile group of neighbors. Barak told an old joke about Moses that went something like this:
Moses stuttered because he was not an eloquent man. God asked him where he wanted to settle and Moses said, 'Ca-ca-ca-ca,' and God understood he meant Canaan. But Moses was trying to say 'Canada.'

Wednesday, October 25, 2006

Reconstructed www.survivorsunited.com

Gina Jones at the Library of Congress tipped me off to another missing website: www.survivorsunited.com. The LOC had crawled this site last August as part of the Darfur Collection. According to the Yahoo directory, this site is about an
Organization formed to help women who were victims of rape or sexual assault in the Darfur conflict. Site provides news, timeline, collection of related documents, and calls to action.
I have reconstructed the website and made it available here.

Monday, October 23, 2006

Google Code Search

A few weeks ago Google launched Google Code Search which allows you to search through source code that Google finds while crawling the Web. Like Google Scholar, this promises to be a really useful tool for searching specifically for resources that you want while ignoring all the other garbage that’s out there. It's also another small step into pulling the deep web up to the surface.

Of course this also opens the door to embarrassing numerous programmers and organizations and exposing many security holes by allowing you to search for code vulnerabilities, usernames/passwords, backdoor passwords, etc. Google has acknowledged these potential problems but maintains that all tools can be used for good or bad purposes and that the good far outweighs the bad in this case. Kudos for Google not caving into the whiners.

My own vanity search revealed that Google has only indexed a very small amount of code that I’ve made available on the Web. (Of course not all the code I’ve made available has my name on it.) I am really surprised Warrick hasn’t been indexed yet considering the code has been available for quite some time. It’s possible that switching the URL where the source code is located whenever the version changes is causing Google to be a little skittish.

Tuesday, October 17, 2006

Back from the Bahamas

Becky and I returned Saturday morning from our cruise to the Bahamas. Our room on board was spacious, and we loved getting breakfast in bed and twice-daily room service. We had a great time snorkeling in Nassau and the Atlantis hotel, and the beach in Freeport was incredible. I celebrated my birthday onboard and got to blow out candles on a cake of Baked Alaska.



Besides getting to spend a lot of time with Becky, I also had plenty of time to read. I brought along a book called The Language of God: A Scientist Presents Evidence for Belief by Francis S. Collins, the head of the Human Genome Project. Collins does a fantastic job at presenting how science and faith can be and should be integrated. Collins is an excellent communicator, and he especially does a good job of summarizing the current state of scientific knowledge. I highly recommend this book to the atheist, agnostic, and believer. I found the Language of God and Finding God at Harvard (which I just completed last month) to be very encouraging to my faith.

Now it’s back to the grindstone...

Saturday, October 07, 2006

Storm before the calm

Becky and I are going on our first cruise next week. We depart Norfolk tomorrow for Nassau and then to Freeport. This was supposed to be the "celebrate finishing the dissertation" cruise that we were going to take next summer, but the baby changed all that. Becky just finished her first trimester, so she should be feelin’ good for the cruise.

I’m glad to be getting out of town... yesterday a nor’easter came crawling into town. Combined with the high tide and full moon, my neighborhood was flooded just as badly as it was when Ernesto came through. I took these photos this morning.


Colonial Place mermaid

Apartment building next door

House next door

Our parking lot

Friday, October 06, 2006

Mark Foley websites recovered

Mark Foley, a Congressman from Florida, resigned on Sept 29, 2006 over allegations of inappropriate emails to minors who worked as Congressional pages. It was brought to my attention on Tuesday (thank you Martha!) that his websites

http://www.house.gov/foley/

and

http://www.markfoley.com/

were both shut off after the resignation. I have reconstructed both sites using Warrick and made them available here.

Become.com's web crawler

Today a member of the Heritrix listserv pointed everyone to an article on Sun’s website that discusses Become.com’s web crawler. The article dates back to August of 2005, so it’s a little dated. I couldn’t find any updated information on the crawler, but apparently it is proprietary, and the source code will likely never see the light of day.

Become.com actually developed 2 crawlers in 2004- one written entirely in Java and the other mostly Java with some C++. The article states that the crawlers "may be the most sophisticated, massively scaled Java technology application in existence."

The article doesn’t mention anything about Heritrix, a crawler which is also completely written in Java. Although Heritrix doesn’t currently have a distributed architecture, it could still be deployed in such an environment. It would be really interesting to see the two crawlers compete at the National Java Crawling Championships.

Tuesday, September 19, 2006

Software Engineering: Best Job in America

Money Magazine has listed software engineering as the number one Best Job in America. Even with the threat of off-shoring jobs, a computer science degree isn’t looking so bad after all. In fact, of the top 10 jobs listed, 4 of them could be obtained by someone with a CS degree. Having been both a software engineer and now a college professor (number two on the list), I can attest that both jobs are very rewarding.