Wednesday, April 29, 2009

Upload an image in PHP

I created this function for my Internet Development students; it saves a single uploaded image to disk. Example:
// Assuming the web server has write permissions to /mydir
SaveUploadedImage("/mydir/myimage.png");

The function can easily be modified to handle multiple filenames (change the parameter to accept an array of filenames and modify the final foreach block; a rough sketch follows the listing below). Note that this is modified code from the webdeveloper.com forum. If you want to know more about uploading files in PHP, check out the PHP - File Upload tutorial.
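
For the function to receive anything at all, the form that posts the file must use method="post" and enctype="multipart/form-data", or $_FILES will be empty. Here's a minimal sketch of a page that shows the form and calls the function; the file name upload.php and the field name photo are placeholders of my own, not part of the assignment:

<?php
// upload.php (hypothetical name): shows the upload form and, when the form
// is submitted, saves the image. Assumes SaveUploadedImage() from the
// listing below has been defined or included here.
if ($_SERVER['REQUEST_METHOD'] == 'POST')
{
    $error = SaveUploadedImage('/mydir/myimage.png');
    echo ($error == '') ? 'Image saved.' : 'Error: ' . $error;
}
?>
<!-- The enctype attribute is required; the field name ("photo") can be anything -->
<form action="upload.php" method="post" enctype="multipart/form-data">
    <input type="file" name="photo">
    <input type="submit" value="Upload">
</form>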


// Returns an empty string if the uploaded image is successfully saved as
// $image_filename, or an error message otherwise. $image_filename should
// be in a directory that the web server can write to.
function SaveUploadedImage($image_filename)
{
    // This function is greatly modified code from
    // http://www.webdeveloper.com/forum/showthread.php?t=101466

    // Possible PHP upload errors
    $errors = array(1 => 'php.ini max file size exceeded',
                    2 => 'html form max file size exceeded',
                    3 => 'file upload was only partial',
                    4 => 'no file was attached');

    // Store nonempty files in the active_keys array
    $active_keys = array();
    foreach ($_FILES as $key => $file)
    {
        if (!empty($file['name']))
            $active_keys[] = $key;
    }

    // Check that at least one file was uploaded
    if (count($active_keys) == 0)
        return 'No files were uploaded';

    // Check for standard uploading errors
    foreach ($active_keys as $key)
    {
        if ($_FILES[$key]['error'] > 0)
            return $_FILES[$key]['tmp_name'] . ': ' . $errors[$_FILES[$key]['error']];
    }

    // See if the file we are working on really was an HTTP upload
    foreach ($active_keys as $key)
    {
        if (!is_uploaded_file($_FILES[$key]['tmp_name']))
            return $_FILES[$key]['tmp_name'] . ' not an HTTP upload';
    }

    // Make sure the uploaded file appears to be an actual image
    foreach ($active_keys as $key)
    {
        if (!getimagesize($_FILES[$key]['tmp_name']))
            return $_FILES[$key]['tmp_name'] . ' is not an image';
    }

    // Save every uploaded file to the same filename (normally we'd want to
    // save each file with its own unique name, but we are assuming there
    // is only one file).
    foreach ($active_keys as $key)
    {
        if (!move_uploaded_file($_FILES[$key]['tmp_name'], $image_filename))
            return 'receiving directory (' . $image_filename . ') has insufficient permission';
    }

    // If we got this far, everything worked and the file was saved successfully.
    return '';
}
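
And here is the rough sketch of the multi-file modification mentioned above (untested, just to show the idea): the parameter becomes an array of destination filenames, and the final foreach pairs each uploaded file with its own destination.

// Sketch of the multi-file variant: $image_filenames is an array of
// destination paths, one per file field, in the order the fields appear
// in the form. Returns '' on success or an error message.
function SaveUploadedImages($image_filenames)
{
    // Store nonempty files in the active_keys array (same as above)
    $active_keys = array();
    foreach ($_FILES as $key => $file)
    {
        if (!empty($file['name']))
            $active_keys[] = $key;
    }

    if (count($active_keys) == 0)
        return 'No files were uploaded';

    // (Repeat the error, is_uploaded_file, and getimagesize checks from above here.)

    // Pair each uploaded file with its own destination filename
    $i = 0;
    foreach ($active_keys as $key)
    {
        $dest = $image_filenames[$i++];
        if (!move_uploaded_file($_FILES[$key]['tmp_name'], $dest))
            return 'receiving directory (' . $dest . ') has insufficient permission';
    }

    return '';
}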

Wednesday, April 22, 2009

Nutch, Sitemaps, and Google's findings

My search engine class is winding down, but our final project is to implement a Sitemap Protocol parser for Nutch, a popular open-source search engine. I mentioned a while back that Nutch is not for wimps... my students would certainly vouch for the steep learning curve involved in modifying its code. I've even had to scale back how much work my students are doing because of the complexity of the changes required. I'm going to do the difficult part of integrating their code with the innards of Nutch sometime in the next few weeks.

The reason I mention our Sitemap project is that WWW 2009 is meeting in Madrid this week, and a paper entitled Sitemaps: Above and Beyond the Crawl of Duty is being presented today by Uri Schonfeld (UCLA) and Narayanan Shivakumar (Google). This is the first paper to report on widespread usage of Sitemaps on the Web using Google's crawling history.

Schonfeld & Shivakumar report that Sitemaps were used by approximately 35 million websites in late 2008, exposing several billion URLs. 58% of the URLs included last modification dates, 7% included change frequency, and 61% included a priority. About 76.8% of Sitemaps used XML formatting, and only 3.4% used plain text. Interestingly, 17.5% of Sitemaps were formatted incorrectly.
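
If you haven't looked at the Sitemap Protocol before, the fields above (lastmod, changefreq, priority) are optional children of each <url> entry in a sitemap.xml file. Here's a quick PHP sketch of pulling them out with DOM; this is just an illustration with a placeholder URL, not the Java code my students are writing for Nutch:

// Rough sketch: list each URL in a sitemap along with its optional fields.
// The sitemap location is a placeholder; real code needs error handling.
$doc = new DOMDocument();
$doc->load('http://www.example.com/sitemap.xml');

foreach ($doc->getElementsByTagName('url') as $url)
{
    $fields = array();
    foreach (array('loc', 'lastmod', 'changefreq', 'priority') as $tag)
    {
        $nodes = $url->getElementsByTagName($tag);
        if ($nodes->length > 0)
            $fields[] = $tag . '=' . trim($nodes->item(0)->nodeValue);
    }
    echo implode(' ', $fields) . "\n";
}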

The figure below shows how many URLs Google discovered via Sitemaps (red) vs. regular crawling (green) for cnn.com. Notice that on most days, more URLs were discovered via Sitemaps.



Another interesting figure (below) shows when a URL was discovered via Sitemaps vs. regular web crawling for cnn.com. In most cases URLs were discovered at about the same time by both methods, but a number of them (dots below the line) were discovered via Sitemaps much earlier than by web crawling.


CNN's website is not typical. Schonfeld & Shivakumar report that in a dataset of 5 billion+ URLs, 78% were discovered via Sitemaps first compared to 22% via web crawling.

The paper also describes an algorithm that search engines can use to prioritize URLs discovered via both web crawling and Sitemaps. I've covered the highlights, but I recommend reading the paper if you're interested in the finer details.

Friday, April 17, 2009

Looks can be deceiving

It's been busy around here... Spring Sing, Easter, Tax Day, etc.

This morning Steve Baber presented a devotional thought at our computing seminar that I thought I'd share with you all. He talked about how easily our eyes can be deceived. Do you see a man on the left, or the word Liar?

This is especially true when it comes to how we perceive others. How often do you catch yourself judging someone by their looks, their clothes, the house they are living in and the car they are driving?

James warns against this practice in James 2:1-4:
My brothers, as believers in our glorious Lord Jesus Christ, don't show favoritism. Suppose a man comes into your meeting wearing a gold ring and fine clothes, and a poor man in shabby clothes also comes in. If you show special attention to the man wearing fine clothes and say, "Here's a good seat for you," but say to the poor man, "You stand there" or "Sit on the floor by my feet," have you not discriminated among yourselves and become judges with evil thoughts?
Here are three contestants from Britain's Got Talent whose appearances are misleading: Susan Boyle, Paul Potts, and Andrew Johnston.

Friday, April 03, 2009

Day 2 at DigCCurr 2009

This was a full day of presentations. One of my favorite panels was on personal digital archiving with Jeremy John, Cathy Marshall, David Pearson, and Andreas Rauber. My presentation seemed to go well... the room was packed with people sitting on the floor. Overall I was very pleased with the conference and met a good number of interesting people.

After the conference ended, I took advantage of the 70-degree weather and walked around the UNC campus. I then headed up to Franklin St., where a mass of well-dressed college students was gathering. (Franklin St. is where all the cool places to hang out are located. It's also the place where students jump over bonfires after big UNC wins.) The 2-mile walk back to the hotel was fantastic... the homes on Franklin St. are some of the most beautiful and unique homes I've seen.

Now I'm sitting in my hotel room (11 pm) missing my family while a number of college students stand outside my window, talking as if they were the only ones at the hotel. It'll be a lot worse tomorrow night if UNC wins, but I'll be back in Searcy by then.

Update:

Looks like UNC is going to the championship, and the partying on Franklin Street continues.

Thursday, April 02, 2009

I'm at DigCCurr 2009

I flew into Raleigh/Durham late last night, and today I am attending the DigCCurr 2009 conference (Digital Curation Practice, Promise and Prospects) in Chapel Hill, NC. Tomorrow I'll be presenting a paper based on my summer work at LANL: Everyone is a Curator: Human-Assisted Preservation for ORE Aggregations. This was work I did with Herbert Van de Sompel and Michael Nelson (my former adviser at ODU).

There are 270 people registered for DigCCurr, but I only know a handful of them. So it was good to see Michael Nelson this morning getting coffee in the lobby... I had no idea he'd be here. Of course now I have to spruce up my presentation and remove my disparaging remarks about Herbert and Michael. wink