Watch out, Google: When it comes to Internet search, there’s a new competitor in town.
Seventeen-year-old Nicholas Schiefer has found a better way to search small documents, such as tweets and Facebook statuses – all for his Grade 11 science fair project.
The Pickering resident, who attends Holy Trinity School, created an algorithm to filter through short documents and find relevant information. Built using linear algebra and discrete math, the algorithm is named “Apodora” after a python species with extraordinary search capabilities.
Not only did Mr. Schiefer win a gold medal at the Canada-Wide Science Fair, but he also earned the attention of students who dubbed him the “next Mark Zuckerberg,” said science and mathematics teacher Nina Dolgovykh.
Before he starts Grade 12 in the fall, Mr. Schiefer, who also likes to swim and ski, has a summer job at IBM. He spoke with Globe reporter Emily Jackson about his micro search invention.
You have been compared to Facebook’s Mark Zuckerberg. How does that make you react?
I’m not really sure how well that applies. The genius in Facebook was not so much algorithmic, but in the social aspect of the network. What [Mr. Zuckerberg] managed to create very well was a desire. In search in general, we already have the desire to search. The technology is trying to catch up to what people expect.
Tell me about your science fair project.
I focused on micro search, which deals with search on very short documents. It’s pretty new and exciting – there hasn’t been too much research done on it.
I wanted to create an algorithm that would try to discern and exploit the relationships between words so people can get better search results.
How did you get interested in search?
I’ve been interested in computers for a long time. I remember back in 2000 or 2001, I first used Google. Being a six or seven-year-old, I thought it was pretty magical. I guess that’s stuck with me.
Why is micro search different from regular search?
A lot of traditional algorithms for information retrieval tend to break down when you apply them to micro search. The reason is that most, nearly all, existing algorithms make the independence assumption – that all words are completely independent of other words.
Obviously, that is false, but it’s been shown to work pretty well.
But that assumption breaks down quite badly with micro search. You do not have room to stuff your text full of synonyms and descriptions of everything you say so a search engine can find it.
For example, suppose you wanted to search tweets for the word “cat.” If a tweet contains the word “kitten,” that’s not going to be very helpful, because the search is assuming cat and kitten are independent, even though they’re not.
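The cat and kitten problem can be shown with a minimal sketch of exact keyword matching. This is not Mr. Schiefer's code or any real search engine's; the function names and the sample tweets are invented for illustration, and the matching rule is the simplest possible one that treats every word as independent.

```python
def tokenize(text):
    """Lowercase a short document and split it on whitespace."""
    return text.lower().split()


def exact_match_search(query, documents):
    """Return the documents that contain every query word verbatim.

    Each word is treated as unrelated to every other word, which is
    the independence assumption described in the interview.
    """
    terms = set(tokenize(query))
    return [doc for doc in documents if terms <= set(tokenize(doc))]


# Hypothetical tweets, made up for this example.
tweets = [
    "my cat knocked over the lamp again",
    "adopted a kitten today",
    "traffic is terrible this morning",
]

results = exact_match_search("cat", tweets)
print(results)
```

The tweet about the kitten is never returned for the query "cat", because nothing in the matching step knows the two words are related, and a tweet is too short to mention both.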
What makes your algorithm unique?
I’ve managed to create a system that is fairly accurate in identifying relationships between words. It can infer things more statistically rather than relying on humans.
There have been other algorithms that have tried to relate words to each other. What is innovative about my approach is that I don’t just consider direct relationships.
Some searches find words that appear in similar contexts. That’s pretty good, but that’s following the relationships to the first degree. My algorithm tries to follow connections further. Connections that are close are deemed more valuable. In theory, it follows connections to an infinite degree.
One thing I really liked about my algorithm is that it relied on almost no hand coding from me. The computer was able to infer that certain words were related.
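The idea of following connections past the first degree, with closer connections counting for more, can be sketched in a few lines. The interview does not describe how Apodora actually works, so everything here is an assumption: the sketch links words that share a document, then sums the contributions of longer and longer chains of links, shrinking each extra step by a hypothetical `decay` factor. Because the contributions shrink geometrically, the sum over chains of every length settles to a finite value, and the loop stops once further steps are negligible.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often each pair of words appears in the same document."""
    counts = defaultdict(lambda: defaultdict(float))
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1.0
            counts[b][a] += 1.0
    return counts


def normalize(counts):
    """Scale each word's link weights so that they sum to 1."""
    graph = {}
    for word, neighbours in counts.items():
        total = sum(neighbours.values())
        graph[word] = {n: c / total for n, c in neighbours.items()}
    return graph


def relatedness(graph, start, decay=0.5, tolerance=1e-9, max_steps=200):
    """Score words by decayed chains of links of length 1, 2, 3, ...

    A chain of k links contributes decay**k times its weight, so
    nearby words count for more than distant ones.
    """
    scores = defaultdict(float)
    frontier = {start: 1.0}
    weight = 1.0
    for _ in range(max_steps):
        weight *= decay
        if weight < tolerance:  # further steps add almost nothing
            break
        next_frontier = defaultdict(float)
        for word, mass in frontier.items():
            for neighbour, link in graph.get(word, {}).items():
                next_frontier[neighbour] += mass * link
        for word, mass in next_frontier.items():
            scores[word] += weight * mass
        frontier = next_frontier
    scores.pop(start, None)  # a word's link back to itself is not useful
    return dict(scores)


# Hypothetical corpus: "cat" and "kitten" never share a document.
documents = ["cat feline", "kitten feline", "traffic jam"]
graph = normalize(build_cooccurrence(documents))
scores = relatedness(graph, "cat")
print(sorted(scores.items(), key=lambda item: -item[1]))
```

In this toy corpus "cat" and "kitten" have no direct link, yet "kitten" still receives a score through the shared word "feline", while "traffic" and "jam" receive none. No list of synonyms is supplied by hand; the relationship is inferred from which words appear together.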
Who would want to use micro search?
Anyone who would want to extract information, especially in social media, and basically any place where you’re unable to get a long piece of text.
It’s been shown that people are increasingly reading shorter and shorter documents. It has posed the challenge of “How can we retrieve this information?”
What do you want to do with your algorithm after you finish high school?
I’m really not sure. It is a science fair project, but it turned out very well for me. I’m certain that I want to go to university, but I don’t know where yet.