stworthiness, then I think it is not unreasonable to pay for
that.
Naturally, an encyclopedia builder will not like my proposal. But to him or her
I say: package your encyclopedia inside a useful access system, because without
it the raw information you provide is just more data, and can easily get lost in
the sea of data available and growing every hour.
*Interview of September 2, 2000
= What has happened since our last interview?
I see a continued increase in small companies using language technology in one
way or another: either to provide search, or translation, or reports, or some
other communication function. The number of niches in which language technology
can be applied continues to surprise me: from stock reports and updates to
business-to-business communications to marketing...
With regard to research, the main breakthrough I see was led by Kevin Knight, a
colleague at ISI (I am proud to say). Last summer at Johns Hopkins University in
Maryland, a team of scientists and students developed a faster and otherwise
improved version of a method originally developed (and kept proprietary) by IBM
about 12 years ago. This method allows one to create a machine translation (MT)
system automatically, as long as one gives it enough bilingual text. Essentially
the method finds all correspondences in words and word positions across the two
languages and then builds up large tables of rules for what gets translated to
what, and how it is phrased.
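
To make the idea of "finding correspondences" concrete, here is a minimal
sketch in Python. It is not the system the team built, and it ignores word
positions entirely. It assumes a simple word-alignment model in the style of
what is usually called IBM Model 1, trained with a few rounds of expectation
maximization on a toy German-English corpus. The function names
(train_word_table, best_translation) and the three sentence pairs are
hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train_word_table(pairs, iterations=10):
    """Estimate t(target_word | source_word) from sentence pairs.

    Toy sketch of an IBM-Model-1-style word aligner (an assumption,
    not the actual system described in the interview).
    """
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {word for _, tgt in corpus for word in tgt}

    # Start with no preference: every target word is equally likely.
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word

        for src, tgt in corpus:
            for tw in tgt:
                # Share one unit of credit for tw among the source words,
                # in proportion to the current table.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # Re-estimate the table from the expected counts.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical three-sentence bilingual corpus.
pairs = [
    ("das Haus", "the house"),
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]
table = train_word_table(pairs)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

Even with three sentence pairs, the table learns that "Haus" goes with "house"
rather than "the", because "the" is better explained by "das" elsewhere in the
corpus. A real system would need millions of pairs, as noted below, plus
further models for word order and phrasing.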
Although the output quality is still low -- no one would consider this a final
product, and no one would use the translated output as is -- the team built a
(low-quality) Chinese-to-English MT system in 24 hours. That is a phenomenal
feat -- this has never been done before. (Of course, say the critics: you need
something like 3 million sentence pairs, which you can only get from the
parliaments of Canada, Hong Kong, or other bilingual countries; and of course,
they say, the quality is low. But the fact is that more bilingual and
semi-equivalent text is becoming available online every day, and the quality
will keep improving to at least the current levels of MT engines built by hand.
Of that I am certain.)
Other developments are less spectacular. There's a steady improvement in the
performance of systems that can decide whether an ambiguous word such as "bat"
means "flying mammal" or "sports tool" or "to hit"; there is solid work on
cross-language information retrieval (w