
2004 World Technology Awards Winners & Finalists

Dr. Dimitri Kanevsky and Dr. Stéphane H. Maes

Please describe the work that you are doing that you consider to be the most innovative and of the greatest likely long-term significance.

Conversational Biometrics

The nominees have proposed conversational biometrics as a novel way to perform high-accuracy biometric verification of a speaker’s identity through voice. By combining voiceprint authentication with personal information, conversational biometrics can make transactions more secure while keeping them fast and convenient for the user. Applications include security and authentication, user modeling, data mining, payments and telematics. One of their significant contributions to the field is described in US Patent 6529871, which introduces the fundamentals of conversational biometrics. MIT Technology Review recently cited this invention as one of the five most influential patents of 2003.

With this invention, the nominees have overcome many of the limitations that plague traditional speaker recognition systems and prevent their real-life deployment in high-security applications. Traditional speaker recognition methods perform statistical analysis on acoustic utterances to estimate models of the speaker's vocal tract. These models allow very accurate identification and verification of speakers over very large populations. However, they are highly sensitive to aging, acoustic noise, and channel distortions introduced by microphones or telephone lines. Some users also have voice characteristics that frequently confuse the recognizers. Given these constraints, it is difficult to guarantee security against impersonators and recorded playback. Despite the usability appeal and economic incentives of voice-based security systems, these challenges have prevented widespread deployment.

Conversational biometrics overcomes these challenges by combining acoustic speaker recognition with speech recognition, typically through natural dialogs that collect information only the legitimate user would know. Imagine that a user, John, is calling his bank to move money from his account. John identifies himself. The bank system then asks John random questions, such as "Where were you born?" or "In what month did you last transfer money?" Someone who overhears the answers cannot reuse them later, because the questions are chosen at random and constantly change. The system analyzes John's answers and transfers the money only if his voice patterns "match" AND the information he gives is correct. Such dialogs are very intuitive for users and let them progress easily through the transaction.
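The decision rule in the bank example can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the method of US Patent 6529871: the names `UserProfile`, `pick_question`, `verify_caller` and the `voice_threshold` value are hypothetical, `voice_score` is assumed to be a similarity score in [0, 1] supplied by some speaker verification engine, and `recognized_answer` is assumed to be the text produced by a speech recognizer. Acceptance requires both a voice match and a correct answer to a randomly chosen question, mirroring the AND condition described above.

```python
import random
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical enrollment record: questions mapped to answers only the real user should know.
    user_id: str
    facts: dict


def pick_question(profile: UserProfile, rng: random.Random) -> str:
    # Pick a question at random so an overheard answer is unlikely to be asked again.
    return rng.choice(sorted(profile.facts))


def normalize(text: str) -> str:
    # Assumption: answers are compared case- and whitespace-insensitively.
    return " ".join(text.lower().split())


def verify_caller(profile: UserProfile, question: str, recognized_answer: str,
                  voice_score: float, voice_threshold: float = 0.8) -> bool:
    # Acoustic check: hypothetical speaker-verification score against an assumed threshold.
    voice_ok = voice_score >= voice_threshold
    # Knowledge check: the recognized answer must match the enrolled answer.
    knowledge_ok = normalize(recognized_answer) == normalize(profile.facts[question])
    # Accept only if the voice matches AND the information is correct.
    return voice_ok and knowledge_ok


if __name__ == "__main__":
    # Illustrative, made-up enrollment data.
    john = UserProfile("john", {"Where were you born?": "Boston",
                                "In what month did you last transfer money?": "March"})
    q = pick_question(john, random.Random(0))
    print(q)
    print(verify_caller(john, q, john.facts[q].upper(), voice_score=0.91))  # True
    print(verify_caller(john, q, john.facts[q], voice_score=0.42))          # False
```

A plain AND of two binary decisions is the simplest reading of the example; a production system might instead fuse the acoustic and knowledge scores, which this sketch does not attempt.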

Prototypes have been built by IBM and deployed by customers, confirming the market readiness of the technology. We expect the technique to eventually become as mainstream, and as secure, as today’s fingerprinting, iris-scanning and facial recognition systems, which are projected to form a combined $1.2 billion market this year as airports, banks and others begin using biometrics to enhance security.

This work reflects the nominees' focus on technologies that bring new user interfaces, user experiences and capabilities to mobile users. Other areas of focus include performing secure transactions over the phone and interacting through multimodal interfaces, in which applications can switch at will among visual, keyboard, keypad, handwriting and voice input and collaborate across multiple messaging paradigms. This collaboration may take place over wireless networks, or through devices, appliances or vehicles that intelligently conduct dialogs or report problems.

More and more, people need to access information, perform transactions and communicate with their peers from anywhere, at any time and under any circumstances. To achieve this, new technologies must be mastered and solutions developed that support new user interfaces, exploit mobile and environmental capabilities, and link users to their peers or data. The result will be new computing and interaction paradigms for a converged mobile and fixed world in which everybody and everything is connected and collaboration is ubiquitous. Conversational biometrics illustrates how users could conveniently and naturally perform secure transactions in such a world.

Brief Biography

1) Dr. Dimitri Kanevsky

Dr. Kanevsky is currently a Research Staff Member in the Human Language Technologies group at the IBM Watson Research Center in New York, where he has been responsible for a number of speech recognition projects. These include the development of the first Russian automatic speech recognition system; the Broadcast News Transcription Technologies System, for which he received a Research Award; and a project embedding speech recognition in automobiles that IBM recognized as a 2003 Technical Accomplishment.

Dr. Kanevsky has worked at a number of prestigious centers for higher mathematics, including the Max Planck Institute in Germany and the Institute for Advanced Study in Princeton. He received the Humboldt Fellowship for mathematics in 1982.

In 1979, Dr. Kanevsky invented a multi-channel, vibration-based hearing aid and founded a company to produce and market the device. He also pioneered the use of speech recognition as a telephone communication aid for deaf users, for which he received an award from the Johns Hopkins National Search for Computing Applications to Assist Persons with Disabilities.

In 2002, Dr. Kanevsky led the Conversational Interactivity for Telematics project, which was based on his Artificial Passenger invention. He has also successfully applied mathematical methods to speech recognition. His work on discriminative training algorithms introduced a new class of training methods that significantly improved the accuracy of various speech recognition systems; its wide citation in the scientific literature earned him a 2002 Science Accomplishment at IBM. Dr. Kanevsky also made a significant contribution to statistics by solving a difficult classification problem, with results published in prestigious journals such as The Annals of Statistics and Comptes Rendus de l’Académie des Sciences.

In 2003, MIT Technology Review selected one of Dr. Kanevsky’s biometrics-based security patents as one of its top five patents.

In 1998, Dr. Kanevsky introduced the first remote stenographic transcription services over the Internet, and created the concept and system behind ViaScribe, a speech recognition product that automatically transcribes lectures in real time and creates multimedia notes.

Dr. Kanevsky holds 68 patents and was named an IBM Master Inventor in 2002, ranking among IBM's top ten patentees. He has published over sixty scientific papers ranging from pure and applied mathematics to statistics, speech and handwriting recognition, and language modeling. He received a Ph.D. in Mathematics from Moscow University in 1977, where he solved several difficult problems in number theory and algebraic geometry, his field of expertise, related to the structure of rational points on cubic surfaces.

2) Stéphane H. Maes

Stéphane H. Maes holds Bachelor's, Master's and PhD degrees in both Electrical Engineering and Physics from UCL, Louvain, Belgium. He completed his PhD jointly at CAIP (Center For Computer Aids To Industrial Productivity), Rutgers University, NJ. He is also a graduate of the International Space University. His academic work was sponsored by several grants from the European Union (Erasmus), the European Space Agency (ESA), DARPA and the prestigious National Funds for Scientific Research (FNRS). His PhD thesis focused on introducing revolutionary acoustic modeling techniques for speech and speaker recognition that mimic the processing performed by the human auditory system. While at IBM, he also completed the IBM MBA program.

He pursued this line of research as a Member of Technical Staff at AT&T Bell Laboratories, Murray Hill, NJ, where he developed new techniques for robust speech processing in noisy environments.

In 1995, he joined the IBM T.J. Watson Research Center. Within the Human Language Technology Group, he pioneered large-vocabulary continuous speech recognition for automatic transcription, participated in IBM's successful submissions to the DARPA broadcast news and telephony transcription evaluations, and contributed to the ViaVoice product line. At IBM, Stéphane also launched research and development activities in speaker recognition, drove the establishment of a biometrics program, and managed an ambitious program on speaker recognition over very large populations.

His work then extended beyond core recognition research as he drove and managed development in several areas: embedded speech recognition, which led to IBM's embedded speech recognition and text-to-speech engines and was widely demonstrated at trade shows as the Personal Speech Assistant, a speech-enabled Palm Pilot; in-car speech interfaces, including speech browsers and SpeechML, IBM's precursor to and basis for VoiceXML; and device-independent, multi-channel authoring, including XForms, now an established W3C standard. His work defined and evangelized some of the fundamental bases of multimodal and multi-device technologies and user interfaces, as well as the notion of conversational platforms. In particular, it significantly shaped IBM's telephony roadmap and offerings. Throughout these years, Stéphane and his team drove the standardization of the underlying technologies at the relevant standards bodies (e.g. W3C, ETSI, 3GPP, ITU, WAP Forum).

Stéphane became increasingly convinced of the importance of developing appropriate middleware, tools and user interfaces to support a new world in which computing is ubiquitous and users can access data, perform transactions and collaborate at any time, anywhere, in any situation. On loan from IBM Research, he joined the IBM Pervasive Computing (PvC) architecture and standards team, where he focused on defining an overall strategy for the mobile space and on key related architecture initiatives such as SPDE (Service Provider Delivery Environment).

In 2002, Stéphane joined Oracle's Mobile – Voice and Wireless division as Director of Architecture. He is the Chief Architect for Mobile and Advanced Technologies across Oracle's activities and product lines, responsible for R&D, product architecture, technology strategy, and overall standards strategy and participation. He also evangelizes Oracle's mobile offerings and vision, and drives Oracle activities in voice, telephony, Web services, mobile/telecom, RFID and telematics. His most visible recent external activities include chairing OMA (Open Mobile Alliance) work on policy enforcement and management and on enterprise mobilization; driving the OMA Service Environment architecture and OMA convergence with Parlay; and standardizing mobile e-mail around P-IMAP (Push IMAP).

Stéphane has authored numerous scientific papers in peer-reviewed international conferences and publications, as well as essential technical contributions to standards organizations and industry fora. He also holds numerous granted and pending patents. His other activities include serving as a reviewer and as a member of scientific advisory boards or boards of directors for technical publications, conferences and industry fora.