Today's announcements from Microsoft included the first public beta release of the Microsoft Speech Server, the beta 3 release of the Speech Application Software Development Kit (SASDK), and the launch of several new resources associated with Microsoft's Speech Application Language Tags (SALT)-based speech offerings. The new resources announced today include: the Microsoft Speech Server Beta Program; an Early Adopter Program (EAP); and specialized training courses on the Speech Server, the SASDK, the voice user interface (VUI), and speech application design.
"Speech technology is on the cusp of reaching its full potential, and we are committed to bringing it to the mainstream," said Kai-Fu Lee, corporate vice president of the Speech Technologies group at Microsoft. "We are excited to deliver this open-standards-based technology to the market as a common platform on which developers, partners and enterprises can create great speech applications."
"Microsoft Speech Server is unique to the marketplace in that it is the only speech server that supports both unified telephony and multimodal applications," commented Xuedong Huang, general manager of the Speech Technologies group at Microsoft. "By building our speech technology offerings upon the open, industry-standard SALT specification, customers can use speech to access information from standard telephones and cell phones as well as GUI-based devices like PDAs, Tablet PCs and 'smart' phones."
Here are additional details (in Microsoft's words) regarding the specific speech technologies relating to today's announcements . . .
Microsoft Speech Server Beta v1.0
Designed to run on the Windows Server 2003 operating system, Microsoft Speech Server is the most flexible and integrated platform for delivering low total cost of ownership for speech deployments. Taking advantage of the improved secure architecture and new security-aware features of Windows Server 2003, Microsoft Speech Server includes additional security features to help protect and defend systems, resources and users from potential security threats. Built on SALT, an open industry standard, Microsoft Speech Server extends existing Web markup languages by adding speech recognition and prompt functionality to both telephony and multimodal applications. For connectivity into the enterprise telephony infrastructure and call-control functionality, Intel Corp. and Intervoice Inc. will provide a Telephony Interface Manager (TIM) that supports Microsoft Speech Server. The TIM will provide fast and easy integration of the speech server with the Intel NetStructure communications boards, enabling deployment of robust speech processing applications. Multimodal applications do not require a TIM.
The following are additional key components of the Microsoft Speech Server:
- Speech Engine Services (SES):
  - Speech Recognition Engine. This component includes the state-of-the-art Microsoft Speech Recognition Engine for accurately handling users' speech inputs.
  - Prompt Engine. The Prompt Engine joins prerecorded prompts from a database and plays them back so that users hear a human voice.
  - Text-to-Speech Engine. When prerecorded prompts are unavailable, SpeechWorks' Speechify Text-to-Speech Engine synthesizes audio output from a text string.
- Telephony Application Services (TAS):
  - SALT Interpreter. This component deals with all the speech interface and presentation logic (input and output). In addition, the SALT Interpreter handles interactions between the speech application and the telephony components of the architecture.
  - Media and Speech Manager. The Media and Speech Manager handles requests made by SALT Interpreters to SES for speech recognition and prompt playback, and manages interfaces with the third-party TIM to deliver audio to and from the telephone user.
  - SALT Interpreter Controller. The SALT Interpreter Controller manages creation, deletion and resetting of the multiple instances of the SALT Interpreter that are managing dialogs with individual callers.
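As a rough illustration of two of the behaviors described above, the following Python sketch models a prompt engine that joins prerecorded clips and falls back to synthesis when a clip is missing, and a controller that creates, resets and deletes one interpreter per caller. It is not Microsoft's implementation or API: the class names, the `synthesize` stand-in and the in-memory dictionary used as a prompt database are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a text-to-speech engine (returns placeholder bytes)."""
    return text.encode("utf-8")


@dataclass
class PromptEngine:
    """Joins prerecorded clips; uses synthesis if any phrase has no recording."""
    recordings: Dict[str, bytes]  # assumed prompt database: phrase -> audio

    def render(self, phrases: List[str]) -> Tuple[str, bytes]:
        if all(p in self.recordings for p in phrases):
            # Every phrase was recorded, so the caller hears a human voice.
            return "recorded", b"".join(self.recordings[p] for p in phrases)
        # At least one phrase is missing: synthesize the whole prompt instead.
        return "tts", synthesize(" ".join(phrases))


@dataclass
class Interpreter:
    """One dialog with one caller (illustrative state only)."""
    caller_id: str
    turns: int = 0

    def reset(self) -> None:
        self.turns = 0


@dataclass
class InterpreterController:
    """Creates, resets and deletes per-caller interpreter instances."""
    instances: Dict[str, Interpreter] = field(default_factory=dict)

    def create(self, caller_id: str) -> Interpreter:
        self.instances[caller_id] = Interpreter(caller_id)
        return self.instances[caller_id]

    def reset(self, caller_id: str) -> None:
        self.instances[caller_id].reset()

    def delete(self, caller_id: str) -> None:
        del self.instances[caller_id]
```

In this sketch a prompt that mixes recorded and unrecorded phrases is synthesized as a whole; a real system might instead splice synthesized audio between recorded clips.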
Microsoft Speech Application SDK Beta v3.0
The Microsoft Speech Application SDK is a set of tools and ASP.NET controls based on the SALT specification that enables developers to build both telephony and multimodal applications. Developers can incorporate speech functionality into Web applications quickly and easily, and can learn the concepts necessary to build a speech application with the familiar Microsoft Visual Studio .NET 2003 development environment. Users can access these applications across a variety of devices, from the desktop to the telephone, using speech as a possible mode of interaction.
Beta 3 of the SASDK adds the following features:
- Pocket Internet Explorer Bits. This feature allows Pocket PC access to Microsoft Speech Server applications.
- Speech Application Wizard. This wizard enables developers to jump-start application development by creating a new project in Visual Studio .NET 2003 that contains all the necessary objects.
- Telephony Application Simulator. This simulation of the Speech Server allows developers to deploy telephony applications on the desktop and interact with the application.
- Enhanced dual-tone multifrequency (DTMF) support.
- Speech Application Controls. Preset controls manage responses containing digits and letters, for example, credit card numbers and expiration dates, currency amounts, ZIP codes and Social Security numbers.
- Enhancements to Grammar Authoring. The enhancements provide a flowchart view of grammars, the ability to type text for grammar phrases into grammar files, a Pronunciation Editor for unusual words, and integration into the Visual Studio .NET 2003 environment.
- Speech Controls Outline Panel. A dockable Visual Studio menu shows users the sequence of controls in the speech application.
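To give a sense of the checking a preset control for digit responses might perform, here is a minimal Python sketch that validates a ZIP code and a credit card number entered as a digit string, whether spoken or keyed in by DTMF. The function names are hypothetical and are not part of the SASDK, and the use of the Luhn checksum is an assumption chosen for illustration rather than something the announcement describes.

```python
import re


def is_zip(digits: str) -> bool:
    """Accept a 5-digit ZIP code or a 9-digit ZIP+4 entered without a hyphen."""
    return re.fullmatch(r"[0-9]{5}([0-9]{4})?", digits) is not None


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if re.fullmatch(r"[0-9]+", digits) is None:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A dialog could use checks like these to decide whether to accept a response or prompt the caller again.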
Courses
Microsoft is offering three five-day instructor-led courses for companies interested in building the skills necessary to support enterprise-grade solutions. The courses include the following:
- "Speech Applications: Planning, VUI Design and Maintenance"
- "Developing Speech Applications With the Microsoft Speech Application Software Development Kit"
- "Deploying and Administering Microsoft Speech Server"
Further information
Additional information about Microsoft Speech Server and the Speech Application SDK is available here, and details on the Microsoft Speech Server beta program are here.