ICDM 2020 MLCS Workshop

Technical Description of Workshop

Many aspects of artificial intelligence, including search, question answering, and Internet of Things automation in home assistants, rely on robust cognitive services such as natural language understanding from speech and text. One technical challenge that remains an open research area, and is coming to the forefront of this field, is adapting cognitive services across languages to serve a worldwide community of multilingual and multicultural users. This workshop will address research problems in cognitive service design and development that center on multilingual translation, speech recognition, and, to a degree, text understanding. These problems include:

Code mixing: mixed languages in speech and text (conversation, queries, commands)
Language recognition: identifying languages in small units of mixed natural language
Accents: identifying and adapting to regional accents and second-language speakers
Dialogue agents: generating responses and handling language switching in conversational contexts
Standardization/transcription: translating mixed texts and transcripts to one language
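
As a minimal illustration of the language recognition problem listed above, the following Python sketch labels each token of a code-mixed utterance with a language and reports where the language switches. It is only a toy under stated assumptions: the word lists, the function names `tag_tokens` and `switch_points`, and the `unk` label are hypothetical and are not part of any workshop material; a practical system would replace the lexicon lookup with a trained classifier.

```python
# Hypothetical toy word lists; a real system would use a trained model.
LEXICON = {
    "en": {"play", "the", "song", "by", "please", "turn", "on", "lights"},
    "es": {"por", "favor", "la", "canción", "de", "las", "luces"},
}


def tag_tokens(utterance):
    """Label each whitespace-separated token with a language code or 'unk'."""
    tagged = []
    for token in utterance.lower().split():
        matches = [lang for lang, words in LEXICON.items() if token in words]
        # Tokens found in no list, or in more than one, are left unresolved.
        tagged.append((token, matches[0] if len(matches) == 1 else "unk"))
    return tagged


def switch_points(tagged):
    """Return token indices where the language differs from the last known one."""
    points = []
    previous = None
    for index, (_, lang) in enumerate(tagged):
        if lang == "unk":
            continue  # unresolved tokens do not count as a switch
        if previous is not None and lang != previous:
            points.append(index)
        previous = lang
    return points
```

Names that appear in neither word list, such as an artist's name embedded in a query, come back as `unk`, which mirrors the difficulty that a monolingual service has with embedded named entities.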

Nearly 20% of people in the United States and 56% in Europe consider themselves multilingual. Worldwide, 43% of people describe themselves as bilingual and 13% as trilingual; as of 2018, only 40% are monolingual. In recent years, there has been extensive research on cognitive services, language detection, and monolingual translation; however, as globalization adds increasing numbers of multilingual users, the topic of multilingual cognitive services is becoming more prominent, with its own technical challenges, methodologies, and user needs. This workshop aims to gather data science and machine learning researchers from many related areas to discuss how to meet these challenges and needs with new data mining approaches.

For example, there are many different brands of home assistants in different countries. However, when they are used by multilingual speakers, failures of natural language recognition by cognitive services can greatly diminish their accessibility and usability, to the point that they become less practical for their primary purpose (speech-based functions) than mobile devices and applications. When multilingual speakers ask for music by their favorite artists or search for information on notable people, places, and things, they are often unable to use native personal and place names, or local terms, because a regionalized cognitive service may treat these embedded named entities as foreign phrases. The crucial issue is that most cognitive services are regionalized on the assumption that their users are monolingual, and this assumption is part of the problem for the large and growing body of multilingual users.

Therefore, we seek to bring together researchers from different fields of data mining, including transdisciplinary and interdisciplinary data scientists, to discuss their innovations, views, and visions regarding cutting-edge cognitive services technology.

Active research areas that are related to cognitive services include:

Data mining and computational linguistics in multilingual domains
Multimodal data science, especially video (dialogues, speechreading)
Machine learning using multilingual natural language data, including text/transcripts
Multilingual speech recognition/prediction with deep learning/artificial neural nets
Human-centered computing, including cognitive models and user modeling
Home assistants and other dialogue agents
Machine translation
Human-robot interaction (HRI) and human-computer interaction (HCI)
Usability of interactive services: how to respond to multilingual queries and dialogue
User adaptation and personalization
Understanding emotions in user context: home/work, friends/strangers, online/in person

The workshop will emphasize approaches based on the above methodologies.

Multilingual Cognitive Services Workshop 2020
