China ASEAN Language Intelligence Institute independently builds a multilingual corpus

2025-09-19 08:47:50 science and technology

In recent years, with the rapid development of artificial intelligence and natural language processing, building multilingual corpora has become an important foundation for cross-language communication and technological innovation. The China ASEAN Research Institute of Language Intelligence (hereinafter "the Institute") recently announced that it has built a multilingual corpus covering the main languages of the ten ASEAN countries. The corpus is intended to promote language interoperability, cultural exchange, and cooperation on intelligent technologies between China and ASEAN countries.

The corpus not only fills a gap in China's multilingual language resources but also provides high-quality data for artificial intelligence applications such as machine translation, speech recognition, and text analysis. Its main features and data are summarized below:

| Language | Size (units of 100 million words) | Domains covered | Data sources |
|---|---|---|---|
| Chinese | 50 | News, law, science, literature | Public publications, government documents |
| Thai | 12 | Social media, news, travel | Web crawling, partner institutions |
| Vietnamese | 10 | Economics, culture, education | Academic papers, news media |
| Malay | 8 | Business, law, everyday conversation | Corporate partners, translation agencies |
| Indonesian | 8 | News, social media, film and television | Public datasets, web crawling |
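
For a sense of scale: the sizes in the table are given in units of 100 million words, so the five listed languages add up to 88 units, or about 8.8 billion words, of which Chinese makes up roughly 57%. The short Python sketch below simply reproduces that arithmetic from the table's figures. The names `corpus_size`, `UNIT_WORDS`, and `summarize` are illustrative, and the total covers only the languages the table lists, not necessarily the full corpus the Institute describes.

```python
# Corpus sizes copied from the table, in units of 100 million words.
corpus_size = {
    "Chinese": 50,
    "Thai": 12,
    "Vietnamese": 10,
    "Malay": 8,
    "Indonesian": 8,
}

UNIT_WORDS = 100_000_000  # one table unit = 100 million words


def summarize(sizes):
    """Return the total size in words and each language's share of that total."""
    total_units = sum(sizes.values())
    shares = {lang: units / total_units for lang, units in sizes.items()}
    return total_units * UNIT_WORDS, shares


total_words, shares = summarize(corpus_size)
print(f"Total: {total_words:,} words")  # Total: 8,800,000,000 words
for lang, share in shares.items():
    print(f"{lang}: {share:.1%}")
```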

Corpus application scenarios

The corpus provides foundational support for applications in several fields, including:

1. Machine translation: Using high-quality multilingual parallel corpora, the Institute has trained a translation model that supports language pairs such as Chinese-English, Chinese-Thai, and Chinese-Vietnamese, with significantly improved translation accuracy.

2. Speech recognition: The speech data in the corpus provides training material for speech recognition systems in ASEAN countries, supporting applications such as intelligent voice assistants and customer service systems.

3. Cross-language information retrieval: Users can search for content in ASEAN languages using Chinese keywords, which makes academic research and business information gathering much easier.

4. Cultural communication and research: The literature, film, and television content in the corpus gives cultural scholars rich material for analysis and promotes cultural exchange between China and ASEAN countries.
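
The article does not say how the Institute's cross-language search (item 3) works internally. One common, simple approach is dictionary-based query translation: each Chinese keyword is mapped to its target-language equivalents, and documents are ranked by how many translated terms they contain. The Python sketch below illustrates only that general idea. The term dictionary `ZH_TO_VI`, the sample documents, and the functions `translate_query` and `search` are hypothetical; production systems typically rely on learned translation models or multilingual embeddings rather than exact substring matching.

```python
import unicodedata

# Hypothetical Chinese-to-Vietnamese term dictionary (illustrative only).
ZH_TO_VI = {
    "经济": ["kinh tế"],   # economy
    "教育": ["giáo dục"],  # education
    "文化": ["văn hóa"],   # culture
}


def normalize(text):
    """NFC-normalize and lowercase so Vietnamese diacritics compare consistently."""
    return unicodedata.normalize("NFC", text).lower()


def translate_query(zh_terms, dictionary):
    """Expand each Chinese keyword into its translations; unknown terms are dropped."""
    vi_terms = []
    for term in zh_terms:
        vi_terms.extend(dictionary.get(term, []))
    return vi_terms


def search(zh_terms, documents, dictionary):
    """Return (doc_id, score) pairs, where score counts matched translated terms."""
    vi_terms = [normalize(t) for t in translate_query(zh_terms, dictionary)]
    results = []
    for doc_id, text in documents.items():
        body = normalize(text)
        score = sum(1 for t in vi_terms if t in body)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


documents = {
    "vi-001": "Hợp tác kinh tế và giáo dục giữa các nước ASEAN.",
    "vi-002": "Lễ hội văn hóa truyền thống ở Hà Nội.",
    "vi-003": "Dự báo thời tiết cuối tuần.",
}

print(search(["经济", "教育"], documents, ZH_TO_VI))  # [('vi-001', 2)]
```

Keywords missing from the dictionary are silently ignored here. A real system would need a fallback, such as machine-translating the query, to avoid losing recall.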

Future planning

The Institute said it will further expand the size and language coverage of the corpus, with plans to add less widely spoken ASEAN languages such as Burmese and Khmer (Cambodian). It will also work with academic institutions and companies in ASEAN countries to promote open sharing of the corpus and contribute to global research on language intelligence.

Building this multilingual corpus is an important achievement for the Institute and provides strong support for language interoperability and technical cooperation under the "Belt and Road" initiative. As artificial intelligence technology continues to advance, multilingual corpora are likely to find an even wider range of applications.
