Frontier Technology Tutorial 1: Knowledge Graphs in Practice
Speaker:
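Abstract: A knowledge graph is a family of methods for processing structured data, covering techniques for knowledge extraction, representation, storage, and retrieval. Historically, it is a convergence of several technologies: knowledge representation and reasoning, databases, information retrieval, and natural language processing. In real engineering applications, especially Internet applications, traditional knowledge-processing methods face many practical constraints: high implementation cost, long development cycles, a shortage of engineers familiar with these technologies, and insufficient base data. Knowledge graphs in practice need to make full use of mature industrial technology, without being tied to particular tools or methods and, in particular, without blindly pursuing standardization, technical sophistication, or novelty. The work should start from actual business needs and proceed step by step. In this tutorial, we first review how knowledge graphs moved from concept to engineering practice and sort out how each technology relates to knowledge graph applications. We then use concrete engineering examples to show how the core ideas and techniques of knowledge graphs can be implemented under cost constraints, including structured data generation, maintainable knowledge structures, database management of massive knowledge, and multi-level semantic retrieval. Finally, we discuss how knowledge graphs can be combined with statistical and machine learning methods to solve practical problems in applications such as search and automatic question answering.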
Frontier Technology Tutorial 2: Testing and Assessing the Quality of Knowledge Graph
Speaker:
Slides download: http://pan.baidu.com/s/1i5ntO8l
Abstract: In this tutorial, we will introduce the notions of quality control for constructing and reusing knowledge graphs. First, we will introduce a test-driven approach to schema construction for knowledge graphs, leveraging the ideas of competency questions and test-driven software development. We will show some typical patterns of competency questions and illustrate how to use them to construct authoring tests for knowledge graphs. Second, we will introduce a data quality model, a data quality assessment model, and methods of data quality assessment. We will illustrate the role of data quality evaluation in big data trading.
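As a rough illustration of the test-driven idea in this abstract, the sketch below treats one competency question as an executable check over a toy graph of triples. The triples, the `ask` helper, and the question itself are all hypothetical and are not taken from the tutorial. A real project would more likely express such questions as queries against a triple store.

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
# All data here is invented for illustration.
TRIPLES = {
    ("Beijing", "type", "City"),
    ("Beijing", "capitalOf", "China"),
    ("China", "type", "Country"),
}

def ask(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(graph)
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

def check_every_country_has_a_capital(graph):
    """Competency question: 'What is the capital of each country?'

    Returns the countries for which the graph cannot answer the question.
    An empty list means the authoring test passes.
    """
    countries = [s for (s, _, _) in ask(graph, predicate="type", obj="Country")]
    return [c for c in countries if not ask(graph, predicate="capitalOf", obj=c)]

missing = check_every_country_has_a_capital(TRIPLES)
print("countries without a capital:", missing)
```

Adding a country without a `capitalOf` triple makes the check report it, which is the point of writing the question down as a test before extending the schema or data.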
Frontier Technology Tutorial 3: Understanding Short Texts
Speakers: Dr. Haixun Wang (王海勋), Dr. Zhongyuan Wang (王仲远)
Slides download: http://pan.baidu.com/s/1kVDEjER
Abstract: Billions of short texts are produced every day, in the form of search queries, ad keywords, tags, tweets, messenger conversations, social network posts, etc. Unlike documents, short texts have some unique characteristics that make them difficult to handle. First, short texts, especially search queries, do not always observe the syntax of a written language. This means traditional NLP techniques, such as syntactic parsing, do not always apply to short texts. Second, short texts contain limited context. The majority of search queries contain fewer than 5 words, and tweets can have no more than 140 characters. For these reasons, short texts give rise to a significant amount of ambiguity, which makes them extremely difficult to handle. On the other hand, many applications, including search engines, automatic question answering, online advertising, and recommendation systems, rely on short text understanding. In all these applications, the necessary first step is to transform an input text into a machine-interpretable representation, namely to “understand” the short text. A growing number of approaches leverage external knowledge to compensate for the inadequate contextual information that accompanies short texts. These approaches can be classified into two categories: Explicit Representation Model (ERM) and Implicit Representation Model (IRM). In this tutorial, we will present a comprehensive overview of short text understanding based on explicit semantics (knowledge graph representation, acquisition, and reasoning) and implicit semantics (embedding and deep learning).
Specifically, we will go over various techniques in knowledge acquisition, representation, and inferencing that have been proposed for text understanding, and we will describe the massive structured and semi-structured data made available in the past decade that directly or indirectly encode human knowledge, turning the knowledge representation problem into a computational grand challenge with feasible solutions in sight.
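To make the explicit-representation idea more concrete, here is a minimal sketch that maps a short text to a vector of human-readable concepts and uses neighboring terms as context for disambiguation. The `IS_A` table, its probabilities, and the multiplicative scoring are assumptions made for illustration only. They are not the models presented in the tutorial.

```python
# Hypothetical is-a knowledge, P(concept | term); the numbers are invented.
IS_A = {
    "apple": {"food": 0.6, "company": 0.4},
    "microsoft": {"company": 1.0},
    "pie": {"food": 1.0},
}

SMOOTHING = 0.01  # keeps a concept alive when one term does not support it

def conceptualize(text):
    """Map a short text to a normalized concept vector (explicit representation)."""
    terms = [t for t in text.lower().split() if t in IS_A]
    concepts = {c for t in terms for c in IS_A[t]}
    scores = {}
    for c in concepts:
        score = 1.0
        for t in terms:
            # Naive multiplicative combination: every known term votes.
            score *= IS_A[t].get(c, 0.0) + SMOOTHING
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}

def top_concept(text):
    """Return the highest-scoring concept, or None if no term is known."""
    vector = conceptualize(text)
    return max(vector, key=vector.get) if vector else None

for query in ["apple", "apple microsoft", "apple pie"]:
    print(query, "->", top_concept(query))
```

In this toy setup the ambiguous term "apple" leans toward a different concept depending on the term next to it, which is the kind of context an external knowledge source can supply when the text itself is too short.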
Frontier Technology Tutorial 4: Summarization and Integration of Knowledge Graphs
Speaker:
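Abstract: Big data is regarded as another peak of the information revolution, following informatization and the Internet, yet turning big data into knowledge remains a major challenge. Knowledge graphs aim to describe the entities and concepts that exist in the real world and the relations among them. They are the cornerstone of semantic data linking and help advance fields such as natural language understanding and data mining. However, the large scale and heterogeneity of knowledge graphs pose challenges for applications built on them. This talk first introduces the basics of knowledge graphs, linked data, and ontologies. It then introduces knowledge graph summarization techniques, focusing on recent research progress in entity description summarization and entity association summarization. Finally, it introduces knowledge graph integration techniques, focusing on recent research progress in ontology matching and entity linking.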