于戈, 吕建华†, 王国仁
Journal of Software (软件学报), 2003, 14(9): 1615-1620
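Path expressions are the core of XML query languages. Many methods for evaluating them have been studied, but comparatively little work addresses optimizing the path expressions themselves. This paper proposes two optimization strategies for path expressions, path shortening and complementary paths, to improve the efficiency of XML path queries. The path-shortening strategy uses the schema information of the XML document to reduce the length of the path expression, simplifying the query itself and so lowering its cost. The complementary-path strategy tries to replace the original query with an equivalent path expression that is cheaper to evaluate. Analysis of the experimental data shows that the two strategies apply to the great majority of path expression queries and can substantially improve their query performance.
XML, path expression, query processing, query cost, query optimization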
于戈, Ge Yu, Xiaoguang Li, Yubin Bao, and Daling Wang
CICLing 2005, LNCS 3406, pp. 593-603, 2005.
Evaluating document-to-document relevance is important to many advanced applications such as IR, text mining, and natural language processing. Because users' uncertainty makes document relevance very hard to define mathematically, the concept of topical relevance is widely accepted in most research fields. It suggests that a document relevance model should explain whether the document representation describes the document's topical content and whether the matching method reveals the topical differences among documents. However, current document-to-document relevance models, such as the vector space model and string distance, do not explicitly emphasize topical relevance. This paper uses a document language model to represent a document's topical content, explains why the model can reveal document topics, and then establishes two distributional similarity measures based on the document language model to evaluate document-to-document relevance. An experiment on the TREC test collection compares the approach with the vector space model, and the results show that the Kullback-Leibler divergence measure with Jelinek-Mercer smoothing significantly outperforms the vector space model.
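The measure named in the abstract, Kullback-Leibler divergence between Jelinek-Mercer smoothed document language models, can be illustrated with a minimal sketch. The function names (`jm_model`, `kl_divergence`), the smoothing weight `lam=0.5`, and the toy documents are assumptions made for illustration; they are not taken from the paper, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def jm_model(doc_tokens, collection_counts, lam=0.5):
    """Return a function w -> P(w | d) with Jelinek-Mercer smoothing.

    P(w | d) = (1 - lam) * count(w, d) / |d| + lam * count(w, C) / |C|
    The weight lam is a hypothetical default, not a value from the paper.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    collection_len = sum(collection_counts.values())

    def prob(word):
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_collection = collection_counts[word] / collection_len
        return (1 - lam) * p_doc + lam * p_collection

    return prob


def kl_divergence(p, q, vocabulary):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w))."""
    return sum(p(w) * math.log(p(w) / q(w)) for w in vocabulary if p(w) > 0)


# Toy collection (made-up data, for illustration only).
docs = {
    "a": "xml query path xml".split(),
    "b": "xml query index".split(),
    "c": "image threshold pixel".split(),
}
collection_counts = Counter(tok for tokens in docs.values() for tok in tokens)
vocabulary = list(collection_counts)
models = {name: jm_model(tokens, collection_counts) for name, tokens in docs.items()}

# Lower divergence means the two documents' word distributions are closer.
kl_ab = kl_divergence(models["a"], models["b"], vocabulary)
kl_ac = kl_divergence(models["a"], models["c"], vocabulary)
```

Smoothing with the collection model keeps every vocabulary word at a nonzero probability, so the logarithm is always defined. KL divergence is asymmetric, so `kl_divergence(p, q, ...)` and `kl_divergence(q, p, ...)` generally differ.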
[Journal paper] MMPClust: A Skew Prevention Algorithm for Model-Based Document Clustering
于戈, Xiaoguang Li, Ge Yu, and Daling Wang
DASFAA 2005, LNCS 3453, pp. 536-547, 2005.
Model-based clustering is an intuitive choice for document clustering because it supports very high dimensionality. However, current model-based algorithms are prone to generating skewed clusters, which seriously degrade clustering quality. This paper examines the causes of skew and identifies them as an inappropriate initial model, a poorly fitting cluster model, and the interaction between the decentralization of estimation samples and an over-generalized cluster model. It proposes a skew-prevention document clustering algorithm, MMPClust, which has two features: (1) a content-based cluster model is used to model each cluster better; (2) at the re-estimation step, the subset of documents most relevant to each cluster's corresponding class is selected automatically as that cluster's estimation samples, to break this interaction. MMPClust has fewer restrictions and wider applicability in document clustering than previous methods.
[Journal paper] Efficiently Mapping Integrity Constraints from Relational Database to XML Document
于戈, Xiaochun Yang, Ge Yu, and Guoren Wang
ADBIS 2001, LNCS 2151, pp. 338-351, 2001.
XML is rapidly emerging as the dominant standard for exchanging data on the WWW. Most application data is stored in relational databases because of their popularity and the rich development experience built up around them. How to map relational data to XML documents properly has therefore become an important topic. Integrity constraints are useful for semantic specification and play an important role in relational schema definition. The existing XML schema language defines neither general constraints nor a method for maintaining integrity constraints. How to use XML to express and maintain integrity constraints, especially advanced ones such as the general constraints of relational data, is thus a challenging research issue. This paper proposes a novel approach that maps relational data to an XML document with active nodes (XMLA) and an extended DTD with constraints (DTDC). The ability to maintain integrity constraints makes the approach more effective than other approaches.
[Journal paper] An Efficient Iterative Optimization Algorithm for Image Thresholding
于戈, Liju Dong, and Ge Yu
CIS 2004, LNCS 3314, pp. 1079-1085, 2004.
Image thresholding is one of the main techniques for image segmentation. It has many applications in pattern recognition, computer vision, and image and video understanding. This paper formulates thresholding as an optimization problem: finding the thresholds that minimize a weighted sum-of-squared-error function. A fast iterative optimization algorithm is presented to reach this goal. The algorithm is compared with a classic, widely used thresholding approach. Both theoretical analysis and experiments show that the two approaches are equivalent. However, this formulation of the problem leads to a much more efficient algorithm, which has more applications, especially in real-time video surveillance and tracking systems.
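The abstract does not spell out the iteration, so the following is only a sketch of one well-known way to reduce a sum-of-squared-error criterion for a single threshold: a two-class k-means style update on the gray-level histogram. It is a stand-in technique, not the authors' algorithm, and the function name `iterative_threshold`, its parameters, and the toy histogram are hypothetical.

```python
def iterative_threshold(hist, max_iter=100, tol=1e-9):
    """Pick one threshold by alternating two steps on a gray-level histogram.

    hist[g] is the number of pixels with gray level g. Pixels with
    g <= t form class 0, the rest form class 1. Each pass recomputes the
    two class means and moves t to their midpoint, which never increases
    the within-class sum of squared errors.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")

    # Start from the global mean gray level.
    t = sum(g * n for g, n in enumerate(hist)) / total

    for _ in range(max_iter):
        n0 = sum(n for g, n in enumerate(hist) if g <= t)
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            break  # all pixels fall in one class; nothing left to split
        mean0 = sum(g * n for g, n in enumerate(hist) if g <= t) / n0
        mean1 = sum(g * n for g, n in enumerate(hist) if g > t) / n1
        new_t = (mean0 + mean1) / 2
        if abs(new_t - t) < tol:
            break  # the partition no longer changes
        t = new_t
    return t


# Toy histogram (made-up data): dark pixels at levels 10 and 20,
# bright pixels at level 200.
toy_hist = [0] * 256
toy_hist[10] = 10
toy_hist[20] = 30
toy_hist[200] = 20
toy_t = iterative_threshold(toy_hist)
```

Working on the histogram rather than on individual pixels keeps each pass proportional to the number of gray levels, which is one reason iterations of this kind suit real-time use.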