Inventi Impact: Computer Networks & Communications

Articles

Inventi:ecnc/102579/25

SNMatch: An Unsupervised Method for Column Semantic-Type Detection Based on Siamese Network

01-Jul-2025 Research 2025 : July-September

Tiezheng Nie, Hanyu Mao, Aolin Liu, Xuliang Wang, Derong Shen, Yue Kou

Column semantic-type detection is a crucial task for data integration and schema matching, particularly when dealing with large volumes of unlabeled tabular data. Existing methods often rely on supervised learning models, which require extensive labeled data. In this paper, we propose SNMatch, an unsupervised approach based on a Siamese network that detects column semantic types without labeled training data. Its novelty lies in generating a semantic embedding for each column from both format and semantic features, then clustering the embeddings into semantic types. Unlike traditional methods, which typically rely on keyword matching or supervised classification, SNMatch is designed for massive datasets in which labeled examples are limited. Extensive experiments on the MACST and VizNet-Manyeyes datasets show that SNMatch significantly outperforms state-of-the-art methods such as TF-IDF, FastText, and BERT in clustering accuracy, especially on complex and nested semantic types. The method is promising for practical applications in data integration, data cleaning, and automated schema mapping, particularly where labeled data are scarce or unavailable. Our work builds on recent advances in neural network-based embeddings and unsupervised learning, contributing to research on automatic schema matching and tabular data understanding.
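The embed-then-cluster idea summarized in the abstract can be illustrated with a small Python sketch. This is not SNMatch: the Siamese network is replaced here by hand-built count features (a character-shape pattern for format, lowercase word tokens for semantics), and clustering is a simple greedy pass over cosine similarities. The function names, the 0.5 threshold, and the sample columns are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def format_pattern(value):
    """Collapse a cell into a shape string, e.g. '2021-03-04' -> '9-9-9'."""
    shape = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"[0-9]+", "9", shape)


def column_vector(values):
    """Sparse vector mixing format ('fmt:') and token ('tok:') counts.

    Assumption: a stand-in for a learned column embedding.
    """
    feats = Counter()
    for v in values:
        feats["fmt:" + format_pattern(v)] += 1
        for tok in re.findall(r"[a-z]+", v.lower()):
            feats["tok:" + tok] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_columns(columns, threshold=0.5):
    """Greedy clustering: a column joins the first cluster whose seed
    column is at least `threshold` similar, else it starts a new cluster."""
    vectors = {name: column_vector(vals) for name, vals in columns.items()}
    clusters = []  # each cluster is a list of column names; index 0 is the seed
    for name in columns:
        for cluster in clusters:
            if cosine(vectors[name], vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical unlabeled columns: two date-like, two city-like.
columns = {
    "dob": ["1990-01-02", "1985-11-30", "2001-07-15"],
    "signup_date": ["2020-05-06", "2021-12-01", "2019-03-22"],
    "city": ["Paris", "Berlin", "Madrid"],
    "hometown": ["Berlin", "Rome", "Paris"],
}
clusters = cluster_columns(columns)
print(clusters)  # [['dob', 'signup_date'], ['city', 'hometown']]
```

In this toy setting the date columns group together purely on format, while the city columns share both a format shape and some tokens. A learned embedding would be expected to generalize where such hand-built features do not, for example to city columns with no overlapping values.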

How to Cite this Article
Attribution / CC-Compliant Citation: Nie, Tiezheng, et al. "SNMatch: An Unsupervised Method for Column Semantic-Type Detection Based on Siamese Network." Mathematics 13.4 (2025): 607. https://doi.org/10.3390/math13040607. License: https://creativecommons.org/licenses/by/4.0/. Some formatting elements, headers, footers, logos, dates, and pagination were modified while adapting this article.
