| 30 |
leSAX |
Publication
(ICDE 2025)
|
Time series similarity search is a fundamental task across various applications, including classification, motif discovery, and anomaly detection. However, existing iSAX-based indexing methods, while known for their efficiency, often rely on hand-crafted techniques (e.g., PAA and SAX) for z-normalized time series data. These techniques do not fully exploit the representation space and pose challenges to indexing. This software implements a novel learned index for facilitating time series similarity search. |
Code |
| 29 |
DARKER |
Publication
(VLDB 2024)
|
Transformer-based models have facilitated numerous applications with superior performance. A key challenge in transformers is the quadratic dependency of their training time complexity on the length of the input sequence. This software implements an efficient transformer with a novel data-driven kernel-based attention mechanism for time series data. |
Code |
| 28 |
DKWS |
Publication
(TKDE 2024)
|
This software implements a novel distributed keyword search framework on graphs. |
Code |
| 27 |
Temporal JSON Keyword Search |
Publication
(SIGMOD 2024)
|
This software implements temporal keyword search features on JSON documents. It showcases the support for temporal features at a modest cost. |
Code |
| 26 |
Prilo |
Publication
(SIGMOD 2023)
|
This software implements a privacy-preserving query service for localized graph pattern queries that enables users to privately obtain the query results. |
Code |
| 25 |
Plug-and-Play SQL |
Publication
(ER 2023)
|
This software implements a conceptual model for a database query’s input type. The input type is the shape of the data needed by a query. Pairing a conceptual model with a query creates a plug-and-play query that can be type matched to a database’s schema to determine whether the query can be safely evaluated. The software showcases the portability, ease-of-use, and type safety of plug-and-play queries. |
Code |
| 24 |
IPS |
Publication
(ICDE 2022)
|
Time series shapelets (shapelets) are discriminative subsequences that have recently been found both effective and interpretable for solving time series classification problems. However, shapelet discovery is known to be computationally costly. IPS addresses this problem by using the instance profile (IP) to capture the characteristics of shapelets robustly, so that high-quality shapelets can be discovered efficiently. |
Code |
| 23 |
MIDAS |
Publication
(ACM SIGMOD 2021)
|
This software is built on top of CATAPULT and enables efficient and effective maintenance of canned patterns of a visual graph query interface as the underlying collection of small- or medium-sized data graphs evolves. Specifically, MIDAS adopts a selective maintenance strategy that guarantees progressive gain of coverage of the patterns without sacrificing diversity and cognitive load. |
Download |
| 22 |
SSA |
Publication
(ICDE 2021)
|
This software implements privacy-preserving query services for strong simulation queries in the database outsourcing paradigm. In such a paradigm, clients send their queries to a third-party service provider (SP), which holds the outsourced large graph data and computes the query answers. However, as the SP may not always be trusted, the sensitive information in the clients’ queries, most importantly the query structures, should be protected. This software adopts strong simulation as a practical query semantics for this paradigm. |
Download |
| 21 |
ShapeNet |
Publication
(AAAI 2021)
|
This software implements a novel algorithm called ShapeNet, which embeds shapelet candidates of different lengths into a unified space for shapelet selection. The network is trained using our cluster-wise triplet loss, which considers the distance between the anchor and multiple positive (negative) samples as well as the distance among positive (negative) samples. It then computes representative and diversified final shapelets, rather than directly using all the embeddings for model building, to avoid a large fraction of non-discriminative shapelet candidates. A classical classifier (e.g., SVM) is then adopted. |
Download |
| 20 |
BSPCover |
Publication
(IEEE TKDE 2022)
|
Time-series shapelets are discriminative subsequences, recently found effective for time series classification (TSC). It is evident that the quality of shapelets is crucial to the accuracy of TSC. However, the majority of research has focused on building accurate models from some shapelet candidates. This software implements a novel, efficient shapelet discovery method, called BSPCOVER, to discover a set of high-quality shapelet candidates for model building. |
Download |
| 19 |
PANE |
Publication
(VLDB 2021)
|
Given a graph where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node to a compact vector, which can be used in downstream machine learning tasks. PANE is an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of common prediction tasks. |
Download |
| 18 |
AURORA |
Publication
(SIGMOD 2020)
|
AURORA is a plug-and-play visual subgraph query interface (VQI) for a large collection of small- or medium-sized data graphs that constructs the query interface in a data-driven manner. One can simply install it on top of any such graph database and use it to generate a data-specific VQI to facilitate top-down and bottom-up visual subgraph query formulation. |
Download |
| 17 |
FERRARI |
Publication
(VLDB J 2020, ICDE 2019)
|
This software implements a novel visual exploratory subgraph search paradigm on a large collection of small- or medium-sized data graphs. A preliminary version of the software was demonstrated at VLDB 2017. |
Download |
| 16 |
G-CARE |
Publication
(SIGMOD 2020)
|
This software realizes the world's first framework for benchmarking graph cardinality estimation techniques for subgraph matching queries. |
Download |
| 15 |
LATTE |
Publication
(SIGMOD 2020)
|
This software is a user-friendly visual interface for constructing Solidity smart contracts. It is targeted at end users who do not have programming skills or a background in Solidity. The system can also serve expert users, who can generate the initial code using LATTE and then adapt it to their needs. |
Download |
| 14 |
NRP |
Publication
(VLDB 2020)
|
Homogeneous network embedding (HNE) maps the graph structure in the vicinity of a node to a compact, fixed-dimensional feature vector. This software focuses on HNE for massive graphs, e.g., with billions of edges. At this scale, most existing approaches fail, as they either incur prohibitively high costs or severely compromise result utility. Our proposed solution, called Node-Reweighted PageRank (NRP), is based on the classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. |
Download |
| 13 |
PPKWS |
Publication
(IEEE ICDE 2020)
|
This software implements a new keyword search framework, called public-private keyword search (PPKWS), on public-private graph models. PPKWS consists of three major steps: partial evaluation, answer refinement, and answer completion. |
Download |
| 12 |
BigIndex |
Publication
(TKDE 2020)
|
This software implements a generic ontology-based indexing framework for keyword search on graphs. |
Download |
| 11 |
FROST |
Publication
(ACM TIST 2020)
|
The facility relocation (FR) problem, which aims to optimize the placement of facilities to accommodate changes in users’ locations, has a broad spectrum of applications. Despite the significant progress made by existing solutions to the FR problem, they all assume that each user is stationary and represented as a single point. Unfortunately, in reality, objects (e.g., people, animals) are mobile. Consequently, these efforts may fail to identify superior solutions to the FR problem. For the first time, this software takes the movement history of users into account to address this limitation. |
Download |
| 10 |
CATAPULT |
Publication
(ACM SIGMOD 2019)
|
This software automatically selects canned patterns for a visual graph query interface designed for a large collection of small- or medium-sized data graphs (e.g., chemical compounds). Given a data graph collection and a pattern budget, it automatically selects the canned patterns to be displayed on a GUI by optimizing coverage, diversity, and cognitive load of the patterns in the underlying data repository. CATAPULT is a core component for realizing plug-and-play visual graph query interfaces. |
Download |
| 9 |
TEA/TEA+ |
Publication
(ACM SIGMOD 2019)
|
This software implements two novel local graph clustering algorithms based on Heat Kernel PageRank (HKPR) to address the efficiency and accuracy limitations of existing local clustering techniques. Specifically, these algorithms provide non-trivial theoretical guarantees on the relative error of HKPR values and on time complexity. The basic idea is to use deterministic graph traversal to produce a rough estimate of the exact HKPR vector, and then exploit Monte Carlo random walks to refine the results in an optimized and non-trivial way. |
Download |
| 8 |
PANDA |
Publication
(VLDB J 2017, VLDB 2018)
|
This software implements a novel graph querying paradigm called partial topology-based network search and a query processing system called PANDA to efficiently find top-k matches of a partial topology query (PTQ) on a single machine. A PTQ is a disconnected query graph containing multiple connected query components. PTQs allow an end user to formulate queries without demanding precise information about the complete topology of a query graph. |
Download |
| 7 |
AutoG |
Publication
(VLDB J 2017, VLDB 2016)
|
This software implements a novel framework for subgraph query autocompletion (called AUTOG). Given an initial query q and a user’s preference as input, AUTOG returns ranked query suggestions Q′ as output. Users may choose a query from Q′ and iteratively apply AUTOG to compose their queries. |
Download |
| 6 |
PINOCCHIO |
Publication
(TKDE 2016)
|
The location selection (LS) problem, which aims to mine the optimal location from a set of candidates to place a new facility such that a score (i.e., benefit or influence on some given objects) is maximized, has drawn significant research attention in recent years. State-of-the-art LS techniques assume each object is static and can only be influenced by a single facility. However, in reality, objects (e.g., people, vehicles) are mobile and are influenced by multiple facilities, which prevents classical LS solutions from selecting accurate results. This software takes mobility and probability factors into consideration to address these limitations. Specifically, given a set of candidate locations, it aims to mine the optimal location, i.e., the one that influences the largest number of moving objects. |
Download |
| 5 |
DUALSIM |
Publication
(SIGMOD 2016)
|
Subgraph enumeration is important for many applications such as subgraph frequencies, network motif discovery, graphlet kernel computation, and studying the evolution of social networks. Recent efforts to enumerate all subgraphs in a large-scale graph have had some success by partitioning the data graph and exploiting distributed frameworks such as MapReduce and distributed graph engines. However, all existing distributed approaches have serious performance problems for subgraph enumeration due to the explosive number of partial results. DUALSIM is a disk-based, single-machine parallel subgraph enumeration solution that can handle massive graphs without maintaining exponential numbers of partial results. Specifically, it implements a novel concept, the dual approach for subgraph enumeration, which swaps the roles of the data graph and the query graph. DUALSIM outperforms the state-of-the-art methods by up to orders of magnitude, and those methods fail for many queries due to explosive intermediate results. |
Download |
| 4 |
Structure-Preserving Query Service |
Publication
(ICDE 2015, TKDE 2015)
|
This software implements the first practical private approach for subgraph query services, asymmetric structure-preserving subgraph query processing, where the data graph is publicly known and the query structure/topology is kept secret. Such query services are useful when the query computation is outsourced to a third-party service provider. |
Download |
| 3 |
ASTERIX |
Publication
(SIGIR 2017, SIGMOD 2013)
|
Existing XML keyword search (XKS) engines primarily suffer from two limitations. First, although the smallest lowest common ancestor (SLCA) algorithm (or a variant, e.g., ELCA) is widely accepted as a meaningful way to identify subtrees containing the query keywords, SLCA typically performs poorly on documents with missing elements, i.e., (sub)elements that are optional, or appear in some instances of an element type but not all. Second, since keyword search can be ambiguous with multiple possible interpretations, it is desirable for an XKS engine to automatically expand the original query by providing a classification of different possible interpretations of the query w.r.t. the original results. However, existing XKS systems do not support such result-based query expansion. ASTERIX is an innovative XKS engine that addresses these limitations. |
Download |
| 2 |
Generalized Subgraph Search |
Publication
(CIKM 2012)
|
This software implements a new type of graph query that injectively maps its edges to paths of the graphs in a given database, where the length of each path is constrained by a threshold specified by the weight of the corresponding matching edge. |
Download |
| 1 |
MustBlend |
Publication
(DASFAA 2013, ICDE 2009, ICDE 2006)
|
MUSTBLEND (MUlti-Source Twig BLENDer) is a novel visual XML querying paradigm where the visual query formulation and processing are interleaved. A key practical feature of MUSTBLEND is its portability, as it does not employ any special-purpose storage, indexing, or query cost estimation schemes. |
Download |