1 Star 0 Fork 4

寒水映月 / short_url

forked from philosophy / short_url 
加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
Apache-2.0

从哪里跌倒就从哪里爬起来

短链接系统的设计与实现

目录

1:什么是短链接

2:为什么需要短链接

3:系统的设计与目标

4:系统API的设计

5:数据库设计

6:系统的实现

7:结语

1:什么是短链接

短链接顾名思义,就是一个比较短的链接,我们平时看到的链接可能由于一些其他的问题导致我们的链接比较长,可能长这样:

https://www.baidu.com/s?wd=%E4%B8%8A%E6%B5%B7%E8%87%B4%E5%85%A8%E5%B8%82%E4%BA%BA%E6%B0%91%E7%9A%84%E6%84%9F%E8%B0%A2%E4%BF%A1&tn=baidutop10&rsv_idx=2&ie=utf-8&rsv_pq=abb53d500004deb0&oq=%E7%88%86%E6%BB%A1!%E4%B8%8A%E6%B5%B7%E5%B0%8F%E5%90%83%E5%BA%97%E9%81%87%E8%BF%9E%E5%A4%9C%E6%8A%A5%E5%A4%8D%E6%80%A7%E6%B6%88%E8%B4%B9&rsv_t=46b1xbgK2MjS8qj9hu%2B06QusLsL5QfltAGjtfKUgayO%2BvSC0MajShrNGroi3h1s5eQ&rqid=abb53d500004deb0&rsf=33e47e97990129c1c0bb42a6c85e13b9_1_15_1&rsv_dl=0_right_fyb_pchot_20811&sa=0_right_fyb_pchot_20811

如果我们需要将某个链接发在某个文章或者推广给别人的时候,这么长的链接不仅会给人一种喧宾夺主的感觉,而且影响我们整篇文章的排版和美感,而短链接的出现就是用一个很短的URL来替代这个很长的链接,当用户访问短链接的时候,会重定向到原来的链接。比如缩短之后的链接下面这样:shorturl.at/fnDQS,我们也可能会注意到一些商业推广的短信都会转成一些很短的链接的形式。

2:为什么需要短链接

URL短链接用于为长URL创建较短的别名,我们称这些缩短的别名为“短链接”;当用户点击这些短链接时,会被重定向到原始URL;短链接在显示、打印、发送消息时可节省大量空间。我们也可以基于这些短链接来统计链接访问量等信息。

URL缩写经常用于优化设备之间的链接,跟踪单个链接以分析受众,衡量广告活动的表现,或隐藏关联的原始URL。

如果你以前没有使用过短链接相关的系统,可以尝试使用https://www.shorturl.at创建一个短链接并访问,并花一些时间浏览一下他们的服务提供的各种选项。可以让你更好的理解短链接的应用场景。

3:系统的设计与目标

我们的短链接系统总共需要满足的目标有两方面功能性需求和非功能性需求

1:功能性需求

  • 给定一个URL,我们的服务应该为其生成一个较短且唯一的别名,这叫做短链接,此链接应足够短,以便于复制和粘贴到应用程序中;
  • 当用户访问短链接时,我们的服务应该将他们重定向到原始链接;
  • 用户即使输入的是相同的长链接,生成的短链接也应该是不相同的;
  • 链接可以在指定时间跨度之后过期,用户应该能够指定过期时间;
  • 可以统计短链接的访问次数;

2:非功能性需求

  • 系统必须高可用。
  • URL重定向的延迟情况应该足够小;
  • 短链接应该是不可猜测的。

4:系统API的设计

​ 本着系统接口易用性和扩展性的要求,我们设计接口是应考虑如下几个问题,接口职责单一性,扩展性,安全性,幂等性,可阅读性。结合以上几个问题考虑,设计如下两个API,创建短链接的API和访问短链接的API,根据职责单一性和可阅读性,我们明确的定义的接口的名称,参数列表,返回值等信息,接口调用失败返回错误信息和错误码,而且每一个接口都会有明确的使用文档。由于系统要求即使是相同的长链接,创建短链接也会得到不同的短链接,所以我们创建短链接的接口不做幂等处理,而访问短链接的接口只是做重定向和访问统计功能,天生幂等。在考虑接口的安全性,我们可以在创建短链接的API上加上接口防刷策略等,在访问短链接的API上根据短链接做布隆过滤器,短链接不存在的请求快速失败等。

1:创建短链接

1.1:API

@PostMapping
public String createUrlMapping(@RequestBody CreateUrlMappingCmd createUrlMappingCmd)

1.2:接口参数

 class CreateUrlMappingCmd {
    /**
     * 原始的长链接
     */
    private String originUrl;
}

1.3:接口返回值

成功生成短链接将返回短链接URL;否则,将返回错误代码。

2:访问短链接

2.1:API

@GetMapping("{shortUrl}")
public String redirect(@PathVariable String shortUrl) 

2.2:接口参数

创建短链接接口返回的短链接

2.3:接口返回值

接口重定向到长链接,无返回值

5:数据库设计

由于业务逻辑相对简单,所以数据持久化如果使用关系数据库的话,这里仅使用一张表就可以解决存储问题。所以这里的问题重点是我们要考虑数据量的问题。

5.1:数据表设计

create table url_mapping
(
    id              bigserial                                                           not null primary key,
    short_url       char(6)      default ''                                             not null,
    origin_url      varchar(200) default ''                                             not null,
    start_timestamp timestamp    default current_timestamp::timestamp without time zone not null,
    end_timestamp   timestamp    default current_timestamp::timestamp without time zone not null
);
create unique index url_mapping_short_url_index on url_mapping (short_url);
comment on table url_mapping is 'short url and origin url mapping';
comment on column url_mapping.id is '主键';
comment on column url_mapping.short_url is '短链接';
comment on column url_mapping.origin_url is '原始的链接';
comment on column url_mapping.start_timestamp is '映射有效开始时间';
comment on column url_mapping.end_timestamp is '映射有效结束时间';

5.2:硬盘估计

假设每秒钟200个创建短链接的请求,一条数据存储500字节,那么每年的硬盘使用量为

200 * 60 * 60 * 24 * 30 *12 * 500 / 1024 /1024 / 1024 ≈ 2896GB ≈ 3TB

所以如果使用关系数据库的话我们要考虑分库分表的设计方案比如采用一致性hash算法根据key去路由到不同的数据库去存储,这样有利于我们横向扩展;当然也可以采用mongoDB这样的文档数据库来存储扩展的时候相对容易一些。

5.3:缓存设计

考虑到系统的API响应时间问题,所以会在接口的访问的同时加上一层缓存,降低接口的响应时间,同时也避免请求直接到达数据库,导致数据库压力过大,造成服务不可用。所以为了保证缓存的高可用,我们也会在缓存这一层做集群,保证缓存的高可用。

6:系统的实现

6.1:创建短链接映射的实现

创建短链接的实现主要包括以下几个关键点

1:短链接的映射

2:数据的存储(包括数据存储数据库的路由,缓存的更新等)

3:创建的短链接要放入布隆过滤器,防止访问短链接的时候快速失败

创建短链接的映射有以下两种方式创建短链接,1:通过哈希映射的方式 2:通过Id自增映射的方式,通过Id映射的方式可以有以下几种实现,数据库自增Id、Redis自增Id,UUID。下面会一一介绍这几种实现方式。

6.1.1:哈希映射

​ 一般的哈希映射有MD5、SHA、谷歌Murmur等,由于MD5和SHA的实现考虑到加密性的问题,所以效率不如谷歌的Murmur算法高效,而且谷歌Murmur的离散性会更好一点。但是还会有以下两个问题,1:哈希冲突、2:哈希值作为短链接依然过长。

哈希冲突,既然是哈希算法,就不能避免哈希冲突,所以为了解决HASH冲突的问题我们可以在原来的URL上加上时间戳和随机数的拼接作为盐值来减少HASH冲突的问题,如果在冲突的话我们可以使用重试的方式再次HASH。

短链接过长,一般我们得到的哈希值是一个32为的十六进制数,如果使用这个值作为短链接的话可能还是太长,我们考虑url一般是字母的大小写和数据组合而成的,所以我们可以考虑使用62进制数来解决URL过长的问题,如果短链接的长度为6位的话,那么我们的最大数为62的6次幂。那么系统可以使用的时间为:pow(62,6)/(200 * 60 * 60 * 24 * 30 *12)≈ 9年,如果时间不够的话我们可以扩展到7位,就可以使用566年了,哈哈哈,足够用了。由于篇幅问题这里就不放具体的代码实现了,文章末尾会附上具体的代码实现的完整链接,具体的实现可以参考com.philosophy.web.domain.generate.GenerateShortUrlByHash

6.1.2:Id映射

​ 如果使用Id映射的话我们可以使用一个数字和一个长链接,由于数字特别大的时候可能也会特别长,所以我们依然可以使用上述的62进制树来解决这个问题。但是我们依然要考虑以下几个问题,数据库和Redis的自增Id是连续的,很容易可以猜想到下一个短链接是什么,由于是高可用,所以我们可以在集群的每一个节点中每一次获取一定量的序号(假设每次获取10000个序号,要保证获取和更新数据库的原子性),然后在将这一万个需要乱序,这样就不可以猜想到下一个短链接是什么了,当然如果要使用UUID比如雪花id的实现的话就可以不用考虑这些问题了。自增Id具体实现com.philosophy.web.domain.generate.GenerateShortUrlByIdentity,雪花Id的实现com.philosophy.web.domain.generate.GenerateShortUrlBySnow

6.2:访问短链接的实现

访问短链接的实现就比较简单了,防止恶意访问,这里加上布隆过滤器,具体的功能我们只需要根据短链接获取长链接,然后发布访问事件(可以进行后续的扩展,比如统计URL的访问次数,也可以起到异步解耦和提高接口性能),最后进行请求重定向就可以了,重定向的时候我们要注意一个小点(返回码301和302的区别),由于我们这里要统计接口的访问次数,所以返回302。具体实现com.philosophy.web.controller.VisitController#redirect

6.3:应用启动

加载数据到布隆过滤器

7:结语

系统虽小,五脏俱全,通过对本系统的实现,也学习到了很多新的知识以及一些系统的设计思想;路漫漫其修远兮,吾将上下而求索。

源代码地址:https://gitee.com/philosoxhy/short_url

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

简介

短链接系统设计 展开 收起
Java
Apache-2.0
取消

发行版

暂无发行版

贡献者

全部

近期动态

加载更多
不能加载更多了
Java
1
https://gitee.com/ithanshui/short_url.git
git@gitee.com:ithanshui/short_url.git
ithanshui
short_url
short_url
master

搜索帮助