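MySQL: Finding Duplicate Data
Summary: in this tutorial, you will learn how to find duplicate values in one or more columns in MySQL.
Before we begin
Duplicate data creeps into databases for many reasons, and finding it is one of the important tasks you must handle when working with a database. For the demonstration, we will create a table named contacts_test with four columns: id, first_name, last_name, and email.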
DROP TABLE IF EXISTS contacts_test;
CREATE TABLE contacts_test (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL
);
The following statement inserts rows into the contacts_test table:
INSERT INTO contacts_test (first_name,last_name,email)
VALUES ('Carine ','Schmitt','carine.schmitt@verizon.net'),
('Jean','King','jean.king@me.com'),
('Peter','Ferguson','peter.ferguson@google.com'),
('Janine ','Labrune','janine.labrune@aol.com'),
('Jonas ','Bergulfsen','jonas.bergulfsen@mac.com'),
('Janine ','Labrune','janine.labrune@aol.com'),
('Susan','Nelson','susan.nelson@comcast.net'),
('Zbyszek ','Piestrzeniewicz','zbyszek.piestrzeniewicz@att.net'),
('Roland','Keitel','roland.keitel@yahoo.com'),
('Julie','Murphy','julie.murphy@yahoo.com'),
('Kwai','Lee','kwai.lee@google.com'),
('Jean','King','jean.king@me.com'),
('Susan','Nelson','susan.nelson@comcast.net'),
('Roland','Keitel','roland.keitel@yahoo.com');
SELECT
*
FROM
contacts_test;
+----+------------+-----------------+---------------------------------+
| id | first_name | last_name       | email                           |
+----+------------+-----------------+---------------------------------+
|  1 | Carine     | Schmitt         | carine.schmitt@verizon.net      |
|  2 | Jean       | King            | jean.king@me.com                |
|  3 | Peter      | Ferguson        | peter.ferguson@google.com       |
|  4 | Janine     | Labrune         | janine.labrune@aol.com          |
|  5 | Jonas      | Bergulfsen      | jonas.bergulfsen@mac.com        |
|  6 | Janine     | Labrune         | janine.labrune@aol.com          |
|  7 | Susan      | Nelson          | susan.nelson@comcast.net        |
|  8 | Zbyszek    | Piestrzeniewicz | zbyszek.piestrzeniewicz@att.net |
|  9 | Roland     | Keitel          | roland.keitel@yahoo.com         |
| 10 | Julie      | Murphy          | julie.murphy@yahoo.com          |
| 11 | Kwai       | Lee             | kwai.lee@google.com             |
| 12 | Jean       | King            | jean.king@me.com                |
| 13 | Susan      | Nelson          | susan.nelson@comcast.net        |
| 14 | Roland     | Keitel          | roland.keitel@yahoo.com         |
+----+------------+-----------------+---------------------------------+
14 rows in set (0.00 sec)
In the contacts_test table, some rows have duplicate values in the first_name, last_name, and email columns. Let's learn how to find them.
Finding duplicate values in one column
To find duplicate values in a single column of a table, use the following statement:
SELECT
col,
COUNT(col)
FROM
table_name
GROUP BY col
HAVING COUNT(col) > 1;
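A value is considered a duplicate if it appears more than once in the table. In this statement, the GROUP BY clause groups the rows by the specified column (col), and the COUNT function counts the rows in each group. The condition in the HAVING clause keeps only the groups whose count is greater than 1, which are the duplicated values.
You can use this query to find the emails that appear in more than one row of the contacts_test table, as follows: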
SELECT
email,
COUNT(email)
FROM
contacts_test
GROUP BY email
HAVING COUNT(email) > 1;
The following shows the output of the query:
+--------------------------+--------------+
| email                    | COUNT(email) |
+--------------------------+--------------+
| janine.labrune@aol.com   |            2 |
| jean.king@me.com         |            2 |
| roland.keitel@yahoo.com  |            2 |
| susan.nelson@comcast.net |            2 |
+--------------------------+--------------+
4 rows in set (0.00 sec)
As you can see, some rows share the same email.
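The grouped query reports each duplicated email once, not the rows that carry it. One way to list the full rows is to join the table back to the grouped result. The following is a minimal sketch that assumes the contacts_test table defined earlier; the aliases c and d are names chosen here for illustration.

```sql
-- List every row whose email occurs more than once in contacts_test.
SELECT
    c.id,
    c.first_name,
    c.last_name,
    c.email
FROM
    contacts_test AS c
    INNER JOIN (
        -- Derived table: one row per duplicated email.
        SELECT email
        FROM contacts_test
        GROUP BY email
        HAVING COUNT(*) > 1
    ) AS d ON d.email = c.email
ORDER BY
    c.email,
    c.id;
```

With the sample data, this should return eight rows: two for each of the four duplicated emails. The id column then makes it possible to tell the copies apart.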
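Finding duplicate values in multiple columns
Sometimes you want to find duplicates based on several columns rather than one. In that case, you can use the following query:
SELECT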
col1, COUNT(col1),
col2, COUNT(col2),
...
FROM
table_name
GROUP BY
col1,
col2, ...
HAVING
(COUNT(col1) > 1) AND
(COUNT(col2) > 1) AND
...
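Rows are considered duplicates only when the combination of columns is duplicated, which is why the conditions in the HAVING clause are joined with the AND operator.
For example, to find rows in the contacts_test table that have duplicate values in the first_name, last_name, and email columns, you can use the following query: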
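Every row in a group shares the same combination of grouped columns, so the per-column counts are all equal whenever those columns are declared NOT NULL. In that case a single COUNT(*) carries the same information. The following is a sketch of this shorter form, assuming the contacts_test table defined earlier and grouping on only two of its columns; the alias occurrences is a name introduced here.

```sql
-- Find (first_name, last_name) pairs that occur more than once.
SELECT
    first_name,
    last_name,
    COUNT(*) AS occurrences  -- number of rows sharing this pair
FROM
    contacts_test
GROUP BY
    first_name,
    last_name
HAVING
    COUNT(*) > 1;
```

The two forms can differ when a grouped column is nullable: COUNT(col) skips NULL values, while COUNT(*) counts every row in the group.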
SELECT
first_name, COUNT(first_name),
last_name, COUNT(last_name),
email, COUNT(email)
FROM
contacts_test
GROUP BY
first_name ,
last_name ,
email
HAVING COUNT(first_name) > 1
AND COUNT(last_name) > 1
AND COUNT(email) > 1;
The following shows the output of the query:
+------------+-------------------+-----------+------------------+--------------------------+--------------+
| first_name | COUNT(first_name) | last_name | COUNT(last_name) | email                    | COUNT(email) |
+------------+-------------------+-----------+------------------+--------------------------+--------------+
| Janine     |                 2 | Labrune   |                2 | janine.labrune@aol.com   |            2 |
| Jean       |                 2 | King      |                2 | jean.king@me.com         |            2 |
| Roland     |                 2 | Keitel    |                2 | roland.keitel@yahoo.com  |            2 |
| Susan      |                 2 | Nelson    |                2 | susan.nelson@comcast.net |            2 |
+------------+-------------------+-----------+------------------+--------------------------+--------------+
4 rows in set (0.00 sec)
In this tutorial, you learned how to find duplicate rows in MySQL based on the values of one or more columns.
