# What is Discretization in Machine Learning | Machine Learning Tutorial | DataMites

https://www.youtube.com/watch?v=UC-3uRGiNBY
Translation: zh-TW

[00:09] Hello everyone. The concept we are going to look at today is called discretization: what discretization is, why we do it, and how we do it.

[00:23] Before you can understand discretization, you need to understand the difference between continuous and discrete data.

[00:34] By definition, continuous data consists of variables that can take on an infinite number of possible values within a given range. They are not restricted to integers or whole numbers; theoretically speaking, they can take on infinitely many values.

[00:56] For example, think about height. Someone can be 160 cm, or 160.5 cm if you measure very precisely, or some other value with even more decimal places. Practically speaking, this feature can take on an infinite number of values. The same goes for weight, depending on how you measure it. Temperature and time are further examples of continuous data.

[01:38] So what is discrete data? Going by the definition of continuous data, discrete variables are those that can take on only a finite set of values. Think of the number that comes up when you roll a die: it can only be 1, 2, 3, 4, 5 or 6.

[02:08] Likewise, if you categorize someone as an adult, a minor or a senior citizen, you are limiting the values that variable can take on.

[02:26] So discrete data is limited to a fixed set of values, and that is the difference between continuous and discrete data.

[02:41] Discretization, then, is the process of converting continuous data into discrete data.

[03:03] Why do we do this? There are several reasons why discretization is important.

[03:13] First, certain algorithms work better with discrete data than with continuous data. Decision trees are an example: if you have learned about them, you know they split the data based on conditions. Suppose a tree splits on a continuous variable such as age, with values 22, 23, 24, 25 and so on. To find the best split, the algorithm has to evaluate a candidate split at each of these values and compare a criterion such as information gain or the Gini index.

[04:06] If instead I turn age into a discrete feature with categories such as adult, minor and senior citizen, the number of candidate splits, and with it the computational cost of the algorithm, drops sharply: the tree only has to split on these categories.

[04:34] So that is one reason: certain algorithms, decision trees in particular, work better with discrete data than with continuous data, and this also results in faster computation.
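
To make the split-count argument concrete, here is a minimal Python sketch. It assumes a CART-style tree that scores thresholds at the midpoints between consecutive distinct values of a numeric feature, and it counts the distinct ways a categorical feature can be divided into two groups. The ages, the `age_group` cut-offs (18 and 60) and all function names are hypothetical, chosen only for illustration.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values: the cut points a
    CART-style tree would score for a numeric feature (assumed behaviour)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def num_binary_groupings(labels):
    """Ways to divide k distinct categories into two non-empty groups."""
    k = len(set(labels))
    return 2 ** (k - 1) - 1 if k > 1 else 0


def age_group(age):
    # Hypothetical cut-offs: under 18 -> minor, 18 to 59 -> adult, 60+ -> senior
    if age < 18:
        return "minor"
    if age < 60:
        return "adult"
    return "senior"


# Hypothetical sample of ages, all distinct
ages = [12, 15, 22, 23, 24, 25, 31, 38, 45, 52, 61, 67, 70]

thresholds = candidate_thresholds(ages)
groupings = num_binary_groupings([age_group(a) for a in ages])

print(len(thresholds), "numeric thresholds to evaluate")  # 12
print(groupings, "category groupings to evaluate")  # 3
```

With 13 distinct ages there are 12 thresholds to score, against 3 groupings for three categories. The trade-off is that the discretized tree can no longer separate, say, a 25-year-old from a 31-year-old.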

[04:49] Discretization also improves interpretability. Say I have an age feature like before, with a set of ages. If I group them into these categories and look at the frequencies, I might see, for example, that I have about 10 adults, 5 minors and 10 senior citizens. This makes the data easier to interpret.
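
A frequency table like the one described can be produced in a few lines. This is a sketch under stated assumptions: the ages are a made-up sample, and the cut-offs in the hypothetical `age_group` helper (under 18, 18 to 59, 60 and over) are one possible choice, not the video's.

```python
from collections import Counter


def age_group(age):
    # Hypothetical cut-offs: under 18 -> minor, 18 to 59 -> adult, 60+ -> senior citizen
    if age < 18:
        return "minor"
    if age < 60:
        return "adult"
    return "senior citizen"


ages = [8, 12, 15, 16, 17, 25, 34, 41, 47, 58, 63, 70, 81]  # hypothetical sample
counts = Counter(age_group(a) for a in ages)

# One line per category instead of 13 raw numbers
for group in ("minor", "adult", "senior citizen"):
    print(f"{group:>15}: {counts[group]}")
```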

[05:25] Another reason is dealing with noise and outliers. Converting continuous data into categorical data greatly reduces the effect of outliers. For example, you might have temperatures of 0, 10 and 15 and, for some reason, an outlier of about 60 °C. If you convert this feature into a categorical (discrete) feature, you might have categories such as low, medium and high temperature: 0 and 10 fall into the low category, 15 into the medium category, and 60 and anything above into the high category.

[06:24] Had you treated this as continuous data, the effect of the outlier would have been quite evident; it could actually have introduced biases into your machine learning algorithms. Converting the feature into categories deals with that problem quite nicely.
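
The temperature example can be sketched as follows. The band thresholds (10 °C and 30 °C) and the name `temperature_band` are hypothetical; the video does not give exact boundaries. The sketch also contrasts how the outlier shifts a continuous statistic (the mean) with how little it matters once binned.

```python
from statistics import mean


def temperature_band(celsius, low_max=10, medium_max=30):
    """Map a reading to a band. The thresholds are hypothetical:
    <= low_max -> 'low', <= medium_max -> 'medium', anything above -> 'high'."""
    if celsius <= low_max:
        return "low"
    if celsius <= medium_max:
        return "medium"
    return "high"


readings = [0, 10, 15, 60]  # 60 is the outlier from the example
print([temperature_band(t) for t in readings])  # ['low', 'low', 'medium', 'high']

# As a continuous value, the outlier drags the mean upward...
print(mean(readings))  # 21.25
# ...while as a category, 60 and an even more extreme 600 look the same
print(temperature_band(600))  # high
```

Once binned, the outlier's magnitude no longer matters; it only counts as "high". Whether that is desirable depends on whether the extreme value is noise or a genuine signal.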

[06:44] There are many more reasons as well: depending on the context and the problem you are facing, discretization can help in other ways too.

[06:53] So how do we do discretization? The following is not an exhaustive list, but these are a few of the most common discretization methods.

[07:07] The first is equal width binning. Let me put up some sample data of ages: 20, 21, 22, 25, 30, 40, then 50, 55 and 60, and then these people as well.

[07:34] As the name suggests, equal width binning creates bins of equal width. Say my first bin starts at 20 and goes up to 30; 31 to 40 is another bin, then 41 to 50, 51 to 60, 61 to 70, 71 to 80 and 81 to 90. You can also change the width. With a width of 20, for example, I can have 20 to 40, 41 to 60, 61 to 80 and 81 to 100, which gives me four bins. You can adjust the bin width to your requirements: if you want a more granular view of the data, you might increase the number of bins.

[09:05] Say these are the bins we are working with. Anyone aged 20 to 40 goes into the first bin, so we have 1, 2, 3, 4, 5, 6: six members. In the 41 to 60 range I have three, in 61 to 80 just one, and in 81 to 100 three. That is what equal width binning looks like: you create bins of equal width and then assign each continuous value to a bin. It is one way of creating a discrete feature out of a continuous one.
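
A minimal sketch of equal width binning with the width-20 bins from the example. The age list is hypothetical (the video does not read out every value), and so are the helper names. Bins are treated as closed on the right, so 40 lands in 20 to 40 and 41 in the next bin, matching the ranges described above; values outside the range are clipped into the first or last bin.

```python
import math
from collections import Counter


def equal_width_bin(value, low, width, n_bins):
    """Index of the equal-width bin holding value.

    Bins are (lower, upper], except the first, which also includes `low`.
    Out-of-range values are clipped to the first or last bin."""
    idx = math.ceil((value - low) / width) - 1
    return min(max(idx, 0), n_bins - 1)


def bin_labels(low, width, n_bins):
    return [f"{low + i * width}-{low + (i + 1) * width}" for i in range(n_bins)]


ages = [20, 21, 22, 25, 30, 40, 50, 55, 60, 75, 85, 90, 97]  # hypothetical sample
labels = bin_labels(20, 20, 4)  # four bins of width 20 starting at 20
counts = Counter(labels[equal_width_bin(a, 20, 20, 4)] for a in ages)

for label in labels:
    print(f"{label}: {counts[label]}")  # 6, 3, 1 and 3 members
```

Raising `n_bins` (and shrinking `width` accordingly) gives the more granular view mentioned above, at the cost of sparser bins.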

[10:12] Another method is called equal frequency binning. One issue with equal width binning is that the number of members per bin varies quite a lot: six in the first bin, then three, one and three. Equal frequency binning focuses on creating bins that each have the same number of members. Say I have 12 members here and I want to create four bins. I first sort all the values in ascending (or descending) order. Since 12 divided by 4 is 3, each bin should have three members.

[11:19] The first three members, 20, 21 and 22, go into the first bin, so its range becomes 20 to 22. The next three, 25, 30 and 40, make the second bin, ranging from 25 to 40. Then 50, 55 and 66 make a bin from 50 to 66, and finally the last members, such as 90 and 97, go into the last bin.

[12:06] Here every bin has the same number of members, but the bin widths are no longer uniform: previously we had equal width, now we have equal frequency. This is another method of doing discretization.
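
The procedure above (sort, then cut into groups of equal size) can be sketched as follows. The twelve ages are hypothetical and deliberately unsorted to show the sorting step; `equal_frequency_bins` is an illustrative name, not a library function. When the count does not divide evenly, this sketch gives the earlier bins one extra member.

```python
def equal_frequency_bins(values, n_bins):
    """Sort the values and cut them into n_bins consecutive groups whose
    sizes differ by at most one (earlier bins get any extra members)."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


ages = [55, 20, 97, 30, 21, 66, 90, 25, 40, 22, 95, 50]  # hypothetical, 12 values
bins = equal_frequency_bins(ages, 4)

# Same count per bin, but very different ranges (20-22 vs 50-66)
for members in bins:
    print(f"{members[0]}-{members[-1]}: {members}")
```

Because this sketch splits purely by position, repeated values can end up in different bins. Quantile-based implementations such as pandas' `qcut` compute bin edges instead, so equal values stay together, which can make the bin sizes unequal.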

[12:30] Then there is clustering-based binning, which uses clustering algorithms from machine learning to work out which category each continuous value should fall into, and then creates the bins from those groups.
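
The video does not name a specific clustering algorithm; k-means is swapped in here as one common choice. This is a minimal one-dimensional sketch (plain Lloyd iterations, deterministic start at evenly spaced positions in the sorted data) on a hypothetical age list. Each value's bin is the index of its nearest centroid.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one-dimensional data; returns sorted centroids.
    Centroids start at evenly spaced positions in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [float(ordered[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # New centroid = cluster mean; keep the old one if a cluster is empty
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:  # assignments have stopped changing
            break
        centroids = updated
    return sorted(centroids)


def cluster_bin(value, centroids):
    """Bin index = index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


ages = [20, 21, 22, 25, 30, 40, 50, 55, 60, 75, 85, 90, 97]  # hypothetical sample
centroids = kmeans_1d(ages, k=3)
print(centroids)  # [23.6, 51.25, 86.75]
print([cluster_bin(a, centroids) for a in ages])
```

Unlike the previous two methods, the bin boundaries here come from gaps in the data itself (in effect, midway between neighbouring centroids). The number of clusters `k` is still a choice you have to make.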

[12:49] We also have custom binning. Custom binning refers to the situation where you, as the researcher or data scientist, use your domain knowledge, domain expertise and business knowledge to decide what the bin boundaries should be. Rather than relying on equal width or equal frequency, you create your own bins based on that knowledge.

[13:17] For example, I can say that anyone aged 0 to 18 should be considered a minor, anyone between 18 and 60 an adult, and anyone aged 60 or older a senior citizen. In this case, I have defined the bin widths myself, and that is another method. Keep in mind that this is not an exhaustive list: these are not the only binning or discretization methods available, and we will cover more going forward. Stay tuned, and see you in the next video. Thank you.
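
The custom binning example (minor, adult, senior citizen) can be sketched with Python's standard `bisect` module. The ranges as spoken overlap at 18 and 60, so code has to pick a side. This sketch assumes a value equal to a cut-off belongs to the upper bin (18 is an adult, 60 a senior citizen, consistent with "60 and greater"). The helper name and constants are hypothetical.

```python
import bisect


def custom_bin(value, edges, labels):
    """Return the label of the analyst-defined bin containing value.

    edges:  ascending cut points chosen from domain knowledge.
    labels: exactly one more label than edges. bisect_right sends a value
            equal to a cut point into the upper bin."""
    if len(labels) != len(edges) + 1:
        raise ValueError("labels must have exactly one more entry than edges")
    return labels[bisect.bisect_right(edges, value)]


AGE_EDGES = [18, 60]  # cut-offs from the minor / adult / senior citizen example
AGE_LABELS = ["minor", "adult", "senior citizen"]

for age in (5, 17, 18, 45, 60, 82):
    print(age, custom_bin(age, AGE_EDGES, AGE_LABELS))
```

Keeping the edges and labels as data rather than hard-coded `if` statements makes it easy to revise the bins when the domain definition changes.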
