In this section, we apply our manufacturing cost estimation methodology to four datasets from three different industries. We present these real-world problems from least to most complex, where complexity is measured by the number of numeric and categorical variables and the number of observations. The data were collected from socks, electromagnetic parts, and plastic tools manufacturing factories in Ankara and Konya, Turkey. These datasets comprise mixed numeric and categorical design attributes, cost drivers, and other variables. Due to the confidentiality agreements signed with these companies, we cannot state any brand names or product codes. Note that these datasets are diverse and representative but do not cover the full realm of cost estimation problems. Therefore, the results presented herein cannot be assumed to be fully generalizable.
Because the datasets are relatively small, we leverage the data fully by using leave-one-out cross-validation to validate the performance of the estimation models being constructed. One observation is left out to test a cost estimation model that is built or trained with the remaining observations in the dataset. The left-out observation in each replication can be considered an external test data point since it is used in neither the cluster analysis nor the model building phase.
For the clustering approach, we first conduct a cluster analysis and then build cluster-specific cost estimation models based on the entire dataset except the left-out observation. Second, we find the cluster into which the left-out observation falls. Finally, we test the corresponding cluster-specific estimation model with the left-out data point. With the same logic for the spline approach, we first build a spline model leaving one product out of the data sample and then evaluate the spline model with the left-out observation.
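To make this procedure concrete, the following R sketch implements the leave-one-out loop for the clustering approach. The data frame name (products), its cost column, and the use of a linear model within each cluster are illustrative assumptions rather than our exact implementation.

```r
library(cluster)  # daisy() for Gower distances, pam() for k-medoids

# Illustrative leave-one-out loop for the clustering approach (CLU).
# `products` is a data frame of mixed numeric/factor cost drivers with a
# numeric `cost` column; `k` is the chosen number of clusters.
loocv_cluster_errors <- function(products, k) {
  n   <- nrow(products)
  are <- numeric(n)  # absolute relative error per left-out observation
  for (i in seq_len(n)) {
    train <- products[-i, ]
    test  <- products[i, , drop = FALSE]
    pred_vars <- setdiff(names(train), "cost")

    # Cluster the training data only; Gower distance handles mixed types
    d  <- daisy(train[, pred_vars], metric = "gower")
    cl <- pam(d, k = k, diss = TRUE)
    train$cluster <- cl$clustering

    # Assign the left-out product to the cluster of its nearest medoid
    medoids <- train[cl$id.med, pred_vars]
    d_all   <- as.matrix(daisy(rbind(medoids, test[, pred_vars]),
                               metric = "gower"))
    c_star  <- which.min(d_all[k + 1, 1:k])

    # Fit a cluster-specific model (a linear model, as an assumption) and
    # test it on the left-out observation
    fit    <- lm(cost ~ ., data = subset(train, cluster == c_star,
                                         select = -cluster))
    are[i] <- abs(test$cost - predict(fit, newdata = test)) / test$cost
  }
  are
}
```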
gives the overall structure of our proposed approaches to manufacturing cost estimation. Next, we describe the case studies and datasets.
5.1. Company and dataset descriptions
We consider it very important to validate and demonstrate our proposed methods on actual cost estimation data rather than on simulated datasets. Actual data can be imprecise and sparse; these qualities complicate cost estimation, and our datasets reflect this.
5.1.1. Socks manufacturing data
The first application problem dataset was collected from a socks manufacturer which produces copyrighted and licensed socks for major brands in Europe and the USA. Their range of products consists of sports, casual, and formal/dress socks for women, men, children, and infants. The manufacturing processes include pattern design, knitting, toe seaming, washing-softening, pattern printing, final quality control, and packaging. Steam, silicone, and antibacterial washing are the types of washing-softening operations. In the printing department, the company can apply lithograph, hologram, heat transfer, embroidery, rubber, acrylonitrile butadiene styrene (ABS), and caviar bead prints.
The dataset that we collected from the company’s database contains information for 76 products of women’s and men’s socks. There are nine variables associated with these products, and eight of them are qualitative (categorical), namely raw material, pattern, elasticity, woven tag, heel style, leg style, fabric type, and gender. The only quantitative variable measured on a continuous scale in this dataset is the actual cost, which is recorded in Turkish Lira (TL). is the summary of the dataset and its associated attributes. The columns of the table are, from left to right, variable name, data type, variable type, and categories (for categorical data) or range (for numeric data). For nominal variables, the order of categories is not important since there is no logical transition between categories. For ordinal variables, however, the categories are listed from the lowest to the highest level on the ordinal scale. For instance, elasticity is an ordinal variable that can take a value from “None” to “Double”, where “None” represents the lowest elasticity level of the sock material and “Double” the highest.
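In R, such an ordinal attribute can be encoded as an ordered factor so that the level ordering is respected by models and distance measures. The intermediate level “Single” below is a hypothetical placeholder, since only the lowest and highest categories are named above.

```r
# Hypothetical encoding of the ordinal elasticity attribute; the middle
# level "Single" is an assumed placeholder between "None" and "Double"
elasticity <- factor(c("None", "Double", "Single", "None"),
                     levels = c("None", "Single", "Double"),
                     ordered = TRUE)
elasticity > "None"  # ordinal comparisons are meaningful for ordered factors
```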
5.1.2. Electrical grounding parts data – tubular cable lugs
The second application problem dataset was collected from an electromagnetic parts manufacturer which produces lightning protection elements, grounding materials, metal masts for various purposes, and special-purpose cabins. Steel, copper, stainless steel, aluminum, brass, bronze, cast iron, plastic, and concrete are the primary raw materials used to manufacture these static grounding systems. In the facility, these materials can be coated with electro galvanization, hot dip galvanization, electro copper coating, electro tin coating, electro chromium-nickel (Cr-Ni) coating, black insulation, and green-yellow insulation.
The dataset that we collected from the company’s database contains 68 observations of various tubular cable lugs. There are 12 variables associated with these observations, namely lug type, cross-section, hole diameter, number of holes, gap between holes, material weight, process time, inner diameter, outer diameter, coating type, coating time, and the actual cost. Ten of these variables are quantitative attributes, and nine of them are recorded on continuous scales: cross-section, hole diameter, gap between holes, material weight, process time, inner diameter, outer diameter, coating time, and the actual cost, with units of mm², mm, mm, kg, minutes, mm, mm, minutes, and TL, respectively. The remaining quantitative variable, the number of holes, takes integer values and has no measurement unit; a lug has between zero and two holes. DIN, forend, long, standard, and forend standard are the categories of the variable lug type. is the summary of the dataset and its associated attributes.
5.1.3. Lightning protection parts data – air rods
The third application problem dataset was collected from the same electromagnetic parts manufacturer as in the second problem and includes information about 197 air rods for lightning protection purposes. In the dataset, there are 10 variables associated with these 197 observations. Five of these variables take continuous numeric values and the remaining five are categorical. The numeric variables are rod diameter, rod length, screw size, material weight, and the actual cost, measured in mm, mm, mm, kg, and TL, respectively. The screw size takes a value of zero when no screw is used; the actual minimum screw size is 8.5 mm. The categorical variables are screw type, main material, coating, raw material, and screw nut coating. In , the summary of the dataset and its associated attributes is shown.
5.1.4. Plastic products data
The last dataset was taken from a plastic parts manufacturer which produces kitchenware, food and non-food storage containers, and salad, pastry, bathroom, and hanger accessories. In this dataset, there are many products with completely different physical shapes. However, we may group them according to their raw material types, manufacturing processes/operations, or other factors. The dataset covers 51 variables for 130 plastic products. There are ten main categories of variables: raw material, press, vacuum, boxing, package, paint, sticker, wall plug, labor complexity, and actual cost. There are 13 variables under the raw material category, 12 of which are binary and one numeric. The 12 binary variables represent the type of raw material, such as anti-shock, acrylonitrile butadiene styrene (ABS), poly carbon, and carbon fiber. If a material is used in the main material mixture for a particular product, the corresponding material variable takes the value one, and zero otherwise. The only variable measured on a continuous scale under the raw material category is mixture weight, recorded in grams. The second variable category is press, which stands for the pressing process. There are three machine groups in the company that can perform press operations: Tederic, TSP, and Haitian, with 11, eight, and four different machines, respectively. Every machine corresponds to a variable in the dataset. There can be multiple alternative machines to perform the same operation; however, if a machine is used for any step of production of a particular product, its variable takes a numeric value representing the machining time. If the underlying machine is not used for that product, its variable takes a value of zero. The next variable category is for the vacuuming process. There are two variables under the vacuum category: (1) the poly vinyl chloride (PVC) type for the vacuuming process and (2) the number of vacuums required. The PVC type is a categorical variable and the number of vacuums takes discrete numeric values. Under the boxing category, there are seven variables, six numeric and one categorical: the number of items in a box, net weight, gross weight, length, width, and depth of the box, and the type of the boxing material. Each remaining category corresponds to a single variable. Package, paint material weight, sticker, wall plug, labor complexity, and actual cost are, respectively, binary, numeric, binary, binary, ordinal, and numeric variables. The unit of the paint material weight is grams, and the actual cost is recorded in TL. The labor complexity is tracked according to the complexity of the manufacturing and assembly operations and ranked from 1 (easiest) to 3 (most complex). In , the summary of the dataset and its associated attributes is shown.
We refer to the socks manufacturing, tubular cable lugs, air rods, and plastic products application problems as dataset 1 (DS 1), dataset 2 (DS 2), dataset 3 (DS 3), and dataset 4 (DS 4), respectively.
5.2. Cluster analysis and the number of clusters
As discussed earlier, we used Kaufman and Rousseeuw’s (2022) k-medoids algorithm as implemented in “PAM”. The first task is to determine the appropriate number of clusters. The C-index, the Gamma index, and the average silhouette width graphs are the primary tools for choosing the appropriate number of clusters. We plotted the values of these indices for 2 to 20 clusters. As expected, the values of Gamma and the average silhouette width increase as the number of clusters increases. The value of the C-index decreases as the number of clusters increases, which is consistent with the pattern of the other two indices. The graphs of these three indices with respect to the number of clusters are given in , , , for test cases DS 1 through DS 4, respectively.
Remember that our policy is to seek a consensus among these three graphs. For DS 1, a settlement point of the indices is seven clusters, shown with the black points in , where a local trough is observed right before a dramatic jump in the C-index. Furthermore, at seven clusters, local peaks can be observed one step before the sudden drops in the Gamma and silhouette width trends. For DS 2, the silhouette width never exceeds 0.5; however, a local peak is observed at 11 clusters, and comparing the other two indices with the silhouette width confirms 11 as a reasonable choice for the number of clusters. Furthermore, beyond 11 clusters the cluster contents become unbalanced, with too many observations accumulating in some groups. For DS 3, we picked the point where the silhouette width goes above 0.5 for the first time, because a value above 0.5 indicates a robust clustering structure. After 14 clusters, the silhouette width stagnates just below the 0.5 line, and checking its consistency with the other two statistics shows that 14 clusters are appropriate. For DS 4, the silhouette width never moves above 0.5, but there is a sudden drop in the C-index value at 10 clusters. The Gamma index increases slowly up to 10 clusters and becomes stable thereafter. Combining the information derived from these statistics, we conclude that 10 is a proper value. These indices suggest several other possible points, but 7, 11, 14, and 10 are the most conspicuous for DS 1, DS 2, DS 3, and DS 4, respectively, when we read the graphs from left to right simultaneously.
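As an illustration, the following minimal R sketch computes two of these diagnostics over a range of cluster counts using Gower distances and a hand-coded C-index; it is illustrative code rather than our exact analysis script, and the Baker-Hubert Gamma can be obtained from, e.g., the clusterSim package.

```r
library(cluster)  # daisy(), pam()

# `X` is the mixed-type predictor data frame; Gower handles numeric + factor columns
d  <- daisy(X, metric = "gower")
dm <- as.matrix(d)

# C-index: (S - S_min) / (S_max - S_min), where S is the sum of within-cluster
# pairwise distances and S_min / S_max are the sums of the same number of
# smallest / largest pairwise distances overall; lower values are better
c_index <- function(dm, cl) {
  within <- dm[outer(cl, cl, "==") & upper.tri(dm)]
  all_d  <- dm[upper.tri(dm)]
  nw     <- length(within)
  s_min  <- sum(sort(all_d)[1:nw])
  s_max  <- sum(sort(all_d, decreasing = TRUE)[1:nw])
  (sum(within) - s_min) / (s_max - s_min)
}

ks  <- 2:20
res <- t(sapply(ks, function(k) {
  fit <- pam(d, k = k, diss = TRUE)
  c(silhouette = fit$silinfo$avg.width,
    c_index    = c_index(dm, fit$clustering))
}))
matplot(ks, res, type = "b", pch = 1:2,
        xlab = "number of clusters", ylab = "index value")
```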
shows the number of observations allocated to each cluster using k-medoids, based on the chosen number of clusters for each application dataset. When we analyze the individual observations in each cluster, it is easy to see that the categorical variables play an important role in forming the cluster contents. We also plotted the minimum (min), maximum (max), and average (mean) actual cost values of the products allocated to each cluster in , , , for DS 1 through DS 4, respectively. These graphs illustrate how strongly the actual cost values overlap among clusters in most cases. Interestingly, the similarity of products does not necessarily follow the similarity pattern of the actual cost values. Since multiple cost drivers contribute to product cost, there is no single factor determining the cluster contents; the interactions of multiple cost drivers are more influential than any single driver.
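Per-cluster cost summaries of this kind can be tabulated in a few lines of R, assuming a products data frame with a cost column and the cluster assignments from pam() as in the earlier sketch.

```r
# Min, mean, and max actual cost per cluster; `clustering` holds the
# integer cluster assignments from pam()
cost_summary <- aggregate(products$cost,
                          by  = list(cluster = clustering),
                          FUN = function(v) c(min = min(v), mean = mean(v),
                                              max = max(v)))
print(cost_summary)
```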
As discussed earlier, we used the R package “crs” to build spline models in the presence of categorical and numeric design attributes. None of the continuous predictors required a degree higher than cubic under the cross-validated set of parameters. When the polynomial degree of a predictor is zero, the variable is automatically removed from the spline model due to its irrelevance. We initially ran the spline model script with both “additive” and “tensor” basis inputs; the results show that using tensor products (that is, including interaction terms) provided slightly more accurate results. For the final input parameter, “knots”, we let the cross-validation decide the best knot placement strategy. See for the complete set of parameters used for the spline models.
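A minimal sketch of such a fit with crs follows; the data frame name (lugs) and the variable names are illustrative placeholders for DS 2, not the exact formula we used.

```r
library(crs)  # categorical regression splines

# Illustrative fit: categorical attributes must enter as factors;
# basis = "tensor" includes interaction terms, and knots = "auto" lets
# cross-validation choose the knot placement strategy
lugs$coating_type <- factor(lugs$coating_type)
model <- crs(cost ~ material_weight + process_time + coating_type,
             data = lugs, basis = "tensor", knots = "auto")
summary(model)
predict(model, newdata = lugs[1, ])  # estimate the cost of one product
```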
As we discussed earlier, we used leave-one-out cross-validation to leverage the data for both validation and model building. Without proper validation, our methodology would lack the credibility needed for use in a real-life business environment. This validation module is fully integrated into the same R script.
In , we present the performance metrics of each cost estimation approach, termed CLU (clustering), SPL (splines), and REG (regression), for the four test cases DS 1, DS 2, DS 3, and DS 4. These metrics are the mean absolute relative error (MARE) and the maximum absolute relative error (Max ARE) over the validated predictions for each product. Notice that SPL has no error defined for DS 1 in the table, since this dataset contains no continuous predictors to form a spline basis; this is a disadvantage of SPL for wholly categorical or qualitative datasets. The minimum values of MARE and Max ARE are depicted in bold in for each dataset. According to the MARE values, the most accurate cost estimation approach overall is CLU, although SPL generates slightly more accurate predictions than CLU for DS 3. Clearly, REG was outperformed by both CLU and SPL. It is difficult to decide whether CLU or SPL is superior: CLU significantly outperformed SPL for DS 2, but when Max ARE values are considered, SPL did better than CLU for two of the three applicable test cases. Most importantly, both CLU and SPL were able to predict the manufacturing cost of products with good accuracy, especially compared to the often-used REG method. shows the performance of the cost estimation methods over the four application problems in terms of the MARE values given in .
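Concretely, with the absolute relative error of product i defined as |actual_i - predicted_i| / actual_i over the validated predictions, the two metrics are computed as follows.

```r
# `actual` and `predicted` are numeric vectors collected over the LOOCV loop
are     <- abs(actual - predicted) / actual
mare    <- mean(are)  # mean absolute relative error (MARE)
max_are <- max(are)   # maximum absolute relative error (Max ARE)
```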
We also evaluated the performance of spline models with the maximum polynomial degree set to 1, to make a fair comparison of SPL against CLU and REG, which are essentially linear models in our test cases. Furthermore, we removed the interaction terms in the spline models by setting the “basis” input to “additive”. The performance difference between the default tensor product SPL model and the linear additive SPL model was minimal; these changes did not affect overall accuracy, and the linear additive SPL model still outperformed REG by far. We conclude that even with suboptimal spline model parameters, SPL is a better alternative than REG.
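Under the same illustrative assumptions as the earlier crs sketch, the restricted model corresponds to a call such as the following.

```r
# Restricted spline model: additive basis (no interactions) with the
# polynomial degree capped at 1, i.e., an essentially linear model
model_lin <- crs(cost ~ material_weight + process_time + coating_type,
                 data = lugs, basis = "additive", degree.max = 1)
```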
We used a paired t-test to evaluate the significance of the mean of the differences in AREs. In , the p-values for the paired t-tests on the mean of the differences are given. All cost estimation approaches produce significantly different ARE results from each other at a 95% confidence level. Therefore, we can conclude that CLU clearly dominates REG and that SPL clearly dominates REG. However, for the CLU and SPL pair, we cannot conclude that one is superior to the other: CLU demonstrates a clear dominance only for DS 2 when MARE values are considered; for DS 3, SPL turns out to be the best approach but is very close in performance to CLU; and for the last application dataset, DS 4, CLU finds slightly more accurate estimates than SPL.
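In R, this comparison is a one-liner; for example, pairing the per-product AREs of two methods over the same validated observations.

```r
# Paired t-test on the mean of the differences in per-product AREs
t.test(are_clu, are_spl, paired = TRUE)
```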
We also considered the sensitivity of MARE with respect to the number of clusters for CLU. As expected, MARE decreases as the number of clusters increases and eventually converges to a limit value. The limiting MARE values are around 5%, 3%, 4%, and 11% for test cases DS 1 through DS 4, respectively. shows the change in MARE values as the number of clusters increases for each application dataset. Even though increasing the number of clusters yields more accurate estimates, it risks over-parameterization, which results in a less robust and less dependable model.
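Reusing the hypothetical loocv_cluster_errors() sketch from earlier, this sensitivity analysis reduces to a short loop.

```r
# MARE as a function of the number of clusters, reusing the illustrative
# loocv_cluster_errors() sketch defined earlier
ks    <- 2:20
mares <- sapply(ks, function(k) mean(loocv_cluster_errors(products, k)))
plot(ks, mares, type = "b", xlab = "number of clusters", ylab = "MARE")
```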
We provide the R² values for each cost estimation method in . The maximum R² (R-sq) value for each dataset is shown in bold to indicate the best model fit among the three methods. The R² values of CLU and REG in the table show that finding a well-suited model for DS 1 is challenging due to the lack of relevant continuous predictors in the dataset. Adding more variables to the cost estimation for DS 1 might increase the true explanatory power of the models, but unfortunately the dataset was strictly limited to only eight categorical predictors. However, this dataset is atypical, as most manufactured products involve both numeric and categorical cost drivers. For the other datasets, each cost estimation approach explains the total variability with a high R². To better illustrate the R² values, we plotted the fitted values (predicted cost) against the observed values (actual cost) in , , , for DS 1 through DS 4, respectively.
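Plots of this kind take only a few lines of base R, assuming vectors of actual and LOOCV-predicted costs.

```r
# Fitted (predicted) vs. observed (actual) cost; the 45-degree reference
# line marks perfect prediction, so tighter scatter means higher R-squared
plot(actual, predicted, xlab = "actual cost (TL)",
     ylab = "predicted cost (TL)")
abline(0, 1, lty = 2)
```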