3D-Mobster opened this issue on Aug 09, 2023 · 28 posts
parkdalegardener posted Thu, 10 August 2023 at 2:08 PM
3D-Mobster posted at 10:00 AM Thu, 10 August 2023 - #4472294
Ok, so a lot here to consider. I'll start by saying I don't use Comfy, so I cannot comment on their node system.
I know about Firefly but haven't tried it, so my knowledge of it is a bit limited. I have heard good and not-so-good things about it from people who have tried it. It is still in beta, so they will presumably fix and improve it eventually, and it looks very promising.
I didn't know about Normalmap-online, so I have bookmarked it, as it looks very cool.
I haven't tried SDXL yet, as I don't think my graphics card can handle it. It struggles with SD, though it does slightly better if I use ComfyUI, but I can't really get my head around the UI on that one. :) I am considering getting a new graphics card, but am waiting to see if some of them drop in price. I also suspect that they are working on graphics cards better suited to AI, given how popular it has become and that it pretty much all runs on GPU and VRAM. So far I don't have issues just using SD, as I mostly use AI for fun and testing purposes. I still think the biggest issue with AI is the lack of control. Even with ControlNet or when doing training (which I have had little success with, to be honest :D), I think you are somewhat limited, and it is often very difficult to get consistent characters, both in their look and in their clothing etc. The inability to prompt detailed/specific scenes is also a huge issue for me. For me to really embrace it as a valid tool for the stuff I like to create, that matters more than the high-resolution testing material that SDXL offers, which is probably also why I haven't even tried giving it a go :).
You could probably use a tileable skin texture as a sub-texture for dealing with potential seams, and as a base colour that you can then adjust as you say. But again, most texture tools, at least Substance Painter, which is the one I use, have the option to paint across UV tiles, and it's very easy to spot seams with the current tools. I honestly think it is far more difficult to find high-resolution textures of humans, as they often come with lighting information and other "artefacts" that you don't really want in your texture, so AI could be useful for that as well. But maybe Normalmap-online can fix that or at least improve it?
One idea that I think could be done, and that would make for real AI texturing, is if you were able to throw a 3D model into a program and it could generate the texture based on a description. One could imagine a spaceship where all polygroups come with attached prompts, so, for instance, the outer surface could be "spaceship exterior" and the program would simply generate a texture based on whatever was written there. I could imagine that we will see tools along these lines in the future, if they manage to allow for better control.
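To make the polygroup idea a bit more concrete, here is a minimal Python sketch of how prompts attached to polygroups might be turned into texture-generation requests. Everything in it is an assumption for illustration: the `PolyGroup` class, the `build_texture_jobs` helper, the group names, and the resolutions are all hypothetical and do not correspond to any existing tool.

```python
from dataclasses import dataclass


@dataclass
class PolyGroup:
    """A named region of a 3D model with a text prompt attached (hypothetical)."""
    name: str
    prompt: str
    resolution: int = 1024  # assumed square texture size in pixels


def build_texture_jobs(groups, style_suffix=""):
    """Turn each polygroup into one texture-generation request."""
    jobs = []
    for group in groups:
        # Optionally append a shared style hint to every per-group prompt.
        text = f"{group.prompt}, {style_suffix}" if style_suffix else group.prompt
        jobs.append({
            "target": group.name,
            "prompt": text,
            "size": (group.resolution, group.resolution),
        })
    return jobs


# Example model: the group names and prompts are made up.
spaceship = [
    PolyGroup("hull_outer", "spaceship exterior"),
    PolyGroup("cockpit_glass", "tinted cockpit glass", resolution=512),
]

jobs = build_texture_jobs(spaceship, style_suffix="seamless texture")
for job in jobs:
    print(job)
```

Each resulting job would then be handed to whatever image generator is in use. The sketch only covers the bookkeeping, not the generation or the UV mapping.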
I agree that there are definitely people who are going to be left behind. But I don't think it is only due to slow hardware or slow adaptation. What is changing is this: people who work in the industry come with experience, and whether they are self-taught or have a degree, there is a lot of time invested behind each person. With AI, you not only get an extremely powerful tool, but it also comes with all the experience it was trained on and can constantly be improved. The other thing is the extreme productivity it delivers. One fun fact from animation, for instance, is that you often see characters with only four fingers, because it saves a lot of time over thousands of frames not having to animate all of them, especially if we are talking 2D hand drawings. But assume that the AI were better at drawing hands than it is: you suddenly have an "animator" that doesn't care about such things or about how much detail is needed, which a human workforce might simply not be able to compete with.
So, depending on how the AI develops, especially if control gets to the point where you can define characters and it can reuse them, the question is whether humans can actually compete.
I have two systems here: a 20 series 8 gig and a 30 series 10 gig. There is nothing the 8 gig card cannot touch, including XL generations if I run the refiner weights as a second step. It can be used to train 1.x and 2.x LoRAs, TIs, and the like. I cannot train XL on the 8 gig card.
The 10 gig card can load the whole XL pipeline as a single text-to-image operation, the way it is supposed to be used according to the research papers.
Adobe's in/out painting could be a bit better, it is true, if you are comparing its output to that of other online generators that use custom blends and trainings based on 1.x. Adobe, at least, is trying to separate itself from the ethics of current data sets used in training. The censorship and training upon user-created imagery opens a different debate.
ControlNet is a gift from the gods when used in conjunction with any 3D application, and a miracle machine even on 2D images. It too can output a normal or depth map, without using an outside application like Normalmap-online, and implement them during generation. Segmentation is what selective matte making for free looks like. It is the method of doing your spaceship. Run the segmentation net and it's all point and click. For instance, you can click on a tailfin and the ship body and they become the same material in that specific situation. Perfect mattes that you can add and subtract into one or more material regions. Not prime time yet for 3D models until someone gets better backside generation, and I don't see that happening any time soon.
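The add-and-subtract matte idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the label map is a toy grid standing in for a segmentation net's output, the label IDs (1 = ship body, 2 = tailfin, 3 = window) are made up, and the helper names are hypothetical rather than part of ControlNet or any other tool.

```python
def matte_for(label_map, labels):
    """Binary matte: 1 where the pixel's label is in `labels`, else 0."""
    wanted = set(labels)
    return [[1 if px in wanted else 0 for px in row] for row in label_map]


def add_mattes(a, b):
    """Union of two mattes (a pixel is kept if either matte covers it)."""
    return [[x | y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def subtract_mattes(a, b):
    """Remove from matte `a` every pixel covered by matte `b`."""
    return [[x & (1 - y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Toy 3x4 segmentation result; the label IDs are assumptions for this example:
# 0 = background, 1 = ship body, 2 = tailfin, 3 = window
label_map = [
    [0, 2, 2, 0],
    [1, 1, 3, 1],
    [0, 1, 1, 0],
]

# "Click" the ship body and the tailfin so they form one material region.
hull = add_mattes(matte_for(label_map, [1]), matte_for(label_map, [2]))

# Take the tailfin back out again.
hull_no_fin = subtract_mattes(hull, matte_for(label_map, [2]))

for row in hull:
    print(row)
```

A real workflow would do the same thing on full-resolution image arrays, but the set logic is the same.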
Pose estimation gives fantastic results when used with a simple doll posed in your 3D software of choice. ControlNet can grab the pose from your doll and apply it to whatever figure type you may have in mind as AI-generated output. For instance, pose Andy in Poser, or any other figure in your software of choice, and kick out an image. Put it in ControlNet while asking for a bunny dressed as a Viking warrior and you get a bunny dressed as a Viking warrior in the pose ControlNet extracted from the Andy image. Works for facial expressions and hands too. I am waiting for similar control weights for SDXL.
The VRAM issue may be driver specific. Newer graphics drivers, and every new "game ready" driver, offload VRAM to system RAM because newer games need to load a large number of 4k+ texture files and video information at startup. Offloading VRAM for texture files is no big deal. Offloading VRAM for ML tasks decreases the ability to actually run some tasks, or to perform them in a "reasonable" amount of time. Older studio drivers did not do this.
The amount of control one gets is actually very good. Again, it is far from perfect, but it is very good. That is where the need for artistic types remains alongside the technicians. In a large professional environment, the ability to rapidly output product does not equate to a job well done. One must realize that while rapid, AI is not exactly smart. If I tell my art department that I need product placement in a particular setting, AI cannot do it. It can put "a" product into a scene but not "your" product into a scene if all you are doing is typing a phrase into a prompt box.
Send the product to the art department and have the photogs snap a pic or ten. Those pics are uploaded to the tech crew in real time as they are being taken. Ten minutes and you have training data. A decent 40 series card with a whack of VRAM and you can output a LoRA for your product in as little as half an hour. At this point you can get a reasonable output image of "your" product placement in the setting required as quickly as you can type in a prompt. So an hour or so of prep work and you can instantly have "your" product placed wherever. The question remains though. Does typing your phrase into the prompt box make you a good image now? Probably. Will it sell your product or put forward your idea? Probably not. That is where the "artistic" type people come in.
I see the need for reasonable tech folk who can train the weights (models) and ethically acquire the training data, from in-house or elsewhere, as one part of the team, while one still needs artistic folk to get the most out of the tools and provide the return companies need to survive. A thousand monkeys typing on a thousand typewriters might produce a reasonable Netflix pilot in the long run, but you get a lot of drivel while you wait. In a professional environment that doesn't work. You hire trained monkeys. It's time to become a trained monkey if this is the type of work one does.
What we consider animation has changed a lot. I know. I trained first as a photographer and then as a film maker / special effects artist in the mid 70s. Everything I learned in university is ancient history. I changed and became an animator, and watched auto-tweening programs like Poser force me to adapt or die. Well, no problem. I'll learn to 3D model. That will keep me relevant, I thought. And that is the point. Adapt or die.
If I was a mechanic I would have become obsolete had I not learned about electronic ignition as the carburettor has been replaced by it, and now electric cars will make ICE a thing of the past. Plug in a car and the machine tells you what is wrong with the engine. It then tells you all the steps in order to fix the issue. Button pusher does just that and the problem persists. Now I need my mechanic. The mechanic is the artist that can finesse the machine.